Parallel Processing

Parallel Processing is a way to improve performance on high-end machines.

What is Parallel Processing?

Each FME translation usually runs as a single process (fme.exe) on your computer. Parallel processing splits the transformation of your data across several simultaneous processes. Because they run at the same time, the whole translation can finish several times faster than it would as a single process.

Parallel processing allows FME to make use of multiple cores on a computer. There are four levels of parallel processing in FME (in addition to no parallelism), and each maps to the number of cores in this way:

Parameter      | Processes   | Quad-Core Machine
No parallelism | 1 Process   | 1 Process
Minimal        | Cores / 2   | 2 Processes
Moderate       | Cores       | 4 Processes
Aggressive     | Cores x 1.5 | 6 Processes
Extreme        | Cores x 2   | 8 Processes

So, as in the above example, on a quad-core machine, minimal parallelism results in two simultaneous FME processes. Extreme parallelism would result in eight (assuming there are eight tasks that can be processed simultaneously).
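The mapping in the table above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not FME code: the function name `max_processes` is hypothetical, and rounding fractional results down (with a floor of one process) is an assumption, since the table only shows a quad-core machine.

```python
import math

# Multipliers taken from the table above; None marks "always one process".
LEVEL_MULTIPLIER = {
    "No parallelism": None,
    "Minimal": 0.5,
    "Moderate": 1.0,
    "Aggressive": 1.5,
    "Extreme": 2.0,
}

def max_processes(level: str, cores: int) -> int:
    """Return the number of simultaneous processes for a parallelism level."""
    multiplier = LEVEL_MULTIPLIER[level]
    if multiplier is None:
        return 1
    # Assumption: fractional results round down, never below one process.
    return max(1, math.floor(cores * multiplier))

# Reproduce the quad-core column of the table.
for level in LEVEL_MULTIPLIER:
    print(level, max_processes(level, cores=4))
```

On a quad-core machine this prints 1, 2, 4, 6 and 8 processes for the five settings, matching the table.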

There is also a hard cap for each license level:

FME Edition          | Process Cap  | Quad-Core Machine
Base Edition         | 4 processes  | Maximum 4 processes
Professional Edition | 8 processes  | Maximum 8 processes
All Other Editions   | 16 processes | Maximum 8 processes

So, if you have a Base Edition license, you are never going to get more than four processes at one time, regardless of machine type and the parallelism parameter. The quad-core machine in the above example can never have more than eight processes since that is the maximum 'extreme' parallel processing allows.
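Put another way, the effective number of processes is the smaller of what the parallelism level requests and what the license allows. A minimal sketch of that rule follows; the dictionary keys and the function name `effective_processes` are hypothetical labels chosen for illustration, and the caps are the ones listed above.

```python
# Process caps per edition, as listed in the table above.
EDITION_CAP = {"Base": 4, "Professional": 8, "Other": 16}

def effective_processes(requested: int, edition: str) -> int:
    """Apply the license cap to the number of processes the level requests."""
    return min(requested, EDITION_CAP[edition])

# A quad-core machine with Extreme parallelism requests 4 x 2 = 8 processes.
requested = 4 * 2
for edition in EDITION_CAP:
    print(edition, effective_processes(requested, edition))
```

For the quad-core example this gives 4, 8 and 8 processes, which is the last column of the table.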


Jake Speedie says…
Parallel Processing is very effective when you are offloading a task elsewhere (for example, calling a Server with the HTTPFetcher), as each process has only a tiny impact on the FME system resources. However, be aware that each parallel process involves starting and stopping an FME engine, and this takes time. So, don’t parallelize your processes when the task already takes less time than it takes to start and stop FME!

Transformers and Parallel Processing

A number of FME transformers have built-in options for parallel processing. Parallel processes work on groups of features, so the transformer must be group-based and have a group-by parameter for the user to define the parallel processing groups.

For example, this Bufferer transformer is set up to buffer a set of street features:

Each street (i.e., each feature with the same street name) is processed as a separate group. To speed up the translation, each group is handled as a separate process (note that the user can also confirm that the source data is already ordered by group, which could improve performance even more).
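The idea of group-based parallelism can be illustrated outside FME with a small Python sketch. Everything here is an assumption made for illustration: the sample features, the function names, and the use of a thread pool as a stand-in for FME's separate worker processes. The per-group work simply sums segment lengths where a Bufferer would buffer geometry.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

# Hypothetical sample features: (street name, segment length in metres).
features = [
    ("Main St", 120.0), ("Oak Ave", 80.0), ("Main St", 45.5),
    ("Pine Rd", 200.0), ("Oak Ave", 20.0),
]

def group_by_street(items):
    """Collect features that share a street name into one group."""
    groups = defaultdict(list)
    for name, length in items:
        groups[name].append(length)
    return dict(groups)

def process_group(entry):
    """Stand-in for the per-group work; here it totals the segment lengths."""
    name, lengths = entry
    return name, sum(lengths)

groups = group_by_street(features)
# Each group is handed to a worker independently of the others.
with ThreadPoolExecutor(max_workers=2) as pool:
    results = dict(pool.map(process_group, groups.items()))
print(results)
```

Because each group is processed on its own, a worker never sees features from another street, which is why the grouping attribute has to capture every relationship that matters.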

When a translation is run in parallel mode, a number of “worker” processes appear in your process manager.

Parallel Processing Groups

The best performance gains occur with a small number of groups, each containing a large amount of data. With many groups that each contain only a few features, performance gains will be small and, in fact, the whole process might even be slower.
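A rough back-of-the-envelope model shows why. The sketch below is a toy estimate, not a description of FME's actual scheduler: it assumes every group starts its own worker and pays a fixed startup cost, that groups are equal in size, and that they run in waves limited by the number of simultaneous processes. The default timings (`per_feature`, `startup`) are invented for illustration.

```python
import math

def estimated_seconds(features, groups, processes,
                      per_feature=0.001, startup=5.0):
    """Toy estimate of run time for equally sized groups."""
    if processes <= 1:
        # Single process: no extra engines to start, just the work itself.
        return features * per_feature
    # Groups run in rounds of at most `processes` workers at a time.
    waves = math.ceil(groups / processes)
    # Every group pays the startup cost plus the time for its own features.
    per_group = startup + (features / groups) * per_feature
    return waves * per_group

serial = estimated_seconds(1_000_000, groups=1, processes=1)
few_groups = estimated_seconds(1_000_000, groups=4, processes=4)
many_groups = estimated_seconds(1_000_000, groups=1000, processes=4)
print(serial, few_groups, many_groups)
```

With these made-up numbers, four large groups finish in roughly a quarter of the serial time, while a thousand small groups take longer than not parallelizing at all because the startup cost is paid a thousand times.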

Because each group is processed independently, there can be no relationship between features in different groups. If features are related, and their results depend on each other, then they must be in the same group.

However, if all data is unrelated and the contents of the group are unimportant, then it’s possible to make artificial groups using a ModuloCounter or RandomNumberGenerator transformer.

For example, here the user has a large number of line features to buffer (separately) and uses a ModuloCounter to assign them to one of four groups for parallel processing. Note the Group By parameter in the Bufferer is set to the _modulo_count attribute:
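The effect of that setup can be mimicked in plain Python. This is a sketch of the idea rather than of the transformer's internals: `assign_modulo_groups` is a hypothetical helper, the features are invented dictionaries, and the counter is assumed to start at 0 and cycle up to one less than the maximum count.

```python
from collections import Counter

def assign_modulo_groups(features, max_count=4, attr="_modulo_count"):
    """Tag each feature with a counter that cycles 0..max_count-1."""
    return [
        {**feature, attr: index % max_count}
        for index, feature in enumerate(features)
    ]

# Hypothetical input: ten line features identified only by an id.
lines = [{"id": i} for i in range(10)]
tagged = assign_modulo_groups(lines)

# Count how many features landed in each artificial group.
group_sizes = Counter(f["_modulo_count"] for f in tagged)
print(group_sizes)
```

Cycling through the group numbers keeps the groups within one feature of each other in size, so no single process ends up with most of the work.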


Jake Speedie says…
See this blog article for more information about - and some special techniques for - generating parallel processing groups.
