Parallel Processing

Parallel Processing is a way to improve performance on high-end machines.

What is Parallel Processing?

Each FME translation is usually a single process (fme.exe) on your computer. Parallel processing is when you transform your data as several simultaneous processes. The fact that they run simultaneously means the whole translation will run several times quicker than it used to.

Parallel processing allows FME to make use of multiple cores on a computer. There are four levels of parallel processing in FME, and each maps to the number of cores in this way:

ParameterProcessesQuad-Core Machine
No parallelism1 Process1 Process
MinimalCores / 22 Processes
ModerateCores4 Processes
AggressiveCores x 1.56 Processes
ExtremeCores x 28 Processes

So, as in the above example, on a quad-core machine, minimal parallelism will result in two simultaneous FME processes. Extreme parallelism would result in eight (assuming there are eight tasks that can be processed simultaneously).

There is also a hard cap for each license level:

FME EditionProcess CapQuad-Core Machine
Base Edition4 processesMaximum 4 processes
Professional Edition8 processesMaximum 8 processes
All Other Editions16 processesMaximum 8 processes

So, if you have a Base Edition license you are never going to get more than four processes at one time, regardless of machine type and the parallelism parameter. The quad-core machine in the above example can never have more than eight processes, since that is the maximum 'extreme' parallel processing allows.


Jake Speedie says…
Parallel Processing is very effective when you are offloading a task elsewhere – for example calling a Server with the HTTPFetcher – as each process is a tiny impact on the FME system resources. However, be aware, each parallel process involves starting and stopping an FME engine, and this takes time. So, don’t parallelize your processes when the task already takes less than the time to stop/start FME!

Transformers and Parallel Processing

A number of basic FME transformers have built in options for parallel processing. Parallel processes work on groups of features, so the transformer must be group-based and have a group-by parameter in order for the user to be able to define the parallel processing groups.

For example, this Bufferer transformer is set up to buffer a set of street features:

Each street (i.e. each feature with the same street name) will be processed as a separate group. To speed up the translation, each group is being handled as a separate process (sadly the user cannot confirm that the source data is already ordered by group, which would improve performance even more).

When a translation is run in parallel mode, then a number of “worker” processes appear in your process manager:

Parallel Processing Groups

Best performance gains are when you have a small number of groups with a large amount of data. When there are many groups with only a few features then any performance gain will not be great and, in fact, the whole process might even be slower. Disk access can be a big bottleneck there.

Because each group gets processed independently, there can be no relationship between features in different groups. If features are related, and their results dependent on each other, then they must be in the same group.

However, if all data is unrelated and the contents of the group are unimportant, then it’s possible to make artificial groups using a ModuloCounter or RandomNumberGenerator transformer.

For example, here the user has millions of lines to buffer (separately) and uses a ModuloCounter to assign them to one of four groups for parallel processing. Note the Group By parameter in the Bufferer is set to the _modulo_count attribute:


Jake Speedie says…
See this blog article for more information about - and some special techniques for - generating parallel processing groups.

results matching ""

    No results matching ""