Transformer Selection
If you’ve used FME for any length of time, you’ll know that it’s possible to do almost any task in several different ways.
For example, if you were isolating all road features that pass through a park you could use either the Clipper with the Inside port connected, like so:
Or the LineOnAreaOverlayer, with a test for _overlaps => 1, like so:
The performance for the LineOnAreaOverlayer would be this:
Translation was SUCCESSFUL with 0 warning(s) (134 feature(s) output) FME Session Duration: 3 minutes 57.6 seconds. (CPU: 196.2s user, 37.6s system) END - ProcessID: 38036, peak process memory usage: 3129576 kB
While the Clipper (in Multiple Clippers mode) would be this:
Translation was SUCCESSFUL with 0 warning(s) (134 feature(s) output) FME Session Duration: 23.5 seconds. (CPU: 22.2s user, 0.8s system) END - ProcessID: 14236, peak process memory usage: 1471896 kB
And in Clippers First mode would be this:
Translation was SUCCESSFUL with 0 warning(s) (134 feature(s) output) FME Session Duration: 21.4 seconds. (CPU: 20.4s user, 0.6s system) END - ProcessID: 30123, peak process memory usage: 183344 kB
So the result is the same, but the performance vastly different.
Obviously in the above example the Clipper is the fastest (and be sure to note how the Clippers First mode has reduced memory use by nearly 90%).
But each transformer has different functionality, and if you wanted to output park features with a list of roads or a count of the roads passing through the park, then the LineOnAreaOverlayer would be the transformer of choice, because it has a specific list parameter.
Basically, each transformer works in a different way, has subtle variances in functionality, and will have different performance for any given task. Therefore a translation will benefit in performance if the author is careful in their choice of transformers, and maybe carries out some performance testing first.
Attributes and Transformation
As mentioned (in Reader Performance) reducing data helps performance because it saves FME from either holding it in memory or caching it to a disk.
However, this isn’t just helped by reducing the number of features; it is also helped by reducing the size of each individual feature.
One aspect of this is attributes. Carrying attributes through a translation impacts performance, so if the attributes are not required in the output, it’s best to remove them as early as possible in the translation.
For example, the incoming schema looks like this:
But the outgoing schema looks like this:
Since so many of the source attributes are not required in the output, it makes sense to remove them from the translation, and as early as possible. There are two ways to do this. Some reader formats (but not all) have a setting in the reader feature type to avoid reading excess attributes:
With that you can ensure that only attributes exposed are read. The other way to remove attributes is by using a transformer (AttributeManager, AttributeRemover, or AttributeKeeper) directly after the source feature type:
This ensures that none of the extra attributes become a drain on resources by being processed by any further transformers.
NEW |
The Attributes to Read parameter on reader feature types is new functionality for FME2017. |
Lists
One specific type of attribute to beware of is a List. A list in FME is an attribute that can have multiple values. Because of this it can be a big drain on resources.
For example, use a Joiner to join a feature to 1,000 records and the list for that feature will have 1,000 sets of records. This is bad enough, but if the list is exploded and all of the original attributes kept, then there will be 1,000 features each with 1,000 sets of attributes!
In general, beware of creating lists unnecessarily and of keeping them in a workspace beyond the point at which they are still of use.
Geometry and Transformation
Like attributes, geometry can be removed from a feature, in this case using the GeometryRemover transformer.
Many FME users create translations that handle tabular – non-spatial – data. If you are reading a spatial dataset, then writing it to a tabular format, be sure to remove the geometry early in the workspace, just as you would an attribute.
Another particular problem is carrying around spatial data as attributes. Spatial database formats - for example Oracle or GeoMedia - usually store geometry within a field in the database; for example GEOM. When FME reads the data it converts the GEOM field into FME geometry and drops the field from the data.
However, if you read a geometry table with a non-geometry reader, the translation could end up with the geometry stored as an FME attribute. A similar thing could happen when a workspace reads only one geometry column of a multiple geometry table.
Geometry will create very large and complex attributes, which take up a great deal of resources. If you don’t need them, then it’s worth removing them.
Basically, you should only carry through the translation any geometry and attributes you need for the output of your workspace. If the data is not required, then it can and should be removed as early in the workspace as possible.
Jake Speedie says… |
If you aren’t debugging a translation, then avoid using the Run with Full Inspection option. It stashes all data for every connection in the workspace, meaning performance is significantly reduced.
Similarly, if you aren’t debugging a translation there’s no need for Logger or Inspector transformers. Remove or disable them and your workspace will run more efficiently. |