Alteryx Engine and AMP: Main Differences
In the first article, we covered what the Alteryx Engine and the new Alteryx Multi-threaded Processing (AMP) engine are. Now let’s go deeper into the main differences between the two.
Data Processing Differences
The original engine architecture allows mostly single-threaded processing, in which your data is processed sequentially, record by record. The new AMP engine, by contrast, allows massively multi-threaded processing: records are processed in parallel in 4 MB packets for a faster run time, which can change the order of the output records.
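The effect of packet-based parallelism on record order can be sketched in Python. This is a simplified model, not AMP's implementation: packets are split by record count rather than by the 4 MB size, `process_packet` and its random delay are hypothetical stand-ins for real work, and Python threads only play the role of AMP's worker threads.

```python
import random
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def make_packets(records, packet_size):
    """Split records into consecutive packets tagged with their position in the input."""
    return [
        (index, records[start:start + packet_size])
        for index, start in enumerate(range(0, len(records), packet_size))
    ]


def process_packet(packet):
    """Hypothetical per-packet work with an uneven, random duration."""
    index, rows = packet
    time.sleep(random.uniform(0, 0.01))
    return index, [row * 10 for row in rows]


def run_parallel(records, packet_size=3, workers=4):
    """Process packets concurrently; return (completion-order rows, input-order rows)."""
    packets = make_packets(records, packet_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(process_packet, p) for p in packets]
        # as_completed yields futures as they finish, so this order can vary per run
        finished = [f.result() for f in as_completed(futures)]
    unordered = [row for _, rows in finished for row in rows]
    # Restoring the input order requires an explicit sort on the packet index
    ordered = [row for _, rows in sorted(finished, key=lambda t: t[0]) for row in rows]
    return unordered, ordered


unordered, ordered = run_parallel(list(range(10)))
print(unordered)  # order may differ between runs
print(ordered)    # always [0, 10, 20, 30, 40, 50, 60, 70, 80, 90]
```

Because results are collected as each packet finishes, the first list can vary between runs, while sorting on the packet index always restores the input order.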
When a workflow runs with the AMP engine, several tools might output records in a different order than the original engine does. These tools include the following:
- Cross Tab
- Dynamic Input
- Join Multiple
- Poly Build
- Running Total
Any functionality or configuration that has not been converted to AMP falls back to the original engine's version of the tool. As a result, workflows that contain both AMP-converted and non-converted tools still run seamlessly with AMP.
To check which tools have been converted to AMP, see: Tool Use with AMP.
A .yxdb file written with the AMP engine is read faster than a .yxdb file written with the original engine.
Conversely, a .yxdb file written with the original engine is read more slowly when AMP is enabled. The two formats remain compatible with each other.
Use .csv and .yxdb files with AMP: both formats support multi-threaded reading of data.
To keep the original engine fast when it reads files written by AMP, have AMP write the .yxdb file in the original engine's format: in the Output Data tool's Configuration window, select the option to create a .yxdb file compatible with Designer 18.1 and older.
Per-tool Performance Profiling with AMP is available in Designer 2021.3 and newer.
Text Input Tool and AutoField
AMP addresses a long-standing issue where a field may not be large enough to hold the values produced by a downstream tool. You don't need to add Select tools to change data types when the resulting data will exceed the length of the original data type: AMP creates maximum-size fields for strings and integers, so subsequent operations have the room they need to hold larger downstream values.
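A minimal sketch of why a narrowly sized field causes trouble downstream, in Python. `StringField`, `infer_width`, and the 1,000,000-character width are hypothetical illustrations, not Alteryx types or limits; the model simply assumes that a value longer than the field's width is truncated.

```python
from dataclasses import dataclass


@dataclass
class StringField:
    """Hypothetical fixed-width string field: values longer than `width` are truncated."""
    name: str
    width: int

    def store(self, value: str) -> str:
        return value[: self.width]


def infer_width(values):
    """AutoField-style guess (simplified): the narrowest width that fits the sample values."""
    return max(len(v) for v in values)


sample = ["NY", "CA", "TX"]
narrow = StringField("State", infer_width(sample))  # width 2, sized to the input only
wide = StringField("State", 1_000_000)              # hypothetical stand-in for a maximum-size field

downstream_value = "New York"  # produced later by a hypothetical Formula step
print(narrow.store(downstream_value))  # Ne
print(wide.store(downstream_value))    # New York
```

Sizing the field for the largest possible value up front, as the wide field does here, removes the need for a Select tool to widen it later.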
Although the Throttle tool has not been fully converted to AMP, you can still use it together with the Download tool, placing the Throttle tool before the Download tool.
Fuzzy Match may produce different results with AMP than with the original engine, because AMP matches records using an alternative method. The order of matches might differ, and the output may also come out in reverse order.
There is a known performance issue: Fuzzy Match runs slower with AMP than with the original engine.
AMP uses Unicode and Perl encoding standards, under which the characters $+<=>^|~ do not qualify as punctuation. If you use the REGEX_Replace formula function or the RegEx tool to filter punctuation with the RegEx set [[:punct:]], you need to change the expression for AMP, for example by adding those characters to the set explicitly.
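The character-set difference can be checked with Python's unicodedata module. Python's re module has no [[:punct:]] class, so this sketch builds both sets explicitly: string.punctuation stands in for ASCII (POSIX-style) punctuation, and the Unicode general categories starting with P stand in for Unicode punctuation. It only illustrates which characters change class; it does not reproduce the regex engine Alteryx uses. Note that the check also reports the backtick character (U+0060), which Unicode likewise classifies as a symbol.

```python
import re
import string
import unicodedata

ASCII_PUNCT = string.punctuation  # the ASCII set a POSIX-style [[:punct:]] covers


def is_unicode_punctuation(ch):
    """True if the character's Unicode general category is a P* (punctuation) category."""
    return unicodedata.category(ch).startswith("P")


# ASCII punctuation that Unicode classifies as symbols (Sc, Sm, Sk) instead
symbols_only = "".join(ch for ch in ASCII_PUNCT if not is_unicode_punctuation(ch))
print(symbols_only)  # $+<=>^`|~

# Both classes are built explicitly because Python's re has no POSIX classes
unicode_class = "[" + re.escape("".join(ch for ch in ASCII_PUNCT if is_unicode_punctuation(ch))) + "]"
widened_class = "[" + re.escape(ASCII_PUNCT) + "]"  # P* characters plus the symbols above

text = "Price: $5+tax <net>"
print(re.sub(unicode_class, "", text))  # Price $5+tax <net>
print(re.sub(widened_class, "", text))  # Price 5tax net
```

The first substitution leaves $, +, < and > in place, which is the behavior change the paragraph above warns about; widening the set restores the original result.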
Grouping Tools - Blocking Tools
With the original engine, the Join tool uses a sort-merge join, so records always come out sorted by the join fields. With AMP, the Join tool uses a hash join, so the output records are not in sorted order.
For example, if we join on CustomerID, the original engine outputs the records sorted by the CustomerID field, while AMP outputs the same records in a different order.
If you need the join output in sorted order, add a Sort tool after the Join tool.
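The order difference follows from how the two join methods work. The Python sketch below uses textbook, single-threaded versions of both, with hypothetical customer and order records. AMP's actual hash join runs in parallel, so its output order is not simply the probe order shown here; the point is only that a hash join makes no promise about key order, while sorting its output (the equivalent of adding a Sort tool) reproduces the sort-merge result.

```python
from collections import defaultdict


def sort_merge_join(left, right, key):
    """Inner join: sort both inputs on `key`, then merge. Output is sorted by `key`."""
    left = sorted(left, key=lambda r: r[key])
    right = sorted(right, key=lambda r: r[key])
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][key], right[j][key]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Find the run of equal keys on the right, pair it with every matching left row
            j_end = j
            while j_end < len(right) and right[j_end][key] == lk:
                j_end += 1
            while i < len(left) and left[i][key] == lk:
                out.extend({**left[i], **rrow} for rrow in right[j:j_end])
                i += 1
            j = j_end
    return out


def hash_join(left, right, key):
    """Inner join: hash `right` on `key`, then probe with `left`. Output follows probe order."""
    table = defaultdict(list)
    for rrow in right:
        table[rrow[key]].append(rrow)
    return [{**lrow, **rrow} for lrow in left for rrow in table.get(lrow[key], [])]


# Hypothetical sample data
customers = [{"CustomerID": 3, "Name": "Cy"}, {"CustomerID": 1, "Name": "Ann"}, {"CustomerID": 2, "Name": "Bo"}]
orders = [{"CustomerID": 2, "Amount": 20}, {"CustomerID": 3, "Amount": 30}, {"CustomerID": 1, "Amount": 10}]

merged = sort_merge_join(customers, orders, "CustomerID")
hashed = hash_join(customers, orders, "CustomerID")
print([r["CustomerID"] for r in merged])  # [1, 2, 3]
print([r["CustomerID"] for r in hashed])  # [3, 1, 2]

# Sorting the hash-join output plays the role of a Sort tool after the Join
assert sorted(hashed, key=lambda r: r["CustomerID"]) == merged
```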
The Join Multiple tool throws an error when duplicated fields are used for grouping.
A difference between the original engine and AMP can occur when a tool inside an iterative macro reports an error. Because it is single-threaded, the original engine stops as soon as an error occurs in the macro. AMP keeps running until the iterative output is empty or the maximum number of iterations is reached.
Because AMP can run more iterations, you may encounter the following situations:
- The number of errors (if any) can be higher with AMP.
- The number of records could be higher with AMP.
- The output schema could be different with AMP.
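The two stopping rules described above can be modeled with a small loop. This Python sketch is hypothetical throughout: run_iterative is not an Alteryx API, countdown_step is an invented macro body, and stop_on_error=True stands in for the original engine's behavior while stop_on_error=False stands in for AMP's.

```python
def run_iterative(step, records, max_iterations, stop_on_error):
    """Hypothetical iterative-macro driver.

    `step` takes the current records and returns (records_for_next_iteration, errors).
    The loop ends when the iterative output is empty or `max_iterations` is reached;
    with stop_on_error=True it also ends after the first iteration that reports an error.
    """
    iterations, errors = 0, []
    while records and iterations < max_iterations:
        records, step_errors = step(records)
        iterations += 1
        errors.extend(step_errors)
        if step_errors and stop_on_error:
            break
    return iterations, errors


def countdown_step(records):
    """Hypothetical macro body: reports an error for odd values and passes value - 1 onward."""
    errors = [f"odd value {r}" for r in records if r % 2]
    return [r - 1 for r in records if r > 1], errors


print(run_iterative(countdown_step, [3], max_iterations=10, stop_on_error=True))
# (1, ['odd value 3'])
print(run_iterative(countdown_step, [3], max_iterations=10, stop_on_error=False))
# (3, ['odd value 3', 'odd value 1'])
```

With the same input, the run that keeps going after an error completes more iterations and collects more errors, which matches the first situation listed above.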
The ConvertToCodePage functions in the Formula tool take a string as a parameter and return a string as a result, so it is not possible to tell how the string is encoded. As a result, these functions can produce different output in the Formula tool with the original engine than with AMP.
The difference comes from AMP's internal use of UTF-8 encoded strings, which gives the input data a different binary representation. When data in a different encoding is imported, there is no way to restore the original data.
The original engine stores strings as Latin-1 or UTF-16 encoded strings, which act as a buffer and allow the data to be converted back correctly.
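Why a UTF-8 representation can make the original bytes unrecoverable is easy to show with Python's built-in codecs. This is an analogy under stated assumptions, not the conversion Alteryx performs internally: Latin-1 maps every byte to exactly one character, so it round-trips any input, while decoding non-UTF-8 bytes as UTF-8 with replacement permanently discards the offending byte.

```python
def roundtrip_latin1(raw: bytes) -> bytes:
    """Decode and re-encode as Latin-1: every byte maps to one code point, so nothing is lost."""
    return raw.decode("latin-1").encode("latin-1")


def roundtrip_utf8_lossy(raw: bytes) -> bytes:
    """Decode as UTF-8, replacing invalid bytes with U+FFFD, then re-encode as UTF-8."""
    return raw.decode("utf-8", errors="replace").encode("utf-8")


original = "caf\u00e9".encode("latin-1")  # b'caf\xe9', which is not valid UTF-8
print(roundtrip_latin1(original) == original)      # True
print(roundtrip_utf8_lossy(original) == original)  # False: byte 0xE9 became U+FFFD
```

Once the byte has been replaced, nothing in the resulting string records what it used to be, which is the sense in which there is no way back.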
Formula Add-Ins are not yet supported with AMP. If you need to run a workflow containing Formula Add-In functionality, run it using the original engine.
Analytic Apps that use the Map tool to select from a spatial reference layer should continue to use the original engine.
With the original engine, Expect Equal remains a CReW macro. With AMP, it runs as a native tool.
Parallel Branch Execution and the Tool Run Order
Some workflows read from a file and then write back to it. This requires sequencing to ensure that the read is complete before the write can start. Similarly, a workflow that wants to write several sheets in one .xlsx file needs to write the sheets one at a time. Alteryx Designer provides a Block Until Done (BUD) tool to help partition the work into phases that won’t get in each other’s way.
In a workflow with multiple branches (largely separate streams from inputs to outputs), place the BUD tool in the branch with the lowest-numbered Input tool ID. This ensures that each subsequent branch waits to run until the previous branch is done, so the tool works as expected.
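The sequencing that Block Until Done provides can be sketched with Python threads. The two functions below are hypothetical stand-ins for a read branch and a write branch that touch the same file, and a threading.Event plays the role of the BUD tool by holding the write branch until the read branch has finished. It illustrates the ordering idea only, not how Designer actually schedules branches.

```python
import os
import tempfile
import threading


def read_then_write(path):
    """Run a read branch and a write branch concurrently on the same file.

    The write branch waits on an Event that the read branch sets once it has
    consumed the whole file, similar in spirit to a Block Until Done tool.
    """
    read_done = threading.Event()
    shared = {}

    def read_branch():
        with open(path, encoding="utf-8") as f:
            shared["rows"] = f.read().splitlines()
        read_done.set()  # signal: the input is fully read

    def write_branch():
        read_done.wait()  # do not touch the file until the read is complete
        rows = [row.upper() for row in shared["rows"]]  # hypothetical transformation
        with open(path, "w", encoding="utf-8") as f:
            f.write("\n".join(rows) + "\n")

    # The writer is started first on purpose: the Event, not start order, enforces sequencing
    branches = [threading.Thread(target=write_branch), threading.Thread(target=read_branch)]
    for t in branches:
        t.start()
    for t in branches:
        t.join()


with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False, encoding="utf-8") as tmp:
    tmp.write("a\nb\n")
read_then_write(tmp.name)
with open(tmp.name, encoding="utf-8") as f:
    print(f.read())  # A and B on separate lines
os.remove(tmp.name)
```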
For more information regarding specific tool functionality, see: Tool Use with AMP.