Use the Input Data tool to read uncompressed and Deflate-compressed Avro files and use the Output Data tool to write Avro files.
Input
Only Deflate compression is supported.
Most of the 14 native Avro data types are supported. The type mapping on import from Avro is as follows:
String: UTF-8 converted to V_WString (UTF-16)
Bytes: Maintained as blob (use Blob Tool to convert as necessary)
Int: Maintained as Int32
Long: Maintained as Int64
Float: Maintained as Float
Double: Maintained as Double
Boolean: Maintained as Bool
Null: Not supported
Enum: Converted to String equivalent
Union: Alteryx supports unions with two sub-types. Both sub-types must be equivalent (for example, both int or both double) or one of them must be Null.
The Alteryx field type will be the type of the non-null branch (or the shared type of both branches, if both are non-null)
If the non-null branch is active, the Alteryx field will contain that value
If the null branch is active, the Alteryx field will be set to null
Invalid unions are imported as JSON into a V_WString (use JSON Parse Tool to convert as necessary). For example, a union with an int as its active branch may be represented as {"int":123}.
Fixed: Maintained as blob (use Blob Tool to convert as necessary)
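The import rules above can be summarized as a lookup table plus a small union-resolution step. The following Python sketch is illustrative only: the `AVRO_TO_ALTERYX` dictionary, the `resolve_union` function, and the representation of Avro types as plain strings are assumptions for this example, not part of Alteryx. Mapping enum to "String" is one reading of "String equivalent".

```python
import json

# Illustrative summary of the Avro -> Alteryx import mapping listed above.
# Record, array, and map are omitted because they are imported as JSON text.
AVRO_TO_ALTERYX = {
    "string": "V_WString",  # UTF-8 converted to UTF-16
    "bytes": "Blob",
    "int": "Int32",
    "long": "Int64",
    "float": "Float",
    "double": "Double",
    "boolean": "Bool",
    "enum": "String",       # enum symbol imported as text (assumed target type)
    "fixed": "Blob",
}


def resolve_union(branches, active_branch, value):
    """Return (alteryx_type, field_value) for an Avro union value.

    branches: the union's Avro type names, e.g. ["null", "int"].
    active_branch: the type name of the branch that holds the value.
    """
    non_null = [b for b in branches if b != "null"]
    supported = (
        len(branches) == 2
        and len(set(non_null)) == 1        # one non-null type (or two equal ones)
        and non_null[0] in AVRO_TO_ALTERYX
    )
    if not supported:
        # Fall back to JSON text, e.g. {"int":123}; a null branch encodes as null.
        payload = None if active_branch == "null" else {active_branch: value}
        return "V_WString", json.dumps(payload, separators=(",", ":"))
    field_type = AVRO_TO_ALTERYX[non_null[0]]
    if active_branch == "null":
        return field_type, None            # null branch active -> Alteryx null
    return field_type, value


print(resolve_union(["null", "int"], "int", 42))     # ('Int32', 42)
print(resolve_union(["null", "int"], "null", None))  # ('Int32', None)
print(resolve_union(["int", "string"], "int", 123))  # ('V_WString', '{"int":123}')
```

The two-branch check comes first so that unions with more branches, or with two different non-null types, fall through to the JSON representation.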
The following Avro types are not supported natively, but are imported as JSON into a String (use the JSON Parse Tool to convert as necessary):
Record: For example, {"SubField1":7,"SubField2":"Field2"} for a record containing both int and string fields
Array: For example, [1,2,3,4,5] for an array of ints
Map: For example, {"Key1":Value1,"Key2":Value2} for a map of string to double
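The JSON strings in these examples are compact, with no spaces after separators. As an illustrative sketch, Python's standard `json` module produces the same form with `separators=(",", ":")`, and `json.loads` performs the reverse step that the JSON Parse Tool performs inside a workflow. The helper name `to_compact_json` and the sample values (including the doubles standing in for Value1 and Value2) are made up for this example.

```python
import json


def to_compact_json(value):
    """Serialize a nested value without whitespace, like the examples above."""
    return json.dumps(value, separators=(",", ":"))


record = {"SubField1": 7, "SubField2": "Field2"}  # record with int and string fields
array = [1, 2, 3, 4, 5]                           # array of ints
mapping = {"Key1": 1.5, "Key2": 2.25}             # map of string to double

for value in (record, array, mapping):
    text = to_compact_json(value)
    print(text)
    # Parsing the text back recovers the original structure.
    assert json.loads(text) == value

# Printed:
# {"SubField1":7,"SubField2":"Field2"}
# [1,2,3,4,5]
# {"Key1":1.5,"Key2":2.25}
```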
Output
When writing Avro files, there are two options:
Enable Compression (Deflate): Enabling compression increases the time needed to write the output but, for larger files, reduces the time needed to transfer them over the network. The supported compression uses the DEFLATE algorithm (the same algorithm used by gzip) and should be supported natively by other Avro-capable tools such as Hive.
Support Null Values: Selecting this option will write all fields as Unions with a null branch and a value branch. If the Alteryx value is null the output Avro union will have its null branch selected, otherwise it will have its value branch selected.
If this option is not selected, all output fields will be written as their native Avro types (non-union). Alteryx fields that are null will be written as their default value (for example, the number 0 for an int32 and an empty string for a string field).
Consider using a Formula Tool to replace null values with a known value so that they can be recognized and handled in Hadoop.
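For the compression option, the Avro specification defines the `deflate` codec as raw DEFLATE data (RFC 1951), without a zlib header or checksum. A minimal Python sketch using the standard `zlib` module with a negative window size (`wbits=-15`, which selects raw DEFLATE) shows the compress/decompress round trip for a single data block. The helper names and sample data are hypothetical, and the sketch does not write the Avro container format (file header, sync markers, block counts).

```python
import zlib


def deflate_block(data: bytes, level: int = 6) -> bytes:
    """Compress one data block as raw DEFLATE (no zlib header or trailer)."""
    compressor = zlib.compressobj(level, zlib.DEFLATED, -15)  # -15 => raw DEFLATE
    return compressor.compress(data) + compressor.flush()


def inflate_block(data: bytes) -> bytes:
    """Decompress a raw DEFLATE block."""
    return zlib.decompress(data, -15)


block = b"Alteryx,Avro,Hive\n" * 1000  # repetitive sample data compresses well
compressed = deflate_block(block)
assert inflate_block(compressed) == block
print(len(block), "->", len(compressed), "bytes")
```

The extra compression step is the added output time mentioned above; the smaller block is what saves transfer time.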
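To make the Support Null Values option concrete, here is a hypothetical Python sketch that builds an Avro record schema (as a JSON-compatible dict) for a list of output fields. The function names, the field list, and placing "null" first in each union are assumptions for illustration; the Alteryx implementation may order branches differently. When null support is off, the sketch substitutes a default for nulls: the int and string defaults come from the text above, while the others are assumed.

```python
import json

# Default written for a null value when "Support Null Values" is off.
# int (0) and string ("") come from the text above; the rest are assumptions.
NULL_DEFAULTS = {"int": 0, "long": 0, "float": 0.0, "double": 0.0,
                 "boolean": False, "string": ""}


def build_schema(name, fields, support_nulls):
    """fields: list of (field_name, avro_type) pairs."""
    avro_fields = []
    for field_name, avro_type in fields:
        # With null support, every field becomes a union of null and its type.
        field_type = ["null", avro_type] if support_nulls else avro_type
        avro_fields.append({"name": field_name, "type": field_type})
    return {"type": "record", "name": name, "fields": avro_fields}


def prepare_value(value, avro_type, support_nulls):
    """Replace None with the type's default when nulls cannot be written."""
    if value is None and not support_nulls:
        return NULL_DEFAULTS[avro_type]
    return value


fields = [("Id", "int"), ("Name", "string")]
print(json.dumps(build_schema("Row", fields, support_nulls=True)))
print(json.dumps(build_schema("Row", fields, support_nulls=False)))
print(prepare_value(None, "int", support_nulls=False))  # 0
```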
The type mapping from Alteryx to Avro is as follows:
Bool: Maintained as Boolean
Byte, Int16, Int32: Maintained as Int (32-bit)
Int64: Maintained as Long (64-bit)
Float: Maintained as Float
Double: Maintained as Double
FixedDecimal: Converted to Double
String, V_String, Date, Time, DateTime: Maintained as String (UTF-8)
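The export mapping can likewise be expressed as a lookup table. The Python sketch below restates only the pairs listed above. The dictionary and function names are hypothetical, and raising an error for types the list does not mention (for example, Blob or V_WString) is an assumption made for illustration.

```python
# Alteryx -> Avro type mapping on output, restated from the list above.
ALTERYX_TO_AVRO = {
    "Bool": "boolean",
    "Byte": "int", "Int16": "int", "Int32": "int",  # Avro int is 32-bit
    "Int64": "long",                                 # Avro long is 64-bit
    "Float": "float",
    "Double": "double",
    "FixedDecimal": "double",                        # converted; may lose precision
    "String": "string", "V_String": "string",
    "Date": "string", "Time": "string", "DateTime": "string",  # written as UTF-8 text
}


def avro_type_for(alteryx_type: str) -> str:
    """Look up the Avro type for an Alteryx field type."""
    try:
        return ALTERYX_TO_AVRO[alteryx_type]
    except KeyError:
        raise ValueError(
            f"No Avro mapping listed for Alteryx type {alteryx_type!r}"
        ) from None


print(avro_type_for("Int16"))         # int
print(avro_type_for("FixedDecimal"))  # double
```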