Avro Data Types
Use the Input Data Tool to read uncompressed and Deflate-compressed Avro files and use the Output Data Tool to write Avro files.
Input
Only Deflate compression is supported.
Most of the 14 native Avro data types are supported. The type mapping on import from Avro is as follows:
- String: UTF-8 converted to V_WString (UTF-16)
- Bytes: Maintained as blob (use Blob Tool to convert as necessary)
- Int: Maintained as Int32
- Long: Maintained as Int64
- Float: Maintained as Float
- Double: Maintained as Double
- Boolean: Maintained as Bool
- Null: Not supported
- Enum: Converted to String equivalent
- Union: Alteryx supports unions with two sub-types. Both sub-types must be equivalent (for example, both int or both double) or one of them must be Null.
  - The Alteryx field type is the type of the non-null branch (or of both branches if both are non-null).
  - If the non-null branch is active, the Alteryx field contains that value.
  - If the null branch is active, the Alteryx field is set to null.
  - Invalid unions are imported as JSON into a V_WString (use the JSON Parse Tool to convert as necessary). For example, a union with an int as its active branch may be represented as {"int":123}.
- Fixed: Maintained as blob (use Blob Tool to convert as necessary)
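The union rule in the list above can be sketched as a small lookup. This is only an illustration, not Alteryx's implementation: the function name, the "V_WString (JSON)" label, and the handling of a union whose branches are both null (treated here as invalid) are assumptions. The sketch covers primitive branch names only.

```python
# Import-side mapping for primitive Avro types, taken from the list above.
PRIMITIVE_TO_ALTERYX = {
    "string": "V_WString",
    "bytes": "Blob",
    "int": "Int32",
    "long": "Int64",
    "float": "Float",
    "double": "Double",
    "boolean": "Bool",
}

# Hypothetical label for "imported as JSON into a V_WString".
JSON_FALLBACK = "V_WString (JSON)"


def alteryx_type_for_union(branches):
    """Return the Alteryx field type for a union of primitive Avro type names."""
    if len(branches) != 2:
        # Only two-branch unions are supported.
        return JSON_FALLBACK
    non_null = [b for b in branches if b != "null"]
    if len(non_null) == 1:
        # One null branch: the field takes the type of the other branch.
        return PRIMITIVE_TO_ALTERYX.get(non_null[0], JSON_FALLBACK)
    if len(non_null) == 2 and non_null[0] == non_null[1]:
        # Two equivalent branches: the field takes their shared type.
        return PRIMITIVE_TO_ALTERYX.get(non_null[0], JSON_FALLBACK)
    # Mixed branches (or, by assumption, two null branches) are invalid.
    return JSON_FALLBACK
```

For example, alteryx_type_for_union(["null", "int"]) yields "Int32", while alteryx_type_for_union(["int", "string"]) falls back to the JSON representation.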
The following Avro types are not supported natively, but are imported as JSON into a String (use the JSON Parse Tool to convert as necessary):
- Record: For example, {"SubField1":7,"SubField2":"Field2"} for a record containing both int and string fields
- Array: For example, [1,2,3,4,5] for an array of ints
- Map: For example, {"Key1":Value1,"Key2":Value2} for a map of string to double
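To illustrate the shape of these JSON strings, the following sketch parses one of each kind with Python's standard json module. Inside a workflow the JSON Parse Tool does this job; the Python here is only for illustration. The map values 1.5 and 2.5 and the helper unwrap_union are hypothetical.

```python
import json

# Strings as they might appear in the imported field.
record_text = '{"SubField1":7,"SubField2":"Field2"}'
array_text = '[1,2,3,4,5]'
map_text = '{"Key1":1.5,"Key2":2.5}'  # hypothetical double values
union_text = '{"int":123}'            # invalid union with int as the active branch

record = json.loads(record_text)   # dict with an int and a string field
array = json.loads(array_text)     # list of ints
mapping = json.loads(map_text)     # dict of string keys to floats


def unwrap_union(text):
    """Split a single-key union object into (branch name, value)."""
    obj = json.loads(text)
    if len(obj) != 1:
        raise ValueError("expected exactly one active branch")
    branch, value = next(iter(obj.items()))
    return branch, value


branch, value = unwrap_union(union_text)
```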
Output
When writing Avro files, there are two options:
- Enable Compression (Deflate): Enabling compression increases the time needed to write the file but, for larger files, reduces the time needed to transfer it over the network. The compression uses the DEFLATE algorithm (the same algorithm that underlies gzip) and should be supported natively by other Avro-capable tools such as Hive.
- Support Null Values: Selecting this option will write _all_ fields as Unions with a null branch and a value branch. If the Alteryx value is null, the output Avro union will have its null branch selected; otherwise, it will have its value branch selected.
If this option is not selected, all output fields will be written as their native Avro types (non-union). Alteryx fields that are null will be written as their default value (for example, the number 0 for an int32 and an empty string for a string field).
Consider using a Formula Tool to replace Null values with a known value so they can be handled in Hadoop.
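The effect of the Support Null Values option can be sketched as follows. This is a minimal illustration under stated assumptions: the function names are hypothetical, the order of the union branches (null first) is an assumption, and only the two default values named above (0 for an int and an empty string for a string) are included.

```python
# Defaults stated in the text above; other types are deliberately left out.
DEFAULTS = {"int": 0, "string": ""}


def field_schema(name, avro_type, support_nulls):
    """Build the Avro field schema for one output field."""
    if support_nulls:
        # Union with a null branch and a value branch (branch order assumed).
        return {"name": name, "type": ["null", avro_type]}
    # Native, non-union Avro type.
    return {"name": name, "type": avro_type}


def output_value(value, avro_type, support_nulls):
    """Return the value that would be written for one Alteryx field value."""
    if value is None and not support_nulls:
        # Without a null branch, a null is replaced by the type's default.
        return DEFAULTS[avro_type]
    return value
```

With the option off, a null int is indistinguishable from a real 0 in the output, which is why substituting a known value beforehand can be useful.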
The type mapping from Alteryx to Avro is as follows:
- Bool: Maintained as Boolean
- Byte, Int16, Int32: Maintained as Int (32-bit)
- Int64: Maintained as Long (64-bit)
- Float: Maintained as Float
- Double: Maintained as Double
- FixedDecimal: Converted to Double
- String, V_String, Date, Time, DateTime: Maintained as String (UTF-8)
- WString, V_WString: Converted to String (UTF-8)
- Blob, SpatialBlob: Maintained as Bytes
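The output mapping above can be written as a lookup table and used to draft an Avro record schema for a set of Alteryx fields. This is a sketch only: the helper name, the record name "AlteryxOutput", and the example field names are hypothetical, and the schema shown corresponds to writing without the Support Null Values option.

```python
import json

# Alteryx field type -> Avro type, following the list above.
ALTERYX_TO_AVRO = {
    "Bool": "boolean",
    "Byte": "int",
    "Int16": "int",
    "Int32": "int",
    "Int64": "long",
    "Float": "float",
    "Double": "double",
    "FixedDecimal": "double",
    "String": "string",
    "V_String": "string",
    "Date": "string",
    "Time": "string",
    "DateTime": "string",
    "WString": "string",
    "V_WString": "string",
    "Blob": "bytes",
    "SpatialBlob": "bytes",
}


def draft_record_schema(fields, record_name="AlteryxOutput"):
    """Build a non-union Avro record schema from (name, Alteryx type) pairs."""
    return {
        "type": "record",
        "name": record_name,  # hypothetical record name
        "fields": [
            {"name": name, "type": ALTERYX_TO_AVRO[alteryx_type]}
            for name, alteryx_type in fields
        ],
    }


# Hypothetical example fields.
schema = draft_record_schema(
    [("CustomerId", "Int16"), ("Amount", "FixedDecimal"), ("Created", "DateTime")]
)
schema_json = json.dumps(schema, indent=2)
```

Note that FixedDecimal values pass through Double, and Date, Time, and DateTime values arrive as plain strings, so downstream consumers need to account for possible precision loss and parse dates themselves.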