Avro Data Types

Use the Input Data tool to read uncompressed and Deflate-compressed Avro files, and use the Output Data tool to write Avro files.

Input

Only Deflate compression is supported.
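For readers preparing files outside Alteryx, the following is a minimal sketch of what Deflate compression means for an Avro data block. In the Avro specification, the deflate codec stores each block as a raw DEFLATE stream (RFC 1951) with no gzip or zlib header. The function names `deflate_block` and `inflate_block` are hypothetical, and the sketch only round-trips bytes; it does not read or write a complete Avro container file.

```python
import zlib

def deflate_block(data: bytes) -> bytes:
    """Compress bytes as a raw DEFLATE stream (negative wbits = no header)."""
    compressor = zlib.compressobj(wbits=-15)
    return compressor.compress(data) + compressor.flush()

def inflate_block(data: bytes) -> bytes:
    """Decompress a raw DEFLATE stream produced by deflate_block."""
    return zlib.decompress(data, wbits=-15)

payload = b"alteryx,avro,deflate;" * 200
packed = deflate_block(payload)
restored = inflate_block(packed)
```

Files written with another Avro codec (for example, snappy) would need to be rewritten as uncompressed or Deflate-compressed before the Input Data tool can read them.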

Most of the 14 native Avro data types are supported. The type mapping on import is as follows:

  • String: Converted from UTF-8 to V_WString (UTF-16)

  • Bytes: Maintained as Blob (use Blob Tool to convert as necessary)

  • Int: Maintained as Int32

  • Long: Maintained as Int64

  • Float: Maintained as Float

  • Double: Maintained as Double

  • Boolean: Maintained as Bool

  • Null: Not Supported

  • Enum: Converted to String Equivalent

  • Union: Alteryx supports unions with two sub-types. Both sub-types must be equivalent (for example, both int or both double) or one of them must be Null.

    • The Alteryx field type is the type of the non-null branch (or both branches in the case that both are non-null).

    • If the non-null branch is active, the Alteryx field contains that value.

    • If the null branch is active, the Alteryx field is set to null.

    • Invalid unions are imported as JSON into a V_WString (use JSON Parse Tool to convert as necessary). For example, a union with an int as its active branch may be represented as "{"int":123}".

  • Fixed: Maintained as Blob (use Blob Tool to convert as necessary)
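The union rules above can be sketched as a small lookup. This is an illustration of the documented behavior only, not Alteryx's implementation; the helper name `alteryx_type_for_union` is hypothetical, and the sketch handles only primitive branch types written as Avro type names.

```python
# Avro primitive name -> Alteryx field type, copied from the import mapping above.
IMPORT_TYPES = {
    "string": "V_WString",
    "bytes": "Blob",
    "int": "Int32",
    "long": "Int64",
    "float": "Float",
    "double": "Double",
    "boolean": "Bool",
}

def alteryx_type_for_union(branches):
    """Return the Alteryx field type for a union of primitive Avro types."""
    non_null = [b for b in branches if b != "null"]
    valid = (
        len(branches) == 2             # exactly two sub-types
        and len(non_null) >= 1         # at least one value branch
        and len(set(non_null)) == 1    # value branches are equivalent
        and non_null[0] in IMPORT_TYPES
    )
    if valid:
        return IMPORT_TYPES[non_null[0]]
    # Invalid unions are imported as JSON text in a V_WString.
    return "V_WString"

nullable_int = alteryx_type_for_union(["null", "int"])
mixed = alteryx_type_for_union(["int", "string"])
```

Here `nullable_int` is "Int32", while `mixed` falls back to "V_WString" because its two branches are neither equivalent nor paired with null.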

These Avro types are not supported natively but are imported as JSON into a String (use the JSON Parse tool to convert as necessary):

  • Record: For example, "{"SubField1":7,"SubField2":"Field2"}" for a record containing both int and string fields.

  • Array: For example, "[1,2,3,4,5]" for an array of ints.

  • Map: For example, "{"Key1":Value1,"Key2":Value2}" for a map of string to double.
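If the JSON strings are processed outside Alteryx (for example, after exporting the field), they can be decoded with any JSON parser. The sketch below uses Python's standard `json` module on the example strings from this page; the helper `unwrap_union` is hypothetical and assumes the single-key form shown for invalid unions.

```python
import json

# Example field values as they would appear in the imported string field.
record = json.loads('{"SubField1":7,"SubField2":"Field2"}')
array = json.loads('[1,2,3,4,5]')

def unwrap_union(text):
    """Split a JSON-encoded union such as {"int":123} into (branch, value)."""
    obj = json.loads(text)
    # Assumes exactly one key: the name of the active branch.
    [(branch, value)] = obj.items()
    return branch, value

branch, value = unwrap_union('{"int":123}')
```

Inside a workflow, the JSON Parse tool fills the same role.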

Output

When writing Avro files, there are 2 options:

  1. Enable Compression (Deflate): Enabling compression increases output time but, with larger files, also reduces network time. The supported compression uses the DEFLATE algorithm (the same algorithm that gzip uses) and should be supported natively by other Avro-capable tools such as Hive.

  2. Support Null Values: Selecting this option writes _all_ fields as Unions with a null branch and a value branch. If the Alteryx value is null, the output Avro union has its null branch selected; otherwise, it has its value branch selected.

    If this option is not selected, all output fields are written as their native Avro types (non-union). Alteryx fields that are null are written as their default value (for example, the number 0 for an int32 and an empty string for a string field).

    Consider using a Formula tool to replace Null values with a known value so they can be handled in Hadoop.
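The effect of the Support Null Values option can be sketched as follows. This is an illustration under stated assumptions, not the tool's actual code: the helper names are hypothetical, the order of the union branches is assumed, and only the two default values named above (0 for an int and an empty string for a string) are included.

```python
# Defaults written for null values when the option is off.
# Only the two defaults stated in the text are listed here.
DEFAULTS = {"int": 0, "string": ""}

def field_schema(name, avro_type, support_nulls):
    """Build the Avro field schema for one output field."""
    if support_nulls:
        # Union with a null branch and a value branch (branch order assumed).
        return {"name": name, "type": ["null", avro_type]}
    return {"name": name, "type": avro_type}

def written_value(value, avro_type, support_nulls):
    """Return the value that ends up in the file for one cell."""
    if value is None and not support_nulls:
        return DEFAULTS[avro_type]  # null is replaced by the type's default
    return value

with_nulls = field_schema("Quantity", "int", support_nulls=True)
without_nulls = written_value(None, "int", support_nulls=False)
```

With the option off, a null Quantity becomes indistinguishable from a real 0 in the output, which is why replacing nulls with a recognizable value beforehand can be useful.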

The type mapping from Alteryx to Avro is as follows:

  • Bool: Maintained as Boolean

  • Byte, Int16, Int32: Maintained as Int (32-bit)

  • Int64: Maintained as Long (64-bit)

  • Float: Maintained as Float

  • Double: Maintained as Double

  • FixedDecimal: Converted to Double

  • String, V_String, Date, Time, DateTime: Maintained as String (UTF-8)

  • WString, V_WString: Converted to String (UTF-8)

  • Blob, SpatialBlob: Maintained as Bytes
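For downstream code that needs to anticipate the schema of a written file, the output mapping above can be expressed as a lookup table. This is a transcription of the list on this page; the names `OUTPUT_TYPES` and `avro_type_for` are hypothetical.

```python
# Alteryx field type -> Avro type, transcribed from the output mapping above.
OUTPUT_TYPES = {
    "Bool": "boolean",
    "Byte": "int",
    "Int16": "int",
    "Int32": "int",
    "Int64": "long",
    "Float": "float",
    "Double": "double",
    "FixedDecimal": "double",
    "String": "string",
    "V_String": "string",
    "Date": "string",
    "Time": "string",
    "DateTime": "string",
    "WString": "string",
    "V_WString": "string",
    "Blob": "bytes",
    "SpatialBlob": "bytes",
}

def avro_type_for(alteryx_type):
    """Look up the Avro type written for an Alteryx field type."""
    return OUTPUT_TYPES[alteryx_type]

date_type = avro_type_for("Date")
decimal_type = avro_type_for("FixedDecimal")
```

Note that Date, Time, and DateTime fields are written as plain strings, and FixedDecimal values become doubles, so consumers that need dates or exact decimals must convert the values themselves.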