BigQuery Data Type Conversions
This section covers data type conversions between the Trifacta Application and BigQuery.
Note
The Alteryx data types listed in this page reflect the raw data type of the converted column. Depending on the contents of the column, the Transformer Page may re-infer a different data type, when a dataset using this type of source is loaded.
Note
JSON data stored in BigQuery is not supported for ingest or publication.
Access/Read
When a BigQuery data type is imported, its data type is remapped according to the following table.
Tip
Data precision may be lost during conversion. You may want to generate min and max values and compute significant digits for values in your BigQuery tables and then compute the same in Dataprep by Trifacta.
Source Data Type | Supported? | Alteryx Data Type | Notes |
---|---|---|---|
String | Y | String | |
Bytes | Y | String | |
Integer | Y | Integer | |
Float | Y | Decimal | |
Boolean | Y | Boolean | |
Timestamp | Y | Datetime | Millisecond and microsecond precision in BigQuery is dropped during the import process. |
Date | Y | Datetime | If imported data contains microsecond precision, values are handled differently depending on the area of the product:
|
Time | Y | Datetime | |
Datetime | Y | Datetime | |
Array | Y | Array | |
Record | Y | Object | Nested values can be ingested into Dataprep by Trifacta. |
Date values:
When type inference is disabled, data for the following BigQuery types may be formatted according to the table:
Imported data type | Formatting |
---|---|
DATE | ['Datetime', 'yy-mm-dd', 'yyyy*mm*dd'] |
TIME | ['Datetime', 'hh:mm:ss', 'HH:MM:SS'] |
DATETIME | ['Datetime', 'yy-mm-dd hh:mm:ss', 'yyyy*mm*dd*HH:MM:SS'] |
TIMESTAMP | ['Datetime', 'yy-mm-dd hh:mm:ss', 'yyyy*mm*dd*HH:MM:SSX'] |
Publish/Write
Note
Alteryx data types not listed here are written as string types in the target database.
Alteryx Data Type | BigQuery Data Type | Notes |
---|---|---|
String | String | |
Integer | Integer | |
Decimal | Float | |
Boolean | Boolean | |
Datetime | See below. | |
Datetime (time data only) | See below. | |
Array | Array for primitive data types String for other data types | See below. |
Object | Record |
Publishing errors
Strict data type matching is enabled by default when publishing to BigQuery. When strictly matching, job runs result in the following:
Big Query data type matching | Result |
---|---|
INVALID Alteryx type matching with BigQuery data type | Error |
Data is appended or truncated on publication | Warning |
Data is written as-is to BigQuery | Valid |
Publishing Datetime type columns to BigQuery
Note
Publishing of microseconds to BigQuery is not supported.
Depending on several factors, Datetime values in Dataprep by Trifacta are published to BigQuery in different ways. Factors include:
Method of publication:
create/drop, which creates a new table
append/truncate/merge, which updates an existing table
Date format of the Alteryx source
Note
The following publishing behaviors assume that the default behavior of strict rules for type matching has been enabled on the publishing action for the target BigQuery table. Otherwise, Datetime values are always published as Datetime. For more information on strict type matching, see BigQuery Table Settings.
Datetime for creating new tables
Alteryx Datetime value | BigQuery target table type | Create and drop operations |
---|---|---|
Datetime with date information only | Date | Date |
Datetime with time information only | Time | Time |
Datetime | Datetime | Datetime |
Datetime with timestamp information only | Timestamp | Timestamp |
Datetime with date information only and no timezone | Timestamp (no timezone) | Timestamp appends |
Datetime for updating existing tables
For append, truncate, and merge operations, these updates depend on the target table type in BigQuery
Alteryx Datetime value | Date table type | Time table type | Datetime table type | Timestamp table type |
---|---|---|---|---|
Datetime with date information only | Date | INVALID | Appends | Appends |
Datetime with time information only | INVALID | Time | Appends | Appends |
Datetime | Truncates Time info | Truncates Date info | Datetime | Appends |
Datetime with timestamp information only | Truncates Time and Timezone info | Truncates Date and Timezone info | Truncates Timezone info | Timestamp |
Datetime with date information only and no timezone | Timestamp appends | Truncates Timezone | Appends Date and truncates Timezone info | Appends |
Publishing limitations for Trifacta Photon:
When publishing to BigQuery for a job executed on Trifacta Photon, the following limitations apply:
A staging CSV file is created for the job, and publication to BigQuery happens through a separate, internal Publish job. Since a CSV has no declared schema, the schema must be retrieved from the target table or tables, where possible.
When writing to multiple BigQuery target tables:
The Trifacta Application assumes that the schema of all tables is the same.
The Trifacta Application selects one table to represent the schema for all, including representative datetime formats.
If a target table does not exist, then Dataprep by Trifacta typecasting is applied:
For create and drop actions, a new table is creating using Dataprep by Trifacta typecasting.
For append, truncate, and merge actions, the table may or may not exist.
Supported Datetime formats
Dataprep by Trifacta supports the following Datetime formats for writing dates to BigQuery:
yy-dd-mm yy-mm-dd dd-mm-yy mm-dd-yy dd-mm mm-dd mm-yy
For the last three formats in the above list, additional information is written to BigQuery, where appropriate:
Format | Written as |
---|---|
dd-mm | 1970-dd-mm |
mm-dd | 1970-mm-dd |
mm-yy | yy-mm-01 |
Publishing Array type columns to BigQuery
Note
Publishing complex data types that are nested more than 15 levels is not supported by BigQuery.
You can publish Dataprep by Trifacta array type column in BigQuery as an array type.
In BigQuery, an array is defined by the fieldTypealong with the Mode. The following is the schema of array columns in BigQuery. You can publish the following primitive type columns as Arrays in BigQuery:
Integer
Float
Boolean
String
Field Name | Type | Mode |
---|---|---|
int_array | INTEGER | REPEATED |
float_array | FLOAT | REPEATED |
string_array | STRING | REPEATED |
bool_array | BOOLEAN | REPEATED |
Publishing Objects and Array of Objects to BigQuery
Note
Publishing complex data types that are nested more than 15 levels is not supported by BigQuery.
You can publish Dataprep by Trifacta Objects and Arrays of Objects as complex types in BigQuery.
Field Name | Type | Mode |
---|---|---|
objects | RECORD | NULLABLE |
array_objects | RECORD | REPEATED |
Nested Arrays can be published in native format as long as the following conditions are met:
Data is homogeneous. Heterogeneous objects are not supported by BigQuery.
Nesting is not more than 15 levels deep.
Field Name | Type | Mode |
---|---|---|
array_arrays | RECORD | REPEATED |
For more information, see BigQuery Table Settings.