Skip to main content

BigQuery Data Type Conversions

This section covers data type conversions between the Trifacta Application and BigQuery.

Note

The Alteryx data types listed in this page reflect the raw data type of the converted column. Depending on the contents of the column, the Transformer Page may re-infer a different data type, when a dataset using this type of source is loaded.

Note

JSON data stored in BigQuery is not supported for ingest or publication.

Access/Read

When a BigQuery data type is imported, its data type is remapped according to the following table.

Tip

Data precision may be lost during conversion. You may want to generate min and max values and compute significant digits for values in your BigQuery tables and then compute the same in Dataprep by Trifacta.

Source Data Type

Supported?

Alteryx Data Type

Notes

String

Y

String

Bytes

Y

String

Integer

Y

Integer

Float

Y

Decimal

Boolean

Y

Boolean

Timestamp

Y

Datetime

Millisecond and microsecond precision in BigQuery is dropped during the import process.

Date

Y

Datetime

If imported data contains microsecond precision, values are handled differently depending on the area of the product:

  • Initial sampling, import: Uses the BigQuery API, which can return microseconds.

  • All other areas of the product, including job execution: Uses the Dataflow SDK, which returns milliseconds.

Time

Y

Datetime

Datetime

Y

Datetime

Array

Y

Array

Record

Y

Object

Nested values can be ingested into Dataprep by Trifacta.

Date values:

When type inference is disabled, data for the following BigQuery types may be formatted according to the table:

Imported data type

Formatting

DATE

['Datetime', 'yy-mm-dd', 'yyyy*mm*dd']

TIME

['Datetime', 'hh:mm:ss', 'HH:MM:SS']

DATETIME

['Datetime', 'yy-mm-dd hh:mm:ss', 'yyyy*mm*dd*HH:MM:SS']

TIMESTAMP

['Datetime', 'yy-mm-dd hh:mm:ss', 'yyyy*mm*dd*HH:MM:SSX']

Publish/Write

Note

Alteryx data types not listed here are written as string types in the target database.

Alteryx Data Type

BigQuery Data Type

Notes

String

String

Integer

Integer

Decimal

Float

Boolean

Boolean

Datetime

See below.

Datetime (time data only)

See below.

Array

Array for primitive data types

String for other data types

See below.

Object

Record

Publishing errors

Strict data type matching is enabled by default when publishing to BigQuery. When strictly matching, job runs result in the following:

Big Query data type matching

Result

INVALID Alteryx type matching with BigQuery data type

Error

Data is appended or truncated on publication

Warning

Data is written as-is to BigQuery

Valid

Publishing Datetime type columns to BigQuery

Note

Publishing of microseconds to BigQuery is not supported.

Depending on several factors, Datetime values in Dataprep by Trifacta are published to BigQuery in different ways. Factors include:

  • Method of publication:

    • create/drop, which creates a new table

    • append/truncate/merge, which updates an existing table

  • Date format of the Alteryx source

Note

The following publishing behaviors assume that the default behavior of strict rules for type matching has been enabled on the publishing action for the target BigQuery table. Otherwise, Datetime values are always published as Datetime. For more information on strict type matching, see BigQuery Table Settings.

Datetime for creating new tables

Alteryx Datetime value

BigQuery target table type

Create and drop operations

Datetime with date information only

Date

Date

Datetime with time information only

Time

Time

Datetime

Datetime

Datetime

Datetime with timestamp information only

Timestamp

Timestamp

Datetime with date information only and no timezone

Timestamp (no timezone)

Timestamp appends 1970-01-01

Datetime for updating existing tables

For append, truncate, and merge operations, these updates depend on the target table type in BigQuery

Alteryx Datetime value

Date table type

Time table type

Datetime table type

Timestamp table type

Datetime with date information only

Date

INVALID

Appends 00:00:00

Appends UTC, 00:00:00

Datetime with time information only

INVALID

Time

Appends 1970-01-01

Appends UTC, 1970-01-01

Datetime

Truncates Time info

Truncates Date info

Datetime

Appends UTC

Datetime with timestamp information only

Truncates Time and Timezone info

Truncates Date and Timezone info

Truncates Timezone info

Timestamp

Datetime with date information only and no timezone

Timestamp appends1970-01-01

Truncates Timezone

Appends Date and truncates Timezone info

AppendsUTC, 1970-01-01

Publishing limitations for Trifacta Photon:

When publishing to BigQuery for a job executed on Trifacta Photon, the following limitations apply:

  • A staging CSV file is created for the job, and publication to BigQuery happens through a separate, internal Publish job. Since a CSV has no declared schema, the schema must be retrieved from the target table or tables, where possible.

  • When writing to multiple BigQuery target tables:

    • The Trifacta Application assumes that the schema of all tables is the same.

    • The Trifacta Application selects one table to represent the schema for all, including representative datetime formats.

    • If a target table does not exist, then Dataprep by Trifacta typecasting is applied:

      • For create and drop actions, a new table is creating using Dataprep by Trifacta typecasting.

      • For append, truncate, and merge actions, the table may or may not exist.

Supported Datetime formats

Dataprep by Trifacta supports the following Datetime formats for writing dates to BigQuery:

yy-dd-mm
yy-mm-dd
dd-mm-yy
mm-dd-yy
dd-mm
mm-dd
mm-yy

For the last three formats in the above list, additional information is written to BigQuery, where appropriate:

Format

Written as

dd-mm
1970-dd-mm
mm-dd
1970-mm-dd
mm-yy
yy-mm-01

Publishing Array type columns to BigQuery

Note

Publishing complex data types that are nested more than 15 levels is not supported by BigQuery.

You can publish Dataprep by Trifacta array type column in BigQuery as an array type.

In BigQuery, an array is defined by the fieldTypealong with the Mode. The following is the schema of array columns in BigQuery. You can publish the following primitive type columns as Arrays in BigQuery:

  • Integer

  • Float

  • Boolean

  • String

Field Name

Type

Mode

int_array

INTEGER

REPEATED

float_array

FLOAT

REPEATED

string_array

STRING

REPEATED

bool_array

BOOLEAN

REPEATED

Publishing Objects and Array of Objects to BigQuery

Note

Publishing complex data types that are nested more than 15 levels is not supported by BigQuery.

You can publish Dataprep by Trifacta Objects and Arrays of Objects as complex types in BigQuery.

Field Name

Type

Mode

objects

RECORD

NULLABLE

array_objects

RECORD

REPEATED

Nested Arrays can be published in native format as long as the following conditions are met:

  • Data is homogeneous. Heterogeneous objects are not supported by BigQuery.

  • Nesting is not more than 15 levels deep.

Field Name

Type

Mode

array_arrays

RECORD

REPEATED

For more information, see BigQuery Table Settings.