Google BigQuery
Type of Support: Read & Write
Validated On: --
Connection Type: Alteryx Tool
Driver Details: Administrator action in Google Cloud Platform may be necessary before configuring a connection in the Alteryx Google BigQuery tools.
Alteryx tools used to connect
Use the Google BigQuery Input tool to query a table from Google BigQuery and read it into Designer. See Known Limitations below for details. More information about Google BigQuery can be found on the Google BigQuery documentation site.
Gallery tool
This tool is not automatically installed with Designer. To use this tool, download it from the Alteryx Analytics Gallery.
You can log in to Google BigQuery with service-to-service or end user credentials.
You can obtain a service account JSON key file from the Google Cloud Console, or you can create a new key for an existing service account. More information can be found on the Google BigQuery documentation site under Creating and Managing Service Account Keys.
- In Authentication mode, select Service-to-Service.
- Click Select a file to select your Google service account file.
- In Open File, browse to and select the JSON file that contains your Google service account key.
- In Authentication mode, select End user.
- Enter your Client ID and Client Secret.
- Click Log in.
- In Sign in with Google, enter your Google email and password, then click Allow to approve the connector's permission to access Google BigQuery on your behalf.
- Click Next.
- In BigQuery Table Input Config, click Select table.
Designer displays the tables you have access to.
- Select a project, expand a dataset, and then select a BigQuery table. Once you select a table, Designer displays the table's fully qualified identifier. To refresh the metadata, click the refresh icon. Click Change table to select a different table.
- Select options.
- Limit the number of results returned: Sets the number of rows to return in the results.
- Use a custom query: Select a SQL type and then Select a project to query. Enter your custom query, including any LIMIT clause (see the sketch after this list).
- Infer max repeated record limit: Not available with a custom query. Select this option and then indicate a Max field length to set the repeated record limit to the maximum number of repeated records in the results returned from your Google BigQuery table. If you limit the number of records, Designer inspects only that set of records to infer the max repeated record limit.
- Run the workflow.
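For comparison, this is roughly what a custom query with a LIMIT clause looks like when run through Google's google-cloud-bigquery Python client library outside of Designer; a minimal sketch, where the project, dataset, table, and column names are placeholders:

```python
from google.cloud import bigquery

# Hypothetical project ID; substitute your own.
client = bigquery.Client(project="my-project")

# A standard SQL query with an explicit LIMIT, analogous to the
# tool's "Use a custom query" option.
sql = """
    SELECT name, value
    FROM `my-project.my_dataset.my_table`
    LIMIT 1000
"""

for row in client.query(sql).result():
    print(row["name"], row["value"])
```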
Use the Google BigQuery Output tool to write data from Designer to tables in Google BigQuery. See Known Limitations below for details. More information about Google BigQuery can be found on the Google BigQuery documentation site.
Gallery tool
This tool is not automatically installed with Designer. To use this tool, download it from the Alteryx Analytics Gallery.
You can log in to Google BigQuery with service-to-service or end user credentials.
You can obtain a service account JSON key file from the Google Cloud Console, or you can create a new key for an existing service account. More information can be found on the Google BigQuery documentation site under Creating and Managing Service Account Keys.
- In Authentication mode, select Service-to-Service.
- Click Select a file to select your Google service account file.
- In Open File, browse to and select the JSON file that contains your Google service account key.
- In Authentication mode, select End user.
- Enter your Client ID and Client Secret.
- Click Log in.
- In Sign in with Google, enter your Google email and password, then click Allow to approve the connector's permission to access Google BigQuery on your behalf.
- Click Next.
- In BigQuery Table Output Config, click Select table.
- Select a project, expand a dataset, and then select a BigQuery table. Designer displays the tables you have access to. To refresh the metadata, click the refresh icon. Once you select a table, Designer displays the table's fully qualified identifier. Click Change table to select a different table.
- In Insert batch size, select a batch size. This number specifies the number of rows written to the Google BigQuery table with each call to the insert API (see the sketch after this list). Google sets limits on the number of API requests per second, the number of rows that may be inserted per second and per request, the size of the rows to be inserted, and the size of HTTP requests.
More information about limits and quotas can be found on the Google BigQuery Quotas and Limits documentation site. You can also review the Streaming Data into BigQuery documentation to understand how your data is inserted into the Google BigQuery table.
- Run the workflow.
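To illustrate the batching behavior, here is a minimal sketch using the streaming insert API of the google-cloud-bigquery Python client library; the project, table, row contents, and batch size are placeholders, not the connector's internals:

```python
from google.cloud import bigquery

client = bigquery.Client(project="my-project")   # hypothetical project ID
table_id = "my-project.my_dataset.my_table"      # hypothetical table

rows = [{"name": f"row{i}", "value": i} for i in range(2500)]
batch_size = 1000  # analogous to the tool's Insert batch size setting

# Send rows in batches so each insert call stays under Google's
# per-request row-count and payload-size limits.
for start in range(0, len(rows), batch_size):
    errors = client.insert_rows_json(table_id, rows[start:start + batch_size])
    if errors:
        raise RuntimeError(f"Insert failed: {errors}")
```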
Authentication and Permissions
Contact your Google Cloud Administrator for help with your Google BigQuery credentials.
Service-to-Service Authentication Setup
You can obtain a service account JSON key file from the Google Cloud Console, or you can create a new key for an existing service account. More information can be found on the Google BigQuery documentation site under Creating and Managing Service Account Keys.
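Outside of Designer, you can verify a downloaded key file with Google's google-cloud-bigquery Python client library; a minimal sketch, assuming a hypothetical key file named my-service-account-key.json:

```python
from google.cloud import bigquery
from google.oauth2 import service_account

# Load the service account key downloaded from the Google Cloud Console.
credentials = service_account.Credentials.from_service_account_file(
    "my-service-account-key.json",  # hypothetical file name
    scopes=["https://www.googleapis.com/auth/bigquery"],
)

# If this client can list datasets, the key is valid and has access.
client = bigquery.Client(credentials=credentials, project=credentials.project_id)
for dataset in client.list_datasets():
    print(dataset.dataset_id)
```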
End User Authentication Setup
Obtain an OAuth client ID and secret from the credentials page of the Google Cloud Console. Ensure the client type is set to Other.
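For reference, an equivalent end-user flow using Google's google-auth-oauthlib Python library looks like this; a minimal sketch, assuming the client ID and secret were downloaded as a file named client_secret.json:

```python
from google_auth_oauthlib.flow import InstalledAppFlow

# client_secret.json holds the OAuth client ID and secret downloaded
# from the Google Cloud Console credentials page (hypothetical file name).
flow = InstalledAppFlow.from_client_secrets_file(
    "client_secret.json",
    scopes=["https://www.googleapis.com/auth/bigquery"],
)

# Opens a browser for the same Google sign-in and consent screen
# the connector presents, then returns OAuth credentials.
credentials = flow.run_local_server(port=0)
print(credentials.valid)
```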
Access Permissions
Permissions to read and write data within a Google BigQuery project must be granted to the service account via the Google Cloud Console. More information about permissions can be found on the Google BigQuery documentation site.
Known Limitations
FLOAT BigQuery Fields
FLOAT type fields in a BigQuery table are automatically promoted to double types in the Alteryx engine. As a result, you may see extra decimals in values read from Google BigQuery and then written back.
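As a quick illustration in plain Python (not the connector itself), many decimal values have no exact double representation, which is where the extra decimals come from:

```python
# 0.1 has no exact IEEE 754 double representation, so printing it at
# full precision exposes extra decimals.
print(f"{0.1:.20f}")  # 0.10000000000000000555
```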
Max Nested, Repeated Record Depth
Nested, repeated records are supported at a depth of one level. Nested, repeated records that are nested more than one level deep are not supported. More information about nested, repeated records can be found on the Google BigQuery documentation site.
Nested, Repeated Record Flattening
When pulling nested or repeated records from a Google BigQuery table, the Alteryx workflow flattens the nested and/or repeated records according to the following naming scheme:
A nested record nested_attr of the top-level column top_attr will create a new column named nr_top_attr_nested_attr.
When pushing records from the Alteryx workflow to an existing Google BigQuery table, you specify which record fields to fold into nested and/or repeated records by renaming the fields flowing into the output connector to follow these naming conventions.
A record nested_attr that should be nested onto the top-level column top_attr must be renamed to nr_top_attr_nested_attr.
A record nested_attr that should be nested and repeated an arbitrary number of times under the top-level column top_attr must be renamed to nrr_top_attr_{repetition_index}_nested_attr. Any repetitions that contain all NULL values for the nested fields are dropped before the API call to insert new rows into the Google BigQuery table. This means records in your Designer workflow can have an arbitrary maximum number of repetitions, and not every record needs to have the same number of repetitions.
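As an illustration of the repeated-record convention above, here is a minimal Python sketch (a hypothetical helper, not the connector's implementation) that flattens a one-level nested, repeated record into the nrr_ column names:

```python
# A one-level nested, repeated record as it might arrive from BigQuery.
record = {
    "top_attr": [                  # repeated, nested record
        {"nested_attr": "a"},      # repetition 0
        {"nested_attr": "b"},      # repetition 1
    ]
}

# Flatten to nrr_<top-level column>_<repetition index>_<nested field>.
flat = {}
for i, repetition in enumerate(record["top_attr"]):
    for key, value in repetition.items():
        flat[f"nrr_top_attr_{i}_{key}"] = value

print(flat)
# {'nrr_top_attr_0_nested_attr': 'a', 'nrr_top_attr_1_nested_attr': 'b'}
```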
Output Connector Schemas
Field names and types passed to the Google BigQuery Output tool must match those in the destination Google BigQuery table. Columns that exist in the Google BigQuery table but not in the workflow field list produce NULL values for those columns in the inserted records. This causes an error if the destination column is not configured as NULLABLE.
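One way to confirm the destination schema before configuring the output tool is to inspect it with the google-cloud-bigquery Python client library; a minimal sketch, where the project and table names are placeholders:

```python
from google.cloud import bigquery

client = bigquery.Client(project="my-project")              # hypothetical
table = client.get_table("my-project.my_dataset.my_table")  # hypothetical

for field in table.schema:
    # Workflow fields must match these names and types; columns with
    # mode REQUIRED (not NULLABLE) must be present in the field list.
    print(field.name, field.field_type, field.mode)
```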
Input Progress Indicator
If the selected table of an input tool without a custom query has rows in its streaming buffer, the input tool uses Google's estimated number of rows in the streaming buffer to update the progress indicator percentage while the workflow runs. If Google's estimate is incorrect, the value reported in the progress indicator may be incorrect as well.
If the table being queried is empty but its streaming buffer is non-empty, Google may estimate the number of rows in the streaming buffer as zero while still returning rows from the buffer. When this happens, the progress indicator percentage does not update and a warning displays.
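If you want to see what Google is currently reporting for a table's streaming buffer, the google-cloud-bigquery Python client library exposes the estimate directly; a minimal sketch, where the project and table names are placeholders:

```python
from google.cloud import bigquery

client = bigquery.Client(project="my-project")              # hypothetical
table = client.get_table("my-project.my_dataset.my_table")  # hypothetical

if table.streaming_buffer is not None:
    # Google's estimate; as noted above, it can be inaccurate or even
    # zero while the buffer still returns rows.
    print(table.streaming_buffer.estimated_rows)
else:
    print("No streaming buffer for this table.")
```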