Google BigQuery

バージョン:
2022.3
Last modified: April 28, 2022
Type of Support

読み込みと書き込み

Validated On

ODBC クライアントバージョン 2.4.2.1010

Driver Details

Administrator action in Google Cloud might be necessary before configuring a connection in the Alteryx Google BigQuery tools.
OAuth authentication supported with the Simba ODBC driver on DNS mode only.
Google BigQuery requires OAuth as the authentication method. We support OAuth with the Simba ODBC driver through the DSN or in Alteryx using DCM.

接続に使用する Alteryx ツール

Link
Google BigQuery Input Tool Icon

Google BigQuery 入力ツール

Link
Google BigQuery Output Tool Icon

Google BigQuery出力ツール

認証と権限

Google BigQuery 認証情報については、Google Cloud 管理者にお問い合わせください。

サービス間認証のセットアップ

サービスアカウントの JSON キーファイルは、Google Cloud Consoleから取得することも、 既存のサービスアカウント用に新しいキーを作成することもできます。Google BigQuery の詳細については、BigQuery ドキュメント にアクセスし、「サービス アカウントキーの作成と管理」を参照してください。

エンドユーザー認証のセットアップ

Google Cloud Console の認証情報ページから OAuth クライアント ID とシークレットを取得します。クライアントタイプが [その他] に設定されていることを確認します。

可能な限り、エンドユーザー認証に対してサービス間認証を使用します。同じサービスアカウントを認証する様々なアプリケーションの個別のキーを管理することにより、他のアプリケーションが認証資格情報を更新する必要なく、特定のアプリケーションのアクセスを取り消すことができます。

アクセス権限

Google BigQuery プロジェクトサービスアカウント内のデータの読み取りおよび書き込み権限は、Google Cloud Console から付与する必要があります。権限の詳細については、Big Query ドキュメント を参照してください。

Standard Workflow Processing

Link
Input Data Tool Icon

データ入力ツール

Link
Output Data Tool Icon

データ出力ツール

In-database Workflow Processing

Link
Blue icon with database being plugged in.

接続In-DBツール

Link
Blue icon with a stream-like object flowing into a database.

データストリーム入力ツール

In-database Workflow Processing is only supported using DCM. 
The Google BigQuery Dataset used for the InDB Read and Write connections must be the same.

 

Configure a Google BigQuery Bulk Connection

Using Google BigQuery Bulk is only supported while also using DCM. To use the bulk connection via the Output Data tool:

  1. Make sure the Data Connection Manager is enabled.

  2. Select Set Up a Connection and select Data Sources - Google BigQuery Bulk.

  3. Select Add a Data Source

    1. Enter a Data Source Name,

    2. Enter a Catalog (Project). This is the Google BigQuery Project ID that contains both the Dataset where the table will be written and the Google Cloud Storage bucket used for staging.

    3. Enter a Dataset. This is where the final table will be written.

    4. Enter a Bucket Name. This is the Google Cloud Storage staging location for bulk loading. The bucket has to be in the same location as the Project.

    5. Select Save.

  4. Select + Connect Credential. The same credential is used for staging in Cloud Storage as the Google BigQuery project.

    1. Select an Authentication Method,

    2. follow the instructions for using OAuth with DCM.

  5. Select Connect.

  6. Enter the table name and select OK.

OAuth Using DCM

You can set up authentication for Google BigQuery using OAuth integrated with DCM. This way you don't have to manually copy tokens to the ODBC driver.

Prerequisites

To set up OAuth you need:

  • A valid DSN prepared with the Authenticator option set to OAuth,

  • The OAuth authority URL, Port, Client ID, and Client Secret from the administrator.

Authentication Set Up

In Designer, drag the Input tool onto the canvas.

  1. Select the Use Data Connection Manager (DCM) checkbox.

  2. Select Set Up a Connection.

  3. In the Data Connections window, go to Data sources and choose Google BigQuery ODBC.

  4. The Connection Manager window opens. Select + Add Data Source.

  5. For Technology choose Google BigQuery ODBC with Simba.

  6. Fill in the Data Source Name and choose the prepared Google BigQuery DSN.

  7. Select Save.

  8. To link the credentials, select + Connect Credential.

  9. Choose OAuth as an Authentication Method.

  10. For OAuth Application choose to Create New Credential and fill in the information needed (get the OAuth authority URL, Port, Client ID, and Client Secret from the administrator).

  11. Choose to Create New Credential for OAuth Tokens.

  12. Fill in the Credential Name and select Create and Link.

  13. Select Connect.

  14. Sign in on the Google manager external browser window.

You may be asked to step through the Oauth authorization process twice due to the way tokens are managed in DCM.

 

Additional Informations
  • Google BigQuery always quotes tables

  • Appends happen by field name. If the name or data type don’t match, Google BigQuery throws an error

  • Check Support Null Values in the format specific options (checked by default) to allow loading null data into Google BigQuery. If the option is not checked, Google BigQuery will reject null values.

  • Extra columns in the target table will load as null if they are not included in Alteryx.

  • A column name must contain only letters (a-z, A-Z), numbers (0-9), or underscores (_), and it must start with a letter or underscore. The maximum column name length is 300 characters. For more information, see Google BigQuery documentation.

Output Connector Schemas

Field names and types passed to the Google BigQuery Output Tool must match those in the destination Google BigQuery table. Columns that exist in the Google BigQuery table but do not exist in the workflow field list will produce Null values for those columns in the inserted records. This will cause an error in the case where the destination column is not configured as NULLABLE.

Input Progress Indicator

In the case that there are rows in the streaming buffer for the selected table of an input tool without a custom query, the input tool will use the estimated number of rows in the streaming buffer reported by Google while updating the percentage of the progress indicator in a running workflow. If Google's estimate is incorrect, then the value reported in the progress indicator could be incorrect as well.

In the case that the table being queried is empty and the streaming buffer for that table is non-empty, then it is possible for Google to estimate the number of rows in the streaming buffer as zero. However, Google may still return rows from the streaming buffer. When this happens, you will not see the progress indicator percentage update, and a warning will display.

Configure a Connection Using the Google BigQuery Tools

Authentication and Permissions

Contact your Google Cloud Administrator for help with your Google BigQuery credentials.

Since Designer version 22.1 there is an OAuth authentication supported with the Simba ODBC driver.

Service-to-Service Authentication Setup

You can obtain a service account JSON key file from the Google Cloud Console or you can create a new key for an existing service account. More information about Google BigQuery can be found on the Google Big Query Documentation site under Creating and Managing Service Account Keys.

End-User Authentication Setup

Obtain an OAuth client ID and secret from the credentials page of the Google Cloud Console. Ensure the client type is set to Other.

Use Service-to-service authentication over End user authentication whenever possible. Managing distinct keys for various applications authenticating for the same service account allows the revocation of access for a specific application without requiring the other applications to update their authentication credentials.

Access Permissions

Permissions to read and write data within a Google BigQuery project service account must be granted via the Google Cloud Console. More information about permissions can be found on the Google Big Query Documentation site.

Known Limitations

Max Nested, Repeated Record Depth

Nested, repeated records are supported at a depth of one level. Nested, repeated records that are nested more than one level deep are not supported. More information about nested, repeated records, can be found on the Google BigQuery Documentation site.

Nested, Repeated Record Flattening

On Input

When pulling nested or repeated records from a Google BigQuery table, the Alteryx workflow will flatten the nexted and/or repeated records according to the following naming scheme:

A nested record nested_attr of the top-level column top_attr will create a new column named nr_top_attr_nexted_attr.

On Output

When pushing records from the Alteryx workflow to an existing Google BigQuery table, the need to fold record fields into nested and/or repeated records is specified by renaming record fields flowing into the ouput connector to follow these naming conventions.

A record nested_attr that should be nested onto the top-level column top_attr must be renamed to nr_top_attr_nested_attr.

A record nested_attr that should be nested and repeated an arbitrary number of times under the top-level column top_attr must be renamed to nrr_top_attr_{repetition_index}_nested_attr. Any repetitions that contain all NULL values on the nested fields are dropped before the API call to insert new rows to the Google BigQuery table. This implies that records in your designer workflow can have an arbitrary number of maximum repetitions, and not every record must have the same number of repetitions.

このページは役に立ちましたか?

Alteryx 製品のご利用中に何か問題がございましたら、Alteryx コミュニティにアクセスするか、サポートにお問い合わせください。このフォームを送信できない場合E メールでお送りください。