Databricks

Connection Type

ODBC (64-bit)

Driver Configuration Requirements

The host must be the JDBC/ODBC server hostname of a Databricks cluster.

For optimal performance, enable the Fast SQLPrepare option in the driver Advanced Options so that Alteryx can retrieve metadata without running a query.

Deselect the Enable Translation for CTAS check box in the DSN. It is selected by default.

To use Visual Query Builder, select the Get Tables With Query option within the driver Advanced Options.

Supported for both AWS and Azure.
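The requirements above map onto driver configuration keys. As a sketch only, a DSN entry in odbc.ini might look like the following. FastSQLPrepare and GetTablesWithQuery are documented Simba Apache Spark driver options; the driver path and hostname are placeholders, and the exact key name for the CTAS translation option should be confirmed against your driver's installation guide.

```ini
; Hypothetical odbc.ini DSN sketch for the Simba Apache Spark driver.
[Databricks]
Driver=/opt/simba/spark/lib/64/libsparkodbc_sb64.so
Host=abc-abc123-123a.cloud.databricks.com
Port=443
FastSQLPrepare=1      ; retrieve metadata without running a query
GetTablesWithQuery=1  ; required for Visual Query Builder
```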

Type of Support

Read & Write, In-Database

Validated On

Databricks Interactive and SQL Endpoint clusters; Simba Apache Spark Driver 2.06.23.

Alteryx Tools Used to Connect

Standard Workflow Processing

In-Database Workflow Processing

If you have issues with reading or writing Unicode® characters, open the Simba Apache Spark ODBC driver configuration. Under Advanced Options, select Use SQL Unicode Types.

The string length is controlled by the driver. You can change it in the Advanced Options for the ODBC DSN or through the Advanced Options for the Driver Configuration, which you can find in the driver install folder.
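As an illustration of the driver-side string length setting, the Simba Apache Spark driver exposes a DefaultStringColumnLength option (driver default 255) that could be raised in the same DSN or driver configuration; the value below is an example, not a recommendation.

```ini
; Hypothetical DSN fragment: driver-controlled default string length.
DefaultStringColumnLength=4000
```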

Read Support

Install and configure the Apache Spark ODBC driver:

  • Spark Server Type: Select the appropriate server type for the version of Apache Spark that you are running. If you are running Apache Spark 1.1 or later, select Apache SparkThriftServer.

  • Authentication Mechanism: See the installation guide downloaded with the Simba Apache Spark driver to configure this setting based on your setup.

To set up the driver Advanced Options, see the installation guide downloaded with the Simba Apache Spark driver.
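To make the server-type and authentication settings above concrete, here is a minimal sketch of a DSN-less connection string for the Simba Apache Spark driver. The key names (Driver, Host, SparkServerType, AuthMech) follow the Simba driver's documented options; the hostname and token are placeholders, and your driver name or authentication mechanism may differ.

```python
# Sketch: assembling a pyodbc-style connection string for a Databricks
# cluster via the Simba Apache Spark ODBC driver. Values are illustrative.

def build_spark_odbc_conn_str(host: str, port: int = 443, token: str = "") -> str:
    """Assemble an ODBC connection string for a Databricks cluster."""
    parts = {
        "Driver": "Simba Spark ODBC Driver",
        "Host": host,
        "Port": port,
        "SparkServerType": 3,  # 3 = Spark Thrift Server (Spark 1.1 or later)
        "ThriftTransport": 2,  # 2 = HTTP
        "SSL": 1,
        "AuthMech": 3,         # 3 = username/password
        "UID": "token",        # Databricks personal access token login
        "PWD": token,
    }
    return ";".join(f"{k}={v}" for k, v in parts.items())

conn_str = build_spark_odbc_conn_str(
    "abc-abc123-123a.cloud.databricks.com", token="dapi-example"
)
# With pyodbc installed, you would connect with:
#   pyodbc.connect(conn_str, autocommit=True)
```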

Write Support

For both standard and in-database workflows, use the Data Stream In tool to write to Databricks. Write support is provided through the Databricks Bulk Loader. Go to Manage In-DB Connections - Write.

Configure the Write tab

  1. Select Databricks Bulk Loader (Avro) or Databricks Bulk Loader (CSV). To write a table with field names that total more than 4000 characters, use CSV instead of Avro. The delimiter used for CSV is the start-of-heading (SOH) character.

  2. Select the Connection String dropdown, and then select New Databricks connection.

  3. Select an existing ODBC data source, or select ODBC Admin to create one.

  4. Enter a username and password. These fields cannot be blank.

  5. Enter the Databricks URL:

    https://abc-abc123-123a.cloud.databricks.com

    Warning

    Including a trailing "/" in the URL (for example, https://abc-abc123-123a.cloud.databricks.com/) results in an error.
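The trailing-slash pitfall in step 5 is easy to guard against programmatically. A minimal sketch, if you generate connection settings from scripts, is to normalize the URL before use (the helper name is hypothetical, not part of Alteryx):

```python
# Sketch: strip trailing slashes from a Databricks URL, since a trailing
# "/" in the URL causes a connection error in the Write configuration.

def normalize_databricks_url(url: str) -> str:
    """Return the URL without any trailing slashes."""
    return url.rstrip("/")

print(normalize_databricks_url("https://abc-abc123-123a.cloud.databricks.com/"))
# → https://abc-abc123-123a.cloud.databricks.com
```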

Databricks Delta Lake Bulk Connection

With Designer 2024.1, the Databricks Delta Lake bulk connection supports AWS IAM (Identity and Access Management) authentication.

Follow the steps below to configure the Databricks Delta Lake bulk connection.

Important

Databricks Delta Lake Bulk Connection is only available in Designer version 2022.1 and higher.

  1. Select Databricks Delta Lake Bulk Loader (Avro) or Databricks Delta Lake Bulk Loader (CSV). To write a table with field names that total more than 4000 characters, use CSV instead of Avro.

  2. Select the Connection String dropdown, and then select New database connection.

  3. Select an existing ODBC data source, or select ODBC Admin to create one.

  4. Enter a username and password. These fields cannot be blank. Alteryx supports personal access tokens: use "token" as the username and the personal access token as the password.

  5. Select a Staging Method (supported for both AWS and Azure):

    1. For Amazon S3

      1. Enter the AWS Access Key and Secret Key to authenticate.

      2. Select an Endpoint or leave as Default.

      3. Select Use Signature V4 for Authentication.

      4. Select the level of Server-Side Encryption needed. None is the default.

      5. Select a Bucket Name to use as the staging location.

        With Designer 2024.1, the Databricks Bulk Loader supports AWS IAM (Identity and Access Management) authentication.

    2. For Azure ADLS

      Important

      For bulk loading to Azure, only ADLS Gen2 is supported.

      1. Select the ADLS Container.

      2. Enter the Shared Key.

      3. Enter the Storage Account.

      4. Enter an optional Temp Directory. When entering the Temp Directory, don't repeat the Container name.

        Example

        If the folder structure is Container/MyTempFolder/TempTables, only enter “MyTempFolder/TempTables”.

        If the directory entered here does not already exist, Alteryx will create one.

        Alteryx will create one sub-folder with the table name for each table that is staged.

    3. Select OK to apply.

      With Designer 2023.2, the Databricks Bulk Loader supports Azure AD authentication for ADLS staging. This feature improves the security of the Databricks Bulk Loader by allowing organizations to granularly manage access to individual containers within a storage account.

      To learn more about how to use Azure AD authentication for ADLS staging with Databricks Bulk Loader, refer to this Community page.
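The ADLS Temp Directory rule above (don't repeat the container name; one sub-folder per staged table) can be sketched as a small validation helper. The abfss:// path layout and the function itself are illustrative assumptions; Alteryx builds the real staging path internally.

```python
# Sketch: build an ADLS Gen2 staging path under the documented rule that
# the Temp Directory must NOT begin with the container name.

def staging_path(account: str, container: str, temp_dir: str, table: str) -> str:
    """Return the hypothetical abfss:// path for one staged table."""
    temp_dir = temp_dir.strip("/")
    if temp_dir.split("/", 1)[0] == container:
        raise ValueError("Temp Directory must not repeat the container name")
    # Alteryx creates one sub-folder with the table name per staged table.
    return f"abfss://{container}@{account}.dfs.core.windows.net/{temp_dir}/{table}"

print(staging_path("mystorage", "Container", "MyTempFolder/TempTables", "sales"))
# → abfss://Container@mystorage.dfs.core.windows.net/MyTempFolder/TempTables/sales
```

Passing "Container/MyTempFolder/TempTables" as the Temp Directory would raise a ValueError, mirroring the instruction not to repeat the container name.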