Databricks Unity Catalog
Connection Type | ODBC (64-bit)
Driver Configuration Requirements | The host must be a Databricks Unity Catalog cluster JDBC/ODBC server hostname. Supported on both AWS and Azure.
Type of Support | Read & Write; In-Database
Validated On | Databricks Cluster and SQL Warehouse, Simba Apache Spark Driver 2.7.7.1017
Driver Details
In-Database processing requires 64-bit database drivers.
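Before configuring DCM, you can confirm that the 64-bit Simba driver is visible to the ODBC driver manager. Below is a minimal sketch in Python, assuming pyodbc is installed and run under 64-bit Python; the driver name string is an assumption, so verify it against your 64-bit ODBC Data Source Administrator.

```python
# Minimal check that the 64-bit Simba Apache Spark ODBC driver is registered.
# Run under 64-bit Python so pyodbc queries the 64-bit driver manager.
import pyodbc

EXPECTED = "Simba Spark ODBC Driver"  # assumed name; verify in the ODBC Administrator

installed = pyodbc.drivers()
print("\n".join(installed))
if EXPECTED not in installed:
    print(f"Warning: {EXPECTED!r} not found. Install the 64-bit Simba driver.")
```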
Alteryx Tools Used to Connect
Standard Workflow Processing
In-Database Workflow Processing
Notice
Databricks Unity Catalog is supported only via DCM.
Databricks Unity Catalog is supported only with DSN-less connections.
Writing to Databricks Unity Catalog is supported only via the In-DB tools.
MergeInDB is supported for Databricks Unity Catalog. Go to the Write Data In-DB Tool for details.
Configure Input tool
The tool uses the Apache Spark ODBC DSN-less with Simba Databricks Unity Catalog connection technology in DCM.
Make sure DCM is enabled.
In the Input tool, select Set up a Connection.
Select the Data Sources tab.
Select the Quick Connect option under Databricks Unity Catalog.
The DCM Connection Manager is pre-filtered to show only Apache Spark ODBC DSN-less with Simba Databricks Unity Catalog connections.
Choose an existing DCM connection or select +New to create a new connection. See below for configuring a new connection using DCM.
The Choose Table or Specify Query window loads and allows you to select tables.
Configure In-DB Connection
Open the In-DB Connections Manager.
Select Databricks Unity Catalog in the Data Source dropdown.
Select New to create a new connection.
Enter a Connection Name.
On the Read tab, select Setup Connection to open the DCM connection manager for Databricks Unity Catalog. The DCM Connection Manager is pre-filtered to show only Apache Spark ODBC DSN-less with Simba Databricks Unity Catalog connections.
Select an existing connection or click +New to create a new connection. See below for configuring a new connection using DCM.
On the Write tab, select Setup Connection to open the DCM connection manager for the Databricks Connection. The DCM Connection Manager is pre-filtered to show only Apache Spark ODBC Bulk DSN-less with Simba Databricks Unity Catalog connections.
Select an existing connection or select +New to create a new connection. See below for configuring a new connection using DCM.
On the Write tab, select Setup Connection to open the DCM connection manager for the Delta Lake Connection. The DCM Connection Manager is pre-filtered to show only Delta Lake connections.
Select an existing connection or select +New to create a new connection. See below for configuring a new connection using DCM.
Select Apply and OK to save the connection and close the window.
If the In-DB Connections Manager was accessed through the Connect In-DB tool, the Choose Table or Specify Query window loads and allows you to select tables.
Note: Databricks Unity Catalog requires the following permissions for least-privileged READ access:
Information Schema (default)
USE CATALOG on the catalog
USE SCHEMA on the schema
BROWSE (default) on the corresponding tables
SELECT on the corresponding tables
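For reference, these grants can be expressed as Unity Catalog SQL. This is a sketch, not Alteryx functionality: the catalog (main), schema (sales), table (orders), and principal (analysts) are placeholders, and the statements must be run by a principal with permission to grant.

```python
# Sketch: the least-privileged READ grants as Unity Catalog SQL statements.
# All object names and the principal are placeholders.
grants = [
    "GRANT USE CATALOG ON CATALOG main TO `analysts`",
    "GRANT USE SCHEMA ON SCHEMA main.sales TO `analysts`",
    "GRANT BROWSE ON TABLE main.sales.orders TO `analysts`",
    "GRANT SELECT ON TABLE main.sales.orders TO `analysts`",
]
for stmt in grants:
    print(stmt)  # or run each with cursor.execute(stmt) over an ODBC connection
```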
Configure Apache Spark ODBC DSN-less with Simba Databricks Unity Catalog in DCM
This connection is used for reading data from Databricks Unity Catalog.
Open the Data Connection Manager and navigate to Apache Spark ODBC DSN-less with Simba Databricks Unity Catalog:
When opened from an Input tool or the In-DB Connections Manager, DCM is pre-filtered.
From the File menu, go to File > Manage Connections > +New > Apache Spark > Apache Spark ODBC DSN-less with Simba Databricks Unity Catalog.
Enter a Data Source Name.
Enter the Databricks Unity Catalog Host name.
The Port is set to 443 by default. Change as needed.
Enter the HTTP Path. The HTTP path is the URL of the Databricks compute resource.
Select Save to save the Data Source.
Select +Connect Credential.
Select an Authentication Method.
To use a Personal Access Token, select Username and password as the authentication method and set the username to “token”.
To use Azure AD, see Databricks Azure OAuth Authentication.
Select an Existing Credential or select Create New Credential to create a new credential and enter the Personal Access Token or the information for Azure AD.
Select Link to link the credential to the Data Source and select Connect.
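Outside of Designer, the DSN-less read connection configured above can be sketched directly against the Simba driver. This is a sketch, assuming pyodbc and the 64-bit Simba driver are installed; the host, HTTP path, and token are placeholders, and the keyword names follow the Simba/Databricks Spark ODBC driver conventions.

```python
# Sketch of a DSN-less read connection to Databricks Unity Catalog.
import pyodbc

conn_str = (
    "Driver=Simba Spark ODBC Driver;"
    "Host=adb-1234567890123456.7.azuredatabricks.net;"  # placeholder hostname
    "Port=443;"                                         # default port
    "HTTPPath=/sql/1.0/warehouses/abc123;"              # compute resource URL path
    "SSL=1;"
    "ThriftTransport=2;"  # HTTP transport
    "AuthMech=3;"         # username/password; the username must be "token"
    "UID=token;"
    "PWD=dapiXXXXXXXXXXXX;"  # placeholder personal access token
)
with pyodbc.connect(conn_str, autocommit=True) as conn:
    print(conn.cursor().execute("SELECT current_catalog()").fetchone()[0])
```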
Configure Apache Spark ODBC Bulk DSN-less with Simba Databricks Unity Catalog in DCM
This connection is used for writing data to Databricks Unity Catalog.
Open the Data Connection Manager and navigate to Apache Spark ODBC Bulk DSN-less with Simba Databricks Unity Catalog:
When opened from an Input tool or the In-DB Connections Manager, DCM is pre-filtered.
From the File menu, go to File > Manage Connections > +New > Apache Spark > Apache Spark ODBC Bulk DSN-less with Simba Databricks Unity Catalog.
Enter a Data Source Name.
Enter the Databricks Unity Catalog Host name.
The Port is set to 443 by default. Change as needed.
Enter the HTTP Path. The HTTP path is the URL of the Databricks compute resource.
Enter the Catalog. This sets the catalog that is used for writing data and creating tables.
Enter the Schema. This sets the schema that is used for writing data and creating tables.
Select Save to save the Data Source.
Select +Connect Credential to add a Credential.
Select an Authentication Method.
To use a Personal Access Token, select Username and password as the authentication method and set the username to “token”.
To use Azure AD, see Databricks Azure OAuth Authentication.
Select an Existing Credential or select Create New Credential to create a new credential and enter the Personal Access Token or the information for Azure AD.
Select Link to link the credential to the Data Source.
Select Connect.
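The bulk (write) connection can be sketched the same way. This is a sketch under the same assumptions and placeholders as the read example above; Catalog and Schema are passed as driver keywords so that unqualified table names resolve to the write target.

```python
# Sketch of the bulk (write) variant: the read keywords plus Catalog and Schema,
# which set the default target for tables created by In-DB writes.
import pyodbc

conn_str = (
    "Driver=Simba Spark ODBC Driver;"
    "Host=adb-1234567890123456.7.azuredatabricks.net;"  # placeholder hostname
    "Port=443;HTTPPath=/sql/1.0/warehouses/abc123;"     # placeholder HTTP path
    "SSL=1;ThriftTransport=2;AuthMech=3;"
    "UID=token;PWD=dapiXXXXXXXXXXXX;"                   # placeholder token
    "Catalog=main;Schema=sales;"                        # placeholder write target
)
with pyodbc.connect(conn_str, autocommit=True) as conn:
    cur = conn.cursor()
    # Unqualified names resolve against the Catalog/Schema set above.
    cur.execute("CREATE TABLE IF NOT EXISTS staged_orders (id INT, amount DOUBLE)")
    cur.execute("INSERT INTO staged_orders VALUES (1, 9.99)")
```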
Configure Delta Lake Connection in DCM
This connection is used for staging data in AWS S3 or ADLS.
Open the Data Connection Manager and navigate to Delta Lake on AWS or Delta Lake on Azure:
When opened from an Input tool or the In-DB Connections Manager, DCM is pre-filtered.
From the File Menu, go to File > Manage Connections > +New > Delta Lake > Delta Lake on AWS/Delta Lake on Azure.
For Delta Lake on AWS
Enter a Data Source Name.
Enter an Endpoint or leave it as Default. When Default is used, Amazon determines the endpoint based on the selected bucket.
Make sure Use Signature V4 for Authentication is selected unless specifically instructed otherwise. If unchecked, Signature V2 is used. Regions created after January 30, 2014, support only Signature Version 4. These regions require Signature Version 4 authentication:
US East (Ohio) Region
Canada (Central) Region
Asia Pacific (Mumbai) Region
Asia Pacific (Seoul) Region
EU (Frankfurt) Region
EU (London) Region
China (Beijing) Region
Select the level of Server-Side Encryption needed. None is the default.
None (Default): No encryption method is used.
SSE-KMS: Use server-side encryption with AWS KMS-managed keys. You can also provide a KMS Key ID. When you select this method, Use Signature V4 for Authentication must be selected.
Enter the Bucket name to be used for staging. The user needs to have read, write, and delete permissions for the bucket.
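A quick way to confirm those bucket permissions is to round-trip a small object. Below is a sketch using boto3 (an assumption, not something Alteryx requires); the bucket, key, and KMS alias are placeholders, and the SSE-KMS arguments mirror the encryption option above.

```python
# Sketch: exercise the read, write, and delete permissions the staging bucket needs.
import boto3

s3 = boto3.client("s3")  # credentials resolved from your AWS IAM access keys
bucket, key = "my-staging-bucket", "alteryx-staging/permission_check.txt"  # placeholders

s3.put_object(
    Bucket=bucket, Key=key, Body=b"ok",
    ServerSideEncryption="aws:kms",  # omit these two arguments for SSE "None"
    SSEKMSKeyId="alias/my-kms-key",  # optional placeholder KMS Key ID
)
assert s3.get_object(Bucket=bucket, Key=key)["Body"].read() == b"ok"
s3.delete_object(Bucket=bucket, Key=key)
print("read/write/delete OK")
```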
For Delta Lake on Azure
Enter Data Source Name, ADLS Container, and Storage Account.
Storage Temporary Directory is optional. When entering the Temp Directory, don’t repeat the Container name. If the directory entered here does not already exist, Alteryx creates it. Alteryx creates one sub-folder with the table name for each table that is staged. A sketch of this layout follows the steps below.
Select Save to save the Data Source.
Select +Connect Credential.
Select an Authentication Method.
For Delta Lake on AWS, the only Authentication Method is AWS IAM access keys.
For Delta Lake on Azure, you may select between Shared Access Key or Azure AD authentication.
Select an Existing Credential or Create New Credential to create a new credential.
Enter a Credential Name and the AWS IAM access keys or the Azure Shared Key.
Select Link to link the credential to the Data Source.
Select Connect.
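For Delta Lake on Azure, the staging layout described above (temporary directory and per-table sub-folders inside the container) can be sketched with azure-storage-blob. This is a sketch under assumptions: the library is installed, and the account, container, temp directory, and shared key are placeholders.

```python
# Sketch: how the Storage Temporary Directory relates to the ADLS Container.
# Note that the temp directory path does not repeat the container name.
from azure.storage.blob import BlobServiceClient

service = BlobServiceClient(
    account_url="https://mystorageaccount.blob.core.windows.net",  # placeholder account
    credential="<shared-access-key>",  # or an Azure AD credential object
)
container = service.get_container_client("my-adls-container")  # placeholder container

# Staged data lands under <temp-directory>/<table-name>/ inside the container.
blob = container.get_blob_client("alteryx-temp/my_table/permission_check.txt")
blob.upload_blob(b"ok", overwrite=True)
blob.delete_blob()
print("staging path is writable")
```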