Apache Spark on Databricks
Use these instructions to learn how to connect to Databricks and create an Alteryx connection string.
| Type of Support | In-Database |
| --- | --- |
| Validated On | Apache Spark 2.0, 2.1, and 2.2 |
| Distributions Validated On | Databricks |
| Connection Type | REST/HTML server |
| Server Details | See your Databricks deployment or the Databricks documentation. |
Alteryx tools used to connect
- Connect In-DB Tool, Data Stream In Tool, and Apache Spark Code Tool (in-database workflow processing)
Connect to Apache Spark by dragging a Connect In-DB tool or the Apache Spark Code tool onto the canvas. Create a new connection to Databricks using the Apache Spark on Databricks driver. Use the instructions below to configure the connection. See Databricks documentation for more information.
Configure the Databricks Connection window
To connect to Databricks and create an Alteryx connection string:
- Enter your Databricks Account ID.
- Paste the Databricks Token you generated in your Databricks user settings. Note that tokens can expire and can be revoked.
- Click Connect. Designer displays a list of Databricks clusters to connect to. If the connection is not successful, try entering your credentials again.
- Select a Databricks Cluster to connect to.
- Select a Session Type that matches the code you are writing (for example, Scala, Python, or R).
- Optionally, configure additional libraries to install on the cluster. Supported library types are:
  - Egg: a single-file importable distribution format for Python-related projects.
  - PyPI: the Python Package Index, a repository of software for Python.
  - Maven: a repository for files and artifacts.
  - CRAN: a repository of R packages.
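Behind the scenes, connecting with a token and listing clusters amounts to an authenticated call to the Databricks REST API. A minimal sketch in Python, assuming the Clusters API 2.0 endpoint and token-based bearer authentication (the workspace URL and token below are placeholders, and the exact call Designer makes is an assumption):

```python
from urllib.request import Request

def build_clusters_request(workspace_url: str, token: str) -> Request:
    """Build an authenticated request for the Databricks Clusters API.

    workspace_url and token are placeholders; substitute your own
    Databricks workspace URL and personal access token.
    """
    return Request(
        f"{workspace_url}/api/2.0/clusters/list",
        headers={"Authorization": f"Bearer {token}"},
    )

# Not sent here; passing the request to urllib.request.urlopen would
# return JSON describing the clusters available for selection.
req = build_clusters_request("https://example.cloud.databricks.com", "dapiXXXX")
print(req.full_url)
```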
Click the “+” icon to add a library row. Click Save to save the library configuration settings to a file, and use the file folder icon to locate a saved configuration file. To delete a row, hover over it and then select the trash icon.
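For reference, each library row corresponds to an entry in a Databricks Libraries API install request. A sketch of such a payload, assuming the Libraries API 2.0; the cluster ID, paths, and package names are placeholders:

```python
# Hypothetical body for POST /api/2.0/libraries/install on Databricks.
# Each dict in "libraries" matches one library type from the list above.
install_request = {
    "cluster_id": "1234-567890-abcde123",  # placeholder cluster ID
    "libraries": [
        {"egg": "dbfs:/FileStore/jars/my_project.egg"},   # Egg: Python distribution file
        {"pypi": {"package": "simplejson"}},              # PyPI package
        {"maven": {"coordinates": "com.databricks:spark-csv_2.11:1.5.0"}},  # Maven artifact
        {"cran": {"package": "ggplot2"}},                 # CRAN R package
    ],
}

print(list(install_request))
```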