Apache Spark on Databricks
Use these instructions to learn how to connect to Databricks and create an Alteryx connection string.
| Type of Support | In-Database |
| --- | --- |
| Validated On | Apache Spark 2.0, 2.1, and 2.2 |
| Distributions Validated On | Databricks |
| Connection Type | REST/HTML server |
| Server Details | See your Databricks deployment or the Databricks documentation. |
Alteryx tools used to connect
- Connect In-DB Tool, Data Stream In Tool, and Apache Spark Code Tool (in-database workflow processing)
Connect to Apache Spark by dragging a Connect In-DB tool or the Apache Spark Code tool onto the canvas. Create a new connection to Databricks using the Apache Spark on Databricks driver. Use the instructions below to configure the connection. See Databricks documentation for more information.
Configure the Databricks Connection window
To connect to Databricks and create an Alteryx connection string:
- Enter your Databricks Account ID.
- Paste the Databricks Token you generated in your Databricks user settings. Note that tokens can expire and can be revoked.
- Click Connect. Designer displays a list of Databricks clusters to connect to. If the connection is not successful, try entering your credentials again.
- Select a Databricks Cluster to connect to.
- Select a Session Type that matches the code you are writing (for example, Scala, Python, or R).
- Optionally, configure additional libraries to install on the cluster. Supported library types are:
  - Egg: a single-file importable distribution format for Python-related projects.
  - PyPI: the Python Package Index, a repository of software for Python.
  - Maven: a repository for files and artifacts.
  - CRAN: a repository of R packages.
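Behind the scenes, connecting with a token and listing clusters amounts to an authenticated call to the Databricks REST API. A minimal sketch in Python, assuming the Clusters API 2.0 endpoint and token-based bearer authentication (the workspace URL and token below are placeholders, and the exact call Designer makes is an assumption):

```python
from urllib.request import Request

def build_clusters_request(workspace_url: str, token: str) -> Request:
    """Build an authenticated request for the Databricks Clusters API.

    workspace_url and token are placeholders; substitute your own
    Databricks workspace URL and personal access token.
    """
    return Request(
        f"{workspace_url}/api/2.0/clusters/list",
        headers={"Authorization": f"Bearer {token}"},
    )

# Not sent here; passing the request to urllib.request.urlopen would
# return JSON describing the clusters available for selection.
req = build_clusters_request("https://example.cloud.databricks.com", "dapiXXXX")
print(req.full_url)
```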
Click the “+” icon to add a library row. Click Save to save the library configuration settings to a file, and use the file folder icon to locate a saved configuration file. To delete a row, hover over it and then select the trash icon.
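For reference, each library row corresponds to an entry in a Databricks Libraries API install request. A sketch of such a payload, assuming the Libraries API 2.0; the cluster ID, paths, and package names are placeholders:

```python
# Hypothetical body for POST /api/2.0/libraries/install on Databricks.
# Each dict in "libraries" matches one library type from the list above.
install_request = {
    "cluster_id": "1234-567890-abcde123",  # placeholder cluster ID
    "libraries": [
        {"egg": "dbfs:/FileStore/jars/my_project.egg"},   # Egg: Python distribution file
        {"pypi": {"package": "simplejson"}},              # PyPI package
        {"maven": {"coordinates": "com.databricks:spark-csv_2.11:1.5.0"}},  # Maven artifact
        {"cran": {"package": "ggplot2"}},                 # CRAN R package
    ],
}

print(list(install_request))
```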