Apache Spark on Databricks

Version:
2019.3
Last modified: September 26, 2019
Connection Type

REST/HTTP server

Distributions Validated On

Databricks

Server Details

Server details are available in the Databricks documentation.

Type of Support

In-Database

Validated On

Apache Spark 2.0, 2.1, and 2.2

Alteryx tools used to connect

In-database workflow processing

• Connect In-DB Tool
• Data Stream In Tool
• Apache Spark Code Tool

Connect to Apache Spark by dragging a Connect In-DB tool or the Apache Spark Code tool onto the canvas. Create a new connection to Databricks using the Apache Spark on Databricks driver. Use the instructions below to configure the connection. See Databricks documentation for more information.
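Designer communicates with Databricks over its REST API, authenticating with the token from your Databricks user settings. As a hedged illustration (not Designer's actual implementation), the kind of authenticated request involved when listing clusters looks like this; the workspace URL and token are placeholders:

```python
# Sketch of an authenticated Databricks REST API call, similar in shape to
# what a client makes when connecting and listing clusters.
# The workspace URL and token below are placeholders, not real credentials.
import urllib.request


def clusters_list_request(workspace_url: str, token: str) -> urllib.request.Request:
    """Build a GET request for the Databricks clusters/list endpoint."""
    return urllib.request.Request(
        url=f"{workspace_url}/api/2.0/clusters/list",
        headers={"Authorization": f"Bearer {token}"},
        method="GET",
    )


req = clusters_list_request("https://example.cloud.databricks.com", "dapiXXXX")
```

Sending this request with a valid token returns a JSON list of the workspace's clusters, which is the information Designer surfaces in step 3 below.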

 

Configure the Databricks Connection window


To connect to Databricks and create an Alteryx connection string:

  1. Enter your Databricks Account ID.
  2. Paste the Databricks Token you generated in your Databricks user settings. Tokens can expire and be revoked.
  3. Click Connect. Designer displays a list of Databricks clusters to connect to. If the connection is not successful, try entering your credentials again.
  4. Select a Databricks Cluster to connect to.
  5. Select a Session Type. Select one of the following depending on the code you are writing:
    • Scala
    • Python
    • R
  6. Optionally, type a descriptive Run name for the job so that you can identify it later. Run names help users distinguish one job from another on the server. The name defaults to Untitled if left blank.
  7. Set the Timeout in minutes: the number of minutes of inactivity allowed before the job stops. For example, with a timeout of 15, the job can sit idle for 15 minutes before it times out. See Databricks documentation for more information.
  8. To write your own code, add Libraries beyond the set that is already provided. The supported file types are:

     

    • jar: Java Archive
    • egg: Single-file importable distribution format for Python projects
    • PyPi: A package from the Python Package Index, the repository of Python software
    • Maven: An artifact from a Maven repository of Java and JVM libraries
    • CRAN: An R package from the Comprehensive R Archive Network

    Click the “+” icon to add a row. Click Save to save the library configuration settings to a file. Use the File folder icon to locate a saved configuration file. To delete a row, hover over it and then select the trash icon.

  9. In Databricks Connection, click OK.
  10. In Manage In-DB Connections, click OK to create an Alteryx connection string.
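For reference, the five file types in step 8 correspond to the library specs used by the Databricks Libraries REST API. A hedged sketch of an install payload follows; the cluster ID, paths, and package names are placeholders:

```python
# Hypothetical illustration of how the library file types in step 8 map to
# Databricks Libraries API 2.0 install specs. All IDs, paths, and package
# names are placeholders.
libraries_payload = {
    "cluster_id": "1234-567890-abcde123",  # placeholder cluster ID
    "libraries": [
        {"jar": "dbfs:/mnt/libs/my-udfs.jar"},        # Java archive
        {"egg": "dbfs:/mnt/libs/my_module.egg"},      # Python egg
        {"pypi": {"package": "simplejson"}},          # Python Package Index package
        {"maven": {"coordinates": "com.databricks:spark-avro_2.11:4.0.0"}},  # Maven artifact
        {"cran": {"package": "forecast"}},            # R package from CRAN
    ],
}
```

Each entry uses the key Databricks expects for that file type, which is why Designer asks you to pick the type when you add a row.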

 
