Apache Spark Direct

Type of Support:	In-Database
Validated On:	Apache Livy 0.3; Apache Spark 1.6, 2.0, 2.1, and 2.2
Distributions Validated On:	Hortonworks 2.6+; Cloudera 5.7+
Connection Type:	REST/HTTP server
Server Details:	Download information is available on the Apache Livy project site.

Alteryx tools used to connect

Connect In-DB Tool, Data Stream In Tool, and Apache Spark Code Tool (In-database workflow processing)

Additional Details

Connect to Apache Spark by dragging a Connect In-DB tool or the Apache Spark Code tool onto the canvas. Create a new Livy connection using the Apache Spark Direct driver. Use the instructions below to configure the connection.

Configure the Livy Connection window

To connect to Livy Server and create an Alteryx connection string:

Add a new In-DB connection, setting Data Source to Apache Spark Direct. For more information on setting up an In-DB connection, see Connect In-DB Tool.

On the Read tab, Driver is locked to Apache Spark Direct. Click the Connection String drop-down arrow and select New database connection.

Configure the Livy Connection window.

Livy Server Configuration: Select your security preference:

Optionally test the connection:

Select the Apache Spark Version used on your cluster.
Select the Kerberos connection type.
Click Test.
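
To check the Livy server from outside Designer, the following minimal sketch may help. It is not the check that the Test button performs. It assumes the Livy REST API is reachable without authentication on Livy's default port 8998, and that GET /sessions returns a JSON object containing a "sessions" list. The host name is hypothetical, and Kerberos-secured servers would additionally need SPNEGO authentication, which is not shown.

```python
import json
import urllib.request


def sessions_url(base_url):
    """Build the Livy sessions endpoint from a base URL such as http://host:8998."""
    return base_url.rstrip("/") + "/sessions"


def summarize_sessions(payload):
    """Map session id -> state from a decoded GET /sessions response body."""
    return {s["id"]: s["state"] for s in payload.get("sessions", [])}


def list_sessions(base_url, timeout=10):
    """Fetch and summarize sessions; raises URLError if the server is unreachable."""
    with urllib.request.urlopen(sessions_url(base_url), timeout=timeout) as resp:
        return summarize_sessions(json.load(resp))


# Hypothetical usage (requires a reachable Livy server):
# print(list_sessions("http://livy.example.com:8998"))
```

A response that parses cleanly, even with no sessions listed, suggests the server address and port are correct.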

Set the Connection Mode to the coding language to use in the Apache Spark Code tool.
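
For orientation, Livy itself distinguishes languages by session "kind". The sketch below shows how a language choice could translate into the body of a Livy POST /sessions request. It assumes the available languages are Scala, Python, and R, and the helper name session_request is hypothetical. How Alteryx builds its own requests is not documented here.

```python
import json

# Livy session kinds per language, as named in Livy's REST API.
LIVY_KIND = {"scala": "spark", "python": "pyspark", "r": "sparkr"}


def session_request(language):
    """Return the JSON body for POST /sessions for the given language."""
    try:
        kind = LIVY_KIND[language.lower()]
    except KeyError:
        raise ValueError(f"unsupported language: {language!r}") from None
    return json.dumps({"kind": kind})


print(session_request("Python"))
```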

Advanced Options

Apache Spark Version	Value (semicolon-separated package coordinates)
2.0 - 2.1	com.databricks:spark-avro_2.11:3.2.0;com.databricks:spark-csv_2.11:1.5.0
2.2	com.databricks:spark-avro_2.11:4.0.0;com.databricks:spark-csv_2.11:1.5.0
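
The values listed in the table above are semicolon-separated Maven coordinates (groupId:artifactId:version). As a rough sketch, the code below splits such a value and joins it into the comma-separated form that Spark's spark.jars.packages setting expects. That Alteryx passes the value to this particular setting is an assumption, and the helper names are hypothetical.

```python
def parse_packages(value):
    """Split a semicolon-separated value into validated Maven coordinates."""
    coords = [c.strip() for c in value.split(";") if c.strip()]
    for coord in coords:
        # Each coordinate must have exactly three non-empty parts.
        parts = coord.split(":")
        if len(parts) != 3 or not all(parts):
            raise ValueError(f"not a groupId:artifactId:version coordinate: {coord!r}")
    return coords


def to_spark_conf(value):
    """Build a Spark conf entry; spark.jars.packages is comma-separated."""
    return {"spark.jars.packages": ",".join(parse_packages(value))}


spark_22 = "com.databricks:spark-avro_2.11:4.0.0;com.databricks:spark-csv_2.11:1.5.0"
print(to_spark_conf(spark_22))
```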

Select OK to create your Apache Spark Direct connection.