Apache Spark on Microsoft Azure HDInsight

Use these instructions to connect to Microsoft Azure HDInsight and create an Alteryx connection string.

Type of Support:	In-Database
Validated On:	Apache Spark 2.0+
Distributions Validated On:	Microsoft Azure HDInsight
Connection Type:	REST/HTML server
Server Details:	Microsoft Azure information can be found here.

Alteryx tools used to connect

Connect In-DB Tool, Data Stream In Tool, and Apache Spark Code Tool (in-database workflow processing)

Additional Details

Create a new connection in the Microsoft Azure HDInsight Connection window by selecting the Microsoft Azure HDInsight option. Configure the connection using the instructions below.

Configure the Microsoft Azure HDInsight Connection window

To connect to Microsoft Azure HDInsight and create an Alteryx connection string:

1. Add a new In-DB connection, setting Data Source to Apache Spark on Microsoft Azure HDInsight. For more information on setting up an In-DB connection, see Connect In-DB Tool.
2. On the Read tab, the Driver is set to Apache Spark on Microsoft Azure HDInsight. Click the Connection String drop-down arrow and select New database connection.
3. Configure the Microsoft Azure HDInsight Connection window.

Microsoft Azure HDInsight Configuration:

Configure the Azure URL.

Click Test to verify the connection.
Set the Connection Mode to the coding language to use in the Apache Spark Code tool.
Connect to your Microsoft Azure storage account.
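
The configuration steps above supply a cluster URL and a Connection Mode. As a rough illustration only, the following Python sketch shows how those two values could map onto a REST request, assuming the connection talks to the Apache Livy endpoint that HDInsight Spark clusters expose under /livy. This page does not confirm that Alteryx uses Livy. The function names, the mode labels (Scala, Python, R), and the cluster name in the usage note are hypothetical. The code only builds a URL and a request body; it makes no network call.

```python
from urllib.parse import urlsplit

# Assumption: Connection Mode labels mapped to Livy session kinds.
# The labels are hypothetical; "spark", "pyspark" and "sparkr" are Livy kinds.
LIVY_KIND = {"Scala": "spark", "Python": "pyspark", "R": "sparkr"}


def livy_sessions_url(azure_url: str) -> str:
    """Build the assumed Livy sessions endpoint from a cluster URL."""
    # Accept the URL with or without a scheme.
    if "://" not in azure_url:
        azure_url = "https://" + azure_url
    host = urlsplit(azure_url).netloc
    return f"https://{host}/livy/sessions"


def session_payload(connection_mode: str) -> dict:
    """Build the JSON body for creating a Livy session in the chosen language."""
    if connection_mode not in LIVY_KIND:
        raise ValueError(f"Unknown connection mode: {connection_mode!r}")
    return {"kind": LIVY_KIND[connection_mode]}
```

For a hypothetical cluster named mycluster, `livy_sessions_url("mycluster.azurehdinsight.net")` yields `https://mycluster.azurehdinsight.net/livy/sessions`, and `session_payload("Python")` yields `{"kind": "pyspark"}`.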

Advanced Options

Apache Spark version	Value (semicolon-separated Maven package coordinates)
2.0 - 2.1	com.databricks:spark-avro_2.11:3.2.0;com.databricks:spark-csv_2.11:1.5.0
2.2	com.databricks:spark-avro_2.11:4.0.0;com.databricks:spark-csv_2.11:1.5.0
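
The table above can be read as a lookup from Spark version to a semicolon-separated package list. The Python sketch below illustrates that lookup. The values are copied from the table, while the helper name `packages_for` and the choice to match on major.minor version are assumptions made for illustration, not part of the product.

```python
from typing import List

# Values copied from the table above, keyed by Spark major.minor version.
_SPARK_2_0_2_1 = "com.databricks:spark-avro_2.11:3.2.0;com.databricks:spark-csv_2.11:1.5.0"
_SPARK_2_2 = "com.databricks:spark-avro_2.11:4.0.0;com.databricks:spark-csv_2.11:1.5.0"

PACKAGES = {
    "2.0": _SPARK_2_0_2_1,
    "2.1": _SPARK_2_0_2_1,
    "2.2": _SPARK_2_2,
}


def packages_for(spark_version: str) -> List[str]:
    """Return the package coordinates listed for a Spark version such as '2.1.0'."""
    # Keep only major.minor, so '2.1.0' matches the '2.1' row.
    major_minor = ".".join(spark_version.split(".")[:2])
    if major_minor not in PACKAGES:
        raise ValueError(f"No table entry for Spark {spark_version}")
    # Each entry has the form group:artifact:version.
    return PACKAGES[major_minor].split(";")
```

Calling `packages_for("2.2")` returns the two entries from the 2.2 row as separate strings; versions not listed in the table raise `ValueError` rather than guessing a value.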

Click OK to create your Apache Spark on Microsoft Azure HDInsight connection.