The Spark Code tool is a code editor that creates a Spark context and executes Spark commands directly from Designer.
For additional information, see Spark Direct.
Connect to your Spark cluster with the Connect In-DB tool, or connect directly from the Spark Code tool.
Both methods bring up the Manage In-DB Connections window.
Add a new In-DB connection, setting Data Source to Spark Direct.
For more information on setting up an In-DB connection, see Connect In-DB Tool.
On the Read tab, Driver is locked to Spark Direct. Click the Connection String drop-down arrow and select New database connection.
Configure the Livy Connection window.
Livy Server Configuration: Select your security preference:
Type or paste the Host IP Address or DNS name of the Livy node within your Spark cluster.
Type the Port used by Livy. The default port is 8998.
Optionally, type a User Name to enable user impersonation; this is the name Spark uses when running jobs.
Type or paste the URL of your Knox gateway.
Type the User Name and Password associated with the specified gateway.
Optionally, test the connection.
Set the Connection Mode to the coding language to use in the Spark Code tool.
Select the Server Configuration option that matches the HDFS protocol used to communicate with the cluster.
For either HDFS protocol option, type the Host IP Address or DNS name of the HDFS name node within your Spark cluster, and type the Port number; the default port for the selected protocol is populated automatically.
For a Knox gateway, type or paste the URL of your Knox gateway.
Optionally type the Username for the HDFS connection.
Optionally type the Password for the HDFS connection.
Select the Kerberos protocol to use.
Set the Poll Interval (ms), how often Alteryx checks the status of Spark code execution requests. The default is 1,000 ms (1 second).
Set the Wait Time (ms), how long Alteryx waits for an execution request to complete. Operations that take longer than the wait time result in a timeout error. The default is 60,000 ms (1 minute).
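The interaction between these two settings can be sketched as a simple polling loop. The Python below is a minimal illustration, not Alteryx's actual implementation; the helper and its parameter names are hypothetical:

```python
import time

# Minimal sketch of the Poll Interval / Wait Time behavior described above:
# check for completion every poll_interval_ms, and give up with a timeout
# once wait_time_ms has elapsed. (Hypothetical helper, not Alteryx code.)
def wait_for_completion(is_done, poll_interval_ms=1000, wait_time_ms=60000):
    deadline = time.monotonic() + wait_time_ms / 1000.0
    while time.monotonic() < deadline:
        if is_done():          # has the Spark execution request finished?
            return True
        time.sleep(poll_interval_ms / 1000.0)
    return False               # exceeded Wait Time: report a timeout error
```

With the defaults, the request status is checked once per second, and a request still unfinished after one minute is reported as timed out.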
The Spark Configuration Options customize the Spark context that is created and let advanced users override the default Spark settings.
By default, the Configuration Option is spark.jars.packages and the Value is com.databricks:spark-csv_2.10:1.5.0,com.databricks:spark-avro_2.10:2.0.1. Depending on your Spark version, you may need to override the default Value.
Spark version | Value
---|---
2.0 - 2.1 | com.databricks:spark-avro_2.11:3.2.0,com.databricks:spark-csv_2.11:1.5.0
2.2 | com.databricks:spark-avro_2.11:4.0.0,com.databricks:spark-csv_2.11:1.5.0
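For example, on a Spark 2.2 cluster the override row would contain the values below (coordinates taken from the table above; spark.jars.packages expects a comma-separated list of Maven coordinates):

```
Configuration Option: spark.jars.packages
Value: com.databricks:spark-avro_2.11:4.0.0,com.databricks:spark-csv_2.11:1.5.0
```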
Click the + icon to add another row to the configuration options table.
Click the save icon to save the current advanced settings as a JSON file. The file can then be loaded into the advanced settings of another connection.
Click the open icon to load a JSON file into the configuration options table.
Select OK to create your Spark Direct connection.
With a Spark Direct connection established, the Code Editor activates.
Use Insert Code to generate template functions in the code editor.
In Scala, Import Library creates an import statement.
import package
Read Data creates a readAlteryxData function to return the incoming data as a SparkSQL DataFrame.
val dataFrame = readAlteryxData(1)
Write Data creates a writeAlteryxData function to output a SparkSQL DataFrame.
writeAlteryxData(dataFrame, 1)
Log Message creates a logAlteryxMessage function to write a string to the log as a message.
logAlteryxMessage("Example message")
Log Warning creates a logAlteryxWarning function to write a string to the log as a warning.
logAlteryxWarning("Example warning")
Log Error creates a logAlteryxError function to write a string to the log as an error.
logAlteryxError("Example error")
In Python, Import Library creates an import statement.
from module import library
Read Data creates a readAlteryxData function to return the incoming data as a SparkSQL DataFrame.
dataFrame = readAlteryxData(1)
Write Data creates a writeAlteryxData function to output a SparkSQL DataFrame.
writeAlteryxData(dataFrame, 1)
Log Message creates a logAlteryxMessage function to write a string to the log as a message.
logAlteryxMessage("Example message")
Log Warning creates a logAlteryxWarning function to write a string to the log as a warning.
logAlteryxWarning("Example warning")
Log Error creates a logAlteryxError function to write a string to the log as an error.
logAlteryxError("Example error")
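Put together, a Python Spark Code tool body typically chains these templates: read, transform, write, log. The sketch below defines throwaway stand-ins for the Alteryx-provided helpers so it can run outside Designer; inside the tool these functions are supplied for you and operate on SparkSQL DataFrames, not lists:

```python
# Hypothetical stand-ins: inside the Spark Code tool these helpers already
# exist, and readAlteryxData returns a SparkSQL DataFrame rather than a list.
def readAlteryxData(connection_number):
    return [{"id": 1, "value": 10}, {"id": 2, "value": 20}]

def writeAlteryxData(data, connection_number):
    return data

def logAlteryxMessage(message):
    print(message)

# Typical tool body: read input connection 1, transform, write to output 1.
dataFrame = readAlteryxData(1)
result = [row for row in dataFrame if row["value"] > 10]
writeAlteryxData(result, 1)
logAlteryxMessage("Wrote %d rows" % len(result))
```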
In R, Import Library creates an import statement.
library(jsonlite)
Read Data creates a readAlteryxData function to return the incoming data as a SparkSQL DataFrame.
dataFrame <- readAlteryxData(1)
Write Data creates a writeAlteryxData function to output a SparkSQL DataFrame.
writeAlteryxData(dataFrame, 1)
Log Message creates a logAlteryxMessage function to write a string to the log as a message.
logAlteryxMessage("Example message")
Log Warning creates a logAlteryxWarning function to write a string to the log as a warning.
logAlteryxWarning("Example warning")
Log Error creates a logAlteryxError function to write a string to the log as an error.
logAlteryxError("Example error")
Use Import Code to pull in code created externally.
Click the gear icon to change cosmetic aspects of the code editor.
Select the output channel metainfo you want to manage.
Manually change the Spark Data Type of existing data.
Click the plus icon to add a data row.
©2018 Alteryx, Inc., all rights reserved. Allocate®, Alteryx®, Guzzler®, and Solocast® are registered trademarks of Alteryx, Inc.