Apache Spark Code Tool
The Apache Spark Code tool is a code editor that creates an Apache Spark context and executes Apache Spark commands directly from Designer. This tool supports the Scala, Python, and R programming languages.
For additional information, see Apache Spark Direct, Apache Spark on Databricks, and Apache Spark on Microsoft Azure HDInsight.
Connect to your Apache Spark cluster directly.
- Drag a Connect In-DB Tool or Data Stream In Tool onto the canvas.
- Click the Connection Name drop-down arrow and select Manage connection.
Alternatively, connect directly with the Apache Spark Code tool.
- Drag the Apache Spark Code tool onto the canvas.
- Under Data Connection, click the Connection Name drop-down arrow and select Manage connection.
Both methods bring up the Manage In-DB Connections window.
In Manage In-DB Connections, select a Data Source. See Supported Data Sources and File Formats.
With an Apache Spark Direct connection established, the Code Editor activates.
Use Insert Code to generate template functions in the code editor. The generated template depends on the code language in use: Scala, Python, or R.
Scala

Import Library creates an import statement.
import package
Read Data creates a readAlteryxData function to return the incoming data as an Apache SparkSQL DataFrame.
val dataFrame = readAlteryxData(1)
Write Data creates a writeAlteryxData function to output an Apache SparkSQL DataFrame.
writeAlteryxData(dataFrame, 1)
Log Message creates a logAlteryxMessage function to write a string to the log as a message.
logAlteryxMessage("Example message")
Log Warning creates a logAlteryxWarning function to write a string to the log as a warning.
logAlteryxWarning("Example warning")
Log Error creates a logAlteryxError function to write a string to the log as an error.
logAlteryxError("Example error")
Python

Import Library creates an import statement.
from module import library
Read Data creates a readAlteryxData function to return the incoming data as an Apache SparkSQL DataFrame.
dataFrame = readAlteryxData(1)
Write Data creates a writeAlteryxData function to output an Apache SparkSQL DataFrame.
writeAlteryxData(dataFrame, 1)
Log Message creates a logAlteryxMessage function to write a string to the log as a message.
logAlteryxMessage("Example message")
Log Warning creates a logAlteryxWarning function to write a string to the log as a warning.
logAlteryxWarning("Example warning")
Log Error creates a logAlteryxError function to write a string to the log as an error.
logAlteryxError("Example error")
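Taken together, the Python templates above form the typical shape of a tool script: read the incoming data, transform it, log progress, and write the result. The sketch below illustrates that pattern. Note that readAlteryxData, writeAlteryxData, and logAlteryxMessage are provided by the tool at run time; the stub definitions here stand in for them so the pattern can be run outside Designer, and the field names and filtering logic are illustrative only.

```python
# Stubs for the functions Designer injects at run time.
def readAlteryxData(n):
    # In Designer this returns an Apache SparkSQL DataFrame for input n;
    # here a list of dicts stands in so the script runs standalone.
    return [{"id": 1, "score": 72}, {"id": 2, "score": 45}]

def writeAlteryxData(df, n):
    # In Designer this sends df to output anchor n.
    return df

def logAlteryxMessage(msg):
    print(msg)

# The script pattern itself: read, transform, log, write.
dataFrame = readAlteryxData(1)
passing = [row for row in dataFrame if row["score"] >= 50]
logAlteryxMessage(f"{len(passing)} of {len(dataFrame)} rows kept")
writeAlteryxData(passing, 1)
```

In an actual tool script only the last four lines would appear, with the transform step written against the SparkSQL DataFrame API instead of a list comprehension.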
R

Import Library creates an import statement.
library(jsonlite)
Read Data creates a readAlteryxData function to return the incoming data as an Apache SparkSQL DataFrame.
dataFrame <- readAlteryxData(1)
Write Data creates a writeAlteryxData function to output an Apache SparkSQL DataFrame.
writeAlteryxData(dataFrame, 1)
Log Message creates a logAlteryxMessage function to write a string to the log as a message.
logAlteryxMessage("Example message")
Log Warning creates a logAlteryxWarning function to write a string to the log as a warning.
logAlteryxWarning("Example warning")
Log Error creates a logAlteryxError function to write a string to the log as an error.
logAlteryxError("Example error")
Use Import Code to pull in code created externally.
- From File opens a File Explorer to browse to your file.
- From Jupyter Notebook opens a File Explorer to browse to your file.
- From URL provides a field to type or paste a file location.
Click the gear icon to change cosmetic aspects of the code editor.
- Use the Text Size buttons to increase or decrease the size of the text in the editor.
- Use Color Theme to toggle between a dark and light color scheme.
- Select Wrap Long Lines to keep long lines visible within the code editor window instead of requiring a horizontal scroll.
- Select Show Line Numbers to see line numbers for the editor.
Select the output channel metainfo you want to manage.
Manually change the Apache Spark Data Type of existing fields.
Click the plus icon to add a data row.
- Type the Field Name.
- Select the Apache Spark Data Type.
- Type the Size in bits.