Apache Spark Code Tool
The Apache Spark Code tool is a code editor that creates an Apache Spark context and executes Apache Spark commands directly from Alteryx Designer. This tool supports the Scala, Python, and R programming languages.
For additional information, go to Apache Spark Direct, Apache Spark on Databricks, and Apache Spark on Microsoft Azure HDInsight.
Connect to Apache Spark
Option 1
Connect to your Apache Spark cluster directly.
Drag a Connect In-DB tool or Data Stream In tool onto the canvas.
Select the Connection Name dropdown arrow and select Manage Connection.
Option 2
Alternatively, connect directly with the Apache Spark Code tool.
Drag the Apache Spark Code tool onto the canvas.
Under Data Connection, select the Connection Name dropdown arrow and select Manage Connection.
Both methods bring up the Manage In-DB Connections window. In Manage In-DB Connections, select a Data Source.
Code Editor
With an Apache Spark Direct connection established, the Code Editor activates. Use Insert Code to generate template functions in the code editor.
Scala
Import Library creates an import statement.
import package
Read Data creates a readAlteryxData function to return the incoming data as an Apache SparkSQL DataFrame.
val dataFrame = readAlteryxData(1)
Write Data creates a writeAlteryxData function to output an Apache SparkSQL DataFrame.
writeAlteryxData(dataFrame, 1)
Log Message creates a logAlteryxMessage function to write a string to the log as a message.
logAlteryxMessage("Example message")
Log Warning creates a logAlteryxWarning function to write a string to the log as a warning.
logAlteryxWarning("Example warning")
Log Error creates a logAlteryxError function to write a string to the log as an error.
logAlteryxError("Example error")
Python
Import Library creates an import statement.
from module import library
Read Data creates a readAlteryxData function to return the incoming data as an Apache SparkSQL DataFrame.
dataFrame = readAlteryxData(1)
Write Data creates a writeAlteryxData function to output an Apache SparkSQL DataFrame.
writeAlteryxData(dataFrame, 1)
Log Message creates a logAlteryxMessage function to write a string to the log as a message.
logAlteryxMessage("Example message")
Log Warning creates a logAlteryxWarning function to write a string to the log as a warning.
logAlteryxWarning("Example warning")
Log Error creates a logAlteryxError function to write a string to the log as an error.
logAlteryxError("Example error")
R
Import Library creates an import statement.
library(jsonlite)
Read Data creates a readAlteryxData function to return the incoming data as an Apache SparkSQL DataFrame.
dataFrame <- readAlteryxData(1)
Write Data creates a writeAlteryxData function to output an Apache SparkSQL DataFrame.
writeAlteryxData(dataFrame, 1)
Log Message creates a logAlteryxMessage function to write a string to the log as a message.
logAlteryxMessage("Example message")
Log Warning creates a logAlteryxWarning function to write a string to the log as a warning.
logAlteryxWarning("Example warning")
Log Error creates a logAlteryxError function to write a string to the log as an error.
logAlteryxError("Example error")
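The template functions are meant to be combined into a single script that the tool runs against the Spark context. The sketch below uses the Python variant; readAlteryxData, writeAlteryxData, logAlteryxMessage, and logAlteryxWarning are only available inside the tool's Spark context, and the sales column and filter condition are hypothetical examples.
from pyspark.sql import functions as F

# Incoming data from input anchor 1 as an Apache SparkSQL DataFrame.
dataFrame = readAlteryxData(1)
logAlteryxMessage("Read {} rows".format(dataFrame.count()))

# Hypothetical transformation: keep rows where the sales column is positive.
filtered = dataFrame.filter(F.col("sales") > 0)
if filtered.count() == 0:
    logAlteryxWarning("No rows matched the filter")

# Send the result to output anchor 1.
writeAlteryxData(filtered, 1)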
Import Code
Use Import Code to pull in code created externally.
From File opens a File Explorer to browse to your file.
From Jupyter Notebook opens a File Explorer to browse to your file.
From URL provides a field to type or paste a file location.
Select the gear icon to change the cosmetic aspects of the code editor.
Use the Text Size buttons to increase or decrease the size of the text in the editor.
Use Color Theme to toggle between a dark and light color scheme.
Select Wrap Long Lines to keep long lines visible within the code editor window instead of requiring a horizontal scroll.
Select Show Line Numbers to see line numbers for the editor.
Output Metainfo
Select the output channel metainfo you want to manage. You can manually change the Apache Spark Data Type of existing fields.
Select the plus icon to add a data row.
Enter the Field Name.
Select the Apache Spark Data Type.
Enter the Size in bits.
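Conceptually, the rows you define here describe the schema of the DataFrame sent to that output channel. A minimal sketch of an equivalent schema built directly in PySpark is shown below; the field names and types are hypothetical, and the Size entry has no direct counterpart in this sketch.
from pyspark.sql.types import StructType, StructField, StringType, DoubleType

# Hypothetical output schema: one string field and one double field,
# mirroring a Field Name plus Apache Spark Data Type entry per row.
schema = StructType([
    StructField("customer_id", StringType()),
    StructField("sales", DoubleType()),
])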