Python Tool
The Python tool is a code editor for users of Python. Proficiency in Python is recommended before using this tool.
from ayx import Alteryx
Alteryx.help()
Python support
Designer accepts custom Python code. Alteryx does not provide support for custom Python code.
Alteryx public gallery compatibility
Planning to publish your workflow to gallery.alteryx.com? You must first apply for an exemption. This restriction does not apply to private instances of Alteryx Server and Alteryx Gallery.
Getting started
The Python tool configuration window interface resembles a Jupyter Notebook. If you are unfamiliar with Jupyter Notebooks, go to Help > User Interface Tour or Help > Notebook Help. For code assistance, see the additional references that are available under the tool's Help option.
Install additional data science packages if needed
The Python tool includes the following commonly used data science packages:
- ayx: Alteryx API
- geopandas: Extends the data types used by pandas to allow spatial operations on geometric types.
- jupyter: Jupyter metapackage
- matplotlib: Python plotting package
- numpy: NumPy, array processing for numbers, strings, records, and objects
- pandas: Powerful data structures for data analysis, time series, and statistics
- requests: Python HTTP for Humans
- scikit-learn: A set of Python modules for machine learning and data mining
- scipy: SciPy, Scientific Library for Python
- six: Python 2 and 3 compatibility utilities
- SQLAlchemy: Database Abstraction Library
- statsmodels: Statistical computations and models for Python
Additional package installation
Depending on your Designer installation type, you can install additional packages using Alteryx.installPackages. The example below installs keras.
from ayx import Package
Package.installPackages("keras")
- If you are running a non-admin installation of Alteryx, you can install additional Python packages without any special permissions.
- If you are running an admin installation of Alteryx, you must run Alteryx as administrator to install additional Python packages. If you cannot run Alteryx as administrator, you cannot install additional Python packages.
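The following is a minimal sketch, not part of the Alteryx API, of one way to avoid reinstalling packages that are already available. The helper names missing_packages and install_missing are hypothetical. The sketch assumes that the import name matches the install name, which does not hold for every package (for example, scikit-learn is imported as sklearn). The installer is passed in as an argument so the logic can be tried outside Designer.

```python
import importlib.util


def missing_packages(import_names):
    """Return the names from import_names that cannot be imported here."""
    return [name for name in import_names
            if importlib.util.find_spec(name) is None]


def install_missing(import_names, installer):
    """Call installer(name) once for each package that is not yet importable."""
    to_install = missing_packages(import_names)
    for name in to_install:
        installer(name)
    return to_install


# Inside the Python tool, the installer would be the documented call:
#     from ayx import Package
#     install_missing(["keras"], Package.installPackages)
```

The function returns the list of names it attempted to install, which can be printed in Interactive mode to confirm what changed.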
Connect inputs
The Python tool accepts multiple inputs. Once inputs are connected, you must run the workflow to cache the incoming data streams.
To access an incoming data connection:
- Import the Alteryx library: from ayx import Alteryx
- Access the connection and assign the result to a variable that references the data:
- Use the connection name: Alteryx.read("<connection name>")
- Read in all connection names and index into the returned 0-indexed list: Alteryx.read(Alteryx.getIncomingConnectionNames()[<index number>])
Run your workflow before beginning to work with the Python tool. Running your workflow caches your data and makes it accessible to the Python tool. Your data is then treated as a pandas data frame. More information about pandas data frames can be found at pandas.pydata.org.
from ayx import Alteryx
data1 = Alteryx.read("#1")
from ayx import Alteryx
data2 = Alteryx.read(Alteryx.getIncomingConnectionNames()[1])
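When several inputs are connected, it can be convenient to read them all at once. The sketch below uses only the two calls shown above, Alteryx.read and Alteryx.getIncomingConnectionNames. The helper name read_all_inputs is hypothetical, and the Alteryx module is passed in as an argument so the helper can be exercised outside Designer.

```python
def read_all_inputs(alteryx):
    """Read every incoming connection into a dict keyed by connection name.

    `alteryx` is the module imported with `from ayx import Alteryx`; it is
    passed in as an argument so the helper can be tried outside Designer.
    """
    names = alteryx.getIncomingConnectionNames()  # 0-indexed list of names
    return {name: alteryx.read(name) for name in names}


# Inside the Python tool (after running the workflow once to cache the data):
#     from ayx import Alteryx
#     inputs = read_all_inputs(Alteryx)
#     data1 = inputs["#1"]
```

Keying the result by connection name avoids relying on the position of a connection in the list.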
Configure the tool
Run your workflow before beginning to work with the Python tool.
Start development using Interactive mode. That way, all errors, warnings, and print statements will display in the Jupyter Notebook. Use Production mode to improve speed when you have completed development and want to run your code through a standard Python interpreter.
Need to print multibyte character sets (MBCSs)? See Troubleshooting for the steps to print them in Production mode.
Use interactive mode when developing. This allows you to interact with the incoming data through a Jupyter Notebook without having to re-run the workflow to see the results of your code.
To set interactive mode:
- Click Interactive to set Interactive mode.
- Run the workflow. Alteryx caches a copy of the incoming data and makes it available to the Python tool.
After making changes upstream, re-run the workflow to refresh the cached data. This ensures the cached data represents the actual incoming data.
- The Jupyter shell executes the code in the Jupyter Notebook.
- If your code calls Alteryx.write(), the Jupyter shell sends the results through the output anchors.
- The Jupyter Notebook displays any errors, warnings, and print statements. This is the same as selecting Run All.
In Production mode, Alteryx consolidates all Python cells from the Jupyter Notebook into a single, read-only script. Alteryx uses this read-only script to pass your code to the Python interpreter.
To set production mode:
- Click Production to set Production mode.
- Run the workflow. Alteryx bypasses the Jupyter shell and runs the read-only script through a standard Python interpreter. No results, errors, warnings, or print statements are printed to the Jupyter Notebook.
To edit the Production-mode script:
- Click Interactive mode and then edit the cells in the Jupyter Notebook. Once your edits are complete, click Production mode.
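To picture what consolidating notebook cells into a single script means, the rough sketch below joins the code cells of a notebook stored in the standard nbformat 4 JSON layout. This is an illustration only and is not how Alteryx implements Production mode. The function name cells_to_script and the sample notebook are hypothetical.

```python
import json


def cells_to_script(notebook_json):
    """Join the code cells of an nbformat-4 notebook into one script string."""
    notebook = json.loads(notebook_json)
    chunks = []
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "code":
            continue  # skip markdown and raw cells
        source = cell.get("source", "")
        # nbformat stores source either as one string or as a list of lines
        if isinstance(source, list):
            source = "".join(source)
        chunks.append(source)
    return "\n\n".join(chunks)


# A made-up three-cell notebook: two code cells around one markdown cell.
example = json.dumps({
    "nbformat": 4,
    "cells": [
        {"cell_type": "code", "source": ["x = 1\n", "y = 2"]},
        {"cell_type": "markdown", "source": ["# notes"]},
        {"cell_type": "code", "source": "print(x + y)"},
    ],
})
print(cells_to_script(example))
```

Because the cells run top to bottom as one script, code that only works when cells are run out of order in Interactive mode may behave differently in Production mode.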
Set data storage format
The recommended and default back-end storage format is YXDB. Alternatively, you can select SQLite.
To use YXDB storage format:
- Click the Alteryx menu within the tool's configuration window.
- Ensure Sqlite is not selected.
To use SQLite storage format:
- Click the Alteryx menu within the tool's configuration window.
- Select Sqlite.
Feature | SQLite | YXDB |
Blob | Not supported | Supported |
Spatial objects | Not supported | Supported. Spatial objects can be passed between the Python tool and other tools. It is helpful to use the metadata tags when creating spatial object outputs from the Python tool. Spatial object columns are loaded into the pandas DataFrame as strings containing GeoJSON. To send a spatial column (represented as GeoJSON strings) back out of the Python tool, include the optional third parameter of `Alteryx.write()`, specifying the type as SpatialObj (for example, `Alteryx.write(df, 1, {"my_spatial_field": {"type": "SpatialObj"}})`). In addition, several packages such as `geopandas` and its dependencies are included and can be imported into the Python tool to make further use of spatial data. |
Column limitation | Limit is 2000 | No limitation |
Null values note | Numeric/byte columns containing null values are converted to a data type of float64 (double-precision float). | YXDB supports null values in float64 using numpy.NaN and in integer types using pandas nullable integers. |
If you are not changing the arrangement of the rows or using the geospatial Python packages, Alteryx recommends that you split the spatial data off the dataset before the Python tool and rejoin it after the Python tool. The conversion between the Alteryx binary format and geospatial text is slow.
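As a small illustration of the spatial handling described in the table, the sketch below inspects a GeoJSON string with the standard json module and builds the metadata dictionary that Alteryx.write takes as its optional third parameter. The helper names spatial_metadata and geometry_type and the sample cell value are hypothetical. The sketch assumes that null cells arrive as None.

```python
import json


def spatial_metadata(spatial_columns):
    """Build the optional third argument of Alteryx.write for spatial columns."""
    return {column: {"type": "SpatialObj"} for column in spatial_columns}


def geometry_type(geojson_text):
    """Return the GeoJSON geometry type of one cell, or None for a null cell."""
    if geojson_text is None:
        return None
    return json.loads(geojson_text)["type"]


# A made-up cell value, shaped like the GeoJSON strings the tool receives.
cell = '{"type": "Point", "coordinates": [-105.0, 39.7]}'
print(geometry_type(cell))
print(spatial_metadata(["my_spatial_field"]))

# Inside the Python tool, the metadata would be passed to the documented call:
#     Alteryx.write(df, 1, spatial_metadata(["my_spatial_field"]))
```

Building the dictionary from a list of column names keeps the write call short when a data frame carries more than one spatial column.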
Import a file or directory
Depending on how much control you want over relative paths, you can use the Alteryx import function from the Alteryx menu or use the import command. You can import an existing Python script or Jupyter Notebook using the Alteryx import function. If you want to manage relative paths, use the import command in the cell. Import examples include using the import command to import a directory, or using the Alteryx import function to import a single script.
To import a Python script or Jupyter Notebook:
- Click the Alteryx menu and then select Import Script.
- Click Choose File and then navigate to a .py or .ipynb file.
- Click Import.
Alteryx imports the file.
Use the Kernel menu
- Stop processing: Click the Kernel menu and then select Interrupt to stop processing.
- Restart processing: Click the Kernel menu and then select a Restart option to restart the processing of the interactive environment.
- Clear intermediate results: Click the Kernel menu and then select Reconnect to clear the workbook of intermediate results.
- Change kernel does not provide functionality.
- It is recommended that you do not select Shutdown.
Output data from the tool
Use Alteryx.write to output data from the tool.
- To send data to other tools on the canvas, use Alteryx.write(<pandas data frame>, <output anchor number>).
Alteryx.write(df,1)
- Alteryx.write only accepts pandas data frames. If you have data in another format, use the pandas library to convert it to a pandas data frame. The pandas library is pre-installed with Designer and can be accessed in the Jupyter Notebook using import pandas.
- You can send up to five data frames to the output anchors.
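The sketch below shows one way to send several data frames to successive output anchors while respecting the five-output limit. The helper name write_outputs is hypothetical, and the sketch assumes that anchors are numbered from 1 through 5. The write function is passed in as an argument so the logic can be tried outside Designer.

```python
MAX_OUTPUTS = 5  # the tool accepts up to five output data frames


def write_outputs(frames, write):
    """Send each frame to the next output anchor, starting at anchor 1."""
    if len(frames) > MAX_OUTPUTS:
        raise ValueError(
            "got %d data frames, but only %d output anchors are available"
            % (len(frames), MAX_OUTPUTS)
        )
    for anchor, frame in enumerate(frames, start=1):
        write(frame, anchor)


# Inside the Python tool:
#     from ayx import Alteryx
#     write_outputs([df_a, df_b], Alteryx.write)
```

Checking the count before writing anything means that no anchor receives data when the list is too long.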
Best practices
The following best practices will help you work with the Python tool successfully.
Use Alteryx.getWorkflowConstant when referring to a workflow constant such as Engine.WorkflowDirectory. Otherwise, the result of the command permanently replaces the command in your Jupyter Notebook when you run your code. Do not wrap the constant name in % characters. For example, to get Engine.WorkflowDirectory, use the following:
from ayx import Alteryx
Alteryx.getWorkflowConstant("Engine.WorkflowDirectory")
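A common follow-up is to build a file path relative to the workflow directory. The sketch below is one way to do that with os.path.join. The helper name workflow_file and the file name lookup.csv are hypothetical, and the lookup function is passed in as an argument so the helper can be tried outside Designer.

```python
import os


def workflow_file(get_constant, filename):
    """Join a file name onto the workflow directory constant."""
    # Pass the constant name without % wrappers, as recommended above.
    directory = get_constant("Engine.WorkflowDirectory")
    return os.path.join(directory, filename)


# Inside the Python tool:
#     from ayx import Alteryx
#     path = workflow_file(Alteryx.getWorkflowConstant, "lookup.csv")
```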
Troubleshooting
If you notice code is disappearing from Python tools in previously saved workflows when they are opened in Alteryx Designer, this may be caused by Auto Configure being disabled in User Settings. To resolve this issue:
- Enable Auto Configure: Go to Options > User Settings > Edit User Settings > Advanced and then deselect Disable Auto Configure. See: User Settings.
Or,
- After opening the workflow, run the workflow without clicking any of the Python tools. This populates the Python tools with the pre-existing Python code.
To print multibyte character sets (MBCSs) in Production mode:
- Click Interactive mode.
- Type the following into a cell near the top of the Jupyter Notebook:
import sys
import codecs
if sys.stdout.encoding != 'UTF-8':
    sys.stdout = codecs.getwriter('utf-8')(sys.stdout.buffer, 'strict')
if sys.stderr.encoding != 'UTF-8':
    sys.stderr = codecs.getwriter('utf-8')(sys.stderr.buffer, 'strict')
- Click Production mode.
- Run the workflow.
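To see what the codecs.getwriter wrapper in the steps above does, the sketch below applies the same wrapper to an in-memory byte buffer instead of sys.stdout.buffer. The buffer and the sample text are illustrative stand-ins. Nothing here is specific to Alteryx.

```python
import codecs
import io

# BytesIO stands in for sys.stdout.buffer so the result can be inspected.
raw = io.BytesIO()
writer = codecs.getwriter("utf-8")(raw, "strict")

text = "日本語\n"  # three multibyte characters plus a newline
writer.write(text)

encoded = raw.getvalue()
print(len(text), "characters became", len(encoded), "bytes")
print(encoded)  # the raw UTF-8 bytes that reached the underlying buffer

# Decoding the bytes recovers the original text unchanged.
assert encoded.decode("utf-8") == text
```

The wrapper accepts text and writes UTF-8 bytes to the underlying buffer, which is why replacing sys.stdout with it lets print statements emit multibyte characters regardless of the default console encoding.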
Use the following code sample to help you get started using spatial data:
from ayx import Alteryx
import geopandas as gpd
from shapely.geometry import shape
from json import loads
# read in a dataframe containing a geospatial column called "spatial"
df = Alteryx.read("#1")
# convert spatial column from geojson strings into shapely objects
df["spatial"] = df["spatial"].apply(lambda x: None if x is None else shape(loads(x)))
# create a geopandas GeoDataFrame specifying the spatial column as the geometry
gdf = gpd.GeoDataFrame(df, geometry="spatial")