Jupyter Flow

Laboratory Tool

This is a Laboratory tool and isn't for production use. It might have documented known issues, might not be feature complete, and is subject to change.

Use Jupyter Flow to manage Jupyter notebooks and Python environments in your workflows. The tool allows you to access, create, or modify specific sets of packages. It also allows you to create a data cache that you can reference for debugging.

Important

This tool is not automatically installed with Designer. To use this tool, download it from the Alteryx Community.

Tool Components

The Jupyter Flow tool has 6 anchors of 2 kinds: 1 multi-input anchor and 5 output anchors.

  • Multi-input anchor: Connect multiple upstream data sources to the multi-input anchor to use the data in your notebook.

  • Output anchors: Use any of the 5 output anchors to pass output data downstream.

Configure the Tool

Before You Use the Tool

Enable long paths on your Windows machine. To learn how to enable long paths, visit the Microsoft Documentation.
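
To check whether long paths are already enabled, you can read the Windows registry value that controls the setting. The following is a minimal sketch, not part of the tool: the function name long_paths_enabled is hypothetical, and it assumes the setting is stored in the LongPathsEnabled value under HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\FileSystem, as described in the Microsoft Documentation.

```python
import sys


def long_paths_enabled():
    """Return True or False on Windows, or None on other platforms."""
    if sys.platform != "win32":
        return None  # the setting only exists on Windows

    import winreg  # Windows-only standard-library module

    key_path = r"SYSTEM\CurrentControlSet\Control\FileSystem"
    try:
        with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, key_path) as key:
            value, _ = winreg.QueryValueEx(key, "LongPathsEnabled")
    except OSError:
        # The key or value is missing: treat long paths as not enabled.
        return False
    return value == 1


print(long_paths_enabled())
```

The check only reads the registry. To change the setting, follow the Microsoft Documentation.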

To use this tool for the 1st time...

  1. Drag the tool onto the canvas.

  2. Enter the path to the Notebook you want to use, or Browse to find it.

  3. Enter the path to the Packages you want to use, or Browse to find the folder that contains them.

  4. Run the workflow.

Important

Jupyter Flow is only compatible with zip apps created by the Jupyter Flow tool.

Required Syntax

To pass data into and out of Jupyter Flow, make sure your notebooks include the correct syntax. The correct syntax requires that you use certain tags to tell the tool what data you want to input or output.

Important

Jupyter Flow reads in upstream data as pandas DataFrames, and can only output data as pandas DataFrames. If you convert your data into a different data structure for use in the notebook, you have to convert it back into a pandas DataFrame before you can pass it downstream.
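
As an illustration of converting back, the sketch below leaves pandas for a NumPy calculation and then wraps the result in a DataFrame again. The column names and values are made up for the example, and the code runs on its own, outside of the tool.

```python
import numpy as np
import pandas as pd

# Example data standing in for whatever your notebook works with.
df = pd.DataFrame({"a": [1, 2, 3], "b": [4.0, 5.0, 6.0]})

# Leave pandas for a NumPy calculation...
scaled = df.to_numpy() * 2
assert isinstance(scaled, np.ndarray)

# ...then wrap the result in a DataFrame again before you output it.
result = pd.DataFrame(scaled, columns=df.columns)

# Selecting a single column returns a Series; to_frame() turns it
# back into a one-column DataFrame.
col_frame = df["a"].to_frame()
```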

Input Data

To use data from an upstream data source when only 1 upstream data source is connected, use the #ayx_input tag, followed by the assignment of the relevant variable on a new line.

#ayx_input
input = pd.DataFrame(data)

To use data from 2 or more upstream data sources, use the #ayx_input= tag with the name of the data connection included in the tag, followed by the assignment of the relevant variable on a new line. Repeat this for each connection. If you're not sure what the name of a data connection is, select the connection on the canvas to view its name.

#ayx_input=#1
input_1 = pd.DataFrame(data_1)

#ayx_input=#2
input_2 = pd.DataFrame(data_2)
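
Once both inputs are assigned, you can work with them like any other DataFrames. The sketch below joins 2 inputs on a shared column. The data_1 and data_2 values, the column names, and the joined variable are hypothetical stand-ins so that the example runs on its own. In a workflow, the data comes from the upstream connections.

```python
import pandas as pd

# Hypothetical stand-ins for the upstream data (not supplied by the tool here).
data_1 = {"id": [1, 2, 3], "name": ["a", "b", "c"]}
data_2 = {"id": [2, 3, 4], "score": [10, 20, 30]}

#ayx_input=#1
input_1 = pd.DataFrame(data_1)

#ayx_input=#2
input_2 = pd.DataFrame(data_2)

# Keep only the rows whose id appears in both inputs.
joined = input_1.merge(input_2, on="id", how="inner")
```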

Output Data

To pass 1 output downstream through the 1st output anchor, use the #ayx_output tag, followed by the assignment of the relevant variable on a new line.

#ayx_output
output = df[['column_1']]  # double brackets keep the result a DataFrame

To pass 2 or more outputs downstream using any of the 5 output anchors, use the #ayx_output= tag with the number of the output anchor you want to use included in the tag, followed by the assignment of the relevant variable on a new line. Repeat this for each output.

#ayx_output=1
output_1 = df[['column_1']]  # double brackets keep the result a DataFrame

#ayx_output=2
output_2 = df[['column_2']]
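
The sketch below sends a filtered set of rows to the 1st output anchor and a single column to the 2nd. The df contents are made up for the example. Both assigned values are DataFrames, in line with the requirement that the tool outputs pandas DataFrames.

```python
import pandas as pd

# Example data standing in for the result of your notebook.
df = pd.DataFrame({"column_1": [1, 2, 3], "column_2": ["x", "y", "z"]})

#ayx_output=1
output_1 = df[df["column_1"] > 1]  # row filter: still a DataFrame

#ayx_output=2
output_2 = df[["column_2"]]  # list of column names: still a DataFrame
```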

Manage Packages

This tool allows you to manage what packages you use in the Jupyter notebooks you run in your workflows.

To specify what packages you want to use in your workflow, you have to include either...

  • The location of the packages you want to use (usually this is a folder called "site-packages") or...

  • A zip app (PYZ file) made by a Jupyter Flow tool that contains the packages you want to use.

If you don't have a zip app, the tool creates one for you. Make sure to toggle on the switch next to Packages. The tool then watches for differences between the packages in the site-packages folder and those in the zip app. If the site-packages folder exists and a zip app doesn't, the tool creates a zip app with matching packages. If the site-packages folder contains different packages from the zip app, the tool creates a new zip app that matches the folder.

Warning

The size of your site-packages folder greatly affects how long the tool needs to build the zip app. If the folder contains a large number of packages, the build can take a while. Also, any change to the site-packages folder, no matter how small, requires the tool to rebuild the entire zip app.
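
To see ahead of time whether a folder has changed since the last build, you can compare a fingerprint of its contents before and after you edit it. The sketch below is an illustration only and doesn't describe how the tool itself detects changes. The folder_fingerprint and needs_rebuild functions are hypothetical helpers, and a temporary folder stands in for site-packages.

```python
import hashlib
import os
import tempfile


def folder_fingerprint(folder):
    """Hash the relative path and size of every file under folder."""
    digest = hashlib.sha256()
    for root, dirs, files in os.walk(folder):
        dirs.sort()  # visit subfolders in a stable order
        for name in sorted(files):
            path = os.path.join(root, name)
            rel = os.path.relpath(path, folder)
            digest.update(rel.encode("utf-8"))
            digest.update(str(os.path.getsize(path)).encode("utf-8"))
    return digest.hexdigest()


def needs_rebuild(folder, last_fingerprint):
    """Return True if the folder no longer matches the saved fingerprint."""
    return folder_fingerprint(folder) != last_fingerprint


# A temporary folder stands in for site-packages.
with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, "pkg_a.py"), "w") as f:
        f.write("x = 1\n")
    before = folder_fingerprint(tmp)

    # Even one small added file changes the fingerprint.
    with open(os.path.join(tmp, "pkg_b.py"), "w") as f:
        f.write("y = 2\n")
    changed = needs_rebuild(tmp, before)
    unchanged = needs_rebuild(tmp, folder_fingerprint(tmp))
```

Because the fingerprint uses only paths and file sizes, an edit that leaves a file's size unchanged goes unnoticed. Hash the file contents instead if you need a stricter check.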

Advanced

This tool has several advanced options that allow you to have even more control over the Python environments you use in your workflows.

Data Cache Location

This option allows you to choose where the tool stores the data cache for a notebook. If you select Auto, the tool creates the data cache in the default location. To create the data cache in a different location, select Custom, and enter the path to that location.

Back Up Data Cache

To preserve the data cache from the most recent run of your notebook, select this option. This option allows you to reference the data cache so you can debug the notebook.

Enter Custom Path to Zip App

If you have a zip app (PYZ file) that you want to use in your workflow, select this option. If you don't want the tool to make changes to that zip app, make sure to toggle off the switch next to Packages.

Share Workflows That Contain Zip Apps

Before You Share

Disable the toggle next to Packages. If the toggle is on and a user you've shared the workflow with runs it, the tool doesn't use the included zip app. Instead, the tool watches for the site-packages folder and attempts to create a new zip app. This can result in errors because the notebook doesn't have access to the correct packages.

To share your workflows that use Jupyter Flow zip apps, see how to export workflows with assets. Zip apps created by the tool are included as assets.

To share your workflows that use Jupyter Flow zip apps through Server, see how to manage workflow assets in Server.