Skip to main content

In-Database Overview

In-database processing enables blending and analysis against large sets of data without moving the data out of a database, which can provide significant performance improvements over traditional analysis methods that require data to be moved to a separate environment for processing.

Performing analysis in the database can save processing time. Using Designer as the interface, in-database workflows integrate easily with standard workflows for additional data blending and analysis.

Limitations

Spatial objects are currently not supported with In-Database tools.

If you have the same workflow open multiple times, change one of them, and then run them all, you might experience unexpected behavior. Close all versions of the same workflow before changing it.

Supported Databases

Driver Support

  • In-database processing requires 64-bit Alteryx with 64-bit database drivers.

  • To run workflows on Alteryx Server, the ODBC driver must be configured as a System DSN. For in-database processing, the Connection Type must be "System" in addition to the ODBC connection being configured as a System DSN.

Go to Supported Data Sources and File Formats for the full list of data platforms supported by Alteryx.

In-Database Tools

Review the below lists of In-Database tools.

Boosted Model Tool Icon
Decision Tree Tool Icon
Forest Model Tool Icon
Linear Regression Tool Icon
Logistic Regression Icon
Score Tool

Change Tool Versions

When a predictive tool with in-database support is placed on the canvas with another In-DB tool, the predictive tool automatically changes to the In-DB version. To manually change the version of the tool...

  1. Right-click the tool.

  2. Point to Choose Tool Version.

  3. Select a different version of the tool.

Go to Predictive Analytics for more about predictive in-database support.

In-Database Tool Anchors and Connections

Since in-database workflow processing occurs within the database, the In-Database tools are not compatible with the standard Alteryx tools. Several visual indicators show connection compatibility.

Visualization of Standard tool connection.

Standard tools use a green arrow anchor to connect to another tool. The connection displays as a single line.

Visualization of In-Database tool connection.

In-Database tools use a blue square database anchor to connect to another In-Database tool. The connection between 2 In-Database tools displays as a double line. Due to the nature of in-database processing, connection progress is not displayed.

To connect standard tools to In-Database tools, use the Dynamic Input and Output tools or the Data Stream tools.

How In-Database Works

Data is streamed into and out of an in-database workflow using the Data Stream In and Data Stream Out tools, or by connecting directly to a database using the Connect In-DB tool. The Data Stream In and Data Stream Out tools use an In-DB anchor to connect to In-DB tools, and a standard workflow anchor to connect to standard workflow tools.

Chart showing many different connections

How It Works

  1. Define the connection to the database using the Connect In-DB tool, or use the Data Stream In tool to stream data from a standard workflow into a temporary table in the database.

  2. Connect other In-Database tools to the workflow to process the data.

  3. Use the Write Data In-DB tool to create or update a table in the database, or use the Data Stream Out tool to stream the In-DB results to a workflow for standard processing.

  4. Run the workflow to process the data in the data platform. No results are returned to the Alteryx Engine until the complete in-database workflow has been processed.

In-Database Performance Benefits

Workflow processing can take longer when a large amount of data is streamed in and out of a database. You can use in-database processing to speed up a workflow.

For example, in a standard workflow, a large database table is pulled into memory to be joined with a small spreadsheet. The majority of run time is spent streaming in the database records. In an In-Database workflow, the small spreadsheet is streamed into the database, reducing run time substantially.

Frequently Asked Questions

Write privileges are required to create a table in the database.

The temporary tables are deleted at the end of the run. If Alteryx crashes while the Data Stream In tool is being run, then the next time that an in-database workflow is run, all temp files created by Alteryx in the database in the previous three days are cleaned out.

To allow visibility of a temporary table across sessions, Alteryx must create a permanent table that is eventually deleted at the end of a workflow. It is necessary to have CREATE permissions to stream in data from a database and write data to a database. The exception is Microsoft SQL Server.

The underlying rules are maintained during the process the same as with the database connections via the standard Input Data and Output Data tools. If there is a database timeout or if there is a limit to the number of queries per day that you can run, it will affect your connection to the database.

A SELECT statement is triggered by the Connect In-DB tool and additional queries are created by downstream tools and nested within this query. The addition of one of the following three tools completes the query and sends it to the underlying database: Write In-DB, Data Stream Out, and Browse In-DB.

You can input your own SQL statement in the Query box for the Connect In-DB tool, which also gets embedded within the SELECT statement.

The SQL query for the underlying database is triggered at runtime for each Browse In-DB, Data Stream Out, or Write Data In-DB tool.

The Browse Data In-DB tool can be configured to cache the data as a .yxdb file when the workflow is run.

Once the data is cached, if the workflow is re-run and the database connection or query (including the number of records to browse) has not changed, the query will not be re-run. Instead, the data will be pulled from the cache.

An output message indicates whether or not the data was cached. Clicking the link will open the data results as a .yxdb file in a separate window.

The in-database cache is used any time a workflow is re-run without changes to upstream tools. Making a change to any upstream tool will trigger a new query and a new cache will be created.

No, the “Browse first [100] records” option only limits the number of records displayed in the Browse In-DB tool. Other tools in the workflow will process the number of records that pass through at any given point.

The field has CLOB/LOB datatype and will not work with most of the comparison operators in the Filter or Formula tools. The error reflects that no columns are returned, even when the data matches the comparison. This is expected behavior with SQL and Oracle, as they do not support comparisons with LOB data