In-Database Overview

In-database processing enables blending and analysis against large sets of data without moving the data out of a database, which can provide significant performance improvements over traditional analysis methods that require data to be moved to a separate environment for processing.

Performing analysis in the database can save processing time. Using Designer as the interface, in-database workflows integrate easily with standard workflows for additional data blending and analysis.

Limitations

Spatial objects are currently not supported with In-Database tools.

If you have the same workflow open multiple times, change one of them, and then run them all, you might experience unexpected behavior. Close all versions of the same workflow before changing it.

Supported Databases

Driver Support

In-database processing requires 64-bit Alteryx with 64-bit database drivers.
To run workflows on Alteryx Server, the ODBC driver must be configured as a System DSN. For in-database processing, the Connection Type must be "System" in addition to the ODBC connection being configured as a System DSN.

In-Database Support	Predictive In-Database Support
Amazon Redshift
Apache Spark ODBC
Apache Spark Direct
Apache Spark on Databricks
Apache Spark on Microsoft Azure HDInsight
Apache Impala
Databricks
Denodo
EXASOL
Google BigQuery
Hive
IBM Netezza
Microsoft Analytics Platform System
Microsoft Azure SQL Database
Microsoft Azure SQL Data Warehouse
Microsoft SQL Server 2008, 2012, 2014, 2016 and 2019	Yes (2016)
MySQL
Oracle	Yes
Pivotal Greenplum
PostgreSQL
SAP HANA
Snowflake
Teradata	Yes
Vertica

Go to Supported Data Sources and File Formats for the full list of data platforms supported by Alteryx.

In-Database Tools

Review the below lists of In-Database tools.

In-Database Standard Tools

Browse In-DB Tool

Connect In-DB Tool

Blue icon with database being plugged in.

Data Stream In Tool

Blue icon with a stream-like object flowing into a database.

Data Stream Out Tool

Blue icon with a stream-like object flowing out of a database.

Dynamic Input In-DB Tool

Blue icon showing an input into a database.

Dynamic Output In-DB Tool

Blue icon showing an output from a database.

Filter In-DB Tool

Blue icon showing light filtering through a prism.

Formula In-DB Tool

Join In-DB Tool

Blue shape filled with two overlapping circles.

Macro Input In-DB Tool

White icon with an arrow pointing to a gear.

Macro Output In-DB Tool

Sample In-DB Tool

Select In-DB Tool

Blue icon with a checkmark selecting one of three options.

Summarize In-DB Tool

Transpose In-DB Tool

Union In-DB Tool

Write Data In-DB Tool

In-Database Predictive Tools

Boosted Model Tool

Decision Tree Tool

Forest Model Tool

Linear Regression Tool

Logistic Regression Tool

Score Tool

Change Tool Versions

When a predictive tool with in-database support is placed on the canvas with another In-DB tool, the predictive tool automatically changes to the In-DB version. To manually change the version of the tool...

Right-click the tool.
Point to Choose Tool Version.
Select a different version of the tool.

Go to Predictive Analytics for more about predictive in-database support.

In-Database Tool Anchors and Connections

Since in-database workflow processing occurs within the database, the In-Database tools are not compatible with the standard Alteryx tools. Several visual indicators show connection compatibility.

Visualization of Standard tool connection.

Standard tools use a green arrow anchor to connect to another tool. The connection displays as a single line.

Visualization of In-Database tool connection.

In-Database tools use a blue square database anchor to connect to another In-Database tool. The connection between 2 In-Database tools displays as a double line. Due to the nature of in-database processing, connection progress is not displayed.

To connect standard tools to In-Database tools, use the Dynamic Input and Output tools or the Data Stream tools.

How In-Database Works

Data is streamed into and out of an in-database workflow using the Data Stream In and Data Stream Out tools, or by connecting directly to a database using the Connect In-DB tool. The Data Stream In and Data Stream Out tools use an In-DB anchor to connect to In-DB tools, and a standard workflow anchor to connect to standard workflow tools.

Chart showing many different connections

How It Works

Define the connection to the database using the Connect In-DB tool, or use the Data Stream In tool to stream data from a standard workflow into a temporary table in the database.
Connect other In-Database tools to the workflow to process the data.
Use the Write Data In-DB tool to create or update a table in the database, or use the Data Stream Out tool to stream the In-DB results to a workflow for standard processing.
Run the workflow to process the data in the data platform. No results are returned to the Alteryx Engine until the complete in-database workflow has been processed.

In-Database Performance Benefits

Workflow processing can take longer when a large amount of data is streamed in and out of a database. You can use in-database processing to speed up a workflow.

For example, in a standard workflow, a large database table is pulled into memory to be joined with a small spreadsheet. The majority of run time is spent streaming in the database records. In an In-Database workflow, the small spreadsheet is streamed into the database, reducing run time substantially.

Frequently Asked Questions

What permissions are required to initiate an in-database workflow using the Connect In-DB tool?

Read privileges are required to access the underlying database.

What permissions are required to output data to the database using the Write In-DB tool?

Write privileges are required to create a table in the database.

What happens to the temporary tables created by the Data Stream In tool?

The temporary tables are deleted at the end of the run. If Alteryx crashes while the Data Stream In tool is being run, then the next time that an in-database workflow is run, all temp files created by Alteryx in the database in the previous three days are cleaned out.

Why can't I create temporary tables?

To allow visibility of a temporary table across sessions, Alteryx must create a permanent table that is eventually deleted at the end of a workflow. It is necessary to have CREATE permissions to stream in data from a database and write data to a database. The exception is Microsoft SQL Server.

Are the underlying rules for a database timeout maintained during an in-database process, or does Alteryx modify or circumvent those?

The underlying rules are maintained during the process the same as with the database connections via the standard Input Data and Output Data tools. If there is a database timeout or if there is a limit to the number of queries per day that you can run, it will affect your connection to the database.

How is the SQL statement created for an in-database workflow?

A SELECT statement is triggered by the Connect In-DB tool and additional queries are created by downstream tools and nested within this query. The addition of one of the following three tools completes the query and sends it to the underlying database: Write In-DB, Data Stream Out, and Browse In-DB.

You can input your own SQL statement in the Query box for the Connect In-DB tool, which also gets embedded within the SELECT statement.

When is the SQL statement processed?

The SQL query for the underlying database is triggered at runtime for each Browse In-DB, Data Stream Out, or Write Data In-DB tool.

How is Browse caching enabled?

The Browse Data In-DB tool can be configured to cache the data as a .yxdb file when the workflow is run.

Once the data is cached, if the workflow is re-run and the database connection or query (including the number of records to browse) has not changed, the query will not be re-run. Instead, the data will be pulled from the cache.

An output message indicates whether or not the data was cached. Clicking the link will open the data results as a .yxdb file in a separate window.

When is the in-database cache used?

The in-database cache is used any time a workflow is re-run without changes to upstream tools. Making a change to any upstream tool will trigger a new query and a new cache will be created.

Does the “Browse first [100] records” option impact the amount of data processed by the workflow?

No, the “Browse first [100] records” option only limits the number of records displayed in the Browse In-DB tool. Other tools in the workflow will process the number of records that pass through at any given point.

What does the error "Inconsistent datatypes: expected - got CLOB" mean?

The field has CLOB/LOB datatype and will not work with most of the comparison operators in the Filter or Formula tools. The error reflects that no columns are returned, even when the data matches the comparison. This is expected behavior with SQL and Oracle, as they do not support comparisons with LOB data

In this section: