Input Data Tool

Use the Input Data tool to bring data into your workflow by connecting to a file or database.

Configure the Tool

The Input Data Configuration window has 2 sections: Connect a File or Database and Options.

Connect a File or Database

The steps below describe the process with Data Connection Manager (DCM) disabled. When DCM is enabled, the Connect a File or Database dropdown is replaced with a Set Up a Connection button. That button opens the Data Connections window, which displays only data sources supported by DCM. Selecting a technology in that window opens DCM.

With the Input Data tool on the canvas, follow these steps:

In the Configuration window, select the Connect a File or Database dropdown.
Designer displays the Data connections window. Configure your data connection using one of these: Recent, Saved, Files, Data Sources, or Server.

An Output Data tool can be converted to an Input Data tool.

Files

To connect to a file in a local or network directory, there are 4 options:

In File connections, click Select file to browse to a file,
Drag and drop a file onto File connections,
In All supported file types, click a file type extension to browse to a file of that type, or
Close Data connections and drag a file directly onto the canvas.

Select multiple files

In the file browse window, type a wildcard as part of the file path.

A single Input Data tool can read multiple files using a wildcard format such as *.csv or 2019*.csv, as long as all the files contain the same number of fields and the data type of each field is the same. Designer sets the number of fields and the data types based on the first file read. Any subsequent file that does not match is skipped and a warning is displayed. You cannot control which file is read first when using wildcard syntax like *.csv; the system decides which file is designated as the first.

Consider a case where you have multiple data files that have similar names in the same directory and share the same fields and data types.

Type the file name they have in common and add an * to substitute all subsequent characters or a ? to substitute one character. Remember to include the file extension that is common to all files when specifying the file names.

This path brings in every .csv file contained within the data\datafiles directory with a file name that begins with ABCD.

data\datafiles\ABCD*.csv

It would bring in ABCD_4.csv and ABCD_012.csv.

This path brings in every .csv file contained within the data\datafiles directory whose file name is ABCD_ followed by exactly 1 character.

data\datafiles\ABCD_?.csv
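
As a rough illustration of how the two wildcards differ, the following Python sketch filters a list of file names with the standard library's fnmatch module. This is not Designer's own matching code. fnmatchcase is used only because it treats * and ? the way the text above describes, and every file name other than ABCD_4.csv and ABCD_012.csv is made up for the example.

```python
from fnmatch import fnmatchcase

# Hypothetical directory listing; only the first two names come from the text above.
file_names = ["ABCD_4.csv", "ABCD_012.csv", "ABCD.csv", "XYZ_1.csv", "ABCD_7.txt"]


def match_names(pattern, names):
    """Return the names matching a wildcard pattern.

    * stands for any run of characters (including none),
    ? stands for exactly one character.
    """
    return [name for name in names if fnmatchcase(name, pattern)]


star_matches = match_names("ABCD*.csv", file_names)
question_matches = match_names("ABCD_?.csv", file_names)

print(star_matches)
print(question_matches)
```

With this sample list, the * pattern keeps ABCD_4.csv, ABCD_012.csv, and ABCD.csv, while the ? pattern keeps only ABCD_4.csv, because ABCD_012.csv has three characters after the underscore.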

Supported File Types

Alteryx Calgary	.cydb
Alteryx Database	.yxdb
Alteryx Spatial Zip	.sz
Avro	.avro
Comma Separated Values	.csv
dBase	.dbf
ESRI Personal GeoDatabase	.mdb
ESRI Shapefile	.shp
Extensible Markup Language	.xml
Flat ASCII	.flat
GIS	.grc, .grd
Google Earth/Google Maps	.kml
Gzip	.gz, .tgz
IBM SPSS	.sav
JSON	.json
MapInfo Professional Interchange Format	.mif
MapInfo Professional Table	.tab
Microsoft Access 2000-2003	.mdb
Microsoft Access 2007, 2010, 2013, 2016	.accdb
Microsoft Excel Binary	.xlsb
Microsoft Excel 1997-2003	.xls
Microsoft Excel	.xlsx
Microsoft Excel Macro-Enabled	.xlsm
QlikView	.qvx
SAS	.sas7bdat
SQLite	.sqlite
SRC Geography	.geo
Text	.txt, .asc
Zip	.zip

Data Sources

Data sources displays supported and frequently used data sources.

Tools: If you select Quick connect for a tool that you have not installed, a browser opens to the Alteryx Gallery so that you can download and install that tool. Read the instructions on the page carefully. Once the tool is installed, the Input Data tool on the canvas changes to the tool you selected on the Data sources tab.
Data sources:
- ODBC launches the ODBC connection window, which displays a filtered list of DSNs on the system that use that particular driver.
- OleDB launches the native Windows OleDB manager.
- OCI launches the native Oracle OCI connection manager. From here, select the Net Service Name that you want to use for this connection, as defined in your tnsnames.ora file, and enter the user name and password credentials.
- Bulk opens a special dialog that lets you set up a bulk connection for the selected connection type.
- Quick connect: For a SQL or Oracle Quick connect, you can either use a pre-existing saved connection or create a new saved connection. Refer to the following for details:
  - SQL
  - Oracle
- All other Quick connections are connections that use another tool.

Hadoop

Click Quick connect under HDFS to create a new Hadoop database connection.

Alteryx connects to a Hadoop Distributed File System and reads .csv and .avro files. All Hadoop distributions implementing the HDFS standard are supported.

Configure HDFS connections

HDFS can be read using httpfs (port 14000), webhdfs (port 50070), or Knox Gateway (port 8443). Consult your Hadoop administrator about which one to use. If you have a Hadoop High Availability (HA) cluster, your Hadoop admin must explicitly enable httpfs.

MapR may not support webhdfs.

In the HDFS Connection window:

Select a server configuration: HTTPFS, WebHDFS, or Knox Gateway.
Host: Specify the installed instance of the Hadoop server. The entry must be a URL or IP address.
Port: Displays the default port number for httpfs (14000), webhdfs (50070), or Knox Gateway (8443). You can also enter a specific port number.
URL: The URL defaults to a value based on the Host. You can modify it.
User Name: Depending on the cluster setup, specify the user name and password for access.

httpfs: A user name is needed, but it can be anything.
webhdfs: The user name is not needed.
Knox Gateway: A user name and password are needed.

Self-signed certificates are not supported in Alteryx. Use a trusted certificate when configuring Knox authentication.

Kerberos: Select a Kerberos authentication option for reading and writing to HDFS. The option you choose depends on how your IT admin configured the HDFS server:
- None: No authentication is used.
- Kerberos MIT: Alteryx uses the default MIT ticket to authenticate with the server. You must first acquire a valid ticket using the MIT Kerberos Ticket Manager.
- Kerberos SSPI: Alteryx uses Windows Kerberos keys for authentication, which are obtained when logging in to Windows with your Windows credentials. The User Name and Password fields are therefore not available.
(Recommended) Click Test to test the connection.
Click OK.
Specify the path of the file (for example, path/to/file.csv), or browse to the file and select it.
Select the Avro or CSV file format and click OK.
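
To make the relationship between server configuration, host, port, and file path more concrete, here is a minimal Python sketch that assembles a URL in the style of the public WebHDFS REST API. It is not a description of what Designer does internally. The function name build_hdfs_url, the host names, and the knox_topology parameter are hypothetical, and the Knox path layout (/gateway/<topology>/) is an assumption that depends on how the gateway is configured. Only the default ports come from the text above.

```python
from urllib.parse import quote, urlencode

# Default ports listed in the text above for each server configuration.
DEFAULT_PORTS = {"httpfs": 14000, "webhdfs": 50070, "knox": 8443}


def build_hdfs_url(config, host, path, port=None, user_name=None,
                   knox_topology="default"):
    """Build a WebHDFS-style OPEN URL for a file (illustrative only)."""
    if config not in DEFAULT_PORTS:
        raise ValueError(f"unknown server configuration: {config}")
    port = port or DEFAULT_PORTS[config]
    clean_path = quote(path.lstrip("/"))

    if config == "knox":
        # Assumption: Knox serves WebHDFS over HTTPS under /gateway/<topology>/.
        base = f"https://{host}:{port}/gateway/{knox_topology}/webhdfs/v1/{clean_path}"
    else:
        base = f"http://{host}:{port}/webhdfs/v1/{clean_path}"

    params = {"op": "OPEN"}
    # Assumption: the user name travels as a query parameter for httpfs and
    # webhdfs, while Knox credentials would be sent separately (not shown here).
    if user_name and config != "knox":
        params["user.name"] = user_name
    return f"{base}?{urlencode(params)}"


# Hypothetical host; httpfs needs a user name, but any value is accepted.
print(build_hdfs_url("httpfs", "namenode.example.com", "path/to/file.csv",
                     user_name="anyone"))
```

The sketch only builds a string and makes no network call, so it can be used to sanity-check a host and port combination before entering it in the HDFS Connection window.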

To connect to HDFS for in-database processing, use the Connect In-DB Tool.

Duplicate Column Names

If your input file contains multiple columns with the same name, Designer automatically renames the duplicate columns according to these rules:

Duplicate Name Ends with 1 or 9

If the last character in the duplicate column name is either 1 or 9, Designer appends an underscore (_) and a number, starting with 2, to the duplicate column name.

Original Column Name	Duplicate Column (Renamed by Designer)
A1	A1_2
A9	A9_2

Duplicate Name Ends with a Digit Between 2-8 (Inclusive)

If the last character in the duplicate column is a digit between 2 and 8 (inclusive), Designer increments that digit to rename the duplicate column name.

However, if the second-to-last character is also a digit, Designer appends an underscore (_) and a number, starting with 2, to the duplicate column name.

Original Column Name	Duplicate Column (Renamed by Designer)
A2	A3
A5	A6
A22	A22_2

Duplicate Name Ends with a Letter or Special Character

If the last character in the duplicate column is a letter or special character, Designer adds a number (starting with 2) to rename the duplicate column name.

Original Column Name	Duplicate Column (Renamed by Designer)
age	age2
registered?	registered?2
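
The three renaming rules above can be sketched as a small Python function. This is an illustration of the documented rules, not Designer's actual implementation. The function name rename_duplicate is hypothetical, it handles only the first duplicate of a column, and it does not check whether the new name collides with another existing column. The text does not say what happens when a name ends in 0, so the sketch assumes the underscore form for that case.

```python
DIGITS_2_TO_8 = set("2345678")
ASCII_DIGITS = set("0123456789")


def rename_duplicate(name):
    """Return the name given to the first duplicate of a column."""
    last = name[-1:]            # "" when the name is empty
    second_to_last = name[-2:-1]  # "" when the name has fewer than 2 characters

    # Rule 1: names ending in 1 or 9 get an underscore and the number 2.
    if last in ("1", "9"):
        return f"{name}_2"

    # Rule 2: names ending in 2 through 8 have that digit incremented,
    # unless the second-to-last character is also a digit.
    if last in DIGITS_2_TO_8:
        if second_to_last in ASCII_DIGITS:
            return f"{name}_2"
        return name[:-1] + str(int(last) + 1)

    # Assumption: a trailing 0 is not covered by the rules above;
    # this sketch falls back to the underscore form.
    if last == "0":
        return f"{name}_2"

    # Rule 3: names ending in a letter or special character get 2 appended.
    return f"{name}2"


for original in ["A1", "A9", "A2", "A5", "A22", "age", "registered?"]:
    print(original, "->", rename_duplicate(original))
```

Run against the original column names in the three tables above, the function reproduces the renamed columns listed there.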

Note

Visual Query Builder cannot display multibyte characters correctly. Use the Tables tab instead.

Adobe	Adobe Analytics
Amazon	Amazon Athena
	Amazon Aurora
	Amazon Redshift
	Amazon S3
Apache	Cassandra
	Hadoop Distributed File System (HDFS)
	Hive
	Spark
Cloudera	Impala
	Hadoop Distributed File System (HDFS)
	Hive
Databricks	Databricks
ESRI	ESRI GeoDatabase
Exasolution	EXASOL
Google	Google Analytics
	Google BigQuery
Hortonworks	Hadoop Distributed File System (HDFS)
	Hive
IBM	IBM DB2
	IBM Netezza
Marketo	Marketo
MapR	Hadoop Distributed File System (HDFS)
	Hive
Microsoft	Microsoft Analytics Platform System
	Microsoft Azure Data Lake Store
	Microsoft Azure SQL Data Warehouse
	Microsoft Azure SQL Database
	Microsoft Cognitive Services
	Microsoft OneDrive
	Microsoft SharePoint
	Microsoft SQL Server
MongoDB	MongoDB
MySQL	MySQL
NetSuite	NetSuite
Oracle	Oracle
Pivotal	Pivotal Greenplum
PostgreSQL	PostgreSQL
Salesforce	Salesforce
SAP	SAP Hana
Snowflake	Snowflake
Teradata	Teradata
	Teradata Aster
Vertica	Vertica