Input Data Tool
Use the Input Data tool to bring data into your workflow by connecting to a file or database.
Configure the Tool
The Input Data Configuration window has 2 sections: Connect a File or Database and Options.
Connect a File or Database
The below steps describe the process with Data Connection Manager (DCM) disabled. When enabled, the Connect a File or Database dropdown is replaced with a Set Up a Connection button opening the Data Connections window displaying only data sources supported by DCM, and opening DCM by selecting a technology.
With the Input Data tool on the canvas, follow these steps:
-
In the Configuration window, select the Connect a File or Database dropdown.
-
Designer displays the Data connections window. Configure your data connection using one of these: Recent, Saved, Files, Data Sources, or Server.
The Recent connections section contains recently configured files and data connections. You can select a recent connection or select Clear list to delete recent connections.
Select a Saved connection. To rename and edit your connections, use Manage Data Connections.
To connect to a file in a local or network directory, there are 4 options:
- In File connections, click Select file to browse to a file,
- Drag and drop a file onto File connections,
- In All supported file types, click a file type extension to browse to a file of that type, or
- Close Data connections and drag a file directly onto the canvas.
In the file browse window, type a wildcard as part of the file path.
Consider a case where you have multiple data files with both:
- Multiple files are read using the wildcard format such as *.csv or 2019*.csv in a single Input Tool, as long as the files all contain the same number of fields, and that the data types for each field are the same. Designer will set the number of fields and the file types based on the first file read. Any subsequent files that do not match will be skipped and a warning will be displayed. It is not possible to control which file will be read first when using a wildcard syntax like *.csv; it is up to the system which file will be designated as the first.
- similar names in the same directory
Type the file name they have in common and add an * to substitute all subsequent characters or a ? to substitute one character. Remember to include the file extension that is common to all files when specifying the file names.
This path brings in every .csv file contained within the data\datafiles directory with a file name that begins with ABCD.
data\datafiles\ABCD*.csv
It would bring in ABCD_4.csv and ABCD_012.csv.
This path brings in every .csv file contained within the data\datafiles directory with a file name that begins with ABCD_ + 1 character.
data\datafiles\ABCD_?.csv
On Select Excel Input, select one of the Excel inputs:
- Select a sheet to choose from the sheets available in the excel file.
- Select a named range to choose from the named ranges available in the excel file.
- Import only the list of sheet names to create output with a single column containing sheet names as values.
None of the related data is output when this option is selected.
Known issues
The Access driver reads !!! as ### and both ,,, and ... as ___. This can impact the sheet names and named ranges in an Excel file pulled into Designer.
See Gzip File Support and Zip File Support for more information.
When you select a .gz, .tgz or .zip file, the Select file type to extract window displays. Use the drop down to select the type of files to use.
Select files: Click the check box next to the file you want to extract.
To see all files in the Gzip or Zip file, including files that are not supported by Alteryx, select Other Files under Select file type to extract. Select a file type to Parse other files as.
Important
Gzip and Zip files are not supported in Alteryx Gallery.
Alteryx Calgary | .cydb |
Alteryx Database | .yxdb |
Alteryx Spatial Zip | .sz |
Avro | .avro |
Comma Separated Values | .csv |
dBase | .dbf |
ESRI Personal GeoDatabase | .mdb |
ESRI Shapefile | .shp |
Extensible Markup Language | .xml |
Flat ASCII | .flat |
GIS | .grc, .grd |
Google Earth/Google Maps | .kml |
Gzip | .gz, .tgz |
IBM SPSS | .sav |
JSON | .json |
MapInfo Professional Interchange Format | .mif |
MapInfo Professional Table | .tab |
Microsoft Access 2000-2003 | .mdb |
Microsoft Access 2007, 2010, 2013, 2016 | .accdb |
Microsoft Excel Binary | .xlsb |
Microsoft Excel 1997-2003 | .xls |
Microsoft Excel | .xlsx |
Microsoft Excel Macro-Enabled | .xlsm |
QlikView | .qvx |
SAS | .sas7bdat |
SQLite | .sqlite |
SRC Geography | .geo |
Text | .txt, .asc |
Zip | .zip |
Data sources displays supported and frequently used data sources.
- Tools - If you select Quick connect for a tool you have not installed, a browser opens to the Alteryx gallery for you to download and install that tool. Read the instructions on the page carefully. Once the tool is installed, the Input Data tool will change on the canvas to the tool you selected from the Data sources tab.
- Data sources
- ODBC launches the ODBC connection window that displays a filtered list of DSNs on the system that use that particular driver.
- OleDB launches the native Windows OleDB manager.
- OCI launches the Native Oracle OCI connection manager. From here, select the Net Service Name as defined in your tnsnames.ora file that you want to use for this connection as well as the username and password credentials.
- Bulk opens a special dialog allowing you set up a bulk connection for the selected connection type.
- Quick connect: For SQL or Oracle Quick connect - You can either use a pre-existing saved connection, or you can create a new saved connection. Refer to the following for details:
- All other Quick connections are connections using another tool.
Click Quick connect under HDFS to create a new Hadoop database connection.
Alteryx connects to a Hadoop Distributed File System and reads .csv and .avro files. All Hadoop distributions implementing the HDFS standard are supported.
HDFS can be read using httpfs (port 14000), webhdfs (port 50070), or Knox Gateway (8443). Consult with your Hadoop administrator for which to use. If you have a Hadoop High Availability (HA) cluster, your Hadoop admin must explicitly enable httpfs.
MapR may not support webhdfs.
In the HDFS Connection window:
- Select a server configuration: HTTPFS, WebHDFS, or Knox Gateway.
- Host: Specify the installed instance of the Hadoop server. The entry must be a URL or IP address.
- Port: Displays the default port number for httpfs (14000), webhdfs (50070), or Knox Gateway (8443), or enter a specific port number.
- URL: The URL defaults based on the Host. The URL can be modified.
- User Name: Depending on the cluster setup, specify the user name and password for access.
- httpfs: A user name is needed, but it can be anything.
- webhdfs: The user name is not needed.
- Knox Gateway: A user name and password is needed.
- Kerberos: Select a Kerberos authentication option for reading and writing to HDFS. The option you choose depends on how your IT admin configured the HDFS server:
- None: No authentication is used.
- Kerberos MIT: Alteryx uses the default MIT ticket to authenticate with the server. You must first acquire a valid ticket using the MIT Kerberos Ticket Manager.
- Kerberos SSPI: Alteryx uses Windows Kerberos keys for authentication, which are obtained when logging in to Windows with your Windows credentials. The User Name and Password fields are therefore not available.
- (Recommended) Click Test to test the connection.
- Click OK.
- Specify the path of the file (for example,
path/to/file.csv
), or browse to the file and select it. - Select the Avro or CSV file format and click OK.
Self-signed certificates are not supported in Alteryx. Use a trusted certificate when configuring Knox authentication.
To connect to HDFS for in-database processing, use the Connect In-DB Tool.
You can also make a Generic connection or a 32-bit connection to databases.
Before you connect to a database, consider the following:
- Both ODBC and OleDB connection types support spatial connections. Alteryx auto-detects if a database supports spatial functionality and displays the required configurations.
- To connect to a database for in-database processing, see In-Database Overview.
Point to an option and click a saved or shared data connection to connect it, or click Manage to view and edit connections.
All Connections: Displays a list of connections saved to your computer plus connections shared with from a Gallery.
My Computer: Displays a list of connections saved to your computer.
Server: Displays a list connections shared with you from a Server.
Add a Server: Opens the Server Sign-in screen. Use your user name and password to log in. After logging in, return to Saved Data Connections and point to the Server in the list to view connections shared from the Server.
See Manage data connections for more on managing saved and shared data connections and troubleshooting.
Adobe | Adobe Analytics |
Amazon | Amazon Athena |
Amazon Aurora | |
Amazon Redshift | |
Amazon S3 | |
Apache | Cassandra |
Hadoop Distributed File System (HDFS) | |
Hive | |
Spark | |
Cloudera | Impala |
Hadoop Distributed File System (HDFS) | |
Hive | |
Databricks | Databricks |
ESRI | ESRI GeoDatabase |
Exasolution | EXASOL |
Google Analytics | |
Google BigQuery | |
Hortonworks | Hadoop Distributed File System (HDFS) |
Hive | |
IBM | IBM DB2 |
IBM Netezza | |
Marketo | Marketo |
MapR | Hadoop Distributed File System (HDFS) |
Hive | |
Microsoft | Microsoft Analytics Platform System |
Microsoft Azure Data Lake Store | |
Microsoft Azure SQL Data Warehouse | |
Microsoft Azure SQL Database | |
Microsoft Cognitive Services | |
Microsoft OneDrive | |
Microsoft SharePoint | |
Microsoft SQL Server | |
MongoDB | MongoDB |
MySQL | MySQL |
NetSuite | NetSuite |
Oracle | Oracle |
Pivotal | Pivotal Greenplum |
PostgreSQL | PostgreSQL |
Salesforce | Salesforce |
SAP | SAP Hana |
Snowflake |
|
Teradata | Teradata |
Teradata Aster | |
Vertica | Vertica |
Server displays each Server and its URL that has been added on the local computer. A list below each Server name contains the saved connections stored on the server you have access to.
Click + Server to add another Server connection.
Select file format options. Options vary based on the file or database to which you connect. See File Format Options.
Preview the data layout in the Preview window. The data layout preview is limited to 100 records, or fewer in wide files. If data contains 1500 columns, at least one record and up to two rows are displayed. To view all data, use a Browse Tool.
If your input file contains multiple columns with the same name, Designer automatically renames the duplicate columns according to these rules:
Duplicate Name Ends with 1 or 9
If the last character in the duplicate column name is either 1 or 9, Designer appends an underscore (_) and a number, starting with 2 to the duplicate column name.
Original Column Name | Duplicate Column (Renamed by Designer) |
---|---|
A1 | A1_2 |
A9 | A9_2 |
Duplicate Name Ends with a Digit Between 2-8 (Inclusive)
If the last character in the duplicate column is a digit between 2 and 8 (inclusive), Designer increments that digit to rename the duplicate column name.
However, if the 2nd to last character is also a digit, Designer appends an underscore (_) and a number, starting with 2 to the duplicate column name.
Original Column Name | Duplicate Column (Renamed by Designer) |
---|---|
A2 | A3 |
A5 | A6 |
A22 | A22_2 |
Duplicate Name Ends with a Letter or Special Character
If the last character in the duplicate column is a letter or special character, Designer adds a number (starting with 2) to rename the duplicate column name.
Original Column Name | Duplicate Column (Renamed by Designer) |
---|---|
age | age2 |
registered? | registered?2 |
Note
Visual Query Builder cannot display multi byte characters correctly. Use the tables tab instead.
You can convert the Input Data tool to a Macro Input Tool or a Dynamic Input Tool. You can undo this change if you have enough undo levels set in your User Settings.
To convert the Input Data tool to a Macro Input Tool or a Dynamic Input Tool.
- Right-click the Input Data tool in your workflow.
- Select Convert To Macro Input or Convert to Dynamic Input.
- Configure the tool.
You can now use the Input Data tool as a Macro Input tool or a Dynamic Input tool.
To use classic mode:
- Click Options > User Settings > Edit User Settings.
- On the Defaults tab, select the checkbox Use classic mode for the Input/Output tool menu options.
- Click OK.
- Click on the canvas, or press F5 to refresh.
You can now use the Input Data Tool classic mode to select your files and data sources.