Input Data Tool
The Input Data tool brings data into your workflow by connecting to a file or database.
Use the Input Data tool to connect to the following supported data sources:
|Alteryx Spatial Zip||.sz|
|Apache Hadoop Avro||.avro|
|Comma Separated Values||.csv|
|ESRI Personal GeoDatabase||.mdb|
|MapInfo Professional Interchange||.mid, .mif|
|MapInfo Professional Table||.tab|
|Microsoft Access 2000-2003||.mdb|
|Microsoft Excel 1997-2003||.xls|
|Microsoft Excel 2007, 2010, 2013, 2016||.xlsx|
|Microsoft Excel Macro Enabled||.xlsm|
|Microsoft Office Access 2007, 2010, 2013, 2016||.accdb|
|Hadoop Distributed File System (HDFS)|
|Hadoop Distributed File System (HDFS)|
|DataStax||DataStax Enterprise, DataStax Community|
|Hortonworks||Hadoop Distributed File System (HDFS)|
|IBM Netezza/Pure Data Systems|
|MapR||Hadoop Distributed File System (HDFS)|
|Microsoft||Microsoft Azure SQL Data Warehouse|
|Microsoft SQL Server 2008, 2012, 2014, 2016|
|NetSuite||NetSuite SuiteAnalytics|
Use other tools to connect to other supported data sources. For a complete list of data sources supported in Alteryx, see Supported Data Sources.
Configure the tool
- In the Configuration window, type a file path in Connect a File or Database or select an option:
- Click File to browse to and connect to a file in a local or network directory.
- Double-click a file to select it.
- identical table structures, meaning the tables contain the same columns, data types, and sheet names
- similar names in the same directory
- Select a sheet to choose from the sheets available in the Excel file.
- Select a named range to choose from the named ranges available in the Excel file.
- Import only the list of sheet names to create output with a single column containing sheet names as values.
None of the related data is output when this option is selected.
- Select a server configuration: HTTPFS, WebHDFS, or Knox Gateway.
- Host: Specify the installed instance of the Hadoop server. The entry must be a URL or IP address.
- Port: Displays the default port number for httpfs (14000), webhdfs (50070), or Knox Gateway (8443). You can also enter a specific port number.
- URL: The URL defaults based on the Host. The URL can be modified.
- User Name: Depending on the cluster setup, specify the user name and password for access.
- httpfs: A user name is needed, but it can be anything.
- webhdfs: The user name is not needed.
- Knox Gateway: A user name and password are needed.
- Kerberos: Select a Kerberos authentication option for reading and writing to HDFS. The option you choose depends on how your IT admin configured the HDFS server:
- None: No authentication is used.
- Kerberos MIT: Alteryx uses the default MIT ticket to authenticate with the server. You must first acquire a valid ticket using the MIT Kerberos Ticket Manager.
- Kerberos SSPI: Alteryx uses Windows Kerberos keys for authentication, which are obtained when logging in to Windows with your Windows credentials. The User Name and Password fields are therefore not available.
- (Recommended) Click Test to test the connection.
- Click OK.
- Specify the path of the file (for example, path/to/file.csv), or browse to the file and select it.
- Select the Avro or CSV file format and click OK.
- Both ODBC and OleDB connection types support spatial connections. Alteryx auto-detects if a database supports spatial functionality and displays the required configurations.
- When connecting to any OleDB or ODBC database, be sure to use the native driver provided by the database vendor.
- The Choose Table or Specify Query window opens if the database has multiple tables. You can then select tables and construct queries.
- To connect to a database for in-database processing, see In-Database Overview.
- All Connections: Displays a list of connections saved to your computer plus connections shared with you from a Gallery.
- My Computer: Displays a list of connections saved to your computer.
- Gallery: Displays a list of connections shared with you from a Gallery.
- Add a Gallery: Opens the Gallery Login screen. Use your user name and password to log in. After logging in, return to Saved Data Connections and point to the Gallery in the list to view connections shared from the Gallery.
- Select file format options. Options vary based on the file or database to which you connect. See File Format Options.
- Preview the data layout.
You can also drag a file from your computer onto the Alteryx canvas. This adds an Input Data tool that is connected to the file you dragged.
In the file browse window, type a wildcard as part of the file path.
Consider a case where you have multiple data tables with both:
Type the part of the file name that the files have in common, then add an * to stand in for any number of subsequent characters or a ? to stand in for exactly one character. Include the file extension that all of the files share.
This path brings in every .csv file contained within the data\datafiles directory with a file name that begins with ABCD.
It would bring in ABCD_4.csv and ABCD_012.csv.
This path brings in every .csv file contained within the data\datafiles directory with a file name that begins with ABCD_ followed by exactly one character.
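The wildcard behavior described above can be tried outside Designer. The following is a minimal sketch in Python, assuming the two patterns are ABCD*.csv and ABCD_?.csv (reconstructed from the descriptions, not taken from the original examples). It uses the standard fnmatch module, whose * and ? follow the same convention, although its matching rules are not guaranteed to be identical to those in Alteryx. The file list is hypothetical apart from the two names mentioned in the text.

```python
from fnmatch import fnmatchcase

# Hypothetical directory listing. The first two names appear in the text;
# the last two are invented to show what the patterns exclude.
files = ["ABCD_4.csv", "ABCD_012.csv", "ABCD_4.xlsx", "XABCD_1.csv"]


def matching(pattern, names):
    """Return the names that match a wildcard pattern.

    * stands in for any run of characters, ? for exactly one character.
    """
    return [name for name in names if fnmatchcase(name, pattern)]


# Assumed pattern: names that begin with ABCD and end in .csv.
print(matching("ABCD*.csv", files))

# Assumed pattern: ABCD_ plus exactly one character, then .csv.
print(matching("ABCD_?.csv", files))
```

Note that fnmatchcase compares names case-sensitively, which may differ from how file names are matched on Windows.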
On Select Excel Input, select one of the Excel inputs:
The Access driver reads !!! as ### and both ,,, and ... as ___. This can impact the sheet names and named ranges in an Excel file pulled into Designer.
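To get a feel for what the "Import only the list of sheet names" option returns, the sketch below reads the sheet names from an .xlsx or .xlsm file and returns them as single-column rows. It is an illustration in Python, not how Designer reads Excel files. It assumes the workbook part sits at the conventional xl/workbook.xml location inside the file, and it does not apply to the older .xls format. The function name list_sheet_names is hypothetical.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace of the main workbook part in an Office Open XML spreadsheet.
MAIN_NS = "{http://schemas.openxmlformats.org/spreadsheetml/2006/main}"


def list_sheet_names(xlsx):
    """Return sheet names as one-column rows, with no cell data.

    `xlsx` is a path or a binary file-like object. Assumes the workbook
    part is stored at xl/workbook.xml, which is the usual location.
    """
    with zipfile.ZipFile(xlsx) as archive:
        root = ET.fromstring(archive.read("xl/workbook.xml"))
    return [[sheet.get("name")] for sheet in root.iter(MAIN_NS + "sheet")]


# Example call with a hypothetical file:
# rows = list_sheet_names("workbook.xlsx")
```

Each inner list is one output row, so a workbook with three sheets yields three rows and one column.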
Select file type to extract: Use the drop-down to select the type of files to display.
Select files: Click the check box next to the file you want to extract.
To see all files in the Gzip or Zip file, including files that are not supported by Alteryx, select Other Files under Select file type to extract. Select a file type to Parse other files as.
Gzip and Zip files are not supported in Alteryx Gallery.
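Filtering an archive by file type, as the Select file type to extract option does, can be approximated as follows. This is a minimal Python sketch for Zip files only, assuming that a file's type is determined by its extension. The function name files_of_type is hypothetical and the code does not reflect how Designer inspects archives.

```python
import zipfile
from pathlib import PurePosixPath


def files_of_type(archive_path, extension):
    """List the files in a Zip archive that have the given extension.

    The comparison is case-insensitive and directory entries are skipped.
    `archive_path` is a path or a binary file-like object.
    """
    wanted = extension.lower()
    with zipfile.ZipFile(archive_path) as archive:
        return [
            info.filename
            for info in archive.infolist()
            if not info.is_dir()
            and PurePosixPath(info.filename).suffix.lower() == wanted
        ]


# Example call with a hypothetical archive:
# csv_members = files_of_type("exports.zip", ".csv")
```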
Click Microsoft SQL Server to create a new Microsoft SQL Server database connection.
Click Hadoop to create a new Hadoop database connection.
Alteryx connects to a Hadoop Distributed File System and reads .csv and .avro files. All Hadoop distributions implementing the HDFS standard are supported.
HDFS can be read using httpfs (port 14000), webhdfs (port 50070), or Knox Gateway (port 8443). Consult your Hadoop administrator about which to use. If you have a Hadoop High Availability (HA) cluster, your Hadoop admin must explicitly enable httpfs.
MapR may not support webhdfs.
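The default ports and credential requirements for the three server configurations can be summarized in a small lookup. The Python sketch below is illustrative only: the function connection_settings and the dictionary layout are hypothetical, the host name is a placeholder, and Kerberos options (which can change whether a user name and password apply) are deliberately ignored.

```python
# Default ports and credential requirements as stated in the text.
SERVER_DEFAULTS = {
    "httpfs": {"port": 14000, "user": True, "password": False},
    "webhdfs": {"port": 50070, "user": False, "password": False},
    "knox": {"port": 8443, "user": True, "password": True},
}


def connection_settings(server, host, port=None, user=None, password=None):
    """Resolve the port and check credentials for an HDFS connection.

    Falls back to the default port for the server type when no port is
    given, and raises ValueError when a required credential is missing.
    """
    defaults = SERVER_DEFAULTS[server]
    if defaults["user"] and not user:
        raise ValueError(f"{server} requires a user name")
    if defaults["password"] and not password:
        raise ValueError(f"{server} requires a password")
    return {"server": server, "host": host, "port": port or defaults["port"]}


# Placeholder host; httpfs accepts any user name.
print(connection_settings("httpfs", "hadoop.example.com", user="anyone"))
```

Passing an explicit port overrides the default, which mirrors entering a specific port number in the HDFS Connection window.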
In the HDFS Connection window:
Self-signed certificates are not supported in Alteryx. Use a trusted certificate when configuring Knox authentication.
To connect to HDFS for in-database processing, use the Connect In-DB Tool.
Point to Other Databases to create a new database connection to a database other than Microsoft, Oracle, or Hadoop.
Select the database you want to connect to:
Before you connect to a database, consider the following:
Point to an option and select a saved or shared data connection to connect to it, or click Manage to view and edit connections.
See Manage Data Connections for more on managing saved and shared data connections and troubleshooting.
For best performance and data integrity, close inputs before you build and run a workflow.