Alteryx driver for Microsoft Excel (.xlsx)

Overview

Alteryx has built-in native read and write drivers for Microsoft Excel (.xlsx) to address data loss and performance issues reported with the Microsoft Excel Driver.

By default, all modules with xlsx files will be switched to Microsoft Excel (.xlsx) starting with Alteryx 9.5. To continue to use the existing file format, the user will need to switch it back manually to Microsoft Excel Legacy (*.xlsx).

Features at a glance for the new Microsoft Excel (xlsx) driver include:

Configuration

The Output Options in the Output Data tool available for Microsoft Excel (.xlsx) files via the Alteryx driver provide the same functionality as the Output Options available via the Microsoft Excel Legacy driver.

Alteryx Driver for Microsoft Excel Legacy Driver for Microsoft Excel
Create New Sheet Create New Table
Overwrite File (Remove) Overwrite File (Remove)
Overwrite Sheet (Drop) Delete Data and Append
Append to Existing Sheet Append Existing

 

Side-by-Side Comparison

General

  Alteryx Driver for Microsoft Excel Legacy Driver for Microsoft Excel
Name in Alteryx Microsoft Excel (*.xlsx) Microsoft Excel Legacy (*.xlsx)
Default in 9.5 Yes No
Access Data Engine required No Yes

 

Length and Limits

The Alteryx driver fully supports the documented Excel limits for rows and columns when reading and writing .xlsx files.

  Alteryx Driver for Microsoft Excel Legacy Driver for Microsoft Excel
Maximum number of rows (limit specified by Excel) 1,048,575 1,048,575
Maximum number of columns (limit specified by Excel) 16,384 255
Column name length and field name limits 256 recommended for compatibility with other formats. Column names are truncated to 64 characters.
Exceeding record limits (1,048,575 rows and 16,384 columns)

Exceeding column limits results in an error and exceeding row limits results in a truncated file.

Exceeding column and row limits results in a truncated file.
Exceeding file size limit (4,294,967,295 bytes, ~4GB) Displays the error message: “The data being written to this sheet is too large. Aborting process.” Writes up to the limit and then aborts without an error message.

 

Data Types

The Alteryx driver analyzes the sheet data to determine a data type and field name for each column so data can no longer be lost as was possible with the Legacy driver.

  Alteryx Driver for Microsoft Excel Legacy Driver for Microsoft Excel
Parsing Data Types If there is a mix of data types in a column the data type for the column will be set to String/WString. The first 16 rows are analyzed to determine data types for each column. Data of an incompatible type found below the 16th row is lost (returned as NULL).
String Values Narrow strings are returned when possible. When the data consists of ASCII strings, this will be more efficient than the Microsoft Legacy driver. Only wide strings are returned.
Numbers Rounding The formatting assigned to the cell will be ignored. For example, if a cell contains the value 1.27952 and the cell is formatted to show 2 decimal places the full value of 1.27952 will be returned instead of 1.28. Data will be returned using the formatting assigned to the cell. For example, decimal points set to a number lower than the data contains will result in data loss such as 1.27952 being returned as 1.28.
Date, Time, and DateTime All Dates and Times are displayed in Alteryx format. A Date data type is formatted based on an analysis of the data in the column.
  1. If every value in the column is date only (e.g., 2014/11/04 or May 4th with no time component), then the data type will be set to Date as YYYY-MM-DD.
  2. If every value in the column contains only time information with no date component, then the data type will be set to Time as HH:MM:SS.
  3. If data values in the column consist of both data and time components, then the data type will be set to Datetime as YYYY-MM-DD HH:MM:SS.
  4. Custom date formats are ignored. Any custom formatting applied to a date in Excel is ignored - only the actual data values are analyzed by Alteryx. For example, if a cell contains a full date and time value but is formatted to display only the date or only the time, Alteryx will ignore the formatting and instead evaluate the raw data value contained by the cell. This may lead to unexpected results if custom formatting has been used to show or hide parts of the date/time value.
Date, DateTime, and Time data types are all formatted as DateTime.

 

Field Names

  Alteryx Driver for Microsoft Excel Legacy Driver for Microsoft Excel
Duplicate Field/Column Names The following column names in the original Excel file:

abc, abc, abc, 123, 123, 123

will be read as follows:

abc, abc2, abc3, 123, 123_2, 123_3

This change could impact workflows created in versions previous to 9.5.

The following column names in the original Excel file:

abc, abc, abc, 123, 123, 123

will be read as follows:

abc, abc1, abc2, 123, 1231, 1232

Column names formatted as Dates

Column names formatted as dates in the original Excel file:

1/1/2014, 2/1/2014, 3/1/2014

will generate column names formatted using the default Alteryx date format as follows:

2014-01-01, 2014-02-01, 2014-03-01

This change could impact workflows created in versions previous to 9.5.

Column names formatted as dates in the original Excel file:

1/1/2014, 2/1/2014, 3/1/2014

will generate column names formatted as they appear in Excel.

Special characters in Sheet and Range Names

All characters supported by Excel are now supported by Alteryx with respect to Sheet Names. If Excel allows the character to be used when naming a sheet, the new driver is able to read it and no characters will be converted or changed. The following characters not valid for Sheet names: \ / ? * [ and ].

This change could impact workflows created in versions previous to 9.5. If you have Inputs pointing to xlsx files that were created by an Output tool that handled special characters, the sheet names will be different.

Certain special characters are replaced in Sheet names:
  • A back tick " ` " is replaced with an underscore ' _ '
  • An exclamation point " ! " is replaced by an underscore ' _ '
  • A space is replaced by an underscore ' _ '
  • A period " . " is replaced by a pound sign (hash tag) ' # '
  • A sheet with one or more of these characters will be displayed in Alteryx with the replaced character(s). If two or more sheet names resolve to the same name after the special characters are replaced, only one of the sheets will be accessible in Alteryx due to the Microsoft driver logic.

 

Ranges

  Alteryx Driver for Microsoft Excel Legacy Driver for Microsoft Excel
Read Support for Named and Explicit Ranges Yes Yes
Write Support for Named and Explicit Ranges Yes Yes

 

Output Options for XLSX Write Support

Create Sheet Append Sheet Overwrite Sheet Overwrite File

If there is an existing file: A new sheet is created. If there is already a sheet with the same name an error will display.

If there is no existing file: A new file and sheet is created.

If there is an existing file: Data is written in the specified sheet. If the specified sheet doesn’t exist there will be an error.

If there is no existing file: Data will not be written.

If there is an existing file: The specified sheet is deleted if it exists and then data is written in a new sheet with the specified name.

If there is no existing file: A new file and sheet is created.

If there is an existing file: The existing file is deleted and data is written in a new sheet in a new file.

If there is no existing file: A new file and sheet is created.

 

Additional Information for XLSX Write Support with Ranges

Create Sheet/Overwrite File Append Sheet Overwrite Sheet

Explicit ranges can be used.

A named range cannot be used since a named range is tied to a sheet and a sheet does not exist.

Both explicit ranges and named ranges can be used.

A named range must have been previously defined for the sheet.

Both explicit ranges and named ranges can be used.

A named range must have been previously defined for the sheet.

The data cannot contain more columns than the specified range but can contain fewer columns.

If there is not enough incoming data to fill the range, columns will be filled with nulls.

The data cannot contain more columns than the specified range but can contain fewer columns.

If there are fewer incoming columns in the range the unmapped columns will be filled with nulls.

The data cannot contain more columns than the specified range but can contain fewer columns.

If there is not enough incoming data to fill the range, columns will be filled with nulls.

Data is written starting at the top left cell in the first row of the specified range.

Data is appended starting on the first available row after the original data. When appending, no existing data will be overwritten.

Data cannot be written to a sheet that contains merged cells.

Column names should be included in the range. If the first row in the range is empty, the first row in the range that contains data will be used as column names. If the column names don’t match the incoming data, they must be manually mapped to the output names using the Custom Append Mapper in the Output Data tool.

Data is written starting at the top left cell in the first row of the specified range.

Data cannot be written to a sheet that contains merged cells.

The data can contain more rows than the specified range.

Data will be written beyond the specified range until:

  • All data is written
  • The maximum number of rows (1,048,575) is exceeded

The data can contain more rows than the specified range.

Data will be written beyond the specified range until:

  • All data is written
  • The maximum number of rows (1,048,575) is exceeded

The data can contain more rows than the specified range.

Data will be written beyond the specified range until:

  • All data is written
  • The maximum number of rows (1,048,575) is exceeded
  • Existing data is encountered