Alteryx driver for Microsoft Excel (.xlsx)

Overview

Alteryx has built-in native read and write drivers for Microsoft Excel (.xlsx) to address data loss and performance issues reported with the Microsoft Excel Driver.

By default, all modules with xlsx files will be switched to Microsoft Excel (.xlsx) starting with Alteryx 9.5. To continue to use the existing file format, the user will need to switch it back manually to Microsoft Excel Legacy (*.xlsx).

Features at a glance for the new Microsoft Excel (xlsx) driver include:

Configuration

Configurations for Microsoft Excel and Microsoft Excel Legacy are the same except for one of the “Output Options” in the Output Data tool. These were changed in the new driver to match the Microsoft Excel look and feel.

Microsoft Excel (xlsx) Microsoft Excel Legacy (xlsx) Change in Functionality?
Create New Sheet Create New Table No
Overwrite File (Remove) Overwrite File (Remove) No
Overwrite Sheet (Drop) Delete Data and Append The schema is changed by the Microsoft Excel driver but not by the legacy driver.
Append to Existing Sheet Append Existing No

Note: When appending to xlsx, all fields must be non-Null.

 

Side-by-Side Comparison

General

  Alteryx Driver for Microsoft Excel Legacy Driver for Microsoft Excel
Name in Alteryx Microsoft Excel (*.xlsx) Microsoft Excel Legacy (*.xlsx)
Default in 9.5 Yes No
Access Data Engine required No Yes

 

Length and Limits

  Alteryx Driver for Microsoft Excel Legacy Driver for Microsoft Excel
Maximum Number of Rows (limit specified by Excel) 1,048,575 1,048,575
Maximum Number of Columns (limit specified by Excel) 16,384 (1-2K recommended for performance) 255
Column name length and field name limits Limited only by string size; 256 recommended for compatibility with other formats. Column names are truncated to 64 characters.
Exceeding record limits (1,048,575 rows)(16,384 columns) Any remaining records/columns will be truncated. No file is written. Writes out the maximum number of records/columns and dumps the rest of the data.

 

Datatypes

  Alteryx Driver for Microsoft Excel Legacy Driver for Microsoft Excel
Parsing Datatypes If there is a mix of datatypes in a column the data is returned as string values. The first 16 rows are analyzed to determine datatypes for each column. Data of an incompatible type is lost (returned as NULL).
String Values Narrow strings are returned when possible. When reading ASCII data from Excel, this will be smaller in size than Microsoft Legacy driver. Only wide strings are returned.
Numbers Rounding The formatting assigned to the cell will be ignored. For example, if a cell contains the value 1.27452 and the cell is formatted to show 2 decimal places the full value of 1.27452 will be returned instead of 1.28. Data will be returned using the formatting assigned to the cell. For example, decimal points set to a number lower than the data contains will result in data loss such as 1.27452 being returned as 1.28.
Date, Time, and DateTime All Dates and Times are displayed in Alteryx format. A Date datatype is formatted based on an analysis of the data in the column.
  1. If every value in the column is date only (e.g., 2014/11/04 or May 4th), then the datatype will be set to Date as YYYY-MM-DD.
  2. If every value in the column contains only time information, then the datatype will be set to time.
  3. Anything else and the datatype will be set to datetime.
  4. Custom date formats are handled in the same manner to return the correct datatype.
Date, DateTime, and Time datatypes are all formatted as DateTime.

 

Field Names

  Alteryx Driver for Microsoft Excel Legacy Driver for Microsoft Excel
Special characters in Sheet and Range Names

All characters are supported in Sheet Names. If Excel allows the character to be used when naming a sheet, the new driver is able to read it.

This change could impact workflows created in versions previous to 9.5. If you have Inputs pointing to xlsx files that were created by an Output tool that handled special characters, the sheet names will be different.

Certain special characters are replaced in Sheet names:
  • Back tick character " ` " is replaced with underscore ' _ '
  • Exclamation point " ! " is replaced by underscore ' _ '
  • Space character is replaced by underscore ' _ ' Period " . " is replaced by pound sign (or hash tag) ' # '
  • A sheet cannot be opened if it contains an exclamation point or pound sign character in the name.
Duplicate Field/Column Names The following column names in the original Excel file:

abc, abc, abc, 123, 123, 123

will be read as follows:

abc, abc2, abc3, 123, 123_2, 123_3

This change could impact workflows created in versions previous to 9.5.

The following column names in the original Excel file:

abc, abc, abc, 123, 123, 123

will be read as follows:

abc, abc1, abc2, 123, 1231, 1232

Column names formatted as Dates

Column names formatted as dates in the original Excel file:

1/1/2014, 2/1/2014, 3/1/2014

will be read as follows:

2014-01-01, 2014-02-01, 2014-03-01

This change could impact workflows created in versions previous to 9.5.

Column names formatted as dates in the original Excel file:

1/1/2014, 2/1/2014, 3/1/2014

will be read as such.

Append Field Map All fields must be mapped. When only some, not all fields are mapped to a custom field map, an error will result and the file will not be written. When only some, not all fields are mapped to a custom field map, a file is written by appending the mapped fields and including NULL value for the unmapped fields.

 

Using Query Builder

  Alteryx Driver for Microsoft Excel Legacy Driver for Microsoft Excel
SQL Editor

You can type things like this in the SQL Editor window:

SELECT * FROM `Sheet1$`

`Sheet1$`SELECT * FROM `Sheet1$A1:D4`

`Sheet1$A1:D4`

`$A1:D4`

No

 

 

Ranges

  Alteryx Driver for Microsoft Excel Legacy Driver for Microsoft Excel
Read Support for Named and Explicit Ranges Yes Yes
Write Support for Named and Explicit Ranges No Yes
Create separate Named Ranges for Sheets No Yes

 

Bug Fixes

  Alteryx Driver for Microsoft Excel Legacy Driver for Microsoft Excel
Return all Int64 datatypes when First Row Contains Data checkbox is checked Yes No