Improvements to the Type System

This section provides information on improvements to the Alteryx type system.

Release 8.11

Mismatched values are no longer published as null values in CSV outputs

In prior releases, when a file was published in CSV format, any values that were mismatches for a column's data type were written as null values, which could lead to loss of data that was meaningful to downstream systems.

Beginning in this release, mismatched values are written out in CSV format as String values by default.

Note

The ability to write out mismatched values in CSV outputs is enabled by default in new flows and CSV publishing actions. For existing CSV outputs, the prior behavior is maintained.

Note

This capability applies only to CSV outputs at this time. In the future, it will be applied to other non-schematized outputs, such as JSON.

Tip

When visual profiling is enabled, you can still identify the values in the generated results that are mismatched for their column data types.

As needed, you can configure the ability to write out mismatches for CSV outputs for individual publishing actions. For more information, see File Settings.

Release 8.7

Snowflake date publishing improvements

In prior releases, the Trifacta Application did not fully support Datetime values on publication. It published Date type values to Snowflake as follows:

Publishing Action→	Create/Drop Table	Append/Truncate
TargetTable→		Date	Datetime
Date	DateTime Appends 00:00:00	Append 00:00:00	Append 00:00:00

Beginning in this release, the Trifacta Application publishes Date values as follows:

Publishing Action	Create/Drop	Append/Truncate
TargetTables→		Date	Datetime
Date	Date	Date	DatetimeAppends 00:00:00

For more information, see Snowflake Data Type Conversions.

Release 8.2

None.

Release 8.1

Running environments now support output of Datetime values as Datetime/Timestamp values in Parquet files

In prior releases, Datetime values in the Trifacta Application were written as String values for Parquet outputs.

Beginning in this release, you can optionally enable the generation of Datetime/Timestamp values as the outputs of Datetime columns.

Note

To ensure consistency with prior releases, this feature is disabled by default.

For more information on enabling this feature, see Miscellaneous Configuration.

Release 8.0

Data type inference and row split inference run on more data

When an dataset is imported into the Trifacta Application, a larger volume of data is read from it for the following processes:

Note

The following is applied to datasets that do not contain schema information.

Split row inference: Patterns in the data are used to determine the end of a row of data. When a larger volume of data is read, there should be more potential rows to review, resulting in better precision on where to split the data into separate rows in the application.
Type inference: Patterns of data in the same column are used to determine the best Alteryx data type to assign to the imported dataset. A larger volume of data means that the application has more values for the same column from which to infer the appropriate data type.

Note

An increased data volume should result in a more accurate split row and data type inferencing. For pre-existing datasets, this increased volume may result in changes to the row and column type definitions when a dataset is imported.

Tip

For datasets that are demarcated by quoted values, you may experience a change in how columns are typed.

If you notice unexpected changes in column data types or in row splits in your datasets:

Type inference: You should move your recipe panel cursor to the top of the dataset to see if you must reassign data types.
Split row inference: Create a new imported dataset, disabling type inference in the import settings. Check the splitrows transform to see if it is splitting the rows appropriately. For more information, see Import Data Page.

Change Designer Cloud application load limits

As needed, you can modify the limits that the Trifacta Application uses during the data type and split row inference processes. For more information, see Configure Type Inference.

Release 7.5

PII - Improved and expanded matching for credit card numbers

In prior releases, PII for credit card numbers was identified base on 16-digit values.

In this release, the matching constraints have been expanded to include 14-digit credit card values.

Also, the constraints around valid 16-digit numbers have been improved with better recognition around values for different types of credit cards. In the following table, you can see lists of valid test numbers for different credit card types and can see how detection of these values has changed between releases.

Test Number	Credit Card Type	Is Detected 7.4?	Is detected 7.5?
2222 4053 4324 8877	Mastercard	No	Yes
2222 9909 0525 7051	Mastercard	No	Yes
2223 0076 4872 6984	Mastercard	No	Yes
2223 5771 2001 7656	Mastercard	No	Yes
5105 1051 0510 5100	Mastercard	Yes	Yes
5111 0100 3017 5156	Mastercard	Yes	Yes
5204 7400 0990 0014	Mastercard	Yes	Yes
5420 9238 7872 4339	Mastercard	Yes	Yes
5455 3307 6000 0018	Mastercard	Yes	Yes
5506 9004 9000 0444	Mastercard	Yes	Yes
5553 0422 4198 4105	Mastercard	Yes	Yes
5555 5555 5555 4444	Mastercard	Yes	Yes
4012 8888 8888 1881	Visa	Yes	Yes
4111 1111 1111 1111	Visa	Yes	Yes
6011 0009 9013 9424	Discover	Yes	Yes
6011 1111 1111 1117	Discover	Yes	Yes
3714 496353 98431	American Express	Yes	Yes
3782 822463 10005	American Express	Yes	Yes
3056 9309 0259 04	Diners	No	Yes
3852 0000 0232 37	Diners	No	Yes
3530 1113 3330 0000	JCB	Yes	Yes
3566 0020 2036 0505	JCB	Yes	Yes

For more information, see Credit Card Data Type.

For more information, see Data Type Validation Patterns.

Improved type inference for relational sources

In prior releases, when you generated outputs, the typecasting for the outputs was determined by the data types that were inferred by the application. So, if a column contained only "Yes" or "No" values, then the application is likely to have inferred the column data type as Boolean.

The above presented problems for relational sources for the following reasons:

Relational sources could include schema information, which should override what the application inferred. Type inferencing can optionally be disabled for schematized sources from with in the application. For more information on disabling type inference for a schematized source, see Import Data Page.
For some relational sources, datasets are ingested into the backend datastore and stored there as CSV files. These CSV files are then used as the source for the imported datasets. In these cases, the original source schema information was lost, which meant that the application's type inference was applied. This applied to the following data sources:
1. Snowflake
2. Redshift
3. SQL Datawarehouse
4. Alation
5. Waterline

Beginning in this release, the schemas from relational datasources that are ingested to the backend datastore are now used for generated outputs, unless the type was being forcibly set to something else during the recipe step. At the time of original import, the schema of the relational datasource is stored as part of the ingest process; this schema is stored separately.

Note

If you created recipe steps that forcibly change a column's data type from within the application to be different from the source data type of your relational source, you may need to revise these recipe steps or remove them altogether.

During publication, Designer Cloud Powered by Trifacta Enterprise Edition maps its internal data types to the data types of the publishing target using an internal mapping per vendor. For more information, see Type Conversions.

Release 6.0 and earlier

General Improvements in Typecasting

Mismatched data types

Where there are mismatches between inputs and the expected input data type, the following values are generated for the mismatches:

Source data type	Output if mismatched
Primitive data types: Integer Decimal Boolean Arrays Maps	null value, if mismatched
Datetime	null value, if mismatched
Other non-primitive data types, including: SSN Phone Number Email Address Credit Card Gender IP Address URL HTTP Code Zip Code	Converted to string values, if mismatched
String	Anything can be a String value.

State values and custom data types are converted to string values, if they are mismatched.

Three-value logic for null values

The Trifacta Photon running environment has been augmented to use three-value logic for null values.

When values are compared, the result can be true or false in most cases.

If a null value was compared to a null value in the Trifacta Photon running environment:

In Release 3.0 and earlier, this evaluated to true.
In Release 3.1 and later, this evaluates to an unknown (null) value.

This change aligns the behavior of the running environment with that of SQL and Hadoop Pig.

Improved handling of null values

Assume that the column nuller contains null values and that you have the following transform:

derive value:(nuller >= 0)

Prior to Release 3.1, the above transform generated a column of true values.

In Release 3.1 and later, the transform generates a column of null values.

More consistent evaluation of null values in ternaries

In the following example, a_null_expression always evaluates to a null value.

derive value: (a_null_expression ? 'a' : 'b')

In Release 3.0, this expression generated b for all inputs on the Trifacta Photon running environment and a null value on Hadoop Pig.

In Release 3.1 and later, this expression generates a null value for all inputs on both running environments.

Tip

Beginning in Release 3.1, you can use the if function instead of ternary expressions. Ternaries may be deprecated at some point in the future. For more information, see IF Function.

For example, you have the following dataset:

MyStringCol
This works.
You can't break this.
Not broken yet.

You test each row for the presence of the string can't:

derive value: if(find(MyStringCol, 'can\'t',true,0) > -1, true, false) as:'MyFindResults'

The above transform results in the following:

MyStringCol	MyFindResults
This works.
You can't break this.	true
Not broken yet.

In this case, the value of false is not written to the other columns, since the find function returns a null value. This null value, in turn, nullifies the entire expression, resulting in a null value written in the new column.

You can use the following to locate the null values:

derive value:isnull(MyFindResults) as:'nullInMyFindResults'

Datetime changes

Raw date and time values must be properly formatted

Note

Upgraded recipes continue to function properly. However, if you edit the recipe step in an upgraded system, you are forced to fix the formatting issue before saving the change.

Before this release, you could create a transform like the following:

derive value:date(2016,2,15)

This transform generated a column of map values, like the following:

{"year":"2016","month":"2","date":"15"}

Beginning this release, the above command is invalid, as the date values must be properly formatted prior to display. The following works:

derive value:dateformat(date(2016,2,15),'yyyy-MM-dd')

This transform generates a column of Datetime values in the following format:

2016-02-15

Time:

Before this release:

derive value:time(11,34,58)

Prior release output:

{"hours":"11","minutes":"34","seconds":"58"}

This release:

derive value:dateformat(time(11,34,58), 'HH-mm-ss')

This release's output:

11-34-58

See DATEFORMAT Function.
See UNIXTIMEFORMAT Function.

Date formatting functions supports 12-hour time only if AM/PM indicator is included

Beginning in this release, the unixtimeformat and dateformat functions requires an AM/PM indicator (a) if the date formatting string uses a 12-hour time indicator (h or hh).

Valid for earlier releases:

derive value: unixtimeformat(myDate, 'yyyy-MM-dd hh:mm:ss') as:'myUnixDate'

Valid for this release and later:

derive value: unixtimeformat(myDate, 'yyyy-MM-dd hh:mm:ss a') as:'myUnixDate'

These references in recipes fail to validate in this release or later and must be fixed.

See DATEFORMAT Function.
See Unixtimeformat Function.

Un-inferrable formats from dateformat and unixtimeformat functions are written as strings

If a formatting string is not a datetime format recognized by Designer Cloud Powered by Trifacta Enterprise Edition, the output is generated as a string value.

This change was made to provide clarity to some ambiguous conditions.

Colon as a delimiter for date values is no longer supported

Beginning in this release, the colon (:) is no longer supported as a delimiter for date values. It is still supported for time values.

myDateValue	Recognized?
02:03:2016	No
02:03:16	Recognized as a time value

When data such as the above is imported, it may not be initially recognized by the Trifacta Application as Datetime type.

To fix, you might apply the following transform:

replace col:myDateValue with:'-' on:`-` global:true

The new column values are more likely to be inferred as Datetime values. If not, you can choose the appropriate Datetime format from the data type drop-down for the column. See Data Grid Panel.

In this section:

Improvements to the Type System

Release 8.11

Mismatched values are no longer published as null values in CSV outputs

Release 8.7

Snowflake date publishing improvements

Release 8.2

Release 8.1

Running environments now support output of Datetime values as Datetime/Timestamp values in Parquet files

Release 8.0

Data type inference and row split inference run on more data

Change Designer Cloud application load limits

Release 7.5

PII - Improved matching for social security numbers

PII - Improved and expanded matching for credit card numbers

Improved type inference for relational sources

Release 6.0 and earlier

General Improvements in Typecasting

Mismatched data types

Three-value logic for null values

Improved handling of null values

More consistent evaluation of null values in ternaries

Datetime changes

Raw date and time values must be properly formatted

Date formatting functions supports 12-hour time only if AM/PM indicator is included

Un-inferrable formats from dateformat and unixtimeformat functions are written as strings

Colon as a delimiter for date values is no longer supported

Search results