Earlier Releases of Dataprep by Trifacta

This section contains an archive of release notes for previous releases of Dataprep by Trifacta.

For the latest release notes, see Release Notes for Dataprep by Trifacta.

May 13, 2022

Release 9.2 - push 2

Changes in System Behavior

Intelligent caching:

Due to technical issues, the intelligent caching of recipe steps feature for performance improvements has been disabled.

Note

This feature is in Beta release.

When the technical issues are addressed, this feature will be enabled.

April 20, 2022

Release 9.2

What's New

Lock/unlock column data type:

You can now lock or unlock a column's data type. When the data type is locked, the Trifacta Application no longer attempts to infer the column's data type when subsequent recipe steps are applied.

Tip

You can unlock an individual column's data type through the column menu. To the left of the column name, click the icon and select Automatically update to change the column's data type. For more information, see Column Menus.

Tip

As an early step in your recipe, you can use the Advanced column selector in the Change column data type transformation to specify locking of the data types for all columns.

For more information, see Change Column Data Type.

Connectivity:

Early Preview (read-only) connections available with this release:

Note

This feature may not be available in all product editions. For more information on available features, see Compare Editions.

Connectivity:

  • Google Analytics connections are now generally available and supported on Dataprep by Trifacta.

Publish Array data type as arrays to BigQuery:

You can now publish Dataprep by Trifacta Array data type as BigQuery arrays.

Parameterize data in hidden folders:

Note

This feature may not be available in all product editions. For more information on available features, see Compare Editions.

Optionally, you can scan hidden folders for wildcard- or pattern-based matches when building your parameterized imported datasets.

Tip

This capability can be useful for creating imported datasets from profiles generated as part of job runs. These profiles are stored in the .profiler hidden directory where the job results are published.

Note

This feature is disabled by default. It can be enabled by an administrator.

Note

Scanning hidden folders may impact performance. For existing imported datasets with parameters, you should enable the inclusion of hidden folders on individual datasets and run a test job to evaluate impact.

For more information on including hidden files, see Dataprep Project Settings Page.

For more information on creating datasets with parameters from files, see Parameterize Files for Import.
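
As a rough illustration of the opt-in behavior described above, the following Python sketch filters a list of paths by a wildcard pattern and skips anything under a dot-prefixed folder unless hidden folders are explicitly included. The function names, the `include_hidden` flag, and the sample paths are hypothetical. This is not the product's matching logic, only a sketch of the idea.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath


def is_in_hidden_folder(path: str) -> bool:
    """Return True if any parent folder of the path starts with a dot."""
    return any(part.startswith(".") for part in PurePosixPath(path).parent.parts)


def match_files(paths, pattern, include_hidden=False):
    """Select paths matching a wildcard pattern.

    Paths under hidden folders are skipped unless include_hidden is True
    (hypothetical flag standing in for the administrator-enabled setting).
    """
    return [
        p for p in paths
        if fnmatchcase(p, pattern) and (include_hidden or not is_in_hidden_folder(p))
    ]


# Hypothetical job output layout with profiles in a .profiler hidden directory.
paths = [
    "results/job1/output.csv",
    "results/job1/.profiler/profile.json",
    "results/job2/.profiler/profile.json",
]

print(match_files(paths, "results/*/.profiler/*.json"))
print(match_files(paths, "results/*/.profiler/*.json", include_hidden=True))
```

With the flag off, the first call returns no matches. With it on, both profile files are returned. The extra matches also show why scanning hidden folders can add work for existing parameterized datasets.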

Simplified permissions for publishing to BigQuery:

By default, Dataprep by Trifacta requires that the bigquery.datasets.create permission is enabled for each user of the product to run Dataflow jobs on BigQuery data sources. In some environments, this permission cannot be provided to users, and these Dataflow jobs fail.

As a workaround, you can provide to Dataprep by Trifacta a pre-existing BigQuery dataset, in which intermediate query results can be stored. When this dataset is provided to the Trifacta Application, temporary tables are created within it as part of Dataflow job execution, and the bigquery.datasets.create permission is not required.

Note

This BigQuery dataset must be created outside of Dataprep by Trifacta by your BigQuery administrator and must be located in the same region as your BigQuery source tables.

For more information on configuring the BigQuery temp dataset for the Trifacta Application, see Dataprep Project Settings Page.

Documentation:

Published a documented solution for integrating Dataprep by Trifacta with your Virtual Private Cloud Service Controls (VPC SC). For more information on this integration, see Configure VPC-SC Perimeter.

Changes in System Behavior

Set column data type transformation locks the column's type by default:

Starting in this release, the column data type is locked by default when you change the column data type.

Note

This change in behavior does not affect recipe steps that were defined before this release. Column data types continue to be re-inferred after those recipe steps. For those steps, you can edit them and mark them as locking the data type, if preferred.

If required, you can unlock the column's data type. For more information, see Change Column Data Type.
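
The effect of locking can be pictured with a small Python sketch: after a recipe step, types are re-inferred only for columns that are not locked. The `Column` class, the toy `infer_type` rule, and the sample values are assumptions made for illustration and do not reflect the application's actual inference logic.

```python
from dataclasses import dataclass


@dataclass
class Column:
    name: str
    data_type: str
    locked: bool = False


def infer_type(values):
    """Toy inference rule: Integer if every value parses as an int, else String."""
    try:
        for value in values:
            int(value)
        return "Integer"
    except ValueError:
        return "String"


def reinfer_types(columns, data):
    """Re-infer types after a recipe step, leaving locked columns untouched."""
    for column in columns:
        if not column.locked:
            column.data_type = infer_type(data[column.name])
    return columns


data = {"zip": ["02134", "10001"], "qty": ["3", "7"]}
columns = [
    Column("zip", "String", locked=True),  # set explicitly, so the type is locked
    Column("qty", "String"),               # unlocked, so it is re-inferred
]

for column in reinfer_types(columns, data):
    print(column.name, column.data_type)
```

In this sketch the locked `zip` column stays String even though its values look numeric, while the unlocked `qty` column is re-inferred as Integer.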

Connectivity:

  • The Google Analytics connection type now supports the UniversalAnalytics schema.

    Note

    Previously, this schema was called GoogleAnalytics by the driver vendor. You may need to update your custom SQL queries to reference this new schema name.
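
    If you keep custom SQL in text form, a small script can help locate and update schema-qualified references. The following Python sketch is only an illustration: the `rename_schema` helper and the `Traffic` table name are hypothetical, and the pattern assumes the schema name appears immediately before a dot, with or without double quotes.

```python
import re


def rename_schema(sql, old="GoogleAnalytics", new="UniversalAnalytics"):
    """Rewrite schema-qualified references such as GoogleAnalytics.Traffic.

    Only occurrences followed by a dot are changed, so string literals
    containing the old name are left alone.
    """
    pattern = re.compile(r'(?<![\w.])("?)' + re.escape(old) + r'\1(?=\s*\.)')
    return pattern.sub(lambda m: f"{m.group(1)}{new}{m.group(1)}", sql)


# "Traffic" is a made-up table name used only for this example.
print(rename_schema("SELECT * FROM GoogleAnalytics.Traffic"))
print(rename_schema('SELECT * FROM "GoogleAnalytics"."Traffic"'))
print(rename_schema("SELECT * FROM t WHERE source = 'GoogleAnalytics'"))
```

    Review the rewritten queries before saving them, since a text-level replacement cannot account for every SQL construct.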

Generate an initial sample:

When generating an initial sample from a set of files in a directory, the maximum number of files that can be read is now limited to 50.

  • Previously, the Trifacta Application read files until either 10MB of data or all matching files had been scanned.

  • This change limits the number of files that must be read for various operations in the Transformer page. It applies only to generating the initial sample type. Other sampling types, such as random sample, can scan the full set of files.

As needed, an administrator can change this maximum limit.
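
The file limit can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the release note confirms only the 50-file cap, so keeping the earlier 10MB data cap alongside it is an assumption, and the function and constant names are hypothetical.

```python
MAX_FILES = 50                       # limit from the release note; adjustable by an administrator
MAX_SAMPLE_BYTES = 10 * 1024 * 1024  # assumption: the earlier 10MB data cap still applies


def select_files_for_initial_sample(files, max_files=MAX_FILES, max_bytes=MAX_SAMPLE_BYTES):
    """Pick files to read for an initial sample.

    files is an iterable of (name, size_in_bytes) pairs in scan order.
    Reading stops once max_files files are selected, max_bytes of data
    has been covered, or the files run out.
    """
    selected = []
    total_bytes = 0
    for name, size in files:
        if len(selected) >= max_files or total_bytes >= max_bytes:
            break
        selected.append(name)
        total_bytes += size
    return selected


# 200 small files: the file-count limit applies first.
small = [(f"part-{i:03d}.csv", 1024) for i in range(200)]
print(len(select_files_for_initial_sample(small)))  # 50

# Three 6MB files: the data cap applies first.
large = [(f"big-{i}.csv", 6 * 1024 * 1024) for i in range(3)]
print(select_files_for_initial_sample(large))  # ['big-0.csv', 'big-1.csv']
```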

Performance:

The intelligent caching of recipe steps feature for performance improvements has been made available again. The issues that required removing it from the platform have been addressed.

Note

This feature is in Beta release.

This feature can be enabled by an administrator.

For more information, see Dataprep Project Settings Page.

Email notifications:

In a future release, the setting for email notifications based on job success will default to Default (Any Jobs) at the project or workspace level and at the flow level. This change means that the user who executes a job and others who have access to the flow receive, by default, an email notification whenever a job executes for flows where email notification settings have never been modified. As part of this change, each email will contain a richer set of information about the job that was executed.

If needed, this new default setting can be modified:

Deprecated

None.

Key Bug Fixes

Ticket | Description
TD-70522 | Cannot import converted files such as Excel, PDF, or JSON through SFTP connections.
TD-69279 | Test Connection button fails with a ValidationFailed error when editing a working connection configured with SSH tunneling.
TD-66185 | Flatten transformation cannot handle multi-character delimiters.

New Known Issues

Ticket | Description
TD-70326 | The warning "A newer version of the SDK family exists and updating is recommended" appears for Apache Beam in the Dataflow job screen. Tip: The Apache Beam upgrade to address this issue is in active planning and execution. This issue has no impact on the execution of Dataflow jobs. When the upgrade is complete, the message will no longer appear.
TD-69813 | Dataprep by Trifacta Array type columns in datasets that were imported before Release 9.2 are still published as String type. Tip: You can create a new imported dataset from the same source to publish those columns as BigQuery arrays.

March 15, 2022

Release 9.1

What's New

Encryption:

  • Support for use of customer-managed encryption keys (CMEK) during Dataflow job execution. The Trifacta Application can also check for use of CMEKs before writing results to BigQuery or Cloud Storage.

    Warning

    Private Preview: This feature is disabled by default. For more information on enabling this feature in your project, please contact Alteryx Support.

    Note

    This feature may not be available in all product editions. For more information on available features, see Compare Editions.

JavaScript User Defined Functions:

  • Create user-defined functions (UDFs) in JavaScript and upload them to your project for use in your recipe steps. JavaScript UDFs enable users to create customized and consistent functions to meet their specific requirements.

    Note

    This feature is in Beta release.

    Note

    This feature may not be available in all product editions. For more information on available features, see Compare Editions.

Connectivity:

  • Connectivity between the Trifacta Application and your cloud databases using SSH tunneling is generally available with this release.

    Tip

    This feature is now generally available.

    Note

    For this release, SSH tunneling can be enabled on the following connection types: Oracle Database, PostgreSQL, MySQL, and Microsoft SQL Server.

    For more information, see Configure SSH Tunnel Connectivity.

Connectivity:

Early Preview (read-only) connections available with this release:

Note

This feature may not be available in all product editions. For more information on available features, see Compare Editions.

Job execution:

The Trifacta Application can check for changes to your dataset's schemas before jobs are executed and optionally halt job execution to prevent data corruption.

  • These options can be configured by a project administrator.

    Note

    This feature may not be available in all product editions. For more information on available features, see Compare Editions.

    For more information, see Dataprep Project Settings Page.

Tip

Schema validation can be overridden for individual jobs. For more information, see Run Job Page.
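
Conceptually, this kind of pre-execution check compares the schema recorded for a dataset with the schema currently found at the source. The Python sketch below shows one way to express that comparison. The function names, the `halt_on_change` and `skip_validation` flags, and the example schemas are hypothetical and do not represent the product's implementation or settings.

```python
class SchemaChangedError(Exception):
    """Raised when a job is halted because the source schema has changed."""


def diff_schema(stored, current):
    """Compare two {column_name: data_type} mappings."""
    shared = set(stored) & set(current)
    return {
        "added": sorted(set(current) - set(stored)),
        "removed": sorted(set(stored) - set(current)),
        "retyped": sorted(c for c in shared if stored[c] != current[c]),
    }


def check_before_run(stored, current, halt_on_change=True, skip_validation=False):
    """Return detected changes, or raise if the job should be halted.

    skip_validation models a per-job override of the check.
    """
    if skip_validation:
        return None
    changes = diff_schema(stored, current)
    if halt_on_change and any(changes.values()):
        raise SchemaChangedError(changes)
    return changes


stored = {"id": "Integer", "name": "String", "price": "Decimal"}
current = {"id": "Integer", "name": "String", "price": "String", "sku": "String"}

print(diff_schema(stored, current))
# {'added': ['sku'], 'removed': [], 'retyped': ['price']}
```

Halting on any difference is the conservative choice. Passing `halt_on_change=False` in this sketch reports the changes without stopping the job.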

Dataset configuration:

For an imported dataset, you can configure settings through a new interface, including column names and column data types to use in the Trifacta Application.

Note

This experimental feature is intended for demonstration purposes only. This feature may be modified or removed from the Dataprep by Trifacta platform without warning in a future release. It should not be deployed in a production environment.

Note

This feature is part of a larger effort to improve how data is imported into the Trifacta Application. This feature must be enabled by a workspace administrator.

Sample Job IDs:

When a sample is collected, a job ID is generated and displayed in the Trifacta Application. These job IDs enable you to identify the sample jobs.

Import:

For long-loading Parquet datasets, you can monitor the ingest process as you continue your work.

Note

This feature is in Beta release.

For more information, see Flow View Page.

Changes in System Behavior

Publishing:

Beginning in this release, you can publish Dataprep by Trifacta Array type columns to BigQuery as BigQuery arrays for Alteryx primitive data types. Arrays containing non-primitive data types continue to be published as String values.
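
The distinction between primitive and non-primitive element types can be illustrated with a short Python sketch. The set of types treated as primitive, the fallback for mixed or empty arrays, and the JSON string encoding are all assumptions made for illustration. The release note does not specify them.

```python
import json

# Assumed mapping of primitive element types to BigQuery element types.
PRIMITIVE_TO_BQ = {bool: "BOOL", int: "INT64", float: "FLOAT64", str: "STRING"}


def bigquery_target(values):
    """Return (column_type, published_value) for one array value.

    Arrays whose elements all share one primitive type are published as a
    BigQuery array. Anything else (nested arrays, objects, mixed or empty
    arrays in this sketch) falls back to a String holding JSON text.
    """
    element_types = {type(v) for v in values}
    if len(element_types) == 1:
        (element_type,) = element_types
        if element_type in PRIMITIVE_TO_BQ:
            return f"ARRAY<{PRIMITIVE_TO_BQ[element_type]}>", list(values)
    return "STRING", json.dumps(values)


print(bigquery_target([1, 2, 3]))         # ('ARRAY<INT64>', [1, 2, 3])
print(bigquery_target(["a", "b"]))        # ('ARRAY<STRING>', ['a', 'b'])
print(bigquery_target([[1, 2], [3]]))     # ('STRING', '[[1, 2], [3]]')
```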

Performance:

A recent release introduced improved performance through intelligent caching of recipe steps.

  • This feature was released as a Beta feature.

  • Due to some recently discovered issues, this feature has been disabled for the time being. It cannot be enabled by a workspace administrator at this time.

    Note

    If this Beta feature had been enabled in your environment, you may experience a reduction in performance when moving between recipe steps in the Transformer page.

  • The feature will be re-enabled in a future release.

Deprecated

None.

Key Bug Fixes

Ticket | Description
TD-60881 | For ADLS datasets, parameter indicators in Flow View are shifted by one character.

New Known Issues

None.

February 9, 2022

Release 9.0

What's New

JavaScript User Defined Functions:

Create user-defined functions (UDFs) in JavaScript and upload them to your project for use in your recipe steps. JavaScript UDFs enable users to create customized and consistent functions to meet their specific requirements.

Note

This feature may not be available in all product editions. For more information on available features, see Compare Editions.

Warning

This feature is disabled by default. For more information on enabling JavaScript UDFs in your project, please contact Alteryx Support.

For more information, see JavaScript UDFs.

When enabled, JavaScript UDFs are defined through the Library page. For more information, see User Defined Functions Page.

Connectivity:

Build connections to accessible REST API endpoints.

Warning

This feature is disabled by default. For more information about enabling REST API connectivity in your environment, please contact Alteryx Support.

Note

This feature may not be available in all product editions. For more information on available features, see Compare Editions.

For more information, see REST API Connections.

Connectivity:

Early Preview (read-only) connections available with this release:

Note

This feature may not be available in all product editions. For more information on available features, see Compare Editions.

Dataset Schema Refresh:

You can now refresh your imported datasets with the current schema information from the source file or table. Schema refresh enables you to capture any changes to the columns in your dataset.

Note

This feature may not be available in all product editions. For more information on available features, see Compare Editions.

Changes in System Behavior

None.

Deprecated

None.

Key Bug Fixes

Ticket | Description
TD-68162 | Flow parameters cannot be displayed or edited in the Transformer page and cannot be embedded in recipe steps.

New Known Issues

None.

January 27, 2022

Release 8.11 - push 2

What's New

None.

Changes in System Behavior

None.

Deprecated

None.

Key Bug Fixes

Ticket | Description
TD-68162 | Flow parameters cannot be displayed or edited in the Transformer page and cannot be embedded in recipe steps.

New Known Issues

None.

January 20, 2022

Release 8.11

What's New

BigQuery Running Environment:

Beginning in this release, sampling jobs can be executed in BigQuery.

Connectivity:

Early Preview (read-only) connections available with this release:

Note

This feature may not be available in all product editions. For more information on available features, see Compare Editions.

Session Management:

You can view the current and recent sessions of the Trifacta Application. You can review the devices that are authorized and revoke any unfamiliar devices.

Performance:

  • Improved performance during design time through intelligent caching of recipe steps.

    Note

    This feature is in Beta release.

  • Improvements in job execution performance, due to skipping some output validation steps for file-based outputs.

    Note

    When scheduled or API jobs are executed, no validation is performed on any writesettings objects. Issues with these objects may cause failures during the transformation or publishing stages of job execution.

Changes in System Behavior

Sample sizes can be increased up to 40MB

Note

This feature may not be available in all product editions. For more information on available features, see Compare Editions.

Prior to this release, the size of a sample was capped at 10MB. This size represented:

  • the actual size of the sample object stored in the base storage layer

  • the default maximum size of the sample displayed in the Trifacta Application. This sample size can be reduced from 10MB, if needed.

Beginning in this release:

  • The actual size of the stored sample has increased to 40MB.

    Note

    On backend storage, sample sizes are now four times larger than in previous releases. For data sources that require decompression or conversion, actual storage sizes may exceed this 40MB limit.

  • The size of the sample displayed for a recipe can be configured to be up to 40MB in size by individual users.

For more information, see Change Recipe Sample Size.

Data type mismatches can now be written out in CSV format

Beginning in this release, mismatched values in CSV outputs are written as regular values by default. In prior releases, mismatched values were written as null values in CSV outputs.

See Improvements to the Type System.
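
The difference between the two behaviors can be sketched in Python with the standard `csv` module. The column layout, the integer check, the `mismatches_as_null` flag, and the use of an empty field to represent null are assumptions made for illustration.

```python
import csv
import io


def is_valid_integer(value):
    """Return True if the value matches the column's (assumed) Integer type."""
    try:
        int(value)
        return True
    except ValueError:
        return False


def write_csv(rows, mismatches_as_null=False):
    """Write (name, quantity) rows, where quantity is typed as Integer.

    mismatches_as_null=True models the earlier behavior, where values that
    do not match the column type were written as nulls (empty fields here).
    The default models the current behavior: values are written as-is.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for name, quantity in rows:
        if mismatches_as_null and not is_valid_integer(quantity):
            quantity = ""
        writer.writerow([name, quantity])
    return buffer.getvalue()


rows = [("apple", "3"), ("pear", "three")]
print(write_csv(rows))                           # apple,3 / pear,three
print(write_csv(rows, mismatches_as_null=True))  # apple,3 / pear,
```

Downstream consumers that relied on nulls to detect bad values may need their own validation step once mismatched values are passed through unchanged.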

Deprecated

None.

Key Bug Fixes

None.

New Known Issues

Ticket | Description
TD-68162 | Flow parameters cannot be displayed or edited in the Transformer page and cannot be embedded in recipe steps. Tip: To edit your flow parameters, select Parameters from the Flow View context menu. Note: There is no current workaround for embedding in recipe steps. While your existing parameters should continue to work at execution time, avoid changing names of your flow parameters or editing recipe steps in which they are referenced. New flow parameters cannot be used in recipes at this time.