Earlier Releases of Dataprep by Trifacta
This section contains an archive of release notes for previous releases of Dataprep by Trifacta.
For the latest release notes, see Release Notes for Dataprep by Trifacta.
May 13, 2022
Release 9.2 - push 2
Changes in System Behavior
Intelligent caching:
Due to technical issues, the intelligent caching of recipe steps feature for performance improvements has been disabled.
Note
This feature is in Beta release.
When the technical issues are addressed, this feature will be enabled.
April 20, 2022
Release 9.2
What's New
Lock/unlock column data type:
You can now lock or unlock a column's data type. When the data type is locked, the Trifacta Application no longer attempts to infer the column's data type when subsequent recipe steps are applied.
Tip
You can unlock an individual column's data type through the column menu. To the left of the column name, click the icon and select Automatically update to change the column's data type. For more information, see Column Menus.
Tip
As an early step in your recipe, you can use the Advanced column selector in the Change column data type transformation to specify locking of the data types for all columns.
For more information, see Change Column Data Type.
Connectivity:
Early Preview (read-only) connections available with this release:
Note
This feature may not be available in all product editions. For more information on available features, see Compare Editions.
Marketo
Note
Marketo connections require an OAuth 2.0 client to be created in the Trifacta Application. For more information, see OAuth 2.0 for Marketo.
For more information, see Early Preview Connection Types.
Connectivity:
Google Analytics connections are now generally available and supported on Dataprep by Trifacta.
For more information on creating the connection object, see Google Analytics Connections.
Publish Array data type as arrays to BigQuery:
You can now publish Dataprep by Trifacta Array data type as BigQuery arrays.
For more information, see Publishing Actions.
For more information, see BigQuery Data Type Conversions.
Parameterize data in hidden folders:
Note
This feature may not be available in all product editions. For more information on available features, see Compare Editions.
Optionally, you can scan hidden folders for wildcard- or pattern-based matches when building your parameterized imported datasets.
Tip
This capability can be useful for creating imported datasets from profiles generated as part of job runs. These profiles are stored in the .profiler hidden directory where the job results are published.
Note
This feature is disabled by default. It can be enabled by an administrator.
Note
Scanning hidden folders may impact performance. For existing imported datasets with parameters, you should enable the inclusion of hidden folders on individual datasets and run a test job to evaluate impact.
For more information on including hidden files, see Dataprep Project Settings Page.
For more information on creating datasets with parameters from files, see Parameterize Files for Import.
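The effect of including hidden folders in a wildcard scan can be illustrated with a short Python sketch. This is not the product's implementation: the function names, the `include_hidden` flag, and the sample paths are hypothetical, and the sketch assumes that a hidden folder is any folder whose name begins with a dot, such as `.profiler`.

```python
from fnmatch import fnmatch
from pathlib import PurePosixPath


def is_hidden(path: str) -> bool:
    """Return True if any folder in the path starts with a dot (assumed convention)."""
    folders = PurePosixPath(path).parts[:-1]
    return any(folder.startswith(".") for folder in folders)


def match_files(paths, pattern, include_hidden=False):
    """Return the paths that match a wildcard pattern.

    Paths inside hidden folders are skipped unless include_hidden is True,
    mirroring a setting that is disabled by default.
    """
    matched = []
    for path in paths:
        if is_hidden(path) and not include_hidden:
            continue
        if fnmatch(path, pattern):
            matched.append(path)
    return matched


# Hypothetical file listing for two job runs.
files = [
    "results/job1/output.csv",
    "results/job1/.profiler/profile.json",
    "results/job2/.profiler/profile.json",
]

print(match_files(files, "results/*/.profiler/*.json"))
print(match_files(files, "results/*/.profiler/*.json", include_hidden=True))
```

With the flag off, the profile files are never considered. With it on, every hidden folder must be examined as well, which is one reason a scan may take longer.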
Simplified permissions for publishing to BigQuery:
By default, Dataprep by Trifacta requires that the bigquery.datasets.create permission is enabled for each user of the product to run Dataflow jobs on BigQuery data sources. In some environments, this permission cannot be provided to users, and these Dataflow jobs fail.
As a workaround, you can provide to Dataprep by Trifacta a pre-existing BigQuery dataset, in which intermediate query results can be stored. When this dataset is provided to the Trifacta Application, temporary tables are created within it as part of Dataflow job execution, and the bigquery.datasets.create permission is not required.
Note
This BigQuery dataset must be created outside of Dataprep by Trifacta by your BigQuery administrator and must be located in the same region as your BigQuery source tables.
For more information on configuring the BigQuery temp dataset for the Trifacta Application, see Dataprep Project Settings Page.
Documentation:
Published a documented solution for integrating Dataprep by Trifacta with your Virtual Private Cloud Service Controls (VPC-SC). For more information on this integration, see Configure VPC-SC Perimeter.
Changes in System Behavior
Set column data type transformation locks the column's type by default:
Starting in this release, the column data type is locked by default when you change the column data type.
Note
This change in behavior does not affect recipe steps that were defined before this release. Column data types continue to be re-inferred after those recipe steps. For those steps, you can edit them and mark them as locking the data type, if preferred.
If required, you can unlock the column's data type. For more information, see Change Column Data Type.
Connectivity:
The Google Analytics connection type now supports the UniversalAnalytics schema.
Note
Previously, this schema was called GoogleAnalytics by the driver vendor. You may need to update your custom SQL queries to reference this new schema name.
For more information, see Google Analytics Connections.
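If you maintain many custom SQL queries, the schema rename can be applied with a small script. The following Python sketch is one possible approach, not a supported tool: the `rename_schema` function and the `Traffic` table name are hypothetical. It rewrites the old schema name only where it is used as a qualifier (followed by a dot), so review the results before saving them, because a string literal that contains `GoogleAnalytics.` would also be rewritten.

```python
import re


def rename_schema(sql: str, old: str = "GoogleAnalytics", new: str = "UniversalAnalytics") -> str:
    """Replace a schema qualifier in a SQL string.

    The lookahead requires a dot after the name, optionally preceded by a
    closing identifier quote, so bare words and plain string values are left alone.
    """
    pattern = re.compile(r"\b" + re.escape(old) + r'(?=["\]`]?\s*\.)')
    return pattern.sub(new, sql)


# "Traffic" is a placeholder table name for illustration.
query = "SELECT * FROM GoogleAnalytics.Traffic WHERE Source = 'google'"
print(rename_schema(query))
```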
Generate an initial sample:
When generating an initial sample from a set of files in a directory, the maximum number of files that can be read is now limited to 50.
Previously, the Trifacta Application read files until either 10MB of data or all matching files had been scanned.
This change is to limit the number of files that must be read for various operations in the Transformer page. It only applies to generating the initial sample type. Other sampling types, such as random sample, can scan the full set of files.
As needed, an administrator can change this maximum limit.
For more information, see Dataprep Project Settings Page.
For more information on sampling, see Overview of Sampling.
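The difference between the previous and current file-selection behavior for the initial sample can be sketched as follows. This is an illustration only: the function name and parameters are hypothetical, 10MB is taken as 10 × 1024 × 1024 bytes, and the sketch assumes that the data-size threshold still applies alongside the new file limit, which these notes do not state.

```python
def files_for_initial_sample(files, max_files=50, max_bytes=10 * 1024 * 1024):
    """Pick the files to read for an initial sample.

    files: list of (path, size_in_bytes) tuples in scan order.
    max_files: file-count limit; None models the previous behavior (no limit).
    max_bytes: stop once this much data has been read (assumed to still apply).
    """
    selected = []
    total_bytes = 0
    for path, size in files:
        if max_files is not None and len(selected) >= max_files:
            break
        if total_bytes >= max_bytes:
            break
        selected.append(path)
        total_bytes += size
    return selected


# 200 small files of 1 KB each: far below the 10MB threshold in total.
small_files = [(f"part-{i:03d}.csv", 1024) for i in range(200)]

print(len(files_for_initial_sample(small_files)))                  # current behavior
print(len(files_for_initial_sample(small_files, max_files=None)))  # previous behavior
```

For a directory of many small files, the previous behavior opened every file, while the current behavior stops at the file limit.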
Performance:
The intelligent caching of recipe steps feature for performance improvements has been made available again. The issues that required removing it from the platform have been addressed.
Note
This feature is in Beta release.
This feature can be enabled by an administrator.
For more information, see Dataprep Project Settings Page.
Email notifications:
In a future release, the setting for email notifications based on job success will default to Default (Any Jobs) at the project or workspace level and at the flow level. This change means that the user who executes a job and others who have access to the flow receive, by default, an email notification whenever a job executes for flows where email notification settings have never been modified. As part of this change, each email will contain a richer set of information about the job that was executed.
If needed, this new default setting can be modified:
Project owners and administrators can change the default value of the email notification settings. For more information, see Dataprep Project Settings Page.
Individual users can override these settings for individual flows. For more information, see Manage Flow Notifications Dialog.
Deprecated
None.
Key Bug Fixes
Ticket | Description |
---|---|
TD-70522 | Cannot import converted files such as Excel, PDF, or JSON through SFTP connections. |
TD-69279 | Test Connection button fails with a ValidationFailed error when editing a working connection configured with SSH tunneling. |
TD-66185 | Flatten transformation cannot handle multi-character delimiters. |
New Known Issues
Ticket | Description |
---|---|
TD-70326 | Tip The Apache Beam upgrade to address this issue is in active planning and execution. This issue has no impact on the execution of Dataflow jobs. When the upgrade is complete, the message will be gone. |
TD-69813 | Dataprep by Trifacta Array type columns in datasets that were imported before Release 9.2 are still published as String type. Tip You can create a new imported dataset from the same source to publish those columns as BigQuery arrays. |
March 15, 2022
Release 9.1
What's New
Encryption:
Support for use of customer-managed encryption keys (CMEK) during Dataflow job execution. The Trifacta Application can also check for use of CMEKs before writing results to BigQuery or Cloud Storage.
Warning
Private Preview: This feature is disabled by default. For more information on enabling this feature in your project, please contact Alteryx Support.
Note
This feature may not be available in all product editions. For more information on available features, see Compare Editions.
For more information, see Dataflow Pipeline State CMEK.
For more information on enabling, see Dataprep Project Settings Page.
JavaScript User Defined Functions:
Create user-defined functions (UDFs) in JavaScript and upload them to your project for use in your recipe steps. JavaScript UDFs enable users to create customized and consistent functions to meet their specific requirements.
Note
This feature is in Beta release.
Note
This feature may not be available in all product editions. For more information on available features, see Compare Editions.
This feature is enabled by default. For more information, see Dataprep Project Settings Page.
For more information, see JavaScript UDFs.
When enabled, JavaScript UDFs are defined through the Library page. For more information, see User Defined Functions Page.
Connectivity:
Connectivity between the Trifacta Application and your cloud databases using SSH tunneling is generally available with this release.
Tip
This feature is now generally available.
Note
For this release, SSH tunneling can be enabled on the following connection types: Oracle Database, PostgreSQL, MySQL, and Microsoft SQL Server.
For more information, see Configure SSH Tunnel Connectivity.
Connectivity:
Early Preview (read-only) connections available with this release:
Note
This feature may not be available in all product editions. For more information on available features, see Compare Editions.
Instagram Ads
For more information, see Early Preview Connection Types.
Job execution:
The Trifacta Application can check for changes to your dataset's schemas before jobs are executed and optionally halt job execution to prevent data corruption.
These options can be configured by a project administrator.
Note
This feature may not be available in all product editions. For more information on available features, see Compare Editions.
For more information, see Dataprep Project Settings Page.
Tip
Schema validation can be overridden for individual jobs. For more information, see Run Job Page.
For more information, see Overview of Schema Management.
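The idea behind checking schemas before a job runs can be sketched in Python. This is a conceptual illustration, not the product's logic: the function names, the `halt_on_change` option, and the kinds of changes detected (added, removed, or retyped columns) are assumptions made for the example.

```python
class SchemaChangedError(Exception):
    """Raised when a schema change is detected and halting is enabled."""


def diff_schema(expected, current):
    """Compare two schemas given as lists of (column_name, data_type) tuples."""
    expected_cols = dict(expected)
    current_cols = dict(current)
    shared = expected_cols.keys() & current_cols.keys()
    return {
        "added": sorted(current_cols.keys() - expected_cols.keys()),
        "removed": sorted(expected_cols.keys() - current_cols.keys()),
        "retyped": sorted(c for c in shared if expected_cols[c] != current_cols[c]),
    }


def check_before_run(expected, current, halt_on_change=True):
    """Return the detected changes, or raise if changes exist and halting is enabled."""
    changes = diff_schema(expected, current)
    if any(changes.values()) and halt_on_change:
        raise SchemaChangedError(f"Schema changed since import: {changes}")
    return changes


# Hypothetical schemas: one column was dropped and one changed type at the source.
stored = [("id", "Integer"), ("name", "String"), ("amount", "Decimal")]
latest = [("id", "String"), ("name", "String")]

print(check_before_run(stored, latest, halt_on_change=False))
```

Setting `halt_on_change=False` corresponds loosely to overriding validation for an individual job: the changes are still reported, but execution continues.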
Dataset configuration:
For an imported dataset, you can configure settings through a new interface, including column names and column data types to use in the Trifacta Application.
Note
This experimental feature is intended for demonstration purposes only. This feature may be modified or removed from the Dataprep by Trifacta platform without warning in a future release. It should not be deployed in a production environment.
Note
This feature is part of a larger effort to improve how data is imported into the Trifacta Application. This feature must be enabled by a workspace administrator.
Sample Job IDs:
When a sample is collected, a job ID is generated and displayed in the Trifacta Application. These job IDs enable you to identify the sample jobs.
For more information, see Generate a Sample.
For more information, see Samples Panel.
For more information, see Sample Jobs Page.
Import:
For long-loading Parquet datasets, you can monitor the ingest process as you continue your work.
Note
This feature is in Beta release.
For more information, see Flow View Page.
Changes in System Behavior
Publishing:
Beginning in this release, you can publish Dataprep by Trifacta Array type columns to BigQuery as BigQuery arrays for Alteryx primitive data types. Arrays containing non-primitive data types continue to be published as String values.
For more information, see Improvements to the Type System.
This change can be reverted to previous String publishing behavior on individual outputs. See BigQuery Table Settings.
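The publishing rule described above can be approximated in a few lines of Python. This is a sketch under stated assumptions: the set of primitive types below is a stand-in (these notes do not list the primitive data types), the function name and the `publish_as_string` flag are hypothetical, and JSON is used only as an example of a String representation.

```python
import json

# Assumption: treat these Python types as stand-ins for primitive data types.
PRIMITIVES = (bool, int, float, str)


def to_publish_value(array, publish_as_string=False):
    """Return the value to publish for one Array cell.

    Arrays whose elements are all primitives are kept as arrays. Arrays with
    nested or other non-primitive elements, or any array when the per-output
    String option is set, are serialized to a String.
    """
    all_primitive = all(isinstance(item, PRIMITIVES) for item in array)
    if publish_as_string or not all_primitive:
        return json.dumps(array)
    return list(array)


print(to_publish_value([1, 2, 3]))
print(to_publish_value([1, [2, 3]]))
print(to_publish_value([1, 2, 3], publish_as_string=True))
```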
Performance:
A recent release introduced improved performance through intelligent caching of recipe steps.
This feature was released as a Beta feature.
Due to some recently discovered issues, this feature has been disabled for the time being. It cannot be enabled by a workspace administrator at this time.
Note
If this Beta feature had been enabled in your environment, you may experience a reduction in performance when moving between recipe steps in the Transformer page.
The feature will be re-enabled in a future release.
Deprecated
None.
Key Bug Fixes
Ticket | Description |
---|---|
TD-60881 | For ADLS datasets, parameter indicators in Flow View are shifted by one character. |
New Known Issues
None.
February 9, 2022
Release 9.0
What's New
JavaScript User Defined Functions:
Create user-defined functions (UDFs) in JavaScript and upload them to your project for use in your recipe steps. JavaScript UDFs enable users to create customized and consistent functions to meet their specific requirements.
Note
This feature may not be available in all product editions. For more information on available features, see Compare Editions.
Warning
This feature is disabled by default. For more information on enabling JavaScript UDFs in your project, please contact Alteryx Support.
For more information, see JavaScript UDFs.
When enabled, JavaScript UDFs are defined through the Library page. For more information, see User Defined Functions Page.
Connectivity:
Build connections to accessible REST API endpoints.
Warning
This feature is disabled by default. For more information about enabling REST API connectivity in your environment, please contact Alteryx Support.
Note
This feature may not be available in all product editions. For more information on available features, see Compare Editions.
For more information, see REST API Connections.
Connectivity:
Early Preview (read-only) connections available with this release:
Note
This feature may not be available in all product editions. For more information on available features, see Compare Editions.
LinkedIn Ads
Zendesk
For more information, see Early Preview Connection Types.
Dataset Schema Refresh:
You can now refresh your imported datasets with the current schema information from the source file or table. Schema refresh enables you to capture any changes to the columns in your dataset.
Note
This feature may not be available in all product editions. For more information on available features, see Compare Editions.
For more information, see Overview of Schema Management.
Dataset schemas can be refreshed through the following pages:
In the Dataset Details page. For more information, see Dataset Details Page.
In Flow View. For more information, see View for Imported Datasets.
Changes in System Behavior
None.
Deprecated
None.
Key Bug Fixes
Ticket | Description |
---|---|
TD-68162 | Flow parameters cannot be displayed or edited in the Transformer page and cannot be embedded in recipe steps. |
New Known Issues
None.
January 27, 2022
Release 8.11 - push 2
What's New
None.
Changes in System Behavior
None.
Deprecated
None.
Key Bug Fixes
Ticket | Description |
---|---|
TD-68162 | Flow parameters cannot be displayed or edited in the Transformer page and cannot be embedded in recipe steps. |
New Known Issues
None.
January 20, 2022
Release 8.11
What's New
BigQuery Running Environment:
Beginning in this release, sampling jobs can be executed in BigQuery.
For more information, see Flow Optimization Settings Dialog.
For more information, see BigQuery Running Environment.
Connectivity:
Early Preview (read-only) connections available with this release:
Note
This feature may not be available in all product editions. For more information on available features, see Compare Editions.
Asana
Exact Online
Facebook Ads
Jira by Atlassian
Pinterest
QuickBooks Online
Trino
For more information, see Early Preview Connection Types.
Session Management:
You can view the current and recent sessions of the Trifacta Application. You can review the devices that are authorized and revoke any unfamiliar devices.
For more information, see Preferences Page.
For more information, see Sessions Page.
Performance:
Improved performance during design time through intelligent caching of recipe steps.
Note
This feature is in Beta release.
A workspace administrator may need to enable this feature in your project. See Dataprep Project Settings Page.
Improvements in job execution performance, due to skipping some output validation steps for file-based outputs.
Note
When scheduled or API jobs are executed, no validations are performed of any writesettings objects. Issues with these objects may cause failures during transformation or publishing stages of job execution.
A workspace administrator may need to enable this feature in your project. See Dataprep Project Settings Page.
Changes in System Behavior
Sample sizes can be increased up to 40MB
Note
This feature may not be available in all product editions. For more information on available features, see Compare Editions.
Prior to this release, the size of a sample was capped at 10MB. This size represented:
the actual size of the sample object stored in the base storage layer
the default maximum size of the sample displayed in the Trifacta Application. This sample size can be reduced from 10MB, if needed.
Beginning in this release:
The actual size of the stored sample has increased to 40MB.
Note
On backend storage, sample sizes are now four times larger than in previous releases. For datasources that require decompression or conversion, actual storage sizes may exceed this 40 MB limit.
The size of the sample displayed for a recipe can be configured to be up to 40MB in size by individual users.
For more information, see Change Recipe Sample Size.
Data type mismatches can now be written out in CSV format
Beginning in this release, for CSV outputs, mismatched values are written as regular values by default. In prior releases, mismatched values were written as null values in CSV outputs.
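The difference between the two behaviors can be seen in a small Python sketch. It is illustrative only: the function names, the `mismatches_as_null` flag, and the validators are hypothetical, and a null is assumed to be written as an empty field.

```python
import csv
import io


def is_integer(value: str) -> bool:
    """Return True if the value parses as an integer."""
    try:
        int(value)
        return True
    except ValueError:
        return False


def write_csv(rows, validators, mismatches_as_null=False):
    """Write rows to CSV text.

    validators holds one function per column that returns True when a value
    matches the column's data type. With mismatches_as_null=True (the prior
    behavior), mismatched values are written as empty fields; otherwise they
    are written unchanged.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for row in rows:
        out = []
        for value, is_valid in zip(row, validators):
            if mismatches_as_null and not is_valid(value):
                out.append("")
            else:
                out.append(value)
        writer.writerow(out)
    return buffer.getvalue()


# The first column is treated as Integer; "n/a" is a mismatched value.
rows = [["1", "alice"], ["n/a", "bob"]]
validators = [is_integer, lambda value: True]

print(write_csv(rows, validators))                           # current default
print(write_csv(rows, validators, mismatches_as_null=True))  # prior behavior
```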
Deprecated
None.
Key Bug Fixes
None.
New Known Issues
Ticket | Description |
---|---|
TD-68162 | Flow parameters cannot be displayed or edited in the Transformer page and cannot be embedded in recipe steps. Tip To edit your flow parameters, select Parameters from the Flow View context menu. Note There is no current workaround for embedding in recipe steps. While your existing parameters should continue to work at execution time, avoid changing names of your flow parameters or editing recipe steps in which they are referenced. New flow parameters cannot be used in recipes at this time. |