Release Notes for Dataprep by Trifacta
These release notes apply to the following product tiers of Dataprep by Trifacta:
Enterprise Edition
Professional Edition
Starter Edition
Premium Edition
Standard Edition
Legacy Edition
Tip
You can see your product tier in the Trifacta Application. Select Resources menu > About Cloud Dataprep.
For more information, see Product Editions.
For release notes from previous releases, see Earlier Releases of Dataprep by Trifacta.
April 16, 2024
Release 10.4
What's New
None
Key Bug Fixes
Issue | Description |
---|---|
TSMC-861 | Fixed an issue where post-job SQL failed after 30 minutes with a timeout error. The timeout has been increased to 2 hours. |
January 23, 2024
Release 10.3
What's New
Connectivity:
Early Preview (read-only) connections available with this release.
For more information, see Microsoft OneDrive Connections.
For more information, see Microsoft SharePoint Files Connections.
Key Bug Fixes
Issue | Description |
---|---|
TSMC-1314 | Resolved an issue where disabled Dataprep users were charged when they attempted to log in. |
Known Issues
Issue | Description |
---|---|
TSMC-1345 | Stratified and cluster-based samples are not supported for the OneDrive and SharePoint Files connectors on Dataprep. |
TSMC-1219 | Changing the setting “Manage access to data using user IAM permissions” triggers a workspace logout, which in turn gets stuck at the sign-in page. |
TSMC-946 | Header appears multiple times in the output when running via the Dataflow engine. |
TSMC-864 | BigQuery jobs are again failing with "Failed while uploading UDF bundle to Google Cloud Storage. Job Id: xxxx. Error: Java heap space" |
TSMC-1331 | Full Scan sample jobs fail for recipes on datasets from OneDrive / SharePoint Files connectors. |
Changes in System Behavior
Note
Schedules owned by disabled users no longer execute. To re-enable these schedules, the workspace admin must transfer ownership of the Flow or Plan to an enabled user and recreate the schedule. For more information, see Schedules Page.
October 11, 2023
Release 10.2
What's New
Connectivity:
Early Preview (read-only) connections available with this release.
For more information, see Exasol Connections.
For more information, see Google Analytics 4 Connections.
Upgraded Google Ads API to Version 15.
Key Bug Fixes
Ticket | Description |
---|---|
TSMC-826 | Resolved an issue where plan jobs became stuck in either ingest or SQL execution after "Schema validation" was disabled. |
TSMC-592 | Resolved an issue where importing Excel files from an SFTP connection failed when in-VPC execution was enabled. |
Known Issues
Ticket | Description |
---|---|
TSMC-1062 | "Message does not adhere to API specification" is thrown when attempting to change storage directories in Dataprep projects. |
TSMC-948 | Wrangle job fails for parameterized JSON input. |
TSMC-880 | Several jobs are stuck in the Queued state in the UI for a week or more. |
TSMC-876 | Optimizer job times out intermittently. |
TSMC-864 | BigQuery jobs are again failing with "Failed while uploading UDF bundle to Google Cloud Storage. Job Id: xxxx. Error: Java heap space" |
TSMC-861 | Post-job SQL job fails after 30 minutes with a timeout error. |
TSMC-672 | Standardize function hangs on 'edit' when the schema changes. |
TSMC-865 | Job failed due to "REMOTE JOB datasystem-29104989: Caused by: java.sql.SQLException" |
July 17, 2023
Release 10.1
What's New
In-VPC Execution:
Enable and configure design time connectivity and conversion jobs within your VPC through the Trifacta Application.
Tip
In-VPC execution for Dataprep by Trifacta is now generally available.
For more information, see Dataprep Project Settings Page.
BigQuery running environment:
Support for pushdown execution of nested data types in BigQuery. See BigQuery Running Environment.
Changes in System Behavior
None.
Deprecated
Templates:
Use of templates to launch new flows in the Trifacta Application is now deprecated. Templates may remain accessible for a short period after the release has been pushed and will then be removed without further notice.
Key Bug Fixes
None
New Known Issues
Ticket | Description |
---|---|
TVIN-1295 | Context menu for folders is not available. |
TRCP-220 | Plans are restructured after the Dataprep 10.0 push. |
TLI-1686 | Unable to validate In-VPC settings on the admin page; throws [feature.openApi.enableResponseEnforcement] API mismatch error. |
TLI-1575 | Unable to import data from OAuth connections in an In-VPC-enabled project on Dataprep. |
TLI-1570 | Schema drift is breaking for In-VPC-enabled workspaces on Dataprep. |
TLI-1567 | The v4/importedDatasets API breaks when In-VPC settings are enabled. |
TKAN-114 | Some jobs that ingest from Google Ads are failing with an API-deprecated error. |
TCMAN-306 | Accessing BigQuery through Dataprep is slow and takes more time than usual. |
TCMAN-267 | Merge output publish using BQ pushdown is publishing complex types as STRING even with objectStrictTypeMatch enabled. |
TSMC-412 | Deprecated GCP Dataflow SDK version in Dataprep. |
TOPZ-542 | Resolved a problem that prevented the plan from running after transferring the plan to a new owner. |
TCMAN-140 | Custom SQL ingestion fails for CData-based connection types. |
February 9, 2023
Release 10.0
What's New
Connectivity:
Google Spanner now supports connections to PostgreSQL database instances. For more information, see Google Spanner Connections.
BigQuery running environment:
Support for BigQuery table sources being published as files in Cloud Storage.
For more information on enabling this optimization, see Flow Optimization Settings Dialog.
For more information on limitations, see BigQuery Running Environment.
Transformer page:
Improved display of previewed columns in the data grid.
Connectivity:
Improved security for data service interactions with third-party connection types.
Connectivity:
Enable connectivity between Dataprep by Trifacta and SAP HANA using SSH tunneling.
For more information, see Configure SSH Tunnel Connectivity.
For more information, see SAP HANA Connections.
Connectivity:
Google Spanner now supports connections to database instances that were built with the PostgreSQL dialect.
For more information, see Early Preview Connection Types.
Manage purchases in the Trifacta Application:
Beginning in this release, you can complete self-service purchases via Stripe in the application.
For more information on differences between product editions, please visit Pricing and Packaging.
For more information, see Checkout Page.
For additional questions, please contact Alteryx Support.
Changes in System Behavior
SQL Server:
Updated base SQL Server driver to 11.2.0.jre8.
With this update, the following versions of SQL Server are no longer supported:
SQL Server 2012
PDW 2008R2 AU34
SQL Server 2008 R2
SQL Server 2008
The following version of SQL Server is now supported:
SQL Server 2019
This change applies to the following connection types:
Microsoft SQL Server Connections for Cloud SQL (see previous)
For more information, see https://learn.microsoft.com/en-us/sql/connect/jdbc/microsoft-jdbc-driver-for-sql-server-support-matrix?view=sql-server-ver16#sql-version-compatibility.
Support for nvarchar2 and datetime2 data types.
For more information, see SQL Server Data Type Conversions.
For more information, see Microsoft SQL Server Connections.
Documentation:
Beginning in this release, the Library Page is renamed as Library for Data Page. For more information, see Library for Data Page.
Deprecated
Templates:
A template is a pre-configured flow with annotated references to simplify the development of a working flow to satisfy a specific use case. When you start from a template, you create a copy of the templated flow, which is no longer connected to your flow.
Note
Due to changes in underlying infrastructure, access to templates will be disabled soon. To use a template, click Start from a template on the Home page.
Key Bug Fixes
None.
New Known Issues
Ticket | Description |
---|---|
TD-75991 | Spark jobs may fail when getting job status from EMR. |
TGR-183 | Flow view is broken when adding an output object via API that has |
January 11, 2023
Release 9.7
What's New
Schema validation:
Schema validation is now supported for CSV, TSV, and TXT files.
Note
Detect structure must be enabled on a file-based imported dataset.
For more information, see Overview of Schema Management.
Dataflow running environment:
Project administrators can define default execution settings that are passed to Dataflow for jobs executed within the project. For more information, see Dataflow Execution Settings Page.
Note
By default, project users are permitted to override these settings in their account preferences or in individual jobs. As needed, project administrators can disable use of these overrides, which means all project users use the same Dataflow execution settings. For more information, see Dataprep Project Settings Page.
Note
Execution settings that were previously specified for output objects are not affected by this change.
In-VPC Execution:
Tip
This feature was formerly known as Run Dataprep in Your VPC.
Support for running conversion jobs in your VPC. This includes ingestion of data from binary sources such as PDF, Excel, and Google Sheets.
Note
This feature may not be available in all product editions. For more information on available features, see Compare Editions.
Note
To process Google Sheets data in your VPC, a public key and private key must be specified. The private key must be accessible within your VPC. Otherwise, these values are optional. For more information, see Dataprep In-VPC Execution.
Tip
Service accounts are now used for execution of conversion jobs in your VPC.
Note
This capability is disabled by default when In-VPC execution is enabled. For more information on enabling this feature, please contact Alteryx Support.
For more information, see Dataprep In-VPC Execution.
Publishing:
You can publish Dataprep by Trifacta Objects and arrays of Objects as complex types in BigQuery.
For more information, see BigQuery Table Settings.
For more information, see BigQuery Data Type Conversions.
Import flow:
When importing flows in the Trifacta Application, you can now remap connections used for pre- and post-execution SQL scripts. For more information, see Import Flow.
Changes in System Behavior
Previously, Arrays could be published to BigQuery only for primitive homogeneous data types. Beginning in this release, Arrays can be published as nested data.
Note
If you have previously enabled publication of Arrays as primitive types in BigQuery, this change may cause breakages in your data pipelines.
For more information, see Improvements to the Type System.
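To make the distinction above concrete, the following is a minimal Python sketch of what "primitive homogeneous" means for an array. It is an illustration only, not the product's type system: the function name and the set of types treated as primitive are assumptions.

```python
# Hypothetical illustration: which Python types count as "primitive" is an assumption.
PRIMITIVES = (bool, int, float, str)

def is_primitive_homogeneous(values):
    """Return True if every element is a primitive value of one single type."""
    element_types = {type(v) for v in values}
    return len(element_types) <= 1 and all(isinstance(v, PRIMITIVES) for v in values)

flat_array = [1, 2, 3]                         # one primitive type
mixed_array = [1, "a"]                         # primitive, but mixed types
nested_array = [{"sku": "A1"}, {"sku": "B2"}]  # elements are objects (nested data)

print(is_primitive_homogeneous(flat_array))
print(is_primitive_homogeneous(mixed_array))
print(is_primitive_homogeneous(nested_array))
```

In this sketch, only the first array would have qualified under the earlier behavior; the third is the kind of value that is now published as nested data.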
Deprecated
None.
Key Bug Fixes
None.
New Known Issues
None.
November 10, 2022
Release 9.6
What's New
Connectivity:
Early Preview (read-only) connections available with this release:
Note
This feature may not be available in all product editions. For more information on available features, see Compare Editions.
Adobe Analytics
Microsoft Dataverse
Google BigQuery JDBC (Private Preview)
Google Contacts
For more information, see Early Preview Connection Types.
Connectivity:
Support for OAuth 2.0 authentication for Workday.
OAuth 2.0 authentication must be enabled. For more information, see Enable OAuth 2.0 Authentication.
You must also create an OAuth 2.0 client for the Trifacta Application. For more information, see OAuth 2.0 for Workday.
For more information, see Workday Connections.
Job history:
Changes to the layout of history pages for flow jobs, sample jobs, and plan runs for an improved user experience.
For more information, see Job History Page.
For more information, see Sample Jobs Page.
For more information, see Plan Runs Page.
BigQuery running environment:
Support for merge (upsert) operations when jobs are executed in BigQuery for table- and file-based sources. See BigQuery Running Environment.
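For readers unfamiliar with the term, the following Python sketch illustrates general merge (upsert) semantics: rows whose key already exists in the target are updated, and all other rows are inserted. It is a conceptual illustration only and does not reflect how Dataprep or BigQuery executes the operation; the function and column names are hypothetical.

```python
def upsert(target_rows, incoming_rows, key):
    """Merge incoming rows into target rows: update on key match, insert otherwise."""
    merged = {row[key]: dict(row) for row in target_rows}
    for row in incoming_rows:
        # Existing key: overwrite matching columns. New key: add the row.
        merged[row[key]] = {**merged.get(row[key], {}), **row}
    return list(merged.values())

target = [{"id": 1, "status": "new"}, {"id": 2, "status": "new"}]
incoming = [{"id": 2, "status": "shipped"}, {"id": 3, "status": "new"}]

result = upsert(target, incoming, key="id")
# Row 1 is unchanged, row 2 is updated, row 3 is inserted.
print(result)
```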
In-VPC execution:
In-VPC execution now supports connection from the Trifacta Application to an in-VPC data service instance, which enables testing connections, viewing table and schema information, and collecting initial samples from datasources hosted within your VPC.
Note
This feature may not be available in all product editions. For more information on available features, see Compare Editions.
Tip
In-VPC execution is now supported for the Premium Edition.
Note
This capability is disabled by default when In-VPC execution is enabled. For more information on enabling this feature, please contact Alteryx Support.
For more information, see Dataprep In-VPC Execution.
Documentation:
Over the next set of releases, a number of object types are being migrated from the Trifacta Application to the underlying platform layer.
Tip
This migration of objects to the platform layer is being performed to enable broader access to these object types in the future. You can expect to see enhancements to these capabilities in the future.
These object types include the following:
Plans
Connections
Library:
Imported datasets
Reference datasets
Macros
Job history:
Flow jobs
Sample jobs
Plan runs
Schedules
Additionally, the following capabilities are moving to the platform level:
Admin
Preferences
Over the next few releases, documentation for these objects is migrating to the new Platform area. For more information, see Platform.
Changes in System Behavior
Flow collaborators can now edit custom SQL
Collaborators on a flow who have the flow editor permission can now edit any custom SQL used in importing datasets into the flow.
For more information, see Create Dataset with SQL.
For more information on permissions, see Overview of Authorization.
Enable / Disable Data Grid
You can enable or disable the data grid in the Transformer page. When the data grid is disabled, you may not be able to edit some recipe steps. For steps that you can edit, select Preview to see the effects of the step on the data. When you select Preview, the data grid is re-enabled.
For more information, see Data Grid Panel.
For more information, see Recipe Panel.
Note
This feature can be enabled or disabled by an administrator.
Deprecated
None.
Key Bug Fixes
Ticket | Description |
---|---|
TD-74742 | BigQuery SQL execution fails with |
New Known Issues
None.
October 17, 2022
Release 9.5
What's New
Tip
Check out the updated resources in the left nav bar and on the Home page!
Language:
Sort rows in your dataset with a new transformation recipe step. For more information, see Sort Rows.
In-VPC execution:
Note
This feature may not be available in all product editions. For more information on available features, see Compare Editions.
Note
In-VPC job execution requires additional configuration. See Dataprep In-VPC Execution.
For jobs executed in your enterprise VPC, you can now leverage service accounts for ingest from and publishing to BigQuery.
Tip
Service accounts can replace the use of user credentials for most types of in-VPC batch processing, which provides a higher level of security and is recommended by Google.
Batch job execution, which includes running jobs in Dataflow, can now utilize service accounts or Companion Service accounts for job execution within your VPC.
Service accounts can be the base project Service Account, or if it has been enabled in your project, a Companion Service Account for the user executing the job.
Note
Workload Identity must be enabled to run Dataprep by Trifacta jobs in your GKE cluster, which is required for In-VPC job execution. For more information on configuring Workload Identity, see Dataprep In-VPC Execution.
For more information on service accounts, see Google Service Account Management.
Parameterization:
Flow parameters now support a new type. Selector type parameters allow you to specify a list of permitted values for the parameter, which ensures data integrity throughout the transformation process. See Create Flow Parameter.
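The idea behind a Selector type parameter can be sketched in a few lines of Python: a parameter carries a list of permitted values, and any override outside that list is rejected. This is a conceptual sketch, not the product's implementation or API; the class, method, and parameter names are hypothetical.

```python
class SelectorParameter:
    """Hypothetical model of a parameter restricted to a list of permitted values."""

    def __init__(self, name, permitted_values, default):
        if default not in permitted_values:
            raise ValueError(f"default {default!r} is not a permitted value for {name}")
        self.name = name
        self.permitted_values = tuple(permitted_values)
        self.default = default

    def resolve(self, override=None):
        """Return the override if it is permitted, else the default; reject anything else."""
        value = self.default if override is None else override
        if value not in self.permitted_values:
            raise ValueError(f"{value!r} is not a permitted value for {self.name}")
        return value

region = SelectorParameter("region", ["us", "eu", "apac"], default="us")
print(region.resolve())      # falls back to the default
print(region.resolve("eu"))  # a permitted override is accepted
```

Rejecting values outside the list at resolution time is what keeps downstream recipe steps from receiving an unexpected parameter value.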
Schema refresh supports Excel, PDF, and Google Sheets:
You can now refresh the schemas from Excel, PDF or Google Sheets datasources converted during ingest. Schema refresh enables you to identify changes to the columns in your dataset.
For more information, see Overview of Schema Management.
Dataset schemas can be refreshed through the following pages:
See Dataset Details Page.
In Flow View. For more information, see View for Imported Datasets.
For more information on importing these datasources:
Import flow:
During the import flow process in the Trifacta Application, you can now remap connections and environment parameters in the flow to corresponding objects in the new project or workspace through drop-down menus. For more information, see Import Flow.
Connectivity:
Early Preview (read-only) connections available with this release:
Note
This feature may not be available in all product editions. For more information on available features, see Compare Editions.
Workday
Google Calendar
For more information, see Early Preview Connection Types.
Connectivity:
Support for OAuth 2.0 authentication for SharePoint.
OAuth 2.0 authentication must be enabled. For more information, see Enable OAuth 2.0 Authentication.
You must also create an OAuth 2.0 client for the Trifacta Application. For more information, see OAuth 2.0 for SharePoint.
Support for Azure AD authentication for SharePoint.
For more information, see SharePoint Connections.
Billing:
Support for direct deposit payments from your US bank account.
Note
This feature may not be available in all product editions. For more information on available features, see Compare Editions.
For more information, see Plan and Billing Page.
Asset transfer:
Transfer ownership of your individual assets to another user. See Transfer Asset Ownership.
Note
Administrators can transfer ownership of any user's asset to another user. Administrators can also transfer ownership of all of a user's assets. See User Details Page.
Job History:
Administrators can configure the default number of days of jobs to display in the Job History page. The default setting is 180 days.
Tip
Individual users can filter the list of jobs further, as needed.
For more information, see Dataprep Project Settings Page.
Transformer page:
For improved performance, you can now edit your recipes without loading a sample in the data grid.
Note
This feature must be enabled in your environment.
For more information, see Dataprep Project Settings Page.
Changes in System Behavior
New pricing and packaging model
Release 9.5 introduces a new model for what is packaged in Enterprise, Professional, and Starter product editions and pricing for each edition.
Existing customers on the old model may continue to use their current set of features until their subscription expires.
These customers cannot downgrade to another old-model product edition.
When their subscription expires:
They can renew their subscription using a product edition under the new model.
They cannot renew using any product edition under the old model.
Note
The new model does not include the Premium, Standard, and Legacy product editions. Renewing customers on these product editions must migrate to a supported product edition. For more information, see Product Editions.
The new pricing and packaging model introduces some changes to the features that are available for each product edition. Key changes:
API access has changed. See below.
Scheduling and orchestration using plans are now available across all paid editions.
There are now per-edition limits on automated job execution (scheduled jobs and API-based jobs).
For more information on differences between product editions in the new model, please visit Pricing and Packaging.
For additional questions, please contact Alteryx Support.
Subscription management through Sales
Beginning in Release 9.5, changes to your product edition are managed through Sales. In the Trifacta Application, click Contact Sales to reach out. For more information, see Start a Subscription.
API access moving to Enterprise only
Beginning in Release 9.5, all new or renewed subscriptions have access to public API endpoints on the Enterprise product edition only. For example, new or renewed subscriptions for the Professional product edition do not have access to API endpoints.
Existing customers that currently have access to API endpoints for non-Enterprise product editions can continue to use their available endpoints until their subscription expires. To use API endpoints after renewal, you must upgrade to the Enterprise product edition.
For more information on differences between product editions in the new model, please visit Pricing and Packaging.
For additional questions, please contact Alteryx Support.
QuickBooks Online
The QuickBooks Online connection type has been re-enabled. See QuickBooks Online Connections.
Deprecated
None.
Key Bug Fixes
Ticket | Description |
---|---|
TD-69813 | Dataprep by Trifacta array type columns in datasets that were imported before Release 9.2 are still published as String type. |
New Known Issues
None.
August 17, 2022
Release 9.4
What's New
JavaScript User Defined Functions:
The ability to create user-defined functions (UDFs) is now generally available.
Note
This feature may not be available in all product editions. For more information on available features, see Compare Editions.
This feature is enabled by default. For more information, see Dataprep Project Settings Page.
You can create UDFs in JavaScript and upload them to your project for use in your recipe steps. JavaScript UDFs enable users to create customized and consistent functions to meet their specific requirements. For more information, see JavaScript UDFs.
When enabled, JavaScript UDFs are uploaded through the Library page. For more information, see User Defined Functions Page.
Import:
For long-loading datasets from BigQuery, you can monitor the ingest process through the Trifacta Application as you continue your work on other tasks.
Long-loading from BigQuery may need to be enabled by a project administrator. For more information, see Dataprep Project Settings Page.
For more information, see Flow View Page.
Broader service account usage:
For jobs executed within your enterprise VPC, you can now configure the use of service accounts for most jobs.
Tip
Service accounts can replace the use of user credentials for most types of in-VPC batch processing, which provides a higher level of security and is recommended by Google.
Note
Workload Identity must be enabled to run Dataprep by Trifacta jobs in your GKE cluster, which is required for In-VPC job execution. For more information on configuring Workload Identity, see Dataprep In-VPC Execution.
For more information on service accounts, see Google Service Account Management.
Connectivity:
Early Preview (read-only) connections available with this release:
Note
This feature may not be available in all product editions. For more information on available features, see Compare Editions.
SendGrid
SAP HANA
For more information, see Early Preview Connection Types.
Connectivity:
Support for creating connections to Denodo.
Note
This connection type is disabled by default. For more information on enabling this connection type, please contact Alteryx Support.
OAuth 2.0 authentication must be enabled. For more information, see Enable OAuth 2.0 Authentication.
You must also create an OAuth 2.0 client for the Trifacta Application. See OAuth 2.0 for Denodo.
For more information on creating the connection object, see Denodo Connections.
Connectivity:
Support for creating OAuth 2.0 connections to single-tenant and multi-tenant Microsoft Dynamics 365 Sales (Deprecated).
OAuth 2.0 authentication must be enabled. For more information, see Enable OAuth 2.0 Authentication.
You must also create an OAuth 2.0 client for the Trifacta Application. See OAuth 2.0 for Microsoft Dynamics 365 Sales.
For more information on creating the connection object, see Microsoft Dynamics 365 Sales Connections.
Schema refresh supports JSON:
You can now refresh the schemas from JSON datasources converted during ingest. Schema refresh enables you to identify changes to the columns in your dataset.
For more information, see Overview of Schema Management.
Dataset schemas can be refreshed through the following pages:
See Dataset Details Page.
In Flow View. For more information, see View for Imported Datasets.
A converted datasource is a file type that must be converted during the ingestion process into a format that is natively supported by the platform. For more information on JSON as a converted datasource, see Working with JSON v2.
Email notifications enhancements:
Tip
Each email notification includes a summary of Data Quality rules (rules that were run and the success/failures of those rules) and the profile details (valid, mismatched, missing) when a job is completed successfully. In the email, click View job to view the details of the job. See Email Notifications Page.
Changes in System Behavior
Email notifications enabled by default:
Note
The settings for email notifications based on success or failure of jobs or plan runs have been enabled at the project or workspace level and at the flow level.
This change means that the user who executes a job and others who have access to the flow receive, by default, an email notification whenever a job executes for flows where email notification settings have never been modified.
If needed, these new default settings can be modified.
Project owners and administrators can change the default values of the email notification settings. For more information, see Dataprep Project Settings Page.
Individual users can override these settings for individual flows. For more information, see Manage Flow Notifications Dialog.
Deprecated
None.
Key Bug Fixes
None.
New Known Issues
None.
June 21, 2022
Release 9.3
What's New
Private data processing:
You can execute Trifacta Photon jobs within your enterprise's virtual private cloud (VPC).
Note
This feature may not be available in all product editions. For more information on available features, see Compare Editions.
In-VPC execution must be enabled by an administrator.
Note
This feature requires additional configuration of the Google Cloud Platform through the gcloud command line tools.
For more information, see Dataprep In-VPC Execution.
Private data processing:
Jobs related to ingesting, sampling, and publishing data for relational databases can now be executed within your enterprise's virtual private cloud (VPC).
Note
This feature is in Beta release.
Note
This feature may not be available in all product editions. For more information on available features, see Compare Editions.
In-VPC execution must be enabled by an administrator.
Note
This feature requires additional configuration of the Google Cloud Platform through the gcloud command line tools.
For more information, see Dataprep In-VPC Execution.
Expandable left nav bar:
The new left navigation bar can be expanded to display full-text options for each menu item. Collapse it to reclaim the screen area. Available options remain consistent. See Home Page.
Configure range joins:
Specify ranges of key values in your joins.
Note
This feature may need to be enabled by an administrator. See Dataprep Project Settings Page.
For more information, see Configure Range Join.
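As a conceptual illustration of a range join, the Python sketch below matches each row of one dataset to the rows of another whose range contains its key value. It is not the product's join engine, and the function name, column names, and the choice of inclusive bounds are all assumptions.

```python
def range_join(left_rows, right_rows, key, low, high):
    """Inner-join rows where right[low] <= left[key] <= right[high] (inclusive bounds assumed)."""
    joined = []
    for left in left_rows:
        for right in right_rows:
            if right[low] <= left[key] <= right[high]:
                joined.append({**left, **right})
    return joined

orders = [{"order_id": 1, "amount": 40}, {"order_id": 2, "amount": 250}]
tiers = [
    {"tier": "small", "min_amount": 0, "max_amount": 99},
    {"tier": "large", "min_amount": 100, "max_amount": 999},
]

result = range_join(orders, tiers, "amount", "min_amount", "max_amount")
# Order 1 falls in the "small" range and order 2 in the "large" range.
print(result)
```

An equality join would need an exact key match; here a row matches whenever its key falls anywhere inside the range defined by the other dataset.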
Billing:
Edit credit card and billing information and review billing history and invoices.
Note
This feature may not be available in all product editions. For more information on available features, see Compare Editions.
For more information, see Plan and Billing Page.
Connectivity:
Early Preview (read-only) connections available with this release:
Note
This feature may not be available in all product editions. For more information on available features, see Compare Editions.
Zoho CRM
Note
Zoho CRM connections require an OAuth 2.0 client to be created in the Trifacta Application. For more information, see OAuth 2.0 for Zoho CRM.
DocuSign
For more information, see Early Preview Connection Types.
Connectivity:
Enable connectivity between the Trifacta Application and Teradata or MongoDB using SSH tunneling.
For more information on enabling SSH tunneling, see Configure SSH Tunnel Connectivity.
For more information, see MongoDB Connections.
For more information, see Teradata Connections.
Transformer page:
Improved performance of the Transformer page through asynchronous loading of initial samples.
Changes in System Behavior
Generate an initial sample:
When generating an initial sample from a set of files in a directory, the maximum number of files that can be read is now limited to 10 files by default. For more information on changing the maximum number, see Dataprep Project Settings Page.
QuickBooks Online connections are disabled:
This feature has been disabled due to technical issues. It will be re-enabled when these issues are resolved in a future release.
Deprecated
None.
Key Bug Fixes
None.
New Known Issues
None.
Earlier Releases
For release notes from previous releases, see Earlier Releases of Dataprep by Trifacta.