Dataprep Project Settings Page

The following settings customize the user experience in your Dataprep by Trifacta project. When you modify a setting, the change is applied to the project immediately. To access the page, select User menu > Admin console > Settings.

Note

Users may not see the changed environment until they refresh the application page or log out and log back in.

Enablement Options:

Note

Any value specified on this page applies exclusively to this project and overrides any system-level defaults.

  • Default: The default value is applied. This value may be inherited from higher-level configuration. Tip: You can review the default value as part of the help text.

  • Enabled: The setting is enabled. Note: If the setting applies to a feature, the feature is enabled. Additional configuration may be required. See below.

  • Disabled: The setting is disabled.

  • Edit: Click Edit to enter a specific value for the setting.

Disable Dataprep

To disable Dataprep by Trifacta for this project, click the link.

Note

To remove a user and their assets from a project, contact Alteryx Support.

For more information, see Enable or Disable Dataprep.

General

Filter Job History

Set the default number of days of job history that is displayed in the Job History page. The default value is 180 days.

Tip

You can filter the dates of the jobs displayed in the Job History page.

For more information, see Job History Page.

Locale

Set the locale to use for inferring or validating data in the application, such as numeric values or dates. The default is United States.

Note

After saving changes to your locale, refresh your page. Subsequent executions of the data inference service use the new locale settings.

For more information, see Locale Settings.

Session duration

Note

This feature may not be available in all product editions. For more information on available features, see Compare Editions.

Specify the length of time in minutes before a session expires. The default is 10080 minutes (one week).

API

Allow users to generate access tokens

Note

This feature may not be available in all product editions. For more information on available features, see Compare Editions.

When enabled, individual users can generate their own personal access tokens, which enable access to REST APIs. For more information, see Manage API Access Tokens.
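
The following is a minimal sketch of how a generated token might be used to call the REST APIs, assuming a Node.js 18+ environment with TypeScript. The base URL and the /v4/flows endpoint shown here are illustrative assumptions; consult Manage API Access Tokens and the API documentation for the actual endpoints.

    // Minimal sketch: calling a REST API with a personal access token.
    // The base URL and endpoint path are assumptions for illustration.
    const BASE_URL = "https://api.clouddataprep.com"; // assumed base URL
    const TOKEN = process.env.DATAPREP_TOKEN ?? "";   // token generated in the application

    async function listFlows(): Promise<unknown> {
      const resp = await fetch(`${BASE_URL}/v4/flows`, {
        headers: {
          Authorization: `Bearer ${TOKEN}`, // token passed in the Authorization header
          Accept: "application/json",
        },
      });
      if (!resp.ok) {
        throw new Error(`Request failed: ${resp.status} ${resp.statusText}`);
      }
      return resp.json();
    }

    listFlows().then((flows) => console.log(flows)).catch(console.error);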

Maximum lifetime for user generated access tokens (days)

Note

This feature may not be available in all product editions. For more information on available features, see Compare Editions.

Defines the maximum number of days for which a user-generated access token remains valid in the product.

Tip

To permit generation of access tokens that never expire, set this value to -1.

For more information, see Manage API Access Tokens.

Connectivity

Custom SQL query

When enabled, users can create custom SQL queries to import datasets from relational tables. For more information, see Create Dataset with SQL.
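
If you also work with the REST APIs, a custom SQL import might look like the following hypothetical sketch in TypeScript. The /v4/importedDatasets endpoint and request fields are assumptions for illustration; see Create Dataset with SQL for the supported SQL syntax.

    // Hypothetical sketch: creating an imported dataset from a custom SQL
    // query via the REST API. Endpoint path and request fields are assumed.
    const BASE_URL = "https://api.clouddataprep.com"; // assumed base URL
    const TOKEN = process.env.DATAPREP_TOKEN ?? "";

    async function createSqlDataset(connectionId: number): Promise<unknown> {
      const resp = await fetch(`${BASE_URL}/v4/importedDatasets`, {
        method: "POST",
        headers: {
          Authorization: `Bearer ${TOKEN}`,
          "Content-Type": "application/json",
        },
        body: JSON.stringify({
          name: "recent_orders", // illustrative dataset name
          // The custom SQL that defines the import:
          sql: "SELECT order_id, customer_id, total FROM orders WHERE order_date >= '2024-01-01'",
          connectionId, // the relational connection to run the query against
        }),
      });
      if (!resp.ok) {
        throw new Error(`Request failed: ${resp.status}`);
      }
      return resp.json();
    }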

Enable conversion of standard JSON files via conversion service

When enabled, the Trifacta Application utilizes the conversion service to ingest JSON files and convert them to a tabular format that is easier to import into the application. For more information, see Working with JSON v2.

Note

This feature is enabled by default but can be disabled as needed. The conversion process performs cleanup and re-organization of the ingested data for display in tabular format.

When disabled, the Trifacta Application uses the old version of JSON import, which does not restructure the data and may require additional recipe steps to manually structure it into tabular format.

Note

Although imported datasets and recipes created under v1 of the JSON importer continue to work without interruption, the v1 version is likely to be deprecated in a future release. You should switch your old imported datasets and recipes to the new version. Instructions to migrate are provided at the link below.

Note

The legacy version of JSON import is required if you are working with compressed JSON files or newline-delimited JSON files.

For more information, see Working with JSON v1.
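
To make the difference concrete, the following is a conceptual sketch, in TypeScript, of what restructuring nested JSON into a tabular format means. It is not the conversion service's actual algorithm.

    // Conceptual illustration only: one nested JSON record becomes one
    // flat row with dotted column names.
    type Json = string | number | boolean | null | Json[] | { [key: string]: Json };
    type Row = Record<string, string | number | boolean | null>;

    function flatten(value: Json, prefix = "", row: Row = {}): Row {
      if (value !== null && typeof value === "object") {
        // Recurse into nested objects and arrays, building dotted names.
        for (const [key, child] of Object.entries(value)) {
          flatten(child, prefix ? `${prefix}.${key}` : key, row);
        }
      } else {
        row[prefix] = value; // leaf values become column values
      }
      return row;
    }

    const record: Json = { id: 7, customer: { name: "Ada", region: "EMEA" } };
    console.log(flatten(record));
    // { id: 7, "customer.name": "Ada", "customer.region": "EMEA" }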

Enables long loading from BigQuery

When enabled, large datasets or custom SQL requests from BigQuery are ingested in the background, allowing users to continue to use the Trifacta Application while the ingest completes.

Tip

You can monitor the ingest process through Flow View or the Dataset Details page.

Manage access to data using user IAM Permissions

Note

This feature may not be available in all product editions. For more information on available features, see Compare Editions.

When enabled, user access to data services in Google Cloud Platform, such as Cloud Storage and BigQuery, is determined by the permissions defined in the user's assigned IAM role.

Note

When this feature is enabled, all Premium Edition users that belong to the project are automatically logged out of all Trifacta Application sessions across all projects. For example, if a Premium Edition user is logged into the product through another project, the user is logged out of their Trifacta Application session when this feature is enabled. When each user logs in to the Trifacta Application again, any changes to the user's permissions are applied. Since each API request requires authentication in the header, API users are not automatically logged out.

For more information on IAM-based permissions, see Required Dataprep User Permissions.

Max endpoints per JDBC REST connection

For a REST API connection to a JDBC source, this parameter defines the maximum number of endpoints that can be defined per connection.

Avoid modifying this value unless you are experiencing timeouts or failures to connect.

For more information, see REST API Connections.

Flows, workflows, recipes and plans

Column from examples

Note

This feature may not be available in all product editions. For more information on available features, see Compare Editions.

When enabled, users can access a tool through the column menus that enables creation of new columns based on example mappings from the selected column. For more information, see Overview of TBE.

Connection mapping

When enabled, users who are importing flows from one project to another can map the connections and environment parameters referenced in the import to corresponding objects in the target project. See Import Flow.

Editor Scheduling

Note

This feature may not be available in all product editions. For more information on available features, see Compare Editions.

When enabled, flow editors are also permitted to create and edit schedules. For more information, see Flow View Page.

Note

The Scheduling feature may need to be enabled in your environment. When enabled, flow owners can always create and edit schedules.

When this feature is enabled, plan collaborators are also permitted to create and edit schedules. For more information, see Plan View Page.

Enable creation of custom JavaScript UDFs for use in recipes

Note

This feature may not be available in all product editions. For more information on available features, see Compare Editions.

When enabled, users can create and upload JavaScript-based user-defined functions (UDFs), which can be referenced in the recipes created in the project. For more information, see JavaScript UDFs.

Note

User-defined functions can be pushed down to BigQuery during job execution. This optimization must be enabled for each flow. For more information, see Flow Optimization Settings Dialog.
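
As a hypothetical illustration, a UDF is essentially a pure function from input column values to an output value. The function below shows the idea; the actual upload and registration format is defined in the JavaScript UDFs documentation.

    // Hypothetical sketch of a JavaScript UDF: a pure function that maps
    // an input column value to an output value. The registration wrapper
    // required by the product is not shown here.
    function maskEmail(email: string): string {
      // Keep the domain, mask the local part: "ada@example.com" -> "***@example.com"
      const at = email.indexOf("@");
      return at > 0 ? "***" + email.slice(at) : email;
    }

    // Example invocation, as the product might apply the UDF per row:
    console.log(maskEmail("ada@example.com")); // "***@example.com"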

Export

When enabled, users are permitted to export their flows and plans. Exported flows can be imported into other work areas or product editions.

Note

If plans have been enabled in your project settings, enabling this flag applies to flows and plans.

Import

When enabled, users are permitted to import exported flows and plans.

Note

If plans have been enabled in your project settings, enabling this flag applies to flows and plans.

Maximum number of files to read in a directory for the initial sample

When the Trifacta Application is generating an initial sample of data for your dataset from a set of source files, you can define the maximum number of files in a directory from which the sample is generated. This limit is applied to reduce the overhead of reading in a new file, which improves performance in the Transformer page.

Tip

The initial sample type for files is generated by reading one file after another from the source. If the source is multiple files or a directory, this limit caps the maximum number of files that can be scanned for sampling purposes.

Note

If the files in the directory are small, the initial sample may contain the maximum number of files and still be smaller than the maximum size permitted for a sample. You may see fewer rows than expected.

If the generated sample is unsatisfactory, you can generate a new sample using a different method. In that case, this limit no longer applies. For more information, see Overview of Sampling.
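
The following is a conceptual sketch, not the product's implementation, of how a file-count cap and a sample-size cap can jointly bound the initial sample, which explains why many small files can produce fewer rows than expected.

    // Conceptual sketch only: both limits bound the initial sample.
    function planInitialSample(
      fileSizes: number[],    // sizes of files in the source directory, in bytes
      maxFiles: number,       // this setting: max files to read for the sample
      maxSampleBytes: number, // the sample size limit
    ): number[] {
      const chosen: number[] = [];
      let total = 0;
      for (const size of fileSizes) {
        // Stop at whichever limit is reached first.
        if (chosen.length >= maxFiles || total >= maxSampleBytes) break;
        chosen.push(size); // files are read one after another from the source
        total += size;
      }
      return chosen;
    }

    // With many small files, the file cap is hit long before the size cap:
    console.log(planInitialSample([10, 10, 10, 10], 3, 1000)); // [10, 10, 10]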

Plan feature

Note

This feature may not be available in all product editions. For more information on available features, see Compare Editions.

When enabled, users can create plans to execute sequences of recipes across one or more flows. For more information, see Plans Page.

For more information on plans and orchestration, see Overview of Operationalization.

Schematized output

Note

This feature may not be available in all product editions. For more information on available features, see Compare Editions.

When enabled, all output columns are typecast to their annotated types. This feature is enabled by default.

UI for range join

When enabled, workspace users can specify join key matching across a range of values. For more information, see Configure Range Join.

Webhooks

Note

This feature may not be available in all product editions. For more information on available features, see Compare Editions.

When enabled, webhook notification tasks can be configured on a per-flow basis in Flow View page. Webhook notifications allow you to deliver messages to third-party applications based on the success or failure of your job executions. For more information, see Create Flow Webhook Task.
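
As a minimal sketch, a third-party application only needs an HTTP endpoint to receive webhook deliveries. The TypeScript receiver below logs whatever message body the webhook task is configured to send; no particular payload shape is assumed.

    // Minimal sketch of an endpoint that could receive webhook notifications.
    import { createServer } from "node:http";

    createServer((req, res) => {
      let body = "";
      req.on("data", (chunk) => { body += chunk; });
      req.on("end", () => {
        // The body is whatever message the webhook task was configured to send.
        console.log(`Webhook received: ${req.method} ${req.url}`, body);
        res.writeHead(200).end("ok"); // acknowledge receipt
      });
    }).listen(8080, () => console.log("Listening for webhook deliveries on :8080"));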

Job execution

BigQuery execution

Note

This feature may not be available in all product editions. For more information on available features, see Compare Editions.

When enabled, the Trifacta Application can execute transformation jobs inside BigQuery when all data sources and outputs for the job are located in BigQuery.

Note

Logical and physical optimization of jobs must also be enabled.

To enable BigQuery execution on your flow jobs, you must enable all general and BigQuery optimizations within the flow. For more information, see Flow Optimization Settings Dialog.

For more information on BigQuery as a running environment, see Overview of Job Execution.

BigQuery query temp dataset

By default, Dataprep by Trifacta assumes that the service account used to run Dataflow jobs has been granted the bigquery.datasets.create permission. Dataflow job execution on BigQuery data sources creates a temporary BigQuery dataset, in which temporary tables are created to store intermediate query results from the BigQuery sources. For this reason, the ability to create BigQuery datasets is required by default to run Dataflow jobs that contain one or more BigQuery data sources.

In some environments, this permission cannot be granted, which prevents job execution from creating the required temporary dataset and causes the job to fail.

As an alternative, you can use this setting to specify a pre-existing BigQuery dataset within which Dataprep by Trifacta can create the temporary tables. When this BigQuery dataset is provided, the job execution process writes intermediate query results into temporary tables within the dataset, and the bigquery.datasets.create permission is no longer required.

Tip

Temp tables stored in this dataset are automatically deleted after 24 hours.

Requirements:

  • The dataset must be a pre-existing BigQuery dataset that is created outside of Dataprep by Trifacta. Please have your BigQuery administrator create the dataset first.

  • The temporary dataset must be located in the same region as the BigQuery source tables. Otherwise, the Dataflow job fails.

  • This BigQuery dataset is used only for Dataflow job execution on BigQuery data sources. Other sources and running environments are not affected.
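
As a sketch of the administrator's side of this setup, the following creates such a dataset with the official @google-cloud/bigquery Node.js client. The dataset name is illustrative, and the location must match the region of your BigQuery source tables.

    // Sketch: creating the pre-existing temporary dataset as an administrator.
    import { BigQuery } from "@google-cloud/bigquery";

    async function createTempDataset(): Promise<void> {
      const bigquery = new BigQuery(); // uses application default credentials
      // The dataset name "dataprep_temp" is illustrative.
      const [dataset] = await bigquery.createDataset("dataprep_temp", {
        location: "US", // must match the region of the BigQuery source tables
      });
      console.log(`Created dataset ${dataset.id}`);
    }

    createTempDataset().catch(console.error);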

Ignore publishing warnings for running jobs

When enabled, a user may execute a job even if the previously saved location is not available under the current IAM permissions used to run the job. Default is Enabled.

Tip

Setting this value to Enabled is helpful for resolving changes in IAM permissions.

When disabled, the Run Job button is disabled if the previously saved location is not available through IAM permissions.

Tip

Setting this value to Disabled prevents execution of jobs that are going to fail at publication time, which can be expensive in terms of time and compute costs.

For more information, see Run Job Page.

In-VPC Conversion job execution

Note

This feature may not be available in all product editions. For more information on available features, see Compare Editions.

When In-VPC Execution is enabled, you can optionally enable the execution of conversion jobs within your VPC. A conversion job converts binary or non-native file formats into a format that can be consumed by the product.

Some aspects of runtime execution in your VPC can be configured through the Trifacta Application. For more information, see VPC Runtime Settings Page.

In-VPC Data-Service communication

Note

This feature may not be available in all product editions. For more information on available features, see Compare Editions.

When In-VPC Execution is enabled, you can optionally enable design-time connectivity to occur within your VPC through the data service. These jobs perform ingestion from and publishing to datastores that are available through connections.

Some aspects of connectivity in your VPC can be configured through the Trifacta Application. For more information, see VPC Runtime Settings Page.

In-VPC execution

Note

This feature may not be available in all product editions. For more information on available features, see Compare Editions.

When enabled, jobs can be executed within your enterprise's virtual private cloud (VPC).

Note

Additional configuration through the Google Cloud command line is required. For more information, see Dataprep In-VPC Execution.

When in-VPC execution has been enabled and configured, you can configure some aspects of runtime execution through the Trifacta Application. For more information, see VPC Runtime Settings Page.

Logical and physical optimization of jobs

Note

This feature may not be available in all product editions. For more information on available features, see Compare Editions.

When enabled, the Trifacta Application attempts to optimize job execution through logical optimizations of your recipe and physical optimizations of your recipe's interactions with data.

This workspace setting can be overridden for individual flows.

Tip

You should keep this feature enabled at the project level and disable it only as needed at the flow level.

For more information, see Flow Optimization Settings Dialog.

Require a companion service account for running jobs

Note

This feature may not be available in all product editions. For more information on available features, see Compare Editions.

By default, Dataprep by Trifacta utilizes a default compute service account for running jobs on Dataflow. Optionally, you can enable this feature, which requires each user in the project to provide their own companion service account to run jobs and in-VPC jobs. This feature is disabled by default.

Prerequisites:

  • Service accounts must be created in the Google Cloud Platform.

  • Companion service accounts must have a minimum set of permissions.

  • For more information, see Google Service Account Management.

When this feature is enabled:

  • Project administrators can review and specify companion service accounts for individual users of the project. For more information, see Service Accounts Page.

  • Individual users can specify their companion service account. For more information, see User Profile Page.

  • At runtime, an override service account can be applied if needed. See Run Job Page.

  • If In-VPC job execution has been enabled, the companion service account is applied to those jobs, as well. For more information, see Dataprep In-VPC Execution.

When this feature is disabled:

  • By default, all users of the project use the Compute Engine service account specified for the project.

  • If companion service accounts were previously enabled and this feature is then disabled, the default service account for the project is used.

  • For more information, see Google Service Account Management.

SQL Scripts

When enabled, users may define SQL scripts to execute as part of a job run. Scripts can be executed before data ingestion, after output publication, or both, through any write-supported relational connection to which the user has access. For example, a pre-ingestion script might truncate a staging table, and a post-publication script might refresh a downstream reporting view.

For more information, see Create Output SQL Scripts.

Schema validation feature

When enabled, the structure and ordering of columns in your imported datasets are checked by default for changes before data is ingested for job execution.

Tip

Schema validation can be overridden for individual jobs when the schema validation option is enabled in the job settings. See below.

Errors are immediately reported in the Job Details page. See Job Details Page.

For more information on schema validation, see Overview of Schema Management.

Schema validation option in job settings

When the schema validation feature and this setting are enabled, users can make choices on how individual jobs are managed when schema changes are detected. This setting is enabled by default.

For more information, see Run Job Page.

For more information on schema validation, see Overview of Schema Management.

Schema validation option to fail job

When schema validation is enabled, this setting specifies the default behavior when schema changes are found.

  • When enabled, jobs are failed when schema changes are found, and error messages are surfaced in the Trifacta Application.

  • When disabled, jobs are permitted to continue.

    • Jobs may ultimately fail due to schema changes.

    • Jobs may result in bad data being written in outputs.

    • Job failures may be more challenging to debug.

      Tip

      Setting this value to Disabled matches the behavior of the Trifacta Application from before schema validation was possible.

Tip

This setting can be overridden for individual jobs, even if it is disabled. For more information, see Run Job Page.

Errors are immediately reported in the Job Details page. See Job Details Page.

For more information on schema validation, see Overview of Schema Management.

Skip write settings validation

When enabled, write settings objects are not validated as part of job execution. Write settings are used to define the outputs for file-based results. Default is enabled.

Note

When this feature is enabled, no validation is performed of any write settings objects for scheduled and API-based jobs. Issues with these objects may cause failures during the transformation and publishing stages of job execution.

Tip

Before running a job via schedule or API that produces file-based outputs, you should perform a manual test run of the job to verify the outputs.

Trifacta Photon execution

Note

This feature may not be available in all product editions. For more information on available features, see Compare Editions.

When enabled, users can choose to execute their jobs on Trifacta Photon, a proprietary running environment built for execution of small- to medium-sized jobs in memory on the Trifacta node.

Note

Jobs executed in Trifacta Photon are executed within the Alteryx VPC. Data is temporarily streamed to the Alteryx VPC during job execution and is not persisted.

Note

Jobs that are executed on Trifacta Photon may be limited to run for a maximum of 10 minutes, after which they fail with a timeout error. If your job fails due to this limit, please switch to running the job on Dataflow.

Tip

When enabled, you can choose to run jobs on Trifacta Photon through the Run Job page. The default running environment is the one best suited to the size of your job.

When Trifacta Photon is disabled:

  • You cannot run jobs on the local running environment. All jobs must be executed on a clustered running environment.

  • Trifacta Photon is normally used for Quick Scan sampling jobs. When it is disabled, the Trifacta Application attempts to run the Quick Scan job on another available running environment. If that job fails or no suitable running environment is available, the Quick Scan sampling job fails.

For more information, see Run Job Page.

Update default dataflow execution settings

This feature is enabled by default. When enabled, users can apply their own Dataflow execution settings to the jobs that they run.

When disabled, the project administrator defines the default settings, which are applied to all Dataflow jobs executed for the project. No overrides are permitted. For more information, see Dataflow Execution Settings Page.

Scheduling and parameterization

Include Hidden Files in Parameterization

Note

This feature may not be available in all product editions. For more information on available features, see Compare Editions.

When enabled, hidden files and hidden directories can be searched for matches for wildcard- or pattern-based parameters when importing datasets.

Tip

This can be useful for importing data from generated profiles, which are stored in the .profiler folder in a job output directory.

Note

Scanning hidden folders may impact performance. For existing imported datasets with parameters, you should enable the inclusion of hidden folders on individual datasets and run a test job to evaluate impact.

For more information, see Parameterize Files for Import.

Scheduling feature

Note

This feature may not be available in all product editions. For more information on available features, see Compare Editions.

When enabled, project users can schedule the execution of flows. See Add Schedule Dialog.

Publishing

JSON output format

When enabled, members can generate outputs in JSON format.

Notifications

Email notification feature

When enabled, Dataprep by Trifacta can send email notifications to users based on the success or failure of jobs. By default, this feature is Enabled.

Email notification trigger when flow jobs fail

When email notifications are enabled, you can configure the default setting for the types of failed jobs that generate an email to interested stakeholders. The value set here is the default value for each flow in the workspace.

Settings:

  • Default (any jobs): By default, email notifications are sent on failure of any job.

  • Never send: Email notifications are never sent for job failures.

  • Scheduled jobs: Notifications are sent only when scheduled jobs fail.

  • Manual jobs: Notifications are sent only when ad-hoc (manually executed) jobs fail. Tip: Jobs executed via API are Manual jobs.

  • Any: Notifications are sent for all job failures.

Individual users can opt out of receiving notifications or configure a different email address. See Email Notifications Page.

Emailed stakeholders are configured by individual flow. For more information, see Manage Flow Notifications Dialog.

Email notification trigger when flow jobs succeed

When email notifications are enabled, you can configure the default setting for the types of successful jobs that generate an email to interested stakeholders. The value set here is the default value for each flow in the workspace.

For more information on the settings, see the previous section. Default setting is Default (any jobs).

Individual users can opt out of receiving notifications or configure a different email address. See Email Notifications Page.

Emailed stakeholders are configured by individual flow. For more information, see Manage Flow Notifications Dialog.

Email notification trigger when plans run

You can configure the default trigger for email notifications when a plan runs. Default setting is Default (all runs).

  • Default (all runs): By default, email notifications are sent to users for all plan runs.

  • All runs: Emails are sent for all runs.

  • Failed runs: Emails are sent for failed runs only.

  • Success runs: Emails are sent for successful runs only.

Sharing email notifications

When email notifications are enabled, users automatically receive notifications whenever an owner shares a plan or flow with them.

Individual users can opt out of receiving notifications. For more information, see Preferences Page.

Experimental features

These experimental features are not supported.

Warning

Experimental features are in active development. Their functionality may change from release to release, and they may be removed from the product at any time. Do not use experimental features in a production environment.

These settings may or may not change application behavior.

Cache data in the Transformer intelligently

Warning

This feature is currently disabled due to technical issues. When this message is removed, the feature is available in the product again. For updates on system status, please visit https://status.trifacta.com/.

Note

This feature is in Beta release.

When enabled, this feature allows the Trifacta Application to periodically cache data from the Transformer page based on Trifacta Photon execution time, which enables users to move faster between recipe steps.

Default language

Select the default language to use in the Trifacta Application.

Edit recipes without loading sample

When enabled, you can perform edits in the Transformer page without loading a sample in the data grid.

Tip

This feature can be helpful when you know the edits that need to be performed and do not need sample data to perform the corrections. You can also use it to switch the active sample without loading it.

In Flow View, select Edit recipe without datagrid from the context menu on the right side when the recipe is selected. See View for Recipes.

Enable/Disable data grid from view options

When enabled, you can enable or disable live previewing in the data grid of the Transformer page. Disabling can improve performance. These options are available in the Show/hide data grid options drop-down in the status bar at the bottom of the Transformer page:

  • Edit with data grid

    • When the data grid is disabled, you may not be able to edit some recipe steps. For steps that you can edit, select Preview to see the effects of the step on the data. When you select Preview, the data grid is re-enabled.

  • Show column histogram

    • When the data grid is enabled, you can choose to disable the column histograms in the data grid, which can improve performance.

For more information, see Data Grid Panel.

Execution time threshold (in milliseconds) to control caching in the Transformer

Warning

This feature is currently disabled due to technical issues. When this message is removed, the feature is available in the product again. For updates on system status, please visit https://status.trifacta.com/.

Note

This feature is in Beta release.

When intelligent caching in the Transformer is enabled, you can set the threshold time in milliseconds for when Trifacta Photon updates the cache. At each threshold of execution time in Trifacta Photon, the outputs of intermediate recipe (CDF) steps are cached in memory, which speeds up movement between recipe steps in the Trifacta Application.

Language localization

When enabled, the Trifacta Application is permitted to display text in the selected language.

Show user language preference

When enabled, users are permitted to select a preferred language in their preferences. See Preferences Page.