Skip to main content

Create Dataset with Parameters

This section provides an overview on how to parameterize relational sources and files while importing data into the Dataprep by Trifacta platform.

For more information on parameterization of datasets and other types of parameters, see Overview of Parameterization.

From File System

When browsing for data on your default storage layer, you can choose to parameterize elements of the path. Through the Import Data page, you can select elements of the path, apply one of the supported parameter types and then create the dataset with parameters.

Note

When you import a file, the data is not stored in the Dataprep by Trifacta platform . What you create is an imported dataset thatis simply a reference to the source of the data. The Dataprep by Trifacta platform never stores or modifies source data.

When you create a dataset with parameters in the Dataprep by Trifacta platform, you can replace segments of the input path with parameters. Suppose you have the following files that you'd like to capture through a parameterized dataset:

//source/user/me/datasets/month01/2017-01-31-file.csv
//source/user/me/datasets/month02/2017-02-28-file.csv
//source/user/me/datasets/month03/2017-03-31-file.csv
//source/user/me/datasets/month04/2017-04-30-file.csv
//source/user/me/datasets/month05/2017-05-31-file.csv
//source/user/me/datasets/month06/2017-06-30-file.csv
//source/user/me/datasets/month07/2017-07-31-file.csv
//source/user/me/datasets/month08/2017-08-31-file.csv
//source/user/me/datasets/month09/2017-09-30-file.csv
//source/user/me/datasets/month10/2017-10-31-file.csv
//source/user/me/datasets/month11/2017-11-30-file.csv
//source/user/me/datasets/month12/2017-12-31-file.csv

A parameterized reference to all of these files would look something like:

//source/user/me/datasets/month##/YYYY-MM-DD-file.csv

Through the application, you can specify the parameters to match all values for:

  • ## - You can use a wildcard or (better) a pattern to replace these values.

  • YYYY-MM-DD - A formatted Datetime parameter can replace these values.

For more information, see Parameterize Files for Import.

Parameterize bucket names

You can create environment parameters for your bucket names.

From Relational Sources

You can create datasets from a relational source by applying parameters to the custom SQL that pulls the data from the source. During import of database tables through relational connections, you can apply parameters to the SQL query that you use to define the imported dataset. In some scenarios, you may need to define the table to import using a variable parameter or to parameterize the time value associated with a table name. Using parameters, you can define the tables, columns, and conditions of the query that you use to bring in data from a relational database.

For more information, see Parameterize Tables for Import.

Edit Parameter

After you have created your dataset with parameters, you can edit the parameter as needed.

Steps:

  1. In the left nav bar, select Library for Data.

  2. In the Library for Data page, locate the dataset. From its context menu, select either of the following:

    1. Files: Select Edit parameters. In the Edit Dataset with Parameters, click the parameter to modify its definition.

    2. Tables: Click Edit Custom SQL. In the Custom SQL window, you can modify the SQL statement, including any parameters in it. For more information, see Create Dataset with SQL.

Apply Parameter Overrides

After you have created a parameterized dataset, you can apply overrides to the default value. These override values can be applied in the following order of precedence.

  • Job: When you choose to execute a job, you can set a new value for the parameter, which is applied for the specified job only.

  • Flow: If your imported dataset containing a parameter is added to a flow, you can define an override value for the dataset's parameter through Flow View. Whenever a job is executed on the imported dataset within the flow, the override value is applied to the dataset.

    Note

    If a job-level override is applied on top of a flow-level override, the job override value is applied to the job.

  • Default: The default value for the parameter is used if no override is applied.

Apply parameter overrides for your flow

Steps:

  1. Open the flow.

  2. In Flow View, select the icon for your dataset with parameters.

  3. From the context menu, select Parameter.

  4. In the Manager Parameters dialog, click the Overrides tab.

  5. Edit the required values, click Save.

For more information, see Manage Parameters Dialog.

Apply parameter overrides for your job

You can apply parameter overrides to your job.

Steps:

  1. In Flow View, select the output that you wish to generate.

  2. In the right context panel, click Run Job.

  3. In the Run Job page, you can specify job-level overrides at the bottom of the screen.

For more information, see Run Job Page.

Delete Parameter

Steps:

  1. In the Edit Dataset with Parameters screen, select the parameter that you wish to remove.

    Note

    Before you remove parameter, you may want to take note of the default value, which may need to be applied to the path or query after you remove the parameter.

  2. In the popup, click Delete.

  3. Save your changes.

  4. The parameter is removed from the imported dataset definition.