Create Output SQL Scripts
Note
This feature may not be available in all product editions. For more information on available features, see Compare Editions.
As part of job execution for an output, you can define SQL scripts to run before the job, after it, or both. These SQL scripts are stored as part of the output object definition and can be executed through any database connection to which the user has access. SQL scripts can be applied to file-based and table-based job executions.
When a flow is shared, shared users can modify its SQL scripts if they have Editor permissions on the flow. See Overview of Sharing.
Example uses:
Insert or update log entries in a database log table before or after a job that publishes to file or database destinations.
Perform custom inserts, updates, and delete logic to other database tables based on job output data that is published to a database.
Create and refresh tables or materialized views that join the job's output data with data from other tables using CREATE AS SELECT.
Operational tasks, such as disabling/enabling indexes and managing partitions on supported databases.
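The CREATE AS SELECT pattern can be sketched as follows. This runs against an in-memory SQLite database purely for illustration; all table and column names (customers, orders, order_summary) are hypothetical, and your database's materialized-view syntax may differ.

```python
import sqlite3

# Illustrative only: in-memory SQLite stands in for the target database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Stand-ins for a reference table and the job's published output
CREATE TABLE customers (cust_id INTEGER, region TEXT);
CREATE TABLE orders (cust_id INTEGER, amount REAL);
INSERT INTO customers VALUES (1, 'EMEA'), (2, 'APAC');
INSERT INTO orders VALUES (1, 100.0), (1, 50.0), (2, 75.0);

-- Post-job script body: rebuild a summary table that joins the job's
-- output with the reference table
DROP TABLE IF EXISTS order_summary;
CREATE TABLE order_summary AS
SELECT c.region, SUM(o.amount) AS total
FROM orders o JOIN customers c ON o.cust_id = c.cust_id
GROUP BY c.region;
""")
rows = conn.execute(
    "SELECT region, total FROM order_summary ORDER BY region"
).fetchall()
```

Because the summary is dropped and rebuilt each run, the post-job script always reflects the most recently published job output.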
Script Types
Note
If one of these scripted steps fails, then all downstream phases of the job also fail.
Pre-job: After a job has been initiated and before data is ingested into the platform, a SQL script can be executed by the Trifacta Application.
Post-job: After job results have been generated and published, a SQL script can be executed.
Note
If the publishing phase of the job fails, then all downstream tasks also fail, including the post-job SQL script, which is not executed and is recorded as a failed phase of the job execution.
Script execution
SQL lines in an individual script are executed in the order listed in the script.
If you have defined multiple scripts of the same type (pre-job, for example), those scripts may be executed in parallel.
Note
The order of listing of scripts in the Trifacta Application does not affect the order of execution of those scripts.
A warning message is displayed if a pre- or post-job SQL script does not have a valid connection.
Limitations
Warning
These SQL scripts are executed without validation through the selected database connection. There are no explicit limitations on the types of SQL statements that can be executed. It is possible to do harm through this feature.
Each SQL statement in a script must be terminated with a semi-colon.
SQL validation is not supported for some connection types.
When flows containing output SQL scripts are imported, the connection to the database where the script is to be executed must exist in the new environment. Otherwise, the SQL script is dropped from the import.
Output SQL script actions may not be supported for all connection types. If the connection is not available in the dropdown for selecting a connection, then the feature may not be available for the connection type.
Enable
This feature may need to be enabled in your environment.
A workspace administrator can enable the use of SQL scripts. For more information, see Workspace Settings Page.
Create Output SQL Script
Through the Trifacta Application, you add SQL scripts as part of the output object definition.
Tip
Depending on the nature of your SQL script, you may choose to test it first in a demo environment on a demo database.
Where to add:
You can create SQL scripts for the following types of outputs:
Manual Settings destinations: In Flow View, you can select an output object and then modify one of its Manual Settings destinations.
Scheduled Settings destinations: In Flow View, select an output object and then modify one of its Scheduled Settings destinations.
Steps:
In Flow View, select the output object for which you wish to create the SQL script.
In the Outputs panel on the right, click the Manual Settings tab.
For the type of destination, click Edit.
In the SQL Scripts panel at the bottom of the screen, click Add Script.
In the Add SQL Script window:
Select the database connection to use for executing the SQL script.
Enter the SQL script in the panel.
Choose when you would like to execute the script:
Run before data ingest - before the job is executed
Run after data publish - after the job results have been written
Before you save your changes, click Validate SQL.
Note
Some connection types do not support SQL validation.
Note
Validating the SQL does not execute the SQL script on the database. It performs a check of SQL syntax against the selected database.
To save your SQL script, click Add.
For more information, see SQL Scripts Panel.
Parameterize values
You can add variable or Datetime parameters to your SQL scripts.
Parameters of the same name that are also defined on input datasets, flow parameters, and output objects can be referenced during job execution, so that the same value is passed to each of them for consistency.
Tip
You can parameterize values in your SQL script. Parameters can be variables, Datetime parameters, or environment parameters. For more information, see Overview of Parameterization.
Monitoring execution
You can monitor the execution of any SQL scripts that are part of a job execution. For more information, see Overview of Job Monitoring.
Example Scripts
In the following sections, you can review some common examples for how to use SQL scripts in your data pipelines.
Example - log entries
In this example, you insert log entries into a log table in your database before and after the execution of your job.
Pre-job:
Your SQL script might look like the following:
CREATE TABLE IF NOT EXISTS "transactions"."log-tri" (
    timestamp date,
    jobType varchar(255),
    jobStatus varchar(255)
);
INSERT INTO "transactions"."log-tri" (timestamp, jobType, jobStatus)
VALUES ('2021-06-22', 'transformation', 'started');
The above script is composed of two statements:
CREATE TABLE IF NOT EXISTS - This statement creates the log-tri table in the transactions database. The table is defined with three fields: timestamp, jobType, and jobStatus, each of which is assigned a data type. The IF NOT EXISTS keyword ensures:
The table is created if it does not exist.
If it already exists, no error is returned, which could otherwise stop the job run.
INSERT INTO - This statement inserts a record into the log-tri table, populating each column with an appropriate VALUE:
Column name | Value |
---|---|
timestamp | '2021-06-22' |
jobType | 'transformation' |
jobStatus | 'started' |
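The behavior of the two statements can be sketched by running the same script shape against an in-memory SQLite database. This is illustrative only: the unqualified table name log_tri stands in for "transactions"."log-tri", since SQLite does not support that schema-qualified form.

```python
import sqlite3

# Minimal sketch: in-memory SQLite stands in for the target database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE IF NOT EXISTS log_tri (
    timestamp date,
    jobType varchar(255),
    jobStatus varchar(255)
);
INSERT INTO log_tri (timestamp, jobType, jobStatus)
VALUES ('2021-06-22', 'transformation', 'started');
""")
row = conn.execute("SELECT timestamp, jobType, jobStatus FROM log_tri").fetchone()
```

Running the script a second time would not recreate the table (IF NOT EXISTS) but would append another log row.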
Tip
In the above example, the value for timestamp is a literal value. If needed, you can parameterize that value, so that a Datetime parameter can be inserted into the record as needed. See "Parameterize values" above.
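As a rough illustration of that substitution, the following computes the date at run time and splices it into the INSERT statement. Inside the Trifacta Application you would use a Datetime parameter instead of building the string yourself; the table name log_tri is hypothetical.

```python
from datetime import date

# Hypothetical substitution: compute the value at run time instead of
# hard-coding '2021-06-22'. Table and column names are illustrative.
run_date = date(2021, 6, 22).isoformat()
script = (
    "INSERT INTO log_tri (timestamp, jobType, jobStatus) "
    f"VALUES ('{run_date}', 'transformation', 'started');"
)
```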
Post-job:
After the job results have been published, a post-job SQL script might look like the following:
CREATE TABLE IF NOT EXISTS "transactions"."log-tri" (
    timestamp date,
    jobType varchar(255),
    jobStatus varchar(255)
);
INSERT INTO "transactions"."log-tri" (timestamp, jobType, jobStatus)
VALUES ('2021-06-22', 'transformation', 'complete');
This script is very similar to the previous:
Create the table if it does not exist. This statement also provides schema information if you need to make modifications in the future.
Insert a new row in the table, indicating that the transformation job type is now complete.
Example - updates based on job results
If you write your job results through the same connection where you are executing your SQL script, you can leverage the data directly from your job results into your SQL script.
In the following scenario, a customer account dimension table in the data warehouse (dw.DimCustAccount custdim) is updated with data enriched through Designer Cloud in the job results. In this case, the num_emp, industry_cd, and duns columns are mapped to the corresponding columns in the custenr enriched data table, with values applied where the customer identifier in the customer dimension table (custdim.custId) matches the customer identifier in the enriched data table (custenr.custId).
UPDATE dw.DimCustAccount custdim
SET num_emp = custenr.empcnt,
    industry_cd = custenr.ind_cd,
    duns = custenr.duns_num
FROM tri.cust_enriched custenr
WHERE custdim.custId = custenr.custId;
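The same effect can be sketched portably with correlated subqueries, shown here against an in-memory SQLite database. This is illustrative only: UPDATE ... FROM syntax varies by vendor, and the unqualified table names stand in for the schema-qualified ones above.

```python
import sqlite3

# Illustrative stand-ins for the warehouse dimension table and the
# enriched job-results table from the example.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE DimCustAccount (custId INTEGER, num_emp INTEGER,
                             industry_cd TEXT, duns TEXT);
CREATE TABLE cust_enriched (custId INTEGER, empcnt INTEGER,
                            ind_cd TEXT, duns_num TEXT);
INSERT INTO DimCustAccount VALUES (42, NULL, NULL, NULL);
INSERT INTO cust_enriched VALUES (42, 350, 'MFG', '123456789');

-- Correlated-subquery equivalent of UPDATE ... FROM: copy the enriched
-- values into the dimension row with the matching customer identifier.
UPDATE DimCustAccount
SET num_emp     = (SELECT empcnt   FROM cust_enriched e
                   WHERE e.custId = DimCustAccount.custId),
    industry_cd = (SELECT ind_cd   FROM cust_enriched e
                   WHERE e.custId = DimCustAccount.custId),
    duns        = (SELECT duns_num FROM cust_enriched e
                   WHERE e.custId = DimCustAccount.custId)
WHERE EXISTS (SELECT 1 FROM cust_enriched e
              WHERE e.custId = DimCustAccount.custId);
""")
row = conn.execute(
    "SELECT num_emp, industry_cd, duns FROM DimCustAccount"
).fetchone()
```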
Edit Output SQL Script
Steps:
In Flow View, select the output object.
In the context panel on the right, select the Manual Settings tab.
Click Edit next to the type of destination to modify.
In the dialog, locate the one to modify in the SQL Scripts panel. Click Edit.
Make changes as needed. Click Save.
Delete Output SQL Script
Warning
After you delete a SQL script and save the output object, the SQL script is removed permanently. Before deleting, you may wish to copy the script and paste it into a text editor.
Steps:
In Flow View, select the output object.
In the context panel on the right, select the Manual Settings tab.
Click Edit next to the type of destination to modify.
In the dialog, hover over the one to modify in the SQL Scripts panel. From the More menu, select Delete.
Create Output SQL Script via API
You can create SQL scripts via API. These scripts can then be associated with specific output objects.
Create SQL script
Key information:
Attribute | Description |
---|---|
sqlScript | Text of the SQL script. You should validate this script before submitting it through the API. |
type | Set type to be: |
vendor | The vendor type of the database to which you are connecting. See Connection Types. |
outputObjectId | Internal identifier of the output object to which you are associating the SQL script. When the object is selected in Flow View, the identifier is part of the URL. |
connectionId | Internal identifier of the connection that you are using to execute the SQL script. |
Endpoint | /v4/sqlScripts |
---|---|
Method | POST |
For more information, see Designer Cloud Powered by Trifacta: API Reference docs
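A request body for this call might be assembled as sketched below. This is an assumption-laden illustration: the "pre" type value, the vendor name, and the identifier values are all placeholders, and authentication headers plus the base URL depend on your deployment.

```python
import json

# Field names follow the attribute table above; all values are
# placeholders, and the "pre" type value is an assumption.
payload = {
    "sqlScript": (
        "INSERT INTO log_tri (timestamp, jobType, jobStatus) "
        "VALUES ('2021-06-22', 'transformation', 'started');"
    ),
    "type": "pre",
    "vendor": "postgres",
    "outputObjectId": 7,
    "connectionId": 12,
}
body = json.dumps(payload)  # POST this body to /v4/sqlScripts
```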
List SQL scripts
List all SQL scripts.
Endpoint | /v4/sqlScripts |
---|---|
Method | GET |
For more information, see Designer Cloud Powered by Trifacta: API Reference docs
Edit SQL script
Endpoint | /v4/sqlScripts/{id} |
---|---|
Method | PATCH |
where:
{id}
is the internal identifier of the SQL script
For more information, see Designer Cloud Powered by Trifacta: API Reference docs
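For example, a minimal PATCH body that updates only the script text might look like the following sketch; the script id, the replacement SQL, and the assumption that the sqlScript field from the create call is patchable are all illustrative.

```python
import json

# Hypothetical edit: replace the text of SQL script id 5.
script_id = 5
endpoint = f"/v4/sqlScripts/{script_id}"  # PATCH target
patch_body = json.dumps({"sqlScript": "SELECT 1;"})
```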
Delete SQL script
Endpoint | /v4/sqlScripts/{id} |
---|---|
Method | DELETE |
where:
{id}
is the internal identifier of the SQL script
For more information, see Designer Cloud Powered by Trifacta: API Reference docs