API Task - Run Job

This section describes how to run a job using the APIs available in Designer Cloud Powered by Trifacta Enterprise Edition.

A note about API URLs:

In the listed examples, URLs are referenced in the following manner:

<protocol>://<platform_base_url>/

In your product, these references map to the following:

<http or https>://<hostname>:<port_number>/

For more information, see API Reference.
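As a minimal sketch of the mapping above, the following Python helper joins the protocol, hostname, and port into a full endpoint URL. The function name, hostname, and port number are hypothetical values chosen for illustration; substitute those of your own deployment.

```python
def build_api_url(protocol: str, hostname: str, port: int, path: str) -> str:
    """Join the URL pieces described above into a full endpoint URL."""
    if protocol not in ("http", "https"):
        raise ValueError("protocol must be 'http' or 'https'")
    # Strip any leading slash so the result has exactly one separator.
    return f"{protocol}://{hostname}:{port}/{path.lstrip('/')}"


# Hypothetical hostname and port, for illustration only.
example_url = build_api_url("https", "wrangle.example.com", 3005, "/v4/jobGroups")
print(example_url)
```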

Run Job Endpoints

Depending on the type of job that you are running, you must use one of the following endpoints:

Run job

Run a job to generate the outputs from a single recipe in a flow.

Tip

This method is covered on this page.

Endpoint	/v4/jobGroups
Method	POST
Reference documentation	https://api.trifacta.com/ee/9.7/index.html#operation/runJobGroup

Run flow

Run all outputs specified in a flow. Optionally, you can run all scheduled outputs.

Endpoint	/v4/flows/:id/run
Method	POST
Reference documentation	https://api.trifacta.com/ee/9.7/index.html#operation/runFlow

Run deployment

Run the primary flow in the active release of the specified deployment.

Deployments are available only through the Deployment Manager. For more information, see Overview of Deployment Manager.

Endpoint	/v4/deployments/:id/run
Method	POST
Reference documentation	https://api.trifacta.com/ee/9.7/index.html#operation/runDeployment

Prerequisites

Before you begin, you should verify the following:

Get authentication credentials. As part of each request, you must pass in authentication credentials to the platform. For more information, see Manage API Access Tokens.
See also https://api.trifacta.com/ee/9.7/index.html#section/Authentication
Verify job execution. Run the desired job through the Trifacta Application and verify that the output objects are properly generated.
Note
By default, when scheduled or API jobs are executed, no validation is performed on any writesettings objects for file-based outputs. Issues with these objects may cause failures during the transformation or publishing stages of job execution. Jobs of these types should be tested through the Trifacta Application first. A workspace administrator can disable the skipping of these validations.
Acquire recipe (wrangled dataset) identifier. In Flow View, click the icon for the recipe whose outputs you wish to generate. Acquire the numeric value for the recipe from the URL. In the following example, the recipe ID is 28629:
```
http://<platform_base_url>/flows/5479?recipe=28629&tab=recipe
```
Create output object. A recipe must have at least one output object created for it before you can run a job via APIs. For more information, see Flow View Page.

If you wish to apply overrides to the inputs or outputs of the recipe, you should acquire those identifiers or paths now. For more information, see "Run Job with Parameter Overrides" below.
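If you collect Flow View URLs in a script, the recipe identifier can be pulled out of the `recipe` query parameter shown in the prerequisites. The following is a sketch using only the Python standard library; the function name is hypothetical and `example.com` stands in for your platform base URL.

```python
from urllib.parse import parse_qs, urlparse


def recipe_id_from_flow_url(url: str) -> int:
    """Return the numeric recipe (wrangledDataset) id from a Flow View URL."""
    query = parse_qs(urlparse(url).query)
    values = query.get("recipe")
    if not values:
        raise ValueError("URL has no 'recipe' query parameter")
    return int(values[0])


# example.com is a placeholder for <platform_base_url>.
recipe_id = recipe_id_from_flow_url(
    "http://example.com/flows/5479?recipe=28629&tab=recipe"
)
print(recipe_id)
```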

Step - Run Job

Through the APIs, you can specify and run a job. To run a job with all default settings, construct a request like the following:

Note

A wrangledDataset is the internal object name for the recipe that you wish to run. See the Prerequisites section for how to acquire this value.

Endpoint	`<protocol>://<platform_base_url>/v4/jobGroups`
Authentication	Required
Method	POST
Request Body	{ "wrangledDataset": { "id": 28629 } }
Response Code	201 - Created
Response Body	{ "sessionId": "79276c31-c58c-4e79-ae5e-fed1a25ebca1", "reason": "JobStarted", "jobGraph": { "vertices": [ 21, 22 ], "edges": [ { "source": 21, "target": 22 } ] }, "id": 961247, "jobs": { "data": [ { "id": 21 }, { "id": 22 } ] } }

If the 201 response code is returned, then the job has been queued for execution.

Tip

Retain the id value in the response. In the above, 961247 is the internal identifier for the job group for the job. You will need this value to check on your job status.

For more information, see https://api.trifacta.com/ee/9.7/index.html#operation/runJobGroup
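The request above could be assembled in Python roughly as follows. This is a sketch, not a definitive client: the helper names, the base URL, and the token value are hypothetical, and passing the access token as an `Authorization: Bearer` header is an assumption that you should confirm against the Authentication reference. The request is only built here, not sent.

```python
import json
import urllib.request


def build_run_job_request(base_url: str, token: str, recipe_id: int) -> urllib.request.Request:
    """Build (but do not send) the POST request that queues a job for a recipe."""
    body = json.dumps({"wrangledDataset": {"id": recipe_id}}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/v4/jobGroups",
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            # Assumption: access tokens are passed as a Bearer token.
            "Authorization": f"Bearer {token}",
        },
    )


def job_group_id(response_body: str) -> int:
    """Extract the job group id to retain from the 201 response body."""
    return json.loads(response_body)["id"]


# Hypothetical base URL and token, for illustration only.
request = build_run_job_request("https://example.com:3005", "my-token", 28629)
print(request.get_method(), request.full_url)
print(job_group_id('{"reason": "JobStarted", "id": 961247}'))
```

To submit the request, pass it to `urllib.request.urlopen` and check for the 201 status before reading the `id` value.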

Tip

You have queued your job for execution.

Step - Monitoring Your Job

You can monitor the status of your job through the following endpoint:

Endpoint	<protocol>://<platform_base_url>/v4/jobGroups/<id>/
Authentication	Required
Method	GET
Request Body	None.
Response Code	200 - OK
Response Body	{ "id": 961247, "name": null, "description": null, "ranfrom": "ui", "ranfor": "recipe", "status": "Complete", "profilingEnabled": true, "runParameterReferenceDate": "2019-08-20T17:46:27.000Z", "createdAt": "2019-08-20T17:46:28.000Z", "updatedAt": "2019-08-20T17:53:17.000Z", "workspace": { "id": 22 }, "creator": { "id": 38 }, "updater": { "id": 38 }, "snapshot": { "id": 774476 }, "wrangledDataset": { "id": 28629 }, "flowRun": null }

When the job has successfully completed, the returned status message includes the following:

"status": "Complete",

For more information, see https://api.trifacta.com/ee/9.7/index.html#operation/getJobGroup
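Monitoring is typically done by polling the endpoint until the status settles. The sketch below separates the polling loop from the HTTP call so it can be exercised without a live platform: `fetch_status` is a hypothetical callable that you would implement as a GET on `/v4/jobGroups/<id>` returning the `status` field. Only `Complete` is confirmed on this page; the failure status names and the intermediate statuses in the demonstration are assumptions.

```python
import time


def wait_for_job(fetch_status, interval_seconds=10.0, max_polls=60,
                 failure_statuses=("Failed", "Canceled"), sleep=time.sleep):
    """Poll until the job group reports Complete or an assumed failure status."""
    for _ in range(max_polls):
        status = fetch_status()
        if status == "Complete" or status in failure_statuses:
            return status
        sleep(interval_seconds)
    raise TimeoutError(f"job not finished after {max_polls} polls")


# Demonstration with canned statuses instead of real HTTP calls.
fake_statuses = iter(["Pending", "InProgress", "Complete"])
final_status = wait_for_job(lambda: next(fake_statuses), sleep=lambda seconds: None)
print(final_status)
```

Injecting `sleep` keeps the loop testable; in real use, leave the default so that the platform is not polled in a tight loop.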

Tip

You have executed the job. Results have been delivered to the designated output locations.

Step - Re-run Job

In the future, you can re-run the job using the same, simple request:

Endpoint	<protocol>://<platform_base_url>/v4/jobGroups
Authentication	Required
Method	POST
Request Body	{ "wrangledDataset": { "id": 28629 } }

The job is re-run as it was previously specified.

For more information, see https://api.trifacta.com/ee/9.7/index.html#operation/createJobGroup

Step - Run Job with Overrides - Files

As needed, you can specify runtime overrides for any of the settings related to the job definition or its outputs. For file-based jobs, these overrides include:

Data sources
Execution environment
Profiling
Output file, format, and other settings

Input file overrides

You can override the file-based data sources for your job run. In the following example, two datasets are overridden with new files.

Note

Overrides for data sources apply only to file-based sources. File-based sources that are converted during ingestion, such as Microsoft Excel files and JSON files, cannot be swapped in this manner.

Note

Overrides must be applied to the entire file path. As part of these overrides, you can redefine the bucket from which the source data is taken.

Endpoint	<protocol>://<platform_base_url>/v4/jobGroups
Authentication	Required
Method	POST
Request Body	{ "wrangledDataset": { "id": 28629 }, "overrides": { "datasources": { "airlines - region 1": [ "s3://my-new-bucket/test-override-input/airlines3.csv", "s3://my-new-bucket/test-override-input/airlines4.csv", "s3://my-new-bucket/test-override-input/airlines5.csv" ], "airlines - region 2": [ "s3://my-new-bucket/test-override-input/airlines1.csv" ] } } }

The job specified for recipe 28629 is re-run using the new data sources.

Notes:

The names of the datasources (airlines - region 1 and airlines - region 2) refer to the display name values for the datasets that are the sources for the wrangledDataset (recipe) in the flow.
You can use this API method to overwrite the bucket name for your source, but you must replace the entire path.
The list of files for a data source can come from different folders, too.
File type and size information is not displayed in the Job Details page for these overridden jobs.
No validation is performed on the existence of these files prior to execution. If the files do not exist, the job fails.

For more information, see https://api.trifacta.com/ee/9.7/index.html#operation/createJobGroup
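Building the body from a dictionary and serializing it with `json.dumps` avoids hand-written JSON mistakes such as trailing commas. The following sketch reproduces the input override body from this section; the function name is hypothetical, and the only check it performs is that each data source lists at least one path (the platform itself does not validate that the files exist).

```python
import json


def build_input_override_body(recipe_id: int, datasources: dict) -> str:
    """Return the JSON body for a job run with file-based source overrides."""
    for name, paths in datasources.items():
        if not paths:
            raise ValueError(f"data source {name!r} needs at least one full file path")
    payload = {
        "wrangledDataset": {"id": recipe_id},
        # Keys are the display names of the datasets feeding the recipe.
        "overrides": {"datasources": datasources},
    }
    return json.dumps(payload, indent=2)


input_override_body = build_input_override_body(28629, {
    "airlines - region 1": [
        "s3://my-new-bucket/test-override-input/airlines3.csv",
        "s3://my-new-bucket/test-override-input/airlines4.csv",
        "s3://my-new-bucket/test-override-input/airlines5.csv",
    ],
    "airlines - region 2": [
        "s3://my-new-bucket/test-override-input/airlines1.csv",
    ],
})
print(input_override_body)
```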

Output file overrides

Note

Override values applied to a job are not validated. Invalid overrides may cause your job to fail.

Acquire the internal identifier for the recipe for which you wish to execute a job. In the previous example, this identifier was 28629.

Construct a request using the following:

Endpoint	<protocol>://<platform_base_url>/v4/jobGroups
Authentication	Required
Method	POST

Request Body:

{
  "wrangledDataset": {
    "id": 28629
  },
  "overrides": {
    "profiler": true,
    "execution": "spark",
    "writesettings": [
      {
        "path": "<new_path_to_output>",
        "format": "csv",
        "header": true,
        "asSingleFile": true,
        "includeMismatches": true
      }
    ]
  },
  "ranfrom": null
}

In the above example, the job has been launched with the following overrides:

Job will be executed on the Spark cluster. Other supported values depend on your product edition and available running environments:

Value for `overrides.execution`	Description
photon	Running environment on Trifacta node
spark	Spark on integrated cluster, with the following exceptions.
databricksSpark	Spark on Azure Databricks
emrSpark	Spark on AWS EMR
dataflow	Dataflow

Job will be executed with profiling enabled.
Output is written to a new file path.
Output is written in CSV format.
Output has a header and is generated as a single file.
Output will include values if they are mismatched for the column's data type.
Note
includeMismatches is false by default. You can set it to true as an override or as part of the output object definition.

A response code of 201 - Created is returned. The response body should look like the following:

{
    "sessionId": "79276c31-c58c-4e79-ae5e-fed1a25ebca1",
    "reason": "JobStarted",
    "jobGraph": {
        "vertices": [
            21,
            22
        ],
        "edges": [
            {
                "source": 21,
                "target": 22
            }
        ]
    },
    "id": 962221,
    "jobs": {
        "data": [
            {
                "id": 21
            },
            {
                "id": 22
            }
        ]
    }
}

Retain the id value, which is the job identifier, for monitoring.
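Because override values are not validated by the platform, a small client-side check can catch typos before the job is queued. The sketch below builds the output override body from this section and rejects execution values that are not in the table above; the function and constant names are hypothetical, and the output path is a placeholder.

```python
import json

# Values for overrides.execution taken from the table in this section.
EXECUTION_VALUES = {"photon", "spark", "databricksSpark", "emrSpark", "dataflow"}


def build_file_output_override(recipe_id, path, execution="spark", profiler=True,
                               fmt="csv", header=True, as_single_file=True,
                               include_mismatches=False):
    """Return the request body (as a dict) for a file-based output override."""
    if execution not in EXECUTION_VALUES:
        raise ValueError(f"unsupported execution value: {execution!r}")
    return {
        "wrangledDataset": {"id": recipe_id},
        "overrides": {
            "profiler": profiler,
            "execution": execution,
            "writesettings": [{
                "path": path,
                "format": fmt,
                "header": header,
                "asSingleFile": as_single_file,
                # false by default, per the note above.
                "includeMismatches": include_mismatches,
            }],
        },
        "ranfrom": None,
    }


# "<new_path_to_output>" is the same placeholder used in the request above.
output_override = build_file_output_override(28629, "<new_path_to_output>",
                                             include_mismatches=True)
print(json.dumps(output_override, indent=2))
```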

Step - Run Job with Overrides - Tables

You can also pass job definition overrides for table-based outputs. For table outputs, overrides include:

Path to database to which to write (must have write access)
Connection to write to the target.
Tip
This identifier is for the connection used to write to the target system. This connection must already exist. For more information on how to retrieve the identifier for a connection, see
https://api.trifacta.com/ee/9.7/index.html#operation/listConnections
Name of output table
Target table type
Tip
You can acquire the target type from the vendor value in the connection response. For more information, see
https://api.trifacta.com/ee/9.7/index.html#operation/listConnections

Publishing action (action):

Key value	Description
create	Create a new table with each publication.
createAndLoad	Append your data to the table.
truncateAndLoad	Truncate the table and load it with your data.
dropAndLoad	Drop the table and write the new table in its place.

Identifier of connection to use to write data.

Acquire the internal identifier for the recipe for which you wish to execute a job. In the previous example, this identifier was 28629.

Construct a request using the following:

Endpoint	<protocol>://<platform_base_url>/v4/jobGroups
Authentication	Required
Method	POST

Request Body:

{
  "wrangledDataset": {
    "id": 28629
  },
  "overrides": {
    "publications": [
      {
        "path": [
          "prod_db"
        ],
        "tableName": "Table_CaseFctn2",
        "action": "createAndLoad",
        "targetType": "postgres",
        "connectionId": 3
      }
    ]
  },
  "ranfrom": null
}

In the above example, the job has been launched with the following overrides:
Note
When overrides are applied to publishing, any publications that are already attached to the recipe are ignored.
1. Output is written to the prod_db database, using table name Table_CaseFctn2.
2. Output action is createAndLoad. See above for definitions.
3. Target table type is a PostgreSQL table.

A response code of 201 - Created is returned. The response body should look like the following:

{
    "sessionId": "79276c31-c58c-4e79-ae5e-fed1a25ebca1",
    "reason": "JobStarted",
    "jobGraph": {
        "vertices": [
            21,
            22
        ],
        "edges": [
            {
                "source": 21,
                "target": 22
            }
        ]
    },
    "id": 962222,
    "jobs": {
        "data": [
            {
                "id": 21
            },
            {
                "id": 22
            }
        ]
    }
}

Retain the id value, which is the job identifier, for monitoring.
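A table-based override can be assembled the same way. In this sketch, the action is checked against the four key values listed above before the body is built; the function and constant names are hypothetical, and the connection identifier and target type are the sample values from the request in this section.

```python
import json

# Key values for the publishing action, as listed in this section.
PUBLICATION_ACTIONS = {"create", "createAndLoad", "truncateAndLoad", "dropAndLoad"}


def build_table_override(recipe_id, database, table_name, action,
                         target_type, connection_id):
    """Return the request body (as a dict) for a table-based output override."""
    if action not in PUBLICATION_ACTIONS:
        raise ValueError(f"unsupported action: {action!r}")
    return {
        "wrangledDataset": {"id": recipe_id},
        "overrides": {
            "publications": [{
                "path": [database],          # path is a list, as in the example
                "tableName": table_name,
                "action": action,
                "targetType": target_type,   # vendor value of the connection
                "connectionId": connection_id,
            }],
        },
        "ranfrom": None,
    }


table_override = build_table_override(28629, "prod_db", "Table_CaseFctn2",
                                      "createAndLoad", "postgres", 3)
print(json.dumps(table_override, indent=2))
```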

Step - Run Job with Overrides - Webhooks

When you execute a job, you can pass in a set of parameters as overrides to generate a webhook message to a third-party application, based on the success or failure of the job.

For more information on webhooks, see Create Flow Webhook Task.

Acquire the internal identifier for the recipe for which you wish to execute a job. In the previous example, this identifier was 28629.

Construct a request using the following:

Endpoint	<protocol>://<platform_base_url>/v4/jobGroups
Authentication	Required
Method	POST

Request Body:

{
  "wrangledDataset": {
    "id": 28629
  },
  "overrides": {
    "webhooks": [
      {
        "name": "webhook override",
        "url": "http://example.com",
        "method": "post",
        "triggerEvent": "onJobFailure",
        "body": {
          "text": "override"
        },
        "headers": {
          "testHeader": "val1"
        },
        "sslVerification": true,
        "secretKey": "123"
      }
    ]
  }
}

In the above example, the job has been launched with the following overrides:

Override setting	Description
name	Name of the webhook.
url	URL to which to send the webhook message.
method	The HTTP method to use. Supported values: `POST`, `PUT`, `PATCH`, `GET`, or `DELETE`. Body is ignored for `GET` and `DELETE` methods.
triggerEvent	Supported values: `onJobFailure` (send webhook message if job fails), `onJobSuccess` (send webhook message if job completes successfully), `onJobDone` (send webhook message when job fails or finishes successfully).
body	(optional) The value of the `text` field is the message that is sent. Note: Some special token values are supported. See Create Flow Webhook Task.
headers	(optional) Key-value pairs of headers to include in the HTTP request.
sslVerification	(optional) Set to `true` if SSL verification should be completed. If not specified, the value is `true`.
secretKey	(optional) If enabled, this value should be set to the secret key to use.

A response code of 201 - Created is returned. The response body should look like the following:

{
    "sessionId": "79276c31-c58c-4e79-ae5e-fed1a25ebca1",
    "reason": "JobStarted",
    "jobGraph": {
        "vertices": [
            21,
            22
        ],
        "edges": [
            {
                "source": 21,
                "target": 22
            }
        ]
    },
    "id": 962222,
    "jobs": {
        "data": [
            {
                "id": 21
            },
            {
                "id": 22
            }
        ]
    }
}

Retain the id value, which is the job identifier, for monitoring.
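The webhook settings in the table above lend themselves to a small builder that checks the enumerated values and leaves out optional fields that were not supplied. This is a sketch with hypothetical function and constant names; comparing the method case-insensitively is an assumption based on the lowercase `post` in the sample request, and omitting the body for `GET` and `DELETE` simply mirrors the note that it is ignored.

```python
import json

TRIGGER_EVENTS = {"onJobFailure", "onJobSuccess", "onJobDone"}
HTTP_METHODS = {"POST", "PUT", "PATCH", "GET", "DELETE"}


def build_webhook(name, url, method, trigger_event, text=None,
                  headers=None, ssl_verification=True, secret_key=None):
    """Return one webhook override entry, validating the enumerated fields."""
    if method.upper() not in HTTP_METHODS:
        raise ValueError(f"unsupported method: {method!r}")
    if trigger_event not in TRIGGER_EVENTS:
        raise ValueError(f"unsupported triggerEvent: {trigger_event!r}")
    webhook = {"name": name, "url": url, "method": method,
               "triggerEvent": trigger_event, "sslVerification": ssl_verification}
    # The body is ignored for GET and DELETE, so it is left out for those.
    if text is not None and method.upper() not in ("GET", "DELETE"):
        webhook["body"] = {"text": text}
    if headers:
        webhook["headers"] = dict(headers)
    if secret_key is not None:
        webhook["secretKey"] = secret_key
    return webhook


webhook_body = {
    "wrangledDataset": {"id": 28629},
    "overrides": {"webhooks": [build_webhook(
        "webhook override", "http://example.com", "post", "onJobFailure",
        text="override", headers={"testHeader": "val1"}, secret_key="123")]},
}
print(json.dumps(webhook_body, indent=2))
```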

Step - Run Job with Parameter Overrides

You can pass overrides of the default parameter values as part of the job definition. You can use the following mechanism to pass in parameter overrides of the following types:

Datasets with parameters (variable type)
Output object parameters
Flow parameters

The syntax is the same for each type.

Acquire the internal identifier for the recipe for which you wish to execute a job. In the previous example, this identifier was 28629.

Construct a request using the following:

Endpoint	<protocol>://<platform_base_url>/v4/jobGroups
Authentication	Required
Method	POST

Request Body:

{
  "wrangledDataset": {
    "id": 28629
  },
  "overrides": {
    "runParameters": {
      "overrides": {
        "data": [
          {
            "key": "varRegion",
            "value": "02"
          }
        ]
      }
    }
  },
  "ranfrom": null
}

In the above example, the specified job has been launched for recipe 28629. The run parameter varRegion has been set to 02 for this specific job. Depending on how it's defined in the flow, this parameter could change any of the following:
1. The source for the imported dataset.
2. The path for the generated output.
3. A flow parameter reference in the recipe.

For more information, see Overview of Parameterization.

A response code of 201 - Created is returned. The response body should look like the following:

{
    "sessionId": "79276c31-c58c-4e79-ae5e-fed1a25ebca1",
    "reason": "JobStarted",
    "jobGraph": {
        "vertices": [
            21,
            22
        ],
        "edges": [
            {
                "source": 21,
                "target": 22
            }
        ]
    },
    "id": 962223,
    "jobs": {
        "data": [
            {
                "id": 21
            },
            {
                "id": 22
            }
        ]
    }
}

Retain the id value, which is the job identifier, for monitoring.
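Since the syntax is the same for every parameter type, a plain name-to-value mapping can be converted into the nested `runParameters` structure shown above. The following is a sketch with a hypothetical function name; values are passed through unchanged, so supply them as strings where the flow expects strings (for example `"02"` rather than `2`).

```python
import json


def build_run_parameter_override(recipe_id: int, parameters: dict) -> dict:
    """Return the request body (as a dict) with run parameter overrides."""
    data = [{"key": key, "value": value} for key, value in parameters.items()]
    return {
        "wrangledDataset": {"id": recipe_id},
        "overrides": {
            "runParameters": {"overrides": {"data": data}},
        },
        "ranfrom": None,
    }


parameter_override = build_run_parameter_override(28629, {"varRegion": "02"})
print(json.dumps(parameter_override, indent=2))
```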

Step - Spark Job Overrides

When it is enabled, you can submit overrides to a specific set of Spark properties for your job.

This feature and the Spark properties to override must be enabled. For more information on enabling this feature, see Enable Spark Job Overrides.

The following example shows how to run a job for a specified recipe with Spark property overrides applied to it. This example assumes that the job has already been configured to be executed on Spark ("execution": "spark"):

Endpoint	<protocol>://<platform_base_url>/v4/jobGroups
Authentication	Required
Method	POST

Request Body:

{
  "wrangledDataset": {
    "id": 28629
  },
  "overrides": {
    "sparkOptions": [
    {
      "key": "spark.executor.cores",
      "value": "2"
    },
    {
      "key": "spark.executor.memory",
      "value": "4GB"
    }
   ]
  }
}
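The `sparkOptions` list above can be generated from a mapping of Spark property names to values. In this sketch the function name is hypothetical, and converting every value to a string is an assumption based on the sample request, where both values are quoted. Which properties may be overridden depends on what has been enabled for your deployment.

```python
import json


def build_spark_override(recipe_id: int, spark_properties: dict) -> dict:
    """Return the request body (as a dict) with Spark property overrides."""
    options = [{"key": key, "value": str(value)}  # values sent as strings (assumption)
               for key, value in spark_properties.items()]
    return {
        "wrangledDataset": {"id": recipe_id},
        "overrides": {"sparkOptions": options},
    }


spark_override = build_spark_override(28629, {
    "spark.executor.cores": 2,
    "spark.executor.memory": "4GB",
})
print(json.dumps(spark_override, indent=2))
```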

Step - Databricks Job Overrides

You can submit overrides to a specific set of Databricks properties for your job execution. These overrides can be applied to AWS Databricks or Azure Databricks.

General example

The following example shows how to run a job on Databricks for a specified recipe with several property overrides applied to it:

Endpoint	https://www.example.com/v4/jobGroups
Authentication	Required
Method	POST

Request Body:

{
  "wrangledDataset": {
    "id": 60
  },
  "overrides": {
    "execution": "databricksSpark",
    "profiler": true,
    "databricksOptions": [
      {"key": "maxWorkers", "value": 8},
      {"key": "poolId", "value": "pool-123456789"},
      {"key": "enableLocalDiskEncryption", "value": true}
    ]
  }
}

The above overrides do the following:

Sets the maximum number of worker nodes on the cluster to 8. Databricks is permitted to adjust the number of nodes for job execution up to this limit.
Instructs the Databricks cluster to use worker pool pool-123456789 for the job.
Enables encryption on the local Databricks cluster node of temporary job files for additional security.

Databricks job overrides reference

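The following properties can be overridden for AWS Databricks and Azure Databricks jobs. The listing is schematic rather than literal JSON; the general example above shows properties passed as key/value objects in databricksOptions: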

{
  "wrangledDataset": {"id": 60},
  "overrides": {
    "databricksOptions": [
      "autoterminationMinutes" : <integer_override_value>,
      "awsAttributes.availability" : "<string_override_value>",
      "awsAttributes.availabilityZone" : "<string_override_value>",
      "awsAttributes.ebsVolume.count" : <integer_override_value>,
      "awsAttributes.ebsVolume.size" : <integer_override_value>,
      "awsAttributes.ebsVolume.type" : "<string_override_value>",
      "awsAttributes.firstOnDemandInstances" : <integer_override_value>,
      "awsAttributes.instanceProfileArn" : "<string_override_value>",
      "awsAttributes.spotBidPricePercent" : <decimal_override_value>,
      "clusterMode" : "<string_override_value>",
      "clusterPolicyId" : "<string_override_value>",
      "driverNodeType" : "<string_override_value>",
      "enableAutotermination" : <boolean_override_value>,
      "enableLocalDiskEncryption" : <boolean_override_value>,
      "logsDestination" : "<string_override_value>",
      "maxWorkers" : <integer_override_value>,
      "minWorkers" : <integer_override_value>,
      "poolId" : "<string_override_value>",
      "poolName" : "<string_override_value>",
      "driverPoolId" : "<string_override_value>",
      "driverPoolName" : "<string_override_value>",
      "serviceUrl" : "<string_override_value>",
      "sparkVersion" : "<string_override_value>",
      "workerNodeType" : "<string_override_value>"
    ]
  }
}

Note

Overrides that begin with awsAttributes apply to AWS Databricks only.

Note

If a Databricks cluster policy is used, all job-level overrides except for clusterPolicyId are ignored.
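The two notes above can be turned into simple pre-flight checks. The sketch below converts a mapping of properties into the key/value form used in the general example and reports which overrides are AWS-only and which would be ignored when a cluster policy is set. All function names are hypothetical, and the policy identifier is a made-up sample value.

```python
import json


def build_databricks_options(options: dict) -> list:
    """Convert a property mapping into the key/value list form."""
    return [{"key": key, "value": value} for key, value in options.items()]


def aws_only_keys(options: dict) -> list:
    """Overrides that begin with awsAttributes apply to AWS Databricks only."""
    return sorted(key for key in options if key.startswith("awsAttributes"))


def ignored_under_cluster_policy(options: dict) -> list:
    """With a cluster policy, every override except clusterPolicyId is ignored."""
    if "clusterPolicyId" not in options:
        return []
    return sorted(key for key in options if key != "clusterPolicyId")


# Sample values; "policy-abc" is a made-up identifier.
sample_options = {"maxWorkers": 8, "awsAttributes.ebsVolume.count": 2,
                  "clusterPolicyId": "policy-abc"}
databricks_body = {
    "wrangledDataset": {"id": 60},
    "overrides": {"execution": "databricksSpark",
                  "databricksOptions": build_databricks_options(sample_options)},
}
print(json.dumps(databricks_body, indent=2))
print(aws_only_keys(sample_options))
print(ignored_under_cluster_policy(sample_options))
```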
