API Task - Publish Results
Overview
After you have run a job to generate results, you can publish those results to different targets as needed. This section describes how to automate those publishing steps through the APIs.
Note
This task applies to re-publishing job results after you have already generated them.
Note
After you have generated results and written them to one target, you cannot publish to the same target. You must configure the outputs to specify a different format and location and then run a new job.
In the application, you can publish after generating results. See Publishing Dialog.
Basic Task
1. Create connections to each target to which you wish to publish. Connections must support write operations.
2. Specify a job whose output meets the requirements for the target.
3. Run the job.
4. When the job completes, publish the results to the target(s).
Step - Create Connections
For each target, you must have access to create a connection to it. After a connection is created, it can be reused, so you may find it easier to create these connections through the application.
Some connections can be created via API. For more information, see https://api.trifacta.com/ee/9.7/index.html#operation/createConnection
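For connections that support API creation, a request sketch like the following could be used. This is a minimal Python sketch: the host, the Bearer token, and the connection payload fields are placeholders rather than values defined in this task; consult the createConnection reference above for the exact fields your connection type requires.

```python
# Minimal sketch: create a connection via the createConnection endpoint.
# The host, token, and payload fields below are placeholders (assumptions),
# not values defined in this task.
import requests

BASE_URL = "http://www.wrangle-dev.example.com:3005"   # example host used later in this task
HEADERS = {
    "Authorization": "Bearer <api-token>",             # adjust to your deployment's auth scheme
    "Content-Type": "application/json",
}

connection_spec = {
    # Fill in per the createConnection documentation for your connection type
    # (for example, Hive): host, port, credentials, and related parameters.
    "name": "example_hive_connection",
}

resp = requests.post(f"{BASE_URL}/v4/connections", headers=HEADERS, json=connection_spec)
resp.raise_for_status()
connection_id = resp.json()["id"]   # reference this id when publishing results
```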
Other connections must be created through the application. Links to instructions are provided below.
Note
Connections created through the application must be created through the Connections page, which is used for creating read/write connections. Do not create these connections through the Import Data page. See Connections Page.
Connection | Required Output Format | Example connectionId | Create via API | Doc Link | Other Requirements |
---|---|---|---|---|---|
Redshift connection | Avro | 2 | N | Amazon Redshift Connections | Requires S3 set as the base storage layer. See Set Base Storage Layer. |
Hive connection | Avro | 1 | Y | Hive Connections | Requires integration with a Hadoop cluster. |
Tableau Server connection | HYPER | 3 | Y | Tableau Server Connections | None. |
SQL DW connection | Parquet | 4 | N | Microsoft SQL Data Warehouse Connections | Available only on Azure deployments. See Configure for Azure. |
Step - Run Job
Before you publish results to a different datastore, you must generate results and store them in HDFS.
Note
To produce some output formats, you must run the job on the Spark running environment.
In the examples below, the following example data is assumed:
Identifier | Value |
---|---|
jobId | 2 |
flowId | 3 |
For more information on running a job, see https://api.trifacta.com/ee/9.7/index.html#operation/runJobGroup
For more information on the publishing endpoint, see https://api.trifacta.com/ee/9.7/index.html#operation/publishJobGroup
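As a sketch of how these two endpoints fit together, the following Python example runs a job for a recipe and waits for it to finish before publishing. The host, token, wrangledDataset id, and the exact status strings are assumptions; check the runJobGroup reference above for the full request and response shapes.

```python
# Minimal sketch: run a job via runJobGroup and poll until it completes.
# The host, token, dataset id, and status strings below are assumptions.
import time
import requests

BASE_URL = "http://www.wrangle-dev.example.com:3005"
HEADERS = {
    "Authorization": "Bearer <api-token>",   # adjust to your deployment's auth scheme
    "Content-Type": "application/json",
}

# Kick off a job for the recipe (wrangled dataset) whose output you want to publish.
resp = requests.post(
    f"{BASE_URL}/v4/jobGroups",
    headers=HEADERS,
    json={"wrangledDataset": {"id": 7}},     # hypothetical recipe id
)
resp.raise_for_status()
job_group_id = resp.json()["id"]             # e.g. 2, the jobId assumed in the examples below

# Wait for the job group to finish before calling the publishing endpoint.
while True:
    status = requests.get(
        f"{BASE_URL}/v4/jobGroups/{job_group_id}/status", headers=HEADERS
    ).json()
    if status in ("Complete", "Failed", "Canceled"):
        break
    time.sleep(10)
```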
Step - Publish Results to Hive
The following uses the Avro results from the specified job (jobId = 2) to publish the results to the test_table table in the default Hive schema through connectionId=1.
Note
To publish to Hive, the targeted database is predefined in the connection object. For the path value in the request body, you must specify the schema within that database to use. Schema information is not available through the API. To explore the available schemas, click the Hive icon in the Import Data page. The schemas are the first level of listed objects. For more information, see Import Data Page.
Request:
Endpoint | http://www.wrangle-dev.example.com:3005/v4/jobGroups/2/publish |
---|---|
Authentication | Required |
Method | PUT |
Request Body | { "connection": { "id": 1 }, "path": ["default"], "table": "test_table", "action": "create", "inputFormat": "avro" } |
Response:
Status Code | 200 - OK |
---|---|
Response Body | { "jobgroupId":2, "reason":"JobStarted", "sessionId":"24862060-4fcd-11e8-8622-fda0fbf6f550" } |
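The same request can be issued programmatically. Below is a minimal Python sketch that wraps the publish call in a hypothetical publish_results helper and issues the Hive request shown above; the host and token are placeholders, and the request body mirrors the table above.

```python
# Minimal sketch: wrap PUT /v4/jobGroups/{id}/publish in a small helper.
# The host and token are placeholders; the body mirrors the request table above.
import requests

BASE_URL = "http://www.wrangle-dev.example.com:3005"
HEADERS = {
    "Authorization": "Bearer <api-token>",   # adjust to your deployment's auth scheme
    "Content-Type": "application/json",
}

def publish_results(job_group_id, connection_id, path, table, action, input_format):
    """Publish the results of a completed job group to the given target."""
    resp = requests.put(
        f"{BASE_URL}/v4/jobGroups/{job_group_id}/publish",
        headers=HEADERS,
        json={
            "connection": {"id": connection_id},
            "path": path,                  # e.g. ["default"] -- the target schema
            "table": table,
            "action": action,              # e.g. "create" or "createAndLoad"
            "inputFormat": input_format,   # must match the generated output format
        },
    )
    resp.raise_for_status()
    return resp.json()

# Publish the Avro results of jobId=2 to default.test_table through connectionId=1.
print(publish_results(2, 1, ["default"], "test_table", "create", "avro"))
```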
Step - Publish Results to Tableau Server
The following uses the HYPER results from the specified job (jobId = 2) to publish the results to the test_table3 table in the default Tableau Server database through connectionId=3.
Request:
Endpoint | http://www.wrangle-dev.example.com:3005/v4/jobGroups/2/publish |
---|---|
Authentication | Required |
Method | PUT |
Request Body | { "connection": { "id": 3 }, "path": ["default"], "table": "test_table3", "action": "createAndLoad", "inputFormat": "hyper" } |
Response:
Status Code | 200 - OK |
---|---|
Response Body | { "jobgroupId":2, "reason":"JobStarted", "sessionId":"24862060-4fcd-11e8-8622-fda0fbf6f552" } |
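Using the hypothetical publish_results helper sketched in the Hive step above, the same request could be issued as:

```python
# Publish the HYPER results of jobId=2 to test_table3 through connectionId=3.
publish_results(2, 3, ["default"], "test_table3", "createAndLoad", "hyper")
```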
Step - Publish Results to SQL DW
The following uses the Parquet results from the specified job (jobId = 2) to publish the results to the test_table4 table in the dbo schema of the SQL DW database through connectionId=4.
Request:
Endpoint | http://www.wrangle-dev.example.com:3005/v4/jobGroups/2/publish |
---|---|
Authentication | Required |
Method | PUT |
Request Body | { "connection": { "id": 4 }, "path": ["dbo"], "table": "test_table4", "action": "createAndLoad", "inputFormat": "pqt" } |
Response:
Status Code | 200 - OK |
---|---|
Response Body | { "jobgroupId": 2, "jobIds": 22, "reason": "JobStarted", "sessionId": "855f83a0-dc94-11e8-bd1a-f998d808020d" } |
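With the same hypothetical publish_results helper, note that the inputFormat string for Parquet results is "pqt":

```python
# Publish the Parquet results of jobId=2 to dbo.test_table4 through connectionId=4.
publish_results(2, 4, ["dbo"], "test_table4", "createAndLoad", "pqt")
```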
Step - Publish Results to Redshift
The following uses the Avro results from the specified job (jobId = 2) to publish the results to the test_table2 table in the public Redshift schema through connectionId=2.
Note
To publish to Redshift, the targeted database is predefined in the connection object. For the path value in the request body, you must specify the schema within that database to use. Schema information is not available through the API. To explore the available schemas, click the Redshift icon in the Import Data page. The schemas are the first level of listed objects. For more information, see Import Data Page.
Request:
Endpoint | http://www.wrangle-dev.example.com:3005/v4/jobGroups/2/publish |
---|---|
Authentication | Required |
Method | PUT |
Request Body | { "connection": { "id": 2 }, "path": ["public"], "table": "test_table2", "action": "create", "inputFormat": "avro" } |
Response:
Status Code | 200 - OK |
---|---|
Response Body | { "jobgroupId":2, "reason":"JobStarted", "sessionId":"fae64760-4fc4-11e8-8cba-0987061e4e16" } |
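And with the same hypothetical helper, the Redshift request above could be issued as:

```python
# Publish the Avro results of jobId=2 to public.test_table2 through connectionId=2.
publish_results(2, 2, ["public"], "test_table2", "create", "avro")
```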
Step - Publish Results with Overrides
When you are publishing results to a relational source, you can apply overrides to the job to redirect the output or change the action applied to the target table. For more information, see API Task - Run Job.