API Task - Deploy a Flow
Overview
In this task, you learn how to deploy a flow from a Development (Dev) instance of the platform to a Production (Prod) instance. After you have created and finished a flow in a Dev instance, you can deploy it to an environment designed primarily for production execution of jobs for finished flows (the Prod instance). For more information on managing these deployments, see Overview of Deployment Manager.
Prerequisites
Finished flow: This example assumes that you have finished development of a flow with the following characteristics:
Single dataset imported from a table through a Redshift connection
Single JSON output
Separate Dev and Prod instances: Although it is possible to deploy flows to the same instance in which they are developed, this example assumes that you are deploying from a Dev instance to a completely separate Prod instance. The following implications apply:
Separate user accounts to access Dev (User1) and Prod (Admin2) instances.
Tip
You should do all of your recipe development and testing in Dev/Test. Avoid making changes in a Prod environment.
Note
Although these are separate user accounts, the assumption is that the same admin-level user is using these accounts through the APIs.
New connections must be created in the Prod instance to access the production version of the database table.
Task
In this example, your environment contains separate Dev and Prod instances, each of which has a different set of users.
Item | Dev | Prod |
---|---|---|
Environment | http://wrangle-dev.example.com:3005 | http://wrangle-prod.example.com:3005 |
User | User1 | Admin2 |
Source DB | devWrangleDB | prodWrangleDB |
Source Table | Dev-Orders | Prod-Orders |
Connection Name | Dev Redshift Conn | Prod Redshift Conn |
Tip
Dev environment work can be done through the UI, which may be easier.
Note
User1 has no access to the Prod instance.
Example Flow:
User1 is creating a flow, which is used to wrangle weekly batches of orders for the enterprise. The flow contains:
A single imported dataset that is created from a Redshift database table.
A single recipe that modifies the imported dataset.
A single output to a JSON file.
Production data is hosted in a different Redshift database, so the Prod connection is different from the Dev connection.
Steps:
Build in Dev instance: User1 creates the flow and iterates on building the recipe and running jobs until a satisfactory output can be generated in JSON format.
Export: When User1 is ready to push the flow to production, User1 exports the flow and downloads the export package ZIP file to the local desktop.
Deploy to Prod instance:
Admin2 creates a new deployment in the Prod instance.
Admin2 creates a new connection (Prod Redshift Conn) in the Prod instance.
Admin2 creates new import rules in the Prod instance to map from the old connection (Dev Redshift Conn) to the new one (Prod Redshift Conn).
Admin2 uploads the export ZIP package.
Test deployment: Through Flow View in the Prod instance, Admin2 runs a job and verifies that the results are as expected.
Set schedule: Using cron, Admin2 sets a schedule to run the active release for this deployment once per week.
Each week, the Prod-Orders table must be refreshed with data.
The dataset is now operational in the Prod environment.
Step - Get Flow Id
The first general step is for the Dev user (User1) to get the flowId and export the flow from the Dev instance.
Steps:
Tip
If it's easier, you can gather the flowId from the user interface in Flow View. In the following example, the flowId is 21:
http://www.wrangle-dev.example.com:3005/flows/21
Through the APIs, you can retrieve the list of flows using the following call:
Endpoint
http://www.wrangle-dev.example.com:3005/v4/flows
Authentication
Required
Method
GET
Request Body
None.
The response should be status code 200 - OK with a response body like the following:
{
  "data": [
    {
      "id": 21,
      "name": "Intern Training",
      "description": "null",
      "createdAt": "2019-01-08T18:14:37.851Z",
      "updatedAt": "2019-01-08T18:57:26.824Z",
      "creator": {"id": 2},
      "updater": {"id": 2},
      "folder": {"id": 1},
      "workspace": {"id": 1}
    },
    {
      "id": 19,
      "name": "example Flow",
      "description": null,
      "createdAt": "2019-01-08T17:25:21.392Z",
      "updatedAt": "2019-01-08T17:30:30.959Z",
      "creator": {"id": 2},
      "updater": {"id": 2},
      "folder": {"id": 4},
      "workspace": {"id": 1}
    }
  ]
}
Retain the flow identifier (21) for later use.
Note
You have identified the flow to export.
For more information, see https://api.trifacta.com/ee/9.7/index.html#operation/listFlows
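For reference, here is a minimal Python sketch of this call using the requests library. It assumes basic authentication against the Dev instance; the credentials are placeholders.

```python
import requests

DEV_BASE = "http://www.wrangle-dev.example.com:3005"
DEV_AUTH = ("User1", "<password>")  # placeholder credentials; basic auth assumed

# List the flows visible to User1 and print their ids and names.
resp = requests.get(f"{DEV_BASE}/v4/flows", auth=DEV_AUTH)
resp.raise_for_status()  # expect 200 - OK
for flow in resp.json()["data"]:
    print(flow["id"], flow["name"])  # e.g. 21 Intern Training
```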
Step - Export a Flow
Export the flow to your local desktop.
Tip
This step may be easier to do through the UI in the Dev instance.
Steps:
Export flowId=21:
Endpoint
http://www.wrangle-dev.example.com:3005/v4/flows/21/package
Authentication
Required
Method
GET
Request Body
None.
The response should be status code 200 - OK. The response body is the flow package itself. Download and save this file to your local desktop. Let's assume that the filename you choose is flow-WrangleOrders.zip.
For more information, see https://api.trifacta.com/ee/9.7/index.html#operation/getFlowPackage
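A minimal Python sketch of the download, under the same assumptions as above (placeholder credentials, basic auth):

```python
import requests

DEV_BASE = "http://www.wrangle-dev.example.com:3005"
DEV_AUTH = ("User1", "<password>")  # placeholder credentials

# Download the export package for flowId=21 and save it to disk.
resp = requests.get(f"{DEV_BASE}/v4/flows/21/package", auth=DEV_AUTH, stream=True)
resp.raise_for_status()  # expect 200 - OK
with open("flow-WrangleOrders.zip", "wb") as f:
    for chunk in resp.iter_content(chunk_size=8192):
        f.write(chunk)
```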
Step - Create Deployment
In the Prod environment, you can create the deployment from which you can manage the new flow. Note that the following information has changed for this environment:
Item | Prod env value |
---|---|
userId | Admin2 |
baseURL | http://www.wrangle-prod.example.com:3005 |
Steps:
Through the APIs, you can create a deployment using the following call:
Endpoint
http://www.wrangle-prod.example.com:3005/v4/deployments
Authentication
Required
Note
Username and password credentials must be submitted for the Admin2 account.
Method
POST
Request Body
{ "name": "Production Orders" }
The response should be status code 201 - Created with a response body like the following:
{
  "id": 3,
  "name": "Production Orders",
  "updatedAt": "2017-11-27T23:48:54.340Z",
  "createdAt": "2017-11-27T23:48:54.340Z",
  "creator": {"id": 1},
  "updater": {"id": 1}
}
Retain the deploymentId (3) for later use.
For more information, see https://api.trifacta.com/ee/9.7/index.html#operation/createDeployment
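A minimal Python sketch of the same call, now against the Prod instance with the Admin2 account (credentials are placeholders):

```python
import requests

PROD_BASE = "http://www.wrangle-prod.example.com:3005"
PROD_AUTH = ("Admin2", "<password>")  # placeholder credentials

# Create the deployment that will hold releases of this flow.
resp = requests.post(f"{PROD_BASE}/v4/deployments",
                     auth=PROD_AUTH,
                     json={"name": "Production Orders"})
resp.raise_for_status()  # expect 201 - Created
deployment_id = resp.json()["id"]  # 3 in this example
```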
Step - Create Connection
When a flow is exported, its connections are not included in the export. Before you import the flow into a new environment:
Connections must be created or recreated in the Prod environment. In some cases, you may need to point to production versions of the data contained in completely different databases.
Rules must be created to remap the connections that the imported flow uses.
This section and the following one step through these processes.
Steps:
From the Dev environment, you collect the connection information for the flow:
Endpoint
http://www.wrangle-dev.example.com:3005/v4/connections
Authentication
Required
Note
Username and password credentials must be submitted for the User1 account.
Method
GET
Request Body
None.
The response should be status code 200 - OK with a response body like the following:
{
  "data": [
    {
      "id": 9,
      "host": "dev-redshift.example.com",
      "port": 5439,
      "vendor": "redshift",
      "params": {
        "connectStrOpts": "",
        "defaultDatabase": "devWrangleDB",
        "extraLoadParams": "BLANKSASNULL EMPTYASNULL TRIMBLANKS TRUNCATECOLUMNS"
      },
      "ssl": false,
      "vendorName": "redshift",
      "name": "Dev Redshift Conn",
      "description": "",
      "type": "jdbc",
      "isGlobal": true,
      "credentialType": "iamRoleArn",
      "credentialsShared": true,
      "uuid": "b8014610-ce56-11e7-9739-27deec2c3249",
      "disableTypeInference": false,
      "createdAt": "2017-11-21T00:55:50.770Z",
      "updatedAt": "2017-11-21T00:55:50.770Z",
      "credentials": [{"user": "devDBuser"}],
      "creator": {"id": 2},
      "updater": {"id": 2},
      "workspace": {"id": 1}
    }
  ],
  "count": {"owned": 1, "shared": 0, "count": 1}
}
You retain the above information for use in Production.
In the Prod environment, you create the new connection using the following call:
Endpoint
http://www.wrangle-prod.example.com:3005/v4/connections
Authentication
Required
Note
Username and password credentials must be submitted for the Admin2 account.
Method
POST
Request Body
{ "host": "prod-redshift.example.com", "port": 1433, "vendor": "redshift", "params": { "connectStrOpts": "", "defaultDatabase": "prodWrangleDB", "extraLoadParams": "BLANKSASNULL EMPTYASNULL TRIMBLANKS TRUNCATECOLUMNS" }, "vendorName": "redshift", "name": "Redshift Conn Prod", "description": "", "isGlobal": true, "type": "jdbc", "ssl": false, "credentialType": "iamRoleArn", "credentials": [ { "username": "prodDBUser", "password": "<password>", "iamRoleArn": "iam:aws:12345" } ] }
The response should be status code 201 - Created with a response body like the following:
{
  "id": 12,
  "host": "prod-redshift.example.com",
  "port": 5439,
  "vendor": "redshift",
  "params": {
    "connectStrOpts": "",
    "defaultDatabase": "prodWrangleDB",
    "extraLoadParams": "BLANKSASNULL EMPTYASNULL TRIMBLANKS TRUNCATECOLUMNS"
  },
  "ssl": false,
  "name": "Prod Redshift Conn",
  "description": "",
  "type": "jdbc",
  "isGlobal": true,
  "credentialType": "iamRoleArn",
  "credentialsShared": true,
  "uuid": "fa7e06c0-0143-11e8-8faf-27c0392328c5",
  "disableTypeInference": false,
  "createdAt": "2018-01-24T20:20:11.181Z",
  "updatedAt": "2018-01-24T20:20:11.181Z",
  "credentials": [{"username": "prodDBUser"}],
  "creator": {"id": 2},
  "updater": {"id": 2}
}
When you hit the /v4/connections endpoint again, you can retrieve the connectionId for this connection. In this case, the connectionId value is 12.
See https://api.trifacta.com/ee/9.7/index.html#operation/createConnection
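A minimal Python sketch of the connection creation; the password and IAM role values are the placeholders from the example body above:

```python
import requests

PROD_BASE = "http://www.wrangle-prod.example.com:3005"
PROD_AUTH = ("Admin2", "<password>")  # placeholder credentials

# Recreate the connection in the Prod instance, pointed at the production database.
conn_body = {
    "host": "prod-redshift.example.com",
    "port": 5439,
    "vendor": "redshift",
    "params": {
        "connectStrOpts": "",
        "defaultDatabase": "prodWrangleDB",
        "extraLoadParams": "BLANKSASNULL EMPTYASNULL TRIMBLANKS TRUNCATECOLUMNS",
    },
    "vendorName": "redshift",
    "name": "Prod Redshift Conn",
    "description": "",
    "isGlobal": True,
    "type": "jdbc",
    "ssl": False,
    "credentialType": "iamRoleArn",
    "credentials": [{
        "username": "prodDBUser",
        "password": "<password>",        # placeholder
        "iamRoleArn": "iam:aws:12345",   # placeholder
    }],
}
resp = requests.post(f"{PROD_BASE}/v4/connections", auth=PROD_AUTH, json=conn_body)
resp.raise_for_status()  # expect 201 - Created
prod_connection_id = resp.json()["id"]  # 12 in this example
```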
Step - Create Import Rules
Now that you have defined the connection to use to acquire the production data from within the production environment, you must create an import rule to remap from the Dev connection to the Prod connection within the flow definition. This rule is applied during the import process to ensure that the flow is working after it has been imported.
In this case, you must remap the uuid value for the Dev connection, which is written into the flow definition, to the connectionId value from the Prod instance.
For more information on import rules, see API Task - Define Deployment Import Mappings.
Steps:
From the Dev environment, you collect the connection information for the flow, using the same GET /v4/connections call shown in the previous step. From that response, you retain the following value, which uniquely identifies the connection object, regardless of the instance to which it belongs:
"uuid": "b8014610-ce56-11e7-9739-27deec2c3249",
Against the Prod environment, you now create an import mapping rule:
Endpoint
http://www.wrangle-prod.example.com:3005/v4/deployments/3/objectImportRules
Authentication
Required
Method
PATCH
Request Body
[
  {
    "tableName": "connections",
    "onCondition": {"uuid": "b8014610-ce56-11e7-9739-27deec2c3249"},
    "withCondition": {"id": 12}
  }
]
The response should be status code 200 - OK with a response body like the following:
{ "deleted": [] }
Since the method is a PATCH, you are updating the rule set that applies to all imports for this deployment. In this case, there were no pre-existing rules, so the response indicates that nothing was deleted. If another set of import rules is submitted later, the rule you just created is deleted and replaced.
See https://api.trifacta.com/ee/9.7/index.html#operation/updateObjectImportRules
See https://api.trifacta.com/ee/9.7/index.html#operation/updateValueImportRules
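A minimal Python sketch of the rule update, reusing the deployment and connection ids from the earlier steps (placeholder credentials):

```python
import requests

PROD_BASE = "http://www.wrangle-prod.example.com:3005"
PROD_AUTH = ("Admin2", "<password>")  # placeholder credentials

# Remap the Dev connection (identified by its uuid in the flow definition)
# to the Prod connection (id 12) for all imports into deploymentId=3.
rules = [{
    "tableName": "connections",
    "onCondition": {"uuid": "b8014610-ce56-11e7-9739-27deec2c3249"},
    "withCondition": {"id": 12},
}]
resp = requests.patch(f"{PROD_BASE}/v4/deployments/3/objectImportRules",
                      auth=PROD_AUTH, json=rules)
resp.raise_for_status()  # expect 200 - OK
```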
Step - Import Package to Create Release
You are now ready to import the package to create the release.
Steps:
Against the Prod environment, you now import the package:
Endpoint
http://www.wrangle-prod.example.com:3005/v4/deployments/3/releases
Authentication
Required
Method
POST
Request Body
The request body must include the following key and value combination, submitted as form data:
key | value |
---|---|
data | "@path-to-flow-WrangleOrders.zip" |
The response should be status code 201 - Created with a response body like the following:
{
  "importRuleChanges": {
    "object": [
      {
        "tableName": "connections",
        "onCondition": {"uuid": "b8014610-ce56-11e7-9739-27deec2c3249"},
        "withCondition": {"id": 12}
      }
    ],
    "value": []
  },
  "flowName": "Wrangle Orders"
}
See https://api.trifacta.com/ee/9.7/index.html#operation/importPackageForDeployment
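A minimal Python sketch of the upload; requests sends the file as multipart form data under the data key (credentials and local path are placeholders):

```python
import requests

PROD_BASE = "http://www.wrangle-prod.example.com:3005"
PROD_AUTH = ("Admin2", "<password>")  # placeholder credentials

# Upload the export package to create a new release for deploymentId=3.
with open("flow-WrangleOrders.zip", "rb") as f:
    resp = requests.post(f"{PROD_BASE}/v4/deployments/3/releases",
                         auth=PROD_AUTH,
                         files={"data": f})
resp.raise_for_status()  # expect 201 - Created
print(resp.json()["flowName"])  # "Wrangle Orders"
```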
Step - Activate Release
When a package is imported into a release, the release is automatically set as the active release for the deployment. If, at some point in the future, you need to change the active release, you can use the following endpoint to do so.
Steps:
Against the Prod environment, use the following endpoint:
Endpoint
http://www.wrangle-prod.example.com:3005/v4/releases/5
Authentication
Required
Method
PATCH
Request Body
{ "active": true }
The response should be status code 200 - OK with a response body like the following:
{
  "id": 5,
  "updater": {"id": 3},
  "updatedAt": "2017-11-28T00:06:12.147Z"
}
See https://api.trifacta.com/ee/9.7/index.html#operation/patchRelease
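A minimal Python sketch, assuming the release to activate has releaseId=5 (credentials are placeholders):

```python
import requests

PROD_BASE = "http://www.wrangle-prod.example.com:3005"
PROD_AUTH = ("Admin2", "<password>")  # placeholder credentials

# Make releaseId=5 the active release for its deployment.
resp = requests.patch(f"{PROD_BASE}/v4/releases/5",
                      auth=PROD_AUTH,
                      json={"active": True})
resp.raise_for_status()  # expect 200 - OK
```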
Step - Run Deployment
You can now execute a test run of the deployment to verify that the job executes properly.
Note
When you run a deployment, you run the primary flow in the active release for that deployment. Running the flow generates the output objects for all recipes in the flow.
Note
For datasets with parameters, you can apply parameter overrides in the request body of the following API call. For more information, see https://api.trifacta.com/ee/9.7/index.html#operation/runDeployment
Steps:
Against the Prod environment, use the following endpoint:
Endpoint
http://www.wrangle-prod.example.com:3005/v4/deployments/3/run
Authentication
Required
Method
POST
Request Body
None.
The response should be status code 201 - Created with a response body like the following:
{
  "data": [
    {
      "reason": "JobStarted",
      "sessionId": "dd6a90e0-c353-11e7-ad4e-7f2dd2ae4621",
      "id": 33
    }
  ]
}
See https://api.trifacta.com/ee/9.7/index.html#operation/runDeployment
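A minimal Python sketch of the run call (credentials are placeholders):

```python
import requests

PROD_BASE = "http://www.wrangle-prod.example.com:3005"
PROD_AUTH = ("Admin2", "<password>")  # placeholder credentials

# Run the active release of deploymentId=3; jobs are started for the flow's outputs.
resp = requests.post(f"{PROD_BASE}/v4/deployments/3/run", auth=PROD_AUTH)
resp.raise_for_status()  # expect 201 - Created
for job in resp.json()["data"]:
    print(job["id"], job["reason"])  # e.g. 33 JobStarted
```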
Step - Iterate
If you need to make changes to fix issues related to running the job:
Recipe changes should be made in the Dev environment and then moved into the Prod deployment through another export and import of the flow.
Connection issues:
Check Flow View in the Prod instance to see if there are any red dots on the objects in the package. If so, your import rules need to be fixed.
Verify that you can import data through the connection.
Output problems could be related to permissions on the target location.
Step - Set up Production Schedule
When you are satisfied with how the production version of your flow is working, you can use a third-party scheduling tool to execute the job on a regular basis.
The tool must call the Run Deployment endpoint and then verify that the output has been properly generated.
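As one possible approach, a small wrapper script like the following could be invoked from cron each week. The script name, path, and schedule are hypothetical; it simply calls the Run Deployment endpoint shown above:

```python
#!/usr/bin/env python3
"""run_weekly_deployment.py - hypothetical cron wrapper for the Run Deployment call.

Example crontab entry (every Monday at 06:00):
0 6 * * 1 /usr/bin/python3 /opt/scripts/run_weekly_deployment.py
"""
import sys
import requests

PROD_BASE = "http://www.wrangle-prod.example.com:3005"
PROD_AUTH = ("Admin2", "<password>")  # placeholder credentials

def main() -> int:
    # Run the active release for deploymentId=3.
    resp = requests.post(f"{PROD_BASE}/v4/deployments/3/run", auth=PROD_AUTH)
    if resp.status_code != 201:
        print(f"Run failed: {resp.status_code} {resp.text}", file=sys.stderr)
        return 1
    for job in resp.json()["data"]:
        print(f"Started job {job['id']} ({job['reason']})")
    return 0

if __name__ == "__main__":
    sys.exit(main())
```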