Overview of Scheduling

Note

This feature may not be available in all product editions. For more information on available features, see Compare Editions.

As needed, you can schedule jobs to execute on a recurring basis. For example, if the source data for a job is updated outside of the application on a weekly basis, you can define a schedule for the job associated with the imported dataset so that it runs after the data has been refreshed. When the scheduled job executes successfully, you can collect the transformed output from the specified output location, where it is available in the published form that you have specified.

To schedule a job, you must create the following configuration objects:

  1. Define a schedule - For each supported asset type, you can define a schedule. A schedule specifies one or more recurring times (triggers) when scheduled jobs are executed. For example, in a single schedule, you can specify daily trigger times for incremental updates and monthly execution times for rollups.

    Tip

    The scheduler supports a modified form of cron job syntax. For more information, see cron Schedule Syntax Reference. An illustrative example appears after this list.

  2. Define one or more scheduled destinations - When you specify a scheduled destination, all upstream transformation steps are executed whenever one of the schedule's trigger times occurs. Scheduled destinations are specified like regular destinations.

    Note

    When a schedule is triggered, all transformation jobs upstream of the scheduled destination are executed. Manual destinations are not generated. You cannot create schedules for individual outputs.
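
The exact trigger syntax accepted by the scheduler is a modified form of cron; see cron Schedule Syntax Reference for the authoritative details. As a rough illustration only, standard five-field cron expressions for the daily and monthly triggers mentioned above might look like the following (the product's modified syntax may differ):

    # minute  hour  day-of-month  month  day-of-week
    0 2 * * *    # daily at 2:00 AM (for example, incremental updates)
    0 4 1 * *    # monthly at 4:00 AM on the first day of the month (for example, rollups)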

Limitations

  • One schedule cannot be applied to multiple assets.

  • You cannot create separate schedules for individual assets within a flow or workflow.

  • Only an asset owner can create or modify the asset's schedule.

Data Management

Note

Since scheduled destinations are re-populated with each scheduled execution, you must determine how you wish to manage the data that is published to each location. Data management should be done outside of the Dataprep by Trifacta platform.

  • Import: Before each scheduled execution, you should refresh the source of the imported dataset with new data outside of the Dataprep by Trifacta platform.

  • Execution: Please verify that the publishing settings for your scheduled destination are consistent with how you are using the results. For example, if the scheduled destination creates a new file with the same name for each execution (replace), you must move the generated file out of the output location before the next scheduled execution.

  • Output: You must collect the generated results. While you can export the job's results through the Job History page, you may find it easier to use an external scheduler to gather the results and forward them to the downstream consumer, as in the sketch after this list.
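
The following is a minimal sketch of such an external collection step, assuming the scheduled destination writes CSV results to one directory and the downstream consumer reads from another; the paths, file pattern, and script itself are hypothetical and not part of the product.

    #!/usr/bin/env python3
    """Illustrative only: move generated results out of the output location
    before the next scheduled execution runs. All paths are hypothetical."""

    import shutil
    from pathlib import Path

    OUTPUT_DIR = Path("/data/output")      # hypothetical scheduled-destination location
    INCOMING_DIR = Path("/data/incoming")  # hypothetical downstream consumer location

    def collect_results() -> None:
        INCOMING_DIR.mkdir(parents=True, exist_ok=True)
        for result in OUTPUT_DIR.glob("*.csv"):
            # Move each generated file so the next scheduled run can safely replace it.
            shutil.move(str(result), str(INCOMING_DIR / result.name))

    if __name__ == "__main__":
        collect_results()

Running a script like this from an external scheduler (for example, timed shortly after the scheduled execution) keeps the output location clear for the next run.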

Flows for scheduling

Tip

When a schedule is executed, all outputs in the flow are generated, even those that are not needed. For better performance on larger flows, you can create a separate flow that contains only references back to the objects in the source flow that you wish to schedule. This approach also keeps development and scheduled execution in separate flows.

Schedule a Job

Schedules and scheduled destinations are defined through the Schedules page. See Schedules Page.

Job Execution

Tracking

You can monitor a scheduled job like any other job in the application. See Job History Page.

Service interruptions

Scheduled job executions may be interrupted when the services are down for maintenance or for other reasons.

  • Any scheduled job that is in progress when a service interruption begins will resume after the service interruption ends.

  • A scheduled job that is triggered during a service interruption is executed after service is restored.

    Note

    If multiple scheduled executions of the same job are triggered during service interruptions, only one scheduled execution occurs after service returns. Scheduled jobs resume execution according to their schedules as normal.

  • After a service interruption, scheduled jobs are executed from a queue. It may take some time before your scheduled job is executed.