Running Job Basics
This section provides an overview of the basics of configuring and running jobs.
Configure Job
When you are ready to test your recipe against the entire dataset, click Run in the Transformer page. In the Run Job page, you specify the output formats and any compression to apply. Unless you are working with a large dataset, compression is unnecessary for this basic walkthrough. If you do not specify a publishing action, which includes the output information, a default one is created for you.
For more information, see Run Job Page.
Run Job
To queue the specified job for execution, click Run.
The job is added to the queue and is processed when resources are available.
You can track progress in the Job Details page.
If visual profiling was enabled for the job, click the Profile tab.
When the job is completed, you can access results in the Output Destinations tab.
For more information, see Job Details Page.
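The queue, track, and complete sequence described above can be sketched as a simple polling loop. This is an illustrative sketch only: the status names and the `get_status` callable are hypothetical and do not represent the product's API. In practice, the Job Details page shows this information for you.

```python
import time
from typing import Callable

# Hypothetical status values; the actual product may use different names.
TERMINAL_STATES = {"Complete", "Failed", "Canceled"}


def wait_for_job(get_status: Callable[[], str],
                 poll_seconds: float = 5.0,
                 max_polls: int = 120) -> str:
    """Poll get_status() until the job reaches a terminal state.

    get_status is a caller-supplied function (an assumption here) that
    returns the current status string of a single job.
    """
    for _ in range(max_polls):
        status = get_status()
        if status in TERMINAL_STATES:
            return status
        time.sleep(poll_seconds)
    raise TimeoutError("job did not finish within the polling budget")


# Usage with a simulated job that is queued, runs, and then completes.
simulated = iter(["Queued", "Running", "Running", "Complete"])
final_status = wait_for_job(lambda: next(simulated), poll_seconds=0)
print(final_status)
```

The polling budget (`max_polls`) keeps the loop from waiting forever if a job never reaches a terminal state.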
Visual Profiling
When you define your job, you can choose to generate a visual profile, which provides visual information on the quality of your results, including statistical information about each column.
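To give a rough idea of the kind of per-column statistics a profile can summarize, the following sketch computes a few of them for one column. It is not the product's profiling implementation. The function name, the chosen statistics, and the assumption that the column is expected to hold numbers are all illustrative.

```python
import statistics
from typing import Any


def profile_column(values: list[Any]) -> dict[str, Any]:
    """Summarize one column that is assumed to hold numeric data."""
    # Treat None and empty strings as missing values.
    present = [v for v in values if v is not None and v != ""]
    # Values that are present but not numeric count as mismatched.
    numeric = [v for v in present
               if isinstance(v, (int, float)) and not isinstance(v, bool)]
    summary: dict[str, Any] = {
        "rows": len(values),
        "missing": len(values) - len(present),
        "mismatched": len(present) - len(numeric),
    }
    if numeric:
        summary.update(min=min(numeric), max=max(numeric),
                       mean=statistics.mean(numeric))
    return summary


column_summary = profile_column([10, 20, None, "n/a", 30])
print(column_summary)
```

Counts of missing and mismatched values like these are what make a profile useful for spotting data quality problems in a recipe.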
Tip
Visual profiles can be useful for troubleshooting wrangling issues and for summarizing your datasets for other analysis.
Tip
Optionally, you can disable generating a visual profile of your results. While the visual profile is very useful for examining issues in your recipe and iterating, it is a resource-intensive process. If you are working with large datasets that do not require additional debugging, you can consider disabling the profiling of your results. For more information, see Overview of Visual Profiling.
Tip
Depending on your product configuration, you may have multiple running environments available to you. In most cases, you should choose to use the default running environment, which is selected for you based on the size of the dataset.
For more information on the job execution options, see Run Job Page.
Iterate
In the Profile tab of the Job Details page, you can review the effects of the transformation recipe across the entire dataset. Statistics and data histograms provide overall visibility into the quality of your transformation recipe.
See Job Details Page.
Use the links in the Job Details page to return to your dataset sample and its recipe. Continue refining the recipe and running jobs until the results match the dataset that you need.
Create Output
Every job requires an output object, which defines the format, location, and other settings of the output file(s) or table(s) that are produced when the job completes.
Tip
When you run the first job on a recipe, a default output object is created for you to produce a CSV in your default output location.
You can modify or define outputs to meet your pipeline requirements. For more information, see Create Outputs.
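As a way to picture what an output object holds, the sketch below models one as a small record with a format, a location, and an optional compression setting. The class, its field names, and the file naming scheme are hypothetical and are not taken from the product. Only the default behavior (a CSV in the default output location) follows the description above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class OutputObject:
    """Hypothetical model of an output object; field names are illustrative."""
    file_format: str                   # for example "csv" or "json"
    location: str                      # path or table where results are written
    compression: Optional[str] = None  # None means no compression


def default_output(recipe_name: str, default_location: str) -> OutputObject:
    """Mimic the described default: an uncompressed CSV in the default location.

    Naming the file after the recipe is an assumption made for this sketch.
    """
    path = f"{default_location.rstrip('/')}/{recipe_name}.csv"
    return OutputObject(file_format="csv", location=path)


out = default_output("sales_cleanup", "/outputs/")
print(out)
```

Modifying an output then amounts to replacing one of these settings, for example choosing a different format or adding compression for a large dataset.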
Export Results
During job execution, you can monitor progress on the job through the Job Details page. When the job is complete, your results are ready in the designated output location and format.
As needed, you can download results for offline use or use them to create a new dataset.
For more information, see Export Basics.
Schedule Jobs
In many cases, the same job needs to be executed on a periodic basis. For example, your source dataset may be updated on a weekly basis to include a fresh set of transactions. As needed, you can schedule the execution of your jobs to refresh the output data. For more information, see Schedule a Job.
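For the weekly refresh example above, the run times that a weekly schedule would produce can be sketched as follows. The function name and the start time are assumptions chosen for illustration. The product's scheduling feature handles this for you.

```python
from datetime import datetime, timedelta


def next_weekly_runs(start: datetime, count: int) -> list[datetime]:
    """Return `count` run times spaced exactly one week apart, beginning at `start`."""
    return [start + timedelta(weeks=i) for i in range(count)]


# Three weekly runs beginning Monday, January 1, 2024 at 02:00.
runs = next_weekly_runs(datetime(2024, 1, 1, 2, 0), 3)
for run in runs:
    print(run.isoformat())
```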