Library for Data Page

Review the assets to which you have access in the Library for Data page.

To open the Library for Data, select Library in the left navigation bar.

Tip

If you land on an empty Library for Data page, you can start adding datasets. Click Import Data. See Import Data Page.

Figure: Library for Data Page

Tabs:

  • All Data: You can view all the imported datasets or references available to you.

  • Imported Datasets: Review your imported datasets from sources such as file-based storage, connected databases, or desktop.

    • The Source column indicates where the original source data is located.

  • References: Reference datasets are created from a recipe's output.

  • Macros: A macro is a sequence of steps that can be reused in other recipes.

  • User defined functions: Review any user-defined functions that you have created and uploaded to the project.

Filter by type:

Click one of the pre-defined filters to show datasets of the following types:

Filter by ownership:

For the selected object type, you can filter based on the ownership of the object:

  • All: All objects of the selected type to which you have access.

  • Owned by me: All objects of the selected type that you own.

  • Shared with me: All objects of the selected type that have been shared with you.
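
The three ownership filters can be pictured as a narrowing of the objects you can already access. The following Python sketch is illustrative only and is not the product's implementation: the Asset class, its fields, the type labels, and the mode names are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Asset:
    name: str
    kind: str    # hypothetical type labels, e.g. "imported", "reference", "macro"
    owner: str
    shared_with: set = field(default_factory=set)


def filter_by_ownership(assets, kind, user, mode="all"):
    """Return assets of one type that `user` can access, narrowed by ownership."""
    # Only objects that the user owns or that were shared with the user are listed.
    accessible = [
        a for a in assets
        if a.kind == kind and (a.owner == user or user in a.shared_with)
    ]
    if mode == "all":
        return accessible
    if mode == "owned":
        return [a for a in accessible if a.owner == user]
    if mode == "shared":
        return [a for a in accessible if a.owner != user]
    raise ValueError(f"unknown ownership mode: {mode!r}")


library = [
    Asset("orders.csv", "imported", "ana"),
    Asset("customers.csv", "imported", "bo", {"ana"}),
    Asset("private.csv", "imported", "bo"),
    Asset("cleanup", "macro", "ana"),
]

print([a.name for a in filter_by_ownership(library, "imported", "ana", "all")])
# ['orders.csv', 'customers.csv']
```

In this sketch, "private.csv" never appears for user "ana" under any filter, because it is neither owned by nor shared with that user.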

Columns:

  • Name: Name of the asset.

  • Owner: Owner of the asset.

  • In flows: Count of flows in which the object is in use.

  • Source: Location where the asset's original source data is stored.

  • Last updated: Timestamp of the last time that the asset was modified.

Actions:

  • Browse: If displayed, use the page browsing controls to explore the available objects.

  • Search: To search object names, enter a string in the search bar. Results are highlighted immediately in the page.

  • Sort: Click a column header to sort the display by the column's entries.
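
As a rough mental model of Search and Sort, the sketch below filters rows by name and then orders them by one column. It is a hypothetical illustration, not the product's code: the row fields are invented, and case-insensitive substring matching is an assumption about how the search bar behaves.

```python
def search_and_sort(rows, query="", sort_by="name", descending=False):
    """Keep rows whose name contains `query`, then sort by one column."""
    needle = query.lower()
    # Assumption: search matches any part of the object name, ignoring case.
    matches = [row for row in rows if needle in row["name"].lower()]
    return sorted(matches, key=lambda row: row[sort_by], reverse=descending)


rows = [
    {"name": "Sales 2023", "owner": "ana", "in_flows": 3},
    {"name": "sales archive", "owner": "bo", "in_flows": 0},
    {"name": "Inventory", "owner": "ana", "in_flows": 1},
]

result = search_and_sort(rows, query="sales", sort_by="in_flows", descending=True)
print([row["name"] for row in result])  # ['Sales 2023', 'sales archive']
```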

Object Actions:

Hover over an object to reveal these actions on the right side of the screen.

  • Details: Review details about the dataset. See Dataset Details Page.

  • Preview: Inspect a preview of the dataset.

    Note

    Preview is not available for binary format sources.

  • Use in new flow: (Imported datasets only) Create a new flow and immediately begin wrangling the dataset. This action also creates a recipe in the flow.

  • Add to flow: Add the dataset to a new or existing flow.

  • Make a copy: Create a copy of the imported dataset. This option is not available for reference datasets.

  • Edit name and description: Change the name and description of the dataset.

  • Edit data settings: If the source of the imported dataset required conversion to an internally supported format, you can modify settings related to that conversion process. For more information, see File Import Settings.

    Tip

    This setting applies primarily to binary file formats, such as PDF and Excel, or file formats that may require additional steps to convert into tabular data, such as JSON.

  • Refresh dataset: If available, this option refreshes the dataset's metadata with the latest source schema.

Note

When a dataset is refreshed, all samples associated with the dataset are deleted, whether or not the dataset has changed. Samples must be recreated in their recipes.

Note

If you attempt to refresh the schema of a parameterized dataset based on a set of files, only the schema for the first file is checked for changes. If changes are detected, the other files are assumed to contain those changes as well. As a result, changes in later files can be wrongly assumed or go undetected, which can lead to data corruption.
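
To see why a first-file-only check can miss changes, consider the following Python sketch. It is a simplified illustration using CSV headers as a stand-in for a schema, not the product's refresh logic, and all function names are hypothetical. One way to reduce the risk is to compare the headers of every file yourself before refreshing.

```python
import csv
import io


def header_of(text):
    """Return the column names from the first row of CSV text."""
    return next(csv.reader(io.StringIO(text)))


def first_file_schema_changed(files, known_schema):
    """Mimic a check that inspects only the first file in the set."""
    return header_of(files[0]) != known_schema


def files_with_drift(files, known_schema):
    """Check every file and return the indexes whose header differs."""
    return [i for i, text in enumerate(files) if header_of(text) != known_schema]


known = ["id", "name"]
files = [
    "id,name\n1,a\n",                         # first file: unchanged
    "id,name,email\n2,b,b@example.com\n",     # later file: new column
]

print(first_file_schema_changed(files, known))  # False
print(files_with_drift(files, known))           # [1]
```

Here the first-file check reports no change even though the second file has gained a column.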

For more information, see Overview of Schema Management.

  • Transfer ownership: For assets that you own, you can transfer ownership of them to another user. For more information, see Transfer Asset Ownership.

  • Delete dataset: Delete the dataset.

    Warning

    Deleting a dataset cannot be undone.

Imported Datasets

Note

You can only see the imported datasets to which you have access in your currently selected project or workspace. If the data underlying the imported dataset is not available, the imported dataset is still listed in the page, since it is just a reference to the data.

To create a new imported dataset, click Import Data. For more information, see Import Data Page.

For more information, see Imported Datasets Page.

References

A reference dataset is a reference to a recipe's output. For more information, see References Page.

Note

A reference dataset is read-only in any flow where it is referenced. It must be created in the source flow, from the recipe whose output you want to use.

A reference dataset is created from the context menu of a flow's recipe.

Macros

A macro is a saved sequence of one or more recipe steps that can be reused in other recipes. See Macros Page. You can either import macros from your desktop or browse the Dataprep by Trifacta platform community page for existing macros. For more information, see Import Macro.

User-Defined Functions

A user-defined function (UDF) is an externally created function that can be imported into the product for use in your recipe steps. For more information, see User-Defined Functions Page.