Delete Data
A key task in cleaning up your data is to remove unwanted columns and rows, which can simplify future transformations and improve job execution performance. Designer Cloud Powered by Trifacta Enterprise Edition provides multiple mechanisms for removing data from your dataset.
Tip
When you are deleting data, you should consider if that data may have other uses in the future or for other users. If so, you should consider doing the data removal through a separate recipe off of your current recipe, which preserves the data for other uses in the current recipe.
Delete Columns
You can delete one or more columns based on the following:
By selection
Through transformation
Tip
When you delete through transformation steps, you have additional controls at your disposal.
By selection
You can delete a single column or multiple columns:
To delete a column from your dataset, click the column and select Delete from the column drop-down.
If you select Delete others, all other remaining columns are deleted except the selected column.
Tip
To delete multiple columns, select them in the data grid or column browser. Then selectDeletefrom the column menu.
The column or columns are removed from the data grid, and a new step is added to your recipe.
Through transformation
You can delete columns through the transformation steps.
Steps:
In the Transformer page, click Delete columns.
The Delete columns transformation is populated in the Transformer Builder.
Select one or more columns, as required:
Multiple: Select one or more columns from the drop-down list.
All: Select all columns in the dataset.
Note
This step removes all columns in your dataset.
Range: Specify a start and ending columns. All columns inclusive of start and end are deleted.
Advanced: Specify the columns using a comma-separated list. Ranges of columns can be specified using the tilde (
~
) character. Examples:Entry
Description
Store_Nbr
~Daily
Columns from
Store_Nbr
toDaily
in the dataset are deleted.Store_Name,Store_Manager,Store_Nbr
~Daily
The following columns are deleted:
Store_Name
Store_Manager
Store_Nbr
toDaily
From the Action area, select one of the following options:
Delete selected columns: Deletes only the selected columns.
Delete unselected columns: Deletes all other remaining columns except the selected columns.
To delete columns, click Add.
Example transformation:
The following transformation deletes the columns between Store_Nbr
and Daily
, inclusive.
Transformation Name |
|
---|---|
Parameter: Columns | Advanced |
Parameter: Column | Store_Nbr~Daily |
Parameter: Action | Delete selected columns |
Delete Rows
Since rows do not have an identifying header, you must identify the rows to remove in your dataset based on a specified condition. You can delete rows based on the following:
By selection
By custom conditions
By selection
You can delete rows by selecting values. You are prompted for data filtering suggestions when you select values in:
column histograms
column data quality bars
cells or values within a cell
When you make a selection, select the Delete rowstransformation in the context panel. The Transform Builder contains a transformation to filter rows based on the the condition that you have selected. For example, if you selected the value California
in the State
column, then the transformation is specified to filter out rows in which State=California
.
In the Transform Builder, you must decide if the transformation keeps matching rows (deleting all others) or deletes matching rows. In the following example, rows in which State=California
are selected for deletion:
Transformation Name |
|
---|---|
Parameter: Condition | Custom formula |
Parameter: Type of formula | Custom single |
Parameter: Condition | State == "California" |
Parameter: Action | Delete matching rows |
By custom conditions
You can delete a set of rows based on a condition specified in the condition
column. If the conditional expression is true
, then the selected rows are deleted.
In the Transformer page, click the Recipe icon. The Recipe panel is displayed.
In the Search Transformations panel, enter
Filter in
.In the Filter rows transformation, enter the required details:
Condition: Filter based on the condition type that you select in the drop-down. Some condition types do not support specifying the condition by formula.
Column: The column containing the values to filter. For example,
action_count
.Values or Formula: Specify the values or the formula used to determine the condition.
If these values are present, then the condition evaluates to
true
.The formula must evaluate to
true
orfalse
.
Action: The action to be performed to the rows based on the specified conditions.
In the following example, the rows where the
action_count
column values fall between 1 and 10 are deleted:Transformation Name
Filter rows
Parameter: Condition
Custom formula
Parameter: Type of formula
Custom single
Parameter: Condition
(action_count >= 1) && (action_count <= 10)
Parameter: Action
Delete matching rows
Tip
You can apply logical operators such as
&&
(logical AND) above to build more sophisticated logical tests.
To add the recipe to the step, click Add. The dataset rows are filtered based on the configured transformation.