Nest Your Data
In Dataprep by Trifacta, you can nest columns into arrays and objects (maps) using a variety of transformations.
Nested data types include the following types of data:
Arrays are lists of values, grouped into a single value. Examples:
[1,2,3,4]
[1,2,[a,b],4]
Objects are collections of key-value pairs. Examples:
{"make":"Subaru","model":"Outback","year":"2015","color":"black"}
{"make":"Subaru","model":"Outback","year":{"first":"2015","last":"2021"},"color":"black"}
In the above examples:
The first example is a flat or unnested Array or Object.
The first example is a nested Array or Object.
Nest Columns into Array
This section provides simple examples of nesting columns into Arrays by extracting values from a column or nesting one or more columns into an Array column.
Create by extraction:
You can create an array of values by extracting pattern-based values from a specified column. The following transformation extracts from the msg
column a list of all values where all letters are capitalized and places them into the new acronyms
column:
Transformation Name |
|
---|---|
Parameter: Column | msg |
Parameter: Pattern matching elements in the list | `{upper}+` |
Parameter: New column name | acronyms |
msg | acronyms |
---|---|
SCUBA, IMHO, is the greatest sport in the world. | ["SCUBA","IMHO"] |
[] | |
LOL, that assignment you finished is DOA. You need to fix it PDQ. | ["LOL","DOA","Y","PDQ"] |
Notes:
An empty input column value renders an empty array.
In the final row, the Wrangle matches on the
"Y"
value. To fix this, you can change the Pattern matching value to the following, which matches on two or more uppercase letters in a row:`{upper}{upper}+`
Create by nesting:
You can create arrays by nesting together the values from multiple columns:
num1 | num2 | num3 |
---|---|---|
11 | 12 | 13 |
14 | 15 | 16 |
17 | 18 | 19 |
You can nest the values in num1
and num2
into a single array and then to nest the array with num3
:
Note
If you are nesting a multi-level array, you should nest from the lowest level to the top level.
Transformation Name |
|
---|---|
Parameter: Columns1 | num1 |
Parameter: Columns2 | num2 |
Parameter: Nest columns to | Array |
Parameter: New column name | nest1 |
Then, you can perform the nesting of the top-level elements:
Note
The order in which you list the columns to nest determines the order in which the elements appear in the generated array.
Transformation Name |
|
---|---|
Parameter: Columns1 | nest1 |
Parameter: Columns2 | num3 |
Parameter: Nest columns to | Array |
Parameter: New column name | nest2 |
In the generated columns, you notice that all values are quoted, even though these values are integers.
Note
Elements that are generated into arrays using a nest transformation are always rendered as quoted values.
You can use the following transformation to remove the quotes from the nest2
column:
Transformation Name |
|
---|---|
Parameter: Column | nest2 |
Parameter: Find | '"' |
Parameter: Replace | (empty) |
Parameter: Match all occurrences | true |
num1 | num2 | num3 | nest2 |
---|---|---|---|
11 | 12 | 13 | [[11,12],13] |
14 | 15 | 16 | [[14,15],16] |
17 | 18 | 19 | [[17,18],19] |
Nest Columns into Objects
You can nest multiple columns into a single column of objects using nest
transform.
This section provides a simple example of nesting columns into a new column of Object data type.
Source:
In the following example, furniture product dimensions are stored in separate columns in cm.
Category | Name | Length_cm | Width_cm | Height_cm |
---|---|---|---|---|
bench | Hooska | 118.11 | 74.93 | 46.34 |
lamp | Tansk | 30.48 | 30.48 | 165.1 |
bookshelf | Brock | 27.94 | 160.02 | 201.93 |
couch | Loafy | 95 | 227 | 83 |
Transformation:
Use the nest
transform to bundle the data into a single column.
Transformation Name |
|
---|---|
Parameter: Columns | Length_cm,Width_cm,Height_cm |
Parameter: Nest columns to | Object |
Parameter: New column name | 'Dimensions' |
Results:
Category | Name | Length_cm | Width_cm | Height_cm | Dimensions |
---|---|---|---|---|---|
bench | Hooska | 118.11 | 74.93 | 46.34 | {"Length_cm":"118.11","Width_cm":"74.93","Height_cm":"46.34"} |
lamp | Tansk | 30.48 | 30.48 | 165.1 | {"Length_cm":"30.48","Width_cm":"30.48","Height_cm":"165.1"} |
bookshelf | Brock | 27.94 | 160.02 | 201.93 | {"Length_cm":"27.94","Width_cm":"160.02","Height_cm":"201.93"} |
couch | Loafy | 95 | 227 | 83 | {"Length_cm":"95,"Width_cm":"227","Height_cm":"83"} |
Hybrids of Arrays and Objects
A single Array or Object can contain other Arrays, Objects, or both. In fact, this structuring is essentially the JSON file format. In this example, you can see how data, when laid out in a single table, can be assembled into a set of JSON records. See EXAMPLE - Nest JSON Records.