Unnest Transform

Note

Transforms are a part of the underlying language, which is not directly accessible to users. This content is maintained for reference purposes only. For more information on the user-accessible equivalent to transforms, see Transformation Reference.

Unpacks nested data from an Array or Object column to create new rows or columns based on the keys in the source data.

This transform works differently on columns of Array or Object type.

The unnest transform must include keys that you specify as part of the transform step. To unnest a column of array data that contains no keys, use the flatten transform. See Flatten Transform.

This transform might be automatically applied as one of the first steps of your recipe.

Basic Usage

unnest col: myObj keys:'sourceA','sourceB' pluck:true markLineage:true

Output:

Extracts from the myObj column the corresponding values for the keys sourceA and sourceB into two new columns.
Since markLineage is true, the new column names are prepended with the name of the source column: myObj_sourceA and myObj_sourceB.
Any non-missing values from the source columns are added to the corresponding new columns and are removed from the source column, since pluck is true.

Syntax and Parameters

unnest col:column_ref keys:'key1','key2' [pluck:true|false] [markLineage:true|false]

Token	Required?	Data Type	Description
unnest	Y	transform	Name of the transform
col	Y	string	Source column name
keys	Y	string	Comma-separated list of quoted key names. See below for examples.
pluck	N	boolean	If `true`, any values unnested from the source are also removed from the source. Default is `false`.
markLineage	N	boolean	If `true`, the names of new columns are prepended with the name of the source column.

For more information on syntax standards, see Language Documentation Syntax Notes.

col

Identifies the column to which to apply the transform. You can specify only one column.

Usage Notes:

Required?	Data Type
Yes	String (column name)

keys

Comma-separated list of keys to use to extract data from the specified source column.

Key values must be quoted (e.g., 'key1','key2'). Any quoted value is considered the path to a single key.
Key values are case-sensitive.
Each key must be listed. A range of keys cannot be specified.

Note

Keys that contain non-alphanumeric characters, such as spaces, must be enclosed in square brackets and quotes. Keys that contain underscores do not require brackets.

The comma-separated list of keys determines the columns to generate from the source data. If you specify three values for keys, the three new columns contain the corresponding values from the source column.

The syntax of this parameter differs for single-level and multi-level nested data. It also differs between the Object and Array data types.

Usage Notes:

Required?	Data Type
Yes	Comma-separated String values. Syntax examples are provided below.

Keys for Object data - single-level

Note

Key names are case-sensitive.

For a single, top-level key in an Object field, you can specify the key as a simple quoted string:

unnest col:myCol keys: 'myObjKey'

The above looks for the key myObjKey among the top-level keys in the Object and returns the corresponding value for the new column. You can also enclose this key in square brackets:

unnest col:myCol keys: '[myObjKey]'

To specify multiple first-level keys, use the following:

unnest col:myCol keys:'myObjKey','my2ndObjKey'

The above generates two new columns (myObjKey and my2ndObjKey) containing the corresponding values for the keys.

Keys for Object data - multi-level

You can also reference keys that are below the first level in the Object.

Example data:

{ "Key1" :
  { "Key1A" :
    { "Key1A1" : "Value1" }
  }
}
{ "Key2" :
  { "Key2A" :
    { "Key2A1" : "Value2" }
  }
}
{ "Key3" :
  { "Key3A" :
    { "Key3A1" : "Value3" }
  }
}

To acquire the data for the Key1A key, use the following:

unnest col: myCol keys: 'Key1[Key1A]'

In the new column, the displayed value is the following:

{ "Key1A1" : "Value1" }

To unnest a third-layer value, use a transform similar to the following:

unnest col: myCol keys: 'Key2[Key2A][Key2A1]'

In the new column, this transform generates a value of Value2.
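
One way to picture how a multi-level key path resolves is as a walk through nested dictionaries. The following Python sketch is illustrative only and is not the product's implementation. The helpers `parse_path` and `lookup` are hypothetical, and they assume that a path is a leading key followed by bracketed keys.

```python
import re

def parse_path(path):
    """Split a path such as 'Key2[Key2A][Key2A1]' into its individual keys."""
    # A segment is either text inside square brackets or a bare name.
    return [a or b for a, b in re.findall(r"\[([^\]]+)\]|([^\[\]]+)", path)]

def lookup(obj, path):
    """Follow each key in turn; return None if any level is missing."""
    current = obj
    for key in parse_path(path):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current

row1 = {"Key1": {"Key1A": {"Key1A1": "Value1"}}}
row2 = {"Key2": {"Key2A": {"Key2A1": "Value2"}}}

print(lookup(row1, "Key1[Key1A]"))          # {'Key1A1': 'Value1'}
print(lookup(row2, "Key2[Key2A][Key2A1]"))  # Value2
```

A path that does not exist in a given row, such as Key2[Key2A] applied to the first row, yields no value in this sketch.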

Keys for Array data - single level

You can reference array elements using zero-based index numbers.

Note

All references to Array keys must be bracketed. Array keys can be referenced by index number only.

Example array data:

["red","orange","yellow","green","blue","indigo","violet"]

unnest col: myCol keys:'[1]'

The above transform retrieves the value orange from the array.

unnest col: myCol keys:'[1]','[3]'

Returned values: orange and green.
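
The zero-based lookup can be mimicked in a few lines of Python. This is a sketch of the behavior described above, not product code. The function name `unnest_indexes` is hypothetical, and returning no value for an out-of-range index is an assumption.

```python
import re

colors = ["red", "orange", "yellow", "green", "blue", "indigo", "violet"]

def unnest_indexes(array, keys):
    """Return the element for each bracketed index, or None if out of range."""
    values = []
    for key in keys:
        # Each key must look like '[n]', where n is a zero-based index.
        index = int(re.fullmatch(r"\[(\d+)\]", key).group(1))
        values.append(array[index] if index < len(array) else None)
    return values

print(unnest_indexes(colors, ["[1]"]))         # ['orange']
print(unnest_indexes(colors, ["[1]", "[3]"]))  # ['orange', 'green']
```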

Keys for Array data - multi-level

The following example nested Array data matches the structure of the Object data in the previous example:

[ [ "Item1", ["Item1A", ["Item1A1","Value1"] ] ], [ "Item2", ["Item2A",  ["Item2A1","Value2"] ] ], [ "Item3", ["Item3A",["Item3A1","Value3"] ] ] ]

To unnest the value that is paired with Item2A:

unnest col:myCol keys:'[1][1][1]'

The value inserted into the new column is the following:

["Item2A1","Value2"]

To unnest from the third level:

unnest col:myCol keys:'[2][1][0]'

The inserted value is Item3A.
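
To check which element a bracketed path reaches, you can step through the example array by hand or with a short script. The Python sketch below assumes that each bracketed number is an ordinary zero-based list index applied in sequence. The helper `walk` is hypothetical and is not part of the product.

```python
import re

data = [
    ["Item1", ["Item1A", ["Item1A1", "Value1"]]],
    ["Item2", ["Item2A", ["Item2A1", "Value2"]]],
    ["Item3", ["Item3A", ["Item3A1", "Value3"]]],
]

def walk(array, path):
    """Apply each bracketed index in turn; return None when a step fails."""
    current = array
    for index in map(int, re.findall(r"\[(\d+)\]", path)):
        if not isinstance(current, list) or index >= len(current):
            return None
        current = current[index]
    return current

print(walk(data, "[1][1][1]"))  # ['Item2A1', 'Value2']
print(walk(data, "[2][1][0]"))  # Item3A
```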

pluck

Indicates whether any values added from source to output columns should be removed from the source.

Set to true to remove values from source after they have been added to output columns.
(Default) Set to false to leave source columns untouched.
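
The difference between the two settings can be sketched in Python. The function `unnest_object` is a hypothetical stand-in for the transform applied to a single Object value. It only illustrates the behavior described above.

```python
import copy

def unnest_object(source, keys, pluck=False):
    """Return (new_columns, remaining_source) for one Object value."""
    remaining = copy.deepcopy(source)
    new_columns = {}
    for key in keys:
        new_columns[key] = source.get(key)
        if pluck:
            # With pluck enabled, the extracted key leaves the source.
            remaining.pop(key, None)
    return new_columns, remaining

sizes = {"Small": "N", "Medium": "Y", "Large": "Y", "Extra-Large": "Y"}

cols, left = unnest_object(sizes, ["Small", "Medium"], pluck=True)
print(cols)  # {'Small': 'N', 'Medium': 'Y'}
print(left)  # {'Large': 'Y', 'Extra-Large': 'Y'}

cols, left = unnest_object(sizes, ["Small", "Medium"], pluck=False)
print(left == sizes)  # True
```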

Usage Notes:

Required?	Data Type
No	Boolean

markLineage

When set to true, the names of new columns are prepended with the name of the source column. Example:

Source Column	Output Column
mySourceColumn	mySourceColumn_column1

Nested key references are appended to the column name:

Source Column	Key Value	Output Column
mySourceColumn	keys: '[Key1][Key2]'	mySourceColumn_Key1_Key2
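
The naming pattern in the table above can be expressed as a small Python function. This is a sketch of the pattern only. The name `lineage_name` is hypothetical, and joining the parts with underscores is inferred from the example output column.

```python
import re

def lineage_name(source_column, key_path):
    """Join the source column name and each key in the path with underscores."""
    keys = [a or b for a, b in re.findall(r"\[([^\]]+)\]|([^\[\]]+)", key_path)]
    return "_".join([source_column] + keys)

print(lineage_name("mySourceColumn", "[Key1][Key2]"))  # mySourceColumn_Key1_Key2
```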

Note

If your unnest transform does not change the number of rows, you can still access source row number information in the data grid, assuming it was still available when the transform was executed.

Usage Notes:

Required?	Data Type
No	Boolean

Examples

Tip

For additional examples, see Common Tasks.

Example - Unnest an Object

You have the following dataset. The Sizes column contains Object data on available sizes.

Source:

ProdId	ProdName	Sizes
1001	Hat	{'Small':'N','Medium':'Y','Large':'Y','Extra-Large':'Y'}
1002	Shirt	{'Small':'N','Medium':'Y','Large':'Y','Extra-Large':'N'}
1003	Pants	{'Small':'Y','Medium':'Y','Large':'Y','Extra-Large':'N'}

Transformation:

Note

Depending on the format of your source data, you might need to perform some replacements in the Sizes column in order to make it inferred as proper Object type values. The final format should look like the above.

If it is not inferred already, set the type of the Sizes column to Object:

Transformation Name	`Change column data type`
Parameter: Columns	Sizes
Parameter: New type	Object

Unnest the data into separate columns. The following prepends Sizes_ to the newly generated column name.

Transformation Name	`Unnest Objects into columns`
Parameter: Column	Sizes
Parameter: Paths to elements	'Small','Medium','Large','Extra-Large'
Parameter: Include original column name	true

You might find it useful to add pluck:true to the above transform. When added, values that are unnested are removed from the source, leaving only the values that were not processed:

Transformation Name	`Unnest Objects into columns`
Parameter: Column	Sizes
Parameter: Paths to elements	'Small','Medium','Large','Extra-Large'
Parameter: Remove elements from original	true
Parameter: Include original column name	true

If all values have been processed, the Sizes column now contains only empty Objects ({}). You can use the following to check whether the remaining data in a row is longer than two characters, which indicates that some values were not unnested. This transform is a good one to preview only:

Transformation Name	`New formula`
Parameter: Formula type	Single row formula
Parameter: Formula	(len(Sizes) > 2)
Parameter: New column name	'len_Sizes'

You can delete the source column:

Transformation Name	`Delete columns`
Parameter: Columns	Sizes
Parameter: Action	Delete selected columns

Results:

When you are finished, the dataset should look like the following:

ProdId	ProdName	Sizes_Small	Sizes_Medium	Sizes_Large	Sizes_Extra-Large
1001	Hat	N	Y	Y	Y
1002	Shirt	N	Y	Y	N
1003	Pants	Y	Y	Y	N
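
For comparison, the same reshaping can be sketched in Python with plain dictionaries. This is an illustration of the steps in this example, not the product's implementation. Popping each key stands in for the pluck behavior, and the Sizes_ prefix stands in for including the original column name.

```python
rows = [
    {"ProdId": 1001, "ProdName": "Hat",
     "Sizes": {"Small": "N", "Medium": "Y", "Large": "Y", "Extra-Large": "Y"}},
    {"ProdId": 1002, "ProdName": "Shirt",
     "Sizes": {"Small": "N", "Medium": "Y", "Large": "Y", "Extra-Large": "N"}},
    {"ProdId": 1003, "ProdName": "Pants",
     "Sizes": {"Small": "Y", "Medium": "Y", "Large": "Y", "Extra-Large": "N"}},
]
keys = ["Small", "Medium", "Large", "Extra-Large"]

result = []
leftovers = []
for row in rows:
    remaining = dict(row["Sizes"])  # copy, so the input rows are not modified
    out = {"ProdId": row["ProdId"], "ProdName": row["ProdName"]}
    for key in keys:
        # Removing the key as it is read mimics the pluck behavior.
        out["Sizes_" + key] = remaining.pop(key, None)
    leftovers.append(remaining)  # empty when every key was unnested
    result.append(out)

for out in result:
    print(out)
```

In this sketch, every entry in leftovers is an empty dictionary, which corresponds to the two-character {} check described earlier.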

Example - Unnest an array

The following example demonstrates differences between the unnest and the flatten transform, including how you use unnest to flatten array data based on specified keys.

For more information, see Flatten Transform.

This example illustrates how to use the flatten and unnest transforms together.

Source:

You have the following data on student test scores. Scores on individual tests are stored in the Scores array, and you need to be able to track each test on a uniquely identifiable row. This example has two goals:

One row for each student test
Unique identifier for each student-score combination

LastName	FirstName	Scores
Adams	Allen	[81,87,83,79]
Burns	Bonnie	[98,94,92,85]
Cannon	Charles	[88,81,85,78]

Transformation:

When the data is imported from CSV format, you must add a header transform and remove the quotes from the Scores column:

Transformation Name	`Rename column with row(s)`
Parameter: Option	Use row(s) as column names
Parameter: Type	Use a single row to name columns
Parameter: Row number	1

Transformation Name	`Replace text or pattern`
Parameter: Column	Scores
Parameter: Find	'\"'
Parameter: Replace with	''
Parameter: Match all occurrences	true

Validate test data: To begin, you might want to check whether you have the proper number of test scores for each student. You can use the following transform to calculate the difference between the expected number of elements in the Scores array (4) and the actual number:

Transformation Name	`New formula`
Parameter: Formula type	Single row formula
Parameter: Formula	(4 - arraylen(Scores))
Parameter: New column name	'numMissingTests'

When the transform is previewed, you can see in the sample dataset that all tests are included. You might or might not want to include this column in the final dataset, as you might identify missing tests when the recipe is run at scale.
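
The same check can be sketched in Python, where len plays the role of arraylen. The dictionary of scores below is only a convenient way to hold the sample data for this illustration.

```python
EXPECTED_TESTS = 4

scores = {
    "Adams": [81, 87, 83, 79],
    "Burns": [98, 94, 92, 85],
    "Cannon": [88, 81, 85, 78],
}

# A positive result means that the student is missing that many tests.
num_missing = {name: EXPECTED_TESTS - len(s) for name, s in scores.items()}
print(num_missing)  # {'Adams': 0, 'Burns': 0, 'Cannon': 0}
```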

Unique row identifier: The Scores array must be broken out into individual rows for each test. However, there is no unique identifier for the row to track individual tests. In theory, you could use the combination of LastName-FirstName-Scores values to do so, but if a student recorded the same score twice, your dataset has duplicate rows. In the following transform, you create a parallel array called Tests, which contains an index array for the number of values in the Scores column. Index values start at 0:

Transformation Name	`New formula`
Parameter: Formula type	Single row formula
Parameter: Formula	range(0,arraylen(Scores))
Parameter: New column name	'Tests'

You also want to create an identifier for the source row using the sourcerownumber function:

Transformation Name	`New formula`
Parameter: Formula type	Single row formula
Parameter: Formula	sourcerownumber()
Parameter: New column name	'orderIndex'

One row for each student test: Your data should look like the following:

LastName	FirstName	Scores	Tests	orderIndex
Adams	Allen	[81,87,83,79]	[0,1,2,3]	2
Burns	Bonnie	[98,94,92,85]	[0,1,2,3]	3
Cannon	Charles	[88,81,85,78]	[0,1,2,3]	4

Now, you want to bring together the Tests and Scores arrays into a single nested array using the arrayzip function:

Transformation Name	`New formula`
Parameter: Formula type	Single row formula
Parameter: Formula	arrayzip([Tests,Scores])

Your dataset has been changed:

LastName	FirstName	Scores	Tests	orderIndex	column1
Adams	Allen	[81,87,83,79]	[0,1,2,3]	2	[[0,81],[1,87],[2,83],[3,79]]
Burns	Bonnie	[98,94,92,85]	[0,1,2,3]	3	[[0,98],[1,94],[2,92],[3,85]]
Cannon	Charles	[88,81,85,78]	[0,1,2,3]	4	[[0,88],[1,81],[2,85],[3,78]]
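
The index array and the zipped result can be reproduced in Python for a single row. In this sketch, Python's built-in range and zip stand in for the range and arrayzip functions. It is an analogy, not a description of how those functions are implemented.

```python
scores = [81, 87, 83, 79]

# Index array with one entry per score, starting at 0.
tests = list(range(0, len(scores)))
print(tests)  # [0, 1, 2, 3]

# Pair each test index with its score, as a nested array.
zipped = [list(pair) for pair in zip(tests, scores)]
print(zipped)  # [[0, 81], [1, 87], [2, 83], [3, 79]]
```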

Use the following to unpack the nested array:

Transformation Name	`Expand arrays to rows`
Parameter: Column	column1

Each test-score combination is now broken out into a separate row. The nested Test-Score combinations must be broken out into separate columns using the following:

Transformation Name	`Unnest Objects into columns`
Parameter: Column	column1
Parameter: Paths to elements	'[0]','[1]'
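
The two steps above do different jobs: the first creates rows, and the second creates columns. A Python sketch for one student makes the distinction concrete. The output column names column_0 and column_1 follow the names used in the rename steps of this example.

```python
row = {"LastName": "Adams", "FirstName": "Allen", "orderIndex": 2,
       "column1": [[0, 81], [1, 87], [2, 83], [3, 79]]}

# Flatten: one output row per element of the nested array.
flattened = [{**row, "column1": pair} for pair in row["column1"]]

# Unnest with keys '[0]','[1]': pull each position into its own column.
unnested = [
    {**r, "column_0": r["column1"][0], "column_1": r["column1"][1]}
    for r in flattened
]

for r in unnested:
    print(r["LastName"], r["column_0"], r["column_1"])
```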

After you delete column1, which is no longer needed, you should rename the two generated columns:

Transformation Name	`Rename columns`
Parameter: Option	Manual rename
Parameter: Column	column_0
Parameter: New column name	'TestNum'

Transformation Name	`Rename columns`
Parameter: Option	Manual rename
Parameter: Column	column_1
Parameter: New column name	'TestScore'

Unique row identifier: You can do one more step to create unique test identifiers, which identify the specific test for each student. The following uses the original row identifier orderIndex as an identifier for the student and the TestNum value to create the TestId column value:

Transformation Name	`New formula`
Parameter: Formula type	Single row formula
Parameter: Formula	(orderIndex * 10) + TestNum
Parameter: New column name	'TestId'

The above are integer values. To make your identifiers look prettier, you might add the following:

Transformation Name	`Merge columns`
Parameter: Columns	'TestId00','TestId'

Extending: You might want to generate some summary statistical information on this dataset. For example, you might be interested in calculating each student's average test score. This step requires figuring out how to properly group the test values. In this case, you cannot group by the LastName value alone, and when the recipe is run at scale, there might also be collisions between first names. So, you might need to create a kind of primary key using the following:

Transformation Name	`Merge columns`
Parameter: Columns	'LastName','FirstName'
Parameter: Separator	'-'
Parameter: New column name	'studentId'

You can now use this as a grouping parameter for your calculation:

Transformation Name	`New formula`
Parameter: Formula type	Single row formula
Parameter: Formula	average(TestScore)
Parameter: Group rows by	studentId
Parameter: New column name	'avg_TestScore'
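
A grouped average of this kind can be sketched in Python using the scores from the source data. The studentId values below are built from the source LastName and FirstName columns joined with a hyphen.

```python
from collections import defaultdict

rows = [
    ("Adams-Allen", 81), ("Adams-Allen", 87), ("Adams-Allen", 83), ("Adams-Allen", 79),
    ("Burns-Bonnie", 98), ("Burns-Bonnie", 94), ("Burns-Bonnie", 92), ("Burns-Bonnie", 85),
    ("Cannon-Charles", 88), ("Cannon-Charles", 81), ("Cannon-Charles", 85), ("Cannon-Charles", 78),
]

# Collect the scores for each student, then average each group.
groups = defaultdict(list)
for student_id, score in rows:
    groups[student_id].append(score)

averages = {sid: sum(vals) / len(vals) for sid, vals in groups.items()}
print(averages)
# {'Adams-Allen': 82.5, 'Burns-Bonnie': 92.25, 'Cannon-Charles': 83.0}
```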

Results:

After you delete unnecessary columns and move your columns around, the dataset should look like the following:

TestId	LastName	FirstName	TestNum	TestScore	studentId	avg_TestScore
TestId0020	Adams	Allen	0	81	Adams-Allen	82.5
TestId0021	Adams	Allen	1	87	Adams-Allen	82.5
TestId0022	Adams	Allen	2	83	Adams-Allen	82.5
TestId0023	Adams	Allen	3	79	Adams-Allen	82.5
TestId0030	Burns	Bonnie	0	98	Burns-Bonnie	92.25
TestId0031	Burns	Bonnie	1	94	Burns-Bonnie	92.25
TestId0032	Burns	Bonnie	2	92	Burns-Bonnie	92.25
TestId0033	Burns	Bonnie	3	85	Burns-Bonnie	92.25
TestId0040	Cannon	Charles	0	88	Cannon-Charles	83
TestId0041	Cannon	Charles	1	81	Cannon-Charles	83
TestId0042	Cannon	Charles	2	85	Cannon-Charles	83
TestId0043	Cannon	Charles	3	78	Cannon-Charles	83

Example - extracting key values from car data and then unnesting into separate columns

This example shows how you can unpack data nested in an Object into separate columns.

Source:

You have the following information on used cars. The VIN column contains vehicle identifiers, and the Properties column contains key-value pairs describing characteristics of each vehicle. You want to unpack this data into separate columns.

VIN	Properties
XX3 JT4522	year=2004,make=Subaru,model=Impreza,color=green,mileage=125422,cost=3199
HT4 UJ9122	year=2006,make=VW,model=Passat,color=silver,mileage=102941,cost=4599
KC2 WZ9231	year=2009,make=GMC,model=Yukon,color=black,mileage=68213,cost=12899
LL8 UH4921	year=2011,make=BMW,model=328i,color=brown,mileage=57212,cost=16999

Transformation:

Add the following transformation, which identifies each key in the column as a string of alphabetical characters.

The Separator between key and value string identifies where the corresponding value begins after the key.
The Delimiter between pair string indicates the end of each key-value pair.

Transformation Name	`Convert keys/values into Objects`
Parameter: Column	Properties
Parameter: Key	`{alpha}+`
Parameter: Separator between key and value	`=`
Parameter: Delimiter between pair	','

Now that the Object of values has been created, you can use the unnest transform to unpack this mapped data. In the following, each key is specified, which results in separate columns headed by the named key:

Note

Each key must be entered on a separate line in the Paths to elements area.

Transformation Name	`Unnest Objects into columns`
Parameter: Column	extractkv_Properties
Parameter: Paths to elements	year
Parameter: Paths to elements	make
Parameter: Paths to elements	model
Parameter: Paths to elements	color
Parameter: Paths to elements	mileage
Parameter: Paths to elements	cost

Results:

When you delete the unnecessary Properties columns, the dataset now looks like the following:

VIN	year	make	model	color	mileage	cost
XX3 JT4522	2004	Subaru	Impreza	green	125422	3199
HT4 UJ9122	2006	VW	Passat	silver	102941	4599
KC2 WZ9231	2009	GMC	Yukon	black	68213	12899
LL8 UH4921	2011	BMW	328i	brown	57212	16999
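
The two-step approach in this example, building an Object from the key-value text and then unnesting it, can be sketched in Python. The function `extract_kv` is a hypothetical helper, and its regular expression is an assumed equivalent of the alphabetical key pattern used above. Values remain strings in this sketch.

```python
import re

def extract_kv(text, key_pattern=r"[A-Za-z]+", separator="=", delimiter=","):
    """Turn 'k1=v1,k2=v2' text into a dictionary of string values."""
    result = {}
    for pair in text.split(delimiter):
        key, sep, value = pair.partition(separator)
        # Keep the pair only if a separator was found and the key is alphabetic.
        if sep and re.fullmatch(key_pattern, key):
            result[key] = value
    return result

props = "year=2004,make=Subaru,model=Impreza,color=green,mileage=125422,cost=3199"
obj = extract_kv(props)

# Unnest: each listed key becomes its own column.
keys = ["year", "make", "model", "color", "mileage", "cost"]
row = {"VIN": "XX3 JT4522", **{k: obj.get(k) for k in keys}}
print(row)
```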
