Skip to main content

Working with Arrays

This section describes how to work with the Array data type in theTrifacta Application.Anarrayis a set of delimited values. Any individual value in the list can be a separate array, which allows for the creation of nested data arrays.

Array Types

To be recognized as an array, a source column must contain values that are:

  • Bracketed by square brackets

  • Values in cell are delimited by commas

Such columns are likely to be recognized as Array data type.

The following are valid arrays:

[1,2,3]
["A","B"]
["C",["D","E"],"F",["G",["H","I"]]]
  • Ragged arrays: If the number of elements varies between two arrays, they are considered ragged. In the above, all three arrays have a different number of top-level elements (3,2,4).

  • Nested arrays: When an array element is an array itself, the element is considered a nested array. See the last example above.

For more information, see Array Data Type.

Create Arrays

Within Designer Cloud, you can generate arrays using values from one or more columns to do so.

Create by extraction

You can create an array of values by extracting pattern-based values from a specified column. The following transformation extracts from the msg column a list of all values where all letters are capitalized and places them into the new acronyms column:

Transformation Name

Extract matches into Array

Parameter: Column

msg

Parameter: Pattern matching elements in the list

`{upper}+`

Parameter: New column name

acronyms

msg

acronyms

SCUBA, IMHO, is the greatest sport in the world.

["SCUBA","IMHO"]

[]

LOL, that assignment you finished is DOA. You need to fix it PDQ.

["LOL","DOA","Y","PDQ"]

Notes:

  • An empty input column value renders an empty array.

  • In the final row, the Wrangle matches on the "Y" value. To fix this, you can change the Pattern matching value to the following, which matches on two or more uppercase letters in a row:

    `{upper}{upper}+`

Create by nesting

You can create arrays by nesting together the values from multiple columns.

Source:

num1

num2

num3

11

12

13

14

15

16

17

18

19

You want to nest the values in num1 and num2 into a single array and then to nest the array with num3:

Note

If you are nesting a multi-level array, you should nest from the lowest level to the top level.

Transformation Name

Nest columns into Objects

Parameter: Columns1

num1

Parameter: Columns2

num2

Parameter: Nest columns to

Array

Parameter: New column name

nest1

Then, you can perform the nesting of the top-level elements:

Note

The order in which you list the columns to nest determines the order in which the elements appear in the generated array.

Transformation Name

Nest columns into Objects

Parameter: Columns1

nest1

Parameter: Columns2

num3

Parameter: Nest columns to

Array

Parameter: New column name

nest2

In the generated columns, you notice that all values are quoted, even though these values are integers.

Note

Elements that are generated into arrays using a nest transformation are always rendered as quoted values.

You can use the following transformation to remove the quotes from the nest2 column:

Transformation Name

Replace text or patterns

Parameter: Column

nest2

Parameter: Find

'"'

Parameter: Replace

(empty)

Parameter: Match all occurrences

true

After removing the unused nest1 column, the data looks like the following:

num1

num2

num3

nest2

11

12

13

[[11,12],13]

14

15

16

[[14,15],16]

17

18

19

[[17,18],19]

Create from column values

You can use one of several available functions to create arrays from a column's values.

Source:

listVals

5

TRUE

{"key1":"value1","keys2":"value2"}

[1,2,3]

My String

-5.5

The following transformation generates a new column in which each row contains an array of all of the values of the input column:

Transformation Name

New formula

Parameter: Formula type

Single row formula

Parameter: Formula

LIST(listVals,1000)

Parameter: New column name

listOfListVals

Results:

listVals

listOfListVals

5

["5","TRUE","{\"key1\":\"value1\",\"keys2\":\"value2\"}","[1,2,3]","My String","-5.5"]

TRUE

["5","TRUE","{\"key1\":\"value1\",\"keys2\":\"value2\"}","[1,2,3]","My String","-5.5"]

{"key1":"value1","keys2":"value2"}

["5","TRUE","{\"key1\":\"value1\",\"keys2\":\"value2\"}","[1,2,3]","My String","-5.5"]

[1,2,3]

["5","TRUE","{\"key1\":\"value1\",\"keys2\":\"value2\"}","[1,2,3]","My String","-5.5"]

My String

["5","TRUE","{\"key1\":\"value1\",\"keys2\":\"value2\"}","[1,2,3]","My String","-5.5"]

-5.5

["5","TRUE","{\"key1\":\"value1\",\"keys2\":\"value2\"}","[1,2,3]","My String","-5.5"]

Notes:

  • The second parameter on the LIST function defines the maximum number of values to write. 1000 is the default.

  • All values in the generated array are written as String values.

  • Quoted values are escaped in the output.

The following functions allow you to generate various types of arrays from a column's set of values.

Function

Description

LIST Function

Extracts the set of values from a column into an array stored in a new column. This function is typically part of an aggregation.

UNIQUE Function

Extracts the set of unique values from a column into an array stored in a new column. This function is typically part of an aggregation.

LISTIF Function

Returns list of all values in a column for rows that match a specified condition.

ROLLINGLIST Function

Computes the rolling list of values forward or backward of the current row within the specified column and returns an array of these values.

RANGE Function

Computes an array of integers, from a beginning integer to an end (stop) integer, stepping by a third parameter.

Note

The lower bound of the range is included, while the upper bound is not.

Tip

Additional examples are available in the above links for these functions.

Create from Object type

You can extract the keys of an Object column into an array of string values. In an Object type, the values are listed in quoted key/value pairs and can be nested. See Object Data Type.

Source:

Suppose your Object data looks like the following:

myObject

{"key1":"value1","key2":"value2","key3":"value3"}

{"apples":"2","oranges":"4"}

{"planes":{"boeing":"5","airbus":"4"},"trains":{"amtrak":"1","SP":"2"}, "automobiles":{"toyota":"100","nissan":"50"}}

You can run the following transformation to extract the top-level keys into arrays in a new named column:

Note

The KEYS function retrieves only the top-level keys from the Object.

Transformation Name

New formula

Parameter: Formula type

Single row formula

Parameter: Formula

KEYS(myObject)

Parameter: New column name

myObjectKeys

Results:

myObject

myObjectKeys

{"key1":"value1","key2":"value2","key3":"value3"}

["key1","key2","key3"]

{"apples":"2","oranges":"4"}

["apples","oranges"]

{"planes":{"boeing":"5","airbus":"4"},"trains":{"amtrak":"1","SP":"2"}, "automobiles":{"toyota":"100","nissan":"50"}}

["planes","trains","automobiles"]

For more information, see KEYS Function.

Read from Arrays

You can read values from arrays in your dataset.

Note

After an array has been created, you can append to the array or otherwise combine it with another array. You cannot replace values in the array without breaking apart the array and rebuilding it.

Function

Description

IN Function

Returns true if the first parameter is contained in the array of values in the second parameter.

ARRAYELEMENTAT Function

Computes the 0-based index value for an array element in the specified column, array literal, or function that returns an array.

ARRAYLEN Function

Computes the number of elements in the arrays in the specified column, array literal, or function that returns an array.

ARRAYUNIQUE Function

Generates an array of all unique elements among one or more arrays.

Tip

Additional examples are available in the above links for these functions.

Compute from Arrays

You can use the following functions to perform computations on the values in your arrays:

Function

Description

LISTSUM Function

Computes the sum of all numeric values found in input array. Input can be an array literal, a column of arrays, or a function returning an array. Input values must be of Integer or Decimal type.

LISTMAX Function

Computes the maximum of all numeric values found in input array. Input can be an array literal, a column of arrays, or a function returning an array. Input values must be of Integer or Decimal type.

LISTMIN Function

Computes the minimum of all numeric values found in input array. Input can be an array literal, a column of arrays, or a function returning an array. Input values must be of Integer or Decimal type.

LISTAVERAGE Function

Computes the average of all numeric values found in input array. Input can be an array literal, a column of arrays, or a function returning an array. Input values must be of Integer or Decimal type.

LISTVAR Function

Computes the variance of all numeric values found in input array. Input can be an array literal, a column of arrays, or a function returning an array. Input values must be of Integer or Decimal type.

LISTSTDEV Function

Computes the standard deviation of all numeric values found in input array. Input can be an array literal, a column of arrays, or a function returning an array. Input values must be of Integer or Decimal type.

LISTMODE Function

Computes the most common value of all numeric values found in input array. Input can be an array literal, a column of arrays, or a function returning an array. Input values must be of Integer or Decimal type.

Combine Arrays

You can combine arrays together using a variety of methods of combining.

Source:

array1

array2

["1","2","3"]

["A","B","C"]

["4","5","6"]

["D","E","F"]

["7","8","9"]

["G","H","I"]

The following transformation concatenates the above arrays into a single single array:

Transformation Name

New formula

Parameter: Formula type

Single row formula

Parameter: Formula

ARRAYCONCAT([array1,array2])

Parameter: New column name

arrayConcat

Results:

array1

array2

arrayConcat

["1","2","3"]

["A","B","C"]

["1","2","3","A","B","C"]

["4","5","6"]

["D","E","F"]

["4","5","6","D","E","F"]

["7","8","9"]

["G","H","I"]

["7","8","9","G","H","I"]

These functions can be used to combine arrays together:

Function

Description

ARRAYCONCAT Function

Combines the elements of one array with another, listing all elements of the first array before listing all elements of the second array.

ARRAYCROSS Function

Generates a nested array containing the cross-product of all elements in two or more arrays.

ARRAYINTERSECT Function

Generates an array containing all elements that appear in multiple input arrays, referenced as column names or array literals.

ARRAYSTOMAP Function

Combines one array containing keys and another array containing values into an Object of key-value pairs.

ARRAYZIP Function

Combines multiple arrays into a single nested array, with element 1 of array 1 paired with element 2 of array 2 and so on. Arrays are expressed as column names or as array literals.

Tip

Additional examples are available in the above links for these functions.

Break out Arrays

Expand arrays into rows

You can break out arrays into individual values using the following transformations. Here is some example data from the nest2 column that was generated earlier. The num3 column is retained for reference:

num3

nest2

13

[[11,12],13]

16

[[14,15],16]

19

[[17,18],19]

You can use the following simple transformation to flatten the values in nest2 into individual values in each row:

Note

Depending on the number of elements in your arrays, you can significantly increase the size of your dataset.

Note

If a cell in the source column does not contain an array, an empty value is written into the corresponding row.

Transformation Name

Expand Array to rows

Parameter: column

nest2

Results:

num3

nest2

13

[11,12]

13

13

16

[14,15]

16

16

19

[17,18]

19

19

Note

Converting a column of arrays to rows unpacks the top level of the array only. You may have to apply this transformation multiple times.

Unnest array elements into columns

You can break out individual elements of an array into separate columns.

Note

Each element that you want broken out into a column must be listed on a separate line in Path to elements.

Source:

arrayNested

["A",["B","C"],"D"]

["H",["I","J",["K","L"]]]

["E","F","G"]

The following transform retrieves the second and third elements of each array:

Transformation Name

Unnest Objects into columns

Parameter: Column

arrayNested

Parameter: Paths to elements1

[1]

Parameter: Paths to elements2

[2]

Parameter: Include original column name

true

This one retrieves the first element of the array that is nested as the second element of the array:

Transformation Name

Unnest Objects into columns

Parameter: Column

arrayNested

Parameter: Paths to elements1

[1][0]

Parameter: Include original column name

true

The resulting data should look like the following:

arrayNested

arrayNested_1

arrayNested_2

["A",["B","C"],"D"]

["B","C"]

B

["H",["I","J",["K","L"]]]

["I","J",["K","L"]]

I

["E","F","G"]

F