Skip to main content

APPROXIMATEMEDIAN Function

Computes the approximate median from all row values in a column or group. Input column can be of Integer or Decimal.

  • If a row contains a missing or null value, it is not factored into the calculation. If the entire column contains no values, the function returns a null value.

  • When used in apivottransform, the function is computed for each instance of the value specified in thegroupparameter. See Pivot Transform.

  • The approximate percentile functions utilize a different algorithm for efficiently estimating quantiles for streaming and distributed processing, depending on the running environment where the function is computed.

    Tip

    Approximation functions are suitable for larger datasets. As the number of rows increases, accuracy and calculation performance improves for these functions.

Wrangle vs. SQL: This function is part of Wrangle, a proprietary data transformation language. Wrangle is not SQL. For more information, see Wrangle Language.

Basic Usage

approximatemedian(myRating)

Output: Returns the approximate median of the values in the myRating column.

Syntax and Arguments

approximatemedian(function_col_ref) [group:group_col_ref] [limit:limit_count]

Argument

Required?

Data Type

Description

function_col_ref

Y

string

Name of column to which to apply the function

dec_error_bound

N

decimal

Error factor for computing approximations. Decimal value represents error factor as a percentage (0.4is 0.4%).

For more information on the group and limit parameters, see Pivot Transform.

For more information on syntax standards, see Language Documentation Syntax Notes.

function_col_ref

Name of the column the values of which you want to calculate the median. Column must contain Integer or Decimal values.

  • Literal values are not supported as inputs.

  • Multiple columns and wildcards are not supported.

Usage Notes:

Required?

Data Type

Example Value

Yes

String (column reference)

myValues

dec_error_bound

As needed, you can insert an error boundary factor as a parameter into the computation of this approximate value.

Note

This value is not applicable to jobs executed on the Trifacta Photon running environment.

  • This value must be a Decimal literal value.

  • This decimal value represents the percentage error factor. By default, this value is 0.5 (0.5%).

Usage Notes:

Required?

Data Type

Example Value

No

Decimal (literal)

0.01

Examples

Tip

For additional examples, see Common Tasks.

Example - Percentile functions

This example illustrates you to apply percentile functions.

Functions:

Item

Description

MEDIAN Function

Computes the median from all row values in a column or group. Input column can be of Integer or Decimal.

PERCENTILE Function

Computes a specified percentile across all row values in a column or group. Input column can be of Integer or Decimal.

QUARTILE Function

Computes a specified quartile across all row values in a column or group. Input column can be of Integer or Decimal.

APPROXIMATEMEDIAN Function

Computes the approximate median from all row values in a column or group. Input column can be of Integer or Decimal.

APPROXIMATEPERCENTILE Function

Computes an approximation for a specified percentile across all row values in a column or group. Input column can be of Integer or Decimal.

APPROXIMATEQUARTILE Function

Computes an approximation for a specified quartile across all row values in a column or group. Input column can be of Integer or Decimal.

Source:

The following table lists each student's height in inches:

Student

Height

1

64

2

65

3

63

4

64

5

62

6

66

7

66

8

65

9

69

10

66

11

73

12

69

13

69

14

61

15

64

16

61

17

71

18

67

19

73

20

66

Transformation:

Use the following transformations to calculate the median height in inches, a specified percentile and the first quartile.

  • The first function uses a precise algorithm which can be slow to execute across large datasets.

  • The second function uses an appropriate approximation algorithm, which is much faster to execute across large datasets.

    • These approximate functions can use an error boundary parameter, which is set to 0.4 (0.4%) across all functions.

Median: This transformation calculates the median value, which corresponds to the 50th percentile.

Transformation Name

New formula

Parameter: Formula type

Single row formula

Parameter: Formula

median(heightIn)

Parameter: New column name

'medianIn'

Transformation Name

New formula

Parameter: Formula type

Single row formula

Parameter: Formula

approximatemedian(heightIn, 0.4)

Parameter: New column name

'approxMedianIn'

Percentile: This transformation calculates the 68th percentile.

Transformation Name

New formula

Parameter: Formula type

Single row formula

Parameter: Formula

percentile(heightIn, 68, linear)

Parameter: New column name

'percentile68In'

Transformation Name

New formula

Parameter: Formula type

Single row formula

Parameter: Formula

approximatepercentile(heightIn, 68, 0.4)

Parameter: New column name

'approxPercentile68In'

Quartile: This transformation calculates the first quartile, which corresponds to the 25th percentile.

Transformation Name

New formula

Parameter: Formula type

Single row formula

Parameter: Formula

quartile(heightIn, 1, linear)

Parameter: New column name

'percentile25In'

Transformation Name

New formula

Parameter: Formula type

Single row formula

Parameter: Formula

approximatequartile(heightIn, 1, 0.4)

Parameter: New column name

'approxPercentile25In'

Results:

studentId

heightIn

approxPercentile25In

percentile25In

approxPercentile68In

percentile68In

approxMedianIn

medianIn

1

64

64

64

67.1

66.92

66

66

2

65

64

64

67.1

66.92

66

66

3

63

64

64

67.1

66.92

66

66

4

64

64

64

67.1

66.92

66

66

5

62

64

64

67.1

66.92

66

66

6

66

64

64

67.1

66.92

66

66

7

66

64

64

67.1

66.92

66

66

8

65

64

64

67.1

66.92

66

66

9

69

64

64

67.1

66.92

66

66

10

66

64

64

67.1

66.92

66

66

11

73

64

64

67.1

66.92

66

66

12

69

64

64

67.1

66.92

66

66

13

69

64

64

67.1

66.92

66

66

14

61

64

64

67.1

66.92

66

66

15

64

64

64

67.1

66.92

66

66

16

61

64

64

67.1

66.92

66

66

17

71

64

64

67.1

66.92

66

66

18

67

64

64

67.1

66.92

66

66

19

73

64

64

67.1

66.92

66

66

20

66

64

64

67.1

66.92

66

66