Sample Tool

Use Sample to limit the data stream to a specified number, percentage, or random set of rows. If you specify columns to group by, the tool applies the selected sampling method to each group separately.

Tool Components

Figure: Sample Tool with anchors.

The Sample tool has 2 anchors.

  • Input anchor: Use the input anchor to select the data you want to sample.

  • Output anchor: Outputs the sampled data.

Configure the Tool

  1. Select a sampling method. Enter the value of N in the text box that follows the list of sampling methods; the value is limited to 16 characters. The options are:

    • First N Rows: Returns every row in the data from the first through row N.

    • Last N Rows: Returns the last N rows of the data, from the row that is N rows before the end through the final row.

    • Skip 1st N Rows: Returns all rows in the data starting after row N.

    • 1 of Every N Rows: Returns the first row of every group of N rows.

    • First N% of rows: Returns N percent of rows. This option requires the data to pass through the tool twice: once to calculate the count of rows and again to return the specified percent of rows.

    • 1 in N Chance to Include Each Row: Randomly determines if each row is included in the sample, independent of the inclusion of any other rows.

      Note

      The option 1 in N Chance to Include Each Row returns an approximate sample size. For example, if you have 1,000 rows, select this option, and specify N as 10, you might expect the tool to return exactly 100 rows. However, because each row is included or excluded independently, the count varies from run to run and could be anywhere between 75 and 150 rows.

  2. Enter a number in N= to specify the value for N.

  3. Columns to Group By (Optional): If groups are specified, N rows are returned for each group. This option is not available for the 1 in N Chance to Include Each Row sampling method.

    Note

    If you group by a column named City, specify N as 2, and select First N Rows, the tool returns the first 2 rows for each City.
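
To make the sampling methods above concrete, here is a minimal Python sketch that mimics each option on an in-memory list of rows. It is an illustration only, not the tool's actual implementation. All function names are hypothetical, the rounding used for First N% (round down) is an assumption, and so is the ordering of the grouped output (grouped together, in order of each group's first appearance).

```python
import random
from collections import defaultdict

# Hypothetical helpers; rows is a plain Python list and n is assumed to be >= 1.

def first_n(rows, n):
    """First N Rows: rows 1 through N."""
    return rows[:n]

def last_n(rows, n):
    """Last N Rows: the final N rows."""
    return rows[-n:] if n > 0 else []

def skip_first_n(rows, n):
    """Skip 1st N Rows: everything after row N."""
    return rows[n:]

def one_of_every_n(rows, n):
    """1 of Every N Rows: the first row of each block of N rows."""
    return rows[::n]

def first_n_percent(rows, n):
    """First N% of rows: count the rows first, then return that share.
    Rounding down is an assumption made for this sketch."""
    total = len(rows)              # first pass: count the rows
    return rows[: total * n // 100]  # second pass: return the first N percent

def one_in_n_chance(rows, n, rng=None):
    """1 in N Chance: each row is kept independently with probability 1/N."""
    rng = rng or random.Random()
    return [row for row in rows if rng.random() < 1 / n]

def sample_by_group(rows, column, method, n):
    """Apply a sampling method separately to each group of rows that
    share the same value in the given column (rows are dicts here)."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[column]].append(row)
    # Output is ordered by first appearance of each group (an assumption).
    return [row for group in groups.values() for row in method(group, n)]

if __name__ == "__main__":
    data = [
        {"City": "Denver", "Sale": 1},
        {"City": "Boston", "Sale": 2},
        {"City": "Denver", "Sale": 3},
        {"City": "Denver", "Sale": 4},
        {"City": "Boston", "Sale": 5},
    ]
    # First 2 rows for each City, as in the grouping note above.
    for row in sample_by_group(data, "City", first_n, 2):
        print(row)
```

Under the independence described for 1 in N Chance to Include Each Row, the number of rows kept follows a binomial distribution with a mean of (row count)/N, which is why the result is an approximation rather than an exact count.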