Basic Data Profile Tool

Version: Current
Last modified: October 21, 2019

The Basic Data Profile tool analyzes data and provides metadata for each column (field) of data.

Use the Basic Data Profile tool to see an overview, or profile, of data and output the information for further analysis. To see a visual representation of the data profile, in addition to metadata, use a Browse tool. See Browse Tool.

Tool configuration

Set any of the following optional configuration options:

  • Limit for Exact Count: The default limit is recommended for best performance. Increase the limit to see profile information for more data. Type or click to select the maximum number of unique values that you want Alteryx to identify in the data.
  • Size Limit to Return All Unique Values (Characters): The default limit is recommended for best performance. Increase the limit to see profile information for more data. Type or click to select the maximum number of characters you want Alteryx to check in a value to determine if the value is unique.
  • Use Metric Units: Select to use metric units of measure. This option only applies to spatial data.

View the output

The data profile information in the Results window varies with the type of data coming from the connected tool. See Data Types for a list of data types.

Results are listed vertically. Scroll to see the metadata for each column in the data.

String data

If a column contains string values, the following metadata is provided:

  • Name: The column name.
  • Data Type: The data type of the selected column.
  • Size: The amount of memory reserved for each record in this column.
  • Source: The origin of the column. This could be the name of the data source or the path to the location where the data is saved.
  • Description: The description of the column, if available. If no description is available, it is [Null].
  • Nulls: The number of values in the column that are null, excluding empty values.
  • Non-Nulls: The number of non-null entries in the column, including empty values.
  • Blanks: The number of empty values.
  • Values with Leading Whitespace: The number of string values with whitespace before the value. Use the Data Cleansing tool or the Formula tool trim function to resolve the problem. See Data Cleansing Tool and Formula Tool.
  • Values with Trailing Whitespace: The number of string values with whitespace after the value.
  • Values with Both Whitespace: The number of string values with whitespace before and after the value.
  • Average Length: The average length of values in the column.
  • Longest Length: The number of characters in the longest value in the column.
  • Longest Value: The longest value in the column.
  • Shortest (Non-Blank) Length: The number of characters in the shortest non-blank value in the column.
  • Shortest Value: The shortest value in the column.
  • Minimum: The first string entry in a column that is sorted alphabetically.
  • Maximum: The last string entry in a column that is sorted alphabetically.
  • Uniques: The number of unique values in the field. Use the Unique tool to see a full count of unique and duplicate entries. See Unique Tool.
  • Unique Values: All unique values in the column.
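
To make the difference between nulls, blanks, and whitespace counts concrete, the following Python sketch computes a few of the fields above for a single column. It is an illustration only, not how Alteryx computes the profile: the function name profile_strings is hypothetical, nulls are modeled as None, blanks as empty strings, and a value with whitespace on both sides is assumed to count toward the leading, trailing, and both totals.

```python
from typing import Optional, Sequence


def profile_strings(values: Sequence[Optional[str]]) -> dict:
    """Approximate a few string-profile fields for one column (illustrative only)."""
    # Nulls are modeled as None; blanks are modeled as empty strings.
    non_null = [v for v in values if v is not None]
    non_blank = [v for v in non_null if v != ""]

    # Assumption: a value padded on both sides counts in all three whitespace totals.
    leading = [v for v in non_blank if v[0].isspace()]
    trailing = [v for v in non_blank if v[-1].isspace()]
    both = [v for v in non_blank if v[0].isspace() and v[-1].isspace()]

    return {
        "Nulls": len(values) - len(non_null),
        "Non-Nulls": len(non_null),  # includes empty values
        "Blanks": len(non_null) - len(non_blank),
        "Values with Leading Whitespace": len(leading),
        "Values with Trailing Whitespace": len(trailing),
        "Values with Both Whitespace": len(both),
        "Longest Length": max((len(v) for v in non_blank), default=0),
        "Shortest (Non-Blank) Length": min((len(v) for v in non_blank), default=0),
        "Uniques": len(set(non_null)),
    }


sample = ["apple", " pear", "fig ", " kiwi ", "", None, "apple"]
profile = profile_strings(sample)
```

In this sample, the empty string is counted under Non-Nulls and Blanks but not under Nulls, which matches the definitions in the list above.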

Numeric data

If a column contains numeric values, the following metadata is provided:

  • Name: The column name.
  • Data Type: The data type of the selected column.
  • Size: The amount of memory reserved for each record in this column.
  • Source: The origin of the column. This could be the name of the data source or the path to the location where the data is saved.
  • Description: The description of the column, if available. If no description is available, it is [Null].
  • Nulls: The number of values in the column that are null, excluding empty values.
  • Non-Nulls: The number of non-null entries in the column, including empty values.
  • Minimum: The smallest value in the column.
  • Maximum: The largest value in the column.
  • Average: The average of the values in the column.
  • Standard Deviation: A measure of how dispersed the values in the column are around the average.
  • Variance: A measure of how far the values in the column are spread from the mean. It is the square of the standard deviation.
  • Uniques: The number of unique values in the field. Use the Unique tool to see a full count of unique and duplicate entries. See Unique Tool.
  • Unique Values: All unique values in the column.
  • 25th Percentile: The median value in the lower, or first, half of the data.
  • 50th Percentile: The median value of the data.
  • 75th Percentile: The median value in the upper, or second, half of the data.
  • Histogram: The count of values in the column that fall into evenly sized groups. Each group is indicated by a starting value and a count of values in the group, separated by a colon. A group contains values up to but not including the starting value of the next group. For example, 1:23, 2:15, 3:0 indicates three groups starting at 1, 2, and 3 that contain 23, 15, and 0 values, respectively.
  • Margin of Error: The possible range of values under and over the calculated value.
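
If you pass the profile output to further analysis, the Histogram string and the percentile definitions above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not Alteryx's implementation: the function names parse_histogram and quartiles are hypothetical, group starting values are assumed to be numeric, and for a column with an odd number of values the middle value is assumed to be excluded from both halves.

```python
from statistics import median
from typing import List, Sequence, Tuple


def parse_histogram(text: str) -> List[Tuple[float, int]]:
    """Split 'start:count' pairs separated by commas into (start, count) tuples."""
    groups = []
    for pair in text.split(","):
        start, count = pair.strip().split(":")
        groups.append((float(start), int(count)))
    return groups


def quartiles(values: Sequence[float]) -> Tuple[float, float, float]:
    """Return the medians of the lower half, the whole data, and the upper half."""
    if len(values) < 2:
        raise ValueError("need at least two values to split the data into halves")
    ordered = sorted(values)
    half = len(ordered) // 2
    lower = ordered[:half]
    # Assumption: with an odd count, the middle value belongs to neither half.
    upper = ordered[len(ordered) - half:]
    return median(lower), median(ordered), median(upper)


groups = parse_histogram("1:23, 2:15,3:0")
q25, q50, q75 = quartiles([1, 2, 3, 4, 5, 6, 7, 8])
```

Here groups holds one (start, count) tuple per histogram group, so the counts can be summed or plotted directly. Other percentile conventions exist and can give slightly different 25th and 75th percentile values on small data sets.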

Date/Time data

If a column contains date/time data, the following metadata is provided:

  • Name: The column name.
  • Data Type: The data type of the selected column.
  • Size: The amount of memory reserved for each record in this column.
  • Source: The origin of the column. This could be the name of the data source or the path to the location where the data is saved.
  • Description: The description of the column, if available. If no description is available, it is [Null].
  • Nulls: The number of values in the column that are null, excluding empty values.
  • Non-Nulls: The number of non-null entries in the column, including empty values.
  • Date Histogram: (Only for date data) The count of values in the column that fall into evenly sized groups. Each group is indicated by a starting value and a count of values in the group, separated by a colon. A group contains values up to but not including the starting value of the next group. For example, 1:23, 2:15, 3:0 indicates three groups starting at 1, 2, and 3 that contain 23, 15, and 0 values, respectively.
  • Minimum: The smallest value in the column.
  • Maximum: The largest value in the column.
  • Uniques: The number of unique values in the field. Use the Unique tool to see a full count of unique and duplicate entries. See Unique Tool.
  • Unique Values: All unique values in the column.

Spatial object data

If a column contains spatial objects, the following metadata is provided:

  • Name: The column name.
  • Data Type: The data type of the selected column.
  • Size: The amount of memory reserved for each record in this column.
  • Source: The origin of the column. This could be the name of the data source or the path to the location where the data is saved.
  • Description: The description of the column, if available. If no description is available, it is [Null].
  • Nulls: The number of values in the column that are null, excluding empty values.
  • Non-Nulls: The number of non-null entries in the column, including empty values.
  • Average Size (Bytes): The average size in memory that this object takes up.
  • Largest Size (Bytes): The size in memory of the largest object in the column.
  • Count Point: The number of spatial objects in the column that are points.
  • Count Line: The number of spatial objects in the column that are lines.
  • Count PolyPolyline: The number of spatial objects in the column that are polylines.
  • Count Rectangle: The number of spatial objects in the column that are rectangles.
  • Count Polygon: The number of spatial objects in the column that are polygons.
  • Count MultiPoint: The number of spatial objects in the column that are multi-points.
  • Average Number of Parts: The average number of parts in the spatial objects within the column.
  • Largest Number of Parts: The largest number of parts in the spatial objects within the column.
  • Average Number of Points: The average number of points in the spatial objects within the column.
  • Largest Number of Points: The largest number of points in the spatial objects within the column.
  • Longest Length: The longest length in the spatial objects within the column.
  • Largest Area: The largest area in square miles or square kilometers.