Data Investigation

The Data Investigation tool category includes tools for understanding the data to be used in a predictive analytics project, and tools for conducting specialized data sampling tasks for predictive analytics.

Association Analysis Tool: The Association Analysis tool determines which fields in a database have a bivariate association with one another.

Basic Data Profile Tool: The Basic Data Profile tool outputs basic metadata such as data type, min, max, average, number of missing values, etc.

Contingency Table Tool: The Contingency Table tool creates a contingency table based on selected fields, to list all combinations of the field values with frequency and percent columns.
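
As a rough illustration of what a contingency table contains, the sketch below counts combinations of two fields and reports a frequency and a percent for each. It is not the tool's implementation: the field values, the `records` data, and the `contingency_rows` helper are hypothetical, and it lists only combinations that actually occur in the data.

```python
from collections import Counter

# Hypothetical records: (region, plan) pairs standing in for two selected fields.
records = [
    ("East", "Basic"),
    ("East", "Premium"),
    ("West", "Basic"),
    ("East", "Basic"),
    ("West", "Basic"),
]


def contingency_rows(pairs):
    """Return one row per observed combination: (value_a, value_b, frequency, percent)."""
    counts = Counter(pairs)
    total = sum(counts.values())
    return [
        (a, b, n, 100.0 * n / total)
        for (a, b), n in sorted(counts.items())
    ]


rows = contingency_rows(records)
for a, b, n, pct in rows:
    # One line per combination, with frequency and percent columns.
    print(f"{a:<6}{b:<9}{n:>3}{pct:>8.1f}%")
```

The percent column is each combination's share of all records, so the percents sum to 100.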

Distribution Analysis Tool: The Distribution Analysis tool fits one or more distributions to the input data and compares them based on a number of goodness-of-fit statistics.

Field Summary Tool: The Field Summary tool analyzes data and creates a summary report containing descriptive statistics for the selected columns. Use the Field Summary tool to gain insight into data and receive recommendations for managing data.

Frequency Table Tool: The Frequency Table tool produces a frequency analysis for selected fields. The output includes a summary of the selected fields with frequency counts and percentages for each value in a field.

Heat Plot Tool: The Heat Plot tool uses a heat plot color map to show the joint distribution of two variables that are either continuous numeric variables or ordered categories.

Histogram Tool: The Histogram tool provides a histogram plot for a numeric field by showing the frequencies of records falling in a set of continuous value ranges. It can also provide a smoothed empirical density plot. The histogram displays frequencies when the density plot option is not selected, and probabilities when it is selected.
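
The difference between a frequency scale and a density scale can be sketched as follows. This is a minimal illustration, not the tool's implementation: the `values`, the `bin_edges`, and both helper functions are hypothetical, and the density scaling shown (count divided by the number of records times the bin width) is one common convention that makes the bar areas sum to 1.

```python
def histogram_counts(values, bin_edges):
    """Count values per bin; bins are [lo, hi), except the last, which is [lo, hi]."""
    counts = [0] * (len(bin_edges) - 1)
    for v in values:
        for i in range(len(counts)):
            lo, hi = bin_edges[i], bin_edges[i + 1]
            is_last = i == len(counts) - 1
            if lo <= v < hi or (is_last and v == hi):
                counts[i] += 1
                break
    return counts


def to_density(counts, bin_edges):
    """Rescale counts so that the bar areas (height * width) sum to 1."""
    n = sum(counts)
    return [
        c / (n * (bin_edges[i + 1] - bin_edges[i]))
        for i, c in enumerate(counts)
    ]


values = [1, 2, 2, 3, 5, 7, 8, 9]   # hypothetical numeric field
bin_edges = [0, 5, 10]              # two bins of width 5

counts = histogram_counts(values, bin_edges)
density = to_density(counts, bin_edges)
print(counts)
print(density)
```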

Importance Weights Tool: The Importance Weights tool provides methods for selecting a set of variables to use in a predictive model based on how strongly related each possible predictor is to the target variable.

Pearson Correlation Tool: The Pearson Correlation tool measures the linear dependence between two variables and also reports their covariance.
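
For reference, the two quantities are related: the Pearson correlation is the covariance divided by the product of the two standard deviations. The sketch below computes both with the sample (n - 1) convention; the function names and the data are hypothetical, and this is not the tool's implementation.

```python
import math


def covariance(xs, ys):
    """Sample covariance (divides by n - 1)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)


def pearson(xs, ys):
    """Pearson correlation: covariance scaled by both standard deviations."""
    return covariance(xs, ys) / math.sqrt(covariance(xs, xs) * covariance(ys, ys))


xs = [1, 2, 3, 4, 5]
ys = [2, 4, 6, 8, 10]   # exactly linear in xs

cov_xy = covariance(xs, ys)
r_xy = pearson(xs, ys)
print(cov_xy, r_xy)
```

Unlike the covariance, the correlation does not depend on the units of either variable and always lies between -1 and 1.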

Plot of Means Tool: The Plot of Means tool takes a response field and a categorical field, and plots the mean of the response field for each category (level) of the categorical field. The response field can be numeric or binary categorical; a binary categorical field is converted into a set of zero and one values.
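
The quantity being plotted can be sketched as a grouped mean. In this minimal example (not the tool's implementation), the `segments` and `responded` fields and the `group_means` helper are hypothetical; the binary response is encoded as 0/1 first, so each group mean is the share of "yes" values in that category.

```python
from collections import defaultdict


def group_means(categories, responses):
    """Return {category: mean of the response values in that category}."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for category, response in zip(categories, responses):
        sums[category] += response
        counts[category] += 1
    return {category: sums[category] / counts[category] for category in sums}


segments = ["A", "A", "B", "B", "B"]           # hypothetical categorical field
responded = ["yes", "no", "yes", "yes", "no"]  # hypothetical binary response

# Convert the binary categorical response to zero and one values.
encoded = [1 if r == "yes" else 0 for r in responded]

means = group_means(segments, encoded)
print(means)
```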

Scatterplot Tool: The Scatterplot tool makes enhanced scatterplots, with options to include boxplots in the margins, a linear regression line, a smooth curve via non-parametric regression, a smoothed conditional spread, and outlier identification.

Spearman Correlation Tool: The Spearman Correlation tool assesses how well an arbitrary monotonic function could describe the relationship between two variables, without making any other assumptions about the particular nature of the relationship between the variables.
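
One standard way to compute the Spearman coefficient is to replace each value with its rank (averaging ranks for ties) and then take the Pearson correlation of the ranks. The sketch below follows that definition; the function names and data are hypothetical, and it is not the tool's implementation.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def spearman(xs, ys):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))


x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]   # monotonic but not linear

rho = spearman(x, y)
r = pearson(x, y)
print(rho, r)
```

Because only the ordering matters, a monotonic but non-linear relationship such as the cubic one above yields a Spearman coefficient of 1 while the Pearson coefficient stays below 1.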

Violin Plot Tool: The Violin Plot tool displays the distribution of a single numeric variable, and conveys the density of the distribution.