The Scatterplot tool makes enhanced scatterplots, with options to include boxplots
in the margins, a linear regression line, a smooth curve via non-parametric
regression, a smoothed conditional spread, and outlier identification.
Compared with a traditional scatterplot, the smooth curve can help a user
more readily see the nature of the relationship between two variables,
particularly in cases where there are many observations
or a high level of dispersion in the data.
This tool uses the R programming language. Go to Options > Download Predictive Tools to install R and the packages used by the R Tool.
Input
An Alteryx data stream.
Configuration Properties
Configuration
X (horizontal) field: The field to use on the plot's horizontal axis. The choice is limited to numerical fields.
Y (vertical) field: The field to use on the plot's vertical axis. Either a numerical field or a binary categorical field can be used. If a binary categorical field is selected, a new field (with the suffix ".num" appended to the original field name) is created that has numeric values of either zero or one. If a categorical field with more than two values is selected, the tool returns an error.
The Y field is a binary categorical variable: A check-box to indicate that the Y field is a binary categorical variable. When checked, the user is asked to indicate the field value that will correspond to a value of one (the "target"), with entries with the other field value taking the value zero.
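The recoding of a binary categorical Y field can be illustrated with a minimal Python sketch. This is not the tool's own R code; the function name `recode_binary`, the field name "Responded", and its values are hypothetical and chosen only for illustration.

```python
def recode_binary(values, target):
    """Map a two-level categorical field to 0/1, with `target` mapped to 1."""
    levels = set(values)
    if len(levels) > 2:
        # Mirrors the documented behavior: more than two values is an error.
        raise ValueError(f"expected at most two distinct values, got {len(levels)}")
    if target not in levels:
        raise ValueError(f"target {target!r} does not occur in the field")
    return [1 if v == target else 0 for v in values]


# Hypothetical record set with a binary field named "Responded".
records = {"Responded": ["Yes", "No", "No", "Yes"]}
# The new numeric field takes the original name plus the ".num" suffix.
records["Responded.num"] = recode_binary(records["Responded"], target="Yes")
print(records["Responded.num"])  # [1, 0, 0, 1]
```

Here "Yes" plays the role of the target value, so it becomes one and the other value becomes zero.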
Plot elements
Least-squares (regression) line: Displays a simple linear regression line between the X and Y fields. Included by default.
Smooth line: Displays a non-linear line between the X and Y fields that is created using a loess (non-parametric local regression) model. Included by default.
Span for smooth: A parameter that controls the size of the local area used to construct the loess estimates. The smaller the number, the smaller the area used.
Show spread: Displays two curves, obtained by fitting loess models to the root-mean-square positive and negative residuals from the original loess line, to show conditional spread and asymmetry in the errors. Included by default.
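To give a feel for how the span affects the smooth line, the following Python sketch fits a simplified local regression. It is an assumption-laden illustration rather than the tool's R implementation: it uses a local linear fit with tricube weights, treats the span as the fraction of points in each neighborhood, and the function name `local_linear_smooth` is hypothetical.

```python
import math


def local_linear_smooth(xs, ys, span=0.5):
    """Smooth ys at every x using tricube-weighted local linear fits.

    span is taken here as the fraction of points in each local neighborhood
    (an assumption of this sketch).
    """
    n = len(xs)
    k = max(2, min(n, math.ceil(span * n)))  # neighborhood size
    fitted = []
    for x0 in xs:
        # The distance to the k-th nearest point sets the local bandwidth.
        dists = sorted(abs(x - x0) for x in xs)
        h = dists[k - 1] or 1e-12
        # Tricube weights: nearby points count more, points beyond h get zero.
        w = [(1 - min(abs(x - x0) / h, 1.0) ** 3) ** 3 for x in xs]
        sw = sum(w)
        mx = sum(wi * x for wi, x in zip(w, xs)) / sw
        my = sum(wi * y for wi, y in zip(w, ys)) / sw
        sxx = sum(wi * (x - mx) ** 2 for wi, x in zip(w, xs))
        sxy = sum(wi * (x - mx) * (y - my) for wi, x, y in zip(w, xs, ys))
        slope = sxy / sxx if sxx > 0 else 0.0
        fitted.append(my + slope * (x0 - mx))
    return fitted


# Illustrative data: a sine wave sampled at 20 points.
xs = [i * 0.5 for i in range(20)]
ys = [math.sin(x) for x in xs]
tight = local_linear_smooth(xs, ys, span=0.2)  # small local area, follows the wave
loose = local_linear_smooth(xs, ys, span=1.0)  # large local area, much flatter
```

With the smaller span, each fit uses only a few neighboring points, so the curve tracks local structure closely. With the larger span, each fit draws on nearly all of the data and the curve approaches a straight line.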
Marginal boxplots: Includes univariate boxplots of the X and Y fields along their respective axes. This is useful in assessing the distribution of values for both fields. Included by default.
Jitter X: If selected, the X values are randomly perturbed by a small amount. This is useful if a large number of records in the X field take on one or a small number of values. It only influences the appearance of the points on the graph, not the fitted regression and loess lines.
Jitter Y: If selected, the Y values are randomly perturbed by a small amount. This is useful if a large number of records in the Y field take on one or a small number of values. It only influences the appearance of the points on the graph, not the fitted regression and loess lines.
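The idea behind jittering can be sketched in Python as follows. The function name `jitter`, the uniform noise, and the `amount` parameter are assumptions made for illustration; the tool itself does not document how the perturbation is drawn or sized.

```python
import random


def jitter(values, amount=0.1, seed=None):
    """Return a copy of values with uniform noise in [-amount, amount] added."""
    rng = random.Random(seed)
    return [v + rng.uniform(-amount, amount) for v in values]


# Many records share the same few X values, so points would overplot.
x = [1, 1, 1, 2, 2, 3, 3, 3, 3]
x_plotted = jitter(x, amount=0.1, seed=42)  # used only for drawing points

# The original x is left untouched and would still be used for any fitted lines.
```

Keeping the jittered copy separate from the original values reflects the behavior described above: only the plotted points move, while the regression and loess fits use the unperturbed data.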
Log X axis: If selected, a natural log transformation is applied to the X values. Doing this is often useful for exploring certain types of non-linear relationships.
Log Y axis: If selected, a natural log transformation is applied to the Y values. Doing this is often useful for exploring certain types of non-linear relationships.
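As one illustration of why log axes help, a power-law relationship becomes a straight line once both fields are log-transformed, so a least-squares line describes it well. The Python sketch below uses made-up data following y = 3x^2; the helper `least_squares_line` is hypothetical. Note that a natural log is only defined for positive values.

```python
import math


def least_squares_line(xs, ys):
    """Return (intercept, slope) of the ordinary least-squares line y = a + b*x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope


# Made-up data following the power law y = 3 * x^2.
xs = [1.0, 2.0, 4.0, 8.0]
ys = [3.0 * x ** 2 for x in xs]

# Natural log transformation of both fields (values must be positive).
log_x = [math.log(x) for x in xs]
log_y = [math.log(y) for y in ys]

intercept, slope = least_squares_line(log_x, log_y)
print(round(slope, 6), round(math.exp(intercept), 6))  # 2.0 3.0
```

On the log scale, the slope recovers the exponent and the exponentiated intercept recovers the multiplier.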
Plot by groups: This option allows for an examination of the effect of a categorical field on the relationship between the X and Y fields, with each value of the categorical field resulting in a group of X and Y values. Groups are plotted with different colors and plotting characters. If this option is selected, the user is asked to give the categorical field to be used in creating groups, whether (optionally) regression and loess curves should be plotted for each group, and the location of the legend that identifies the different groups.
Style options
X axis label (optional): An optional label for the X (horizontal) axis. By default, the name of the X field is used.
Y axis label (optional): An optional label for the Y (vertical) axis. By default, the name of the Y field is used.
Point size scale: Controls the size of the points within the display, with larger values resulting in a larger point size.
Axis text size scale: Controls the size of the numbers and tick marks along each axis, with larger values resulting in larger text.
Axis labels text size scale: Controls the size of the axis label along each axis, with larger values resulting in larger text.
Main title text size scale: Controls the size of the main title text, with larger values resulting in larger text.
Graphics Options
Plot size: Specify the width and height dimensions of the resulting plot, using either inches or centimeters.
Graph resolution: Select the resolution of the graph in dots per inch: 1x (96 dpi); 2x (192 dpi); or 3x (288 dpi). Lower resolution creates a smaller file and is best for viewing on a monitor. Higher resolution creates a larger file with better print quality.
Base font size (points): The point size of the base font used to produce the title and labels of the plot(s) to be produced. The plotting functions will expand the size of the plot title to be larger than the base font automatically.
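The plot size and graph resolution together determine the pixel dimensions of the image, which is why higher resolutions produce larger files. A minimal Python sketch of that arithmetic follows; the function name `pixel_dimensions` and the 5.5 inch plot size are assumptions for illustration, while the dpi values come from the resolution options listed above.

```python
CM_PER_INCH = 2.54
DPI_BY_SCALE = {"1x": 96, "2x": 192, "3x": 288}


def pixel_dimensions(width, height, units="in", scale="1x"):
    """Convert a plot size in inches or centimeters to pixel dimensions."""
    dpi = DPI_BY_SCALE[scale]
    if units == "cm":
        width, height = width / CM_PER_INCH, height / CM_PER_INCH
    elif units != "in":
        raise ValueError("units must be 'in' or 'cm'")
    return round(width * dpi), round(height * dpi)


# A hypothetical 5.5 x 5.5 inch plot at 2x (192 dpi).
print(pixel_dimensions(5.5, 5.5, scale="2x"))  # (1056, 1056)
```

Tripling the resolution from 1x to 3x triples each pixel dimension, so the image holds nine times as many pixels.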
Output
An Alteryx R-Graph object that can be used to assist in the creation of custom reports.