The Scatterplot tool makes enhanced scatterplots, with options to include boxplots in the margins, a linear regression line, a smooth curve via non-parametric regression, a smoothed conditional spread, and outlier identification. Compared with a traditional scatterplot, the smooth curve can help a user more readily see the nature of the relationship between two variables, particularly when there are many observations or a high level of dispersion in the data.
This tool uses the R tool. Go to Options > Download Predictive Tools and sign in to the Alteryx Downloads and Licenses portal to install R and the packages used by the R Tool. See Download and Use Predictive Tools.
Configure the tool
Use the Configuration tab to set the mandatory controls for the scatterplot.
- X (horizontal) field: The field to use on the plot's horizontal axis. The choice is limited to numerical fields.
- Y (vertical) field: The field to use on the plot's vertical axis. Either a numerical field or a binary categorical field can be used. If a binary categorical field is selected, a new field is created with numeric values of either zero or one; its name is the original field name with the suffix ".num" appended. If a categorical field with more than two values is selected, the tool returns an error.
- The Y field is a binary categorical variable: A checkbox to indicate that the Y field is a binary categorical variable. When checked, the user is asked to indicate the field value that will correspond to a value of one (the "target"); entries with the other field value take the value zero.
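The zero/one recoding described above can be illustrated with a short Python sketch. This is not the tool's own code (the tool runs in R); the function names `encode_binary` and `add_num_field` and the sample "Churn" field are hypothetical and chosen only for illustration.

```python
def encode_binary(values, target):
    """Map a two-level categorical column to 0/1, with `target` mapped to 1."""
    levels = set(values)
    if len(levels) > 2:
        # Mirrors the documented behavior: more than two values is an error.
        raise ValueError("field has more than two distinct values")
    if target not in levels:
        raise ValueError("target value does not occur in the field")
    return [1 if v == target else 0 for v in values]


def add_num_field(records, field, target):
    """Return copies of the records with a new '<field>.num' column added."""
    encoded = encode_binary([r[field] for r in records], target)
    return [{**r, field + ".num": e} for r, e in zip(records, encoded)]


# Hypothetical sample data: "Yes" is chosen as the target value.
records = [{"Churn": "Yes"}, {"Churn": "No"}, {"Churn": "Yes"}]
result = add_num_field(records, "Churn", "Yes")
```

Here each record gains a "Churn.num" field holding 1 where the original value is the target and 0 otherwise, while the original field is left unchanged.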
Use the Plot elements tab to set the rules for how data is plotted.
- Least-squares (regression) line: Displays a simple linear regression line between the X and Y fields. Included by default.
- Smooth line: Displays a non-linear line between the X and Y fields that is created using a loess (non-parametric local regression) model. Included by default.
- Span for smooth: A parameter that controls the size of the local area used to construct the loess estimates. The smaller the number, the smaller the area used.
- Show spread: Displays two curves, produced by fitting loess models to the root-mean-square positive and negative residuals from the original loess line, to show conditional spread and asymmetry in the errors. Included by default.
- Marginal boxplots: Includes univariate boxplots of the X and Y fields along their respective axes. This is useful in assessing the distribution of values for both fields. Included by default.
- Jitter X: If selected, the X values are randomly perturbed by a small amount. This is useful if a large number of records in the X field take on one or a small number of values. It only influences the appearance of the points on the graph, not the fitted regression and loess lines.
- Jitter Y: If selected, the Y values are randomly perturbed by a small amount. This is useful if a large number of records in the Y field take on one or a small number of values. It only influences the appearance of the points on the graph, not the fitted regression and loess lines.
- Log X axis: If selected, a natural log transformation is applied to the X values. Doing this is often useful for exploring certain types of non-linear relationships.
- Log Y axis: If selected, a natural log transformation is applied to the Y values. Doing this is often useful for exploring certain types of non-linear relationships.
- Plot by groups: This option allows for an examination of the effect of a categorical field on the relationship between the X and Y fields, with each value of the categorical field resulting in a group of X and Y values. Groups are plotted with different colors and plotting characters. If this option is selected, the user is asked to give the categorical field to be used in creating groups, (optionally) whether regression and loess curves should be plotted for each group, and the location of the legend that identifies the different groups.
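As a rough illustration of how jitter relates to the fitted line, the Python sketch below fits a simple least-squares line to the original values and jitters a separate copy used only for display. The tool itself runs in R; the helper names `least_squares` and `jitter`, the uniform noise, the jitter amount of 0.1, and the sample data are all assumptions made for this example and are not taken from the tool.

```python
import random


def least_squares(xs, ys):
    """Return (slope, intercept) of the simple linear regression of ys on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def jitter(values, amount, rng):
    """Return a perturbed copy; each value moves by at most `amount`."""
    return [v + rng.uniform(-amount, amount) for v in values]


# Hypothetical data: many records share the same few X values.
xs = [1, 1, 1, 2, 2, 3, 3, 3]
ys = [2.0, 2.5, 1.5, 4.0, 3.5, 6.5, 5.5, 6.0]

rng = random.Random(0)              # fixed seed so the sketch is repeatable
display_xs = jitter(xs, 0.1, rng)   # used only to draw the points

# The line is fitted to the original values, so jitter does not change it.
slope, intercept = least_squares(xs, ys)
```

Keeping the jittered values in a separate list is the design point: overlapping points spread apart on the plot, while the regression is computed from the unperturbed data.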
Use the Style options tab to set the graph controls, such as labels and scale.
- X axis label (optional): An optional label for the X (horizontal) axis. By default, the name of the X field is used.
- Y axis label (optional): An optional label for the Y (vertical) axis. By default, the name of the Y field is used.
- Point size scale: Controls the size of the points within the display, with larger values resulting in a larger point size.
- Axis text size scale: Controls the size of the numbers and tick marks along each axis, with larger values resulting in larger text.
- Axis labels text size scale: Controls the size of the axis label along each axis, with larger values resulting in larger text.
- Main title text size scale: Controls the size of the main title text, with larger values resulting in larger text.
Use the Graphics Options tab to set the controls for the graphical output.
- Plot size: Select inches or centimeters for the size of the graph.
- Graph resolution: Select the resolution of the graph in dots per inch: 1x (96 dpi); 2x (192 dpi); or 3x (288 dpi). Lower resolution creates a smaller file and is best for viewing on a monitor. Higher resolution creates a larger file with better print quality.
- Base font size (points): Select the size of the font in the graph.
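The trade-off between resolution and file size follows from the pixel dimensions of the image, which are the plot size in inches multiplied by the dots per inch. The Python sketch below works through that arithmetic; the function name `pixel_dimensions` and the 5.5 by 5.5 plot size are hypothetical, and only the three dpi settings come from the option list above.

```python
CM_PER_INCH = 2.54

# Resolution settings listed for the Graph resolution option.
DPI_BY_SCALE = {"1x": 96, "2x": 192, "3x": 288}


def pixel_dimensions(width, height, units, scale):
    """Return (width_px, height_px) for a plot of the given physical size."""
    if units == "centimeters":
        width, height = width / CM_PER_INCH, height / CM_PER_INCH
    elif units != "inches":
        raise ValueError("units must be 'inches' or 'centimeters'")
    dpi = DPI_BY_SCALE[scale]
    return round(width * dpi), round(height * dpi)


# Hypothetical plot size of 5.5 x 5.5 inches at each resolution setting.
sizes = {scale: pixel_dimensions(5.5, 5.5, "inches", scale) for scale in DPI_BY_SCALE}
```

Going from 1x to 3x triples each dimension, so the image holds nine times as many pixels, which is why higher settings produce larger files.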
View the output
The tool outputs an Alteryx R-Graph object that can be used to assist in the creation of custom reports.