# Multidimensional Scaling Tool

Multidimensional Scaling (abbreviated MDS) is a method of separating items based on the dissimilarities between them. Conceptually, MDS takes the dissimilarities, or distances, between items described in the data and generates a map of the items in which the map distances approximate those dissimilarities. The number of dimensions in this map is often chosen by the analyst before the map is generated. Usually, the highest-variance dimension corresponds to the largest distances described in the data. The map solution is determined only by the distances between items, so the rotation and orientation of the map dimensions are not significant. MDS uses a dimensional analysis similar to Principal Components. For more information, see https://en.wikipedia.org/wiki/Multidimensional_scaling.

Two types of MDS are implemented in this tool: Classical MDS and Isometric MDS. Classical MDS is the simpler and faster approach. It generates a map by reducing the error between the given distances between items and the straight-line (Cartesian) distances between the same items on the map. Isometric MDS is slightly more complex. It takes the map produced by Classical MDS and adjusts it so that the map distances between item pairs fall in the same largest-to-smallest order as in the original data. Isometric MDS is therefore useful when the exact distance units are less important than the ranking of which item pairs are farthest apart or closest together.
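The Classical MDS idea described above can be sketched in a few lines of plain Python. This is a minimal illustration, not the tool's implementation: the function name `classical_mds`, the power-iteration eigen-solver, and the rectangle test data are all assumptions made for the example. It double-centers the squared distances and uses the leading eigenvectors as map coordinates. Isometric MDS is not shown.

```python
import math
import random


def classical_mds(dist, n_dims=2, iters=500, seed=0):
    """Hypothetical helper: classical (Torgerson) MDS on a full, symmetric
    distance matrix given as a list of lists."""
    n = len(dist)
    # Square the distances, then double-center them: B = -1/2 * J * D^2 * J.
    sq = [[dist[i][j] ** 2 for j in range(n)] for i in range(n)]
    row_mean = [sum(row) / n for row in sq]
    total_mean = sum(row_mean) / n
    b = [[-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + total_mean)
          for j in range(n)] for i in range(n)]

    rng = random.Random(seed)  # fixed seed keeps the sketch deterministic
    coords = [[0.0] * n_dims for _ in range(n)]
    eigenvalues = []
    for d in range(n_dims):
        # Power iteration finds the dominant eigenpair of the current matrix.
        v = [rng.random() - 0.5 for _ in range(n)]
        for _ in range(iters):
            w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:
                break
            v = [x / norm for x in w]
        length = math.sqrt(sum(x * x for x in v))
        v = [x / length for x in v]
        lam = sum(v[i] * sum(b[i][j] * v[j] for j in range(n)) for i in range(n))
        eigenvalues.append(lam)
        # Coordinates on this dimension are the eigenvector scaled by sqrt(eigenvalue).
        scale = math.sqrt(max(lam, 0.0))
        for i in range(n):
            coords[i][d] = scale * v[i]
        # Deflate so the next pass finds the next-largest eigenpair.
        b = [[b[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]
    return coords, eigenvalues


# Illustrative data: corners of a 3 x 4 rectangle, described only by distances.
points = [(0, 0), (3, 0), (3, 4), (0, 4)]
dist = [[math.dist(p, q) for q in points] for p in points]
coords, eigenvalues = classical_mds(dist, n_dims=2)
```

The recovered `coords` reproduce the original pairwise distances, but the map may be rotated or mirrored relative to `points`, which is why the orientation of the dimensions carries no meaning.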

An example use of Classical MDS would be taking the straight-line distances between cities across the USA and producing a map of the USA from them. An example use of Isometric MDS would be producing a multidimensional food chart based on how similar or different the nutritional values of food items are, where the ranking of the distances is more important than a specific unit coordinate. These methods are often used in a marketing research context to obtain the number and nature of the perceptual dimensions that customers use to judge the similarity between different items.

This tool is not automatically installed with Alteryx Designer. To use this tool, download it from the Alteryx Analytics Gallery.

## Input

An Alteryx data stream configured in either of the following 2 ways:

1. A 3-column stream in which each row contains the names of an item pair and the dissimilarity between them.
2. An MxM matrix with each column representing an item, each row representing an item, and each intersection representing the dissimilarity value. For more information see https://en.wikipedia.org/wiki/Distance_matrix.
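To show how the two input layouts relate, here is a minimal sketch that turns 3-column pair rows into an MxM matrix. The function name `pairs_to_matrix` and the item names are hypothetical. The sketch assumes dissimilarities are symmetric and that each unordered pair is listed once, which may differ from what the tool expects.

```python
def pairs_to_matrix(pairs):
    """Hypothetical helper: build a symmetric dissimilarity matrix from
    (item_a, item_b, dissimilarity) rows."""
    items = sorted({name for a, b, _ in pairs for name in (a, b)})
    index = {name: k for k, name in enumerate(items)}
    n = len(items)
    matrix = [[None] * n for _ in range(n)]
    for k in range(n):
        matrix[k][k] = 0.0  # an item has zero dissimilarity to itself
    for a, b, d in pairs:
        i, j = index[a], index[b]
        matrix[i][j] = matrix[j][i] = float(d)
    # Every pair must be defined; report any gaps instead of guessing values.
    missing = [(items[i], items[j])
               for i in range(n) for j in range(i + 1, n)
               if matrix[i][j] is None]
    if missing:
        raise ValueError(f"undefined pair distances: {missing}")
    return items, matrix


# Illustrative rows: three items forming a 3-4-5 triangle.
rows = [("A", "B", 3.0), ("B", "C", 4.0), ("A", "C", 5.0)]
items, matrix = pairs_to_matrix(rows)
```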

## Configuration Properties

### Model Options

1. Choose Input Type: Select whether to use the 3-column pairwise approach or distance matrix approach for input of dissimilarity information. You must define all pair distances in either case; otherwise an error is thrown.

2. Number of Dimensions to Output: Select the number of dimensions that the map and data will contain in the Data and Plot outputs. To choose the best number of dimensions, use the eigenvalue plot in the report to see how much variance each dimension accounts for.

3. Choose Multi-Dimensional Scaling Method: Choose between the Classical MDS and Isometric MDS algorithms.
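One common way to read eigenvalues when choosing the number of dimensions is to convert them to shares of the total. The sketch below assumes that approach. The function name `variance_shares` and the sample eigenvalues are made up for illustration, and the tool's report may present the values differently.

```python
def variance_shares(eigenvalues):
    """Hypothetical helper: share of total variance per dimension, plus the
    running total. Negative eigenvalues are treated as zero."""
    positive = [max(ev, 0.0) for ev in eigenvalues]
    total = sum(positive)
    if total == 0:
        raise ValueError("eigenvalues carry no variance")
    shares = [ev / total for ev in positive]
    cumulative = []
    running = 0.0
    for share in shares:
        running += share
        cumulative.append(running)
    return shares, cumulative


# Made-up eigenvalues: the first two dimensions carry all of the variance.
shares, cumulative = variance_shares([16.0, 9.0, 0.0, 0.0])
```

Dimensions whose share is close to zero add little beyond noise, so in this made-up case two dimensions would be enough.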

### Plot Options

1. Comma-separated list of dimensions to flip: Each dimension number in this list has its item coordinates multiplied by -1. The MDS algorithms pick the polarity of each dimension arbitrarily, so user input can help orient the map. For instance, in a map of the USA created from the distances between cities, a direction may come out reversed from what is known to be the case.
2. Bar Plot of Eigenvalues: This check mark determines whether the eigenvalues and their explanation are included in the report output. The bar plot helps in choosing the number of dimensions to keep in the map of the data. Mainly, it shows the point at which additional dimensions incorporate only noise or spurious data into the map.

3. Replace item names with numbers in graph for visibility?: The map may contain too many items to tell one name from another. This check mark determines whether all item names are converted into number IDs (e.g., x1, x2, x3, ... x987, x988 instead of 'jack', 'jill', 'banana', etc.).
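The dimension-flip option can be pictured as the following minimal sketch. The function name `flip_dimensions` is hypothetical, and the sketch assumes the list uses 1-based dimension numbers. Multiplying a dimension by -1 mirrors the map without changing any distance between items.

```python
def flip_dimensions(coords, flip_list):
    """Hypothetical helper: negate the listed dimensions of every item.
    Assumes flip_list is a comma-separated string of 1-based numbers."""
    dims = {int(tok) for tok in flip_list.split(",") if tok.strip()}
    return [[-x if (d + 1) in dims else x for d, x in enumerate(row)]
            for row in coords]


# Illustrative coordinates for two items on a 2-dimensional map.
original = [[1.0, 2.0], [3.0, -4.0]]
flipped = flip_dimensions(original, "2")
```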

### Graphics Options

1. Plot size (Inches/Centimeters selection): Specify the width and height dimensions of the resulting plot, using either inches or centimeters.

2. Graph resolution: The resolution (in dots per inch) of any plot(s) produced by the tool. The choices are:

• 1x (96 dpi)
• 2x (192 dpi)
• 3x (288 dpi)

The 1x resolution is best for reports intended to be viewed exclusively on a computer screen (e.g., HTML reports), while 3x resolution will be best for PDF files or formats intended to be printed.

3. Base font size (points): The point size of the base font used for the title, labels, and items on the plot(s). The plotting functions automatically make the plot title larger than the base font.
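As a rough guide to how plot size and resolution combine, the pixel dimensions of an image are its size in inches multiplied by the dots per inch. The sketch below shows only that arithmetic. The function name `plot_pixels` and its parameters are hypothetical, and the tool may round or render differently.

```python
def plot_pixels(width, height, units="inches", scale=1):
    """Hypothetical helper: pixel size of a plot for the 1x/2x/3x settings,
    where 1x corresponds to 96 dpi."""
    if units not in ("inches", "centimeters"):
        raise ValueError("units must be 'inches' or 'centimeters'")
    dpi = 96 * scale
    # 1 inch is exactly 2.54 centimeters.
    to_inches = 1.0 if units == "inches" else 1 / 2.54
    return round(width * to_inches * dpi), round(height * to_inches * dpi)


# A 5 x 5 inch plot at 3x (288 dpi).
print_size = plot_pixels(5, 5, "inches", scale=3)
# A 12.7 x 12.7 centimeter plot at 1x (96 dpi).
screen_size = plot_pixels(12.7, 12.7, "centimeters", scale=1)
```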

## Output

There are 2 output streams:

• D Output: [Data] Contains entries for each item and each dimension's coordinate value.

• P Output: [Plot] Contains report outputs with graphic settings as declared in the tool configuration: an optional table and graph depicting the variance of each dimension, with an explanation of what eigenvalues are; and plots of each dimension pair (e.g., for 4 dimensions: {1,2};{1,3};{1,4};{2,3};{2,4};{3,4}) with each item represented by name or, optionally, by a numeric identifier.
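The dimension pairs listed for the Plot output are the 2-element combinations of the dimension numbers. A short sketch using Python's standard library (the function name `dimension_pairs` is hypothetical) enumerates them for any number of dimensions:

```python
from itertools import combinations


def dimension_pairs(n_dims):
    """Hypothetical helper: every unordered pair of 1-based dimension numbers."""
    return list(combinations(range(1, n_dims + 1), 2))


pairs = dimension_pairs(4)
# pairs is [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4)]
```

The number of pairs grows as n(n-1)/2, so a map with many dimensions produces many plots.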