Find Nearest Neighbors Tool

The Find Nearest Neighbors tool finds the selected number of nearest neighbors in the "data" stream for each record in the "query" stream, based on Euclidean distance. The tool offers a choice of search algorithms that differ in speed and, potentially, accuracy. The default is the KD-Tree algorithm, which generally offers a good combination of speed and accuracy. The calculations can be based on either the original data or standardized data, using either a z-score standardization (which results in all fields having a mean of zero and a standard deviation of one) or a unit-interval transformation (in which the values of each field range from zero to one). Some form of field standardization is recommended, since Euclidean distance calculations are very sensitive to differences in field scales (e.g., untransformed household income and age data have very different levels and ranges). Because the method relies on Euclidean distance, only numeric fields can be used as inputs. The tool makes use of the R FNN package.
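To illustrate why standardization matters, the following is a minimal Python sketch, not the tool's own R implementation. The income and age values are made up, the function names (z_score, unit_interval, nearest_to_first) are hypothetical, and the z-score uses the sample standard deviation, which is an assumption about how the tool standardizes fields.

```python
import math
import statistics

def z_score(values):
    """Rescale a field to mean 0 and standard deviation 1.
    Assumption: the sample standard deviation is used."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]

def unit_interval(values):
    """Rescale a field so its values range from 0 to 1."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

def nearest_to_first(rows):
    """Index of the row closest (Euclidean distance) to row 0."""
    return min(range(1, len(rows)), key=lambda i: math.dist(rows[0], rows[i]))

# Hypothetical records: household income and age on very different scales.
income = [30000.0, 31000.0, 45000.0, 90000.0]
age = [25.0, 70.0, 27.0, 45.0]

raw = list(zip(income, age))
z_scaled = list(zip(z_score(income), z_score(age)))
unit_scaled = list(zip(unit_interval(income), unit_interval(age)))

# On raw data, income dominates the distance, so the 70-year-old with a
# similar income is "nearest" to the 25-year-old in row 0.
print(nearest_to_first(raw))
# After either standardization, both fields contribute comparably, and the
# 27-year-old in row 2 becomes the nearest neighbor.
print(nearest_to_first(z_scaled))
print(nearest_to_first(unit_scaled))
```

In this made-up data, the nearest neighbor of the first record changes once the fields are put on a common scale, which is the effect the recommendation above is meant to guard against.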

This tool uses the R programming language. Go to Options > Download Predictive Tools to install R and the packages used by the R Tool.

Inputs

Two Alteryx data streams. The right stream is the "query" stream, which contains the rows for which the selected number of nearest neighbors are found in the left stream (the "data" stream).
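The relationship between the two streams can be sketched in Python as follows. This is an exhaustive (brute-force) search written for illustration only. It is not the KD-Tree algorithm and does not reproduce the FNN package's implementation; the function name k_nearest and the sample rows are hypothetical.

```python
import math

def k_nearest(data_rows, query_rows, k):
    """For each query row, return the k closest data rows as
    (data_index, distance) pairs, ordered from nearest to farthest."""
    results = []
    for q in query_rows:
        # Distance from this query row to every row in the data stream.
        dists = [(i, math.dist(q, d)) for i, d in enumerate(data_rows)]
        dists.sort(key=lambda pair: pair[1])
        results.append(dists[:k])
    return results

# Hypothetical numeric records standing in for the two input streams.
data_rows = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (3.0, 3.0)]   # "data" stream
query_rows = [(0.1, 0.1), (2.5, 2.5)]                          # "query" stream

neighbors = k_nearest(data_rows, query_rows, k=2)
for q, found in zip(query_rows, neighbors):
    print(q, [index for index, _ in found])
```

Each query row yields its own list of neighbors drawn from the data rows, so the result has one entry per query record regardless of how many data records there are.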

Configuration Properties

Outputs

*en.wikipedia.org/wiki/Cover_tree
**en.wikipedia.org/wiki/K-d_tree
***Venables, W. N. and Ripley, B. D. (2002), Modern Applied Statistics with S, 4th ed., Springer, Berlin.