Support Vector Machines (SVM), or Support Vector Networks (SVN), are a popular set of supervised learning algorithms originally developed for classification (categorical target) problems, and later extended to regression (numerical target) problems. SVMs are popular because they are memory efficient, can address a large number of predictor variables (although they can provide poor fits if the number of predictors exceeds the number of estimation records), and are versatile since they support a large number of different "kernel" functions.
The basic idea behind the method is to use the predictor variables to find the best equation of a line (two predictors), a plane (three predictors), or a hyperplane (four or more predictors) that maximally separates the estimation records into different groups based on the target variable. A kernel function provides the measure of distance that causes two records to be placed in the same or different groups, and involves taking a function of the predictor variables to define the distance metric.
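To make the role of the kernel more concrete, the sketch below writes out four kernel forms commonly offered by SVM libraries (linear, polynomial, radial, and sigmoid). It is an illustration only: the function names and the parameter names `gamma`, `coef0`, and `degree` are assumptions chosen for this example, and this page does not state which kernels or defaults the tool exposes. Each function takes the predictor values of two records and returns a single score that the SVM uses in place of a plain dot product.

```python
import math


def dot(u, v):
    """Inner product of two equal-length numeric vectors."""
    return sum(a * b for a, b in zip(u, v))


def linear_kernel(u, v):
    # Plain inner product: corresponds to a flat separating boundary.
    return dot(u, v)


def polynomial_kernel(u, v, gamma=1.0, coef0=0.0, degree=3):
    # (gamma * u.v + coef0) ** degree; parameter names are illustrative.
    return (gamma * dot(u, v) + coef0) ** degree


def radial_kernel(u, v, gamma=1.0):
    # exp(-gamma * squared Euclidean distance); equals 1 for identical records.
    sq_dist = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-gamma * sq_dist)


def sigmoid_kernel(u, v, gamma=1.0, coef0=0.0):
    # tanh(gamma * u.v + coef0)
    return math.tanh(gamma * dot(u, v) + coef0)


# Two hypothetical records, each with two predictor values.
record_a = [1.0, 2.0]
record_b = [2.0, 0.0]

for name, kernel in [
    ("linear", linear_kernel),
    ("polynomial", polynomial_kernel),
    ("radial", radial_kernel),
    ("sigmoid", sigmoid_kernel),
]:
    print(name, kernel(record_a, record_b))
```

Note that the radial kernel shrinks toward zero as two records move apart, so it behaves like a similarity score derived from distance rather than a distance itself.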
A short video that illustrates how this works can be found here, and a very approachable discussion of the topic can be found here. The extent to which the groups are separated, conditional on the kernel function used, is known as the maximal margin. Finally, the separation of the groups may not be perfect, so a cost parameter (the cost of placing an estimation record into the "wrong" group) can also be specified.
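The trade-off between a wide margin and misplaced records can be sketched for the simplest case, a linear boundary with two groups labeled +1 and -1. The code below is a minimal illustration under those assumptions, not the tool's implementation: the names `decision_value`, `margin_width`, and `soft_margin_objective` are hypothetical, and the example data are made up. It uses the standard soft-margin formulation, in which the quantity being minimized is half the squared length of the weight vector plus the cost parameter times the total hinge loss.

```python
import math


def decision_value(w, b, x):
    """Signed score of record x for the boundary w.x + b = 0."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def margin_width(w):
    """Distance between the two margin boundaries w.x + b = +1 and -1."""
    return 2.0 / math.sqrt(sum(wi * wi for wi in w))


def soft_margin_objective(w, b, records, labels, cost=1.0):
    """0.5 * ||w||^2 + cost * sum of hinge losses (labels must be +1 or -1)."""
    hinge_total = sum(
        max(0.0, 1.0 - y * decision_value(w, b, x))
        for x, y in zip(records, labels)
    )
    return 0.5 * sum(wi * wi for wi in w) + cost * hinge_total


# Hypothetical data: the third record lies on the wrong side of the boundary.
records = [[2.0, 0.0], [-2.0, 0.0], [0.5, 0.0]]
labels = [1, -1, -1]
w, b = [1.0, 0.0], 0.0

print(margin_width(w))
print(soft_margin_objective(w, b, records, labels, cost=1.0))
print(soft_margin_objective(w, b, records, labels, cost=10.0))
```

Raising the cost makes each misplaced record more expensive, which pushes the fit toward fewer misplaced estimation records at the price of a narrower margin.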
This tool uses the e1071 R package.
This tool uses the R programming language. Go to Options > Download Predictive Tools to install R and the packages used by the R Tool.
An Alteryx data stream that includes a target field of interest along with one or more possible predictor fields.
Columns containing unique identifiers, such as surrogate primary keys and natural primary keys, should not be used in statistical analyses. They have no predictive value and can cause runtime exceptions.
The model customization section is where the user chooses the kernel type and the related parameters of each kernel. The Alteryx SVM tool lets the user either set the needed parameters directly (the radio button "User provides parameters") or provide a range of parameters and have the tool find the best values by searching over a grid of possible values (the radio button "Machine tunes parameters"). Note that the latter is more computationally expensive (and hence takes longer), since it carries out a 10-fold cross validation to test the model on multiple parameter values. However, it is likely to result in a model that better fits the data.
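The general shape of a grid search with 10-fold cross validation can be sketched as follows. This is a generic illustration and not the tool's internal code: the parameter names `cost` and `gamma`, the helper names, and the `toy_score` function (which stands in for fitting and scoring a real SVM) are all assumptions made for the example. The sketch also assumes the records are already in random order, so it skips shuffling.

```python
import itertools


def k_fold_indices(n_records, k=10):
    """Split record indices 0..n_records-1 into k nearly equal folds."""
    folds = [[] for _ in range(k)]
    for i in range(n_records):
        folds[i % k].append(i)
    return folds


def grid_search(param_grid, fit_and_score, n_records, k=10):
    """Return the parameter combination with the best mean held-out score."""
    folds = k_fold_indices(n_records, k)
    names = sorted(param_grid)
    best_params, best_score = None, float("-inf")
    for values in itertools.product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        scores = []
        for held_out in folds:
            held = set(held_out)
            train = [i for i in range(n_records) if i not in held]
            scores.append(fit_and_score(params, train, held_out))
        mean_score = sum(scores) / len(scores)
        if mean_score > best_score:
            best_params, best_score = params, mean_score
    return best_params, best_score


fit_calls = []  # records every simulated model fit


def toy_score(params, train_idx, test_idx):
    """Hypothetical stand-in for fitting a model and scoring held-out records."""
    fit_calls.append(params)
    # Higher is better; this made-up score peaks at cost=10, gamma=0.01.
    return -abs(params["cost"] - 10) - abs(params["gamma"] - 0.01)


grid = {"cost": [0.1, 1, 10, 100], "gamma": [0.001, 0.01, 0.1]}
best_params, best_score = grid_search(grid, toy_score, n_records=50, k=10)
print(best_params)
print(len(fit_calls))
```

With 4 cost values, 3 gamma values, and 10 folds, the search performs 120 fits instead of one, which is why the machine-tuned option takes noticeably longer.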
User provides parameters
Kernel Type: Determines the metric used to measure the separation between groups.
Machine tunes parameters
The parameters that need to be selected in this case are analogous to those in the "User provides parameters" section, but with the following differences:
Graph resolution: Select the resolution of the graph in dots per inch: 1x (96 dpi); 2x (192 dpi); or 3x (288 dpi). Lower resolution creates a smaller file and is best for viewing on a monitor. Higher resolution creates a larger file with better print quality.
Classification
Regression
The report explains how to interpret each performance evaluation measure.
©2017 Alteryx, Inc., all rights reserved. Allocate®, Alteryx®, Guzzler®, and Solocast® are registered trademar