The Naive Bayes Classifier tool creates a binomial or multinomial probabilistic classification model of the relationship between a set of predictor variables and a categorical target variable. The Naive Bayes classifier assumes that, within each class, all predictor variables are independent of one another. Given a sample input, it predicts a probability distribution over the set of classes, that is, the probability that the input belongs to each class of the target variable.
One of the main advantages of the Naive Bayes Classifier is that it performs well even with a small training set. This advantage derives from the fact that the Naive Bayes classifier is parameterized by the mean and variance of each variable, estimated independently of all other variables. In many maximum likelihood classification problems, the covariance matrix is needed in order to estimate predicted probabilities, but small training sets can lead to a highly variable covariance matrix which, in turn, can degrade the performance of the maximum likelihood estimator (MLE). Since the Naive Bayes classifier only requires the calculation of one-dimensional variances for each predictor, the covariance matrix is not needed, and the MLE therefore avoids the problems a small training set causes for covariance estimation.
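The role of the per-variable mean and variance can be illustrated with a minimal sketch in Python. This is not the tool's own implementation (the tool runs on R); it assumes numeric predictors that are modeled as normally distributed within each class, and the names `fit_gaussian_nb`, `predict_proba`, and `var_floor` are hypothetical, chosen only for this illustration.

```python
import math
from collections import defaultdict
from statistics import fmean, pvariance


def fit_gaussian_nb(rows, labels, var_floor=1e-9):
    """Estimate a class prior plus a per-class mean and variance for each predictor."""
    by_class = defaultdict(list)
    for row, label in zip(rows, labels):
        by_class[label].append(row)

    model = {}
    for label, members in by_class.items():
        columns = list(zip(*members))  # one tuple of values per predictor
        model[label] = {
            "prior": len(members) / len(rows),
            "means": [fmean(col) for col in columns],
            # One-dimensional variances only; no covariance matrix is estimated.
            # var_floor (an assumption of this sketch) avoids division by zero.
            "variances": [max(pvariance(col), var_floor) for col in columns],
        }
    return model


def predict_proba(model, row):
    """Return a dict mapping each class to its posterior probability for one row."""
    log_scores = {}
    for label, params in model.items():
        log_score = math.log(params["prior"])
        for x, mean, var in zip(row, params["means"], params["variances"]):
            # Log of the normal density, summed across predictors
            # (this sum is where the independence assumption is used).
            log_score += -0.5 * math.log(2 * math.pi * var) - (x - mean) ** 2 / (2 * var)
        log_scores[label] = log_score

    # Normalize in log space for numerical stability.
    peak = max(log_scores.values())
    weights = {label: math.exp(s - peak) for label, s in log_scores.items()}
    total = sum(weights.values())
    return {label: w / total for label, w in weights.items()}
```

With p predictors, this sketch stores p means and p variances per class, whereas a full covariance matrix would require on the order of p squared estimates from the same small sample.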
The Naive Bayes Classifier is useful when trying to categorize a set of observations according to a target "class" variable, particularly in cases where only a small training set and a small number of predictors are used. Using an initial training set, the Naive Bayes Classifier develops a model for predicting the probability that a given observation belongs to each class of the target variable.
A simple example would be predicting whether someone leasing a new vehicle will purchase that vehicle at the termination of the lease, based on the characteristics of both the vehicle (e.g., pickup/sedan/SUV) and the customer (e.g., gender, age, etc.). The Naive Bayes Classifier would allow the user to "score" future individuals according to the model produced by the training set. This scoring process would result in a set of probabilities: one for purchasing at the end of the lease agreement and one for not purchasing at the end of the lease agreement.
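The scoring step in the lease example above can be sketched in Python for categorical predictors. The six training records, the field names (`vehicle`, `age_band`, `purchased`), and the functions `fit_categorical_nb` and `score` are all made up for illustration, and the Laplace smoothing constant `alpha` is an assumption of this sketch rather than a documented setting of the tool.

```python
import math
from collections import Counter, defaultdict


def fit_categorical_nb(records, target, alpha=1.0):
    """Count class frequencies and, per class, how often each predictor level occurs."""
    predictors = [key for key in records[0] if key != target]
    class_counts = Counter(rec[target] for rec in records)
    level_counts = defaultdict(Counter)  # (class, predictor) -> counts of each level
    levels = defaultdict(set)            # predictor -> levels seen in training
    for rec in records:
        for name in predictors:
            level_counts[(rec[target], name)][rec[name]] += 1
            levels[name].add(rec[name])
    return {
        "predictors": predictors,
        "class_counts": class_counts,
        "level_counts": level_counts,
        "levels": levels,
        "alpha": alpha,
    }


def score(model, observation):
    """Return one probability per class of the target for a new observation."""
    alpha = model["alpha"]
    total = sum(model["class_counts"].values())
    log_scores = {}
    for label, n_label in model["class_counts"].items():
        log_score = math.log(n_label / total)  # class prior
        for name in model["predictors"]:
            count = model["level_counts"][(label, name)][observation[name]]
            n_levels = len(model["levels"][name])
            # Laplace smoothing keeps unseen class/level pairs from zeroing the product.
            log_score += math.log((count + alpha) / (n_label + alpha * n_levels))
        log_scores[label] = log_score

    peak = max(log_scores.values())
    weights = {label: math.exp(s - peak) for label, s in log_scores.items()}
    norm = sum(weights.values())
    return {label: w / norm for label, w in weights.items()}


# Hypothetical training set: six past leases and whether the customer purchased.
training = [
    {"vehicle": "pickup", "age_band": "under_40", "purchased": "yes"},
    {"vehicle": "pickup", "age_band": "40_plus", "purchased": "yes"},
    {"vehicle": "SUV", "age_band": "40_plus", "purchased": "yes"},
    {"vehicle": "sedan", "age_band": "under_40", "purchased": "no"},
    {"vehicle": "sedan", "age_band": "40_plus", "purchased": "no"},
    {"vehicle": "SUV", "age_band": "under_40", "purchased": "no"},
]

model = fit_categorical_nb(training, target="purchased")
probs = score(model, {"vehicle": "pickup", "age_band": "40_plus"})
print(probs)  # two probabilities, one for "yes" and one for "no", summing to 1
```

On this toy data, the new customer leasing a pickup scores higher for "yes" than for "no", because pickups appear only among the purchasers in the training set.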
This tool uses the R programming language. Go to Options > Download Predictive Tools to install R and the packages used by the R Tool.
An Alteryx data stream that includes a target field of interest along with one or more possible predictor fields.
Columns containing unique identifiers, such as surrogate primary keys and natural primary keys, should not be used in statistical analyses. They have no predictive value and can cause runtime exceptions.
Graph resolution: Select the resolution of the graph in dots per inch: 1x (96 dpi); 2x (192 dpi); or 3x (288 dpi). Lower resolution creates a smaller file and is best for viewing on a monitor. Higher resolution creates a larger file with better print quality.
©2017 Alteryx, Inc., all rights reserved. Allocate®, Alteryx®, Guzzler®, and Solocast® are registered trademarks of Alteryx, Inc.