Naive Bayes Classifier Tool

The Naive Bayes Classifier tool creates a binomial or multinomial probabilistic classification model of the relationship between a set of predictor variables and a categorical target variable. The Naive Bayes classifier assumes that, within each class, all predictor variables are independent of one another. Given a sample input, it predicts a probability distribution over the classes of the target variable, that is, the probability that the input belongs to each class.

One of the main advantages of the Naive Bayes Classifier is that it performs well even with a small training set. This advantage derives from the fact that the Naive Bayes classifier is parameterized by the mean and variance of each variable independent of all other variables. In many maximum likelihood classification problems, the covariance matrix is needed in order to estimate predicted probabilities, but small training sets can lead to a highly variable covariance matrix estimate which, in turn, can degrade the performance of the maximum likelihood estimator (MLE). Since the Naive Bayes classifier only requires the calculation of one-dimensional variances for each predictor, the covariance matrix is not needed, and the estimates are therefore much less sensitive to the problems of a small training set.
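The per-variable parameterization described above can be illustrated with a minimal sketch. This is not the tool's R implementation; it is a plain Python illustration that assumes continuous predictors modeled with a normal distribution, and the function names fit_gaussian_nb and predict_proba are hypothetical. It stores only a prior, plus one mean and one variance per predictor, for each class, so no covariance matrix is ever estimated.

```python
import math
from collections import defaultdict

def fit_gaussian_nb(rows, labels):
    """Estimate a prior and a per-predictor (mean, variance) pair for each class."""
    grouped = defaultdict(list)
    for row, label in zip(rows, labels):
        grouped[label].append(row)

    model = {}
    n_total = len(rows)
    for label, members in grouped.items():
        n = len(members)
        stats = []
        for column in zip(*members):  # one predictor at a time
            mean = sum(column) / n
            # Maximum likelihood variance; the small floor (an assumption of
            # this sketch) avoids division by zero for constant columns.
            var = max(sum((x - mean) ** 2 for x in column) / n, 1e-9)
            stats.append((mean, var))
        model[label] = (n / n_total, stats)
    return model

def predict_proba(model, row):
    """Return the probability of each class for one observation."""
    log_scores = {}
    for label, (prior, stats) in model.items():
        log_p = math.log(prior)
        # Independence assumption: add one univariate log-density per predictor.
        for x, (mean, var) in zip(row, stats):
            log_p += -0.5 * math.log(2 * math.pi * var) - (x - mean) ** 2 / (2 * var)
        log_scores[label] = log_p
    # Subtract the maximum before exponentiating for numerical stability.
    top = max(log_scores.values())
    weights = {k: math.exp(v - top) for k, v in log_scores.items()}
    total = sum(weights.values())
    return {k: w / total for k, w in weights.items()}
```

With p predictors and k classes, this sketch estimates 2 * p * k values plus k priors, whereas a full covariance matrix per class would require on the order of p * p values per class.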

The Naive Bayes Classifier is useful when trying to categorize a set of observations according to a target "class" variable, particularly in cases where only a small training set and a small number of predictors are used.  Using an initial training set, the Naive Bayes Classifier develops a model for predicting the probability that a given observation belongs to each class of the target variable.  

A simple example would be predicting whether someone leasing a new vehicle will purchase that vehicle at the termination of the lease, based on the characteristics of both the vehicle (e.g., pickup/sedan/SUV) and the customer (e.g., gender, age, etc.). The Naive Bayes Classifier would allow the user to "score" future individuals according to the model produced by the training set. This scoring process would result in a pair of probabilities for each individual: one for purchasing the vehicle at the end of the lease agreement and one for not purchasing it.
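The scoring step in the lease example can be sketched as follows. This is an illustration only, not the tool's R implementation: the field names, the six training records, and the function names fit_categorical_nb and score are all made up for the example, and add-one (Laplace) smoothing is an assumption of the sketch.

```python
from collections import Counter, defaultdict

def fit_categorical_nb(records, target):
    """Count classes, and count predictor values within each class."""
    class_counts = Counter(r[target] for r in records)
    value_counts = defaultdict(Counter)  # (field, class) -> value counts
    levels = defaultdict(set)            # field -> distinct values seen
    for r in records:
        for field, value in r.items():
            if field == target:
                continue
            value_counts[(field, r[target])][value] += 1
            levels[field].add(value)
    return class_counts, value_counts, levels

def score(model, record, alpha=1.0):
    """Return one probability per class for a new record."""
    class_counts, value_counts, levels = model
    total = sum(class_counts.values())
    raw = {}
    for cls, n_cls in class_counts.items():
        p = n_cls / total  # class prior
        for field, value in record.items():
            count = value_counts[(field, cls)][value]
            # Smoothed conditional probability of this value given the class.
            p *= (count + alpha) / (n_cls + alpha * len(levels[field]))
        raw[cls] = p
    norm = sum(raw.values())
    return {cls: p / norm for cls, p in raw.items()}

# Hypothetical toy training set, invented purely for illustration.
training = [
    {"vehicle": "SUV", "age_band": "30-44", "purchased": "yes"},
    {"vehicle": "SUV", "age_band": "45+", "purchased": "yes"},
    {"vehicle": "pickup", "age_band": "30-44", "purchased": "yes"},
    {"vehicle": "sedan", "age_band": "18-29", "purchased": "no"},
    {"vehicle": "sedan", "age_band": "30-44", "purchased": "no"},
    {"vehicle": "sedan", "age_band": "45+", "purchased": "no"},
]
lease_model = fit_categorical_nb(training, target="purchased")
probabilities = score(lease_model, {"vehicle": "SUV", "age_band": "45+"})
```

For this toy data, the scored customer receives a probability of 0.75 for "yes" and 0.25 for "no". The two values sum to one, matching the pair of probabilities described above.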

This tool uses the R programming language. Go to Options > Download Predictive Tools to install R and the packages used by the R Tool.


An Alteryx data stream that includes a target field of interest along with one or more possible predictor fields.

Configuration Properties

Required Parameters

Graphics Options