The Neural Network tool creates a feedforward perceptron neural network model with a single hidden layer. The neurons in the hidden layer use a logistic (also known as a sigmoid) activation function, while the output activation function depends on the nature of the target field. Specifically, for binary classification problems (e.g., the probability a customer buys or does not buy), the output activation function is logistic; for multinomial classification problems (e.g., the probability a customer chooses option A, B, or C), it is softmax; and for regression problems (where the target is a continuous, numeric field), it is linear.
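For reference, the three output activation functions can be written directly in R, the language the tool relies on. This is an illustrative sketch only, not code the tool itself exposes:

    # Logistic (sigmoid): maps any real-valued score to a probability in (0, 1).
    logistic <- function(z) 1 / (1 + exp(-z))

    # Softmax: maps a vector of scores to class probabilities that sum to one.
    softmax <- function(z) exp(z) / sum(exp(z))

    # Linear (identity): used when the target is a continuous numeric field.
    linear <- function(z) z

    logistic(0.75)               # e.g., probability a customer buys
    softmax(c(1.2, 0.3, -0.5))   # e.g., probabilities of options A, B, and C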
Neural networks were among the earliest machine learning algorithms (as opposed to traditional statistical approaches) used for predictive modeling. The motivation behind the method is mimicking the structure of neurons in the brain (hence the method's name). The basic structure of a neural network involves a set of inputs (predictor fields) that feed into one or more "hidden" layers, with each hidden layer having one or more "nodes" (also known as "neurons").
In the first hidden layer, the inputs are linearly combined (with a weight assigned to each input in each node), and an "activation function" is applied to the weighted linear combination of the predictors. In the second and subsequent hidden layers, the outputs from the nodes of the prior hidden layer are linearly combined in each node of the hidden layer (again with weights assigned to each node from the prior hidden layer), and an activation function is applied to the weighted linear combination. Finally, the results from the nodes of the final hidden layer are combined in a final output layer that uses an activation function consistent with the target type.
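The computation carried out by a single hidden node can be sketched in a few lines of R. The predictor values and weights below are made-up illustrations, and showing the bias term explicitly is an assumption about how the intercept weight enters the combination:

    # Sketch of one hidden node: a weighted linear combination of the inputs
    # followed by a logistic activation.
    logistic <- function(z) 1 / (1 + exp(-z))
    hidden_node_output <- function(x, w, b) logistic(b + sum(w * x))

    # Two illustrative predictor values, with illustrative weights and bias:
    hidden_node_output(x = c(0.4, 1.3), w = c(0.8, -0.5), b = 0.1)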
Estimation (or "learning" in the vocabulary of the neural network literature) involves finding the set of weights for each input or prior layer node values that minimize the model's objective function. In the case of a continuous numeric field this means minimizing the sum of the squared errors of the final model's prediction compared to the actual values, while classification networks attempt to minimize an entropy measure for both binary and multinomial classification problems. As indicated above, the Neural Network tool (which relies on the R nnet package), only allows for a single hidden layer (which can have an arbitrary number of nodes), and always uses a logistic transfer function in the hidden layer nodes. Despite these limitations, our research indicates that the nnet package is the most robust neural network package available in R at this time.
While more modern statistical learning methods (such as models produced by the Boosted, Forest, and Spline Model tools) typically provide greater predictive efficacy than neural network models, in some specific applications (which cannot be determined in advance), neural network models outperform other methods for both classification and regression problems. Moreover, in some areas, such as financial risk assessment, neural network models are considered a "standard" method that is widely accepted.
This tool uses the R programming language. Go to Options > Download Predictive Tools to install R and the packages used by the R Tool.
Input
An Alteryx data stream that includes a target field of interest along with one or more possible predictor fields.
Configuration Properties
Required Parameters
Model name: Each model needs to be given a name so it can later be identified. Model names must start with a letter and may contain letters, numbers, and the special characters period (".") and underscore ("_"). No other special characters are allowed, and R is case sensitive.
Select the target variable: Select the field from the data stream you want to predict. The target can be either a categorical (string) field or a continuous numeric field, consistent with the classification and regression cases described above.
Select the predictor variables: Choose the fields from the data stream you believe "cause" changes in the value of the target variable.
Columns containing unique identifiers, such as surrogate primary keys and natural primary keys, should not be used in statistical analyses. They have no predictive value and can cause runtime exceptions.
Use sampling weights in model estimation (Optional): Click the check box and then select a weight field from the data stream to estimate a model that uses sampling weights.
The number of nodes in the hidden layer: The number of nodes (neurons) in the model's single hidden layer. The default is ten.
Include effect plots: If checked, effect plots are produced that graphically show the relationship between each predictor variable and the target, averaging over the effects of the other predictor fields. The number of plots produced is controlled by "The minimal level of importance of a field to be included in the plots," which specifies the percentage of the model's total predictive power a field must contribute in order for a marginal effect plot to be produced for that field. Higher values for this setting reduce the number of marginal effect plots produced.
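The threshold works like a simple filter on each field's share of the model's predictive power. The R sketch below uses made-up importance percentages purely to illustrate the idea; the tool computes its own importance measure:

    # Illustrative only: fields at or above the minimal level of importance get a plot.
    importance_pct <- c(income = 45, age = 30, household_size = 20, region = 5)
    min_importance <- 10
    names(importance_pct)[importance_pct >= min_importance]
    # Returns "income", "age", and "household_size"; region falls below the threshold.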
Model Customization
Custom scaling/normalization...: The numeric methods underlying the optimization of the model's weights can be problematic if the inputs (predictor fields) are on very different scales (e.g., household income, which may range from seven thousand to one million, combined with the number of household members, which ranges from one to seven). The available options are listed below; an illustrative sketch of each transformation follows the list.
None: Default.
Z-score: All predictor fields are scaled so that they have a mean of zero and a standard deviation of one.
Unit interval: All predictor fields are scaled so that they have a minimum value of zero and a maximum value of one, with all other values being between zero and one.
Zero centered: All predictor fields are scaled so that they have a minimum value of negative one and a maximum value of one, with all other values falling between negative one and positive one.
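The three scaling options correspond to standard transformations. The R sketch below applies each one to an illustrative numeric predictor; the values are made up:

    x <- c(7000, 52000, 250000, 1000000)   # e.g., household income

    z_score       <- (x - mean(x)) / sd(x)                     # mean 0, standard deviation 1
    unit_interval <- (x - min(x)) / (max(x) - min(x))          # minimum 0, maximum 1
    zero_centered <- 2 * (x - min(x)) / (max(x) - min(x)) - 1  # minimum -1, maximum 1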
The weight decay: The weight decay limits the movement in the new weight values at each iteration (also called an "epoch") of the estimation process. The value of the weight decay should be between zero and one; larger values place a greater restriction on the possible movement of the weights. In general, a weight decay value between 0.01 and 0.2 often works well.
The +/- range of the initial (random) weights around zero: The weights given to the input variables in each hidden node are initialized using random numbers. This option allows the user to set the range of the random numbers used. Generally, the value should be near 0.5. However, smaller values can be better if all the input variables are large in magnitude. A value of 0 is a special value that causes the tool to find a good compromise value given the input data.
The maximum number of weights allowed in the model: This option becomes relevant when there are a large number of predictor fields and nodes in the hidden layer. Reducing the number of weights speeds up model estimation and reduces the chance that the algorithm finds a local optimum (as opposed to a global optimum) for the weights. Weights excluded from the model are implicitly set to zero.
The maximum number of iterations for model estimation: This value controls the number of attempts the algorithm makes to improve the set of model weights relative to the previous set. If no further improvement in the weights is found before the maximum number of iterations is reached, the algorithm terminates and returns the best set of weights found. This option defaults to 100 iterations. In general, given the behavior of the algorithm, it is likely to make sense to increase this value if needed, at the cost of a longer runtime for model creation.
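Because the tool relies on the nnet package, these customization options map naturally onto nnet() arguments. The correspondence shown below is an assumption for illustration (the data set and field names are hypothetical), not a description of the tool's internal code:

    library(nnet)

    fit <- nnet(purchased ~ income + age + household_size, data = customers,
                size    = 10,     # number of nodes in the hidden layer
                decay   = 0.1,    # the weight decay
                rang    = 0.5,    # +/- range of the initial (random) weights around zero
                MaxNWts = 1000,   # maximum number of weights allowed in the model
                maxit   = 100,    # maximum number of iterations for model estimation
                trace   = FALSE)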
Graphics Options
Plot size: Select inches or centimeters for the size of the graph.
Graph resolution: Select the resolution of the graph in dots per inch: 1x (96 dpi); 2x (192 dpi); or 3x (288 dpi). Lower resolution creates a smaller file and is best for viewing on a monitor. Higher resolution creates a larger file with better print quality.
Base font size (points): Select the size of the font in the graph.
Output
Object (O) output: Consists of a table of the serialized model with its model name.
Report (R) output: Consists of the report snippets generated by the Neural Network tool: a basic model summary, as well as main effect plots for each class of the target variable.