Simulation Scoring Tool
The Simulation Scoring tool draws samples from an approximation of a model object's error distribution. Whereas standard scoring predicts a single (mean) value for each record, Simulation Scoring also takes the error distribution into account to provide a range of possible values.
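To illustrate the difference between a single score and a simulated range, here is a minimal Python sketch. It is not the tool's implementation: the function `simulate_score`, the point prediction of 100.0, and the normal error model with standard deviation 2.0 are hypothetical choices made only for illustration.

```python
import random
import statistics


def simulate_score(point_prediction, error_sampler, n_draws, rng):
    """Return n_draws simulated values: the point prediction plus one sampled error each."""
    return [point_prediction + error_sampler(rng) for _ in range(n_draws)]


rng = random.Random(42)  # fixed seed so the draws are reproducible
# Hypothetical error model: normally distributed errors with standard deviation 2.0.
draws = simulate_score(100.0, lambda r: r.gauss(0.0, 2.0), n_draws=1000, rng=rng)

# Standard scoring would report only 100.0; the simulation also yields a spread.
q = statistics.quantiles(draws, n=20)  # 19 cut points in 5% steps
print("simulated mean:", round(statistics.mean(draws), 2))
print("approx. 90% range:", round(q[0], 2), "to", round(q[-1], 2))
```

The 5th and 95th percentiles of the draws give one simple way to summarize the range of possible values around the point prediction.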
This tool uses the R tool. Go to Options > Download Predictive Tools and sign in to the Alteryx Downloads and Licenses portal to install R and the packages used by the R Tool. See Download and Use Predictive Tools.
Connect inputs
- M anchor: The model object produced by one of the R-based predictive modeling tools.
- V anchor: Optional. The validation dataset to use when connecting a non-Linear Model (non-LM). Alteryx tools that create non-LM models are Logistic Regression Tool, Count Regression Tool, Gamma Regression Tool, Boosted Model Tool, Decision Tree Tool, Forest Model Tool, Naive Bayes Classifier Tool, Neural Network Tool, Spline Model Tool, Stepwise Tool, and Support Vector Machine Tool.
- If you are scoring a linear model (LM), the error distribution can be sampled directly because of the properties of LMs, so no validation dataset is needed.
- If you are scoring any other (non-LM) model, the tool assumes homoscedasticity: the spread of the errors does not change with the values of the predictors. This assumption allows a single error distribution to be calculated by scoring the model against the validation dataset. That error distribution is then sampled, and the draws are added to the score results for the incoming data.
- S anchor: The simulation data to score. This must contain all of the fields (with identical types and names) used to create the associated predictive model.
Warning
Do not connect the V anchor when the incoming model object comes from the Linear Regression Tool.
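The two sampling strategies described for the V anchor can be sketched in Python as follows. This is an illustration under stated assumptions, not the tool's actual R code: the function names are hypothetical, the non-LM case resamples validation residuals (actual minus predicted) with replacement, and the LM case assumes normally distributed errors with a residual standard error `sigma`.

```python
import random


def validation_residuals(actual, predicted):
    """Errors from scoring a model against a validation set (actual - predicted)."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    return [a - p for a, p in zip(actual, predicted)]


def simulate_non_lm(scores, residuals, n_draws, rng):
    """Add residuals drawn with replacement to each score.

    Assumes homoscedasticity: one residual pool is used for every record,
    regardless of that record's predictor values.
    """
    return [[s + rng.choice(residuals) for _ in range(n_draws)] for s in scores]


def simulate_lm(scores, sigma, n_draws, rng):
    """Add Normal(0, sigma) errors; sigma is assumed to be the residual standard error."""
    return [[s + rng.gauss(0.0, sigma) for _ in range(n_draws)] for s in scores]


rng = random.Random(1)
# Hypothetical validation data for a non-LM model.
residuals = validation_residuals(actual=[10.0, 12.0, 9.5, 11.0],
                                 predicted=[10.5, 11.0, 10.0, 11.0])
non_lm = simulate_non_lm(scores=[20.0, 30.0], residuals=residuals, n_draws=5, rng=rng)
lm = simulate_lm(scores=[20.0, 30.0], sigma=1.5, n_draws=5, rng=rng)
```

Because the non-LM path reuses one residual pool for every record, it only makes sense when the homoscedasticity assumption holds; if the errors grow with a predictor, the simulated ranges will be too wide for some records and too narrow for others.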
Configure the tool
- Name results of score simulation: The field name for the generated results. The field name must start with a letter and may contain letters, numbers, and the special characters period (".") and underscore ("_"). Note that R is case-sensitive.
- The number of records to score at a time: To work around R's in-memory processing limitation, the tool can break the input data into chunks and score one chunk at a time. This option sets the maximum number of incoming records in each chunk.
- How many samples from error distribution per iteration: The number of draws from the model's error distribution for each incoming record.
- Set Random Seed: (Optional) Specify a random seed so that the simulated results are reproducible. This option is hidden if there is a seed field in the data to be scored.
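The naming rule for Name results of score simulation can be expressed as a regular expression. This is a minimal sketch, assuming that "letter" means an ASCII letter; the helper `is_valid_result_name` is hypothetical and is not part of Alteryx.

```python
import re

# Assumption: "letter" means an ASCII letter A-Z or a-z.
# First character: a letter; remaining characters: letters, digits, "." or "_".
_NAME_PATTERN = re.compile(r"[A-Za-z][A-Za-z0-9._]*")


def is_valid_result_name(name: str) -> bool:
    """Check a result field name against the documented naming rule."""
    return _NAME_PATTERN.fullmatch(name) is not None


for candidate in ["Sim_Score", "sim.score2", "2score", "_score", "sim-score"]:
    print(candidate, is_valid_result_name(candidate))
# Sim_Score True, sim.score2 True, 2score False, _score False, sim-score False
```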
View the output
- D anchor: The data to be scored, along with the simulated score.
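The configuration options can be combined into one Python sketch showing how chunked scoring, the number of draws per record, and an optional random seed might interact to produce the D output. All names here are hypothetical, the sketch reuses a residual-resampling approach assumed for non-LM models, and it assumes one output row per draw; the actual layout of the D output may differ.

```python
import random
from typing import Callable, Dict, Iterator, List, Optional, Sequence

Record = Dict[str, float]


def chunks(records: Sequence[Record], chunk_size: int) -> Iterator[Sequence[Record]]:
    """Yield consecutive slices holding at most chunk_size records."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    for start in range(0, len(records), chunk_size):
        yield records[start:start + chunk_size]


def simulation_score(
    records: Sequence[Record],
    predict: Callable[[Record], float],
    residuals: Sequence[float],
    result_name: str,
    chunk_size: int,
    n_draws: int,
    seed: Optional[int] = None,
) -> List[Record]:
    """Score records chunk by chunk, adding n_draws sampled errors per record.

    Hypothetical sketch: each draw becomes its own output row that copies the
    input fields and adds the simulated score under result_name.
    """
    rng = random.Random(seed)  # seed=None gives non-reproducible draws
    output: List[Record] = []
    for chunk in chunks(records, chunk_size):
        for record in chunk:
            point = predict(record)
            for _ in range(n_draws):
                row = dict(record)
                row[result_name] = point + rng.choice(residuals)
                output.append(row)
    return output


def predict(record: Record) -> float:
    """Hypothetical model: y = 2x."""
    return 2.0 * record["x"]


records = [{"x": 1.0}, {"x": 2.0}, {"x": 3.0}]
residuals = [-0.5, 0.0, 0.5]  # hypothetical validation errors

out = simulation_score(records, predict, residuals, "Sim_Score",
                       chunk_size=2, n_draws=4, seed=7)
print(len(out))  # 3 records x 4 draws = 12 rows
```

In this sketch, fixing `seed` makes repeated runs return identical simulated scores, while `chunk_size` only changes how many records are processed at once, not the results.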