The Boosted Model tool creates generalized boosted regression models based on Gradient Boosting methods. The models are created by serially adding simple decision tree models to a model ensemble to minimize an appropriate loss function. These models use a method of statistical learning that:
Use the Boosted Model tool for classification, count data, and continuous target regression problems.
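The serial addition of simple trees described above can be illustrated with a minimal sketch. It assumes a single numeric predictor, a squared-error loss, and one-split trees (stumps). The function names (fit_stump, boost, predict, mean_squared_error) are hypothetical and are not the functions the tool calls in R.

```python
# Minimal gradient boosting sketch for squared-error loss with one-split trees.
# Illustrative only: names and defaults are assumptions, not the tool's code.

def fit_stump(xs, residuals):
    """Return (threshold, left_mean, right_mean) minimizing squared error."""
    best = None
    # Exclude the largest value so the right-hand side is never empty.
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    return best[1], best[2], best[3]


def predict_stump(stump, x):
    threshold, left_mean, right_mean = stump
    return left_mean if x <= threshold else right_mean


def boost(xs, ys, n_trees=50, shrinkage=0.1):
    """Serially add stumps, each fitted to the current residuals."""
    base = sum(ys) / len(ys)          # start from the mean of the target
    predictions = [base] * len(ys)
    stumps = []
    for _ in range(n_trees):
        residuals = [y - p for y, p in zip(ys, predictions)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        # Each new tree contributes only a fraction (shrinkage) of its fit.
        predictions = [p + shrinkage * predict_stump(stump, x)
                       for p, x in zip(predictions, xs)]
    return {"base": base, "shrinkage": shrinkage, "stumps": stumps}


def predict(model, x):
    total = model["base"]
    for stump in model["stumps"]:
        total += model["shrinkage"] * predict_stump(stump, x)
    return total


def mean_squared_error(model, xs, ys):
    return sum((y - predict(model, x)) ** 2 for x, y in zip(xs, ys)) / len(ys)


xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [x ** 2 for x in xs]
baseline = boost(xs, ys, n_trees=0)   # mean-only model
boosted = boost(xs, ys, n_trees=50)
```

In this sketch, the boosted ensemble has a lower training error than the mean-only baseline because each added stump reduces the remaining residuals.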
This tool uses the R programming language. Go to Options > Download Predictive Tools to install R and the packages used by the R Tool.
The Boosted Model tool requires an input data stream with:
The function used in model estimation varies based on the input data stream.
Compared to the open source R functions, the RevoScaleR-based function can analyze much larger datasets. However, the RevoScaleR-based function has three drawbacks: it must create an XDF file, which increases overhead; it uses an algorithm that makes more passes over the data, which increases runtime; and it cannot create some model diagnostic outputs.
Columns containing unique identifiers, such as surrogate primary keys and natural primary keys, should not be used in statistical analyses. They have no predictive value and can cause runtime exceptions.
If a field is used as both a predictor and a sample weight, the output weight variable field will be prepended with "Right_".
These options can be used to modify the model settings.
In a case with 5 folds, the data is divided into 5 non-overlapping subsamples and 5 different models are created, each using data from 4 of the subsamples. For each model, the remaining subsample is withheld from model creation and is used to test prediction accuracy.
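The fold scheme described above can be sketched as follows. This is a minimal illustration that assigns records to folds by position. How the tool itself assigns records to folds is not stated here, and the function names are hypothetical.

```python
# Sketch of k-fold splitting: each fold is held out once for testing,
# and the other folds are combined for model creation.
# Assumption: records are assigned to folds round-robin by index.

def k_fold_indices(n_records, n_folds=5):
    """Split record indices into n_folds non-overlapping folds."""
    return [list(range(start, n_records, n_folds)) for start in range(n_folds)]


def train_test_splits(n_records, n_folds=5):
    """Yield (train_indices, test_indices) once per fold."""
    folds = k_fold_indices(n_records, n_folds)
    for held_out in range(n_folds):
        test = folds[held_out]
        train = [i for f, fold in enumerate(folds) if f != held_out for i in fold]
        yield train, test


splits = list(train_test_splits(20, n_folds=5))
```

With 20 records and 5 folds, each of the 5 models in this sketch is built on 16 records and tested on the 4 records that were withheld.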
With a small shrinkage value, you may need to increase the value of Set maximum number of decision trees so that the model can reach the optimal number of trees.
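The relationship between shrinkage and the number of trees can be illustrated with a deliberately simplified calculation. It assumes that every tree fits the current residual exactly, so each tree removes a fraction of the remaining error equal to the shrinkage value. Real trees fit less well, so the counts are only a rough guide, and trees_needed is a hypothetical helper.

```python
# Simplified model: after each tree, the remaining error is multiplied by
# (1 - shrinkage). Count the trees needed to fall to a target fraction.
# Assumption: each tree fits the residual perfectly, which real trees do not.

def trees_needed(shrinkage, tolerance):
    """Number of trees until the remaining error fraction is <= tolerance."""
    if not 0.0 < shrinkage <= 1.0:
        raise ValueError("shrinkage must be in (0, 1]")
    if tolerance <= 0.0:
        raise ValueError("tolerance must be positive")
    remaining, trees = 1.0, 0
    while remaining > tolerance:
        remaining *= 1.0 - shrinkage
        trees += 1
    return trees


larger_shrinkage = trees_needed(0.1, 0.01)
smaller_shrinkage = trees_needed(0.01, 0.01)
```

Under this assumption, reducing the shrinkage value by a factor of ten increases the required number of trees by roughly the same factor, which is why the maximum number of trees may need to be raised.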
Graph resolution: Select the resolution of the graph in dots per inch: 1x (96 dpi); 2x (192 dpi); or 3x (288 dpi). Lower resolution creates a smaller file and is best for viewing on a monitor. Higher resolution creates a larger file with better print quality.
The Boosted Model tool supports Microsoft SQL Server 2016 in-database processing. See In-Database Overview for more information about in-database support and tools.
To access the In-DB version of the Boosted Model tool:
See Predictive Analytics for more about predictive in-database support.
Columns containing unique identifiers, such as surrogate primary keys and natural primary keys, should not be used in statistical analyses. They have no predictive value and can cause runtime exceptions.
Use sampling weights in model estimation: Select a field that weights the importance placed on each record during model estimation.
If a field is used as both a predictor and a sample weight, the output weight variable field will be prepended with "Right_".
These options can be used to modify the model settings.
For a continuous target, minimize a loss function based on the Gaussian distribution.
For a binary categorical target, minimize a loss function based on the Bernoulli distribution.
For a multinomial categorical target, minimize a multinomial logistic loss function, a multinomial generalization of the Bernoulli loss function.
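The three loss functions above can be written out for a single record as a minimal sketch. It assumes the common textbook forms: squared error for the Gaussian case, negative log-likelihood on the log-odds scale for the Bernoulli case, and negative log-softmax for the multinomial case. The exact scaling used by the underlying R functions is not stated here, and the function names are hypothetical.

```python
import math

# Per-record loss functions in their common textbook forms.
# Assumption: constant factors may differ from the tool's implementation.

def gaussian_loss(y, f):
    """Squared error between a continuous target y and a prediction f."""
    return (y - f) ** 2


def bernoulli_loss(y, f):
    """Negative log-likelihood for y in {0, 1}, where f is the log-odds."""
    return math.log1p(math.exp(f)) - y * f


def multinomial_loss(true_class, scores):
    """Negative log-softmax of the true class given one score per class."""
    max_score = max(scores)  # subtract the maximum for numerical stability
    log_norm = max_score + math.log(sum(math.exp(s - max_score) for s in scores))
    return log_norm - scores[true_class]


# With two classes and scores [0, f], the multinomial loss reduces to the
# Bernoulli loss, which is the sense in which it generalizes it.
two_class = multinomial_loss(1, [0.0, 0.7])
binary = bernoulli_loss(1, 0.7)
```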
With a small shrinkage value, you may need to increase the value of Set maximum number of decision trees so that the model can reach the optimal number of trees.
Graph resolution: Select the resolution of the graph in dots per inch: 1x (96 dpi); 2x (192 dpi); or 3x (288 dpi). Lower resolution creates a smaller file and is best for viewing on a monitor. Higher resolution creates a larger file with better print quality.
Connect a Browse tool to each output anchor to view results.
©2018 Alteryx, Inc., all rights reserved. Allocate®, Alteryx®, Guzzler®, and Solocast® are registered trademarks of Alteryx,