The Boosted Model tool creates generalized boosted regression models based on Gradient Boosting methods. The models are created by serially adding simple decision tree models to a model ensemble to minimize an appropriate loss function. These models use a method of statistical learning that:
Use the Boosted Model tool for classification, count data, and continuous target regression problems.
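The serial-ensemble idea described above can be sketched in a few lines of Python. This is a minimal illustration under simplifying assumptions, not the gbm or rxBTrees implementation. It assumes a single numeric predictor, squared-error loss, and one-split trees ("stumps"). The names `fit_stump` and `boost`, and the default values of `n_trees` and `shrinkage`, are hypothetical.

```python
# Gradient boosting sketch for a continuous target with squared-error loss.
# Illustration only: not the gbm or rxBTrees implementation.

def fit_stump(xs, residuals):
    """Fit a one-split decision tree (a "stump") to the current residuals."""
    best = None
    for threshold in sorted(set(xs)):
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        if not right:  # the largest threshold puts every record on the left
            continue
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    if best is None:  # every x is identical, so no split exists
        mean = sum(residuals) / len(residuals)
        return lambda x: mean
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x <= threshold else right_mean


def boost(xs, ys, n_trees=100, shrinkage=0.1):
    """Serially add shrunken stumps, each fit to the residuals of the ensemble so far."""
    base = sum(ys) / len(ys)  # best constant prediction under squared error
    trees = []

    def predict(x):
        return base + shrinkage * sum(tree(x) for tree in trees)

    for _ in range(n_trees):
        # For squared-error loss, the negative gradient is simply the residual.
        residuals = [y - predict(x) for x, y in zip(xs, ys)]
        trees.append(fit_stump(xs, residuals))
    return predict


xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [1, 1, 1, 1, 5, 5, 5, 5]
model = boost(xs, ys)
```

Each added tree corrects part of the error left by the trees before it, and the shrinkage factor controls how large each correction is.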
This tool uses the R programming language. Go to Options > Download Predictive Tools to install R and the packages used by the R Tool.
The Boosted Model tool requires an input data stream with:
The input data can be an Alteryx data stream or XDF metadata stream. An Alteryx data stream uses the open source R gbm function for model estimation. An XDF metadata stream, coming from either an XDF Output Tool or XDF Input Tool, uses the RevoScaleR rxBTrees function for model estimation.
Compared to the open source R functions, the RevoScaleR-based function can analyze much larger datasets. However, the RevoScaleR-based function must create an XDF file, which increases the overhead cost, and cannot create some model diagnostic outputs.
These options are required to generate a boosted model.
Columns containing unique identifiers, such as surrogate primary keys and natural primary keys, should not be used in statistical analyses. They have no predictive value and can cause runtime exceptions.
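Before choosing predictors, it can help to screen for identifier-like columns. The sketch below is a hypothetical heuristic, not an Alteryx feature: it flags string or integer columns in which every value is distinct. Float columns are skipped because continuous predictors are often all-distinct too, so flagged columns still need a human check.

```python
# Hypothetical heuristic for spotting identifier-like columns in a list of records.

def likely_identifier_columns(rows):
    """Return names of string/integer columns whose values are all distinct."""
    if not rows:
        return []
    flagged = []
    for name in rows[0]:  # assumes every row has the same keys
        values = [row[name] for row in rows]
        # Floats are skipped: continuous predictors are often all-distinct too.
        is_key_type = all(isinstance(v, (str, int)) and not isinstance(v, bool)
                          for v in values)
        if is_key_type and len(set(values)) == len(values):
            flagged.append(name)
    return flagged


rows = [
    {"CustomerID": 101, "Region": "East", "Spend": 12.5},
    {"CustomerID": 102, "Region": "West", "Spend": 8.0},
    {"CustomerID": 103, "Region": "East", "Spend": 15.25},
]
```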
If a field is used as both a predictor and a sample weight, the name of the output weight variable field is prefixed with "Right_".
These options can be used to modify the model settings.
In a case with 5 folds, the data is divided into 5 unique subsamples and 5 different models are created, each using data from 4 of the subsamples. For each model, the remaining subsample is withheld from model creation and used to test that model's prediction accuracy.
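The fold logic can be sketched as follows. This is an illustration, not the tool's code: records are assigned to folds round-robin for determinism, whereas an actual implementation would typically assign them at random. The function name `k_fold_splits` is hypothetical.

```python
# k-fold cross-validation split sketch (round-robin fold assignment).

def k_fold_splits(n_records, k=5):
    """Yield (train_indices, test_indices) once per fold."""
    folds = [list(range(i, n_records, k)) for i in range(k)]
    for held_out in range(k):
        test = folds[held_out]  # this subsample is withheld for testing
        # Train on the records from the other k - 1 subsamples.
        train = [idx for f, fold in enumerate(folds) if f != held_out for idx in fold]
        yield train, test


splits = list(k_fold_splits(10, k=5))
```

Every record is used for testing exactly once and for training k - 1 times.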
A small shrinkage value may require increasing Set maximum number of decision trees so that the optimal number of trees falls within the maximum.
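Why smaller shrinkage needs more trees can be seen in an idealized case (an assumption, not how real trees behave): if every tree fits the current residual exactly, a shrinkage of ν leaves a fraction (1 − ν) of the residual after each tree. The residual therefore reaches a fraction `tolerance` of its starting value after about log(tolerance) / log(1 − ν) trees. The function `trees_needed` is hypothetical.

```python
import math


def trees_needed(shrinkage, tolerance=0.01):
    """Trees until the residual shrinks to `tolerance` of its starting size,
    assuming (idealized) that each tree fits the current residual exactly."""
    if not 0 < shrinkage < 1:
        raise ValueError("shrinkage must be between 0 and 1")
    # The residual after m trees is (1 - shrinkage) ** m of the original.
    return math.ceil(math.log(tolerance) / math.log(1 - shrinkage))
```

Under this assumption, shrinkage 0.1 needs 44 trees while shrinkage 0.01 needs 459, roughly ten times as many.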
These options control the settings of the output graph.
Graph resolution: Select the resolution of the graph in dots per inch: 1x (96 dpi); 2x (192 dpi); or 3x (288 dpi). Lower resolution creates a smaller file and is best for viewing on a monitor. Higher resolution creates a larger file with better print quality.
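The file-size difference follows from the arithmetic: pixel dimensions scale linearly with dpi, so the pixel count grows with its square. The sketch below uses a hypothetical 6 × 4 inch graph; the graph size and the function name `graph_pixels` are assumptions.

```python
# Resolution settings from the Graph resolution option.
SCALE_TO_DPI = {"1x": 96, "2x": 192, "3x": 288}


def graph_pixels(width_in, height_in, scale="1x"):
    """Pixel dimensions of a graph of the given size (in inches) at a resolution setting."""
    dpi = SCALE_TO_DPI[scale]
    return round(width_in * dpi), round(height_in * dpi)
```

At 3x, a 6 × 4 inch graph is 1728 × 1152 pixels, nine times the pixel count of the 576 × 384 image produced at 1x.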
The Boosted Model tool has two output anchors:
The Boosted Model tool supports Microsoft SQL Server 2016 in-database processing. See In-Database Overview for more information about in-database support and tools.
To access the In-DB version of the Boosted Model tool:
See Predictive Analytics for more about predictive in-database support.
The Boosted Model In-DB tool requires an input with:
The data can come from an SQL Server in-database data stream. Model estimation uses the rxBTrees function from the RevoScaleR package in Microsoft R Server, which requires both the local machine and the server to be configured with Microsoft R Server so that processing can take place on the database server. Processing on the database server can significantly improve performance.
Columns containing unique identifiers, such as surrogate primary keys and natural primary keys, should not be used in statistical analyses. They have no predictive value and can cause runtime exceptions.
Use sampling weights in model estimation: Select a field whose values weight the importance placed on each record when estimating the model.
If a field is used as both a predictor and a sample weight, the name of the output weight variable field is prefixed with "Right_".
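One common way sampling weights enter model estimation is through a weighted loss, in which each record's error counts in proportion to its weight. The sketch below shows this for squared error. It is an illustration, not the gbm or rxBTrees source, and the function names are hypothetical.

```python
# Weighted squared-error sketch: records with larger weights count more.

def weighted_mean(ys, weights):
    """Best constant prediction under weighted squared error."""
    return sum(w * y for y, w in zip(ys, weights)) / sum(weights)


def weighted_squared_error(ys, preds, weights):
    """Average squared error, with each record scaled by its weight."""
    total = sum(weights)
    return sum(w * (y - p) ** 2 for y, p, w in zip(ys, preds, weights)) / total
```

With weights [1, 1, 2], the third record pulls the starting prediction for targets [1, 2, 3] from 2.0 up to 2.25.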
These options can be used to modify the model settings.
For a continuous target, minimize a loss function based on the Gaussian distribution.
For a binary categorical target, minimize a loss function based on the Bernoulli distribution.
For a multinomial categorical target, minimize a multinomial logistic loss function, which generalizes the Bernoulli loss function to more than two categories.
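The three loss functions listed above can be written out as follows. This is a sketch of the standard forms, up to constant factors, and not the exact code in gbm or RevoScaleR. Here `f` is the model's raw score: the predicted value for a continuous target and the log-odds for a binary target. The function names are hypothetical.

```python
import math


def gaussian_loss(y, f):
    """Squared error for a continuous target."""
    return (y - f) ** 2


def bernoulli_loss(y, f):
    """Negative log-likelihood for y in {0, 1}, with f the log-odds.
    Computes log(1 + e^f) - y*f in a form that avoids overflow for large |f|."""
    return max(f, 0.0) + math.log1p(math.exp(-abs(f))) - y * f


def multinomial_loss(label, scores):
    """Multinomial logistic (softmax) loss; `scores` maps each class to its score."""
    m = max(scores.values())  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(s - m) for s in scores.values()))
    return log_sum_exp - scores[label]
```

With two classes and one score fixed at 0, `multinomial_loss` reduces to `bernoulli_loss`, which is the sense in which it generalizes the Bernoulli loss.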
A small shrinkage value may require increasing Set maximum number of decision trees so that the optimal number of trees falls within the maximum.
Graph resolution: Select the resolution of the graph in dots per inch: 1x (96 dpi); 2x (192 dpi); or 3x (288 dpi). Lower resolution creates a smaller file and is best for viewing on a monitor. Higher resolution creates a larger file with better print quality.
Connect a Browse tool to each output anchor to view results.
©2017 Alteryx, Inc., all rights reserved. Allocate®, Alteryx®, Guzzler®, and Solocast® are registered trademarks of Alteryx,