The Decision Tree tool creates a set of if-then split rules to optimize model creation criteria based on Decision Tree Learning methods. Rule formation is based on the target field type:
Use the Decision Tree tool when the target field is predicted using one or more variable fields, such as a classification or continuous target regression problem.
This tool uses the R programming language. Go to Options > Download Predictive Tools to install R and the packages used by the R Tool.
The Decision Tree tool requires an input with:
The function used in model estimation varies based on the input data stream.
Compared to the open source R functions, the RevoScaleR-based function can analyze much larger datasets. However, the RevoScaleR-based function has three drawbacks: it must create an XDF file, which adds overhead; it uses an algorithm that makes more passes over the data, which increases runtime; and it cannot create some model diagnostic outputs.
These options are required to generate a decision tree.
Columns containing unique identifiers, such as surrogate primary keys and natural primary keys, should not be used in statistical analyses. They have no predictive value and can cause runtime exceptions.
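One way to screen for identifier-like columns before selecting predictors is to flag any column whose values are all distinct. The following is a minimal sketch in plain Python, not part of the tool itself; the function name `find_id_like_columns` and the sample records are hypothetical.

```python
def find_id_like_columns(rows):
    """Return names of columns whose values are all distinct across rows."""
    if not rows:
        return []
    flagged = []
    for name in rows[0]:
        values = [row[name] for row in rows]
        # A column with as many distinct values as rows behaves like a key.
        if len(set(values)) == len(values):
            flagged.append(name)
    return flagged


# Hypothetical sample data for illustration only.
records = [
    {"customer_id": 101, "region": "East", "churned": "yes"},
    {"customer_id": 102, "region": "West", "churned": "no"},
    {"customer_id": 103, "region": "East", "churned": "no"},
]
print(find_id_like_columns(records))
```

This check is only a heuristic: a continuous numeric field can also have all-distinct values, so review flagged columns before excluding them.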
Click Customize to adjust additional settings.
Model: Options that change how the model evaluates data and how it is built.
Choose algorithm: Select the rpart function or the C5.0 function.
rpart: An algorithm based on the work of Breiman, Friedman, Olshen, and Stone; considered the standard. Use rpart if you are creating a regression model or if you need a pruning plot.
Model Type and Sampling Weights: Controls for the type of model based on the target variable and the handling of sampling weights.
If a field is used as both a predictor and a sample weight, the output weight variable field is prepended with "Right_".
Splitting Criteria and Surrogates: Controls for how the model determines a split and how surrogates are used in assessing data patterns.
The splitting criterion when using a Regression model is always Least Squares.
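To illustrate what a least squares split means, the sketch below searches a single numeric predictor for the threshold that minimizes the combined sum of squared errors of the two resulting groups. It is a simplified illustration in plain Python, not the rpart implementation; the function names and the sample data are hypothetical.

```python
def sse(values):
    """Sum of squared deviations from the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_least_squares_split(x, y):
    """Return (threshold, total_sse) for the split x < threshold with the lowest SSE."""
    pairs = sorted(zip(x, y))
    # Start from the unsplit node: no threshold, SSE of all targets.
    best = (None, sse([t for _, t in pairs]))
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold separates equal predictor values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [t for _, t in pairs[:i]]
        right = [t for _, t in pairs[i:]]
        total = sse(left) + sse(right)
        if total < best[1]:
            best = (threshold, total)
    return best


# Hypothetical sample data: two clearly separated groups.
x = [1, 2, 3, 10, 11, 12]
y = [5.0, 6.0, 7.0, 20.0, 21.0, 22.0]
threshold, total = best_least_squares_split(x, y)
print(threshold, total)
```

For this sample, the chosen threshold falls between the two groups of predictor values, where the targets within each group are closest to their own mean.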
The Gini impurity is used.
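As a reference for how Gini impurity is commonly defined (one minus the sum of squared class proportions in a node), here is a minimal sketch in plain Python. It is illustrative only and does not reproduce the tool's internal code; the function names and labels are hypothetical.

```python
from collections import Counter


def gini_impurity(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def weighted_gini(left, right):
    """Size-weighted impurity of the two child nodes produced by a split."""
    n = len(left) + len(right)
    return (len(left) / n) * gini_impurity(left) + (len(right) / n) * gini_impurity(right)


print(gini_impurity(["yes", "yes", "no", "no"]))    # evenly mixed node
print(gini_impurity(["yes", "yes", "yes", "yes"]))  # pure node
```

A split is attractive when the weighted impurity of its child nodes is lower than the impurity of the parent node.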
HyperParameters: Controls for the model's prior distribution.
This option only applies when the input into the tool is an XDF metadata stream. The RevoScaleR function (rxDTree) that implements the scalable decision tree handles numeric variables via an equal interval binning process to reduce computational complexity.
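The general idea of equal interval binning can be sketched as follows: the range of a numeric variable is divided into bins of equal width, and each value is replaced by its bin index. This is a plain Python illustration under the assumption of at least one bin and a non-empty input; it is not the rxDTree implementation, and the function name and sample values are hypothetical.

```python
def equal_interval_bins(values, n_bins):
    """Assign each value to one of n_bins equal-width bins (indices 0..n_bins-1)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0 for _ in values]  # all values identical: single bin
    width = (hi - lo) / n_bins
    # The maximum value would land at index n_bins, so clamp it into the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


# Hypothetical sample data for illustration only.
ages = [18, 22, 35, 41, 58, 63, 78]
print(equal_interval_bins(ages, 3))
```

With fewer distinct values to consider, a tree algorithm has fewer candidate split points to evaluate per variable, at the cost of coarser thresholds.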
C5.0: An algorithm based on the work of Quinlan; use C5.0 if your data is sorted into one of a small number of mutually exclusive classes. Properties that may be relevant to the class assignment are provided, although some data may have unknown or non-applicable values.
Structural Options: Controls for the model's structure. By default, the model is structured as a decision tree.
Detailed Options: Controls for the model's splits and features.
Numerical Hyperparameters: Controls for the model's prior distribution that are based on a numeric value.
Cross-Validation: Controls for customizing a method of validation with efficient use of available information.
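For readers unfamiliar with cross-validation, the sketch below shows one common scheme, k-fold partitioning, in which every row is used for testing exactly once and for training in the remaining folds. It is a generic plain Python illustration and makes no claim about how the tool partitions data; the function name `k_fold_indices` is hypothetical.

```python
def k_fold_indices(n_rows, k):
    """Yield (train_indices, test_indices) for each of k contiguous folds."""
    # Spread any remainder rows across the first folds so sizes differ by at most 1.
    fold_sizes = [n_rows // k + (1 if i < n_rows % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n_rows) if i < start or i >= start + size]
        yield train, test
        start += size


for train, test in k_fold_indices(7, 3):
    print(test, train)
```

In practice, rows are usually shuffled before folds are assigned so that each fold is representative of the whole dataset.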
Plots: Select and configure what graphs appear in the output report.
Display static report: Select to display a summary report of the model from the R output anchor. Selected by default.
Tree Plot: A graph of decision tree variables and branches.
Display tree plot: Click to include a graph of decision tree variables and branches in the model report output.
Graph resolution: Select the resolution of the graph in dots per inch: 1x (96 dpi); 2x (192 dpi); or 3x (288 dpi). Lower resolution creates a smaller file and is best for viewing on a monitor. Higher resolution creates a larger file with better print quality.
Prune Plot: A simplified graph of the decision tree.
Display prune plot: Click to include a simplified graph of the decision tree in the model report output.
Graph resolution: Select the resolution of the graph in dots per inch: 1x (96 dpi); 2x (192 dpi); or 3x (288 dpi). Lower resolution creates a smaller file and is best for viewing on a monitor. Higher resolution creates a larger file with better print quality.
The Decision Tree tool supports Microsoft SQL Server 2016 and Teradata in-database processing. See In-Database Overview for more information about in-database support and tools.
When a Decision Tree tool is placed on the canvas with another In-DB tool, the tool automatically changes to the In-DB version. To change the version of the tool, right-click the tool, point to Choose Tool Version, and click a different version of the tool. See Predictive Analytics for more about predictive in-database support.
Columns containing unique identifiers, such as surrogate primary keys and natural primary keys, should not be used in statistical analyses. They have no predictive value and can cause runtime exceptions.
The weight variable appears in the model call in the output with the string "Right_" prepended to it.
Graph resolution: Select the resolution of the graph in dots per inch: 1x (96 dpi); 2x (192 dpi); or 3x (288 dpi). Lower resolution creates a smaller file and is best for viewing on a monitor. Higher resolution creates a larger file with better print quality.
Connect a Browse tool to each output anchor to view results.
©2018 Alteryx, Inc., all rights reserved. Allocate®, Alteryx®, Guzzler®, and Solocast® are registered trademarks of Alteryx, Inc.