Regression Machine Learning Tool

Use the Regression tool as part of a machine-learning pipeline to identify a trend. The tool provides several algorithms you can use to train a model. The tool also allows you to tune a model using many parameters.

Configure the Tool

This section describes how to configure the Regression tool.

Select Algorithm

Select the algorithm you want to use. You can choose Linear Regression, Decision Tree, or Random Forest.

Configure Parameters

Configure the parameters. Each algorithm has specific parameters. Each algorithm also has both general and advanced parameters. General parameters are integral to creating an accurate model, even for beginners. Advanced parameters might improve accuracy, but require in-depth understanding of what they do.

Refer to the table for each algorithm to see what each parameter does:

Linear Regression

Name Description Options Default
Fit Intercept Decide whether you want the algorithm to calculate the intercept for your linear-regression model. Also known as the “constant,” the intercept is the expected mean value of y where x equals 0.
  • On
  • Off
On
Normalize Decide whether you want the algorithm to normalize your targets. Normalization adjusts your targets in such a way that you can compare them on a common scale with other data, which may help you identify associations in your data.
  • On
  • Off
On
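
To illustrate what Fit Intercept changes, here is a minimal sketch of a one-feature least-squares fit in plain Python. It is not the tool's implementation; the function name `fit_line` and the sample data are hypothetical and chosen only for illustration.

```python
def fit_line(xs, ys, fit_intercept=True):
    """Ordinary least squares for a single feature; returns (slope, intercept)."""
    n = len(xs)
    if fit_intercept:
        mean_x = sum(xs) / n
        mean_y = sum(ys) / n
        sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        sxx = sum((x - mean_x) ** 2 for x in xs)
        slope = sxy / sxx
        # The intercept is the predicted y where x equals 0.
        return slope, mean_y - slope * mean_x
    # With the intercept off, the line is forced through the origin.
    slope = sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)
    return slope, 0.0


# Hypothetical data that lies exactly on y = 2x + 3.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [5.0, 7.0, 9.0, 11.0]

slope_on, intercept_on = fit_line(xs, ys, fit_intercept=True)
slope_off, intercept_off = fit_line(xs, ys, fit_intercept=False)

print(slope_on, intercept_on)    # slope and intercept recovered from the data
print(slope_off, intercept_off)  # slope is distorted because the intercept is fixed at 0
```

With the intercept on, the fit recovers the underlying line. With it off, the slope has to absorb the offset, which is why turning the intercept off usually makes sense only when the data is known to pass through the origin.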

Decision Tree

Name Description Options Default
Bootstrap Bootstrapping, the foundation of bagging, is a method used to sample the dataset for purposes of training. This method involves iteratively creating subsamples of your dataset to simulate new, unseen data, which you can use to improve the generalizability of your model.
  • On
  • Off
On
Criterion Use the Criterion parameter to select a method to measure how well the decision-tree algorithm splits your data into different nodes.
  • Mean Squared Error (MSE)
  • Mean Absolute Error (MAE)
Mean Squared Error (MSE)
Max Depth Max Depth is the longest path from a root to a leaf of a tree. Deeper trees have more splits and capture more information about the data.
  • Unlimited: Nodes expand until all leaf nodes are pure (in other words, consist completely of data that belong to a single class) or until all leaf nodes contain fewer samples than the value you specify in the Min Samples Split parameter.
  • Limited: Limits the expansion to the maximum depth you specify.
Limited: 100
Max Features Max Features sets the maximum number of features your decision tree considers when looking for a best first split.
  • Auto: Evaluate a number of features equal to the total number of features in the dataset.
  • None: Evaluate a number of features equal to the total number of features in the dataset.
  • Square Root: Evaluate a number of features equal to the square root of the total number of features in the dataset.
  • Log2: Evaluate a number of features equal to the binary logarithm of the total number of features.
  • User-Selected Integer: Evaluate a number of features at each split equal to the number you select.
  • User-Selected Fraction: Evaluate a number of features equal to a user-selected fraction of the total number of features.
Auto
Max Leaf Nodes Max Leaf Nodes is the upper limit on the total number of leaf nodes your algorithm can generate. It grows nodes up to the maximum number in a best-first manner. The algorithm determines what nodes are best based on their capacity for impurity reduction. Use the Criterion parameter to specify how you want to measure impurity reduction.
  • Any integer or None
None
Min Impurity Decrease Min Impurity Decrease sets the minimum threshold of impurity reduction required for the decision tree to split into a new node. A split occurs only where it would decrease impurity by an amount equal to or greater than Min Impurity Decrease. Use the Criterion parameter to specify how you want to measure impurity reduction.
  • Any float
0.0
Min Samples Split Min Samples Split sets the minimum threshold of samples required for the decision tree to split into a new node. The algorithm can consider as few as one sample or as many as all samples.
  • Any integer or fraction
Integer: 2
Min Weight Fraction Leaf Min Weight Fraction Leaf is the minimum threshold of weight required for the decision tree to split into a new node. That threshold is equal to the minimum fraction of the total weights for all samples. The decision-tree algorithm assumes equal weights by default.
  • Any float
0.0
Presort Use this parameter to presort the data, which might help the algorithm find best splits faster.
  • On
  • Off
Off
Random Seed Random Seed specifies the starting number for generating a pseudorandom sequence. If you select None, a random-number generator picks a starting number.
  • Seed
  • None
None
Splitter Splitter is the strategy used for splitting at a node. It includes options for the best first split and the best random split. The algorithm determines what nodes are best based on their capacity for impurity reduction.
  • Best: This option requires more computational power and might risk overfitting.
  • Random: This option might find paths through the tree if certain associations have weak signals.
Best
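
The interaction between Criterion, Min Impurity Decrease, and Min Samples Split can be sketched in plain Python. This is a simplified illustration, not the tool's implementation: the function names are hypothetical, impurity is measured on a single node without the whole-dataset weighting a full implementation might apply, and the MAE variant is assumed to measure spread around the median.

```python
def mse(values):
    """Mean squared error of the values around their mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def mae(values):
    """Mean absolute error of the values around their median (an assumption)."""
    ordered = sorted(values)
    mid = len(ordered) // 2
    if len(ordered) % 2:
        median = ordered[mid]
    else:
        median = (ordered[mid - 1] + ordered[mid]) / 2
    return sum(abs(v - median) for v in values) / len(values)


def impurity_decrease(parent, left, right, criterion):
    """Parent impurity minus the size-weighted impurity of the two children."""
    n = len(parent)
    children = (len(left) / n) * criterion(left) + (len(right) / n) * criterion(right)
    return criterion(parent) - children


def should_split(parent, left, right, criterion=mse,
                 min_impurity_decrease=0.0, min_samples_split=2):
    """Allow a split only if the node is large enough and the gain meets the threshold."""
    if len(parent) < min_samples_split:
        return False
    return impurity_decrease(parent, left, right, criterion) >= min_impurity_decrease


# Hypothetical target values at one node and a candidate split of them.
parent = [1.0, 1.0, 5.0, 5.0]
left, right = [1.0, 1.0], [5.0, 5.0]

print(impurity_decrease(parent, left, right, mse))
print(impurity_decrease(parent, left, right, mae))
print(should_split(parent, left, right, min_impurity_decrease=4.5))
```

Raising Min Impurity Decrease or Min Samples Split makes `should_split` reject more candidate splits, which is how these parameters keep a tree smaller.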

Random Forest

Name Description Options Default
Bootstrap Bootstrapping, the foundation of bagging, is a method used to sample the dataset for purposes of training. This method involves iteratively creating subsamples of your dataset to simulate new, unseen data, which you can use to improve the generalizability of your model.
  • On
  • Off
On
Criterion Use the Criterion parameter to select a method to measure how well the random-forest algorithm splits your data into different nodes, which comprise the many different trees in your random forest.
  • Mean Squared Error (MSE)
  • Friedman Mean Squared Error (FMSE)
  • Mean Absolute Error (MAE)
Mean Squared Error (MSE)
Max Depth Max Depth is the longest path from a root to a leaf for each tree in the forest. Deeper trees have more splits and capture more information about the data.
  • Unlimited: Nodes expand until all leaf nodes are pure (in other words, consist completely of data that belong to a single class) or until all leaf nodes contain fewer samples than the value you specify in the Min Samples Split parameter.
  • Limited: Limits the expansion to the maximum depth you specify.
Unlimited
Max Features Max Features sets the maximum number of features each decision tree in the forest considers when looking for a best first split.
  • Auto: Evaluate a number of features equal to the total number of features in the dataset.
  • None: Evaluate a number of features equal to the total number of features in the dataset.
  • Square Root: Evaluate a number of features equal to the square root of the total number of features in the dataset.
  • Log2: Evaluate a number of features equal to the binary logarithm of the total number of features.
  • User-Selected Integer: Evaluate a number of features at each split equal to the number you select.
  • User-Selected Fraction: Evaluate a number of features equal to a user-selected fraction of the total number of features.
Auto
Min Impurity Decrease Min Impurity Decrease sets the minimum threshold of impurity reduction required for a decision tree to split into a new node. So a split occurs where it would decrease impurity by an amount equal to or greater than Min Impurity Decrease. Use the Criterion parameter to specify how you want to measure impurity reduction.
  • Any float
0.0
Min Samples Split Min Samples Split sets the minimum threshold of samples required for the decision tree (in a random forest) to split into a new node. The algorithm can consider as few as one sample or as many as all samples.
  • Any integer or fraction
Integer: 2
Min Weight Fraction Leaf Min Weight Fraction Leaf is the minimum threshold of weight required for a decision tree to split into a new node. That threshold is equal to the minimum fraction of the total weights for all samples. The random-forest algorithm assumes equal weights by default.
  • Any float
0.0
Number of Estimators Number of Estimators is the number of trees you want to create as part of the forest.
  • Any integer
100
Random Seed Random Seed specifies the starting number for generating a pseudorandom sequence. If you select None, a random-number generator picks a starting number.
  • Seed: Select an integer for the random number generator.
  • None: No repeatability.
None
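
To show how Bootstrap, Number of Estimators, and Random Seed relate, here is a minimal sketch in plain Python that draws one bootstrap sample per tree. It is illustrative only and not the tool's implementation; the function names and the toy dataset are hypothetical.

```python
import random


def bootstrap_sample(rows, rng):
    """Draw len(rows) rows with replacement, so some rows repeat and others are left out."""
    return [rng.choice(rows) for _ in rows]


def bootstrap_samples(rows, n_estimators, seed=None):
    """Create one bootstrap sample per tree in the forest.

    With an integer seed the sequence of samples is repeatable.
    With seed=None the generator seeds itself from system entropy or the clock,
    so each run can differ.
    """
    rng = random.Random(seed)
    return [bootstrap_sample(rows, rng) for _ in range(n_estimators)]


# Hypothetical dataset of ten row identifiers.
rows = list(range(10))

run_a = bootstrap_samples(rows, n_estimators=3, seed=42)
run_b = bootstrap_samples(rows, n_estimators=3, seed=42)

print(run_a == run_b)  # the same seed reproduces the same samples
```

Setting a seed is useful when you need to reproduce a model exactly, for example when comparing the effect of a single parameter change between runs.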