Logistic Regression Classifier
The Logistic Regression tool creates a model that estimates the probability that the target (what you want to predict) will be one of two possible outcomes. It does this by relating a binary target variable (such as yes/no or pass/fail) to one or more features. To learn more about the underlying Scikit-learn algorithm, Logistic Regression, or to see a table of the allowable parameter combinations, visit the Scikit-learn documentation.
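For orientation, here is a minimal sketch of the underlying Scikit-learn estimator fitting a binary target. The synthetic dataset is illustrative only; the tool configures this estimator for you.

```python
# Minimal sketch: fit Scikit-learn's LogisticRegression on a synthetic
# binary target and read off the estimated probabilities.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=200, n_features=4, random_state=0)

model = LogisticRegression().fit(X, y)

# predict_proba returns P(outcome 0) and P(outcome 1) per row;
# each pair sums to 1.
print(model.predict_proba(X[:3]))
```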
Before using the tool
Start with an existing workflow. First, clean and prep your dataset. Once your dataset contains only the data relevant to your business use case, start building a pipeline with the Machine Learning tools.
Add the tool
- Click the Classification tool or the Regression tool in the Machine Learning tool palette and drag it to the workflow canvas, connecting it to your dataset.
- In Algorithm, select the algorithm you want to configure.
- Configure the tool.
Configure the tool
Configure the parameters or use the default settings. Parameters are set to ayx-learn defaults to ensure accuracy and reproducibility, but every use case is different and the defaults are not a single best combination for all of them. Understand the parameters before you change them. As a best practice, avoid making assumptions and use a test dataset to assess your model's performance, whatever your objective.
To reset to defaults, click the reset icon. To find out more about a parameter, click the parameter's tooltip.
Multi class default is Auto. Multi class tells the algorithm whether to treat your data as a binary problem (two classes) or a multiclass problem (more than two classes).
- Options:
- Ovr (one vs. rest) - A binary problem is fit for each label. Use Ovr for binary classification, i.e., if you want to know if something is A or B. For example, you want to classify an animal as a dog or a cat.
- Multinomial - The loss minimized is the multinomial loss fit across the entire probability distribution, even when the data is binary. Not available for liblinear. Use Multinomial if you have more than two classes. For example, if you want to classify a species as plant, animal, or other. With Multinomial, the probabilities that the data belongs to each of the classes always add up to 100%. In other words, it could be 55% likely that it's a plant, 40% likely that it's an animal, and 5% likely that it's other.
- Auto - Selects Ovr if the data is binary or if the solver is liblinear. Otherwise, selects Multinomial.
Multi class determines how the decision for the output is made. Consider applying this parameter if you have more than two classes or you want to force a multinomial dataset into Ovr classification.
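As a rough sketch of how these options map onto Scikit-learn's multi_class argument (which recent Scikit-learn releases deprecate; multinomial behavior is then the default for non-liblinear solvers), using the classic three-class iris dataset:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)  # three classes

# Multinomial fits one model over the full probability distribution.
clf = LogisticRegression(multi_class="multinomial", solver="lbfgs",
                         max_iter=200).fit(X, y)

# The three class probabilities in each row always sum to 100%.
print(clf.predict_proba(X[:1]))

# Ovr instead fits one binary problem per class.
ovr = LogisticRegression(multi_class="ovr", solver="liblinear").fit(X, y)
```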
Solver
The recommended solver is liblinear, though it may not be optimal for multinomial datasets.
Solver is the engine that runs while the model trains; it is the "guts" of logistic regression training. Scikit-learn uses the solver to figure out the weights.
Consider the following guidelines when selecting a solver to use:
- Smaller datasets - Use liblinear.
- Larger datasets - Use sag or saga for faster speed.
- Multiclass problems - Only newton-cg, sag, saga, and lbfgs handle multinomial loss. Liblinear is limited to one-versus-rest (Ovr) schemes.
- Penalty L2 - newton-cg, lbfgs, sag, and saga handle the L2 penalty or no penalty at all; liblinear also handles L2.
- Penalty L1 - only liblinear and saga handle L1 penalty.
- Fast convergence - When using sag and saga, fast convergence is only guaranteed on features with approximately the same scale. You can preprocess the data with a scaler from sklearn.preprocessing.
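A sketch of that last guideline, assuming you work in Scikit-learn directly: standardize features before handing them to sag or saga.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=5000, n_features=20, random_state=0)

# saga converges fastest when features share roughly the same scale,
# so standardize them first.
pipe = make_pipeline(StandardScaler(), LogisticRegression(solver="saga"))
pipe.fit(X, y)
print(pipe.score(X, y))
```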
Penalty
Penalty default is L2. Penalty, or regularization, refers to modifying the loss function to penalize certain values of the weights your model is learning.
One of the main errors you want to avoid when training your model is overfitting. One way to address this is to apply a penalty to the weights. Penalty specifies the norm used in the penalization.
- Options:
- L2 (default) - This is the most common type of penalty. L2 results in weights being small but non-zero.
- L1 - Often results in many weights being exactly zero.
In practice, L1 is used much less often than L2; note, however, that L2 is more sensitive to outliers than L1.
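A quick sketch of the difference on synthetic data: with an L1 penalty, many fitted weights land at exactly zero, while L2 keeps them small but non-zero.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# 20 features, only 5 of which carry signal.
X, y = make_classification(n_samples=500, n_features=20, n_informative=5,
                           random_state=0)

l1 = LogisticRegression(penalty="l1", solver="liblinear").fit(X, y)
l2 = LogisticRegression(penalty="l2", solver="liblinear").fit(X, y)

print("zero weights with L1:", np.sum(l1.coef_ == 0))  # typically several
print("zero weights with L2:", np.sum(l2.coef_ == 0))  # typically none
```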
Max iterations default is 100. Max iterations is the maximum number of iterations taken for the solvers to converge. In other words, it caps how many times some of the solvers (newton-cg, sag, and lbfgs) iterate while converging on the right parameters for your dataset.
This parameter has an impact on model accuracy, and a large change in either direction of the default of 100 can hurt. For example, 2 is not enough iterations for the model to learn, but 1000 iterations would likely result in overfitting.
Add more iterations when you think you can squeeze out more accuracy by iterating longer. Use fewer iterations if you think your model is susceptible to overfitting.
Type an integer in the text box or use the arrows to apply an amount.
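To see what too few iterations looks like in Scikit-learn terms (a sketch on synthetic data): the solver stops early and reports that it did not converge.

```python
import warnings
from sklearn.datasets import make_classification
from sklearn.exceptions import ConvergenceWarning
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

# Escalate the convergence warning to an error so we can catch it.
with warnings.catch_warnings():
    warnings.simplefilter("error", ConvergenceWarning)
    try:
        LogisticRegression(max_iter=2).fit(X, y)
    except ConvergenceWarning:
        print("max_iter=2 stopped before converging; raise the limit")

LogisticRegression(max_iter=1000).fit(X, y)  # converges quietly
```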
Regularization tuner default is 1.0. This parameter adjusts regularization strength and must be a positive float. The regularization tuner generalizes the model so that it is not specific to the training data. Apply this parameter when you suspect your model is overfit or isn't performing well (for example, because the regularization was too strong).
Type a number in the text box or use the arrows to apply an amount.
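Assuming the tuner maps to Scikit-learn's C parameter (the inverse of regularization strength), smaller values impose a stronger penalty. A sketch on synthetic data:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# Smaller C = stronger regularization = weights shrunk harder toward zero.
for C in (0.01, 1.0, 100.0):
    clf = LogisticRegression(C=C).fit(X, y)
    print(f"C={C:>6}: largest |weight| = {abs(clf.coef_).max():.3f}")
```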
Problem formulation default is Primal. Sometimes a primal optimization problem can be transformed into a dual, or related, problem that is easier to solve. Dual formulation is only implemented for the L2 penalty with the liblinear solver, and is mainly useful when you have more features than samples.
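A sketch of the dual setting, assuming direct use of Scikit-learn, in the situation where it can pay off:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# More features than samples: the case where the dual formulation helps.
X, y = make_classification(n_samples=50, n_features=200, random_state=0)

# dual=True is only valid with the liblinear solver and the L2 penalty.
clf = LogisticRegression(solver="liblinear", penalty="l2", dual=True)
clf.fit(X, y)
print(clf.score(X, y))
```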
Fit intercept
Fit intercept default is True. Fit intercept specifies if a constant (i.e., bias or intercept) should be added to the decision function.
- Options:
- True - The y-intercept is estimated from the data as part of the model fit.
- False - The y-intercept is set to 0.
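The difference is easy to see in Scikit-learn terms (a sketch on synthetic data):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=200, n_features=4, random_state=0)

with_bias = LogisticRegression(fit_intercept=True).fit(X, y)
without_bias = LogisticRegression(fit_intercept=False).fit(X, y)

print(with_bias.intercept_)     # a learned constant term
print(without_bias.intercept_)  # fixed at 0
```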
Random seed (known as random state in Scikit-learn) is the setting that makes randomness repeatable. It is used to make models reproducible. All algorithms used to build a model rely on randomness in some sense: for example, to split the data into validation sets, for sampling, and for constructing helper functions. Set this parameter when you want to ensure you can reproduce results and that any changes you see come from parameter choices, not from the underlying randomness the algorithm needs.
- Options:
- Random - Select an integer to seed the random number generator.
- None - No seed is set, so results can vary from run to run.
Random seed makes randomness repeatable. Apply it when you want to reuse the same seed for every run so that your results are reproducible.
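A sketch of the idea in Scikit-learn, with arbitrary seed values: fixing every seed makes the run repeatable, so score differences come from parameter choices rather than chance.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, random_state=0)

# random_state fixes both the train/test split and the solver's data
# shuffling (used by the sag, saga, and liblinear solvers).
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=42)
clf = LogisticRegression(solver="saga", random_state=42, max_iter=2000)
clf.fit(X_tr, y_tr)
print(clf.score(X_te, y_te))  # the same number on every run
```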
Class weight default is None. Class weights are used to balance the classes in the dataset when they are unbalanced. This works by assigning a higher weight, or higher importance, to the minority class, which results in improved model performance. All classes have a weight of one unless another weight is assigned.
- Options:
- None: No change to the default. Class weight is one for all classes by default.
- Balanced: Balanced mode fills in the class weights for you. It uses the values of the target variable to automatically adjust class weights. For example, if there are two classes, 0 and 1, and one class has many more samples, Balanced assigns a higher weight to the class with fewer samples.
Class weight is used when there is a class imbalance in the dataset. Examples of class imbalance include fraud detection where a detected fraud event is rare (i.e., the majority of the transactions collected show no detected fraud event), and electronic component failure data where only a few collected transactions show component failure.
When a class imbalance is not accounted for in the training dataset, the algorithm, or model, will learn to classify every transaction as clean (e.g., non-fraudulent, non-failure), because doing so results in extremely high accuracy.
Apply this parameter when there is a clear class imbalance and when the accuracy of a model seems too good to be true.
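A sketch on synthetic, fraud-like data (a 99:1 imbalance), assuming direct Scikit-learn use; the balanced setting typically raises recall on the rare class.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import recall_score

# Roughly 99% of samples in the majority class, mimicking rare fraud.
X, y = make_classification(n_samples=5000, weights=[0.99], random_state=0)

plain = LogisticRegression().fit(X, y)
balanced = LogisticRegression(class_weight="balanced").fit(X, y)

# Recall on the rare class (label 1) usually improves once the
# minority class is weighted up.
print("recall, no weighting :", recall_score(y, plain.predict(X)))
print("recall, balanced     :", recall_score(y, balanced.predict(X)))
```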
Fit intercept is a "synthetic" feature with constant value equal to intercept_scaling is appended to the instance vector. This parameter only applies if one of the following is selected: liblinear or Fit intercept (fit_intercept).
The synthetic feature weight is impacted by Penalty (L1/L2 regularization). An increase in intercept scaling decreases the effect of the penalty on the synthetic feature weight (and therefore on the intercept).
Apply this parameter only when using the liblinear solver with Fit intercept.
Type a number in the text box or use the arrows to apply an amount.
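A sketch of the interaction in Scikit-learn (the values here are arbitrary): a larger intercept scaling leaves the intercept less shrunk by the penalty.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=200, n_features=4, random_state=0)

# intercept_scaling only matters for liblinear with fit_intercept=True.
for scale in (1.0, 10.0):
    clf = LogisticRegression(solver="liblinear", fit_intercept=True,
                             intercept_scaling=scale).fit(X, y)
    print(f"intercept_scaling={scale}: intercept_={clf.intercept_}")
```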
The recommended tolerance is .0001. Tolerance sets the stopping criterion: it tells the model to stop searching for a minimum (or maximum) once the tolerance is achieved, i.e., once the solution is close enough.
Adjusting this parameter is a tradeoff between a model that is accurate and one that is overfit. Setting the tolerance too low for your training data may result in the model being overfit, or too specific, i.e., a model that only fits the training data. If the model is overfit, introducing unseen data causes model accuracy to decrease. Use tolerance to set a stopping criterion and avoid overfitting.
Type a number (e.g., .0001) in the text box or use the arrows to apply an amount.
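As a sketch of the tradeoff in Scikit-learn terms: a looser tolerance stops the solver after fewer iterations, while a tighter one searches longer.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

# n_iter_ reports how many iterations the solver actually used.
for tol in (1e-1, 1e-4):
    clf = LogisticRegression(tol=tol, max_iter=1000).fit(X, y)
    print(f"tol={tol}: iterations used = {clf.n_iter_}")
```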