Problem formulation (Dual)
The default problem formulation is primal. A primal optimization problem can sometimes be transformed into an equivalent dual problem that is easier to solve. The dual formulation is implemented only for the L2 penalty with the liblinear solver.
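As a minimal sketch (dataset and sizes are illustrative), the dual formulation can be selected only together with `penalty="l2"` and `solver="liblinear"`; it is often useful when there are more features than samples:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Illustrative data with more features than samples, where dual can help.
X, y = make_classification(n_samples=50, n_features=200, random_state=0)

# Dual formulation: valid only with penalty="l2" and solver="liblinear".
clf_dual = LogisticRegression(dual=True, penalty="l2", solver="liblinear").fit(X, y)

# Default primal formulation on the same data, for comparison.
clf_primal = LogisticRegression(dual=False, penalty="l2", solver="liblinear").fit(X, y)

print(clf_dual.score(X, y), clf_primal.score(X, y))
```

Other solver or penalty combinations with `dual=True` raise an error, which is why the option is restricted as described above.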
Fit intercept
Fit intercept default is True. Fit intercept specifies whether a constant term (also called a bias or intercept) should be added to the decision function.
- Options:
- True - The intercept is estimated from the training data along with the other coefficients.
- False - The intercept is fixed at 0.
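The two options can be compared directly; in this sketch (data values are illustrative), the fitted model exposes the intercept through the `intercept_` attribute:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Tiny illustrative dataset: one feature, two classes.
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([0, 0, 1, 1])

with_b = LogisticRegression(fit_intercept=True).fit(X, y)
no_b = LogisticRegression(fit_intercept=False).fit(X, y)

print(with_b.intercept_)  # estimated from the data
print(no_b.intercept_)    # fixed at 0.0
```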
Random seed
Random seed (known as random_state in scikit-learn) controls whether randomness is repeatable. It is a parameter used to make models reproducible. Most algorithms use randomness at some point when building a model: for example, when splitting data into validation sets, when sampling, or inside helper routines. Set this parameter when you want to ensure you can reproduce results and that any changes you observe come from parameter choices, not from the underlying randomness of the algorithm.
- Options:
- Random - Select an integer for the random number generator.
- None - No seed is set; results may differ between runs.
Apply this parameter when you want the randomness to be repeatable, so that every run of the same configuration produces the same result.
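Reproducibility can be checked by fitting twice with the same seed; a sketch (the saga solver is chosen here because it uses randomness during optimization, and the data are illustrative):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=100, random_state=0)

# Two fits with the same random_state follow the same random path,
# so the learned coefficients are identical.
a = LogisticRegression(solver="saga", random_state=42, max_iter=300).fit(X, y)
b = LogisticRegression(solver="saga", random_state=42, max_iter=300).fit(X, y)

print(np.allclose(a.coef_, b.coef_))
```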
Class weight
Class weight default is None. Class weights are used to balance the classes in the dataset when they are unbalanced. This works by assigning a higher weight, or higher importance, to the minority class, and it can improve model performance. All classes have a weight of one unless another weight is assigned.
- Options:
- None: No change to the default. Class weight is one for all classes by default.
- Balanced: Balanced mode fills in the class weights for you, using the values of the target variable to adjust them automatically. For example, if there are two classes, 0 and 1, and one class has many more samples than the other, balanced mode assigns a higher weight to the class with fewer samples.
Class weight is used when there is a class imbalance in the dataset. Examples of class imbalance include fraud detection where a detected fraud event is rare (i.e., the majority of the transactions collected show no detected fraud event), and electronic component failure data where only a few collected transactions show component failure.
When a class imbalance is not accounted for in the training dataset, the algorithm, or model, will learn to classify every transaction as clean (e.g., non-fraudulent, non-failing) because doing so still yields extremely high accuracy.
When should I apply it?
Apply this parameter when there is a clear class imbalance and when the accuracy of a model seems too good to be true.
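A fraud-like scenario can be sketched with a synthetic imbalanced dataset (95% of samples in the majority class; all names and sizes are illustrative), comparing default weights against `class_weight="balanced"`:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import recall_score

# Illustrative imbalanced data: roughly 95% class 0, 5% class 1.
X, y = make_classification(n_samples=1000, weights=[0.95, 0.05], random_state=0)

plain = LogisticRegression(max_iter=1000).fit(X, y)
balanced = LogisticRegression(class_weight="balanced", max_iter=1000).fit(X, y)

# Recall on the rare class is the metric that typically benefits
# from balanced weights; accuracy alone can be misleading here.
print(recall_score(y, plain.predict(X)), recall_score(y, balanced.predict(X)))
```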
Intercept scaling
When intercept scaling applies, a "synthetic" feature with constant value equal to intercept_scaling is appended to each instance vector, and the intercept becomes intercept_scaling multiplied by the weight learned for that synthetic feature. This parameter has an effect only when the liblinear solver is used and Fit intercept (fit_intercept) is enabled.
The synthetic feature weight is impacted by Penalty (L1/L2 regularization). An increase in intercept scaling decreases the effect of the penalty on the synthetic feature weight (and therefore on the intercept).
Apply this parameter only when using the liblinear solver with Fit intercept enabled.
Type a number in the text box or use the arrows to adjust the amount.
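A sketch of the effect described above, under strong regularization (the small `C` and the scaling values are illustrative): increasing `intercept_scaling` lessens the penalty's pull on the intercept.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=200, random_state=0)

# Strong regularization (small C) so the penalty's effect on the
# intercept is visible; only liblinear uses intercept_scaling.
small = LogisticRegression(solver="liblinear", fit_intercept=True,
                           intercept_scaling=1.0, C=0.01).fit(X, y)
large = LogisticRegression(solver="liblinear", fit_intercept=True,
                           intercept_scaling=100.0, C=0.01).fit(X, y)

print(small.intercept_, large.intercept_)
```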
Tolerance
The recommended tolerance is 0.0001. Tolerance is the threshold for the stopping criterion: it tells the solver to stop searching for a minimum (or maximum) once improvements fall below the tolerance, i.e., once the solution is close enough.
Adjusting this parameter is a tradeoff between a model that is accurate and one that is overfit. Setting tolerance too low for your training data may result in an overfit model, i.e., one that is too specific and fits only the training data. If the model is overfit, its accuracy decreases on unseen data. Use tolerance to set a stopping criterion and avoid overfitting.
Type a number in the text box or use the arrows to adjust the amount.
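The stopping behavior can be sketched by comparing a loose and a tight tolerance on the same data (values are illustrative); the fitted model's `n_iter_` attribute reports how many iterations the solver actually used:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=500, n_features=20, random_state=0)

# A looser tolerance lets the solver stop earlier; a tighter one
# keeps it iterating toward a more precise optimum.
loose = LogisticRegression(tol=1e-1, max_iter=1000).fit(X, y)
tight = LogisticRegression(tol=1e-6, max_iter=1000).fit(X, y)

print(loose.n_iter_, tight.n_iter_)
```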