IV. Selection of Algorithms
This step is the exciting one: we finally reach the algorithms, though without code or formula deep-dives. Since the tutorial does not aim to explore algorithm internals, it focuses instead on where each algorithm applies, what its shortcomings are, and how to choose among them.
Training models are generally divided into three kinds: supervised learning, unsupervised learning, and reinforcement learning. The tutorial covers the first two, and divides the training algorithms into regression, classification, and clustering. Regression and classification belong to supervised learning; clustering belongs to unsupervised learning.
The supervised learning algorithms suggested in the tutorial:
Linear regression,
Lasso regression,
Ridge regression,
Elastic net regularization,
p.s. logistic regression.
and the ensemble learning (Ensemble learning) algorithms:
An ensemble algorithm combines multiple classifiers into a new classifier; it is also known as a meta-algorithm (meta-algorithm).
Regression trees and classification trees are collectively called CART (Classification and Regression Trees). The regression tree is the decision-tree algorithm for regression tasks, the classification tree the one for classification tasks; both are kinds of decision trees (decision trees).
When the base model is a decision tree, there are generally two ensemble learning algorithms: random forests (Random Forest) and gradient boosted trees (Gradient Boosted Trees).
1. Regression
The most common fitting method for linear regression is least squares (Least squares), whose objective function is:
$$ \min_{\omega}\space\space\space\Vert X\omega-y \Vert_{2}^{2} $$
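As a quick illustration (not from the tutorial; a minimal sketch with synthetic data), the objective above can be minimized directly with NumPy's least-squares solver:

```python
import numpy as np

# Toy data with a known linear relationship: y = 2*x1 + 3*x2
X = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0],
              [2.0, 1.0]])
y = X @ np.array([2.0, 3.0])

# Minimize ||X w - y||_2^2 over w
w, residuals, rank, sv = np.linalg.lstsq(X, y, rcond=None)
print(w)  # recovers approximately [2. 3.]
```

Since the data is noise-free, the solver recovers the true coefficients exactly.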
Although least squares is the most common fitting method, it has two drawbacks: it is prone to overfitting (overfitting), and it cannot easily express non-linear (i.e. discrete) relationships.
(1) How to prevent overfitting? A good approach is to add regularization (regularization). The common regularized variants are: Lasso regression (least squares + L1-norm regularization term), Ridge regression (least squares + L2-norm regularization term), and elastic net regularization (L1 + L2 norm regularization terms). The coefficient on the L1/L2 regularization term is called the penalty factor (penalty term).
Lasso's objective function:
$$ \min_{\omega}\space\space\space\frac{1}{2n}\Vert X\omega-y \Vert_{2}^{2} + \alpha\Vert\omega\Vert_{1} $$
where $\alpha$ is a constant (the penalty factor) and $\Vert\omega\Vert_{1}$ is the $\ell_{1}$ norm of the coefficient vector.
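A minimal scikit-learn sketch (the data here is synthetic, invented for illustration) showing Lasso's characteristic behavior: the L1 penalty drives the coefficients of irrelevant features exactly to zero:

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))
# Only the first two features matter; the other eight are pure noise
y = 3 * X[:, 0] + 2 * X[:, 1] + 0.1 * rng.normal(size=100)

# alpha is the penalty factor on the L1 norm
model = Lasso(alpha=0.5).fit(X, y)
print(model.coef_)  # the noise features' coefficients are shrunk to exactly 0
```

This built-in feature selection is why Lasso is often preferred when many features are suspected to be irrelevant.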
Ridge regression suits situations where the training model has overfit, where there are dummy variable traps (mentioned in the fourth article), and the like. Ridge's minimized objective function:
$$ \min_{\omega}\space\space\space\Vert X\omega-y \Vert_{2}^{2} + \alpha\Vert\omega\Vert_{2}^{2} $$
where $\alpha \geq 0$.
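A hedged sketch of Ridge on synthetic, nearly collinear data (a situation similar to the dummy variable trap); the L2 shrinkage keeps the coefficients small and stable where plain least squares would produce huge, unstable opposite-signed weights:

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
x = rng.normal(size=200)
# Two almost perfectly collinear columns
X = np.column_stack([x, x + 1e-8 * rng.normal(size=200)])
y = x

# The L2 penalty splits the weight evenly between the duplicated columns
model = Ridge(alpha=1.0).fit(X, y)
print(model.coef_)  # roughly [0.5, 0.5]
```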
Elastic net's objective function:
$$ \min_{\omega}\space\space\space\frac{1}{2n}\Vert X\omega-y \Vert_{2}^{2} + \alpha\rho\Vert\omega\Vert_{1} + \frac{\alpha(1-\rho)}{2}\Vert\omega\Vert_{2}^{2} $$
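In scikit-learn's `ElasticNet`, `alpha` plays the role of $\alpha$ and `l1_ratio` the role of $\rho$ in the objective above. A minimal sketch on the same kind of synthetic sparse data:

```python
import numpy as np
from sklearn.linear_model import ElasticNet

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))
y = 3 * X[:, 0] + 2 * X[:, 1] + 0.1 * rng.normal(size=100)

# alpha = overall penalty factor; l1_ratio = rho, the L1/L2 mix
model = ElasticNet(alpha=0.1, l1_ratio=0.5).fit(X, y)
print(model.coef_)  # large coefficients survive, noise features are suppressed
```

Setting `l1_ratio=1.0` recovers Lasso; `l1_ratio=0.0` is close to Ridge.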
2. Classification
Logistic regression (logistic regression) is linear regression's counterpart for classification tasks.
L2-norm logistic regression model:
$$ \min_{\omega,c}\space\space\space\frac{1}{2}\omega^{T}\omega + C\sum_{i=1}^{n}\log(e^{-y_{i}(X_{i}^{T}\omega + c)} + 1) $$
L1-norm logistic regression model:
$$ \min_{\omega,c}\space\space\space\Vert\omega\Vert_{1} + C\sum_{i=1}^{n}\log(e^{-y_{i}(X_{i}^{T}\omega + c)} + 1) $$
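Both variants are available through scikit-learn's `LogisticRegression`, whose `penalty` parameter selects the norm and whose `C` parameter is the constant $C$ in the objectives above (a minimal sketch on a generated dataset):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=200, n_features=5, random_state=0)

# C is the weight on the loss term, i.e. inverse regularization strength
l2_clf = LogisticRegression(penalty="l2", C=1.0).fit(X, y)
# L1 requires a solver that supports it, e.g. liblinear
l1_clf = LogisticRegression(penalty="l1", C=1.0, solver="liblinear").fit(X, y)
print(l2_clf.score(X, y), l1_clf.score(X, y))
```

As with Lasso, the L1 variant tends to produce sparse coefficient vectors.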
3. Ensemble Learning
1. Random forest is an ensemble learning algorithm based on the bagging idea;
2. Gradient boosted trees are an ensemble learning algorithm based on the boosting idea.
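The two ideas can be compared side by side in scikit-learn (a sketch on a generated dataset, not from the tutorial): bagging trains independent trees on bootstrap samples and averages them, while boosting adds trees sequentially, each correcting the current ensemble's errors:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=400, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Bagging: each tree sees a bootstrap sample; predictions are averaged
rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)
# Boosting: trees are fit one after another on the residual errors
gb = GradientBoostingClassifier(random_state=0).fit(X_tr, y_tr)
print(rf.score(X_te, y_te), gb.score(X_te, y_te))
```

Bagging mainly reduces variance, boosting mainly reduces bias, which is a useful rule of thumb when choosing between the two.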
Resources
Concept and terminology references:
https://www.zhihu.com/question/23194489
https://www.zhihu.com/question/269063159
https://www.zhihu.com/question/28221429
https://www.zhihu.com/question/20473040
https://elitedatascience.com/algorithm-selection
Fitting methods and formula references:
http://scikit-learn.org/stable/modules/linear_model.html
https://en.wikipedia.org/wiki/Lasso_(statistics)
https://en.wikipedia.org/wiki/Tikhonov_regularization
Explanations of the two ensemble learning ideas:
https://www.zhihu.com/question/29036379
https://www.bbsmax.com/A/lk5axwNJ1O/
http://scikit-learn.org/stable/modules/ensemble.html#classification
Kaggle Machine Learning Tutorial Study (v)