1. Individual Learners and Ensembles
Ensemble learning accomplishes learning tasks by building and combining multiple learners; such systems are sometimes referred to as multi-classifier systems.
The general structure of ensemble learning is shown in the figure below:
A set of "individual learners" is first generated and then combined with some strategy. Individual learners are usually produced by applying an existing learning algorithm to the training data, e.g. a decision tree or a BP neural network.
Ensembles are divided into "homogeneous" and "heterogeneous": an ensemble whose individual learners are all of the same type is homogeneous; otherwise it is heterogeneous.
The result of an ensemble is typically produced by voting, so to obtain a good ensemble the individual learners should be "good and different", that is, each learner should have a certain accuracy, and the learners should be diverse.
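As a small numeric illustration of why "good and different" matters (a hypothetical sketch, not from the text): suppose three classifiers each have accuracy 0.7 and make errors independently of one another, and we combine them by majority vote. The ensemble's accuracy is the probability that at least two of the three are correct.

```python
from itertools import product

# Three independent classifiers, each correct with probability p = 0.7,
# combined by majority vote.
p = 0.7

# Probability that at least 2 of the 3 classifiers are correct.
ensemble_acc = sum(
    (p if c1 else 1 - p) * (p if c2 else 1 - p) * (p if c3 else 1 - p)
    for c1, c2, c3 in product([True, False], repeat=3)
    if c1 + c2 + c3 >= 2
)
print(round(ensemble_acc, 3))  # 0.784, better than any single classifier
```

If the three classifiers were identical rather than diverse (errors perfectly correlated), the vote would gain nothing, which is exactly why diversity is required alongside accuracy.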
According to how the individual learners are generated, current ensemble learning methods can be broadly divided into two categories: sequential methods, in which strong dependencies exist between the individual learners so they must be generated serially, and parallel methods, in which no strong dependencies exist so the learners can be generated simultaneously. The former is represented by Boosting, the latter by Bagging and Random Forest.

2. Boosting
Boosting is a family of algorithms that can promote a weak learner to a strong learner. Its members share a similar working mechanism: first train a base learner from the initial training set; then adjust the distribution of the training samples according to the performance of that base learner, so that samples mishandled by the previous base learners receive more attention; then train the next base learner on the adjusted sample distribution. This repeats until the number of base learners reaches a pre-specified value T, at which point the T base learners are combined with weights.

2.1 AdaBoost and Its Derivation
The most famous algorithm in the Boosting family is AdaBoost. It has several derivations; the one that is easiest to understand is based on the additive model, a linear combination of base learners
H(x) = \sum_{t=1}^{T} \alpha_{t} h_{t}(x)
trained to minimize the exponential loss function. Setting the partial derivative of the exponential loss with respect to H(x) equal to zero gives:

H(x) = \frac{1}{2}\ln\frac{P(f(x)=1\mid x)}{P(f(x)=-1\mid x)}
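The step leading to this closed form can be spelled out. For a fixed x, writing the exponential loss in terms of the two class probabilities and differentiating:

```latex
\ell_{\exp}(H \mid x)
  = e^{-H(x)}\,P(f(x)=1 \mid x) + e^{H(x)}\,P(f(x)=-1 \mid x)

\frac{\partial \ell_{\exp}(H \mid x)}{\partial H(x)}
  = -e^{-H(x)}\,P(f(x)=1 \mid x) + e^{H(x)}\,P(f(x)=-1 \mid x) = 0
```

Solving the second equation for H(x), i.e. e^{2H(x)} = P(f(x)=1\mid x)/P(f(x)=-1\mid x), yields the expression above.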
Therefore:

sign(H(x)) = sign\left(\frac{1}{2}\ln\frac{P(f(x)=1\mid x)}{P(f(x)=-1\mid x)}\right) = \arg\max_{y\in\{-1,1\}} P(f(x)=y\mid x)
This means that sign(H(x)) achieves the Bayes optimal error rate; in other words, if the exponential loss is minimized, the classification error rate is also minimized. Because the exponential loss function is continuously differentiable, we use it as a surrogate for the 0/1 loss as the optimization target.
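A quick numeric check of this claim (the variable names are illustrative, not from the text): for each candidate posterior p1 = P(f(x)=1|x), the minimizer H(x) = ½ ln(p1/(1-p1)) should have the same sign as the Bayes-optimal prediction.

```python
import math

# For several posteriors p1 = P(f(x)=1|x), verify that the sign of the
# exponential-loss minimizer H(x) = 0.5*ln(p1/(1-p1)) matches the
# Bayes-optimal class argmax_y P(f(x)=y|x).
for p1 in (0.1, 0.4, 0.6, 0.9):
    H = 0.5 * math.log(p1 / (1 - p1))
    bayes = 1 if p1 > 0.5 else -1   # Bayes-optimal prediction
    pred = 1 if H > 0 else -1       # prediction from sign(H(x))
    assert pred == bayes
    print(p1, round(H, 3), pred)
```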
In the AdaBoost algorithm, the first base classifier h_1 is obtained by applying the base learning algorithm to the initial data distribution; thereafter, each iteration generates a base classifier h_t and its weight \alpha_t.
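The iterative scheme described above can be sketched end to end. The following is a minimal from-scratch AdaBoost on one-dimensional data with threshold "decision stumps" as base learners; all function and variable names are illustrative, and it is a sketch of the standard algorithm rather than a production implementation.

```python
import numpy as np

def train_adaboost(X, y, T=10):
    """Minimal AdaBoost sketch with 1-D threshold stumps.
    X: (n,) feature values; y: (n,) labels in {-1, +1}."""
    n = len(X)
    D = np.full(n, 1.0 / n)                     # initial distribution D_1
    stumps, alphas = [], []
    for _ in range(T):
        # base learning algorithm: pick the stump with lowest weighted error
        best = None
        for thr in X:
            for pol in (1, -1):
                pred = np.where(X > thr, pol, -pol)
                err = D[pred != y].sum()
                if best is None or err < best[0]:
                    best = (err, thr, pol, pred)
        err, thr, pol, pred = best
        if err >= 0.5:                          # no better than chance: stop
            break
        err = max(err, 1e-12)                   # avoid log(0) on perfect fits
        alpha = 0.5 * np.log((1 - err) / err)   # weight alpha_t of h_t
        D *= np.exp(-alpha * y * pred)          # re-weight the samples
        D /= D.sum()                            # renormalise to a distribution
        stumps.append((thr, pol))
        alphas.append(alpha)
    return stumps, alphas

def predict(X, stumps, alphas):
    """Weighted combination H(x) = sum_t alpha_t h_t(x), then sign."""
    H = np.zeros(len(X))
    for (thr, pol), a in zip(stumps, alphas):
        H += a * np.where(X > thr, pol, -pol)
    return np.where(H >= 0, 1, -1)

# Toy data: label is +1 exactly when the feature exceeds 5.
X = np.array([1.0, 2.0, 3.0, 4.0, 6.0, 7.0, 8.0, 9.0])
y = np.array([-1, -1, -1, -1, 1, 1, 1, 1])
stumps, alphas = train_adaboost(X, y, T=5)
print((predict(X, stumps, alphas) == y).mean())  # expect 1.0 on this toy set
```

Note how the sample re-weighting step `D *= np.exp(-alpha * y * pred)` increases the weight of misclassified samples (where `y * pred == -1`), which is exactly the "more attention to mishandled samples" mechanism described above.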