When making an important decision, we may consult multiple experts rather than rely on one person's opinion. The same idea applies to machine learning, and it is the idea behind meta-algorithms.
A meta-algorithm is a way to combine other algorithms, and one of the most popular meta-algorithms is AdaBoost. Some people consider AdaBoost the best supervised learning method, making it one of the most powerful tools in the machine learning toolkit.
The general structure of ensemble learning (a meta-algorithm) is to create a set of "individual learners" and then combine them with some strategy. An individual learner is usually generated from the training data by an existing learning algorithm.
According to how the individual learners are generated, current ensemble learning methods can be broadly divided into two categories:
1. Methods in which there are strong dependencies between individual learners, so the learners must be generated serially; the representative example is boosting, of which AdaBoost is the most popular version.
2. Methods in which there are no strong dependencies between individual learners, so the learners can be generated in parallel; the representative examples are bagging and random forests.
AdaBoost
Advantages: low generalization error, easy to implement, applicable to most classifiers, no parameters to tune.
Disadvantages: sensitive to outliers.
Applicable data types: numeric and nominal values.
Bagging: building classifiers by random resampling of the data
The bootstrap aggregating method, abbreviated as bagging, is based directly on bootstrap sampling.
Given a dataset containing m samples, we randomly draw one sample into the sample set and then put it back into the original dataset, so that it may be selected again on the next draw. Repeating this random sampling operation m times yields a sample set of m samples. Repeating the whole selection process T times on the original dataset yields T new datasets, each the same size as the original. After the T new datasets are built, a learning algorithm is applied to each of them to obtain T classifiers. To classify new data, we apply all T classifiers and take the majority class among their votes as the final classification result (all weights equal).
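The procedure above can be sketched in a few lines of numpy. This is a minimal illustration, not a production implementation: the function names (`bagging_train`, `bagging_predict`, `centroid_learner`) and the toy nearest-centroid base learner are assumptions chosen for the example, not part of any particular library.

```python
import numpy as np

def bagging_train(X, y, base_learner, T, seed=None):
    """Train T classifiers, each on a bootstrap sample of (X, y)."""
    rng = np.random.default_rng(seed)
    m = len(X)
    classifiers = []
    for _ in range(T):
        # Draw m indices with replacement: some rows repeat, others are left out.
        idx = rng.integers(0, m, size=m)
        classifiers.append(base_learner(X[idx], y[idx]))
    return classifiers

def bagging_predict(classifiers, X):
    """Majority vote with equal weights over the T classifiers."""
    votes = np.array([clf(X) for clf in classifiers])  # shape (T, n_queries)
    # For each query point, take the most common predicted label.
    return np.array([np.bincount(col).argmax() for col in votes.T])

def centroid_learner(X, y):
    """Toy base learner: predict the class whose mean (centroid) is closest."""
    classes = np.unique(y)
    centroids = {c: X[y == c].mean(axis=0) for c in classes}
    def predict(Xq):
        d = np.stack([np.linalg.norm(Xq - centroids[c], axis=1) for c in classes])
        return classes[d.argmin(axis=0)]
    return predict
```

For example, with two well-separated clusters labeled 0 and 1, training `T = 5` learners on bootstrap samples and voting reproduces the correct labels for points near each cluster.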
Boosting
Boosting is a technique very similar to bagging: both combine multiple classifiers of the same type.
In boosting, however, the classifiers are obtained through serial training, and each new classifier is trained according to the performance of the classifiers trained so far. Boosting obtains new classifiers by concentrating on the data that the existing classifiers have misclassified.
The boosting classification result is a weighted sum over all the classifiers. Whereas the classifier weights in bagging are equal, the weights in boosting are not: each weight reflects how successful its corresponding classifier was in the previous round of iteration.
We now introduce AdaBoost.