Machine Learning Classic Algorithms and Python Implementation -- Meta-Algorithm, AdaBoost


Section I, a brief introduction to meta-algorithms

For rare and difficult cases, a hospital organizes a panel of experts to hold a clinical consultation, analyze the case, and reach a conclusion. As with such panel consultations, aggregating many individual opinions is often better than a single person's decision. Machine learning has absorbed the same idea, expressed by the proverb "three cobblers with their wits combined equal Zhuge Liang" (roughly, "two heads are better than one"): this is the idea of the meta-algorithm. Meta-algorithms, also called ensemble methods, combine other algorithms to form a better algorithm. The combination can take several forms: integrating different algorithms, applying different algorithms (or the same algorithm) to different parts of the data set, or combining the same algorithm trained under different settings.

The idea of the meta-algorithm is connected to a question about the equivalence of weak and strong learning in the PAC (Probably Approximately Correct) learning model: given any weak learning algorithm, can it be boosted into a strong learning algorithm? If the two are equivalent, we only need to boost a weak learning algorithm into a strong one, instead of searching for a strong learning algorithm that is hard to obtain directly. Theory shows that, as the number of weak classifiers tends to infinity, the error rate of their combined strong classifier tends to zero.

Weak learning algorithm --- a learning algorithm whose error rate is less than 1/2 (that is, whose accuracy is only slightly better than random guessing).

Strong learning algorithm --- a learning algorithm that achieves high accuracy and completes within an acceptable amount of time.

This article introduces several important methods for combining multiple classifiers into one classifier: the bootstrapping method, the bagging method, and the boosting algorithm.

1) Bootstrapping:

i) Repeatedly draw n samples from the sample set D; the resampled set may contain duplicate values, while some values of the original set may be missing (a resampling sketch follows this list).

ii) Run the statistical learning procedure on each resampled set to obtain a hypothesis Hi.

iii) Combine these hypotheses to form the final hypothesis Hfinal.

iv) Use the final hypothesis for the specific classification task.
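As a minimal illustration of step i), the resampling itself can be written with numpy; the function name and arrays here are illustrative and are not part of the package described later:

    import numpy as np

    def bootstrap_sample(X, y):
        # X, y are numpy arrays. Draw n indices with replacement: duplicates
        # are expected, and some original samples will be missing from the
        # resampled set.
        n = len(X)
        idx = np.random.randint(0, n, size=n)
        return X[idx], y[idx]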

2) Bagging method

i) Train the classifiers: draw n' < N samples from the overall sample set and train a classifier Ci on each sampled set; many sampling schemes can be used.

ii) The classifiers vote: the final result is the winner of the vote, with every classifier given equal weight (a minimal sketch follows this list).
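A small sketch of the equal-weight vote in step ii), assuming each trained classifier exposes a predict() method that returns labels in {-1, +1} (these names are illustrative):

    import numpy as np

    def bagging_predict(classifiers, X):
        # Every classifier casts one vote of equal weight; the majority wins.
        votes = np.sum([clf.predict(X) for clf in classifiers], axis=0)
        return np.where(votes >= 0, 1, -1)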

3) Boosting

Boosting is a technique similar to bagging in that the multiple classifiers it uses are all of the same type. Unlike bagging, however, the classifiers are obtained by serial training: each new classifier is trained with reference to the performance of the classifiers already trained, concentrating on the data those classifiers have misclassified. The boosting result is a weighted sum of all the classifiers; the classifier weights are not equal, and each weight reflects how well its classifier performed in the previous iteration. There are many boosting algorithms; AdaBoost (Adaptive Boosting) is one of the most popular and, together with SVM, has been called one of the most powerful learning algorithms in machine learning.

AdaBoost is an iterative algorithm. Its core idea is to train M weak classifiers on the same training set, assign each weak classifier a different weight, and then assemble the weak classifiers into a stronger final classifier. The detailed procedure of the AdaBoost algorithm is described in this article.

Section II, the AdaBoost algorithm

(i) Understanding AdaBoost

The AdaBoost algorithm comes in two variants, AdaBoost.M1 and AdaBoost.M2. AdaBoost.M1 is what is usually called discrete AdaBoost, and AdaBoost.M2 is a generalization of M1. One empirical conclusion about AdaBoost is that when the weak classifier is a simple classification method, boosting is clearly better than bagging; when the weak classifier is C4.5, boosting is still better than bagging, but not obviously so. Later, researchers proposed the AdaBoost.MH and AdaBoost.MR algorithms to handle multi-label problems. One form of AdaBoost.MH is known as the real boost algorithm: the weak classifier outputs a confidence value whose range is all of R, the weights are adjusted accordingly, and the AdaBoost procedure then produces the strong classifier.

In contrast, discrete AdaBoost means that the output of the weak classifier is restricted to {-1, +1}; the weights are adjusted accordingly, and the AdaBoost procedure then produces the strong classifier. This article explains the two-class AdaBoost algorithm; for the other variants, refer to 'AdaBoost principle, algorithm and application'.

Assume a two-class classification problem, where X denotes the sample space and Y = {-1, +1} denotes the set of class labels. Let S = {(xi, yi) | i = 1, 2, ..., m} be a training set, where xi ∈ X and yi ∈ Y. Assume further that the two classes (class 1 and class -1) are evenly distributed, so the classification threshold can be set to 0. In practice, training data are often unbalanced, and an algorithm (such as one based on the ROC curve) is needed to choose the optimal threshold. The AdaBoost algorithm learns a classifier Y_M(x) composed of M weak classifiers. To classify a new data point x, compute Y_M(x): if Y_M(x) is less than 0, assign x to class -1; if Y_M(x) is greater than 0, assign x to class +1. A threshold of 0 corresponds to the uniform-distribution case; for unbalanced distributions, the optimal threshold is determined by methods such as the ROC curve.
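As a small illustration (not the package's code), the decision rule for the balanced case is just a sign test against 0, with the threshold left adjustable for the unbalanced case:

    import numpy as np

    def decide(scores, threshold=0.0):
        # scores holds Y_M(x) for each sample, the weighted sum output by the
        # strong classifier; class +1 above the threshold, class -1 below it.
        return np.where(scores > threshold, 1, -1)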


Basic process: train a basic classifier (weak classifier) on a sequence of different training sets, then combine them to form a stronger final classifier (strong classifier). The different training sets are produced by adjusting the weight assigned to each sample in the training data. After each round of training, the sample weights are updated according to whether each sample was classified correctly and to the overall accuracy of the previous round. The reweighted data are handed to the next classifier for training, and the classifiers obtained in each round are finally combined into the decision classifier.

Each weak classifier can be any machine learning algorithm, such as logistic regression, SVM, or a decision tree.

AdaBoost has many advantages:

1) AdaBoost achieves high classification accuracy.

2) Sub-classifiers can be constructed by a variety of methods; the AdaBoost algorithm only provides the framework.

3) When simple classifiers are used, the results are interpretable, and the weak classifiers are extremely simple to construct.

4) It is simple and requires no feature selection.

5) It is relatively resistant to overfitting.

(ii) The AdaBoost algorithm process

The complete AdaBoost algorithm is as follows. The total number of training samples is N, and M is the number of weak classifiers obtained when the iteration stops (that is, when the cumulative training error rate reaches 0 or the maximum number of iterations is reached).

Given a training data set T = {(x1, y1), (x2, y2), ..., (xN, yN)}, where each instance xi belongs to the instance space X and each label yi belongs to the label set {-1, +1}, the purpose of AdaBoost is to learn a series of weak classifiers (basic classifiers) from the training data and then combine these weak classifiers into a strong classifier. The process is as follows:

Initially, every sample has the same weight (1/N), and a basic classifier h1(x) is trained under this sample distribution. For samples misclassified by h1(x), the corresponding weights are increased; for correctly classified samples, the weights are decreased. The misclassified samples are thus emphasized, and a new sample distribution is obtained. At the same time, h1(x) is given a weight that reflects the importance of this basic classifier: the fewer its mistakes, the larger its weight. Under the new sample distribution, another basic classifier h2(x) is trained and given its own weight. Repeating this cycle M times yields M basic classifiers and their corresponding weights. Finally, the M basic classifiers are combined by a weighted sum, giving the desired strong classifier Y_M(x). The iteration stops when the cumulative classification error rate on the training samples reaches 0.0 or when the maximum number of iterations is reached.

(i) Initialize the weight distribution of the training data; at the very beginning, every training sample is given the same weight, 1/N.


(ii) Iterate for multiple rounds; the iteration stops when the cumulative classification error rate on the training samples reaches 0.0 or when the maximum number of iterations L is reached. Let m = 1, 2, ..., M denote the round of the iteration, that is, how many weak classifiers have been obtained, with M <= L.

a. Use the training data set with weight distribution Dm to learn a basic classifier:

             G_m(x): \mathcal{X} \rightarrow \{-1, +1\}

b. Calculate the classification error rate of Gm(x) on the training data set:

             e_m = P(G_m(x_i) \neq y_i) = \sum_{i=1}^{N} w_{m,i} \, I(G_m(x_i) \neq y_i)

From the formula above, the error rate of Gm(x) on the training data set is the sum of the weights of the samples misclassified by Gm(x). This is where the training-sample weights take effect: they all point the current weak classifier toward the previous mistakes. Raising the weights of misclassified samples raises their "status" for the next classifier to learn (taking the single-layer decision tree as an example, each round selects the decision stump with the smallest weighted error em under the current training weights). Conversely, if a weak classifier keeps misclassifying the emphasized samples, its error rate em is larger, and as a result its weight in the final combined classifier is lower: good classifiers receive large overall weights and poor classifiers receive small ones.
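In code, the weighted error rate is simply the sum of the weights of the misclassified samples; a small sketch with illustrative names:

    import numpy as np

    def weighted_error(D, y_true, y_pred):
        # D is the current sample-weight vector (it sums to 1.0); only the
        # weights of wrongly classified samples contribute to e_m.
        return float(np.sum(D * (y_pred != y_true)))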

c. Calculate the weight coefficient of Gm(x); αm denotes the importance of Gm(x) in the final classifier (purpose: to obtain the weight of this basic classifier in the final classifier):

             \alpha_m = \frac{1}{2} \ln\frac{1 - e_m}{e_m}

From this formula, when em <= 1/2 (in the two-class AdaBoost algorithm em will not be greater than 1/2), we have αm >= 0, and αm increases as em decreases, which means that the smaller a sub-classifier's classification error rate, the greater its role in the final classifier.

In addition, if a classifier's classification error rate is 0, computing αm would divide by 0, so this boundary case must be handled. In that case the error rate can be set to a very small number, for example 1e-16, depending on the data set. Observing the sample-weight update shows that when there are no misclassified samples, no sample weight is adjusted further and the weights remain equal. The weak classifier's weight alpha will of course be large, but because the algorithm does not stop, any other weak classifier that also reaches a training error rate of 0 will likewise receive a large weight, which avoids the situation where a single weak classifier completely determines the strong classifier. And if the very first weak classifier already has an error rate of 0, the whole classification is finished, so giving it a large weight alpha does no harm. The following modification is used:

alpha = float(0.5 * log((1.0 - error) / max(error, 1e-16)))

d. Update the weight distribution of the training data set (purpose: to obtain a new sample weight distribution) for the next iteration. The weights of the samples misclassified by the basic classifier Gm(x) are increased, while the weights of the correctly classified samples are decreased. In this way, the AdaBoost algorithm raises the "status" of the samples that are harder to classify.

Zm is a normalization factor chosen so that the weights sum to 1.0 and the vector D remains a probability distribution. The update and Zm are defined as

             D_{m+1} = (w_{m+1,1}, \ldots, w_{m+1,N}), \quad w_{m+1,i} = \frac{w_{m,i}}{Z_m} \exp(-\alpha_m y_i G_m(x_i)), \quad Z_m = \sum_{i=1}^{N} w_{m,i} \exp(-\alpha_m y_i G_m(x_i))
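A minimal sketch of this update step (the variable names are mine, not the package's):

    import numpy as np

    def update_weights(D, alpha, y_true, y_pred):
        # Misclassified samples (y_true * y_pred == -1) are scaled by exp(+alpha),
        # correctly classified ones by exp(-alpha); dividing by Z_m (the sum)
        # keeps D a probability distribution.
        D_new = D * np.exp(-alpha * y_true * y_pred)
        return D_new / D_new.sum()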

(iii) Combine the weak classifiers to obtain the final classifier, as follows:

                 f(x) = \sum_{m=1}^{M} \alpha_m G_m(x), \qquad Y_M(x) = G(x) = \mathrm{sign}(f(x))

(iii) Python implementation of the single-layer decision tree AdaBoost algorithm

A single-layer decision tree (decision stump, also called a decision tree stump) is a simple decision tree with only one split: it makes its classification decision based on just a single feature of the sample. The single-layer decision tree is the most popular weak classifier in the AdaBoost algorithm.

AdaBoost combines a number of different decision trees in a non-random way and shows remarkable performance. First, the accuracy of the decision trees is greatly improved and can be comparable with an SVM. Second, it is fast, and basically no parameters need tuning. Third, it hardly overfits. In this section we implement the AdaBoost algorithm with multiple single-layer decision trees. It is worth noting that the feature each single-layer decision tree uses for its decision is chosen as the best among the sample's n features at that round (that is, feature selection happens at the level of each stump); the single-layer decision trees are completely independent of each other, and several of them may well split on the same sample feature, rather than forming a chain over distinct sample features. A minimal self-contained sketch of this scheme is given below.
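The learning package referenced below is not reproduced in this article, so the following is a compact, self-contained sketch of the same scheme (decision stumps as the weak classifier); the function and variable names are my own and will differ from the package's:

    import numpy as np

    def stump_classify(X, dim, thresh, ineq):
        # Classify every sample on a single feature against a threshold.
        pred = np.ones(X.shape[0])
        if ineq == 'lt':
            pred[X[:, dim] <= thresh] = -1.0
        else:
            pred[X[:, dim] > thresh] = -1.0
        return pred

    def build_stump(X, y, D, num_steps=10):
        # Search every feature, threshold and inequality direction for the
        # stump with the lowest weighted error under the current weights D.
        best, best_pred, min_err = {}, None, np.inf
        for dim in range(X.shape[1]):
            lo, hi = X[:, dim].min(), X[:, dim].max()
            step = (hi - lo) / num_steps
            for i in range(-1, num_steps + 1):
                for ineq in ('lt', 'gt'):
                    thresh = lo + i * step
                    pred = stump_classify(X, dim, thresh, ineq)
                    err = float(np.sum(D * (pred != y)))
                    if err < min_err:
                        min_err, best_pred = err, pred.copy()
                        best = {'dim': dim, 'thresh': thresh, 'ineq': ineq}
        return best, min_err, best_pred

    def ada_boost_train(X, y, num_it=40):
        # Train up to num_it stumps, reweighting the samples after each round.
        classifiers = []
        n = X.shape[0]
        D = np.ones(n) / n
        agg_pred = np.zeros(n)
        for _ in range(num_it):
            stump, err, pred = build_stump(X, y, D)
            alpha = 0.5 * np.log((1.0 - err) / max(err, 1e-16))
            stump['alpha'] = alpha
            classifiers.append(stump)
            D = D * np.exp(-alpha * y * pred)
            D = D / D.sum()
            agg_pred += alpha * pred
            # Stop early once the cumulative training error reaches 0.
            if np.mean(np.sign(agg_pred) != y) == 0.0:
                break
        return classifiers

    def ada_classify(X, classifiers):
        # Weighted vote of all stumps, thresholded at 0.
        agg = np.zeros(X.shape[0])
        for c in classifiers:
            agg += c['alpha'] * stump_classify(X, c['dim'], c['thresh'], c['ineq'])
        return np.sign(agg)

Usage would look like classifiers = ada_boost_train(X_train, y_train, num_it=50) followed by ada_classify(X_test, classifiers), where the labels are numpy arrays of -1/+1.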

This version of the AdaBoost classification algorithm contains decisionstump.py (the DecisionStump object, whose classifier property is a decision stump with the three fields dim, thresh and ineqType, and whose methods include buildStump(), stumpClassify(), and so on), adaboost.py, object_json.py and test.py. adaboost.py implements the classification algorithm; the adaBoost object contains the classifier-dictionary attribute adaBoostClassifierDict together with the AdaBoost train & classify methods, among others. To store and transfer fewer bytes, a new class AdaBoostClassifier could also be added to the AdaBoost module to store only the classifier dictionary and the classification algorithm (not included in this package). The test module contains an example of classifying with the AdaBoost classifier.

Since each basic classifier of the AdaBoost algorithm can adopt any classification algorithm, a common scheme is to use a dict to store the learned AdaBoost classifier, for example:
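(The original figure is not reproduced here; the layout below is a hypothetical illustration of such a dictionary, and the exact keys used in the package may differ.)

    adaBoostClassifierDict = {
        # One entry per weak classifier: its type, its weight alpha, and the
        # parameters learned for that weak classifier (contents illustrative).
        'classifier0': {'classifierType': 'SVM',
                        'alpha': 0.69,
                        'classifier': {'w': [0.2, -1.3], 'b': 0.4}},
        'classifier1': {'classifierType': 'desicionStump',
                        'alpha': 0.41,
                        'classifier': {'dim': 0, 'thresh': 1.3, 'ineqType': 'lt'}},
    }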


The AdaBoost object can define private weak-classification algorithms for decision trees, SVM and so on; the train and classify methods create a weak-classifier instance of the corresponding type based on the current weak-classifier type and call that private weak classifier's train/classify method to complete training and classification. Note that the weak-classifier object created by the AdaBoost train method is only used to invoke the corresponding weak-classifier methods; all the properties of the weak-classifier instance are stored in adaBoostClassifierDict, which reduces the number of weak-classifier instances that must be kept. In addition, the methods jsonDumpsTransfer() and jsonLoadTransfer() support JSON storage and parsing by removing/creating instances according to the weak-classifier types recorded in adaBoostClassifierDict.

With this classifier storage scheme and the corresponding classification function, AdaBoost supports choosing the best basic classifier among decision trees, Bayesian classifiers, SVMs and other supervised learning algorithms. Users can specify the classifierType entries in adaBoostClassifierDict themselves and adjust the classifier code on top of the storage structure above. The single-layer decision tree AdaBoost I implemented sets classifierType to desicionStump; that is, the basic classifier uses a decision stump, and each weak classifier is a DS object. The storage structure can therefore be simplified as shown below (for the classification functions):

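(Again, the original figure is missing; for the decision-stump-only version the per-classifier entry might reduce to the stump fields plus its weight, roughly as follows. This is an assumption for illustration, not the package's exact structure.)

    adaBoostClassifierDict = {
        'classifierType': 'desicionStump',
        'classifier0': {'dim': 0, 'thresh': 1.3, 'ineqType': 'lt', 'alpha': 0.69},
        'classifier1': {'dim': 1, 'thresh': 1.0, 'ineqType': 'lt', 'alpha': 0.97},
    }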

By adjusting the number of weak classifiers (numIt) in the AdaBoost algorithm, AdaBoost classifiers with different classification error rates are obtained. In the author's tests, the error rate was lowest when numIt = 50.

The AdaBoost classification algorithm learning package is:

Machine learning AdaBoost

(iv) AdaBoost applications

Because the AdaBoost algorithm is simple and easy to implement, it is well suited to a wide variety of classification scenarios. Some practical scenarios where AdaBoost can be used are:

1) Two-class or multi-class classification scenarios

2) As a baseline for classification tasks: simple, resistant to overfitting, with no classifier tuning required

3) For feature selection

4) Using the boosting framework to correct badcases: only new classifiers need to be added, without changing the original classifiers
