Machine Learning Concepts

Source: Internet
Author: User

I. Bootstrap, Bagging, and Boosting

These concepts come up frequently, so I have studied them carefully:
They all belong to the family of ensemble learning methods (for example, bagging, boosting, stacking), which combine multiple trained learners. The principle derives from the PAC (Probably Approximately Correct) learning model. Kearns and Valiant showed that, in the PAC model, if a polynomial-time learning algorithm can identify a group of concepts with high accuracy, those concepts are strongly learnable; if the algorithm's accuracy is only slightly better than random guessing, the concepts are weakly learnable. They raised the question of whether weak and strong learnability are equivalent, that is, whether a weak learning algorithm can be boosted into a strong one. If the two are equivalent, then to learn a concept you only need to find a weak learning algorithm that is slightly better than random guessing and then boost it into a strong learning algorithm, instead of directly searching for a strong learning algorithm, which is hard to obtain.
Bootstrap: the name comes from the idiom "pull yourself up by your own bootstraps", meaning to rely on your own resources. It is a sampling method with replacement: n samples are drawn, with replacement, from the original n samples. A related method is the jackknife, which instead leaves out one sample at a time.
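
A minimal sketch of bootstrap resampling in Python (assuming NumPy; the function name and the standard-error demo are my own illustration, not from the original post):

import numpy as np

def bootstrap_sample(data, rng):
    # Draw n points from n, with replacement.
    idx = rng.integers(0, len(data), size=len(data))
    return data[idx]

# Illustrative use: estimate the standard error of the mean.
rng = np.random.default_rng(0)
data = rng.normal(loc=5.0, scale=2.0, size=100)
means = [bootstrap_sample(data, rng).mean() for _ in range(1000)]
print("bootstrap estimate of SE(mean):", np.std(means))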
Bagging: an abbreviation of Bootstrap AGGregatING. The algorithm trains over multiple rounds. Each round's training set consists of n examples drawn at random, with replacement, from the initial training set of n examples, so an initial training example may appear several times in a given round's training set or not appear at all. Training yields a sequence of prediction functions h_1, ..., h_T; the final prediction function H uses majority voting for classification problems and simple averaging for regression problems when labeling a new example.
(Train R classifiers f_i of the same form but with different parameters; each f_i is obtained by drawing, with replacement, n documents from the n-document training set. For a new document d, classify it with all R classifiers; the category that receives the most votes is the final category of d. A minimal sketch follows below.)
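
Here is that recipe as a short bagging sketch (assuming scikit-learn decision trees as the base learner; all names are illustrative):

import numpy as np
from collections import Counter
from sklearn.tree import DecisionTreeClassifier

def train_bagging(X, y, R=10, rng=None):
    # Each classifier f_i is trained on n documents drawn with
    # replacement from the n-document training set.
    rng = np.random.default_rng() if rng is None else rng
    models = []
    for _ in range(R):
        idx = rng.integers(0, len(X), size=len(X))
        models.append(DecisionTreeClassifier().fit(X[idx], y[idx]))
    return models

def predict_bagging(models, x):
    # Classify the new document with all R classifiers;
    # the most frequent category wins.
    votes = [m.predict(x.reshape(1, -1))[0] for m in models]
    return Counter(votes).most_common(1)[0][0]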
Boosting: the main representative is AdaBoost (Adaptive Boosting). At initialization, each training example is assigned an equal weight 1/N. The learning algorithm is then run for T rounds on the training set; after each round, larger weights are assigned to the training examples that were misclassified, so that the algorithm concentrates on the hard examples in subsequent rounds. This again yields a sequence of prediction functions h_1, ..., h_T, each with its own weight: a prediction function with good performance receives a large weight, and vice versa. The final prediction function H uses weighted voting for classification problems and weighted averaging for regression problems.
(Similar to bagging, but training is carried out serially: the k-th classifier focuses on the documents misclassified by the first k-1 classifiers. That is, sampling is not uniformly random; instead, the probability of drawing those misclassified documents is increased. A sketch follows below.)
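
The serial reweighting can be made concrete with a short AdaBoost sketch (a minimal binary-classification version, assuming labels in {-1, +1} and scikit-learn decision stumps; the details are illustrative, not the only formulation):

import numpy as np
from sklearn.tree import DecisionTreeClassifier

def adaboost(X, y, T=20):
    n = len(X)
    w = np.full(n, 1.0 / n)            # equal initial weights 1/N
    stumps, alphas = [], []
    for _ in range(T):
        stump = DecisionTreeClassifier(max_depth=1)
        stump.fit(X, y, sample_weight=w)
        pred = stump.predict(X)
        err = w[pred != y].sum()       # weighted training error
        if err >= 0.5:                 # no better than random guessing: stop
            break
        alpha = 0.5 * np.log((1.0 - err) / max(err, 1e-10))
        w *= np.exp(-alpha * y * pred) # enlarge weights of misclassified examples
        w /= w.sum()                   # renormalize to a distribution
        stumps.append(stump)
        alphas.append(alpha)
    return stumps, alphas

def adaboost_predict(stumps, alphas, X):
    # Weighted vote: sign of the alpha-weighted sum of stump outputs.
    scores = sum(a * s.predict(X) for a, s in zip(alphas, stumps))
    return np.sign(scores)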
The differences between bagging and boosting: bagging's training sets are selected at random, and the rounds are independent of one another, whereas boosting's training sets are not independent; each round's selection depends on the learning results of the previous rounds. Bagging's prediction functions carry no weights, while boosting's do. Bagging's prediction functions can be generated in parallel, while boosting's can only be generated sequentially. For very time-consuming learning methods such as neural networks, bagging can therefore save a lot of time through parallel training.
Both bagging and boosting can effectively improve classification accuracy. On most data sets boosting is more accurate than bagging, but on some data sets boosting causes degradation: it overfits.

The voting method used in text classification is a typical ensemble machine learning method. It combines multiple weak classifiers to obtain a strong classifier; bagging and boosting are the main variants. The main difference between the two is the sampling scheme: bagging samples uniformly, while boosting samples according to the error rate, so boosting generally achieves better classification accuracy than bagging. Although voting classifiers reach high accuracy, they take a long time to train. AdaBoost, an improved boosting method, performs well in mail filtering and text classification.
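For a quick side-by-side comparison on a given data set, scikit-learn's ready-made implementations can be used (a sketch; the synthetic data set is purely illustrative, so the accuracy gap will vary):

from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier
from sklearn.model_selection import cross_val_score

# A synthetic binary classification problem, for illustration only.
X, y = make_classification(n_samples=500, random_state=0)

for name, clf in [("bagging", BaggingClassifier(random_state=0)),
                  ("boosting", AdaBoostClassifier(random_state=0))]:
    print(name, cross_val_score(clf, X, y, cv=5).mean())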

 

This article is from a CSDN blog; when reproducing it, please indicate the source: http://blog.csdn.net/miyalu/archive/2010/05/16/5598360.aspx

 
