July Edu Algorithm -- December Machine Learning Online Class, Lesson 11 Notes: Random Forest and Boosting
July Edu (julyedu.com) December Machine Learning Online Class study notes, http://www.julyedu.com
Random forest: build many trees; the key question is how to split the current node.
1. Decision Tree
Decision tree learning uses a top-down recursive method; the basic idea is to build the tree so that entropy decreases as fast as possible along it.
The entropy at the leaf nodes is zero: the instances in each leaf node all belong to the same class.
The focus below is on how to choose the split so that entropy decreases fastest.
1.2 Decision Tree Generation Algorithms
The key to building a decision tree is choosing which attribute to split on in the current state.
Depending on the objective function used, there are three main decision tree algorithms:
ID3, C4.5, and CART. The three share a similar learning idea.
1.2.1 Information gain (ID3)
1. Concept: entropy and conditional entropy estimated from the training data are called the empirical entropy and the empirical conditional entropy, respectively.
Information gain: the degree to which knowing the information of feature A reduces the uncertainty about the class X.
Definition: the information gain g(D, A) of feature A with respect to training data set D is the difference between the empirical entropy H(D) of D and the empirical conditional entropy H(D|A) of D given A, i.e. g(D, A) = H(D) - H(D|A).
Computing it therefore amounts to computing H(D) and H(D|A).
2. Basic notation
3. How the information gain is calculated
Compute the empirical entropy H(D) of data set D.
Traverse all the features; for each feature A:
compute the empirical conditional entropy H(D|A) of data set D given feature A;
compute the information gain of feature A: g(D, A) = H(D) - H(D|A).
H(D|A) is calculated as in the reconstructed formulas shown after this list.
Select the feature with the greatest information gain as the current splitting feature; that is, compute the gain of every feature and choose the largest.
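The formulas referenced above appeared as images in the original note and did not survive extraction. A standard reconstruction in LaTeX, where C_k is the set of samples in class k, D_i is the subset of D on which feature A takes its i-th value, and D_{ik} is the subset of D_i belonging to class k, is:

H(D) = -\sum_{k=1}^{K} \frac{|C_k|}{|D|} \log_2 \frac{|C_k|}{|D|}

H(D \mid A) = \sum_{i=1}^{n} \frac{|D_i|}{|D|} H(D_i) = -\sum_{i=1}^{n} \frac{|D_i|}{|D|} \sum_{k=1}^{K} \frac{|D_{ik}|}{|D_i|} \log_2 \frac{|D_{ik}|}{|D_i|}

g(D, A) = H(D) - H(D \mid A)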
1.2.2 C4.5 (Information Gain Ratio)
Information gain ratio:
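The gain-ratio formula was also an image in the original; the standard C4.5 definition (a reconstruction, with H_A(D) the entropy of D with respect to the values of feature A) is:

g_R(D, A) = \frac{g(D, A)}{H_A(D)}, \qquad H_A(D) = -\sum_{i=1}^{n} \frac{|D_i|}{|D|} \log_2 \frac{|D_i|}{|D|}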
Gini index:
Discussion of the Gini index
A second definition of the Gini index
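The Gini formulas were images in the original; the standard definitions (a reconstruction) are as follows. For a distribution over K classes with probabilities p_k:

Gini(p) = \sum_{k=1}^{K} p_k (1 - p_k) = 1 - \sum_{k=1}^{K} p_k^2

The second, sample-based definition for a data set D (with C_k the samples of class k), together with the criterion CART uses for a binary split of D by feature A into D_1 and D_2:

Gini(D) = 1 - \sum_{k=1}^{K} \left( \frac{|C_k|}{|D|} \right)^2, \qquad Gini(D, A) = \frac{|D_1|}{|D|}\,Gini(D_1) + \frac{|D_2|}{|D|}\,Gini(D_2)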
The greater an attribute's information gain (or gain ratio), or the larger the reduction in Gini index it produces, the stronger that attribute's ability to reduce the entropy of the samples, i.e., the stronger its ability to turn the data from uncertain into certain.
1.3 Decision tree over-fitting
A decision tree may classify the training data very well but fail to classify unseen test data well; its generalization ability is weak, which means overfitting may have occurred.
Pruning and random forests are means of preventing overfitting.
A. The Bagging strategy (adds random sampling)
1. Bootstrap aggregation.
2. Resample with replacement from the sample set to select n samples.
3. Using all attributes, build a classifier on these n samples (ID3, C4.5, CART, SVM, logistic regression, etc.).
4. Repeat the above two steps m times, obtaining m classifiers.
5. Finally, feed the data to these m classifiers and decide which class it belongs to by their votes.
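A minimal sketch of this bagging procedure, assuming scikit-learn is available and using a decision tree as the base classifier (the helper names bagging_fit and bagging_predict are illustrative, not from the original notes):

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def bagging_fit(X, y, m=10):
    """Steps 1-4: train m classifiers, each on a bootstrap sample drawn with replacement.
    X, y are expected to be NumPy arrays."""
    n = len(X)
    classifiers = []
    for _ in range(m):
        idx = np.random.choice(n, size=n, replace=True)  # step 2: resample n samples with replacement
        clf = DecisionTreeClassifier()                   # step 3: base learner over all attributes
        clf.fit(X[idx], y[idx])
        classifiers.append(clf)
    return classifiers

def bagging_predict(classifiers, X):
    """Step 5: majority vote of the m classifiers (assumes integer class labels)."""
    votes = np.array([clf.predict(X) for clf in classifiers])  # shape (m, n_samples)
    return np.array([np.bincount(col.astype(int)).argmax() for col in votes.T])
```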
B. Random Forest
The difference from the bagging strategy (the split is not chosen over all attributes, which adds extra randomness):
1. Select n samples from the sample set by bootstrap sampling;
2. Randomly select k attributes out of all attributes, and choose the best splitting attribute among them as the node to build a CART decision tree;
3. Repeat the above two steps m times, i.e., build m CART decision trees;
4. These m CART trees form the random forest, which decides which class a data point belongs to by voting.
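Scikit-learn's RandomForestClassifier follows this recipe closely (bootstrap samples, a random subset of attributes at each split, CART-style trees); a short usage sketch with illustrative parameter values:

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# n_estimators corresponds to m trees; max_features to the k attributes tried at each split
rf = RandomForestClassifier(n_estimators=100, max_features="sqrt", random_state=0)
rf.fit(X_train, y_train)
print("test accuracy:", rf.score(X_test, y_test))
```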
1.4 Voting mechanisms
One possible scheme:
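The details of the scheme from the original slide are not preserved; as one illustrative possibility, a simple weighted-voting rule could look like the following sketch (function name, weights, and labels are made up for the example):

```python
import numpy as np

def weighted_vote(predictions, weights):
    """predictions: (m,) array of class labels from m classifiers;
    weights: (m,) array of classifier weights.
    Returns the label with the largest total weight."""
    labels = np.unique(predictions)
    scores = [weights[predictions == lab].sum() for lab in labels]
    return labels[int(np.argmax(scores))]

# Example: predictions [0, 1, 1] with weights [0.6, 0.3, 0.2] -> class 0 wins (0.6 > 0.5),
# even though class 1 has more raw votes.
print(weighted_vote(np.array([0, 1, 1]), np.array([0.6, 0.3, 0.2])))
```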
1.5 AdaBoost
The most critical part is the coefficient of each base classifier G_m(x); points to note:
For m = 1, 2, ..., M, learn the m-th base classifier G_m(x).
A. Update the weight distribution of the training data set.
Z_m is a normalization factor.
B. Construct a linear combination of the base classifiers.
C. Obtain the final classifier.
If a sample is misclassified in this round, its weight is increased for the next round; if it is classified correctly, its weight is decreased.
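The AdaBoost formulas referenced in the steps above were images in the original note; a standard reconstruction in LaTeX (notation: e_m is the weighted error rate of G_m, and w_{m,i} are the sample weights at round m) is:

\alpha_m = \frac{1}{2} \ln \frac{1 - e_m}{e_m}

w_{m+1,i} = \frac{w_{m,i}}{Z_m} \exp\big(-\alpha_m y_i G_m(x_i)\big), \qquad Z_m = \sum_{i=1}^{N} w_{m,i} \exp\big(-\alpha_m y_i G_m(x_i)\big)

f(x) = \sum_{m=1}^{M} \alpha_m G_m(x), \qquad G(x) = \mathrm{sign}\big(f(x)\big)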