"Algorithm" decision tree

Source: Internet
Author: User

This article aims to be as short and as plainly worded as possible, so that readers can quickly answer three questions: what is a decision tree, what does it do, and how is it used? It takes only about 25 minutes of focused reading.

1. Understand the concept:

a) A decision tree is a classification algorithm: a tree is built from a training data set and can then classify unknown records efficiently, so it is mainly used for prediction.

b) A decision tree is a tree structure in which each leaf node corresponds to a class, and each non-leaf node corresponds to a split on one attribute: the samples are divided into several subsets according to their values on that attribute.
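The structure described above can be sketched as a minimal Python data type. The class names and the tiny hand-built tree are illustrative assumptions, not part of the original article:

```python
class Leaf:
    """Leaf node: carries a class label."""
    def __init__(self, label):
        self.label = label

class Node:
    """Non-leaf node: tests one attribute; one child subtree per value."""
    def __init__(self, attribute, children):
        self.attribute = attribute
        self.children = children  # attribute value -> Leaf or Node

def classify(tree, sample):
    """Walk from the root, following the sample's attribute values to a leaf."""
    while isinstance(tree, Node):
        tree = tree.children[sample[tree.attribute]]
    return tree.label

# A tiny hand-built tree: split on "weather" first, then on "weekend".
tree = Node("weather", {
    "good": Leaf("high sales"),
    "bad": Node("weekend", {"yes": Leaf("high sales"),
                            "no": Leaf("low sales")}),
})
print(classify(tree, {"weather": "bad", "weekend": "no"}))  # low sales
```

In practice the tree is not built by hand like this; the sections below explain how the split attributes are chosen from training data.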

2. How do I use it?

a) Based on the values of certain features, a tree structure is built; a new sample can then be classified by evaluating it against the tests in the tree.

b) An internal node of the tree represents a test on one attribute, each branch leaving the node corresponds to one outcome of that test, and each leaf node carries a class label.

3. Key point: the core problem in constructing a decision tree is how to select, at each step, the most suitable attribute for splitting the samples. Information entropy is used for this: the larger the entropy, the higher the uncertainty, and the attribute that reduces this uncertainty the most (i.e. the one with the largest information gain) is placed at the root of the tree.

4. For example: suppose you want to predict future sales levels from the relationship between sales and three attributes of past records: the weather, whether it is a weekend, and whether there is a promotion. Using the historical data, you select those three attributes, rank them by their computed information entropy values, and split the samples accordingly; repeating this process forms a tree.

5. Steps to build a decision tree:

a) Compute the total information entropy.

b) Compute the information entropy of each test attribute.

c) Compute the information gain of the weather, weekend, and promotion attributes: information gain = total information entropy - test attribute information entropy.

d) For each branch node, continue computing information gains in the same way; repeat until no new branch nodes appear, at which point the decision tree is complete.

e) When a new sample arrives, use this decision tree to predict its class.

6. Supplement: how information entropy is calculated

Information gain is based on Shannon's information theory. It selects the attribute R with the following property: the difference in information before and after splitting on R (the information gain) is greater than for any other attribute. The information (entropy) of a data set D is defined as:

Info(D) = -sum_{i=1..m} p_i * log2(p_i)

where m is the number of classes C_i in data set D, p_i is the probability that an arbitrary record in D belongs to class C_i, estimated as p_i = (number of records of class C_i in D) / |D|, and Info(D) is the expected amount of information needed to separate the classes of data set D from each other.
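As a sketch of these formulas and of the steps in section 5, here is a minimal Python implementation. The weather/weekend/promotion records below are invented toy data for illustration, not figures from the article:

```python
import math
from collections import Counter

def entropy(labels):
    """Info(D): Shannon entropy, in bits, of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def info_gain(rows, attr, labels):
    """Information gain = total entropy - entropy after splitting on attr."""
    groups = {}  # attribute value -> class labels of matching records
    for row, label in zip(rows, labels):
        groups.setdefault(row[attr], []).append(label)
    n = len(labels)
    split_entropy = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - split_entropy

# Hypothetical past records: three attributes plus the observed sales level.
rows = [
    {"weather": "good", "weekend": "yes", "promo": "yes"},
    {"weather": "good", "weekend": "yes", "promo": "no"},
    {"weather": "bad",  "weekend": "no",  "promo": "yes"},
    {"weather": "bad",  "weekend": "no",  "promo": "no"},
    {"weather": "good", "weekend": "no",  "promo": "yes"},
    {"weather": "bad",  "weekend": "yes", "promo": "yes"},
]
sales = ["high", "high", "low", "low", "high", "high"]

# Rank the candidate attributes by information gain; the best one
# would become the root split of the decision tree.
for attr in ("weather", "weekend", "promo"):
    print(attr, round(info_gain(rows, attr, sales), 3))
```

The attribute with the largest gain is chosen for the current node; the same computation is then repeated on each branch's subset, as step d) above describes.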

Example:

a) Suppose we throw a cube A into the air, and let f1 be the face that lands downward. f1 takes values in {1, 2, 3, 4, 5, 6}, so the entropy of f1 is Entropy(f1) = -(1/6 * log2(1/6) + ... + 1/6 * log2(1/6)) = -log2(1/6) ≈ 2.58.

b) Now replace cube A with a regular tetrahedron B, and let f2 be the face that lands downward. f2 takes values in {1, 2, 3, 4}, so Entropy(f2) = -(1/4 * log2(1/4) + 1/4 * log2(1/4) + 1/4 * log2(1/4) + 1/4 * log2(1/4)) = -log2(1/4) = 2.

c) If we use a ball C instead, the downward "face" f3 is the same no matter how the ball is thrown, i.e. f3 takes the single value {1}, so Entropy(f3) = -1 * log2(1) = 0.

Conclusion: the more faces there are, the greater the entropy. With the sphere there is only one outcome, so the entropy is 0 and the uncertainty is 0: the downward face is completely determined.
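The three entropy values above can be checked with a few lines of Python (a sketch; the helper function name is our own):

```python
import math

def uniform_entropy(k):
    """Entropy in bits of k equally likely outcomes: -sum of 1/k * log2(1/k)."""
    return sum(-(1 / k) * math.log2(1 / k) for _ in range(k))

print(round(uniform_entropy(6), 2))  # cube A: 2.58
print(uniform_entropy(4))            # tetrahedron B: 2.0
print(uniform_entropy(1))            # ball C: 0.0 (no uncertainty)
```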

"Algorithm" decision tree
