An introduction to decision trees for machine learning

1. What is a decision tree?

A decision tree is a flowchart-like tree structure in which each internal node represents a test on an attribute, each branch represents an outcome of that test, and each leaf node represents a class (or a class distribution). The topmost node of the tree is the root.

To give an example: the student Xiao Ming decides whether to go out and play sports according to the weather. There are 6 attributes, each row of the table is one training instance, the concept to be learned is "whether to play sports", and the target function is F: X -> Y.

Based on the examples in the table, we can draw a flowchart-like tree showing how Xiao Ming decides whether to play sports from these attributes. Reading the tree from the top down, there are 14 instances in total (9 instances of playing sports, 5 of not playing). The first diamond asks: what is the weather like? Three branches follow: sunny, cloudy, and rainy. When the weather is sunny, 2 instances decide to play sports and 3 do not; when the weather is cloudy, 4 play and 0 do not; when the weather is rainy, 3 play and 2 do not. Note that when the count of either positive or negative examples at a node is 0, the tree stops expanding there (for example, when the weather is cloudy, there are 0 instances of not playing, so that branch becomes a leaf). When neither count is 0, the tree continues to split on further attributes.

The decision tree is an important classification method in machine learning.

2. How to construct a decision tree

(1) Information entropy

Information is an abstract concept, so how can it be quantified? In 1948, Shannon proposed the concept of "information entropy". The amount of information a message carries is directly related to its uncertainty: to pin down something very uncertain, or something we know nothing about, we need a great deal of information, and the amount of information needed equals the amount of uncertainty.

For example, consider guessing which team will win the NBA Finals, assuming you know nothing about the teams and every team has the same chance. How many guesses do you need? Number the 16 teams that reach the playoffs after the regular season and use binary search: guess whether the champion is among teams 1-8; if yes, continue within that half, otherwise continue within teams 9-16. At most 4 guesses are needed, because 2^4 = 16.

Information entropy measures the amount of information in bits and is calculated as:

    H = -(p1*log2(p1) + p2*log2(p2) + ... + p16*log2(p16))

This is the entropy of the NBA playoff championship: each team's probability of winning, multiplied by the base-2 logarithm of that probability, summed and negated. Here p1, p2, ..., p16 denote each team's probability of winning the championship. If every team is equally likely to win, the entropy is exactly 4 bits; of course, this situation is unlikely in practice, because the teams' strengths are not the same.

The greater the uncertainty of a variable, the greater its entropy.
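As a quick check on the numbers above, here is a minimal Python sketch of the entropy formula (the helper name entropy_bits is mine, not from the original article):

    import math

    def entropy_bits(probs):
        """Shannon entropy in bits: -sum(p * log2(p)) over nonzero p."""
        return -sum(p * math.log2(p) for p in probs if p > 0)

    # 16 equally likely playoff teams: exactly 4 bits, matching the
    # "at most 4 binary guesses" argument above.
    print(entropy_bits([1 / 16] * 16))                    # 4.0

    # Unequal team strengths mean less uncertainty, hence lower entropy.
    print(entropy_bits([0.5, 0.25] + [0.25 / 14] * 14))   # ~2.45 bits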
(2) A decision tree induction algorithm (ID3)

This algorithm was invented by J. Ross Quinlan in the 1970s-1980s.

An important question in building a decision tree is how to decide which attribute to select first, i.e., which attribute should become the next node of the tree. This is settled by a new concept called information gain, defined as:

    Gain(A) = Info(D) - Info_A(D)

That is, the information gain of attribute A equals the amount of information needed to classify the data before splitting on any attribute, minus the amount of information still needed after the data has been partitioned by attribute A.

Take the example of whether a customer buys a computer. 14 training examples are given, as shown in the table below:
(Table: 14 customer records with attributes age, income, student, and credit_rating, each labeled buys_computer = yes or no; 9 yes and 5 no.)
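Before walking through the numbers, here is a small Python sketch that reproduces them. The 14 rows are the standard buys_computer training set from Han and Kamber's textbook, which is assumed to match the table above (the gain values quoted below are exactly the ones this dataset yields); the helper names info and gain are mine:

    import math
    from collections import Counter

    # (age, income, student, credit_rating) -> buys_computer
    DATA = [
        ("youth",       "high",   "no",  "fair",      "no"),
        ("youth",       "high",   "no",  "excellent", "no"),
        ("middle_aged", "high",   "no",  "fair",      "yes"),
        ("senior",      "medium", "no",  "fair",      "yes"),
        ("senior",      "low",    "yes", "fair",      "yes"),
        ("senior",      "low",    "yes", "excellent", "no"),
        ("middle_aged", "low",    "yes", "excellent", "yes"),
        ("youth",       "medium", "no",  "fair",      "no"),
        ("youth",       "low",    "yes", "fair",      "yes"),
        ("senior",      "medium", "yes", "fair",      "yes"),
        ("youth",       "medium", "yes", "excellent", "yes"),
        ("middle_aged", "medium", "no",  "excellent", "yes"),
        ("middle_aged", "high",   "yes", "fair",      "yes"),
        ("senior",      "medium", "no",  "excellent", "no"),
    ]
    ATTRS = ["age", "income", "student", "credit_rating"]

    def info(rows):
        """Info(D): entropy of the class label over a set of rows, in bits."""
        counts = Counter(row[-1] for row in rows)
        total = len(rows)
        return -sum(c / total * math.log2(c / total) for c in counts.values())

    def gain(rows, attr_idx):
        """Gain(A) = Info(D) - Info_A(D) for the attribute at attr_idx."""
        total = len(rows)
        info_a = 0.0
        for value in {row[attr_idx] for row in rows}:
            subset = [row for row in rows if row[attr_idx] == value]
            info_a += len(subset) / total * info(subset)
        return info(rows) - info_a

    print(f"Info(D) = {info(DATA):.3f} bits")        # 0.940
    for i, name in enumerate(ATTRS):
        print(f"Gain({name}) = {gain(DATA, i):.3f}")
        # age 0.246, income 0.029, student 0.151, credit_rating 0.048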
Without splitting on any attribute, the information needed to classify a sample in D is:

    Info(D) = -(9/14)*log2(9/14) - (5/14)*log2(5/14) = 0.940 bits

Splitting on the age attribute, the information still needed is the weighted sum over the three age groups (youth: 2 yes/3 no, middle_aged: 4 yes/0 no, senior: 3 yes/2 no):

    Info_age(D) = (5/14)*Info(2,3) + (4/14)*Info(4,0) + (5/14)*Info(3,2) = 0.694 bits

So Gain(age) = 0.940 - 0.694 = 0.246 bits. Similarly, we can work out Gain(income) = 0.029, Gain(student) = 0.151, and Gain(credit_rating) = 0.048. Comparing these values, age gives the largest information gain, so age is chosen as the root node. Subsequent nodes are selected in the same way: at each step, the remaining attribute with the highest information gain becomes the next node.

(3) Stopping conditions

When we create the decision tree recursively, it is critical to know when to stop creating nodes. In summary, node creation stops when:

a. All samples at a given node carry the same label. For example, in (2), the node created from the age attribute has three branches: youth, middle_aged, and senior. All middle_aged instances are labeled yes, meaning middle-aged customers all buy a computer, so that branch can be made a leaf node.

b. No attributes remain to further divide the samples. Node creation then stops and the leaf is labeled by a majority vote.

c. A branch contains no samples at all; a leaf is then created and labeled with the majority class of the parent node.

3. Other algorithms

C4.5 and CART are other well-known decision tree algorithms. Like ID3, they are greedy, top-down algorithms; they differ only in the measure used to select attributes (C4.5 uses the gain ratio, CART uses the Gini index).

4. Tree pruning (avoiding overfitting)

When the tree grows too deep, the algorithm performs well on the training set but only moderately on the test set, so we prune the tree in one of two ways (a short code sketch at the end of this article illustrates both):

(1) Pre-pruning: stop growing the tree once it reaches a certain depth or purity level.

(2) Post-pruning: build the tree completely, then cut branches back according to class purity.

5. Advantages of decision trees

Intuitive, easy to understand, and effective on small datasets.

6. Disadvantages of decision trees

Continuous variables are handled poorly; when there are many classes, the error rate grows quickly; and scalability is mediocre.
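As a concrete illustration of pre- and post-pruning, here is a short sketch using scikit-learn's DecisionTreeClassifier, which implements CART (one of the algorithms mentioned in section 3) rather than ID3. max_depth stops growth early (pre-pruning), while ccp_alpha grows the full tree and then applies cost-complexity pruning (post-pruning). The iris dataset and the value 0.02 are only for demonstration:

    from sklearn.datasets import load_iris
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier

    X, y = load_iris(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    # Pre-pruning: refuse to grow the tree past depth 3.
    pre = DecisionTreeClassifier(max_depth=3).fit(X_train, y_train)

    # Post-pruning: grow the full tree, then prune by cost-complexity.
    post = DecisionTreeClassifier(ccp_alpha=0.02).fit(X_train, y_train)

    print("pre-pruned test accuracy: ", pre.score(X_test, y_test))
    print("post-pruned test accuracy:", post.score(X_test, y_test))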
