Decision tree--Summary of ID3 algorithm

Source: Internet
Author: User
Tags: id3

The ID3 algorithm (Iterative Dichotomiser 3) is a decision-tree learning algorithm invented by Ross Quinlan. Its guiding idea is that, among trees that fit the data, smaller decision trees are preferred over larger ones.

Algorithm Induction:
1. For every attribute that has not been used yet, calculate the entropy of the samples after splitting on that attribute.
2. Select the attribute with the lowest entropy value (equivalently, the highest information gain).
3. Generate a node containing that attribute.
4. For each branch of the new node, repeat from step 1 on that branch's subset of the table, using the remaining attributes.

ID3 is based on information theory: it uses information entropy and information gain as the criteria for inductive classification. In the final analysis, it is an inductive way of generating a decision tree from a set of data.

Specific Introduction:
1. Information entropy: Entropy measures the degree of disorder of information; the greater the uncertainty of a variable, the greater its entropy. If event A is partitioned into (A1, A2, A3, ..., An) and the parts occur with probabilities (p1, p2, p3, ..., pn), the information entropy is:
   Info(A) = Entropy(p1, p2, ..., pn) = -p1 * log2(p1) - p2 * log2(p2) - ... - pn * log2(pn)
2. Information gain: Information gain is the change in entropy before and after a split. For a data set S and an attribute A:
   Gain(S, A) = Info(S) - Info_A(S)
   where Info(S) is the entropy of the class labels in S, and Info_A(S) is the entropy that remains after splitting S on A (the average of the subsets' entropies, weighted by subset size).
3. Every example has a class label, which can be understood as the outcome, and several attributes that may be related to that outcome. Compute the information gain of each attribute; the attribute X with the maximum gain becomes the first split, at the root node of the decision tree. After finding this first node, if X has three branches x1, x2, x3, the next step is to divide the original table into three tables according to the three values of X. Repeating the information gain calculation on each table yields the whole decision tree.

Example: http://www.cnblogs.com/zhangchaoyang/articles/2196631.html

Pros and cons:
Advantages: the theory is clear and the method is simple.
Disadvantages: it is only effective on relatively small data sets and is sensitive to noise; when the training data set grows, the decision tree may change with it.

Related:
"Do not use more things to do what can be done just as well with fewer," says Occam's razor. That is, if several theories of the same problem make equally accurate predictions, the one that uses the fewest assumptions should be chosen. Although more complex methods can usually make better predictions, when predictive power is left aside, fewer assumptions are better. Solomonoff's theory of inductive inference is a mathematical formalization of Occam's razor: among all computable theories that perfectly describe the existing observations, shorter computable theories carry greater weight when estimating the probability of the next observation.

Extra-Curricular:
ID3 is also the name of a metadata container, most often used with MP3 audio files. It can store information such as the song title, artist, album, and track number in the MP3 file. ID3 data occupies a number of bytes at the beginning or the end of an MP3 file and records the artist, title, album name, year, genre, and other information. This is called ID3 information, and it comes in two versions. The ID3v1 tag is the last 128 bytes of the MP3 file; it starts with the three characters TAG, followed by the fields described above. The ID3v2 tag is generally located at the beginning of the MP3 file and can store large items such as lyrics and album pictures.
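
Returning to the decision-tree algorithm: the entropy, information gain, and table-splitting steps described above can be sketched in Python as follows. This is only a minimal illustration, not Quinlan's original implementation. The function names (entropy, split_entropy, information_gain, id3), the dictionary-based tree layout, and the five-row weather table are all assumptions made up for this example.

```python
import math
from collections import Counter


def entropy(labels):
    """Info(S): base-2 entropy of a list of class labels."""
    total = len(labels)
    return sum(-(n / total) * math.log2(n / total)
               for n in Counter(labels).values())


def split_entropy(rows, attr, target):
    """Entropy remaining after splitting rows on attr (size-weighted average)."""
    groups = {}
    for row in rows:
        groups.setdefault(row[attr], []).append(row[target])
    return sum(len(g) / len(rows) * entropy(g) for g in groups.values())


def information_gain(rows, attr, target):
    """Gain = entropy of the class labels minus entropy after the split."""
    return entropy([r[target] for r in rows]) - split_entropy(rows, attr, target)


def id3(rows, attrs, target):
    """Build a tree as nested dicts: {attribute: {value: subtree_or_label}}."""
    labels = [r[target] for r in rows]
    if len(set(labels)) == 1:          # pure subset: return the class label
        return labels[0]
    if not attrs:                      # no attributes left: majority label
        return Counter(labels).most_common(1)[0][0]
    # Choose the attribute with the highest information gain.
    best = max(attrs, key=lambda a: information_gain(rows, a, target))
    remaining = [a for a in attrs if a != best]
    tree = {best: {}}
    # Divide the table into one sub-table per value of the chosen attribute.
    for value in sorted({r[best] for r in rows}):
        subset = [r for r in rows if r[best] == value]
        tree[best][value] = id3(subset, remaining, target)
    return tree


# Hypothetical toy table, invented for illustration only.
rows = [
    {"outlook": "sunny",    "windy": "no",  "play": "no"},
    {"outlook": "sunny",    "windy": "yes", "play": "no"},
    {"outlook": "overcast", "windy": "no",  "play": "yes"},
    {"outlook": "rain",     "windy": "no",  "play": "yes"},
    {"outlook": "rain",     "windy": "yes", "play": "no"},
]

tree = id3(rows, ["outlook", "windy"], "play")
print(tree)
```

On this toy table, outlook has a larger information gain than windy, so it becomes the root; only the rain branch is still mixed and gets split again on windy. The sketch stops when a subset is pure or when no attributes remain, and it does no pruning or handling of missing values.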
