HJR-ID3 Decision Tree Algorithm

Source: Internet
Author: User
Tags: id3


Information Entropy

The more you already know, the smaller the entropy; the less you know, the greater the entropy. Put another way, the more unexpected and uncertain the information, the higher its information entropy.

Purpose

The basic idea behind constructing a decision tree is that the entropy of the nodes should decrease rapidly as the depth of the tree increases. The faster the entropy drops, the better: ideally we obtain a decision tree of the smallest possible height.

Steps

1. From the classification results known in the training data, obtain the prior probabilities, then use those prior probabilities to compute the classification entropy.
For example, if out of 5 rounds of scissors/stone/cloth I played scissors 3 times and stone 2 times, then 3/5 and 2/5 are the prior probabilities. Substituting them into E(classification) = -∑ p(xi) · log2(p(xi)) (i = 1, 2, ..., n) gives the information entropy.
Now, if you want to predict what I will play next, your current uncertainty is exactly the entropy just computed. That uncertainty should of course be as small as possible, and the decision tree is constructed precisely to reduce this entropy.
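A minimal Python sketch of step 1 (the entropy function and its name are mine, not from the original text):

```python
import math

def entropy(counts):
    """Classification entropy: E = -sum(p_i * log2(p_i)) over class counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

# 3 scissors and 2 stones out of 5 plays -> prior probabilities 3/5 and 2/5.
print(entropy([3, 2]))  # ~0.971 bits: the uncertainty about the next play
```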
2. To reduce the entropy, first compute the information entropy of each attribute:

E(attribute A) = ∑ (j = 1 to N) [count(Aj) / count(A)] × ( -(p1j·log2(p1j) + p2j·log2(p2j) + p3j·log2(p3j)) )

where:
N = number of split intervals
count(Aj) = number of records of attribute A falling in split interval j
count(A) = total number of records of attribute A (the number of training records)
p1j = count(a1j) / count(Aj): the fraction of records in split interval j of attribute A whose classification is 1
p2j = count(a2j) / count(Aj): the fraction of records in split interval j of attribute A whose classification is 2
p3j = count(a3j) / count(Aj): the fraction of records in split interval j of attribute A whose classification is 3

Do the same for attributes B, C, D, ...
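Step 2 can be sketched the same way, reusing entropy() from the sketch above; representing the data as (attribute_value, classification) pairs is my assumption, not the original's:

```python
from collections import Counter, defaultdict

def attribute_entropy(records):
    """E(attribute A): count(Aj)/count(A)-weighted entropy of each split interval j."""
    by_interval = defaultdict(Counter)
    for value, label in records:           # group records by the interval A falls in
        by_interval[value][label] += 1
    total = len(records)                   # count(A)
    return sum(
        (sum(classes.values()) / total)    # count(Aj) / count(A)
        * entropy(list(classes.values()))  # -(p1j*log2(p1j) + p2j*log2(p2j) + ...)
        for classes in by_interval.values()
    )
```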
3. Subtract each attribute's information entropy from the classification entropy to get its information gain, and take the attribute with the largest information gain as the root: splitting on that attribute makes the system's total information entropy fall fastest. The attribute's split intervals become the branches; next we must determine which branches the remaining i-1 attributes go on.
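Step 3 then reduces to one subtraction per attribute. A hedged sketch building on the functions above, with rows as tuples and attributes as column indices (again my assumed layout):

```python
from collections import Counter

def best_attribute(rows, labels, attributes):
    """Pick the column index with the largest information gain."""
    base = entropy(list(Counter(labels).values()))  # classification entropy
    gains = {
        a: base - attribute_entropy([(row[a], lab) for row, lab in zip(rows, labels)])
        for a in attributes
    }
    return max(gains, key=gains.get)                # max-gain attribute -> root
```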
4. For each split interval Ni of the root attribute (i = interval number), collect the training records whose root attribute falls in interval N1 and repeat steps 1-3 on them; the remaining attribute with the largest information gain becomes the node for branch N1. Then collect the records whose root attribute falls in interval N2 and proceed the same way, and so on for the N2, N3, ... nodes.
5. Each node in turn uses its split intervals as its branches. From the training data, take the records that satisfy both the root's branch and the node's branch, and among the attributes other than the root and that node, find the one with the maximum information gain to serve as the next layer's node. Treat the other branches, and every later layer, the same way, until the records satisfying a path of branches from the tree root down a given line all share one classification (in the simplest case, a single record remains); that classification is the leaf at the end of the branch.
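Steps 4 and 5 are naturally recursive. The sketch below ties the earlier pieces together under the same assumptions, stopping when a branch is pure, which is the general form of the single-record stopping case described above; the toy data at the end is made up for illustration:

```python
from collections import Counter

def id3(rows, labels, attributes):
    if len(set(labels)) == 1:            # pure branch: one classification -> leaf
        return labels[0]
    if not attributes:                   # no attributes left -> majority-class leaf
        return Counter(labels).most_common(1)[0][0]
    root = best_attribute(rows, labels, attributes)
    remaining = [a for a in attributes if a != root]
    tree = {root: {}}
    for value in {row[root] for row in rows}:        # one branch per split interval
        subset = [(r, l) for r, l in zip(rows, labels) if r[root] == value]
        tree[root][value] = id3([r for r, _ in subset],
                                [l for _, l in subset],
                                remaining)
    return tree

# Toy usage: attribute 0 = weather, attribute 1 = temperature.
rows   = [("sunny", "hot"), ("sunny", "cool"), ("rainy", "hot"), ("rainy", "cool")]
labels = ["no", "yes", "no", "no"]
print(id3(rows, labels, [0, 1]))
# -> {0: {'sunny': {1: {'hot': 'no', 'cool': 'yes'}}, 'rainy': 'no'}} (branch order may vary)
```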
