Decision Tree Algorithm (V): Dealing with Some Special Classifications

Source: Internet
Author: User

In the previous articles on the decision tree algorithm, we explained the function modules used to construct a decision tree from a data set.

First we create a data set, then compute its Shannon entropy, and then split the data set on the best feature. Because a feature may take more than two values, a split may produce more than two branches. After the first split, each subset is passed down to the node on its branch, where it can be split again, so the data set can be processed recursively.
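As a reminder of the two building blocks mentioned above, here is a minimal sketch of the entropy calculation and the split step. The function names shannon_entropy and split_dataset and the sample rows are illustrative assumptions, not the code from the earlier articles; the sketch also assumes Python 3 and that the class label is the last item of each row.

```python
from collections import Counter
from math import log2

def shannon_entropy(dataset):
    """Entropy (in bits) of the class labels; the label is assumed to be the last item of each row."""
    labels = [row[-1] for row in dataset]
    total = len(labels)
    counts = Counter(labels)
    return -sum((n / total) * log2(n / total) for n in counts.values())

def split_dataset(dataset, axis, value):
    """Rows whose feature at index `axis` equals `value`, with that feature column removed."""
    return [row[:axis] + row[axis + 1:] for row in dataset if row[axis] == value]

# Hypothetical sample data: two features followed by the class label.
sample = [[1, 1, 'yes'], [1, 1, 'yes'], [1, 0, 'no'], [0, 1, 'no'], [0, 1, 'no']]
print(shannon_entropy(sample))
print(split_dataset(sample, 0, 1))
```

Note that split_dataset drops the feature it splits on, which is why every split leaves one feature fewer for the branches below.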

The recursion ends when all the features used to split the data set have been consumed, or when all instances under a branch have the same class. If all instances have the same class, we have reached a leaf node, also called a terminating block.

Each split consumes one feature. If all the features have been used but the instances on a branch still belong to more than one class, we use majority voting to decide the class of the leaf node.
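The two stopping conditions can be sketched as a small recursive builder. This is an illustrative Python 3 sketch under stated assumptions, not the code from the earlier articles: the helper names (entropy, split, best_feature, majority_label, create_tree), the nested-dict tree layout, and the sample rows are all hypothetical, and the class label is assumed to be the last item of each row.

```python
from collections import Counter
from math import log2

def entropy(rows):
    total = len(rows)
    counts = Counter(row[-1] for row in rows)
    return -sum((n / total) * log2(n / total) for n in counts.values())

def split(rows, axis, value):
    # Keep matching rows and drop the feature that was used for the split.
    return [row[:axis] + row[axis + 1:] for row in rows if row[axis] == value]

def best_feature(rows):
    """Index of the feature with the highest information gain."""
    base = entropy(rows)

    def gain(axis):
        values = set(row[axis] for row in rows)
        remainder = sum(
            len(subset) / len(rows) * entropy(subset)
            for subset in (split(rows, axis, v) for v in values)
        )
        return base - remainder

    return max(range(len(rows[0]) - 1), key=gain)

def majority_label(labels):
    return Counter(labels).most_common(1)[0][0]

def create_tree(rows, feature_names):
    labels = [row[-1] for row in rows]
    # Stop 1: every instance on this branch has the same class.
    if labels.count(labels[0]) == len(labels):
        return labels[0]
    # Stop 2: no features left (only the label column remains), so vote.
    if len(rows[0]) == 1:
        return majority_label(labels)
    axis = best_feature(rows)
    name = feature_names[axis]
    remaining = feature_names[:axis] + feature_names[axis + 1:]
    tree = {name: {}}
    for value in set(row[axis] for row in rows):
        tree[name][value] = create_tree(split(rows, axis, value), remaining)
    return tree

# Hypothetical data: the three 'sunny'/'hot' rows cannot be separated by any feature.
rows = [
    ['sunny', 'hot', 'yes'],
    ['sunny', 'hot', 'yes'],
    ['sunny', 'hot', 'maybe'],
    ['rainy', 'cool', 'no'],
]
tree = create_tree(rows, ['outlook', 'temperature'])
print(tree)
```

In this example the 'rainy' branch stops because all its instances share one class, while the 'sunny' branch runs out of features and falls back to the vote.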

For example, after we have used all the features, the remaining data set contains only the class column:
[['yes'], ['yes'], ['maybe']]

With no features left, we can no longer choose a split by maximum information gain based on Shannon entropy, so the class is decided by voting instead.

In our data set 'yes' appears twice and 'maybe' appears once, so the leaf node is labeled 'yes'.
The code is as follows:

import operator

def majorityCnt(classList):
    # classList holds the class labels of the data set that remains after
    # all features have been used, e.g. ['yes', 'yes', 'maybe'] for the
    # data set [['yes'], ['yes'], ['maybe']]
    classCount = {}  # dictionary mapping each class to its count
    for vote in classList:
        if vote not in classCount.keys():
            classCount[vote] = 0
        classCount[vote] += 1
    # For our example this gives {'yes': 2, 'maybe': 1}
    # Sort the (class, count) pairs by count, largest first; explained in detail below.
    # iteritems() is Python 2 only; on Python 3 use classCount.items()
    sortedClassCount = sorted(classCount.iteritems(), key=operator.itemgetter(1), reverse=True)
    return sortedClassCount[0][0]

Let's analyze the most complex statement in this code:

sorted(classCount.iteritems(), key=operator.itemgetter(1),reverse=True)

Here we use the iteritems() method to get all the elements of the dictionary as (key, value) pairs. iteritems() exists only in Python 2; in Python 3 the equivalent is items().
key is a keyword argument of sorted(), not a name we choose ourselves. It takes a function that is called on each element to produce the value to sort by. operator.itemgetter(1) returns a function that picks out the item at index 1, that is, the second item of each pair. Since each element of our dictionary is a (class, count) pair, the elements are sorted by count, that is, by how many times each class appears. reverse=True sorts in descending order.

sortedClassCount[0][0] is the class of the first pair in the sorted list, that is, the class that appears most often.
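For readers on Python 3, the same vote can be sketched as follows. The snake_case names are my own, and the second function is an alternative based on collections.Counter from the standard library rather than the approach used in this article; both assume the class label is the last item of each row.

```python
import operator
from collections import Counter

def majority_cnt(class_list):
    """Same logic as above, with items() in place of the Python 2 iteritems()."""
    class_count = {}
    for vote in class_list:
        class_count[vote] = class_count.get(vote, 0) + 1
    # Sort the (class, count) pairs by count, largest first.
    sorted_class_count = sorted(
        class_count.items(), key=operator.itemgetter(1), reverse=True
    )
    return sorted_class_count[0][0]

def majority_cnt_counter(class_list):
    """Alternative: Counter.most_common(1) returns the single most frequent pair."""
    return Counter(class_list).most_common(1)[0][0]

remaining = [['yes'], ['yes'], ['maybe']]      # data set left after all features are used
class_list = [row[-1] for row in remaining]    # flat list of labels: ['yes', 'yes', 'maybe']
print(majority_cnt(class_list))
print(majority_cnt_counter(class_list))
```

The labels are flattened before counting because the inner lists themselves cannot be used as dictionary keys.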

Summary

When all the features have been used but the instances on a branch still cannot be separated into a single class, we label the leaf node with the class that has the highest count.

Either do it or do it the best you can.
