Machine Learning Notes ----- The ID3 Algorithm in Python, Hands-On

Source: Internet
Author: User
Tags: id3

A note up front: this article is original content, so please credit the source if you reproduce it. The data comes from Peter Harrington's book "Machine Learning in Action"; it will be removed on request.

Hello, we meet again. I'm in a surprisingly good mood today, I don't know why. ... 10,000 words omitted here ... Last time we went through the theory of decision trees; today we put it into practice. We will help an ophthalmologist build a system that recommends contact lenses: given a user who needs contact lenses, the system suggests which type suits their eyes. The system first learns this from the data.

One: Calculate the Shannon entropy of a given data set

We all remember the information-gain formula from the last lecture: first we need H(D), the empirical entropy of dataset D, given by H(D) = -Σ_k p_k · log2(p_k), where p_k is the proportion of examples in D belonging to class k. The code for this formula is as follows:
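The original code listing did not survive, so here is a sketch of the entropy function in the style of the book's trees.py (the function and variable names are assumptions, not the author's exact code):

```python
from math import log

def calc_shannon_ent(dataset):
    """Shannon entropy of a dataset.

    Each row is a list whose last element is the class label.
    """
    num_entries = len(dataset)
    label_counts = {}
    for row in dataset:
        label = row[-1]
        label_counts[label] = label_counts.get(label, 0) + 1
    entropy = 0.0
    for count in label_counts.values():
        prob = count / num_entries          # p_k for this class
        entropy -= prob * log(prob, 2)      # accumulate -p_k * log2(p_k)
    return entropy
```

On the book's toy "fish" dataset (2 "yes" rows, 3 "no" rows out of 5) this gives an entropy of about 0.971.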

The higher the entropy, the more mixed the data is, and vice versa. Once again I recommend Wu Jun's "The Beauty of Mathematics". Working through the code, you discover that a language is really just a tool: Java and Python serve us. So there is no need to be afraid of them; we only have to understand them and master them.

Two: Dividing data sets

Suppose a demon kidnaps your goddess and sets you a challenge: you must sort a pile of black, white, and red beans by colour, whites together, blacks together, reds together. That is simple, and partitioning a dataset is just as simple: look at one feature in the data, put the rows that share the same value for that feature together, and separate them from the rest. That is all dataset partitioning is. The code is as follows:
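Again the listing was lost; a minimal sketch of the splitting function, with the book's usual signature (feature index plus value) assumed:

```python
def split_dataset(dataset, axis, value):
    """Return the rows whose feature at index `axis` equals `value`,
    with that feature column removed (it has been "used up")."""
    result = []
    for row in dataset:
        if row[axis] == value:
            # Stitch the row back together without column `axis`.
            result.append(row[:axis] + row[axis + 1:])
    return result
```

For example, splitting on feature 0 with value 1 keeps only the rows starting with 1 and drops that column.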

Three: Choose the best way to divide the data set

You can divide the dataset, but you do not yet know whether your division is the best one. As we all know, the core of the ID3 algorithm is using information gain to judge how good a split is: compute the gain for every candidate feature and pick the one with the largest gain. The code is as follows:
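A sketch of the feature-selection step. It repeats compact versions of the two helpers from sections One and Two so the block runs on its own; all names are assumed, following the book's trees.py conventions:

```python
from math import log

def calc_shannon_ent(dataset):
    # Entropy helper, as in section One.
    n = len(dataset)
    counts = {}
    for row in dataset:
        counts[row[-1]] = counts.get(row[-1], 0) + 1
    return -sum(c / n * log(c / n, 2) for c in counts.values())

def split_dataset(dataset, axis, value):
    # Splitting helper, as in section Two.
    return [row[:axis] + row[axis + 1:] for row in dataset if row[axis] == value]

def choose_best_feature(dataset):
    """Return the index of the feature whose split maximises information gain."""
    num_features = len(dataset[0]) - 1       # last column is the class label
    base_entropy = calc_shannon_ent(dataset)
    best_gain, best_feature = 0.0, -1
    for i in range(num_features):
        new_entropy = 0.0
        for value in set(row[i] for row in dataset):
            subset = split_dataset(dataset, i, value)
            prob = len(subset) / len(dataset)
            new_entropy += prob * calc_shannon_ent(subset)   # conditional entropy
        gain = base_entropy - new_entropy                    # information gain
        if gain > best_gain:
            best_gain, best_feature = gain, i
    return best_feature
```

On the toy fish dataset, feature 0 gives the larger gain (about 0.42 vs 0.17), so it is chosen first.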

In fact, parts one through three together just compute the information-gain formula. The next step is to construct the decision tree and then prune away the extra branches, right? It sounds simple, and in fact it really is simple.

Four: Building a decision tree

1: Majority vote

In fact, the number of features does not necessarily shrink with every split of the data, so before the algorithm starts running we count the columns, which lets us know when all the attributes have been used. If the dataset has consumed all its attributes but the class labels are still not unique, we need to decide how to label that leaf. So how do we define the leaf nodes?

Think about it: this is really just a small classification problem, and since it is a classification problem, can't we reuse the majority-vote method from the kNN algorithm we discussed last time? (Majority voting works like a democratic election: the label with the most votes wins.)

The majority-vote code is as follows:
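A minimal sketch of the vote, using `collections.Counter` rather than the book's manual dictionary-and-sort version (the behaviour is the same; ties are broken arbitrarily):

```python
from collections import Counter

def majority_cnt(class_list):
    """Return the class label that occurs most often in `class_list`."""
    return Counter(class_list).most_common(1)[0][0]
```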

For comparison, the kNN code is as follows:
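The original kNN listing is also missing; here is a stdlib-only sketch of a kNN classifier (the book's `classify0` uses NumPy, so treat this as an illustrative reconstruction, not the author's exact code). The point of the comparison is that its final step is the same counting pattern as the majority vote:

```python
import math

def classify0(in_x, dataset, labels, k):
    """kNN classifier: rank training rows by Euclidean distance to `in_x`,
    then take a majority vote among the k nearest labels."""
    distances = []
    for i, row in enumerate(dataset):
        dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(in_x, row)))
        distances.append((dist, labels[i]))
    distances.sort(key=lambda pair: pair[0])
    votes = {}
    for _, label in distances[:k]:
        votes[label] = votes.get(label, 0) + 1
    # This vote count is the same majority-vote pattern used for tree leaves.
    return max(votes.items(), key=lambda kv: kv[1])[0]
```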

Compare the two: they are very similar, aren't they?

2: Creating the tree

As we said above, after making the first split we simply need to invoke the tree-building function recursively.

The two criteria for the end of recursion are:

1: All class labels are identical, so we return that class label (obviously: if everything already belongs to the same class, there is nothing left to split).

2: All features have been used up, yet the remaining group still does not contain a single unique category. Since we cannot return a unique label, we fall back to a representative: the majority-vote mechanism above, returning the category that occurs most often.

The code is as follows:
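A self-contained sketch of the recursive builder, bundling compact versions of the earlier helpers; as elsewhere, the names follow the book's trees.py but are assumptions since the original listing was lost:

```python
from math import log
from collections import Counter

def calc_shannon_ent(dataset):
    # Entropy helper (section One).
    n = len(dataset)
    counts = Counter(row[-1] for row in dataset)
    return -sum(c / n * log(c / n, 2) for c in counts.values())

def split_dataset(dataset, axis, value):
    # Splitting helper (section Two).
    return [row[:axis] + row[axis + 1:] for row in dataset if row[axis] == value]

def choose_best_feature(dataset):
    # Feature selection by information gain (section Three).
    base = calc_shannon_ent(dataset)
    best_gain, best = 0.0, -1
    for i in range(len(dataset[0]) - 1):
        new_ent = 0.0
        for value in set(row[i] for row in dataset):
            subset = split_dataset(dataset, i, value)
            new_ent += len(subset) / len(dataset) * calc_shannon_ent(subset)
        if base - new_ent > best_gain:
            best_gain, best = base - new_ent, i
    return best

def majority_cnt(class_list):
    # Majority vote for leaves with mixed labels (section Four, part 1).
    return Counter(class_list).most_common(1)[0][0]

def create_tree(dataset, labels):
    """Recursively build the tree as nested dicts: {feature_name: {value: subtree}}."""
    class_list = [row[-1] for row in dataset]
    if class_list.count(class_list[0]) == len(class_list):
        return class_list[0]                 # stop 1: only one class left
    if len(dataset[0]) == 1:
        return majority_cnt(class_list)      # stop 2: features exhausted
    best = choose_best_feature(dataset)
    best_label = labels[best]
    tree = {best_label: {}}
    sub_labels = labels[:best] + labels[best + 1:]   # drop the used feature name
    for value in set(row[best] for row in dataset):
        tree[best_label][value] = create_tree(
            split_dataset(dataset, best, value), sub_labels)
    return tree
```

On the toy fish dataset with feature names `['no surfacing', 'flippers']`, this yields the nested tree `{'no surfacing': {0: 'no', 1: {'flippers': {0: 'no', 1: 'yes'}}}}`.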

If anything is unclear, message me privately and I will help you work through it.

Now let's test it; the results are as follows:
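The output screenshot did not survive, so here is a small sketch of how one might query a finished tree. The `classify` helper is my own addition for illustration (the book defines a similar function), and `sample_tree` is hardcoded to the tree the toy fish data produces:

```python
def classify(tree, feature_labels, test_vec):
    """Walk a nested-dict decision tree and return the predicted label."""
    node_label = next(iter(tree))                  # the feature this node tests
    feature_index = feature_labels.index(node_label)
    subtree = tree[node_label][test_vec[feature_index]]
    if isinstance(subtree, dict):
        return classify(subtree, feature_labels, test_vec)
    return subtree                                 # reached a leaf

# The tree built from the toy "fish" dataset in the book:
sample_tree = {'no surfacing': {0: 'no', 1: {'flippers': {0: 'no', 1: 'yes'}}}}
print(classify(sample_tree, ['no surfacing', 'flippers'], [1, 1]))  # -> yes
```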

Gee, it seems to work quite well...

Five: Predicting contact lens types using decision trees

(1) Collect the data
(2) Prepare the data
(3) Analyze the data
(4) Train the algorithm
(5) Test the algorithm
(6) Use the algorithm

These six steps are the standard workflow for any machine-learning project, and we should keep them in mind.

Test results on the contact-lens data:
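The result listing is missing; here is a hedged sketch of how the lenses data would be loaded and fed to the tree builder. The column names follow the lenses.txt file shipped with the book, and the two sample rows are illustrative, not the full 24-row file:

```python
# Two illustrative tab-separated records; last column is the lens type.
SAMPLE = (
    "young\tmyope\tno\treduced\tno lenses\n"
    "young\tmyope\tno\tnormal\tsoft\n"
)

def parse_lenses(text):
    """Parse tab-separated lens records into (dataset, feature_labels)."""
    dataset = [line.split('\t') for line in text.strip().split('\n')]
    labels = ['age', 'prescript', 'astigmatic', 'tearRate']
    return dataset, labels

lenses_data, lenses_labels = parse_lenses(SAMPLE)
# With the full file one would then build the tree as in section Four:
# lenses_tree = create_tree(lenses_data, lenses_labels)
```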

Link: Http://pan.baidu.com/s/1bpolbBL Password: MZJJ. This is the source code for this experiment. It targets Python 2.7; if you are on Python 3.x, please adapt it to the newer syntax yourself.
