This article affirms that it is original; if you reprint it, please include a declaration. The figures are from Peter Harrington's book *Machine Learning in Action*; if anything here infringes, contact me and I will remove it.
Hello, we meet again. I'm in a surprisingly good mood today, no idea why. Anyway... ten thousand words omitted here... Last time I walked you through the theory of decision trees; today we put it into practice. We'll help an ophthalmologist build a system that suggests to users which type of contact lenses fits their eyes. The system first learns from the data.
One: Calculate the Shannon entropy of a given data set
We all remember the information-gain formula from the last lecture: g(D, A) = H(D) - H(D|A). First we need H(D), the empirical entropy of data set D, whose formula is: H(D) = -Σ_{k=1..K} (|C_k|/|D|) * log2(|C_k|/|D|), where C_k is the set of samples belonging to class k. The code for this formula is as follows:
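Since the listing is short, here is a minimal Python 3 sketch (the book's version targets Python 2.7 and calls this `calcShannonEnt`; I use snake_case and assume each sample is a list whose last element is the class label):

```python
from math import log

def calc_shannon_ent(data_set):
    """Compute the Shannon entropy H(D) of a data set.

    Each sample is a list; its last element is the class label.
    """
    num_entries = len(data_set)
    label_counts = {}
    for feat_vec in data_set:
        label = feat_vec[-1]
        label_counts[label] = label_counts.get(label, 0) + 1
    entropy = 0.0
    for count in label_counts.values():
        prob = count / num_entries
        entropy -= prob * log(prob, 2)  # H(D) = -sum p_k * log2(p_k)
    return entropy
```

For example, a set that is two-thirds "yes" and one-third "no" has entropy of about 0.918 bits, while a 50/50 split gives exactly 1 bit.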
The higher the entropy, the more mixed the data, and vice versa. Once again I recommend Wu Jun's *The Beauty of Mathematics*. Writing this code shows that a language is really just a tool: Java and Python work for us, not the other way around. So there is no need to fear the tool; just get to know it and master it.
Two: Dividing data sets
Suppose a demon kidnaps your goddess and sets you a problem: given a pile of black, white, and red beans, sort them by color, white with white, black with black, red with red. Isn't that simple? Partitioning a data set is just as simple: look at one feature of the data, put the items with the same value together, and separate them from the rest. The code is as follows:
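A sketch of the splitting function (the book names it `splitDataSet`; the convention that the split-on column is removed from the returned samples follows the book's chapter 3):

```python
def split_data_set(data_set, axis, value):
    """Return the samples whose feature `axis` equals `value`,
    with that feature column removed from each sample."""
    ret = []
    for feat_vec in data_set:
        if feat_vec[axis] == value:
            # Keep everything except the column we split on.
            reduced = feat_vec[:axis] + feat_vec[axis + 1:]
            ret.append(reduced)
    return ret
```

Dropping the used column is what lets the tree-building recursion work on ever-smaller feature sets.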
Three: Choose the best way to divide the data set
You can divide the data set, but how do you know your division is the best one? As we all know, the core of the ID3 algorithm is using information gain to judge whether a division is good. Once you have worked through the first two parts, this one follows naturally. The code is as follows:
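A sketch of the selection function (the book calls it `chooseBestFeatureToSplit`; the compact entropy and split helpers from parts one and two are repeated here so the block runs on its own):

```python
from math import log

def calc_shannon_ent(ds):
    # Entropy helper from part one, in compact form.
    counts = {}
    for row in ds:
        counts[row[-1]] = counts.get(row[-1], 0) + 1
    return -sum((c / len(ds)) * log(c / len(ds), 2) for c in counts.values())

def split_data_set(ds, axis, value):
    # Splitting helper from part two, in compact form.
    return [row[:axis] + row[axis + 1:] for row in ds if row[axis] == value]

def choose_best_feature_to_split(data_set):
    """Pick the feature whose split yields the largest information gain."""
    num_features = len(data_set[0]) - 1        # last column is the label
    base_entropy = calc_shannon_ent(data_set)  # H(D)
    best_gain, best_feature = 0.0, -1
    for i in range(num_features):
        new_entropy = 0.0
        for value in {row[i] for row in data_set}:
            subset = split_data_set(data_set, i, value)
            prob = len(subset) / len(data_set)
            new_entropy += prob * calc_shannon_ent(subset)  # H(D|A)
        gain = base_entropy - new_entropy      # g(D, A) = H(D) - H(D|A)
        if gain > best_gain:
            best_gain, best_feature = gain, i
    return best_feature
```

On the book's toy "is it a fish?" data set, feature 0 ("no surfacing") gives the larger gain and is chosen first.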
In fact, parts one to three together compute the information-gain formula. So the next step is to construct a decision tree and then prune away the extra branches, right? Ha, it sounds simple, and it really is simple.
Four: Building a decision tree
1: Majority voting
In fact, the number of features does not always shrink with every division of the data, so we count the columns before the algorithm starts running, so that we know when the algorithm has used up all the attributes. If the data set has consumed all the attributes but the class labels are still not unique, we must decide how to define the leaf node. So how do we define the leaf node?
Think about it: this is still just a classification problem, so why not reuse the majority-voting method from the KNN algorithm we covered last time? (Majority voting works like a democratic election: whichever label gets the most votes wins.)
Most voting codes are as follows:
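A sketch of the voting function (the book names it `majorityCnt`):

```python
import operator

def majority_cnt(class_list):
    """Return the class label that occurs most often (majority vote)."""
    class_count = {}
    for vote in class_list:
        class_count[vote] = class_count.get(vote, 0) + 1
    # Sort (label, count) pairs by count, descending; the winner is first.
    sorted_counts = sorted(class_count.items(),
                           key=operator.itemgetter(1), reverse=True)
    return sorted_counts[0][0]
```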
The KNN code is as follows:
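For comparison, here is a sketch of the kNN classifier from the previous article (the book's `classify0`, rewritten with NumPy; note the voting tail is nearly identical to `majority_cnt`):

```python
import operator
import numpy as np

def classify0(in_x, data_set, labels, k):
    """kNN: vote among the k nearest training samples."""
    diff = data_set - np.asarray(in_x)
    distances = np.sqrt((diff ** 2).sum(axis=1))  # Euclidean distance
    sorted_idx = distances.argsort()              # nearest first
    class_count = {}
    for i in range(k):
        vote = labels[sorted_idx[i]]
        class_count[vote] = class_count.get(vote, 0) + 1
    # Same majority-vote tail as the decision-tree helper above.
    sorted_counts = sorted(class_count.items(),
                           key=operator.itemgetter(1), reverse=True)
    return sorted_counts[0][0]
```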
Compare the two; aren't they very similar?
2: Creating the tree
As we said above, after the first division we simply need to call the tree-building function recursively.
The two criteria for ending the recursion are:
1: All class labels are exactly the same, so we return that class label (obvious: if everything already belongs to one class, there is nothing left to split).
2: All features have been used up, but the class labels in a group are still not unique. Since we cannot return a single unique label, we let the group be represented by the majority: the voting mechanism above returns the category that occurs most often.
The code is as follows:
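A sketch of the recursive builder (the book's `createTree`, which represents the tree as nested dicts; the helpers from parts one to three are repeated compactly so the block is self-contained):

```python
from math import log
import operator

def calc_shannon_ent(ds):
    counts = {}
    for row in ds:
        counts[row[-1]] = counts.get(row[-1], 0) + 1
    return -sum((c / len(ds)) * log(c / len(ds), 2) for c in counts.values())

def split_data_set(ds, axis, value):
    return [row[:axis] + row[axis + 1:] for row in ds if row[axis] == value]

def choose_best_feature(ds):
    base = calc_shannon_ent(ds)
    best_gain, best = 0.0, -1
    for i in range(len(ds[0]) - 1):
        cond = sum(len(sub) / len(ds) * calc_shannon_ent(sub)
                   for sub in (split_data_set(ds, i, v)
                               for v in {r[i] for r in ds}))
        if base - cond > best_gain:
            best_gain, best = base - cond, i
    return best

def majority_cnt(class_list):
    counts = {}
    for vote in class_list:
        counts[vote] = counts.get(vote, 0) + 1
    return max(counts.items(), key=operator.itemgetter(1))[0]

def create_tree(data_set, labels):
    """Recursively build the ID3 decision tree as nested dicts."""
    class_list = [row[-1] for row in data_set]
    if class_list.count(class_list[0]) == len(class_list):
        return class_list[0]              # stop 1: all labels identical
    if len(data_set[0]) == 1:
        return majority_cnt(class_list)   # stop 2: no features left
    best = choose_best_feature(data_set)
    best_label = labels[best]
    tree = {best_label: {}}
    sub_labels = labels[:best] + labels[best + 1:]  # feature consumed
    for value in {row[best] for row in data_set}:
        tree[best_label][value] = create_tree(
            split_data_set(data_set, best, value), sub_labels)
    return tree
```

On the book's toy fish data the result is `{'no surfacing': {0: 'no', 1: {'flippers': {0: 'no', 1: 'yes'}}}}`.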
If anything is unclear, message me privately and I'll help you work it out.
Now let's test it. The results are as follows:
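To query the finished tree you can walk the nested dicts with a small classifier (a sketch of the book's `classify`; `fish_tree` below is the tree the toy fish data produces):

```python
def classify(tree, feat_labels, test_vec):
    """Walk the nested-dict tree to classify one sample."""
    root = next(iter(tree))                 # feature name at this node
    feat_index = feat_labels.index(root)    # which column of test_vec
    subtree = tree[root][test_vec[feat_index]]
    if isinstance(subtree, dict):
        return classify(subtree, feat_labels, test_vec)
    return subtree                          # reached a leaf label

# Shape of tree that create_tree produces on the book's toy fish data:
fish_tree = {'no surfacing': {0: 'no', 1: {'flippers': {0: 'no', 1: 'yes'}}}}
```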
Hey, it looks like it works quite well...
Five: Predicting contact lens types using decision trees
(1) Collect data
(2) Prepare the data
(3) Analyze the data
(4) Train the algorithm
(5) Test the algorithm
(6) Use the algorithm
These six steps are the steps we must follow in any machine-learning project, so remember them.
Test data results for contact lenses:
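The lenses data ships with the book's source code as a tab-separated file (I assume the usual name `lenses.txt`: four feature columns, then the lens-type label). A small parsing sketch plus usage:

```python
def parse_lenses(lines):
    """Each tab-separated line holds four features plus the lens-type label."""
    return [line.strip().split('\t') for line in lines]

lens_labels = ['age', 'prescript', 'astigmatic', 'tearRate']

# Usage sketch (file name assumed; create_tree is the builder from part four):
# with open('lenses.txt') as fh:
#     lenses = parse_lenses(fh)
# lenses_tree = create_tree(lenses, lens_labels)
```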
Link: http://pan.baidu.com/s/1bpolbBL  Password: MZJJ. This is the source code for this experiment. It targets Python 2.7; if you are on Python 3.x, please adapt it to the newer syntax.
Machine Learning Notes: The ID3 Algorithm in Python, Hands-On