This article affirms that it is original; if you reprint it, please include a declaration. The figures are from Peter Harrington's book *Machine Learning in Action*; if anything here infringes, contact me and I will remove it.
Hello, we meet again. I'm in a surprisingly good mood today, no idea why. Anyway... ten thousand words omitted here... Last time I walked you through the theory of decision trees; today we put it into practice. We'll help an ophthalmologist build a system that suggests to users which type of contact lenses fits their eyes. The system first learns from the data.
One: Calculate the Shannon entropy of a given data set
We all remember the information-gain formula from the last lecture: g(D, A) = H(D) - H(D|A). First we need H(D), the empirical entropy of data set D, whose formula is: H(D) = -Σ_{k=1..K} (|C_k|/|D|) * log2(|C_k|/|D|), where C_k is the set of samples belonging to class k. The code for this formula is as follows:
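Since the listing is short, here is a minimal Python 3 sketch (the book's version targets Python 2.7 and calls this `calcShannonEnt`; I use snake_case and assume each sample is a list whose last element is the class label):

```python
from math import log

def calc_shannon_ent(data_set):
    """Compute the Shannon entropy H(D) of a data set.

    Each sample is a list; its last element is the class label.
    """
    num_entries = len(data_set)
    label_counts = {}
    for feat_vec in data_set:
        label = feat_vec[-1]
        label_counts[label] = label_counts.get(label, 0) + 1
    entropy = 0.0
    for count in label_counts.values():
        prob = count / num_entries
        entropy -= prob * log(prob, 2)  # H(D) = -sum p_k * log2(p_k)
    return entropy
```

For example, a set that is two-thirds "yes" and one-third "no" has entropy of about 0.918 bits, while a 50/50 split gives exactly 1 bit.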
The higher the entropy, the more mixed the data, and vice versa. Once again I recommend Wu Jun's *The Beauty of Mathematics*. Writing this code shows that a language is really just a tool: Java and Python work for us, not the other way around. So there is no need to fear the tool; just get to know it and master it.
Two: Dividing data sets
Suppose a demon kidnaps your goddess and sets you a problem: given a pile of black, white, and red beans, sort them by color, white with white, black with black, red with red. Isn't that simple? Partitioning a data set is just as simple: look at one feature of the data, put the items with the same value together, and separate them from the rest. The code is as follows:
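A sketch of the splitting function (the book names it `splitDataSet`; the convention that the split-on column is removed from the returned samples follows the book's chapter 3):

```python
def split_data_set(data_set, axis, value):
    """Return the samples whose feature `axis` equals `value`,
    with that feature column removed from each sample."""
    ret = []
    for feat_vec in data_set:
        if feat_vec[axis] == value:
            # Keep everything except the column we split on.
            reduced = feat_vec[:axis] + feat_vec[axis + 1:]
            ret.append(reduced)
    return ret
```

Dropping the used column is what lets the tree-building recursion work on ever-smaller feature sets.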
Three: Choose the best way to divide the data set
You can divide the data set, but how do you know your division is the best one? As we all know, the core of the ID3 algorithm is using information gain to judge whether a division is good. Once you have worked through the first two parts, this one follows naturally. The code is as follows:
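A sketch of the selection function (the book calls it `chooseBestFeatureToSplit`; the compact entropy and split helpers from parts one and two are repeated here so the block runs on its own):

```python
from math import log

def calc_shannon_ent(ds):
    # Entropy helper from part one, in compact form.
    counts = {}
    for row in ds:
        counts[row[-1]] = counts.get(row[-1], 0) + 1
    return -sum((c / len(ds)) * log(c / len(ds), 2) for c in counts.values())

def split_data_set(ds, axis, value):
    # Splitting helper from part two, in compact form.
    return [row[:axis] + row[axis + 1:] for row in ds if row[axis] == value]

def choose_best_feature_to_split(data_set):
    """Pick the feature whose split yields the largest information gain."""
    num_features = len(data_set[0]) - 1        # last column is the label
    base_entropy = calc_shannon_ent(data_set)  # H(D)
    best_gain, best_feature = 0.0, -1
    for i in range(num_features):
        new_entropy = 0.0
        for value in {row[i] for row in data_set}:
            subset = split_data_set(data_set, i, value)
            prob = len(subset) / len(data_set)
            new_entropy += prob * calc_shannon_ent(subset)  # H(D|A)
        gain = base_entropy - new_entropy      # g(D, A) = H(D) - H(D|A)
        if gain > best_gain:
            best_gain, best_feature = gain, i
    return best_feature
```

On the book's toy "is it a fish?" data set, feature 0 ("no surfacing") gives the larger gain and is chosen first.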
In fact, parts one to three together compute the information-gain formula. So the next step is to construct a decision tree and then prune away the extra branches, right? Ha, it sounds simple, and it really is simple.
Four: Building a decision tree
1: Majority voting
In fact, the number of features does not always shrink with every division of the data, so we count the columns before the algorithm starts running, so that we know when the algorithm has used up all the attributes. If the data set has consumed all the attributes but the class labels are still not unique, we must decide how to define the leaf node. So how do we define the leaf node?
Think about it: this is still just a classification problem, so why not reuse the majority-voting method from the KNN algorithm we covered last time? (Majority voting works like a democratic election: whichever label gets the most votes wins.)
Most voting codes are as follows:
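A sketch of the voting function (the book names it `majorityCnt`):

```python
import operator

def majority_cnt(class_list):
    """Return the class label that occurs most often (majority vote)."""
    class_count = {}
    for vote in class_list:
        class_count[vote] = class_count.get(vote, 0) + 1
    # Sort (label, count) pairs by count, descending; the winner is first.
    sorted_counts = sorted(class_count.items(),
                           key=operator.itemgetter(1), reverse=True)
    return sorted_counts[0][0]
```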
The KNN code is as follows:
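For comparison, here is a sketch of the kNN classifier from the previous article (the book's `classify0`, rewritten with NumPy; note the voting tail is nearly identical to `majority_cnt`):

```python
import operator
import numpy as np

def classify0(in_x, data_set, labels, k):
    """kNN: vote among the k nearest training samples."""
    diff = data_set - np.asarray(in_x)
    distances = np.sqrt((diff ** 2).sum(axis=1))  # Euclidean distance
    sorted_idx = distances.argsort()              # nearest first
    class_count = {}
    for i in range(k):
        vote = labels[sorted_idx[i]]
        class_count[vote] = class_count.get(vote, 0) + 1
    # Same majority-vote tail as the decision-tree helper above.
    sorted_counts = sorted(class_count.items(),
                           key=operator.itemgetter(1), reverse=True)
    return sorted_counts[0][0]
```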
Compare the two; aren't they very similar?
2: Creating the tree
As we said above, after the first division we simply need to call the tree-building function recursively.
The two criteria for ending the recursion are:
1: All class labels are exactly the same, so we return that class label (obvious: if everything already belongs to one class, there is nothing left to split).
2: All features have been used up, but the class labels in a group are still not unique. Since we cannot return a single unique label, we let the group be represented by the majority: the voting mechanism above returns the category that occurs most often.
The code is as follows:
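A sketch of the recursive builder (the book's `createTree`, which represents the tree as nested dicts; the helpers from parts one to three are repeated compactly so the block is self-contained):

```python
from math import log
import operator

def calc_shannon_ent(ds):
    counts = {}
    for row in ds:
        counts[row[-1]] = counts.get(row[-1], 0) + 1
    return -sum((c / len(ds)) * log(c / len(ds), 2) for c in counts.values())

def split_data_set(ds, axis, value):
    return [row[:axis] + row[axis + 1:] for row in ds if row[axis] == value]

def choose_best_feature(ds):
    base = calc_shannon_ent(ds)
    best_gain, best = 0.0, -1
    for i in range(len(ds[0]) - 1):
        cond = sum(len(sub) / len(ds) * calc_shannon_ent(sub)
                   for sub in (split_data_set(ds, i, v)
                               for v in {r[i] for r in ds}))
        if base - cond > best_gain:
            best_gain, best = base - cond, i
    return best

def majority_cnt(class_list):
    counts = {}
    for vote in class_list:
        counts[vote] = counts.get(vote, 0) + 1
    return max(counts.items(), key=operator.itemgetter(1))[0]

def create_tree(data_set, labels):
    """Recursively build the ID3 decision tree as nested dicts."""
    class_list = [row[-1] for row in data_set]
    if class_list.count(class_list[0]) == len(class_list):
        return class_list[0]              # stop 1: all labels identical
    if len(data_set[0]) == 1:
        return majority_cnt(class_list)   # stop 2: no features left
    best = choose_best_feature(data_set)
    best_label = labels[best]
    tree = {best_label: {}}
    sub_labels = labels[:best] + labels[best + 1:]  # feature consumed
    for value in {row[best] for row in data_set}:
        tree[best_label][value] = create_tree(
            split_data_set(data_set, best, value), sub_labels)
    return tree
```

On the book's toy fish data the result is `{'no surfacing': {0: 'no', 1: {'flippers': {0: 'no', 1: 'yes'}}}}`.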
If anything is unclear, message me privately and I'll help you work it out.
Now let's test it. The results are as follows:
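To query the finished tree you can walk the nested dicts with a small classifier (a sketch of the book's `classify`; `fish_tree` below is the tree the toy fish data produces):

```python
def classify(tree, feat_labels, test_vec):
    """Walk the nested-dict tree to classify one sample."""
    root = next(iter(tree))                 # feature name at this node
    feat_index = feat_labels.index(root)    # which column of test_vec
    subtree = tree[root][test_vec[feat_index]]
    if isinstance(subtree, dict):
        return classify(subtree, feat_labels, test_vec)
    return subtree                          # reached a leaf label

# Shape of tree that create_tree produces on the book's toy fish data:
fish_tree = {'no surfacing': {0: 'no', 1: {'flippers': {0: 'no', 1: 'yes'}}}}
```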
Hey, it looks like it works quite well...
Five: Predicting contact lens types using decision trees
(1) Collect data
(2) Prepare the data
(3) Analyze the data
(4) Train the algorithm
(5) Test the algorithm
(6) Use the algorithm
These six steps are the steps we must follow in any machine-learning project, so remember them.
Test data results for contact lenses:
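The lenses data ships with the book's source code as a tab-separated file (I assume the usual name `lenses.txt`: four feature columns, then the lens-type label). A small parsing sketch plus usage:

```python
def parse_lenses(lines):
    """Each tab-separated line holds four features plus the lens-type label."""
    return [line.strip().split('\t') for line in lines]

lens_labels = ['age', 'prescript', 'astigmatic', 'tearRate']

# Usage sketch (file name assumed; create_tree is the builder from part four):
# with open('lenses.txt') as fh:
#     lenses = parse_lenses(fh)
# lenses_tree = create_tree(lenses, lens_labels)
```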
Link: http://pan.baidu.com/s/1bpolbBL  Password: MZJJ. This is the source code for this experiment. It targets Python 2.7; if you are on Python 3.x, please adapt it to the newer syntax.
Machine Learning Notes: The ID3 Algorithm in Python, Hands-On