ID3: Numerical data cannot be processed directly; numerical values can be quantized into nominal data, but this involves too many feature splits and is not recommended.
Decision Tree: Its biggest advantage is that it can reveal the intrinsic meaning of the data; the data form is very easy to understand.
Decision Tree Description: A decision tree classifier is like a flowchart with terminating blocks, where a terminating block indicates a classification result.
Advantages: low computational complexity, output that is easy to understand, insensitivity to missing intermediate values, and the ability to handle irrelevant features; the classifier can be stored on the hard disk, making it a persistent classifier.
Disadvantages: an overfitting problem may occur.
Applicable data types: numeric and nominal.
KNN (by contrast): cannot easily reveal the intrinsic meaning of the data, and must learn from the data every time it is used, so it is not a persistent classifier.
Concept Introduction:
Information gain and entropy:
Definition of information: for a class x_i occurring with probability p(x_i), the information of x_i is l(x_i) = -log2 p(x_i).
Entropy definition: entropy is the expected value of the information, H = -Σ p(x_i) log2 p(x_i); the change in entropy before and after a split is the information gain, and the split with the highest information gain is the best choice. Entropy measures how disordered (inconsistent) the data is (see the code sketch below).
* (extended reading) Gini impurity: randomly select an item from the data set and measure the probability that it is incorrectly assigned to one of the other groups.
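A minimal sketch of the entropy computation just described, in the spirit of the book's calcShannonEnt (function and variable names here are illustrative; the class label is assumed to sit in the last column of each row):

```python
from math import log

def calc_shannon_ent(data_set):
    """Shannon entropy H = -sum(p(x) * log2 p(x)) over the class labels,
    assumed to be the last column of each row."""
    label_counts = {}
    for feat_vec in data_set:
        label = feat_vec[-1]
        label_counts[label] = label_counts.get(label, 0) + 1
    entropy = 0.0
    for count in label_counts.values():
        prob = count / len(data_set)
        entropy -= prob * log(prob, 2)   # accumulate -p * log2(p)
    return entropy
```

For example, a data set whose labels are half "yes" and half "no" has entropy 1.0, the maximum for two classes; a single-class data set has entropy 0.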
Decision Tree Process
1. Collect data: any method can be used.
2. Prepare the data: the tree-construction algorithm applies only to nominal data, so numerical data must be discretized.
3. Analyze data: any method can be used; once the tree is constructed, we should check whether the diagram matches expectations.
·· Data Set Partitioning:
Measure the entropy of the data set to judge whether the current split is correct; imagine a scatter plot in two-dimensional space and drawing lines to divide the points.
Partitioning operations: create a new list object and extract the entries that meet the requirements (see the sketch below).
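A sketch of that partitioning operation, assuming rows are plain Python lists (names are illustrative):

```python
def split_data_set(data_set, axis, value):
    """Collect the rows whose feature at position `axis` equals `value`,
    stripping out that feature column; a new list is built so the
    original data set is not modified."""
    sub_set = []
    for feat_vec in data_set:
        if feat_vec[axis] == value:
            sub_set.append(feat_vec[:axis] + feat_vec[axis + 1:])
    return sub_set
```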
·· Choose the best way to split the data set:
* Create a unique list of feature values
* Calculate the information entropy for each way of splitting
* Select the split with the highest information gain (sketched below)
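Putting those three bullets together, a sketch that reuses calc_shannon_ent and split_data_set from above:

```python
def choose_best_feature(data_set):
    """Return the index of the feature whose split yields the highest
    information gain (last column is assumed to be the class label)."""
    base_entropy = calc_shannon_ent(data_set)
    best_gain, best_feature = 0.0, -1
    for i in range(len(data_set[0]) - 1):
        unique_vals = set(vec[i] for vec in data_set)   # unique feature values
        new_entropy = 0.0
        for value in unique_vals:
            sub_set = split_data_set(data_set, i, value)
            prob = len(sub_set) / len(data_set)
            new_entropy += prob * calc_shannon_ent(sub_set)
        gain = base_entropy - new_entropy               # information gain
        if gain > best_gain:
            best_gain, best_feature = gain, i
    return best_feature
```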
·· Recursively build the decision tree:
* Call the partitioning function in a loop
* Set the stopping conditions: recursion ends when all class labels in a branch are identical, or when all features in the attribute list have been consumed; if the classes are still mixed at that point, the most frequent class is returned by majority voting (see the create_tree sketch after the pseudo-code below).
* Call Matplotlib to construct the diagram (arrows, numeric display at data points, coloring); a plotting sketch follows this list
Define the text box and arrow formatting
Draw annotations with arrows
* Construct the annotation tree
* Test whether a node's data type is a dictionary (a dictionary means a decision node, anything else a leaf)
* Fill in text information between parent and child nodes
* Calculate the width and height of the tree
* Label the child nodes with their attribute values
* Decrease the y offset
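A minimal sketch of the arrow-annotated drawing step using Matplotlib's annotate; the box styles, arrow style, and coordinates are assumptions for illustration:

```python
import matplotlib.pyplot as plt

decision_node = dict(boxstyle="sawtooth", fc="0.8")   # assumed box styles
leaf_node = dict(boxstyle="round4", fc="0.8")
arrow_args = dict(arrowstyle="<-")

def plot_node(ax, node_txt, center_pt, parent_pt, node_type):
    """Draw one text box with an arrow pointing from its parent node."""
    ax.annotate(node_txt, xy=parent_pt, xycoords="axes fraction",
                xytext=center_pt, textcoords="axes fraction",
                va="center", ha="center", bbox=node_type,
                arrowprops=arrow_args)

fig = plt.figure()
ax = fig.add_subplot(111, frameon=False)
plot_node(ax, "decision node", (0.5, 0.1), (0.1, 0.5), decision_node)
plot_node(ax, "leaf node", (0.8, 0.1), (0.3, 0.8), leaf_node)
plt.show()
```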
4. Test the algorithm: use the trained tree to calculate the error rate.
Testing and storing the classifier:
* Test algorithm: use the decision tree to perform classification; convert the label string to an index
* Traverse the entire tree, comparing the values in the test vector with the values at the tree nodes; when a leaf node is reached, return the current node's category label (a sketch follows)
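A sketch of that traversal, assuming the tree is the nested-dict structure built by the construction code (names illustrative):

```python
def classify(input_tree, feat_labels, test_vec):
    """Walk the nested-dict tree until a leaf (non-dict) value is reached."""
    feature = next(iter(input_tree))           # feature tested at this node
    feat_index = feat_labels.index(feature)    # label string -> column index
    subtree = input_tree[feature][test_vec[feat_index]]
    if isinstance(subtree, dict):              # internal node: keep walking
        return classify(subtree, feat_labels, test_vec)
    return subtree                             # leaf: the class label
```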
5. Use the algorithm: decision tree storage (this step can be applied to any supervised learning algorithm, but using a decision tree makes the intrinsic meaning of the data easier to understand); an example with pickle follows.
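One way to persist the tree is Python's pickle module; the function names below are illustrative:

```python
import pickle

def store_tree(input_tree, filename):
    """Serialize the decision tree to disk so it need not be rebuilt."""
    with open(filename, "wb") as f:
        pickle.dump(input_tree, f)

def grab_tree(filename):
    """Load a previously stored decision tree."""
    with open(filename, "rb") as f:
        return pickle.load(f)
```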
Decision tree pseudo-code, function createBranch():
    Check whether every item in the data set belongs to the same category:
    If so, return the class label;
    Else:
        Find the best feature for splitting the data set
        Split the data set
        Create a branch node
        For each subset of the split:
            Call createBranch() and add the returned result to the branch node
        Return the branch node
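The pseudo-code translated into a runnable Python sketch, reusing the helpers sketched earlier; the nested-dict tree representation and the majority_count helper are assumptions:

```python
from collections import Counter

def majority_count(class_list):
    """Majority vote: the most frequent class label in the list."""
    return Counter(class_list).most_common(1)[0][0]

def create_tree(data_set, labels):
    """Recursively build a decision tree as a nested dict
    {feature_label: {feature_value: subtree_or_class, ...}}."""
    class_list = [vec[-1] for vec in data_set]
    if class_list.count(class_list[0]) == len(class_list):
        return class_list[0]                  # all categories identical: leaf
    if len(data_set[0]) == 1:                 # features exhausted: majority vote
        return majority_count(class_list)
    best = choose_best_feature(data_set)
    best_label = labels[best]
    tree = {best_label: {}}
    sub_labels = labels[:best] + labels[best + 1:]
    for value in set(vec[best] for vec in data_set):
        tree[best_label][value] = create_tree(
            split_data_set(data_set, best, value), sub_labels)
    return tree
```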
Example: predicting contact lens types with a decision tree
1. Collect data: use the provided text file.
2. Prepare data: parse the tab-separated data rows.
3. Analyze data: quickly check the data to make sure it was parsed correctly, and use the createPlot() function to draw the final tree diagram.
4. Train the algorithm: use the createTree() function.
5. Test the algorithm: write a test function to verify that the decision tree correctly classifies a given data instance.
6. Use the algorithm: store the data structure so that the decision tree does not need to be rebuilt next time (a usage sketch follows).
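Assuming the sketches above and a tab-separated lenses.txt like the one the book ships, the whole example might run as follows (the file name and column labels are assumptions):

```python
with open("lenses.txt") as fr:
    lenses = [line.strip().split("\t") for line in fr]
lenses_labels = ["age", "prescript", "astigmatic", "tearRate"]

lenses_tree = create_tree(lenses, lenses_labels)
print(lenses_tree)

store_tree(lenses_tree, "lenses_tree.pkl")   # persist so it need not be rebuilt
```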