Decision Tree-ID3


ID3: numeric data cannot be processed directly, but it can be quantized (discretized) into nominal data first; this tends to create too many feature splits, however, so it is not recommended.

Decision Tree: its biggest advantage is that it can reveal the intrinsic meaning of the data, and the resulting form is very easy to understand;

Decision Tree Description: a decision tree classifier is like a flowchart, where each terminating block indicates a classification result.

Advantages: low computational complexity, output that is easy to understand, insensitivity to missing intermediate values, and the ability to handle irrelevant features; the classifier can be stored on disk, so it is a persistent classifier.

Cons: prone to overfitting (over-matching the training data).

Applicable data types: numeric and nominal.

kNN (by contrast): it cannot easily show the intrinsic meaning of the data, and it must re-learn from the training set every time it is used, so it is not a persistent classifier.

Concept Introduction:

Information Gain, entropy:

Definition of information: the information of a class x_i is l(x_i) = -log2 p(x_i), where p(x_i) is the probability of choosing that class.

Entropy definition: entropy is the expected value of the information over all classes, H = -Σ p(x_i) · log2 p(x_i); the split that produces the highest information gain (the largest reduction in entropy) is the best split. Entropy measures how disordered (inconsistent) the data is.

* (extended reading) Gini impurity: randomly pick an item from the dataset and measure the probability that it is incorrectly assigned to another group.
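The entropy computation can be made concrete. Below is a minimal Python sketch, assuming each data row is a list whose last element is the class label (the name calc_shannon_ent and the data layout are illustrative, not taken from the original text):

```python
from math import log

def calc_shannon_ent(data_set):
    # Count how often each class label (last column of a row) occurs.
    label_counts = {}
    for feat_vec in data_set:
        label = feat_vec[-1]
        label_counts[label] = label_counts.get(label, 0) + 1
    # H = -sum over classes of p(x_i) * log2 p(x_i)
    shannon_ent = 0.0
    num_entries = len(data_set)
    for count in label_counts.values():
        prob = count / num_entries
        shannon_ent -= prob * log(prob, 2)
    return shannon_ent

# Mixed classes give higher entropy than a pure dataset:
print(calc_shannon_ent([[1, 'yes'], [1, 'yes'], [0, 'no']]))   # ~0.918
print(calc_shannon_ent([[1, 'yes'], [1, 'yes'], [1, 'yes']]))  # 0.0
```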

Decision Tree Process

1. Collect data: any method can be used.

2. Prepare the data: the construction algorithm applies only to nominal data, so numeric data must be discretized first.

3. Analyze the data: any method can be used; once the tree is constructed, check whether the resulting diagram matches expectations.

·· Data Set Partitioning:

Measure the entropy of the dataset to judge whether the current partition is correct; imagine a scatter plot in two-dimensional space and drawing a line to divide it.

Partitioning operation: create a new list object and extract into it the rows that satisfy the split criterion (a sketch follows).
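A sketch of that partitioning operation, under the same row layout as above; the names split_data_set, axis, and value are illustrative:

```python
def split_data_set(data_set, axis, value):
    # Collect the rows whose feature at position `axis` equals `value`,
    # removing that feature column from each returned row.
    ret_data_set = []
    for feat_vec in data_set:
        if feat_vec[axis] == value:
            # Build a new list instead of mutating the input rows.
            reduced = feat_vec[:axis] + feat_vec[axis + 1:]
            ret_data_set.append(reduced)
    return ret_data_set
```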

·· Choose the best feature to split on (see the sketch after this list):

* Build the list of unique values each feature takes

* Compute the weighted information entropy of each candidate partition

* Keep the partition with the highest information gain
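A sketch of the selection step, reusing the calc_shannon_ent and split_data_set sketches above (again, the names are illustrative):

```python
def choose_best_feature(data_set):
    num_features = len(data_set[0]) - 1          # last column is the class label
    base_entropy = calc_shannon_ent(data_set)
    best_gain, best_feature = 0.0, -1
    for i in range(num_features):
        values = {row[i] for row in data_set}    # unique values of feature i
        new_entropy = 0.0
        for value in values:
            subset = split_data_set(data_set, i, value)
            prob = len(subset) / len(data_set)
            new_entropy += prob * calc_shannon_ent(subset)  # weighted entropy
        gain = base_entropy - new_entropy        # information gain of this split
        if gain > best_gain:
            best_gain, best_feature = gain, i
    return best_feature
```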

·· Recursively build the decision tree (see the sketch after this list):

* Call the partitioning function in a loop, once for each value of the chosen feature

* Set the stopping conditions: the recursion ends either when every instance in a branch has the same class, or when all features have been consumed; if the classes are still mixed at that point, use majority voting to decide the leaf node's class.

* Base cases in summary: return the class directly when all classes are identical; return the most frequent class when all features are exhausted; otherwise keep the list of remaining attributes and recurse.
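A sketch of the recursion, covering both stopping conditions and the majority vote; it builds the tree as nested dictionaries and assumes the helper sketches above:

```python
from collections import Counter

def majority_count(class_list):
    # Majority vote for a leaf when no features remain to split on.
    return Counter(class_list).most_common(1)[0][0]

def create_tree(data_set, labels):
    class_list = [row[-1] for row in data_set]
    if class_list.count(class_list[0]) == len(class_list):
        return class_list[0]                     # stop: all classes identical
    if len(data_set[0]) == 1:
        return majority_count(class_list)        # stop: all features used up
    best = choose_best_feature(data_set)
    best_label = labels[best]
    tree = {best_label: {}}
    sub_labels = labels[:best] + labels[best + 1:]   # remaining attributes
    for value in {row[best] for row in data_set}:
        subset = split_data_set(data_set, best, value)
        tree[best_label][value] = create_tree(subset, sub_labels)
    return tree
```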

* Draw the diagram with Matplotlib (arrow direction, node text, coloring); see the sketch after this list

Define the text-box and arrow formatting

Draw annotations with arrows

* Build the annotation tree

* Test whether a node's value is a dictionary (a dictionary means an internal node; anything else is a leaf)

* Fill in the text on the line between parent and child nodes

* Compute the width and depth of the tree

* Label the attribute values on the branches to child nodes

* Decrease the y offset as plotting descends the tree
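A minimal Matplotlib sketch of annotation-based node drawing; the box and arrow style values here are illustrative assumptions, not settings taken from the original text:

```python
import matplotlib.pyplot as plt

# Text-box and arrow formatting for the two node kinds.
decision_node = dict(boxstyle="sawtooth", fc="0.8")
leaf_node = dict(boxstyle="round4", fc="0.8")
arrow_args = dict(arrowstyle="<-")   # arrow drawn from child back to parent

def plot_node(ax, node_text, center_pt, parent_pt, node_type):
    # Draw one node as an annotation with an arrow from its parent.
    ax.annotate(node_text, xy=parent_pt, xycoords="axes fraction",
                xytext=center_pt, textcoords="axes fraction",
                va="center", ha="center", bbox=node_type,
                arrowprops=arrow_args)

fig = plt.figure(facecolor="white")
ax = fig.add_subplot(111, frameon=False)
plot_node(ax, "decision node", (0.5, 0.1), (0.1, 0.5), decision_node)
plot_node(ax, "leaf node", (0.8, 0.1), (0.3, 0.8), leaf_node)
plt.show()
```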

4. Test the algorithm: use the learned tree to compute the error rate.

Testing and storing the classifier

* Test the algorithm: use the decision tree to classify; convert the label string into a feature index

* Recursively traverse the whole tree, comparing the feature values in the test vector against the tree-node values; when a leaf node is reached, return its classification label (a sketch follows)
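A sketch of that traversal over the nested-dictionary tree built earlier; feat_labels holds the feature names so a label string can be converted into an index into the test vector:

```python
def classify(tree, feat_labels, test_vec):
    root_label = next(iter(tree))                # feature tested at this node
    children = tree[root_label]
    feat_index = feat_labels.index(root_label)   # label string -> feature index
    child = children[test_vec[feat_index]]
    if isinstance(child, dict):                  # a dict is an internal node
        return classify(child, feat_labels, test_vec)
    return child                                 # a leaf: the class label
```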

5. Use the algorithm: store the decision tree on disk (this step applies to any supervised learning algorithm, but decision trees make the intrinsic meaning of the data easier to understand); a pickle-based sketch follows.
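A sketch of persisting the tree with Python's pickle module; the function names and filename are illustrative:

```python
import pickle

def store_tree(tree, filename):
    # Serialize the nested-dict tree to disk so it need not be rebuilt.
    with open(filename, "wb") as f:
        pickle.dump(tree, f)

def grab_tree(filename):
    # Load a previously stored tree.
    with open(filename, "rb") as f:
        return pickle.load(f)

# store_tree(my_tree, "classifier_storage.txt")
# my_tree = grab_tree("classifier_storage.txt")
```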

      

Decision tree Pseudo-code:

To create a branch, the function createBranch() in pseudo-code:

    Check whether every item in the dataset belongs to the same class
    If so:
        return the class label
    Else:
        find the best feature for dividing the dataset
        partition the dataset
        create a branch node
        for each subset of the partition:
            call createBranch() and add the result to the branch node
        return the branch node

Example: Predicting contact lens types with a decision tree

1. Collect data: a text file is provided.

2. Prepare the data: parse the tab-separated data lines.

3. Analyze the data: quickly inspect the data to make sure it parsed correctly, then use the createPlot() function to draw the final tree diagram.

4. Train the algorithm: use the createTree() function.

5. Test the algorithm: write a test function to verify that the decision tree correctly classifies a given data instance.

6. Use the algorithm: store the tree data structure so that it does not need to be reconstructed next time (a sketch follows this list).
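Putting the steps together, a sketch of the contact-lens example; it assumes a tab-separated file named lenses.txt whose last column is the lens type, the create_tree sketch from earlier, and feature names that are assumptions rather than given in the text above:

```python
# Parse the tab-separated lines of the provided text file.
with open("lenses.txt") as f:
    lenses = [line.strip().split("\t") for line in f]

# Assumed feature names for the four non-label columns.
lens_labels = ["age", "prescript", "astigmatic", "tearRate"]

lenses_tree = create_tree(lenses, lens_labels)
print(lenses_tree)   # nested dicts; draw with a createPlot-style function
```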
