Machine Learning Algorithms Interview Dictation (4): Decision Trees

Source: Internet
Author: User
Tags: ID3

This series covers the algorithms that interviewers commonly ask about in job interviews, so each algorithm only gets a brief introduction; common interview questions for each algorithm will be added later.

A decision tree is a tree-structured prediction model: it is trained on labeled (classified) data and then classifies or predicts new samples based on what it has learned from the known data.

Building a decision tree is the process of repeatedly partitioning the data by its features; the central question is how to choose the feature to split on at each step.

Several decision tree algorithms are in common use, such as ID3, C4.5, and CART. ID3 chooses the splitting feature by information (entropy) gain, C4.5 chooses it by gain ratio, and CART chooses it by the Gini index.
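For quick reference, here are the standard formulas behind these three criteria (my notation, not the original post's: D is the data at a node, p_k the fraction of class k in D, and D_v the subset of D where attribute A takes value v):

\mathrm{Ent}(D) = -\sum_k p_k \log_2 p_k, \qquad \mathrm{Gain}(D,A) = \mathrm{Ent}(D) - \sum_v \frac{|D_v|}{|D|}\,\mathrm{Ent}(D_v)

\mathrm{GainRatio}(D,A) = \frac{\mathrm{Gain}(D,A)}{\mathrm{IV}(A)}, \qquad \mathrm{IV}(A) = -\sum_v \frac{|D_v|}{|D|}\log_2\frac{|D_v|}{|D|}

\mathrm{Gini}(D) = 1 - \sum_k p_k^2, \qquad \mathrm{GiniIndex}(D,A) = \sum_v \frac{|D_v|}{|D|}\,\mathrm{Gini}(D_v)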

ID3:

The algorithm is grounded in information theory: it uses entropy and information gain as the measures for splitting the data. ID3 tends to favor attributes with many distinct values, but in many cases the attribute with the most values is not the most informative one. Moreover, ID3 cannot handle continuous-valued attributes, nor attributes with missing values.

For a worked example, see: http://wenku.baidu.com/link?url=v_-Eh4p8uvav93xt2mkulbdvt1k1b9khzna1hjob1fx0mntdalylnqs4chlz5nervttrg7v60rzpggzuzk26gyocfyxblizhz7vjdfqjfhe
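As a concrete illustration, here is a minimal Python sketch of ID3's criterion (my own helper names and toy data, not code from the original post): the entropy of a label set and the information gain of splitting on one categorical feature.

from collections import Counter
from math import log2

def entropy(labels):
    # Shannon entropy of a list of class labels
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(rows, labels, feature_index):
    # Entropy reduction obtained by partitioning the rows on one categorical feature
    partitions = {}
    for row, label in zip(rows, labels):
        partitions.setdefault(row[feature_index], []).append(label)
    weighted = sum(len(p) / len(labels) * entropy(p) for p in partitions.values())
    return entropy(labels) - weighted

# Toy example (made-up data): information gain of feature 0
rows = [("sunny",), ("sunny",), ("overcast",), ("rain",), ("rain",)]
labels = ["no", "no", "yes", "yes", "no"]
print(information_gain(rows, labels, 0))   # ~0.57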

C4.5:

C4.5 splits on the attribute with the highest gain ratio and is an improvement on ID3: it achieves higher accuracy and can handle continuous attributes (see the threshold-search sketch after the reference links below). It prunes while the tree is being built, using pessimistic pruning (the error rate is used for the estimate). Because constructing the tree requires multiple sequential scans and sorts of the data, C4.5 is relatively inefficient and is only suitable for datasets that fit in memory.

For a worked example, see: http://blog.csdn.net/xuxurui007/article/details/18045943

For tree pruning, refer to: http://blog.csdn.net/woshizhouxiang/article/details/17679015
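Below is a rough Python sketch (assumed, not the blog author's code) of how C4.5 can handle a continuous attribute: sort the distinct values, try the midpoints between adjacent values as thresholds, and keep the threshold with the best gain ratio.

from collections import Counter
from math import log2

def entropy(labels):
    # Shannon entropy of a list of class labels
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def gain_ratio(values, labels, threshold):
    # Information gain of the binary split "value <= threshold", normalized by split information
    left = [lab for v, lab in zip(values, labels) if v <= threshold]
    right = [lab for v, lab in zip(values, labels) if v > threshold]
    n = len(labels)
    gain = entropy(labels) - (len(left) / n) * entropy(left) - (len(right) / n) * entropy(right)
    split_info = entropy(["L"] * len(left) + ["R"] * len(right))  # entropy of the split proportions
    return gain / split_info if split_info > 0 else 0.0

def best_threshold(values, labels):
    # Candidate thresholds are midpoints between adjacent distinct sorted values
    distinct = sorted(set(values))
    candidates = [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]
    return max(candidates, key=lambda t: gain_ratio(values, labels, t))

# Toy example with made-up temperatures and class labels
temps = [64, 65, 68, 69, 70, 71, 72, 75, 80, 85]
play = ["yes", "no", "yes", "yes", "yes", "no", "no", "yes", "no", "no"]
print(best_threshold(temps, play))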

CART:

CART uses the Gini index as its splitting criterion; by minimizing node impurity at each step, CART can handle outliers as well as missing values.
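As a minimal sketch (my own function names, not from the original post), here are the Gini impurity of a node and the weighted impurity of a candidate binary split, which CART minimizes:

from collections import Counter

def gini(labels):
    # Gini impurity: chance that two samples drawn at random have different classes
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def gini_index(left_labels, right_labels):
    # Weighted impurity of a binary split; CART chooses the split that minimizes this
    n = len(left_labels) + len(right_labels)
    return (len(left_labels) / n) * gini(left_labels) + (len(right_labels) / n) * gini(right_labels)

# Toy example: a candidate split of six samples into two children (made-up labels)
print(gini_index(["yes", "yes", "no"], ["no", "no", "no"]))   # ~0.22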

Tree growth stops when any of the following conditions holds:

1. the node is completely pure (all samples belong to one class);

2. the tree reaches the user-specified maximum depth;

3. the number of samples in the node falls below the user-specified minimum.

CART prunes the tree using cost-complexity pruning.
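The standard cost-complexity criterion (textbook notation, not from the original post) trades training error against tree size: for a subtree T with |T| leaves and misclassification cost R(T), pruning minimizes

R_\alpha(T) = R(T) + \alpha\,|T|

Larger values of \alpha favor smaller trees; CART typically selects \alpha by cross-validation.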

For details, see: http://blog.csdn.net/tianguokaka/article/details/9018933


Copyright notice: This is the blogger's original article; please do not reproduce it without permission.
