Decision Tree Algorithm

There is a guessing game: one player writes down the name of some object, and the others have to guess what it is. Of course, if those were the only rules, guessing would be almost impossible, because the space of possible answers is far too large. To make the game playable, the guessers may ask the writer questions, and the writer must answer each one truthfully with yes or no. The guessers choose each new question based on the previous answers, and they win if they identify the object within an agreed number of questions.

Let's try it. Suppose I have written down an object, and the exchange between you and me goes like this:

    • Is it a person? Y

    • Is it an Asian? Y

    • Is it a Chinese person? N

    • Is it an Indian? Y

    • ......

In the game above, each question we ask narrows the range of possible answers. Provided the questioner and the answerer share the same background knowledge, guessing the answer is much easier than it first appears.

At each node, the answer to the question splits the candidates into two branches: the left branch represents yes and the right branch represents no. For simplicity we only draw one of the paths, but it is clear that this forms a tree structure, and this is the prototype of a decision tree.

Decision Tree Algorithm

The samples we face usually have many features, so an object cannot be judged from a single angle. How, then, do we combine different features? The idea of the decision tree algorithm is to start with a single feature. Just as in the game above, we cannot classify the samples in one step, so we first split them according to one feature. The result of that split may be far from the final answer, but it shrinks the problem: each subset is easier to classify than the original sample set. We then repeat the process on each subset produced by the previous split. In the ideal case, after several layers of such splits we obtain completely pure subsets, that is, every sample in a subset belongs to the same class. This splitting process forms a tree-shaped decision model: every non-leaf node of the tree is a split on some feature, and every leaf node is a final classification decision.

The idea of the decision tree algorithm introduced above can be summarized in two points:

    • Each time, select one feature and use it to split the sample set

    • Apply step 1 recursively to the resulting subsets

In the first step, the most important question is which feature to choose so that the split gives the best classification result; and to judge whether a split is good or bad, we need an evaluation metric.

Intuitively, a good subset is one in which the classes of the samples are relatively concentrated; the ideal case is that all samples in the subset belong to the same class. This purity of a sample set can be measured by entropy.

In information theory, entropy measures the degree of disorder in a system. The larger the entropy, the lower the purity of our data set; the entropy is 0 when all samples in the data set belong to the same class. Entropy is calculated as:

H(X) = -Σ_i p(x_i) · log_b p(x_i)

where p(x_i) is the probability of outcome x_i, and the base b is taken to be 2 here. For example, when tossing a coin, the probability of heads is 1/2 and the probability of tails is also 1/2, so the entropy of the process is:

H = -(1/2 · log2(1/2) + 1/2 · log2(1/2)) = 1

As we can see, because a coin toss is a completely random event whose two outcomes are equally likely, it has high entropy.

If instead we ask about the final direction of the coin's flight, then the probability that the coin eventually falls to the ground is 1 and the probability that it flies off into the sky is 0. Substituting these into the formula above gives an entropy of 0 for this process. So the smaller the entropy, the more predictable the outcome. During decision tree construction, our goal is to choose the split whose subsets have the smallest entropy, so that the subsequent iterations can classify them more easily.
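As a quick sanity check of the formula, here is a minimal Python sketch; the entropy helper name and the example values are ours, not from the original post.

import math
from collections import Counter


def entropy(labels, base=2):
    # Shannon entropy of a collection of class labels
    total = len(labels)
    return -sum((c / total) * math.log(c / total, base) for c in Counter(labels).values())


print(entropy(["heads", "tails"]))              # 1.0: two equally likely outcomes
print(entropy(["ground", "ground", "ground"]))  # -0.0 (i.e. 0): every sample has the same class

The first call reproduces the coin example above; the second shows that a perfectly pure set has zero entropy.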

Since this is a recursive process, we need a rule for stopping the recursion.

We stop splitting a subset further in two situations: either the split has already achieved the desired effect, or further splitting would bring little benefit.

The termination conditions can be summarized as follows:

    1. The entropy of a subset falls below a threshold

    2. The subset is small enough

    3. The gain from further splitting falls below a threshold

Here, the gain in condition 3 measures how much a split improves the purity of the data: the more a split reduces the entropy, the larger the gain, and the more valuable the split. The formula for the gain is:

Gain = E - (M / (M + N)) · E1 - (N / (M + N)) · E2

That is, we compute how much the entropy of the two subsets after the split has dropped relative to the entropy before the split. Note that when summing the entropies of the subsets, each entropy must be multiplied by the weight of its subset, and a subset's weight is its size as a proportion of the parent set before the split: if the parent set has entropy E and the two subsets have sizes M and N with entropies E1 and E2 respectively, the gain is exactly the expression above.
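As a minimal sketch of this calculation (reusing the entropy helper from the sketch above; the function name information_gain is ours):

def information_gain(parent_labels, left_labels, right_labels):
    # Entropy before the split minus the weighted entropy of the two subsets
    # (uses the entropy() helper defined in the earlier sketch)
    m, n = len(left_labels), len(right_labels)
    weighted = (m * entropy(left_labels) + n * entropy(right_labels)) / (m + n)
    return entropy(parent_labels) - weighted


# A split that separates the two classes perfectly recovers all of the parent's entropy:
print(information_gain(["a", "a", "b", "b"], ["a", "a"], ["b", "b"]))   # 1.0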

Implementation of the Decision Tree Algorithm

With these concepts in place, we can begin training the decision tree. The training process consists of:

    1. Select a feature and use it to split the sample set

    2. Compute the gain; if the gain is large enough, make the resulting subsets child nodes of the decision tree, otherwise stop splitting

    3. Execute the previous two steps recursively

The steps above correspond to the ID3 algorithm (features are selected and splits are made according to information gain); besides ID3, there are other decision tree algorithms such as C4.5 and CART.
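As a brief aside (our own illustration, not part of the original post): C4.5 typically scores splits with the gain ratio, and CART with Gini impurity. A Gini function that could stand in for entropy in the gain computation above might look like this.

from collections import Counter


def gini(labels):
    # Gini impurity: chance that two randomly drawn samples belong to different classes
    total = len(labels)
    return 1.0 - sum((c / total) ** 2 for c in Counter(labels).values())


print(gini(["a", "a", "b", "b"]))   # 0.5: maximally mixed two-class set
print(gini(["a", "a", "a"]))        # 0.0: pure set

The skeleton below, however, sticks to the ID3 flavour and selects splits by information gain.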

class DecisionTree(object):
    def fit(self, X, y):
        # Build the decision tree from the input samples
        self.root = self._build_tree(X, y)

    def _build_tree(self, X, y, current_depth=0):
        # 1. Select the best split feature and generate the left and right nodes
        # 2. Recursively generate the subtrees of the left and right nodes
        pass

    def predict_value(self, x, tree=None):
        # Feed the input sample into the decision tree, deciding from top to bottom;
        # the leaf node that is reached gives the predicted value
        pass

In the code above, the key to implementing the decision tree is the recursive construction of subtrees. To implement this process, we need three things: the definition of a node, the selection of the best split feature, and the recursive generation of subtrees.
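To make those three pieces concrete, here is a minimal ID3-style sketch. It is only an illustration under our own assumptions, not the original author's code: the names DecisionNode, ID3Tree, _best_split, min_gain and min_samples are ours, it handles discrete-valued features only, and it uses binary yes/no tests (sample[feature] == value) to match the left/right structure of the skeleton above.

import math
from collections import Counter


def entropy(labels):
    # Shannon entropy (base 2) of a list of class labels
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


class DecisionNode(object):
    # Either an internal split node (feature, value, left, right) or a leaf (label)
    def __init__(self, feature=None, value=None, left=None, right=None, label=None):
        self.feature = feature   # index of the feature this node tests
        self.value = value       # go left if sample[feature] == value, else go right
        self.left = left
        self.right = right
        self.label = label       # class label when this node is a leaf


class ID3Tree(object):
    def __init__(self, min_gain=1e-7, min_samples=2):
        self.min_gain = min_gain          # stop if the best split gains less than this
        self.min_samples = min_samples    # stop if the subset is small enough
        self.root = None

    def fit(self, X, y):
        self.root = self._build_tree(X, y)

    def _best_split(self, X, y):
        # Try every (feature, value) yes/no test and keep the one with the largest gain
        parent_entropy = entropy(y)
        best_feature, best_value, best_gain = None, None, 0.0
        for f in range(len(X[0])):
            for v in set(row[f] for row in X):
                left = [y[i] for i, row in enumerate(X) if row[f] == v]
                right = [y[i] for i, row in enumerate(X) if row[f] != v]
                if not left or not right:
                    continue
                weighted = (len(left) * entropy(left) + len(right) * entropy(right)) / len(y)
                gain = parent_entropy - weighted
                if gain > best_gain:
                    best_feature, best_value, best_gain = f, v, gain
        return best_feature, best_value, best_gain

    def _build_tree(self, X, y):
        # Termination: the subset is pure, too small, or no split gains enough
        if len(set(y)) == 1 or len(y) < self.min_samples:
            return DecisionNode(label=Counter(y).most_common(1)[0][0])
        feature, value, gain = self._best_split(X, y)
        if feature is None or gain < self.min_gain:
            return DecisionNode(label=Counter(y).most_common(1)[0][0])
        # Recurse on the two subsets produced by the chosen test
        left_rows = [i for i, row in enumerate(X) if row[feature] == value]
        right_rows = [i for i, row in enumerate(X) if row[feature] != value]
        left = self._build_tree([X[i] for i in left_rows], [y[i] for i in left_rows])
        right = self._build_tree([X[i] for i in right_rows], [y[i] for i in right_rows])
        return DecisionNode(feature=feature, value=value, left=left, right=right)

    def predict_value(self, x):
        # Walk from the root, going left or right at each test, until a leaf is reached
        node = self.root
        while node.label is None:
            node = node.left if x[node.feature] == node.value else node.right
        return node.label


# Tiny usage example with two categorical features per sample:
X = [["human", "asia"], ["human", "europe"], ["dog", "asia"], ["cat", "europe"]]
y = ["person", "person", "animal", "animal"]
tree = ID3Tree()
tree.fit(X, y)
print(tree.predict_value(["human", "asia"]))   # person

Running the usage example prints "person": splitting on the first feature separates the two classes perfectly, both subsets become pure, and the recursion stops with two leaves.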

Author: Zppenny
Link: http://www.jianshu.com/p/c4d0837e9439
Summary

A decision tree is a simple and commonly used classifier; a well-trained decision tree can classify samples efficiently.

The decision tree model has good readability and is descriptive, which makes it helpful for manual analysis.

Decision tree classification is efficient: once built, the tree can be reused, and the number of computations for each prediction does not exceed the depth of the tree.

The decision tree also has its drawbacks:

Continuous features are relatively difficult to handle.

For multi-class problems, both the computational cost and the accuracy can be unsatisfactory.

In practical applications, because each leaf node is generated by a single rule in its parent node, it is easy to fool the classifier by manually modifying sample features; for example, a user may evade a spam recognition system by changing one key feature of a message. From an implementation point of view, because tree construction uses recursion, the computation and memory consumption grow larger and larger as the sample size increases.

Overfitting is also a problem for decision trees. A fully trained decision tree (no pruning, no limit on the gain threshold) can predict the training samples with 100% accuracy, because it fits the training samples completely; but for samples outside the training set, its predictions may be far from ideal. This is overfitting.

To address overfitting in decision trees, besides setting a gain threshold as a stopping condition as described above, the tree is usually pruned. Common pruning strategies are:

    1. Pessimistic Error Pruning

    2. Minimum Error Pruning

    3. Cost-Complexity Pruning

    4. Error-Based Pruning: each node is evaluated on a held-out test data set; if splitting the node reduces the error rate, the node keeps its two subtrees, otherwise it becomes a leaf node.

    5. Critical Value Pruning: pruning by a critical value, i.e. the gain-threshold stopping condition mentioned above.
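To make item 4 more concrete, here is a hedged sketch of bottom-up reduced-error pruning against a held-out validation set. It reuses the DecisionNode structure and yes/no split convention from the ID3 sketch above, and the names prune and _predict are our own, not from the original post.

from collections import Counter


def _predict(node, x):
    # Walk the (possibly pruned) tree down to a leaf
    while node.label is None:
        node = node.left if x[node.feature] == node.value else node.right
    return node.label


def prune(node, X_val, y_val):
    # Collapse a subtree into a majority-label leaf whenever doing so
    # does not increase the error on the validation data
    if node.label is not None or not y_val:
        return node

    # Route the validation samples the same way the node does, and prune the children first
    left_idx = [i for i, row in enumerate(X_val) if row[node.feature] == node.value]
    right_idx = [i for i, row in enumerate(X_val) if row[node.feature] != node.value]
    node.left = prune(node.left, [X_val[i] for i in left_idx], [y_val[i] for i in left_idx])
    node.right = prune(node.right, [X_val[i] for i in right_idx], [y_val[i] for i in right_idx])

    # Compare validation errors: keep the subtree, or replace it with a leaf
    subtree_errors = sum(1 for x, label in zip(X_val, y_val) if _predict(node, x) != label)
    majority = Counter(y_val).most_common(1)[0][0]
    leaf_errors = sum(1 for label in y_val if label != majority)
    if leaf_errors <= subtree_errors:
        return DecisionNode(label=majority)
    return node


# Usage with a (hypothetical) held-out set: tree.root = prune(tree.root, X_val, y_val)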

For a demonstration of the simplest ID3 decision tree implementation, and to understand the differences between the different types of decision trees, refer to this link.
In addition, for implementations of various machine learning algorithms, the GitHub repository ML-From-Scratch is highly recommended; after downloading the code, install the dependencies with pip install -r requirements.txt and you can run the code.
