English Name: Decision Tree
A decision tree is a classic classification method: the data is first processed, an inductive algorithm is used to generate readable rules and a decision tree, and the tree is then used to classify new data. In essence, decision trees classify data through a series of rules.
A decision tree is a supervised learning method used mainly for classification and regression. The goal of the algorithm is to create a model that predicts the target variable by learning decision rules inferred from the data features.
A decision tree is similar to an IF-ELSE structure: the result is a tree that, starting from the root, reaches a judgment at a leaf node. The if-else conditions here are not set by hand, however; the computer generates them automatically according to the algorithm we provide.
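For instance, a fitted tree over a toy weather dataset might reduce to nested if-else rules like the sketch below (the feature names and thresholds are invented purely for illustration):

def classify(outlook, humidity, windy):
    # Each if-else test corresponds to an internal node of the tree;
    # each return corresponds to a leaf node.
    if outlook == "sunny":
        return "don't play" if humidity > 75 else "play"
    if outlook == "rainy":
        return "don't play" if windy else "play"
    return "play"  # outlook == "overcast"

print(classify("sunny", 80, False))  # -> don't play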
Decision Tree Components
Decision point: the point where one of several possible alternatives is chosen, i.e., where the best alternative is finally selected. In a multi-level decision, there can be multiple decision points in the middle of the decision tree, and the decision point at the root of the tree is the final one.
State node: represents the expected economic outcome of an alternative; by comparing the economic outcomes of the state nodes, the best alternative can be chosen according to a given decision criterion. The branches drawn from a state node are called probability branches; their number equals the number of natural states that may occur, and the probability of each state is marked on its branch (a numeric sketch of this evaluation follows this list).
Result node: the profit or loss of each alternative under each natural state is marked at the right end of its result node.
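As a minimal sketch of how such a tree is evaluated, the snippet below computes the expected payoff at each state node and picks the best alternative at the decision point (all alternatives, probabilities, and payoffs are made up):

# For each alternative (state node), compute the expected payoff over the
# natural states (probability branches), then choose the best alternative.
alternatives = {
    "build large plant": [(0.7, 200), (0.3, -120)],  # (probability, payoff)
    "build small plant": [(0.7, 80), (0.3, -20)],
}
expected = {name: sum(p * v for p, v in branches)
            for name, branches in alternatives.items()}
best = max(expected, key=expected.get)
print(expected)  # expected payoff at each state node
print(best)      # choice made at the decision point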
Advantages and disadvantages of decision trees
Decision Tree Advantages
Easy to understand: the principle is clear, and the decision tree can be visualized.
The inference process is easy to follow and can be expressed in if-else form.
The inference process depends entirely on the values of the attribute variables.
Attribute variables that contribute nothing to the target variable can be ignored automatically, and the importance of each attribute can be assessed, providing a reference for reducing the number of variables.
Decision Tree Disadvantages
It is possible to create overly complex rules that overfit the data.
Decision trees are sometimes unstable, because small changes in the data can produce a completely different tree.
Learning the optimal decision tree is an NP-complete problem, so practical decision tree learning algorithms are heuristic, e.g., greedy algorithms that make a locally optimal choice at each node. Such algorithms cannot guarantee a globally optimal tree. This can be mitigated by training multiple trees on randomly selected features and samples (a sketch follows after this list).
Some problems are hard for decision trees to express and therefore hard to learn, such as the XOR, parity, or multiplexer problems.
If some classes dominate the data, the tree will be biased toward them, so it is recommended to balance the dataset before fitting a decision tree.
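As a minimal sketch of the mitigation mentioned in this list, scikit-learn's RandomForestClassifier trains many trees on random subsets of the samples and features and averages their votes (the dataset choice here is arbitrary):

from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

# Many randomized trees reduce the instability and local optima of a single tree.
X, y = load_iris(return_X_y=True)
forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X, y)
print(forest.predict(X[:2]))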
Common decision tree algorithms
There are many decision tree algorithms, such as CART, ID3, C4.5, and C5.0. ID3, C4.5, and C5.0 are all based on information entropy, while CART uses the Gini index, a measure similar to entropy, as its splitting criterion, and prunes the tree after it is grown.
Entropy: a measure of the degree of disorder in a system.
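For concreteness, a minimal sketch of the entropy computation over a set of class labels, using the base-2 logarithm as is conventional for ID3:

import math
from collections import Counter

def entropy(labels):
    """Shannon entropy H = -sum(p_i * log2(p_i)) over class frequencies."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

print(entropy(["A", "A", "B", "B"]))  # 1.0: a maximally mixed two-class set
print(entropy(["A", "A", "A", "A"]))  # -0.0: a pure set has zero entropy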
ID3 algorithm
The ID3 algorithm is a classification decision tree algorithm. Through a series of rules it organizes data into the form of a decision tree, with entropy as its basis.
ID3 is a classic decision tree learning algorithm proposed by Quinlan. Its basic idea is to use information entropy as the measure for selecting the attribute at each node of the tree: at every step, the most informative attribute is selected, i.e., the attribute that reduces entropy the most, so that the entropy of the tree decreases fastest. At the leaf nodes the entropy is 0, and all instances in the set corresponding to a leaf node belong to the same class.
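As a concrete sketch of this selection step, the snippet below computes the information gain of an attribute, reusing the entropy helper defined above (the toy data is invented for illustration):

def information_gain(rows, labels, attr_index):
    """Gain(S, A) = H(S) - sum(|S_v|/|S| * H(S_v)) over values v of attribute A."""
    n = len(labels)
    subsets = {}
    for row, label in zip(rows, labels):
        subsets.setdefault(row[attr_index], []).append(label)
    remainder = sum(len(s) / n * entropy(s) for s in subsets.values())
    return entropy(labels) - remainder

# Toy data: ID3 splits on the attribute with the highest gain.
rows = [("sunny", "hot"), ("sunny", "mild"), ("rainy", "hot"), ("rainy", "mild")]
labels = ["no", "no", "yes", "yes"]
print(information_gain(rows, labels, 0))  # 1.0: attribute 0 separates the classes perfectly
print(information_gain(rows, labels, 1))  # 0.0: attribute 1 is uninformative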
A typical application: using the ID3 algorithm for customer churn early-warning analysis, finding the characteristics of churning customers so that a telecom company can improve customer relations in a targeted way and avoid churn.
Data mining with the decision tree method generally involves the following steps: data preprocessing, decision tree mining, and pattern evaluation and application.
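A minimal end-to-end sketch of those three steps with scikit-learn (the dataset is an arbitrary built-in one; note that scikit-learn's tree is CART-based, so criterion="entropy" gives entropy-based splits rather than a true ID3):

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# 1. Data preprocessing: load the data and hold out a test set.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# 2. Decision tree mining: fit the model with entropy-based splits.
clf = DecisionTreeClassifier(criterion="entropy")
clf.fit(X_train, y_train)

# 3. Pattern evaluation and application: score on held-out data.
print(accuracy_score(y_test, clf.predict(X_test)))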
C4.5 algorithm
C4.5 is a further extension of ID3 that removes the restriction to categorical features by discretizing continuous attributes. C4.5 converts the trained tree into a set of if-then rules, and the accuracy of each rule is assessed to determine the order in which the rules should be applied. If removing a rule's precondition improves its accuracy, the rule is pruned.
The core of C4.5 is the same as that of ID3, with one difference in approach: C4.5 uses the information gain ratio as the splitting criterion, which overcomes the bias of ID3's information gain toward attributes with many values.
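A sketch of the gain ratio, building on the information_gain helper and toy data above: the gain is normalized by the split information, i.e., the entropy of the attribute's own value distribution, which penalizes many-valued attributes:

def gain_ratio(rows, labels, attr_index):
    """GainRatio(S, A) = Gain(S, A) / SplitInfo(S, A)."""
    split_info = entropy([row[attr_index] for row in rows])
    if split_info == 0:  # attribute has a single value; no split is possible
        return 0.0
    return information_gain(rows, labels, attr_index) / split_info

print(gain_ratio(rows, labels, 0))  # reuses the toy rows/labels defined above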
C5.0 algorithm
C5.0 uses less memory than C4.5 and builds smaller rule sets while being more accurate.
CART algorithm
Classification and regression trees (CART: Classification and Regression Trees) are an effective nonparametric method for both classification and regression that makes predictions by constructing a binary tree. The CART model was first proposed by Breiman et al. and has been widely used in statistics and data mining. It constructs its prediction criterion in a way quite different from traditional statistics: the criterion is given in the form of a binary tree, which is easy to understand, use, and interpret. In many cases the prediction tree built by CART is more accurate than conventional statistical methods, and the more complex the data and the more variables there are, the more pronounced its advantage. The key to the model is constructing the prediction criterion accurately.
CART first builds a prediction criterion from known multivariate data and then predicts one variable from the values of the other variables. In classification, an object is first measured and then assigned to a category according to some classification criterion; for example, given the identifying characteristics of a fossil, predict which family, genus, or even species it belongs to, or, given the geological and geochemical information of an area, predict whether there is an ore deposit there. Regression differs from classification in that it predicts a numeric value for an object rather than its category; for example, given the mineral-resource characteristics of an area, predict the amount of resources there.
CART is similar to C4.5, but it supports numeric target variables (regression) and does not produce rule sets. CART builds a binary tree by choosing, at each node, the feature and threshold that yield the largest information gain.
Scikit-learn uses the CART algorithm.
Example code:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
# The decision tree algorithm used by scikit-learn is CART
from sklearn import tree
import numpy as np

X = [[0, 0], [1, 1]]
y = ["A", "B"]
clf = tree.DecisionTreeClassifier()
clf = clf.fit(X, y)
data1 = np.array([2., 2.]).reshape(1, -1)
print(clf.predict(data1))        # predicted class
print(clf.predict_proba(data1))  # predicted probability of each class
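Since CART also supports numeric targets, here is a minimal regression sketch as well (the data points are made up):

from sklearn import tree
import numpy as np

# CART regression: the target y is numeric rather than a class label.
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([0.0, 0.9, 2.1, 2.9])
reg = tree.DecisionTreeRegressor()
reg.fit(X, y)
print(reg.predict(np.array([[1.5]])))  # piecewise-constant prediction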
All right, that's it, I hope it helps you.
GitHub address of this article:
20170619_Decision Tree Algorithm.md
Additions and corrections are welcome.