Decision Tree Algorithm
English name: decision tree
A decision tree is a typical classification method. The data is first processed, readable rules and a decision tree are generated with an induction algorithm, and new data is then analyzed with that tree. In essence, a decision tree classifies data through a series of rules.
A decision tree is a supervised learning method used mainly for classification and regression. The goal of the algorithm is to create a model that predicts the target variable by inferring decision rules from the data features.
A decision tree resembles an if-else structure: the result is a tree that can be followed from the root, making a judgment at each node, down to a leaf. The if-else conditions, however, are not set by hand; they are generated automatically by the computer according to the algorithm we provide.
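For intuition, here is a minimal, purely illustrative sketch of what such an automatically generated tree amounts to once written out as code; the feature names and thresholds below are invented, not taken from any real learned tree.

# A hypothetical learned tree, written out as the if-else rules it encodes.
# Feature names and thresholds are invented purely for illustration.
def predict_play_tennis(outlook, humidity, wind):
    if outlook == "sunny":
        if humidity > 75:          # a split the algorithm might have chosen
            return "no"
        return "yes"
    elif outlook == "overcast":
        return "yes"
    else:  # rainy
        return "no" if wind == "strong" else "yes"

print(predict_play_tennis("sunny", 80, "weak"))  # -> "no"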
Decision tree components
Decision point: the choice among several possible schemes, that is, the final selection of the best scheme. In a multi-level decision, several decision points can appear inside the decision tree; the decision point at the root of the tree corresponds to the final decision scheme.
State node: represents the expected economic effect of an alternative scheme. By comparing the economic effect of each state node, the optimal scheme can be selected according to some decision criterion. The branches leaving a state node are called probability branches; their number indicates the number of natural states that may occur, and each branch must be labeled with the probability of that state.
Result node: the profit or loss obtained by each scheme under each natural state is marked at the right end of the result node.
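To make the comparison at the state nodes concrete, here is a minimal sketch of the expected-value calculation; the payoffs and probabilities are entirely made up for illustration.

# Minimal sketch: expected economic effect of each alternative at a state node.
# The payoffs and probabilities below are invented for illustration only.
alternatives = {
    "build large plant": [(0.7, 200), (0.3, -60)],   # (probability, profit/loss)
    "build small plant": [(0.7, 100), (0.3, -10)],
}

expected = {name: sum(p * value for p, value in branches)
            for name, branches in alternatives.items()}

best = max(expected, key=expected.get)
print(expected)   # expected value of each scheme
print(best)       # the scheme chosen at the decision point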
Advantages of decision trees
Easy to understand, with clear principles; decision trees can be visualized.
The reasoning process is easy to follow; the decision-making reasoning can be expressed in if-else form (see the sketch after this list).
The reasoning process depends entirely on the values of the attribute variables.
Attribute variables that contribute nothing to the target variable are ignored automatically, which also provides a reference for judging the importance of attribute variables and for reducing the number of variables.
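Both points can be seen with scikit-learn: tree.export_text prints the learned rules in an if-else-like form, and feature_importances_ ranks the attribute variables. The toy data below is invented for illustration.

from sklearn import tree

# Toy data invented for illustration: two features, the second is irrelevant.
X = [[0, 5], [1, 3], [2, 8], [3, 1]]
y = ["A", "A", "B", "B"]

clf = tree.DecisionTreeClassifier().fit(X, y)

# The learned rules, readable as nested if-else conditions.
print(tree.export_text(clf, feature_names=["x0", "x1"]))

# Relative importance of each attribute variable; unused features score 0.
print(clf.feature_importances_)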
Disadvantages of decision trees
Overly complex rules may be created, that is, the tree may overfit the data.
Decision trees are sometimes unstable: small changes in the data may generate a completely different tree.
Learning the optimal decision tree is an NP-complete problem. Practical decision tree learning algorithms are therefore based on heuristics, for example greedy algorithms that make a locally optimal choice at each node. Such algorithms cannot guarantee returning the globally optimal tree; training multiple trees on randomly selected features and samples can mitigate this problem (see the sketch after this list).
Some concepts are hard to learn because decision trees cannot express them easily, for example XOR, parity, or multiplexer problems.
If some classes dominate the data, the learned tree will be biased toward them. It is therefore recommended to balance the data set before fitting the decision tree.
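A hedged sketch of the usual countermeasures in scikit-learn: limiting tree size against overfitting, averaging many randomized trees against instability, and class_weight="balanced" against dominant classes. The data here is random and purely illustrative.

import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier

rng = np.random.RandomState(0)
X = rng.rand(200, 4)
y = (X[:, 0] > 0.8).astype(int)          # synthetic, imbalanced labels (~20% positive)

# Against overfitting: cap depth / leaf size, or prune with ccp_alpha.
simple = DecisionTreeClassifier(max_depth=3, min_samples_leaf=5, ccp_alpha=0.01)

# Against instability: average many trees built on random samples and features.
forest = RandomForestClassifier(n_estimators=100, random_state=0)

# Against dominant classes: reweight samples inversely to class frequency.
balanced = DecisionTreeClassifier(class_weight="balanced")

for model in (simple, forest, balanced):
    print(type(model).__name__, model.fit(X, y).score(X, y))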
Common Decision Tree Algorithms
Decision tree algorithms include CART, ID3, C4.5, C5.0, and others. ID3, C4.5, and C5.0 are based on information entropy, while CART uses the Gini index, a similar measure of impurity, as its classification criterion. After the tree has been grown, it needs to be pruned.
Entropy: the degree of disorder in the system
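For reference, a minimal sketch of the two impurity measures mentioned above, computed on a class-probability distribution.

import math

def entropy(probs):
    # Shannon entropy: -sum(p * log2(p)); 0 means a pure (fully ordered) node.
    return -sum(p * math.log2(p) for p in probs if p > 0)

def gini(probs):
    # Gini index: 1 - sum(p^2); the entropy-like measure used by CART.
    return 1 - sum(p * p for p in probs)

print(entropy([0.5, 0.5]), gini([0.5, 0.5]))  # maximum disorder for two classes
print(entropy([1.0]), gini([1.0]))            # pure node: both measures are 0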
ID3 algorithm
ID3 is a decision tree algorithm for classification. Using entropy as its measure and applying a series of rules, it ultimately classifies the data in the form of a decision tree.
ID3 is a classic decision tree learning algorithm proposed by Quinlan. Its basic idea is to use information entropy as the metric for selecting the attribute at each decision tree node: at every step the most informative attribute is chosen, that is, the attribute that reduces the entropy the most, so that the tree is built with the fastest possible drop in entropy. The entropy at each leaf node is 0, at which point all instances in the set corresponding to that leaf belong to the same class.
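A minimal sketch of ID3's attribute selection, on a tiny data set invented for illustration: compute the information gain of each attribute and split on the one with the largest gain, i.e. the largest entropy reduction.

import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def information_gain(rows, labels, attr):
    # Entropy of the parent minus the weighted entropy after splitting on attr.
    subsets = {}
    for row, label in zip(rows, labels):
        subsets.setdefault(row[attr], []).append(label)
    remainder = sum(len(s) / len(labels) * entropy(s) for s in subsets.values())
    return entropy(labels) - remainder

# Toy data invented for illustration: attribute 0 separates the classes, attribute 1 does not.
rows = [("sunny", "high"), ("sunny", "low"), ("rain", "high"), ("rain", "low")]
labels = ["no", "no", "yes", "yes"]

gains = {a: information_gain(rows, labels, a) for a in range(2)}
print(gains)                       # attribute 0 has gain 1.0, attribute 1 has gain 0.0
print(max(gains, key=gains.get))   # ID3 would split on attribute 0 first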
The ID3 algorithm has been used, for example, in early-warning analysis of customer churn, identifying the characteristics of churning customers to help telecom companies improve customer relationships and avoid losing customers.
Data mining using the decision tree method generally involves the following steps: data pre-processing, decision tree mining, pattern evaluation, and application.
C4.5 algorithm
C4.5 is a further extension of ID3. It removes the restriction that features must be categorical by discretizing continuous attributes. C4.5 converts the trained tree into a set of if-then rules and evaluates the accuracy of each rule to decide the order in which the rules should be applied. If removing a rule's precondition improves its accuracy, the rule is pruned.
The core of C4.5 is the same as ID3, but one method differs: C4.5 uses the information gain ratio as the splitting criterion, which overcomes the bias of ID3's information gain criterion toward attributes with many values.
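A minimal sketch of that correction: the gain ratio divides the information gain by the split information (the entropy of the subset sizes), which penalizes attributes that fragment the data into many small subsets. The toy data is invented for illustration.

import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def gain_ratio(rows, labels, attr):
    # C4.5 criterion: information gain divided by the split information.
    subsets = {}
    for row, label in zip(rows, labels):
        subsets.setdefault(row[attr], []).append(label)
    n = len(labels)
    gain = entropy(labels) - sum(len(s) / n * entropy(s) for s in subsets.values())
    split_info = -sum(len(s) / n * math.log2(len(s) / n) for s in subsets.values())
    return gain / split_info if split_info else 0.0

# Toy data invented for illustration: attribute 1 is an "ID-like" many-valued column.
rows = [("sunny", 1), ("sunny", 2), ("rain", 3), ("rain", 4)]
labels = ["no", "no", "yes", "yes"]

print(gain_ratio(rows, labels, 0))  # 1.0: a clean two-way split
print(gain_ratio(rows, labels, 1))  # 0.5: high raw gain, but heavily penalized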
C5.0 Algorithm
C5.0 uses less memory than C4.5 and builds smaller rule sets while being more accurate.
CART Algorithm
CART (Classification And Regression Tree) is an interesting and effective non-parametric method for classification and regression. It builds a binary tree for prediction. The classification and regression tree model was first proposed by Breiman and others and has been widely used in statistics and data mining. It constructs its prediction criterion in a way that differs from traditional statistics, and it is presented as a binary tree, which makes it easy to understand, use, and interpret. In many cases, the prediction tree built by the CART model is more accurate than prediction criteria constructed by conventional statistical methods, and the more complex the data and the more variables there are, the more obvious the algorithm's advantage. The key to the model is the construction of the prediction criterion and its accuracy.
Definition: classification and regression first build a prediction criterion from known multi-variable data, and then predict one variable from the values of the other variables. In classification, one usually measures an object in various ways first and then determines its category with some classification criterion. For example, given the identifying characteristics of a rock specimen, one can predict the family, genus, or even species of the fossil it contains. Likewise, knowing the geological and geophysical information of a region, one can judge whether the region contains ore. Unlike classification, regression predicts a numerical value of an object rather than its category. For example, given the mineral resource characteristics of a region, the amount of resources in the region can be predicted.
CART is similar to C4.5, but it differs in that it supports numerical target variables (regression) and does not produce rule sets. CART constructs a binary tree using, at each node, the feature and threshold that yield the largest information gain.
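A minimal sketch of that split search for a single numeric feature, using the Gini index as the impurity measure and toy data invented for illustration: each candidate threshold is tried and the one with the smallest weighted impurity is kept.

def gini(labels):
    n = len(labels)
    return 1 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def best_threshold(values, labels):
    # Scan candidate thresholds on one feature and return the best binary split.
    best = (None, float("inf"))
    for t in sorted(set(values)):
        left = [y for x, y in zip(values, labels) if x <= t]
        right = [y for x, y in zip(values, labels) if x > t]
        if not left or not right:
            continue
        weighted = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if weighted < best[1]:
            best = (t, weighted)
    return best

values = [2.0, 3.0, 10.0, 12.0]
labels = ["A", "A", "B", "B"]
print(best_threshold(values, labels))  # (3.0, 0.0): splitting at x <= 3 separates the classes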
Scikit-learn uses the CART algorithm.
Sample Code:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
from sklearn import tree
import numpy as np

# The decision tree algorithm used by scikit-learn is CART.
X = [[0, 0], [1, 1]]
Y = ["A", "B"]
clf = tree.DecisionTreeClassifier()
clf = clf.fit(X, Y)

data1 = np.array([2., 2.]).reshape(1, -1)
print(clf.predict(data1))        # predicted class
print(clf.predict_proba(data1))  # predicted probability of each class
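Since CART also supports numerical target variables, here is a matching regression sketch, again with tiny made-up data.

from sklearn import tree
import numpy as np

# CART also handles regression: fit a DecisionTreeRegressor on a toy 1-D problem.
X = np.array([[1.0], [2.0], [3.0], [4.0]])
y = np.array([1.5, 1.7, 3.9, 4.1])

reg = tree.DecisionTreeRegressor(max_depth=2)
reg.fit(X, y)
print(reg.predict(np.array([[2.5]])))  # a predicted numeric value, not a class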
Okay, that's all. I hope it will help you.
Github address: