Decision trees and their variants are another family of algorithms that divide the input space into regions, each with its own parameters. The decision tree classification algorithm is an instance-based inductive learning method: it extracts a tree-shaped classification model from a given set of unordered training samples. Each non-leaf node in the tree records which feature is used to make the decision, and each leaf node represents a final category, so the path from the root node to each leaf node forms a classification rule. To test a new sample, we start from the root node, apply the test at each branch node, recursively descend into the subtree along the matching branch, and test again until a leaf node is reached. The category represented by that leaf node is the predicted category of the test sample.
Compared with other machine learning classifiers, the decision tree classification algorithm is relatively simple: as long as the training samples can be represented by feature vectors and category labels, a decision tree can be constructed. The cost of predicting a sample depends only on the depth of the tree, i.e., it is linear in the number of layers, so the method processes data efficiently and is suitable for real-time classification.
In machine learning, a decision tree is a prediction model: it represents a mapping between object attributes and object values. Each node in the tree represents an attribute test, each branch represents a possible attribute value, and each leaf node holds the value of the object represented by the path from the root node to that leaf. A decision tree has only a single output; if a complex output is needed, an independent decision tree can be built for each output. In data mining, decision trees are a frequently used technique, applied both to analyze data and to make predictions. The machine learning technique of inducing a decision tree from data is called decision tree learning.
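As a concrete picture of this attribute-to-value mapping, here is a minimal sketch of a decision tree as a nested Python dict; the attribute names (`outlook`, `wind`) and class labels are invented toy values, not from the source.

```python
# Toy decision tree as a nested mapping; attribute names ('outlook',
# 'wind') and class labels ('yes'/'no') are invented for illustration.
tree = {
    "outlook": {                       # non-leaf node: attribute to test
        "sunny": {"wind": {"strong": "no", "weak": "yes"}},
        "overcast": "yes",             # leaf node: class label
        "rainy": "no",
    }
}

def predict(node, sample):
    """Follow the branch matching each attribute value until a leaf."""
    while isinstance(node, dict):
        attribute = next(iter(node))   # attribute tested at this node
        node = node[attribute][sample[attribute]]
    return node                        # leaf value = predicted class

print(predict(tree, {"outlook": "sunny", "wind": "weak"}))   # -> 'yes'
```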
The decision tree algorithm has two stages, training and testing. In the training stage, the training sample set is divided into several subsets according to certain standards and rules, and each subset is then split again with the same rules; the recursion stops when every subset contains only samples of the same class. During training, each split node records the attribute on which the split was made. In the test stage, a test sample is examined starting from the root node to decide which child node it belongs to, recursively, until the sample reaches a leaf node; the sample is then assigned the category of that leaf node.
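Since the training stage is a recursive divide-and-conquer procedure, a minimal ID3-style sketch may help; it assumes categorical attributes, samples given as Python dicts, and entropy-based information gain as the splitting rule (the source does not fix a particular measure).

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a list of class labels."""
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def build_tree(samples, labels, attributes):
    """Recursively split the sample set until each subset is pure."""
    if len(set(labels)) == 1:                  # stopping rule: one class left
        return labels[0]                       # leaf node
    if not attributes:                         # no attribute left: majority vote
        return Counter(labels).most_common(1)[0][0]

    def gain(attr):                            # information gain of splitting on attr
        g = entropy(labels)
        for v in set(s[attr] for s in samples):
            sub = [labels[i] for i, s in enumerate(samples) if s[attr] == v]
            g -= len(sub) / len(labels) * entropy(sub)
        return g

    best = max(attributes, key=gain)           # split node saves this attribute
    node = {best: {}}
    for v in set(s[best] for s in samples):
        idx = [i for i, s in enumerate(samples) if s[best] == v]
        node[best][v] = build_tree([samples[i] for i in idx],
                                   [labels[i] for i in idx],
                                   [a for a in attributes if a != best])
    return node

samples = [{"outlook": "sunny", "wind": "weak"},
           {"outlook": "sunny", "wind": "strong"},
           {"outlook": "rainy", "wind": "weak"}]
print(build_tree(samples, ["yes", "no", "no"], ["outlook", "wind"]))
```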
Decision tree classification is unstable: when the training sample set is small, small changes in the sample set may lead to large changes in the structure of the tree. Bagging can be used to improve the stability of decision tree classification: the decision tree algorithm is trained for several rounds on resampled data, and the category of a test sample is predicted by voting among the resulting trees, as in the sketch below.
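A brief sketch of this resample-and-vote scheme, assuming scikit-learn is available and using a synthetic data set as a stand-in for a real one (the parameter is named `estimator` in recent scikit-learn releases, `base_estimator` in older ones):

```python
# Bagging: several trees trained on bootstrap resamples of the data,
# with the final prediction decided by voting across the trees.
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=200, n_features=8, random_state=0)

bagged = BaggingClassifier(
    estimator=DecisionTreeClassifier(),   # `base_estimator` in older sklearn
    n_estimators=25,                      # number of training rounds
    bootstrap=True,                       # resample the training set each round
    random_state=0,
).fit(X, y)

print(bagged.predict(X[:3]))              # voted class predictions
```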
A branch node of the decision tree represents an attribute and is also called a decision node; a leaf node represents a class label and is also known as a decision result. Starting at the root node, the tree is traversed from top to bottom: according to the attribute value or threshold at each node, the traversal extends to the next attribute node, and so on until a leaf node completes the prediction.
A decision tree is a tree structure with three different kinds of nodes. A decision node represents an intermediate step; it compares the values of an attribute in the data set to determine the direction of the next decision. A state node represents the expected value of the alternatives, and the best result can be selected by comparing the state nodes. A result node indicates which category an instance belongs to, and the result nodes also show how many categories the model distinguishes. Ultimately, a data instance is routed through the decision nodes according to the value of each of its attributes.
Decision tree training, as used in statistics, data mining, and machine learning, employs a decision tree as a prediction model to predict the class labels of samples. Such trees are also called classification trees or regression trees. In these trees, the leaf nodes give the class labels and the internal nodes represent attributes.
Decision tree learning builds a decision model in a tree structure according to the attributes of the data. Decision tree models are often used to solve classification and regression problems. In machine learning, a decision tree is a prediction model representing a mapping between object attributes and object values: each node in the tree represents a test condition on an object attribute, its branches represent the objects satisfying the node's condition, and the leaf nodes represent the predicted results the objects belong to.
Decision trees are a kind of supervised learning. By structure, decision trees can be divided into binary decision trees and multiway trees. For example, some decision tree algorithms produce only binary trees (in which each internal node splits into exactly two branches), while other algorithms may generate non-binary trees.
A decision tree is learned from a training data set with class labels. Each non-leaf node in the tree represents a test condition on an attribute, each branch represents one outcome of that test, and each leaf node represents a class label. The first node of the tree is the root node.
After the decision tree model has been constructed, it is applied to classify a given tuple x with an unknown class label. By testing the attribute values of the tuple x, a path from the root node to a leaf node is traced, and the leaf node stores the class prediction for the tuple. In this way, a tuple with an unknown class label is classified; the decision tree can also be expressed as classification rules.
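A minimal illustration of both points, assuming scikit-learn and its bundled iris data: a tree is fitted, a tuple with an unknown class label is classified by tracing its attribute values to a leaf, and the same tree is printed as classification rules via `export_text`.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()
clf = DecisionTreeClassifier(max_depth=3, random_state=0)
clf.fit(iris.data, iris.target)

x = [[5.1, 3.5, 1.4, 0.2]]      # a tuple x with unknown class label
print(clf.predict(x))           # class stored at the leaf reached by x

# The same tree expressed as classification rules:
print(export_text(clf, feature_names=list(iris.feature_names)))
```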
The decision tree construction algorithm has the advantages of fast construction, a clear structure, and high classification accuracy. A decision tree is an instance-based inductive classification method: without special domain knowledge, it extracts classification rules in the form of a decision tree, consisting of branch nodes, leaf nodes, and branches, from a set of unordered data. The tree structure is built with a top-down recursive method. During decision making, the branch nodes select a branch based on attribute values; the branch nodes cover the possible outcomes and ultimately connect to leaf nodes representing the classification results. Each root-to-leaf path traversed during classification represents a classification pattern, and the set of these patterns constitutes the framework of the decision tree.
The decision tree is a classification algorithm based on inductive learning and consists mainly of two stages: construction and pruning. The construction of a decision tree is a top-down, recursive divide-and-conquer process; its key steps are selecting the branch attribute and partitioning the sample set. Pruning is one of the methods for stopping the splitting of a decision tree. Pre-pruning performs the pruning during the generation of the decision tree, stopping the splitting of nodes in advance; the key of the pruning algorithm is to select an appropriate measure value. Pre-pruning avoids unnecessary computation and can directly generate the final classification tree, so it is widely used. Post-pruning lets the decision tree grow freely and then, according to a specified measure value, replaces subtrees with leaf nodes. The post-pruning strategy increases the computational cost of the decision tree algorithm, but its classification results are slightly more accurate.
Pruning of the decision tree: pruning is one of the methods for stopping the branching of a decision tree, and it can be divided into pre-pruning and post-pruning. The computational cost of post-pruning is much higher than that of pre-pruning, especially on large sample sets; for small samples, however, post-pruning outperforms pre-pruning. Both strategies are illustrated in the sketch below.
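A sketch contrasting the two strategies, assuming scikit-learn: pre-pruning is approximated by growth limits such as `max_depth` and `min_samples_leaf`, while post-pruning is done by minimal cost-complexity pruning (`ccp_alpha`), the post-pruning method scikit-learn implements; the source does not name a specific measure value.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

# Pre-pruning: stop splitting early while the tree is being grown.
pre = DecisionTreeClassifier(max_depth=4, min_samples_leaf=10,
                             random_state=0).fit(X, y)

# Post-pruning: grow the tree freely, then replace subtrees with leaves
# using a cost-complexity measure value (ccp_alpha) chosen from the path.
path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(X, y)
alpha = path.ccp_alphas[len(path.ccp_alphas) // 2]   # one candidate value
post = DecisionTreeClassifier(ccp_alpha=alpha,
                              random_state=0).fit(X, y)

print(pre.get_n_leaves(), post.get_n_leaves())       # pruned tree sizes
```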