Course address: https://class.coursera.org/ntumltwo-002/lecture
Important! Important! Important ~
I. Decision trees, Bagging, and Adaptive Boosting (AdaBoost)
- When Bagging and AdaBoost make a prediction, all of the weak classifiers act at the same time. The difference between them is whether each weak classifier carries the same weight in the final blended hypothesis G.
- A decision tree is a conditional aggregation algorithm: at any given time, only the classifier whose branch condition is satisfied acts.
II. The basic decision tree algorithm
1. Viewed recursively, a decision tree repeatedly selects a branching condition based on the features and generates subtrees; all of the subtrees together constitute the final decision tree.
For example, the following decision tree decides whether to watch a MOOC online course based on features such as time at home, dating status, and job deadlines.
2. Description of the basic decision tree algorithm
- Determine the branching condition. It can be specified by a person or generated by an algorithm.
- Partition the training data D according to the branching criterion.
- Recursively generate subtrees according to the branching conditions until a termination condition is met.
- To prevent overfitting and limit the model's complexity, the decision tree is usually regularized by pruning.
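The recursive procedure above can be sketched in Python. This is a minimal illustration, not the course's code: `build_tree` is a hypothetical helper that uses the feature mean as a hand-picked branching condition and a depth limit as the termination/regularization criterion.

```python
from collections import Counter

def majority(labels):
    """Most common label; used to label leaf nodes."""
    return Counter(labels).most_common(1)[0][0]

def build_tree(data, depth=0, max_depth=3):
    """data: list of (x, y) with a scalar feature x and a class label y."""
    labels = [y for _, y in data]
    # Termination condition: node is pure, or depth limit reached (a simple regularizer)
    if len(set(labels)) == 1 or depth >= max_depth:
        return {"leaf": majority(labels)}
    # Branching condition (hand-picked here): split at the mean feature value
    threshold = sum(x for x, _ in data) / len(data)
    left = [(x, y) for x, y in data if x <= threshold]
    right = [(x, y) for x, y in data if x > threshold]
    if not left or not right:  # the split does not divide the data; stop
        return {"leaf": majority(labels)}
    # Recursively generate a subtree for each partition of the data
    return {"threshold": threshold,
            "left": build_tree(left, depth + 1, max_depth),
            "right": build_tree(right, depth + 1, max_depth)}

def predict(tree, x):
    """Walk down the branches until a leaf is reached."""
    while "leaf" not in tree:
        tree = tree["left"] if x <= tree["threshold"] else tree["right"]
    return tree["leaf"]
```

Running it on a toy 1-D dataset shows the recursion bottoming out at pure leaves.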
III. The CART algorithm (Classification And Regression Tree)
Lin's explanation here is easy to follow; see also: http://blog.csdn.net/u011067360/article/details/24871801?utm_source=tuicool&utm_medium=referral
- CART is a binary tree: each node has only two branches.
- The branching condition is determined by purity. For classification, the Gini index is the usual choice; for continuous target variables (regression), LSD (least squared deviation) or LAD (least absolute deviation) can be used.
Gini index: a number between 0 and 1, where 0 means the node is completely pure and values closer to 1 mean the classes are more evenly mixed. The more mixed the classes contained in the set, the larger the Gini index (similar in spirit to entropy).
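As an illustration (not from the course notes), the Gini index of a set of labels can be computed as 1 minus the sum of squared class proportions, so a pure set scores 0 and an evenly mixed set scores close to 1:

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 1 - sum of squared class proportions.
    0 for a pure set; approaches 1 as many classes are evenly mixed."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())
```

For example, a pure set gives 0.0, while a 50/50 two-class mix gives 0.5.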
- Find the best feature and split point to segment on according to the Gini index.
- If a node cannot be split further, save it as a leaf node.
- Perform the binary split.
- Recursively call the createTree() method on the left subtree to create a subtree.
- Recursively call the createTree() method on the right subtree to create a subtree.
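The split-selection step above can be sketched as follows. `best_binary_split` is a hypothetical helper (not the course's createTree()) that scans candidate thresholds on a single feature and scores each binary split by the size-weighted Gini impurity of the two children, reusing a small Gini helper:

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a label list: 1 - sum of squared class proportions."""
    n = len(labels)
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())

def best_binary_split(data):
    """data: list of (x, y) pairs. Returns (threshold, weighted_gini) for the
    binary split x <= threshold that minimizes the weighted child impurity."""
    n = len(data)
    best = (None, float("inf"))
    for threshold, _ in data:          # candidate thresholds: observed values
        left = [y for x, y in data if x <= threshold]
        right = [y for x, y in data if x > threshold]
        if not left or not right:      # degenerate split; skip it
            continue
        # Weight each child's Gini impurity by the fraction of data it holds
        score = len(left) / n * gini(left) + len(right) / n * gini(right)
        if score < best[1]:
            best = (threshold, score)
    return best
```

On a cleanly separable toy set, the chosen threshold yields two pure children with a weighted impurity of 0.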
IV. Comparing CART and the AdaBoost meta-algorithm in application
CART is more efficient than AdaBoost because the former cuts conditionally (each split applies only within its branch), while the latter's stumps always cut clear across the entire feature space, "horizontally and vertically".
V. Characteristics of CART in practice
Machine Learning Techniques: decision tree and CART classification and regression tree construction algorithms