CART Regression Trees (Chapter 9), Machine Learning in Action Study Notes

Source: Internet
Author: User

I will come back to this topic later; for now my understanding is still a little vague.
Advantages: can model complex, non-linear data.
Disadvantages: the resulting model is difficult to interpret.
Applicable data types: numeric and nominal data (nominal values are converted to binary values).
General approach to tree regression:
    • Collect: any method.
    • Prepare: numeric values are required; nominal values should be mapped to binary values.
    • Analyze: plot the data in two dimensions; the tree is built as a dictionary.
    • Train: most of the time is spent building trees with models at the leaf nodes.
    • Test: use the R² value on test data to evaluate how well the model works.
    • Use: make predictions with the trained tree; the predictions can then be used for many other purposes.
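The notes do not say exactly how R² is computed. The sketch below assumes the common coefficient-of-determination definition, R² = 1 − SS_res / SS_tot. The function name r_squared is hypothetical.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination (assumed definition): 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty sequences of equal length")
    mean_y = sum(y_true) / len(y_true)
    # Residual sum of squares: how far predictions are from the true values.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: how far true values are from their own mean.
    # (Raises ZeroDivisionError if all true values are identical.)
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


print(r_squared([1, 2, 3, 4], [1.1, 1.9, 3.2, 3.8]))  # close to 0.98
```

A value near 1 means the tree's predictions on the test set explain most of the variation in the targets.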
A regression tree is similar to a classification tree, except that its leaf nodes hold continuous values instead of discrete class labels.
The tree is stored as a dictionary that contains:
    • The feature to split on.
    • The value of that feature at which to split.
    • The right subtree. This can also be a single value when no further splitting is needed.
    • The left subtree. This works the same way as the right subtree.
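Here is a minimal sketch of this dictionary layout and of walking it to make a prediction. The key names "spInd", "spVal", "left", and "right" are assumptions. So is the rule of going left when the feature value is greater than the split value.

```python
def is_tree(node):
    """An internal (split) node is a dict; anything else is a leaf value."""
    return isinstance(node, dict)


def predict(tree, x):
    """Follow splits for one sample x (a list of feature values) down to a leaf."""
    node = tree
    while is_tree(node):
        # Assumed convention: feature value > split value goes to the left subtree.
        if x[node["spInd"]] > node["spVal"]:
            node = node["left"]
        else:
            node = node["right"]
    return node


# Hypothetical tree: split on feature 0 at 0.5; the right subtree splits again on feature 1.
example_tree = {
    "spInd": 0, "spVal": 0.5,
    "left": 3.0,
    "right": {"spInd": 1, "spVal": 2.0, "left": 1.5, "right": -1.0},
}

print(predict(example_tree, [0.9, 0.0]))  # 3.0
print(predict(example_tree, [0.1, 5.0]))  # 1.5
```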

Measuring the disorder of continuous values: first compute the mean of all values, then compute the difference between each value and that mean. To treat positive and negative differences equally, the absolute value or the square of each difference is used. This resembles variance, which is the mean of the squared errors. What is needed here, however, is the total of the squared errors (the total variance). It can be obtained by multiplying the variance by the number of samples in the data set.
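As a small sketch, the total squared error can be computed directly or as variance times sample count. The function names are hypothetical. The second form must use the population variance (divide by n), which is what `statistics.pvariance` returns. The sample variance (divide by n − 1) would not give the same result.

```python
import statistics


def total_squared_error(values):
    """Sum of squared deviations from the mean (the 'total variance' above)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def total_via_variance(values):
    """Same quantity: population variance multiplied by the number of samples."""
    if not values:
        return 0.0
    return statistics.pvariance(values) * len(values)


data = [1, 2, 3, 4]
print(total_squared_error(data))  # 5.0
print(total_via_variance(data))   # 5.0
```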

Pseudocode for createTree():
    Find the best feature to split on:
        If the node cannot be split, save it as a leaf node
        Otherwise, perform a binary split
        Call createTree() on the right subtree
        Call createTree() on the left subtree

Pseudocode for chooseBestSplit():
    For each feature:
        For each unique value of that feature:
            Split the data set into two parts
            Compute the error of the split
            If this error is less than the current minimum error, record this split as the best split and update the minimum error
    Return the feature and threshold of the best split
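The two functions above can be sketched together in Python. In this sketch each row is a list whose last element is the target. Leaves are target means, and split error is the total squared error. The names (create_tree, choose_best_split, bin_split) and the stopping thresholds tol_s (minimum error reduction) and tol_n (minimum rows on each side) are hypothetical. They show the pre-pruning idea of stopping early and are not taken from the source.

```python
def leaf_value(rows):
    """Constant leaf: mean of the target column (last element of each row)."""
    return sum(r[-1] for r in rows) / len(rows)


def total_error(rows):
    """Total squared error of the target around its mean."""
    if not rows:
        return 0.0
    mean = leaf_value(rows)
    return sum((r[-1] - mean) ** 2 for r in rows)


def bin_split(rows, feature, value):
    """Binary split (assumed convention): left gets feature > value, right the rest."""
    left = [r for r in rows if r[feature] > value]
    right = [r for r in rows if r[feature] <= value]
    return left, right


def choose_best_split(rows, tol_s=1.0, tol_n=4):
    """Return (feature, value) for the best split, or (None, leaf) to stop splitting."""
    if len({r[-1] for r in rows}) == 1:  # all targets equal: nothing to gain
        return None, leaf_value(rows)
    base_error = total_error(rows)
    best_error, best_feature, best_value = float("inf"), None, None
    for feature in range(len(rows[0]) - 1):
        for value in sorted({r[feature] for r in rows}):
            left, right = bin_split(rows, feature, value)
            if len(left) < tol_n or len(right) < tol_n:  # pre-pruning: too few rows
                continue
            err = total_error(left) + total_error(right)
            if err < best_error:
                best_error, best_feature, best_value = err, feature, value
    # Pre-pruning: no valid split, or the error does not drop by at least tol_s.
    if best_feature is None or base_error - best_error < tol_s:
        return None, leaf_value(rows)
    return best_feature, best_value


def create_tree(rows, tol_s=1.0, tol_n=4):
    """Recursively build a regression tree stored as nested dicts."""
    feature, value = choose_best_split(rows, tol_s, tol_n)
    if feature is None:
        return value  # leaf node
    left, right = bin_split(rows, feature, value)
    return {"spInd": feature, "spVal": value,
            "left": create_tree(left, tol_s, tol_n),
            "right": create_tree(right, tol_s, tol_n)}


rows = [[x, 0.0 if x < 5 else 10.0] for x in range(10)]
print(create_tree(rows, tol_s=1.0, tol_n=2))
# {'spInd': 0, 'spVal': 4, 'left': 10.0, 'right': 0.0}
```

Raising tol_s or tol_n gives smaller trees. Because the tree is sensitive to these values, it can be useful to follow this kind of pre-pruning with post-pruning.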
Reducing the complexity of a decision tree to avoid overfitting is called pruning.
Pre-pruning: set stopping conditions in advance, while the tree is being built.
Post-pruning: build the tree on a training set, then prune it using a test set.

Post-pruning: split the data into a training set and a test set. First choose parameters that make the tree large and complex enough to be worth pruning. Then search the tree from the top down until leaf nodes are found, and use the test set to check whether merging those leaf nodes reduces the test error. If it does, merge them.
Pseudocode for prune():
    Split the test data according to the existing tree:
        If either subset is a tree, recursively prune that subset
        Compute the error after merging the current two leaf nodes
        Compute the error without merging
        If merging reduces the error, merge the two leaf nodes
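A minimal Python sketch of prune() follows. It assumes the nested-dict tree layout ("spInd", "spVal", "left", "right") and the "feature > value goes left" split convention, both hypothetical. It also assumes that a merged leaf is the average of its two children. Collapsing a subtree that no test data reaches is another assumption. Unlike an in-place version, this one returns a new tree.

```python
def is_tree(node):
    return isinstance(node, dict)


def bin_split(rows, feature, value):
    """Assumed convention: left gets feature > value, right the rest."""
    left = [r for r in rows if r[feature] > value]
    right = [r for r in rows if r[feature] <= value]
    return left, right


def collapse(tree):
    """Replace a subtree with the average of its two children, recursively."""
    if not is_tree(tree):
        return tree
    return (collapse(tree["left"]) + collapse(tree["right"])) / 2.0


def sq_err(rows, prediction):
    """Squared error of a constant prediction against the targets (last column)."""
    return sum((r[-1] - prediction) ** 2 for r in rows)


def prune(tree, test_rows):
    """Post-prune a regression tree against test data; returns a tree or a leaf."""
    if not is_tree(tree):
        return tree
    if not test_rows:
        return collapse(tree)  # no test data reaches this node
    left_rows, right_rows = bin_split(test_rows, tree["spInd"], tree["spVal"])
    left = prune(tree["left"], left_rows)     # recurse into subtrees first
    right = prune(tree["right"], right_rows)
    if is_tree(left) or is_tree(right):
        return {**tree, "left": left, "right": right}
    # Both children are leaves: compare test error with and without merging.
    error_no_merge = sq_err(left_rows, left) + sq_err(right_rows, right)
    merged = (left + right) / 2.0
    if sq_err(test_rows, merged) < error_no_merge:
        return merged
    return {**tree, "left": left, "right": right}


tree = {"spInd": 0, "spVal": 4, "left": 10.0, "right": 0.0}
print(prune(tree, [[1, 5], [2, 5], [6, 5], [7, 5]]))  # 5.0 (merged)
print(prune(tree, [[1, 0], [6, 10]]))                  # split kept
```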
When modeling data with a tree, a leaf node does not have to be a constant. It can also be a linear model, which makes the whole tree piecewise linear. Here, piecewise linear means a model made up of several linear segments.
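As an illustrative sketch, here is what a linear leaf could look like for a single input feature. The leaf stores ordinary-least-squares weights (w0, w1) instead of a mean. Its squared error can replace the constant-leaf error when scoring candidate splits. All names here are hypothetical, and the one-feature restriction is a simplification. A multi-feature version would solve the normal equations instead.

```python
def fit_linear_leaf(rows):
    """Least-squares fit of y = w0 + w1 * x for rows of (x, y); returns (w0, w1)."""
    n = len(rows)
    xs = [r[0] for r in rows]
    ys = [r[1] for r in rows]
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are equal; slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    w1 = sxy / sxx
    w0 = mean_y - w1 * mean_x
    return w0, w1


def linear_leaf_error(rows, weights):
    """Squared error of a linear leaf; plays the role of total_error in a model tree."""
    w0, w1 = weights
    return sum((y - (w0 + w1 * x)) ** 2 for x, y in rows)


def predict_piecewise(tree, x):
    """Predict with a one-feature model tree whose leaves are (w0, w1) pairs."""
    node = tree
    while isinstance(node, dict):
        node = node["left"] if x > node["spVal"] else node["right"]
    w0, w1 = node
    return w0 + w1 * x


weights = fit_linear_leaf([(0, 1), (1, 3), (2, 5)])
print(weights)  # (1.0, 2.0)
model_tree = {"spVal": 2.0, "left": (10.0, -1.0), "right": (1.0, 2.0)}
print(predict_piecewise(model_tree, 3))  # 7.0
```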




From WizNote (Wiz)
