Scikit-learn and the Regression Tree

Principle of the Regression Algorithm

The CART (Classification and Regression Tree) algorithm is one of the most mature decision tree algorithms, and its range of application is very wide: it can be used for both classification and regression.
Prediction theory is generally based on regression, and CART is a decision tree method for implementing a regression algorithm; it has characteristics that many other global regression algorithms lack.
When a regression model is created, the samples are divided into observation values and output values, both of which are continuous. Unlike a classification function, there are no class labels; a predictive model reflecting the trend of the curve is built purely from the features of the dataset. In this case, the optimal partitioning rule of the original classification tree no longer works. For prediction, CART uses the minimum residual variance (squared residuals minimization) to determine the optimal partition of the regression tree; this criterion seeks the smallest error variance between the subtree and the sample points after partitioning. The decision tree divides the dataset into many data sub-models and then applies linear regression to model each of them. If a data subset is still difficult to fit after a split, it is split again. With this method, each leaf node of the prediction tree is a linear regression model. These linear regression models reflect the patterns contained in the sample set (observation set), and such a tree is also called a model tree. CART therefore supports not only global prediction but also the prediction of local patterns, as well as the ability to find patterns in the whole or to combine local patterns into a whole. This combination of the whole and its patterns is of great value for predictive analysis, so the CART decision tree algorithm is very widely applied in prediction.
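As a quick illustration of this behavior, here is a minimal sketch using scikit-learn's DecisionTreeRegressor (the data and parameters below are assumed for illustration, not taken from the article):

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.RandomState(0)
X = np.linspace(-3, 3, 100)[:, np.newaxis]               # continuous observations
y = np.sin(X).ravel() + rng.normal(scale=0.1, size=100)  # continuous outputs, no class labels

# Each split is chosen by squared-error (residual variance) minimization;
# with max_depth=3 the tree has at most 2**3 = 8 leaves, so the prediction
# is a piecewise approximation of the underlying sine curve.
tree = DecisionTreeRegressor(max_depth=3).fit(X, y)
print(tree.get_n_leaves())  # at most 8
```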
The following is an introduction to the algorithm flow of CART:
(1) Decision tree main function: the main function of the decision tree is recursive. Its job is to grow each branch node of the decision tree according to the CART rules and to terminate the algorithm when the stopping condition is met.
A. Input the dataset and the labels to be modeled.
B. Use the minimum residual variance to determine the optimal partition of the regression tree and create the feature's partition node -- the minimum-residual-variance subfunction.
C. Divide the node's dataset into two parts -- the binary dataset-splitting subfunction.
D. Construct new left and right nodes from the result of the binary split, as the two branches of tree growth.
E. Check whether the recursive termination condition is met.
F. Recursively perform the above steps, taking the datasets and labels assigned to the new nodes as input.
(2) Minimum residual variance subfunction: computes the optimal partition variance, the splitting column, and the splitting value for each column of the dataset.
(3) Binary dataset subfunction: divides the dataset into two parts according to the given splitting column and splitting value, and returns both parts.

Minimum Residual Variance Method
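The recursive flow above, driven by the minimum-residual-variance split criterion, can be sketched as follows. This is a simplified illustration; the names `create_tree`, `choose_best_split`, and `bin_split`, and the stopping thresholds, are hypothetical, not a specific library's API:

```python
import numpy as np


def bin_split(data, feat, value):
    """Binary dataset subfunction: split rows on column `feat` at `value`."""
    left = data[data[:, feat] <= value]
    right = data[data[:, feat] > value]
    return left, right


def choose_best_split(data, min_samples=4, min_var_gain=1e-3):
    """Minimum residual variance subfunction: pick the (feature, value) pair
    that minimizes total residual variance; return (None, leaf_value) when
    the termination condition is met."""
    y = data[:, -1]
    if len(np.unique(y)) == 1 or len(data) < min_samples:
        return None, y.mean()
    base_var = y.var() * len(y)  # total squared deviation before splitting
    best_var, best_feat, best_val = np.inf, None, y.mean()
    for feat in range(data.shape[1] - 1):        # every feature column...
        for value in np.unique(data[:, feat]):   # ...and every sample point
            left, right = bin_split(data, feat, value)
            if len(left) == 0 or len(right) == 0:
                continue
            cur_var = left[:, -1].var() * len(left) + right[:, -1].var() * len(right)
            if cur_var < best_var:
                best_var, best_feat, best_val = cur_var, feat, value
    if base_var - best_var < min_var_gain:       # too little improvement
        return None, y.mean()
    return best_feat, best_val


def create_tree(data):
    """Main function: grow branch nodes recursively (steps A-F)."""
    feat, val = choose_best_split(data)
    if feat is None:                             # termination: return a leaf
        return val
    left, right = bin_split(data, feat, val)
    return {"feat": feat, "val": val,
            "left": create_tree(left), "right": create_tree(right)}


demo = np.array([[1.0, 1.0], [2.0, 1.0], [3.0, 1.0],
                 [10.0, 5.0], [11.0, 5.0], [12.0, 5.0]])
print(create_tree(demo))  # splits at x <= 3; each leaf predicts its side's mean
```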

In a regression tree the dataset is continuous, and continuous data is processed differently from discrete data: discrete data is partitioned according to each value a feature takes, whereas for a continuous feature the optimal split point must be computed. Computing the linear correlation on a continuous dataset is nevertheless very simple, and the algorithm derives from the least squares method.
In the minimum residual variance method, the mean and the total variance of the data column are obtained first. The total variance can be computed in two ways:
(1) Compute the mean, compute each data point's squared deviation from the mean, and sum over the n points.
(2) Compute the variance var and take var_sum = var * n, where n is the number of points in the dataset.
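A quick check with illustrative values (not from the article) confirms that the two computations agree:

```python
import numpy as np

data = np.array([1.0, 2.0, 4.0, 7.0])
n = len(data)

# Way 1: squared deviation of each point from the mean, summed over n points.
mean = data.mean()
total_1 = np.sum((data - mean) ** 2)

# Way 2: the (population) variance times the number of points.
total_2 = data.var() * n

print(total_1, total_2)  # both equal 21.0: the same total squared deviation
```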
The selection process for each best branching feature is therefore as follows:
(1) First let the best variance be infinitely large: bestVar = inf.
(2) Iterate over all feature columns and over every sample point of each feature column (a double loop), splitting the dataset in two at each sample point.
(3) Compute the total variance currentVar after the binary split; if currentVar < bestVar, then bestVar = currentVar.
The function returns the optimal branching feature column, the branching feature value (for a continuous feature this is the split point), and the left and right branch datasets to the main program.

Model Tree

Using CART for prediction means setting the leaf nodes to a series of piecewise linear functions; these piecewise linear functions are a simulation of the source-data curve, and a tree whose leaves hold such linear models is called a model tree. The model tree has many excellent properties, including the following characteristics.
In general, the overall repeatability of a sample set is not very high, but local patterns often repeat; that is, history is not a simple repetition, yet it does repeat itself. These local models are therefore more useful for forecasting the future than one overall prediction.
A model gives the range of the data it covers, which may be a time range or a spatial range; it also gives a trend of change, either a curve or a straight line, depending on the regression algorithm used. These factors give the model tree strong explanatory power.
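As a toy illustration of a model tree, the sketch below uses one fixed split and fits a separate linear model (via np.polyfit) in each leaf. The data, the split point, and the function names are assumptions made for this example:

```python
import numpy as np

# Piecewise-linear data: slope 1 left of zero, slope -2 to the right.
x = np.linspace(-4, 4, 200)
y = np.where(x < 0, x + 1, 1 - 2 * x)

split = 0.0                       # assume the split point is already known
left, right = x < split, x >= split

# Each leaf stores a degree-1 (linear) model instead of a constant value.
left_model = np.polyfit(x[left], y[left], 1)     # approx. slope 1, intercept 1
right_model = np.polyfit(x[right], y[right], 1)  # approx. slope -2, intercept 1


def model_tree_predict(v):
    """Route the query to a leaf, then evaluate that leaf's linear model."""
    coeffs = left_model if v < split else right_model
    return np.polyval(coeffs, v)


print(model_tree_predict(-2.0))  # evaluated by the left leaf's linear model
```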
Traditional regression methods, whether linear or nonlinear, do not contain information as rich as the model tree's, so the model tree attains higher prediction accuracy.

Scikit-learn Implementation

#!/usr/bin/python
# Created by Lixin 20161118
import numpy as np
import matplotlib.pyplot as plt
from sklearn.tree import DecisionTreeRegressor


def plotfigure(X, X_test, y, yp):
    plt.figure()
    plt.scatter(X, y, c="k", label="data")
    plt.plot(X_test, yp, c="r", label="max_depth=4", linewidth=2)
    plt.xlabel("data")
    plt.ylabel("target")
    plt.title("Decision Tree Regression")
    plt.legend(loc="upper right")
    plt.show()
    # plt.savefig('./res.png', format='png')


x = np.linspace(-5, 5, 200)
siny = np.sin(x)                             # noiseless sine curve
X = x[:, np.newaxis]                         # shape (200, 1), as scikit-learn expects
y = siny + np.random.rand(len(siny)) * 1.5   # add uniform noise

clf = DecisionTreeRegressor(max_depth=4)
clf.fit(X, y)

X_test = np.arange(-5.0, 5.0, 0.05)[:, np.newaxis]
yp = clf.predict(X_test)

plotfigure(X, X_test, y, yp)
