Learn to use Scikit-learn's basic regression methods (linear, decision tree, SVM, KNN) and ensemble methods (random forest, AdaBoost, and GBRT) in 30 minutes


Note: This tutorial captures my experience trying out scikit-learn. Scikit-learn is really easy to get started with, simple and practical; 30 minutes should be enough to learn to call the basic regression methods and the ensemble methods.
This article mainly draws on the Scikit-learn official website.
Preface: This tutorial uses only the most basic features of numpy (to generate data) and matplotlib (for drawing); Scikit-learn is used to call the machine learning methods. If you're not familiar with them (I wasn't either), it's fine to skim the simplest NumPy and Matplotlib tutorials first. The program for this tutorial does not exceed 50 lines.

1. Data Preparation

For the experiment, I wrote a two-variable function, y = 0.5*np.sin(x1) + 0.5*np.cos(x2) + 0.1*x1 + 3. The range of x1 is 0~50 and the range of x2 is -10~10; the training set has 500 (x1, x2) pairs and the test set has 100. Noise in the range -0.5~0.5 is added to the training set. The code for generating the data is as follows:
import numpy as np

def f(x1, x2):
    y = 0.5 * np.sin(x1) + 0.5 * np.cos(x2) + 0.1 * x1 + 3
    return y

def load_data():
    x1_train = np.linspace(0, 50, 500)
    x2_train = np.linspace(-10, 10, 500)
    data_train = np.array([[x1, x2, f(x1, x2) + (np.random.random() - 0.5)]
                           for x1, x2 in zip(x1_train, x2_train)])
    x1_test = np.linspace(0, 50, 100) + 0.5 * np.random.random(100)
    x2_test = np.linspace(-10, 10, 100) + 0.02 * np.random.random(100)
    data_test = np.array([[x1, x2, f(x1, x2)] for x1, x2 in zip(x1_test, x2_test)])
    return data_train, data_test
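As a quick sanity check on the data generator (a sketch, assuming the load_data above; the checks themselves are my addition, not from the article), we can verify the shapes and the noise bound:

```python
import numpy as np

def f(x1, x2):
    return 0.5 * np.sin(x1) + 0.5 * np.cos(x2) + 0.1 * x1 + 3

def load_data():
    x1_train = np.linspace(0, 50, 500)
    x2_train = np.linspace(-10, 10, 500)
    data_train = np.array([[x1, x2, f(x1, x2) + (np.random.random() - 0.5)]
                           for x1, x2 in zip(x1_train, x2_train)])
    x1_test = np.linspace(0, 50, 100) + 0.5 * np.random.random(100)
    x2_test = np.linspace(-10, 10, 100) + 0.02 * np.random.random(100)
    data_test = np.array([[x1, x2, f(x1, x2)] for x1, x2 in zip(x1_test, x2_test)])
    return data_train, data_test

train, test = load_data()
print(train.shape)  # (500, 3)
print(test.shape)   # (100, 3)

# The training y deviates from f(x1, x2) by at most 0.5 (the added noise)
noise = train[:, 2] - f(train[:, 0], train[:, 1])
print(abs(noise).max() <= 0.5)  # True
```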

The training set (with random noise in -0.5~0.5 added to y) and the test set (no noise) look like this:
2. The simplest introduction to Scikit-learn

Using Scikit-learn is very simple: instantiate an algorithm object, call its fit() function to fit the model, then use the predict() function to predict, and use the score() function to evaluate how close the predicted values are to the true values (it returns a score). For example, a decision tree is invoked as follows:

In [6]: from sklearn.tree import DecisionTreeRegressor
In [7]: clf = DecisionTreeRegressor()
In [8]: clf.fit(x_train, y_train)
Out[8]: DecisionTreeRegressor(criterion='mse', max_depth=None, max_features=None,
            max_leaf_nodes=None, min_samples_leaf=1, min_samples_split=2,
            min_weight_fraction_leaf=0.0, presort=False, random_state=None,
            splitter='best')
In [9]: result = clf.predict(x_test)
In [10]: clf.score(x_test, y_test)
Out[10]: 0.96352052312508396
In [11]: result
Out[11]: array([ 2.44996735,  2.79065744,  3.21866981,  3.20188779,  3.04219101,
                 2.60239551,  3.35783805,  2.40556647,  3.12082094,  2.79870458, ...])
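The same fit/predict/score workflow can be run end-to-end on a small made-up dataset (a sketch; the data, target, and split below are illustrative, not from the article):

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.RandomState(0)
X = rng.uniform(0, 10, size=(200, 2))   # 200 samples, 2 features
y = 2.0 * X[:, 0] + np.sin(X[:, 1])     # an illustrative target

clf = DecisionTreeRegressor(random_state=0)
clf.fit(X[:150], y[:150])               # fit on the first 150 samples
result = clf.predict(X[150:])           # predict the held-out 50
score = clf.score(X[150:], y[150:])     # R^2 score: 1.0 is perfect
print(result.shape)                     # (50,)
```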

Next, we can draw an image of the predicted values against the true values. The plotting code is as follows:

    plt.figure()
    plt.plot(np.arange(len(result)), y_test, 'go-', label='true value')
    plt.plot(np.arange(len(result)), result, 'ro-', label='predict value')
    plt.title('score: %f' % score)
    plt.legend()
    plt.show()

The image is then displayed as follows:
3. Start experimenting with various regression methods

To speed up testing, we write a function that takes the object of a regression class, fits it, draws the image, and shows the score.
The function is as follows:

def try_different_method(clf):
    clf.fit(x_train, y_train)
    score = clf.score(x_test, y_test)
    result = clf.predict(x_test)
    plt.figure()
    plt.plot(np.arange(len(result)), y_test, 'go-', label='true value')
    plt.plot(np.arange(len(result)), result, 'ro-', label='predict value')
    plt.title('score: %f' % score)
    plt.legend()
    plt.show()

train, test = load_data()
x_train, y_train = train[:, :2], train[:, 2]  # the first two columns are x1, x2; the third column is y (y has random noise here)
x_test, y_test = test[:, :2], test[:, 2]  # ditto, but y has no noise here
3.1 Conventional regression methods

Conventional regression methods include linear regression, decision tree regression, SVM, and K-nearest neighbors (KNN).

3.1.1 Linear regression

In [4]: from sklearn import linear_model
In [5]: linear_reg = linear_model.LinearRegression()
In [6]: try_different_method(linear_reg)
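Linear regression scores poorly on this tutorial's data because the target contains sin/cos terms; on truly linear data it recovers the coefficients exactly. A minimal sketch with made-up linear data (the coefficients 3, -2 and intercept 1 below are arbitrary choices for illustration):

```python
import numpy as np
from sklearn import linear_model

rng = np.random.RandomState(0)
X = rng.uniform(-5, 5, size=(100, 2))
y = 3 * X[:, 0] - 2 * X[:, 1] + 1   # exactly linear, no noise

reg = linear_model.LinearRegression()
reg.fit(X, y)
print(np.round(reg.coef_, 6))       # recovers [ 3. -2.]
print(round(reg.intercept_, 6))     # recovers 1.0
```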

3.1.2 Decision tree regression

from sklearn import tree
tree_reg = tree.DecisionTreeRegressor()
try_different_method(tree_reg)

The image of the decision tree regression is then displayed:
3.1.3 SVM regression

In [7]: from sklearn import svm
In [8]: svr = svm.SVR()
In [9]: try_different_method(svr)
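SVR's behavior depends heavily on its kernel parameter. A sketch on made-up one-dimensional data, comparing the default RBF kernel with a linear kernel on a nonlinear target (the data and target are illustrative, not the tutorial's):

```python
import numpy as np
from sklearn import svm

rng = np.random.RandomState(0)
X = rng.uniform(0, 5, size=(200, 1))
y = np.sin(X[:, 0])                   # nonlinear target

scores = {}
for kernel in ('rbf', 'linear'):
    svr = svm.SVR(kernel=kernel)
    svr.fit(X, y)
    scores[kernel] = svr.score(X, y)  # training R^2

print(scores['rbf'] > scores['linear'])  # True: RBF can fit the curve, linear cannot
```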

The resulting image is as follows:
3.1.4 KNN

In [10]: from sklearn import neighbors
In [11]: knn = neighbors.KNeighborsRegressor()
In [12]: try_different_method(knn)

Surprisingly, even KNN, arguably the most brute-force of these algorithms, works best here.
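KNN likely wins because the test points sit very close to dense training points, so averaging nearby neighbors' y values is almost exact. The n_neighbors parameter controls how many neighbors are averaged; a sketch on made-up 1-D data (the grid and offset below are illustrative):

```python
import numpy as np
from sklearn import neighbors

X = np.linspace(0, 10, 200).reshape(-1, 1)   # dense training grid
y = np.sin(X[:, 0])

X_test = X[:-1] + 0.025                      # points between the grid points
y_test = np.sin(X_test[:, 0])

scores = {}
for k in (1, 5, 20):
    knn = neighbors.KNeighborsRegressor(n_neighbors=k)
    knn.fit(X, y)
    scores[k] = knn.score(X_test, y_test)

print(all(s > 0.9 for s in scores.values()))  # True: all do well on dense data
```

Averaging more neighbors (k=20) smooths the prediction, so on this noise-free grid the smallest k fits best.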
3.2 Ensemble methods (random forest, AdaBoost, GBRT)

3.2.1 Random forest

In [13]: from sklearn import ensemble
In [14]: rf = ensemble.RandomForestRegressor(n_estimators=20)  # use 20 decision trees here
In [15]: try_different_method(rf)
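Besides n_estimators, a handy by-product of a fitted forest is feature_importances_. A sketch on made-up data where only the first of three features carries signal (the data and coefficient are illustrative):

```python
import numpy as np
from sklearn import ensemble

rng = np.random.RandomState(0)
X = rng.uniform(-1, 1, size=(300, 3))
y = 5 * X[:, 0]                      # only feature 0 matters

rf = ensemble.RandomForestRegressor(n_estimators=20, random_state=0)
rf.fit(X, y)

print(rf.feature_importances_.argmax())          # 0: feature 0 dominates
print(round(rf.feature_importances_.sum(), 6))   # importances sum to 1.0
```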

3.2.2 AdaBoost

In [16]: ada = ensemble.AdaBoostRegressor(n_estimators=50)
In [17]: try_different_method(ada)

The image is as follows:
3.2.3 GBRT

In [18]: gbrt = ensemble.GradientBoostingRegressor(n_estimators=100)
In [19]: try_different_method(gbrt)
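In boosting, n_estimators is the number of sequential stages, and the training fit keeps improving as stages are added. A sketch on made-up data (the target below is illustrative):

```python
import numpy as np
from sklearn import ensemble

rng = np.random.RandomState(0)
X = rng.uniform(0, 10, size=(300, 2))
y = np.sin(X[:, 0]) + 0.1 * X[:, 1]

scores = {}
for n in (10, 100):
    gbrt = ensemble.GradientBoostingRegressor(n_estimators=n, random_state=0)
    gbrt.fit(X, y)
    scores[n] = gbrt.score(X, y)    # training R^2

print(scores[100] > scores[10])     # True: more stages, better training fit
```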

The image is as follows:
4. Scikit-learn has many other methods, which you can try out by following the user manual.

5. Complete code

I wrote the code here in PyCharm, but PyCharm does not display the graphics, so you can copy the code into IPython with the %paste magic to paste the code block.
Then import the algorithms as shown in each of the sections above and use the try_different_method() function to plot.
The complete code is as follows:

import numpy as np
import matplotlib.pyplot as plt

def f(x1, x2):
    y = 0.5 * np.sin(x1) + 0.5 * np.cos(x2) + 3 + 0.1 * x1
    return y

def load_data():
    x1_train = np.linspace(0, 50, 500)
    x2_train = np.linspace(-10, 10, 500)
    data_train = np.array([[x1, x2, f(x1, x2) + (np.random.random() - 0.5)]
                           for x1, x2 in zip(x1_train, x2_train)])
    x1_test = np.linspace(0, 50, 100) + 0.5 * np.random.random(100)
    x2_test = np.linspace(-10, 10, 100) + 0.02 * np.random.random(100)
    data_test = np.array([[x1, x2, f(x1, x2)] for x1, x2 in zip(x1_test, x2_test)])
    return data_train, data_test

train, test = load_data()
x_train, y_train = train[:, :2], train[:, 2]  # the first two columns are x1, x2; the third column is y (y has random noise here)
x_test, y_test = test[:, :2], test[:, 2]  # ditto, but y has no noise here

def try_different_method(clf):
    clf.fit(x_train, y_train)
    score = clf.score(x_test, y_test)
    result = clf.predict(x_test)
    plt.figure()
    plt.plot(np.arange(len(result)), y_test, 'go-', label='true value')
    plt.plot(np.arange(len(result)), result, 'ro-', label='predict value')
    plt.title('score: %f' % score)
    plt.legend()
    plt.show()
 
