Summary of Gaussian kernel parameters for support vector machines


Among the kernel functions used with support vector machines (hereinafter SVM), the Gaussian kernel (hereinafter RBF) is the most commonly used. In theory the RBF kernel should be no worse than the linear kernel, but in practice it confronts us with several important hyper-parameter tuning problems; if the tuning is done poorly, it may actually perform worse than the linear kernel. So in practice, if the linear kernel already gives good results we can simply use it; if it does not, we need the RBF kernel, and before we can enjoy its good classification performance on nonlinear data we have to tune its main hyper-parameters. This article summarizes the SVM RBF parameters in scikit-learn.

1. Overview of the main SVM RBF hyper-parameters

For an SVM classification model, there are two hyper-parameters: the penalty coefficient $C$ and the RBF kernel coefficient $\gamma$. In NuSVC the penalty coefficient $C$ is replaced by an upper bound $\nu$ on the classification error rate; since $C$ and $\nu$ play equivalent roles, this article only discusses the classification SVM parameterized by the penalty coefficient $C$.
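
For reference, here is a minimal sketch of the two parameterizations in scikit-learn (the parameter values are illustrative only, not tuned):

from sklearn.svm import SVC, NuSVC

# Penalty-coefficient parameterization: C controls how heavily misclassification is penalized
clf_c = SVC(kernel='rbf', C=1.0, gamma=0.1)

# nu parameterization: nu is an upper bound on the fraction of margin errors
# (and a lower bound on the fraction of support vectors)
clf_nu = NuSVC(kernel='rbf', nu=0.5, gamma=0.1)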

The penalty coefficient $C$ is the coefficient of the slack variables discussed in the earlier derivation. In the optimization objective it balances the complexity of the support vector model against the misclassification rate, so it can be understood as a regularization coefficient. When $C$ is large, the loss from misclassified points weighs more heavily, which means we are unwilling to give up the outlying points; we end up with more support vectors, so the support vectors and the separating hyperplane model become more complex and overfit more easily. Conversely, when $C$ is small, it means we are willing to ignore those outlying points; fewer samples are chosen as support vectors, and the final support vector and hyperplane model is simpler. The default value in scikit-learn is 1.

The other hyper-parameter is the RBF kernel coefficient $\gamma$. Recall the RBF kernel function $K(x, z) = \exp(-\gamma ||x - z||^2), \;\; \gamma > 0$. $\gamma$ mainly determines how far the influence of a single sample reaches on the classification hyperplane. When $\gamma$ is small, a single sample influences the hyperplane over a larger range and is more easily selected as a support vector; conversely, when $\gamma$ is large, a single sample's influence on the classification hyperplane is small, it is less easily chosen as a support vector, and the whole model ends up with fewer support vectors. The default value in scikit-learn is $\frac{1}{\text{number of features}}$.
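
To make the formula concrete, here is a minimal sketch (the sample points and the gamma value are arbitrary) that computes the RBF kernel value by hand and compares it with scikit-learn's rbf_kernel:

import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

x = np.array([[1.0, 2.0]])
z = np.array([[2.0, 0.0]])
gamma = 0.5

# K(x, z) = exp(-gamma * ||x - z||^2)
manual = np.exp(-gamma * np.sum((x - z) ** 2))
print(manual, rbf_kernel(x, z, gamma=gamma)[0, 0])  # the two values agree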

Considering the penalty coefficient $C$ and the RBF kernel coefficient $\gamma$ together: when $C$ is large and $\gamma$ is small, we have more support vectors, the model is more complex, and it overfits more easily. When $C$ is small and $\gamma$ is large, the model becomes simpler and the number of support vectors is smaller.
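
How the support vector count actually behaves can be observed directly. The sketch below (using an arbitrary synthetic data set, so the exact numbers are only illustrative) fits an SVC for several (C, gamma) pairs and prints how many support vectors each model keeps:

import numpy as np
from sklearn.svm import SVC
from sklearn.datasets import make_classification

# A small synthetic 2-feature classification problem (arbitrary choice for illustration)
X, y = make_classification(n_samples=200, n_features=2, n_informative=2,
                           n_redundant=0, random_state=1)

for C in (0.1, 1, 10):
    for gamma in (0.01, 0.1, 1):
        clf = SVC(kernel='rbf', C=C, gamma=gamma).fit(X, y)
        # support_vectors_ holds the training points kept as support vectors
        print("C=%-4s gamma=%-5s support vectors: %d"
              % (C, gamma, clf.support_vectors_.shape[0]))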

That covers the SVM classification model; now let's look at the regression model.

The RBF kernel of the SVM regression model is slightly more complicated than in the classification model, because in addition to the penalty coefficient $C$ and the RBF kernel coefficient $\gamma$ we also have a loss-distance threshold $\epsilon$. In NuSVR the loss-distance threshold $\epsilon$ is replaced by the error-rate upper bound $\nu$; since $\epsilon$ and $\nu$ play equivalent roles, this article only discusses the regression SVM parameterized by the distance threshold $\epsilon$.

For $C$ and the RBF kernel coefficient $\gamma$, their roles in the regression model are basically the same as in the classification model. The loss-distance threshold $\epsilon$ determines how far a sample point may be from the hyperplane before it incurs a loss. When $\epsilon$ is large, the loss $|y_i - w \bullet \phi(x_i) - b| - \epsilon$ is smaller, more points fall within the loss-free band and incur no loss, and the model is simpler; when $\epsilon$ is small, the loss is larger and the model becomes more complex. The default value in scikit-learn is 0.1.
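
As a tiny numeric illustration (the target and predicted values here are made up), the epsilon-insensitive loss only counts the part of the error that exceeds $\epsilon$:

# |y_i - f(x_i)| = 0.25; with epsilon = 0.1 the loss is 0.15, with epsilon = 0.5 it is 0
y_i, f_xi = 1.00, 1.25
for eps in (0.1, 0.5):
    loss = max(0.0, abs(y_i - f_xi) - eps)
    print("epsilon=%s loss=%.2f" % (eps, loss))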

Considering the penalty coefficient $C$, the RBF kernel coefficient $\gamma$, and the loss-distance threshold $\epsilon$ together: when $C$ is large, $\gamma$ is small, and $\epsilon$ is small, we have more support vectors, the model is more complex, and it overfits more easily. When $C$ is small, $\gamma$ is large, and $\epsilon$ is large, the model becomes simpler and the number of support vectors is smaller.
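
Again, this can be observed directly. The sketch below (fitting an arbitrary noisy sine curve, so the values are only illustrative) trains an SVR for several epsilon values and prints the number of support vectors; a larger epsilon should leave more points inside the loss-free band:

import numpy as np
from sklearn.svm import SVR

rng = np.random.RandomState(1)
X = np.sort(5 * rng.rand(100, 1), axis=0)
y = np.sin(X).ravel() + 0.1 * rng.randn(100)

for eps in (0.01, 0.1, 0.5):
    reg = SVR(kernel='rbf', C=1.0, gamma=0.5, epsilon=eps).fit(X, y)
    # support_ holds the indices of the training points used as support vectors
    print("epsilon=%-5s support vectors: %d" % (eps, len(reg.support_)))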

2. The main approach to tuning SVM RBF parameters

For the RBF kernel of SVM, the main tuning approach is cross-validation. In scikit-learn this mainly means grid search, i.e. the GridSearchCV class. You can also tune parameters with cross_val_score, but personally I find it less convenient than GridSearchCV. In this article we only discuss tuning the SVM RBF kernel parameters with GridSearchCV.

When using the GridSearchCV class to tune the SVM RBF parameters, the arguments we care about are:

1) estimator: our model; here it is an SVC or SVR with the Gaussian (RBF) kernel.

2) param_grid: the grid of parameter values we want to search. For example, with the SVC classification model, param_grid can be defined as {"C": [0.1, 1, 10], "gamma": [0.1, 0.2, 0.3]}, which gives 9 hyper-parameter combinations for the grid search; the combination with the best cross-validation score is selected.

3) cv: the number of cross-validation folds, i.e. how many parts the training set is split into for cross-validation. The default is 3. If the sample size is large, cv can be increased moderately.

After the grid search finishes, we can get the best model estimator, the best parameter combination from param_grid, and the best model score.
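
In code, these pieces correspond to attributes of the fitted GridSearchCV object. A minimal sketch of the pattern (the estimator and grid values are placeholders, not tuned choices) looks like this:

from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

grid = GridSearchCV(estimator=SVC(kernel='rbf'),
                    param_grid={"C": [0.1, 1, 10], "gamma": [0.1, 0.2, 0.3]},
                    cv=3)
# After calling grid.fit(X, y):
#   grid.best_estimator_  -> the refitted model with the best parameters
#   grid.best_params_     -> the best parameter combination from param_grid
#   grid.best_score_      -> the best mean cross-validation score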

Below, I use a concrete classification example to walk through the process of tuning the SVM RBF parameters.

3. An example of SVM RBF classification parameter tuning

Here we use an example to illustrate tuning the SVM RBF classification parameters. It is recommended to run the following example in an IPython notebook.

First we import the classes we need.

import numpy as np
import matplotlib.pyplot as plt
from sklearn import datasets, svm
from sklearn.svm import SVC
from sklearn.datasets import make_moons, make_circles, make_classification
%matplotlib inline

Then we generate some random data to classify later; to make the problem harder, we add some noise. We also standardize the data after generating it.

X, y = make_circles(noise=0.2, factor=0.5, random_state=1)
from sklearn.preprocessing import StandardScaler
X = StandardScaler().fit_transform(X)

Let's take a look at what the data looks like; here is the visualization code:

from matplotlib.colors import ListedColormap
cm = plt.cm.RdBu
cm_bright = ListedColormap(['#FF0000', '#0000FF'])
ax = plt.subplot()
ax.set_title("Input data")
# Plot the training points
ax.scatter(X[:, 0], X[:, 1], c=y, cmap=cm_bright)
ax.set_xticks(())
ax.set_yticks(())
plt.tight_layout()
plt.show()

The resulting plot is shown below. Because the data are randomly generated, the plot you get from running this code may look slightly different.

Now let's run SVM on this data set with a grid search: we choose the best hyper-parameters among the 9 combinations of C = (0.1, 1, 10) and gamma = (1, 0.1, 0.01), using 4-fold cross-validation. This is just an example; in practice you may need more parameter combinations.

from sklearn.model_selection import GridSearchCV
grid = GridSearchCV(SVC(), param_grid={"C": [0.1, 1, 10], "gamma": [1, 0.1, 0.01]}, cv=4)
grid.fit(X, y)
print("The best parameters are %s with a score of %0.2f" % (grid.best_params_, grid.best_score_))

The final output is as follows:

The best parameters are {'C': 10, 'gamma': 0.1} with a score of 0.91

That is, through the grid search among the 9 hyper-parameter combinations we specified, C=10 and gamma=0.1 give the highest score; these are the parameters we end up with.
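
If you want to see how all 9 combinations scored, not just the winner, the fitted grid object also exposes the full results. A short sketch, assuming the GridSearchCV object fitted above:

# Mean cross-validation score for every parameter combination tried
for params, mean_score in zip(grid.cv_results_["params"],
                              grid.cv_results_["mean_test_score"]):
    print("%s -> %.3f" % (params, mean_score))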

At this point the tuning example is complete. However, we can also look at a common visualization of SVM classification: after training each of the 9 combinations, we color the plane by predicting on a grid of points and observe the resulting classification regions. The code is as follows:

x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
xx, yy = np.meshgrid(np.arange(x_min, x_max, 0.02), np.arange(y_min, y_max, 0.02))

for i, C in enumerate((0.1, 1, 10)):
    for j, gamma in enumerate((1, 0.1, 0.01)):
        plt.subplot()
        clf = SVC(C=C, gamma=gamma)
        clf.fit(X, y)
        Z = clf.predict(np.c_[xx.ravel(), yy.ravel()])
        # Put the result into a color plot
        Z = Z.reshape(xx.shape)
        plt.contourf(xx, yy, Z, cmap=plt.cm.coolwarm, alpha=0.8)

        # Plot also the training points
        plt.scatter(X[:, 0], X[:, 1], c=y, cmap=plt.cm.coolwarm)
        plt.xlim(xx.min(), xx.max())
        plt.ylim(yy.min(), yy.max())
        plt.xticks(())
        plt.yticks(())
        plt.xlabel("gamma=" + str(gamma) + " C=" + str(C))
        plt.show()

The resulting plots for the 9 combinations are as follows:

The above is a summary of the SVM RBF parameters; I hope it helps.

(Reprints are welcome; please indicate the source. Feel free to get in touch: [email protected])

