Sklearn Learning: SVM Routine Summary 3 (Grid Search + Cross-Validation to Find the Best Hyper-Parameters)


Grid Search + Cross-Validation: Searching for the Optimal Hyper-Parameters

1548962898@qq.com

I have written three blog posts in three days, mainly to get a grip as quickly as possible on the important knowledge that sits beyond the individual algorithms in machine learning: knowledge that transfers to every algorithm, or, put differently, that forms the basis for learning and applying other algorithms. Three days is too short; some of the material I could only skim and my understanding is not yet thorough, but I at least have an impression of every point, and I will revisit it many times as I combine it with other algorithms. I have also collected some of the better blog posts online, so that when I meet similar material again I can reread them and deepen my understanding. I apologize that my previous few posts were rather thin and lacked ideas of my own; as my studies progress I will go back over them and add my own thoughts. After all, I am just getting started: counting from last November I have only about four months behind me, and I feel I have not really gotten started yet. It is not easy, but I like a challenge, and I believe many people are like me.

Enough digression; back to the topic. The previous posts covered feature selection, regularization, classification with imbalanced data and outliers, and the relevant plotting methods in matplotlib. Today we will talk about how to choose hyper-parameters during modeling: grid search + cross-validation. This post first gives an SVM example from sklearn, then explains how the parameters affect the results, and finally gives links to blog posts written by experts along with related references.

RBF SVM Parameters

A heat map is used to illustrate the effect of the parameters gamma and C. The code is as follows:

"' ================== RBF SVM Parameters ================== ' # # # # #交叉验证, select the hyper-parameter issue print (__doc__) import NumPy as NP Impo RT Matplotlib.pyplot as PLT from matplotlib.colors import Normalize to SKLEARN.SVM import SVC from Sklearn.preprocessin G Import Standardscaler from sklearn.datasets import load_iris from sklearn.model_selection Import stratifiedshufflesplit# layered shuffle split cross validation from sklearn.model_selection import GRIDSEARCHCV # Utility function to move the Midpoi

NT of a colormap to be around # the values of interest. Class Midpointnormalize (Normalize): Def __init__ (self, vmin=none, Vmax=none, Midpoint=none, clip=false): Self
        . Midpoint = Midpoint normalize.__init__ (self, vmin, Vmax, clip) def __call__ (self, Value, Clip=none): X, y = [Self.vmin, Self.midpoint, Self.vmax], [0, 0.5, 1] return Np.ma.masked_array (Np.interp (value, x, y)) # # # # # # # # # # # ########################################################################## # Load and prepare Data set # # DatasET for grid Search iris = Load_iris () X = Iris.data y = iris.target # Dataset for decision function Visualization:we on Ly keep the first and features in X and sub-sample the dataset to keep only 2 classes and # make it a binary classificat
Ion problem.
#保留原分类中的2, Class 3, becomes a two classification problem. x_2d = x[:,: 2] x_2d = x_2d[y > 0] #选出类为1, 2 X y_2d = y[y > 0] y_2d-= 1 # It's usually a good idea to scale the DAT
A for SVM training. # We are cheating a bit in this example with scaling all of the data, # instead of fitting the transformation on the Trainin

G Set and # Just applying it on the test set. Scaler = Standardscaler () #进行标准化, by removing the mean and unit variance scaling normalization function X = Scaler.fit_transform (x) #先fit, then transform #
Information unsupervised conversion refers to the conversion of statistical information using only features, including mean, standard deviation, boundary and so on, such as standardization and PCA reduction. #X = Scaler.transform (X) #失败原因是不知道如何转换, you need to know the conversion rules, training to learn how to convert #http://blog.csdn.net/kaido0/article/details/52974049 # http://stackoverflow.com/questions/23838056/ What-is-the-difference-between-transform-and-fit-transform-in-sklearn x_2d = Scaler.fit_transform (X_2d) ############################################################################## # Train classifiers # # for a initial Search, a logarithmic grid with basis # often helpful.

Using a basis of 2, a finer # tuning can is achieved but at a much higher cost. C_range = Np.logspace ( -2, 10) # Logspace (a,b,n) divides 10 of the A-to- -9 into the N-parts of the B-part gamma_range = Np.logspace (3) Param_grid = d ICT (Gamma=gamma_range, C=c_range) #dict ([container]) to create a dictionary of factory functions.
If a container class (container) #就用其中的条目填充字典 is provided, an empty dictionary is created. #dict (a= ' A ', b= ' B ', t= ' t ') #{' a ': ' A ', ' B ': ' B ', ' t ': ' t '} #关键字参数的等号左边必须为一个变量. And the right side must be a value, not a variable.
Otherwise it will error CV = Stratifiedshufflesplit (n_splits=10, test_size=0.2, random_state=42) #random_state用于随机抽样的伪随机数发生器状态.
#n_splits重新洗牌和分裂迭代次数.
#http://scikit-learn.org/stable/modules/generated/sklearn.model_selection. #StratifiedShuffleSplit. Html#sklearn.model_selection.
Stratifiedshufflesplit Grid = GRIDSEARCHCV (SVC (), Param_grid=param_grid, CV=CV) #基于交叉验证的网格搜索.
#cv: Determines the cross-validation split policy. #http://scikit-learn.org/stable/modules/generated/sklearn.model_selection. Gridsearchcv.html#sklearn.model_selection. GRIDSEARCHCV Grid.fit (X, y) print ("The best parameters is%s with a score of%0.2f"% (Grid.best_params_, Grid.bes t_score_)) #找到最佳超参数 # Now we need-fit a classifier for all parameters in the 2d version # (we use a smaller set of para meters here because it takes a and train) C_2d_range = [1e-2, 1, 1e2] Gamma_2d_range = [1e-1, 1, 1e1] classifiers =
        [] for C in c_2d_range:for gamma in GAMMA_2D_RANGE:CLF = SVC (c=c, Gamma=gamma) clf.fit (x_2d, y_2d) Classifiers.append ((C, Gamma, CLF)) ############################################################################# # # # # # # visualization # Draw visualization of parameter effects plt.figure (figsize= (8, 6)) xx, yy = Np.meshgrid (np.linspace ( -3, 3, $), Np.linspace ( -3, 3, +)) for (K, (C, Gamma, CLF)) in Enumerate (classifiers): # Evaluate decision Functio N in a grid Z = Clf.decision_function (np.c_[xx.raveL (), Yy.ravel ()]) Z = Z.reshape (xx.shape) # Visualize decision function for these parameters Plt.subplot (len (
              C_2d_range), Len (Gamma_2d_range), K + 1) plt.title ("gamma=10^%d, c=10^%d"% (np.log10 (gamma), Np.log10 (C)), Size= ' Medium ') # visualize parameter ' s effect on decision function #可视化参数对决策函数的影响 Plt.pcolormesh (xx, yy, -Z, CMAP=PLT.CM.RDBU) #对网格进行画图 plt.scatter (x_2d[:, 0], x_2d[:, 1], c=y_2d, Cmap=plt.cm.rdbu_r) plt.xticks (()) p
                                                     
Lt.yticks (()) Plt.axis (' tight ') #返回交叉验证的平均测试值, written in (Len (C_range), Len (gamma_range)) Form
                                                     Scores = grid.cv_results_[' Mean_test_score '].reshape (Len (C_range), Len (Gamma_range)) # Draw Heatmap of the validation accuracy as a function of Gamma and C # # The score a Re encoded as colors with the hot colormap which varies from dark # red to bright yellow. As the most interesting scores is all located in the #0.92 to 0.97 range we use a custom normalizer to set the mid-point to 0.92 so # as-make it easier to visualize the smal L Variations of score values in the # Interesting range and not brutally collapsing all the low score values to # the SA
Me color. #绘制热力图imshow Plt.figure (figsize= (8, 6)) #创建一个宽8英寸, height 6-inch figure Plt.subplots_adjust (left=.2, right=0.95, bottom=0.15, top= 0.95) Plt.imshow (scores, interpolation= ' nearest ', Cmap=plt.cm.hot, Norm=midpointnormalize (vmin=0.2, midpoint=0). ) Plt.xlabel (' Gamma ') Plt.ylabel (' C ') Plt.colorbar () plt.xticks (Np.arange (len (gamma_range)), Gamma_range, rotation=45) plt.yticks (Np.arange (len (c_range)), C_range) plt.title (' Validation accuracy ') plt.show ()
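The comments in the middle of the example touch on the difference between fit_transform() and transform(). As a minimal sketch of that difference (the array values below are made up for illustration): fit_transform() learns the scaling statistics from the data and applies them in one step, while transform() only applies statistics that a previous fit() has already learned, which is why calling transform() before fitting fails.

import numpy as np
from sklearn.preprocessing import StandardScaler

X_train = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])  # made-up data
X_test = np.array([[2.0, 25.0]])

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)  # learn mean/std, then scale
X_test_scaled = scaler.transform(X_test)        # reuse the learned mean/std

print(scaler.mean_)   # [ 2. 20.]
print(X_test_scaled)  # scaled with the training statistics, not its own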
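A side note on the splitter used above: StratifiedShuffleSplit shuffles and splits the data n_splits times while keeping the class proportions the same in every split. A small sketch with made-up labels:

import numpy as np
from sklearn.model_selection import StratifiedShuffleSplit

y = np.array([0] * 8 + [1] * 4)       # made-up labels, 2:1 class ratio
X = np.arange(len(y)).reshape(-1, 1)  # dummy feature

sss = StratifiedShuffleSplit(n_splits=3, test_size=0.25, random_state=42)
for train_idx, test_idx in sss.split(X, y):
    # every test set keeps the 2:1 ratio: 2 samples of class 0, 1 of class 1
    print(np.bincount(y[test_idx]))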
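Finally, the example's own comment admits it is "cheating a bit" by scaling all the data before cross-validating. One common way to avoid that leakage (a sketch of my own, not part of the original example) is to put the scaler and the SVC into a Pipeline, so that during cross-validation the scaler is re-fitted on each training fold only:

import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, StratifiedShuffleSplit
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

iris = load_iris()
X, y = iris.data, iris.target

pipe = Pipeline([('scaler', StandardScaler()), ('svc', SVC())])

# parameters of pipeline steps are addressed as <step>__<parameter>
param_grid = {'svc__C': np.logspace(-2, 10, 13),
              'svc__gamma': np.logspace(-9, 3, 13)}

cv = StratifiedShuffleSplit(n_splits=10, test_size=0.2, random_state=42)
grid = GridSearchCV(pipe, param_grid=param_grid, cv=cv)
grid.fit(X, y)
print(grid.best_params_, grid.best_score_)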


