Today I read through Python code that trains an SVM with the SMO algorithm and uses an RBF kernel for handwritten digit recognition. Below I first outline the overall process and ideas, then describe each part in detail.
(1) Load the training data set trainingMat and its labels labelMat;
(2) Run SMO to optimize the parameters alphas and b; this is the training step that yields the optimal parameters;
(3) Plug alphas and b into the RBF (Gaussian) kernel function to compute the outputs on the training set and the training error rate;
(4) Load the test data set testMat and its labels labelMat1;
(5) Plug the alphas and b from step (2) into the RBF (Gaussian) kernel function to compute the outputs on the test set, and from them the classification error rate.
from numpy import *  # provides mat, shape, nonzero, multiply, sign

# loadImages, smoP and kernelTrans are defined elsewhere in the same module.
def testDigits(kTup=('rbf', 10)):
    dataArr, labelArr = loadImages('d://softwaretool/python/python_exercisecode/chap6_svm//trainingdigits')
    b, alphas = smoP(dataArr, labelArr, 200, 0.0001, 1000, kTup)
    dataMat = mat(dataArr)
    labelMat = mat(labelArr).transpose()
    # Get the indices of the support vectors (samples with alpha > 0)
    svInd = nonzero(alphas.A > 0)[0]
    sVs = dataMat[svInd]
    labelSV = labelMat[svInd]
    print("There are", shape(sVs)[0], "support vectors")
    m, n = shape(dataMat)
    errorCount = 0.0
    for i in range(m):
        kernelEval = kernelTrans(sVs, dataMat[i, :], kTup)
        # Output formula: sum over support vectors of alpha * label * K, plus b
        predict = kernelEval.T * multiply(labelSV, alphas[svInd]) + b
        if sign(predict) != sign(labelMat[i]):
            errorCount += 1.0
    print("The training error rate is:", errorCount / m)
    # Repeat the evaluation on the test set, reusing the trained alphas and b
    dataArr, labelArr = loadImages('d://softwaretool/python/python_exercisecode/chap6_svm//testdigits')
    dataMat = mat(dataArr)
    labelMat = mat(labelArr).transpose()
    m, n = shape(dataMat)
    errorCount = 0.0
    for i in range(m):
        kernelEval = kernelTrans(sVs, dataMat[i, :], kTup)
        predict = kernelEval.T * multiply(labelSV, alphas[svInd]) + b
        if sign(predict) != sign(labelMat[i]):
            errorCount += 1.0
    print("The test error rate is:", errorCount / float(m))
The above is the overall framework and the main program.
The following sections describe each part in turn:
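The prediction step in both loops evaluates f(x) = sum_i alpha_i * y_i * K(x_i, x) + b over the support vectors only. The kernelTrans function is not shown in this part, so the following is only a minimal sketch of that computation using plain NumPy arrays. The names rbf_kernel_column and predict_one are hypothetical, and the kernel scaling exp(-||x_i - x||^2 / sigma^2) is an assumption that may differ from the author's kernelTrans.

```python
import numpy as np

def rbf_kernel_column(svs, x, sigma):
    """Hypothetical helper: RBF kernel between each support vector and x.

    svs has shape (n_sv, n_features), x has shape (n_features,).
    Assumed scaling: exp(-||x_i - x||^2 / sigma^2).
    """
    diff = svs - x
    sq_dist = np.sum(diff * diff, axis=1)
    return np.exp(-sq_dist / sigma ** 2)

def predict_one(svs, label_sv, alpha_sv, b, x, sigma):
    """Hypothetical helper: decision value sum_i alpha_i * y_i * K(x_i, x) + b."""
    k = rbf_kernel_column(svs, x, sigma)
    return float(np.dot(k, label_sv * alpha_sv) + b)

# Toy data: two support vectors with opposite labels (made up for illustration)
svs = np.array([[0.0, 0.0], [1.0, 0.0]])
label_sv = np.array([1.0, -1.0])
alpha_sv = np.array([1.0, 1.0])
score = predict_one(svs, label_sv, alpha_sv, 0.0, np.array([0.0, 0.0]), sigma=1.0)
print(np.sign(score))
```

In this toy case the query point coincides with the positive support vector, so its kernel value dominates and the predicted sign is positive.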
(1) Getting the training data set and training labels:
The training data is stored in the directory trainingdigits, which contains many .txt files; each .txt file holds one 32*32 image, and each image represents one of the digits 0-9.
For example, an image may show the digit 3. Each image is finally converted into a 1*1024 row vector (32*32=1024); if the number of training samples is m, then dataMat is an m*1024 matrix and labelMat, after the transpose in the main program, is an m*1 column vector.
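Because the main program compares sign(predict) with the sign of the label, the labels passed to SMO need to be +1 or -1 rather than the digits 0-9 themselves. loadImages is not shown here, so the sketch below only illustrates one possible way to derive such labels from file names like '3_177.txt', together with the resulting matrix shapes. The helper name label_from_filename, the reading of the file-name prefix as the digit, and the choice of which digit forms the negative class are all assumptions for illustration.

```python
import numpy as np

def label_from_filename(name, negative_digit=9):
    """Hypothetical helper: map '3_177.txt' to a binary label.

    Assumes the part before the underscore is the digit shown in the image.
    negative_digit is an illustrative choice, not taken from the article.
    """
    digit = int(name.split('.')[0].split('_')[0])
    return -1 if digit == negative_digit else 1

# Made-up file names standing in for the contents of the training directory
file_names = ['3_177.txt', '9_12.txt', '1_5.txt']
label_arr = [label_from_filename(n) for n in file_names]

m = len(file_names)
data_mat = np.zeros((m, 1024))                 # one 1*1024 row per image
label_mat = np.array(label_arr).reshape(m, 1)  # m*1 column vector
print(data_mat.shape, label_mat.shape)
```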
Next, how to convert an image (for example 32*32) into a row vector (1*1024) in Python. The function takes the file name of the image as input, for example '3_177.txt'.
Pseudocode: initialize the row vector returnVec to zeros((1,1024));
For each line of the file:
Read the contents of the line (a 1*32 row vector);
Copy the contents of the line into the corresponding positions of returnVec;
Return returnVec;
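The pseudocode above can be sketched in Python as follows. This is a minimal version that assumes each .txt file contains 32 lines of 32 characters, each character being '0' or '1'. The function name img2vector and the size parameter are introduced here for illustration and are not confirmed by the text.

```python
import numpy as np

def img2vector(filename, size=32):
    """Read a size*size text image into a 1*(size*size) row vector.

    Assumes every line holds at least `size` digit characters.
    """
    returnVec = np.zeros((1, size * size))
    with open(filename) as fh:
        for row in range(size):
            line = fh.readline()
            for col in range(size):
                # Row `row` of the image fills positions size*row .. size*row+size-1
                returnVec[0, size * row + col] = int(line[col])
    return returnVec
```

Calling img2vector('3_177.txt') on such a file would return an array of shape (1, 1024), which can then be stacked row by row to build dataMat.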
Python implementation of SVM handwritten digit recognition based on SMO and an RBF kernel