Python Machine Learning Theory and Practice (6) Support Vector Machine and python Learning Theory

Last Update:2018-01-24 Source: Internet

Author: User

Tags svm

Developer on Alibaba Coud: Build your first app with APIs, SDKs, and tutorials on the Alibaba Cloud. Read more ＞

Python Machine Learning Theory and Practice (6) Support Vector Machine and python Learning Theory

In the previous section, the theory of SVM is basically pushed down, and the goal of finding the maximum interval is finally converted to the problem of solving the alpha of the Child variable of the Laplace multiplication. After finding the alpha, the SVM weight W can be obtained, with the weight, the maximum distance is obtained. However, in the previous section, we assume that the training set is linearly segmented, and the obtained alpha value is in [0, infinite]. But what if the data is not linearly segmented? In this case, we need to allow some samples to bypass the classifier, so that the optimized target function will remain unchanged. Just introduce the relaxation variable, which indicates the cost of the wrong classification sample point, if the classification is correct, it is equal to 0. If the classification is incorrect, Tn indicates the real label-1 or 1 of the sample. In the previous section, we set the distance from the support vector to the classifier to 1. Therefore, the distance between the two types of support vectors must be greater than 1. When the classification is incorrect, it must be greater than 1, as shown in Figure 5) (here, the formula and Icon serial number are all connected to the previous section ).

(Figure 5)

In this case, the cost of the wrong category exists. We add the cost of this wrong category to the objective function in the previous section (formula 4) to obtain the form (formula 8:

(Formula 8)

Repeat the step of the method described in the previous section, and obtain (formula 9 ):

(Formula 9)

With an Un multiplier added, of course, our job is to continue to solve this objective function, continue to repeat the steps in the previous section, and obtain the result (formula 10 ):

(Formula 10)

And because alpha is greater than 0 and Un is greater than 0, so 0 <alpha <C. To make the explanation clearer, let's (formula 9) the KKT condition of is also released (the third type of Optimization Problem in the previous section). Note that Un is greater than or equal to 0:

Until now, the form of the optimization function has not changed. It only adds the value of an incorrect classification, but with one more condition, 0 <alpha <C, C is a constant, its function is to control the maximum spacing when error classification is allowed. If it is too large, it will lead to over-fitting. If it is too small, it will lead to underfitting. The next step seems to be something everyone should know, adding a restriction for a C constant, and then continuing to use the SMO algorithm to optimize and solve the secondary plan, however, I want to continue to talk about the kernel function once. If the sample is linear and inseparable, after the kernel function is introduced, the sample can be linearly divided by ing to a high-dimensional space, for example (figure 6) linear samples:

(Figure 6)

In (figure 6), the existing samples are obviously linear but we add some different operations between the existing samples X, such as (figure 6) is it better to make f a new sample (or a new feature), as shown on the right? Now we have projected X to the high dimension, but we do not know f. Now the kernel function should be on the stage. Taking the Gaussian Kernel function as an example, in (Figure 7) use the kernel function to calculate f using several sample points as the benchmark, as shown in Figure 7:

(Figure 7)

In this way, f is available, and the kernel function is equivalent to a measurement of the X and baseline of the sample, and the weight is reduced to form a new feature f dependent on x, put f in the SVM mentioned above, continue to solve alpha, and then obtain the weight. The principle is simple. In order to seem academic, add the core function to the target function, as shown in formula 11:

(Formula 11)

K (Xn, Xm) is a kernel function. Compared with the above objective function, it does not change much. Just use SMO to optimize and solve the problem. The Code is as follows:

def smoPK(dataMatIn, classLabels, C, toler, maxIter): #full Platt SMO  oS = optStruct(mat(dataMatIn),mat(classLabels).transpose(),C,toler)  iter = 0  entireSet = True; alphaPairsChanged = 0  while (iter < maxIter) and ((alphaPairsChanged > 0) or (entireSet)):   alphaPairsChanged = 0   if entireSet: #go over all    for i in range(oS.m):       alphaPairsChanged += innerL(i,oS)     print "fullSet, iter: %d i:%d, pairs changed %d" % (iter,i,alphaPairsChanged)    iter += 1   else:#go over non-bound (railed) alphas    nonBoundIs = nonzero((oS.alphas.A > 0) * (oS.alphas.A < C))[0]    for i in nonBoundIs:     alphaPairsChanged += innerL(i,oS)     print "non-bound, iter: %d i:%d, pairs changed %d" % (iter,i,alphaPairsChanged)    iter += 1   if entireSet: entireSet = False #toggle entire set loop   elif (alphaPairsChanged == 0): entireSet = True   print "iteration number: %d" % iter  return oS.b,oS.alphas

The following is a small example of handwriting recognition.

(1) Collect data: Provide text files

(2) prepare data: Construct Vectors based on binary images

(3) analysis data: visual inspection of image vectors

(4) training algorithm: Two Different kernel functions are used, and different settings are used for radial basis functions to run the SMO algorithm.

(5) Test Algorithm: compile a function to test different kernel functions and calculate the error rate.

(6) using algorithms: the complete application of image recognition still requires some image processing. This demo is omitted.

The complete code is as follows:

from numpy import * from time import sleep  def loadDataSet(fileName):  dataMat = []; labelMat = []  fr = open(fileName)  for line in fr.readlines():   lineArr = line.strip().split('\t')   dataMat.append([float(lineArr[0]), float(lineArr[1])])   labelMat.append(float(lineArr[2]))  return dataMat,labelMat  def selectJrand(i,m):  j=i #we want to select any J not equal to i  while (j==i):   j = int(random.uniform(0,m))  return j  def clipAlpha(aj,H,L):  if aj > H:   aj = H  if L > aj:   aj = L  return aj  def smoSimple(dataMatIn, classLabels, C, toler, maxIter):  dataMatrix = mat(dataMatIn); labelMat = mat(classLabels).transpose()  b = 0; m,n = shape(dataMatrix)  alphas = mat(zeros((m,1)))  iter = 0  while (iter < maxIter):   alphaPairsChanged = 0   for i in range(m):    fXi = float(multiply(alphas,labelMat).T*(dataMatrix*dataMatrix[i,:].T)) + b    Ei = fXi - float(labelMat[i])#if checks if an example violates KKT conditions    if ((labelMat[i]*Ei < -toler) and (alphas[i] < C)) or ((labelMat[i]*Ei > toler) and (alphas[i] > 0)):     j = selectJrand(i,m)     fXj = float(multiply(alphas,labelMat).T*(dataMatrix*dataMatrix[j,:].T)) + b     Ej = fXj - float(labelMat[j])     alphaIold = alphas[i].copy(); alphaJold = alphas[j].copy();     if (labelMat[i] != labelMat[j]):      L = max(0, alphas[j] - alphas[i])      H = min(C, C + alphas[j] - alphas[i])     else:      L = max(0, alphas[j] + alphas[i] - C)      H = min(C, alphas[j] + alphas[i])     if L==H: print "L==H"; continue     eta = 2.0 * dataMatrix[i,:]*dataMatrix[j,:].T - dataMatrix[i,:]*dataMatrix[i,:].T - dataMatrix[j,:]*dataMatrix[j,:].T     if eta >= 0: print "eta>=0"; continue     alphas[j] -= labelMat[j]*(Ei - Ej)/eta     alphas[j] = clipAlpha(alphas[j],H,L)     if (abs(alphas[j] - alphaJold) < 0.00001): print "j not moving enough"; continue     alphas[i] += labelMat[j]*labelMat[i]*(alphaJold - alphas[j])#update i by the same amount as j                   #the update is in the oppostie direction     b1 = b - Ei- labelMat[i]*(alphas[i]-alphaIold)*dataMatrix[i,:]*dataMatrix[i,:].T - labelMat[j]*(alphas[j]-alphaJold)*dataMatrix[i,:]*dataMatrix[j,:].T     b2 = b - Ej- labelMat[i]*(alphas[i]-alphaIold)*dataMatrix[i,:]*dataMatrix[j,:].T - labelMat[j]*(alphas[j]-alphaJold)*dataMatrix[j,:]*dataMatrix[j,:].T     if (0 < alphas[i]) and (C > alphas[i]): b = b1     elif (0 < alphas[j]) and (C > alphas[j]): b = b2     else: b = (b1 + b2)/2.0     alphaPairsChanged += 1     print "iter: %d i:%d, pairs changed %d" % (iter,i,alphaPairsChanged)   if (alphaPairsChanged == 0): iter += 1   else: iter = 0   print "iteration number: %d" % iter  return b,alphas  def kernelTrans(X, A, kTup): #calc the kernel or transform data to a higher dimensional space  m,n = shape(X)  K = mat(zeros((m,1)))  if kTup[0]=='lin': K = X * A.T #linear kernel  elif kTup[0]=='rbf':   for j in range(m):    deltaRow = X[j,:] - A    K[j] = deltaRow*deltaRow.T   K = exp(K/(-1*kTup[1]**2)) #divide in NumPy is element-wise not matrix like Matlab  else: raise NameError('Houston We Have a Problem -- \  That Kernel is not recognized')  return K  class optStruct:  def __init__(self,dataMatIn, classLabels, C, toler, kTup): # Initialize the structure with the parameters   self.X = dataMatIn   self.labelMat = classLabels   self.C = C   self.tol = toler   self.m = shape(dataMatIn)[0]   self.alphas = mat(zeros((self.m,1)))   self.b = 0   self.eCache = mat(zeros((self.m,2))) #first column is valid flag   self.K = mat(zeros((self.m,self.m)))   for i in range(self.m):    self.K[:,i] = kernelTrans(self.X, self.X[i,:], kTup)    def calcEk(oS, k):  fXk = float(multiply(oS.alphas,oS.labelMat).T*oS.K[:,k] + oS.b)  Ek = fXk - float(oS.labelMat[k])  return Ek    def selectJ(i, oS, Ei):   #this is the second choice -heurstic, and calcs Ej  maxK = -1; maxDeltaE = 0; Ej = 0  oS.eCache[i] = [1,Ei] #set valid #choose the alpha that gives the maximum delta E  validEcacheList = nonzero(oS.eCache[:,0].A)[0]  if (len(validEcacheList)) > 1:   for k in validEcacheList: #loop through valid Ecache values and find the one that maximizes delta E    if k == i: continue #don't calc for i, waste of time    Ek = calcEk(oS, k)    deltaE = abs(Ei - Ek)    if (deltaE > maxDeltaE):     maxK = k; maxDeltaE = deltaE; Ej = Ek   return maxK, Ej  else: #in this case (first time around) we don't have any valid eCache values   j = selectJrand(i, oS.m)   Ej = calcEk(oS, j)  return j, Ej  def updateEk(oS, k):#after any alpha has changed update the new value in the cache  Ek = calcEk(oS, k)  oS.eCache[k] = [1,Ek]    def innerL(i, oS):  Ei = calcEk(oS, i)  if ((oS.labelMat[i]*Ei < -oS.tol) and (oS.alphas[i] < oS.C)) or ((oS.labelMat[i]*Ei > oS.tol) and (oS.alphas[i] > 0)):   j,Ej = selectJ(i, oS, Ei) #this has been changed from selectJrand   alphaIold = oS.alphas[i].copy(); alphaJold = oS.alphas[j].copy();   if (oS.labelMat[i] != oS.labelMat[j]):    L = max(0, oS.alphas[j] - oS.alphas[i])    H = min(oS.C, oS.C + oS.alphas[j] - oS.alphas[i])   else:    L = max(0, oS.alphas[j] + oS.alphas[i] - oS.C)    H = min(oS.C, oS.alphas[j] + oS.alphas[i])   if L==H: print "L==H"; return 0   eta = 2.0 * oS.K[i,j] - oS.K[i,i] - oS.K[j,j] #changed for kernel   if eta >= 0: print "eta>=0"; return 0   oS.alphas[j] -= oS.labelMat[j]*(Ei - Ej)/eta   oS.alphas[j] = clipAlpha(oS.alphas[j],H,L)   updateEk(oS, j) #added this for the Ecache   if (abs(oS.alphas[j] - alphaJold) < 0.00001): print "j not moving enough"; return 0   oS.alphas[i] += oS.labelMat[j]*oS.labelMat[i]*(alphaJold - oS.alphas[j])#update i by the same amount as j   updateEk(oS, i) #added this for the Ecache     #the update is in the oppostie direction   b1 = oS.b - Ei- oS.labelMat[i]*(oS.alphas[i]-alphaIold)*oS.K[i,i] - oS.labelMat[j]*(oS.alphas[j]-alphaJold)*oS.K[i,j]   b2 = oS.b - Ej- oS.labelMat[i]*(oS.alphas[i]-alphaIold)*oS.K[i,j]- oS.labelMat[j]*(oS.alphas[j]-alphaJold)*oS.K[j,j]   if (0 < oS.alphas[i]) and (oS.C > oS.alphas[i]): oS.b = b1   elif (0 < oS.alphas[j]) and (oS.C > oS.alphas[j]): oS.b = b2   else: oS.b = (b1 + b2)/2.0   return 1  else: return 0  def smoP(dataMatIn, classLabels, C, toler, maxIter,kTup=('lin', 0)): #full Platt SMO  oS = optStruct(mat(dataMatIn),mat(classLabels).transpose(),C,toler, kTup)  iter = 0  entireSet = True; alphaPairsChanged = 0  while (iter < maxIter) and ((alphaPairsChanged > 0) or (entireSet)):   alphaPairsChanged = 0   if entireSet: #go over all    for i in range(oS.m):       alphaPairsChanged += innerL(i,oS)     print "fullSet, iter: %d i:%d, pairs changed %d" % (iter,i,alphaPairsChanged)    iter += 1   else:#go over non-bound (railed) alphas    nonBoundIs = nonzero((oS.alphas.A > 0) * (oS.alphas.A < C))[0]    for i in nonBoundIs:     alphaPairsChanged += innerL(i,oS)     print "non-bound, iter: %d i:%d, pairs changed %d" % (iter,i,alphaPairsChanged)    iter += 1   if entireSet: entireSet = False #toggle entire set loop   elif (alphaPairsChanged == 0): entireSet = True   print "iteration number: %d" % iter  return oS.b,oS.alphas  def calcWs(alphas,dataArr,classLabels):  X = mat(dataArr); labelMat = mat(classLabels).transpose()  m,n = shape(X)  w = zeros((n,1))  for i in range(m):   w += multiply(alphas[i]*labelMat[i],X[i,:].T)  return w  def testRbf(k1=1.3):  dataArr,labelArr = loadDataSet('testSetRBF.txt')  b,alphas = smoP(dataArr, labelArr, 200, 0.0001, 10000, ('rbf', k1)) #C=200 important  datMat=mat(dataArr); labelMat = mat(labelArr).transpose()  svInd=nonzero(alphas.A>0)[0]  sVs=datMat[svInd] #get matrix of only support vectors  labelSV = labelMat[svInd];  print "there are %d Support Vectors" % shape(sVs)[0]  m,n = shape(datMat)  errorCount = 0  for i in range(m):   kernelEval = kernelTrans(sVs,datMat[i,:],('rbf', k1))   predict=kernelEval.T * multiply(labelSV,alphas[svInd]) + b   if sign(predict)!=sign(labelArr[i]): errorCount += 1  print "the training error rate is: %f" % (float(errorCount)/m)  dataArr,labelArr = loadDataSet('testSetRBF2.txt')  errorCount = 0  datMat=mat(dataArr); labelMat = mat(labelArr).transpose()  m,n = shape(datMat)  for i in range(m):   kernelEval = kernelTrans(sVs,datMat[i,:],('rbf', k1))   predict=kernelEval.T * multiply(labelSV,alphas[svInd]) + b   if sign(predict)!=sign(labelArr[i]): errorCount += 1   print "the test error rate is: %f" % (float(errorCount)/m)    def img2vector(filename):  returnVect = zeros((1,1024))  fr = open(filename)  for i in range(32):   lineStr = fr.readline()   for j in range(32):    returnVect[0,32*i+j] = int(lineStr[j])  return returnVect  def loadImages(dirName):  from os import listdir  hwLabels = []  trainingFileList = listdir(dirName)   #load the training set  m = len(trainingFileList)  trainingMat = zeros((m,1024))  for i in range(m):   fileNameStr = trainingFileList[i]   fileStr = fileNameStr.split('.')[0]  #take off .txt   classNumStr = int(fileStr.split('_')[0])   if classNumStr == 9: hwLabels.append(-1)   else: hwLabels.append(1)   trainingMat[i,:] = img2vector('%s/%s' % (dirName, fileNameStr))  return trainingMat, hwLabels   def testDigits(kTup=('rbf', 10)):  dataArr,labelArr = loadImages('trainingDigits')  b,alphas = smoP(dataArr, labelArr, 200, 0.0001, 10000, kTup)  datMat=mat(dataArr); labelMat = mat(labelArr).transpose()  svInd=nonzero(alphas.A>0)[0]  sVs=datMat[svInd]  labelSV = labelMat[svInd];  print "there are %d Support Vectors" % shape(sVs)[0]  m,n = shape(datMat)  errorCount = 0  for i in range(m):   kernelEval = kernelTrans(sVs,datMat[i,:],kTup)   predict=kernelEval.T * multiply(labelSV,alphas[svInd]) + b   if sign(predict)!=sign(labelArr[i]): errorCount += 1  print "the training error rate is: %f" % (float(errorCount)/m)  dataArr,labelArr = loadImages('testDigits')  errorCount = 0  datMat=mat(dataArr); labelMat = mat(labelArr).transpose()  m,n = shape(datMat)  for i in range(m):   kernelEval = kernelTrans(sVs,datMat[i,:],kTup)   predict=kernelEval.T * multiply(labelSV,alphas[svInd]) + b   if sign(predict)!=sign(labelArr[i]): errorCount += 1   print "the test error rate is: %f" % (float(errorCount)/m)   '''''#######******************************** Non-Kernel VErsions below '''#######********************************  class optStructK:  def __init__(self,dataMatIn, classLabels, C, toler): # Initialize the structure with the parameters   self.X = dataMatIn   self.labelMat = classLabels   self.C = C   self.tol = toler   self.m = shape(dataMatIn)[0]   self.alphas = mat(zeros((self.m,1)))   self.b = 0   self.eCache = mat(zeros((self.m,2))) #first column is valid flag    def calcEkK(oS, k):  fXk = float(multiply(oS.alphas,oS.labelMat).T*(oS.X*oS.X[k,:].T)) + oS.b  Ek = fXk - float(oS.labelMat[k])  return Ek    def selectJK(i, oS, Ei):   #this is the second choice -heurstic, and calcs Ej  maxK = -1; maxDeltaE = 0; Ej = 0  oS.eCache[i] = [1,Ei] #set valid #choose the alpha that gives the maximum delta E  validEcacheList = nonzero(oS.eCache[:,0].A)[0]  if (len(validEcacheList)) > 1:   for k in validEcacheList: #loop through valid Ecache values and find the one that maximizes delta E    if k == i: continue #don't calc for i, waste of time    Ek = calcEk(oS, k)    deltaE = abs(Ei - Ek)    if (deltaE > maxDeltaE):     maxK = k; maxDeltaE = deltaE; Ej = Ek   return maxK, Ej  else: #in this case (first time around) we don't have any valid eCache values   j = selectJrand(i, oS.m)   Ej = calcEk(oS, j)  return j, Ej  def updateEkK(oS, k):#after any alpha has changed update the new value in the cache  Ek = calcEk(oS, k)  oS.eCache[k] = [1,Ek]    def innerLK(i, oS):  Ei = calcEk(oS, i)  if ((oS.labelMat[i]*Ei < -oS.tol) and (oS.alphas[i] < oS.C)) or ((oS.labelMat[i]*Ei > oS.tol) and (oS.alphas[i] > 0)):   j,Ej = selectJ(i, oS, Ei) #this has been changed from selectJrand   alphaIold = oS.alphas[i].copy(); alphaJold = oS.alphas[j].copy();   if (oS.labelMat[i] != oS.labelMat[j]):    L = max(0, oS.alphas[j] - oS.alphas[i])    H = min(oS.C, oS.C + oS.alphas[j] - oS.alphas[i])   else:    L = max(0, oS.alphas[j] + oS.alphas[i] - oS.C)    H = min(oS.C, oS.alphas[j] + oS.alphas[i])   if L==H: print "L==H"; return 0   eta = 2.0 * oS.X[i,:]*oS.X[j,:].T - oS.X[i,:]*oS.X[i,:].T - oS.X[j,:]*oS.X[j,:].T   if eta >= 0: print "eta>=0"; return 0   oS.alphas[j] -= oS.labelMat[j]*(Ei - Ej)/eta   oS.alphas[j] = clipAlpha(oS.alphas[j],H,L)   updateEk(oS, j) #added this for the Ecache   if (abs(oS.alphas[j] - alphaJold) < 0.00001): print "j not moving enough"; return 0   oS.alphas[i] += oS.labelMat[j]*oS.labelMat[i]*(alphaJold - oS.alphas[j])#update i by the same amount as j   updateEk(oS, i) #added this for the Ecache     #the update is in the oppostie direction   b1 = oS.b - Ei- oS.labelMat[i]*(oS.alphas[i]-alphaIold)*oS.X[i,:]*oS.X[i,:].T - oS.labelMat[j]*(oS.alphas[j]-alphaJold)*oS.X[i,:]*oS.X[j,:].T   b2 = oS.b - Ej- oS.labelMat[i]*(oS.alphas[i]-alphaIold)*oS.X[i,:]*oS.X[j,:].T - oS.labelMat[j]*(oS.alphas[j]-alphaJold)*oS.X[j,:]*oS.X[j,:].T   if (0 < oS.alphas[i]) and (oS.C > oS.alphas[i]): oS.b = b1   elif (0 < oS.alphas[j]) and (oS.C > oS.alphas[j]): oS.b = b2   else: oS.b = (b1 + b2)/2.0   return 1  else: return 0  def smoPK(dataMatIn, classLabels, C, toler, maxIter): #full Platt SMO  oS = optStruct(mat(dataMatIn),mat(classLabels).transpose(),C,toler)  iter = 0  entireSet = True; alphaPairsChanged = 0  while (iter < maxIter) and ((alphaPairsChanged > 0) or (entireSet)):   alphaPairsChanged = 0   if entireSet: #go over all    for i in range(oS.m):       alphaPairsChanged += innerL(i,oS)     print "fullSet, iter: %d i:%d, pairs changed %d" % (iter,i,alphaPairsChanged)    iter += 1   else:#go over non-bound (railed) alphas    nonBoundIs = nonzero((oS.alphas.A > 0) * (oS.alphas.A < C))[0]    for i in nonBoundIs:     alphaPairsChanged += innerL(i,oS)     print "non-bound, iter: %d i:%d, pairs changed %d" % (iter,i,alphaPairsChanged)    iter += 1   if entireSet: entireSet = False #toggle entire set loop   elif (alphaPairsChanged == 0): entireSet = True   print "iteration number: %d" % iter  return oS.b,oS.alphas

The running result is shown in figure 8:

(Figure 8)

If you are interested in the above code, you can read it. If you use it, we recommend using libsvm.

References:

[1] machine learning in action. PeterHarrington

[2] pattern recognition and machinelearning. Christopher M. Bishop

[3] machine learning. Andrew Ng

The above is all the content of this article. I hope it will be helpful for your learning and support for helping customers.

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

Get Started for Free

Sales Support

1 on 1 presale consultation

Chat Contact Sales
After-Sales Support

24/7 Technical Support 6 Free Tickets per Quarter Faster Response

Open a Ticket
Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.

Learn More