Principle and practice of SVM


SVM has developed and improved rapidly, and it shows many unique advantages in solving small-sample, nonlinear, and high-dimensional pattern recognition problems; it can also be extended to other machine learning problems such as function fitting. It has already been applied successfully in many fields (bioinformatics, text and handwriting recognition, etc.). It is also effective for solving nonlinear inversion problems in geophysics, for example in predicting groundwater inflow.

One of the highlights of SVM is that the underlying optimization problem is handled through duality theory, chiefly max-min duality and Lagrange duality.
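For reference, in the standard soft-margin formulation the primal problem is

\min_{w,b,\xi} \; \frac{1}{2}\|w\|^2 + C\sum_{i=1}^{m}\xi_i
\quad \text{s.t.} \quad y_i(w^\top x_i + b) \ge 1 - \xi_i, \; \xi_i \ge 0,

and its Lagrange dual is

\max_{\alpha} \; \sum_{i=1}^{m}\alpha_i - \frac{1}{2}\sum_{i,j}\alpha_i\alpha_j y_i y_j \, x_i^\top x_j
\quad \text{s.t.} \quad 0 \le \alpha_i \le C, \; \sum_{i=1}^{m}\alpha_i y_i = 0.

Note that the training data enter the dual only through the inner products x_i^\top x_j, which is exactly what the kernel function discussed next exploits.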

The key to SVM is the kernel function. Low-dimensional vector sets are often hard to separate, and the usual remedy is to map them into a high-dimensional space; the drawback of that approach is the increase in computational cost. The kernel function solves this problem neatly: with a suitable kernel, the classification function in the high-dimensional space can be obtained without ever carrying out the mapping explicitly.
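As a small illustration of this trick (a sketch added here, not part of the original derivation): for two-dimensional vectors the quadratic kernel (u'*v)^2 equals an ordinary inner product after an explicit three-dimensional feature map, so the high-dimensional inner product can be evaluated without constructing the mapped vectors at all.

x <- c(1, 2); z <- c(3, 4)
phi <- function(v) c(v[1]^2, sqrt(2) * v[1] * v[2], v[2]^2)  # explicit feature map into 3-D
(sum(x * z))^2        # kernel evaluated in the original 2-D space: 121
sum(phi(x) * phi(z))  # inner product in the mapped 3-D space: also 121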

Once the kernel function is fixed, two parameters still have to be set, because the known data contain a certain amount of error and generalization has to be considered: the slack (relaxation) coefficient and the penalty coefficient C (note: even after the slack variables are introduced, w is still unique). With the kernel confirmed, these two coefficients are usually determined by a large number of contrast experiments; once they are chosen, the work is essentially complete and the model can be applied to the relevant discipline or business problem, with a certain ability to generalize.
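A minimal sketch of such a contrast (grid-search) experiment, assuming the e1071 package discussed below and its tune.svm helper; the grid values and data set are only illustrative:

library(e1071)
data(iris)
# 5 x 5 grid over the kernel parameter gamma and the penalty C, scored by cross-validation
tuned <- tune.svm(Species ~ ., data = iris, gamma = 10^(-3:1), cost = 10^(0:4))
summary(tuned)          # reports the best (gamma, cost) pair and its cross-validation error
tuned$best.parameters   # the chosen coefficients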

Principle

SVM Advantages and Disadvantages:

SVM Advantages:
(1) Inner-product kernel function: nonlinear mapping is the theoretical basis of the SVM method, and SVM uses an inner-product kernel function in place of an explicit nonlinear mapping to the high-dimensional space.
(2) Maximizing the classification margin: the optimal hyperplane in feature space is the goal of SVM, and the idea of maximizing the classification margin is the core of the method.
(3) Support vectors: the support vectors are the result of SVM training, and they determine the classification decision. The final decision function of SVM is determined by only a few support vectors (written out after this list), and the computational complexity depends on the number of support vectors rather than on the dimension of the sample space, which avoids the "curse of dimensionality" to some extent.
(4) A novel small-sample learning method with a solid theoretical basis: SVM essentially does not involve probability measures or the law of large numbers, so it differs from existing statistical methods. In essence it avoids the traditional path from induction to deduction and realizes an efficient "transductive inference" from training samples to prediction samples, which greatly simplifies the usual classification and regression problems.
(5) Robustness: a few support vectors determine the final result. This not only helps us pick out the key samples and "eliminate" a large number of redundant ones, it also means the algorithm is simple and robust. This robustness is mainly reflected in the following: ① adding or deleting non-support-vector samples has no effect on the model; ② the support-vector sample set has a certain robustness; ③ in some successful applications, the SVM method is not sensitive to the choice of kernel.
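Point (3) can be made concrete with the standard decision function (added here for reference):

f(x) = \mathrm{sign}\Big(\sum_{i \in SV} \alpha_i y_i K(x_i, x) + b\Big)

Only the support vectors, i.e. the training samples with \alpha_i > 0, appear in the sum; all other training samples can be discarded once training is finished.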

Disadvantages:
(1) The SVM algorithm is difficult to implement for large-scale training samples:
Since SVM finds the support vectors by quadratic programming, and the quadratic program involves the computation of an M-order matrix (M is the number of samples), storing and manipulating this matrix consumes a great deal of memory and computation time when M is large (a dense double-precision M x M matrix alone takes 8*M^2 bytes, roughly 75 GB for M = 100,000). The main improvements to this problem are J. Platt's SMO algorithm, T. Joachims' SVMlight, C. J. C. Burges' PCGC, Zhang Xuegong's CSVM, and O. L. Mangasarian's SOR algorithm.
(2) It is difficult to solve multi-class classification problems with SVM:
The classical SVM algorithm only gives a two-class classification algorithm, while in practical data-mining applications one usually has to solve multi-class problems. This can be done by combining several two-class SVMs, mainly in one-against-rest or one-against-one fashion, i.e. the problem is solved by constructing a combination of multiple binary classifiers. Another route is to overcome the inherent shortcomings of SVM by combining it with the advantages of other algorithms so as to improve the classification accuracy on multi-class problems; for example, combining SVM with rough set theory yields a combined multi-class classifier with complementary advantages.
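As a sketch of this (not part of the original text): the e1071 interface used in the code section below already applies such a combination internally, LIBSVM's documented one-against-one scheme, which trains one binary machine per pair of classes and decides by voting, so a three-class problem needs no extra code:

library(e1071)
data(iris)                                  # three classes
model <- svm(Species ~ ., data = iris)      # k*(k-1)/2 = 3 pairwise binary SVMs are built internally
table(predict(model, iris), iris$Species)   # confusion matrix on the training data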

Code

################################### R package ###################################

The R package e1071 provides an interface to LIBSVM. Using the svm() function in the e1071 package gives the same results as LIBSVM, and write.svm() can also write the results of the R training in the standard LIBSVM format for use with LIBSVM in other environments. Let's look at the use of the svm() function. Two calling formats are available.

svm(formula, data = NULL, ..., subset, na.action = na.omit, scale = TRUE)

or

svm(x, y = NULL, scale = TRUE, type = NULL, kernel = "radial", degree = 3, gamma = if (is.vector(x)) 1 else 1 / ncol(x), coef0 = 0, cost = 1, nu = 0.5, class.weights = NULL, cachesize = 40, tolerance = 0.001, epsilon = 0.1, shrinking = TRUE, cross = 0, probability = FALSE, fitted = TRUE, ..., subset, na.action = na.omit)

The main parameters are described as follows:

formula: the model formula for classification; in terms of the second format it can be understood as y ~ x, where y plays the role of the label and x the features (variables). data: a data frame containing the data. subset: an index vector specifying which rows are used as training data.

na.action: how missing values are handled; the default is to delete observations containing missing data.

scale: whether to center and scale the data so that each variable has mean 0 and variance 1; this is performed automatically by default.

type: the form of the SVM. Five forms are available: C-classification, nu-classification, one-classification (for novelty detection), eps-regression and nu-regression. The last two are used for regression. The default is the C-classifier.

kernel: in the nonlinear case, the kernel function used to perform the mapping.

The following kernel functions are provided in R:

linear kernel: u'*v
polynomial kernel: (gamma*u'*v + coef0)^degree
Gaussian (radial basis) kernel: exp(-gamma*|u-v|^2)
sigmoid kernel: tanh(gamma*u'*v + coef0)

The default is the Gaussian kernel. Incidentally, the kernlab package allows you to define custom kernel functions.

degree: the degree of the polynomial kernel; the default is 3.

gamma: the kernel parameter for all kernels except the linear one; the default is 1/(number of data dimensions).

coef0: the parameter of the polynomial and sigmoid kernels; the default is 0.

cost: the value of the penalty term C in C-classification.

nu: the value of nu in nu-classification and one-classification.

cross: perform k-fold cross-validation and compute the classification accuracy.

We again use the iris data set for an SVM classification, as follows:

> data(iris)
> ir <- iris
> set.seed(124)
> count.test <- round(runif(50, 1, 150))
> test <- ir[count.test, ]
> library(e1071)
> sv <- svm(Species ~ ., data = ir, cross = 5, type = 'C-classification', kernel = 'sigmoid')
> summary(sv)   # inspect the fitted SVM; the 5-fold cross-validation accuracy is found to be 92%
> pre <- predict(sv, test)   # predict the test samples; pre is a vector of predicted classes
> dim(test[test$Species != pre, ])[1] / dim(test)[1]   # compute the error rate
[1] 0.06
We find that the error rate is 6%.
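The fitted model can also be exported in LIBSVM's file format with write.svm(), mentioned above. A minimal sketch (argument names as in current e1071 versions; the file names are only illustrative):

> write.svm(sv, svm.file = "iris.svm", scale.file = "iris.scale")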

################################# MATLAB ###################################

% Main function
% (svmtrain, svmtest and kernel below should be on the MATLAB path, e.g. as separate .m files)
clear all;
close all;

C = 10;
kertype = 'linear';

% Training samples
n = 50;
randn('state', 6);
x1 = randn(2, n);        % 2-row, n-column matrix
y1 = ones(1, n);         % 1*n vector of +1
x2 = 5 + randn(2, n);    % 2*n matrix
y2 = -ones(1, n);        % 1*n vector of -1

figure;
plot(x1(1,:), x1(2,:), 'bx', x2(1,:), x2(2,:), 'k.');
axis([-3 8 -3 8]);
hold on;

X = [x1, x2];   % training samples, d*n matrix: n is the number of samples, d the number of features
Y = [y1, y2];   % training targets, 1*n matrix: each value is +1 or -1

svm = svmtrain(X, Y, kertype, C);
plot(svm.Xsv(1,:), svm.Xsv(2,:), 'ro');

% Test
[x1, x2] = meshgrid(-2:0.05:7, -2:0.05:7);   % x1 and x2 are 181*181 matrices
[rows, cols] = size(x1);
nt = rows * cols;
Xt = [reshape(x1, 1, nt); reshape(x2, 1, nt)];
Yt = ones(1, nt);
result = svmtest(svm, Xt, Yt, kertype);
Yd = reshape(result.Y, rows, cols);
contour(x1, x2, Yd, 'm');

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

function svm = svmtrain(X, Y, kertype, C)

options = optimset;              % options controls the option parameters of the optimization algorithm
options.LargeScale = 'off';
options.Display = 'off';

n = length(Y);
H = (Y'*Y) .* kernel(X, X, kertype);
f = -ones(n, 1);                 % n*1 vector of -1; plays the role of the linear term f in quadprog
A = [];
b = [];
Aeq = Y;                         % equality constraint, corresponds to Aeq, beq in quadprog
beq = 0;
lb = zeros(n, 1);                % bounds, correspond to lb and ub in quadprog
ub = C * ones(n, 1);
a0 = zeros(n, 1);                % a0 is the initial guess for the solution

[a, fval, exitflag, output, lambda] = quadprog(H, f, A, b, Aeq, beq, lb, ub, a0, options);

epsilon = 1e-8;
sv_label = find(abs(a) > epsilon);   % samples with 0 < a <= C are taken as support vectors
svm.a = a(sv_label);
svm.Xsv = X(:, sv_label);
svm.Ysv = Y(sv_label);
svm.svnum = length(sv_label);
%svm.label = sv_label;

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

function result = svmtest(svm, Xt, Yt, kertype)

temp = (svm.a' .* svm.Ysv) * kernel(svm.Xsv, svm.Xsv, kertype);
total_b = svm.Ysv - temp;
b = mean(total_b);                   % average over the support vectors to obtain the bias
w = (svm.a' .* svm.Ysv) * kernel(svm.Xsv, Xt, kertype);
result.score = w + b;
Y = sign(w + b);
result.Y = Y;
result.accuracy = length(find(Y == Yt)) / length(Yt);

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

function K = kernel(X, Y, type)
% X: (number of dimensions) * (number of samples)

switch type
    case 'linear'
        K = X' * Y;
    case 'rbf'
        delta = 5;
        delta = delta * delta;
        XX = sum(X' .* X', 2);
        YY = sum(Y' .* Y', 2);
        XY = X' * Y;
        K = abs(repmat(XX, [1 size(YY, 1)]) + repmat(YY', [size(XX, 1) 1]) - 2 * XY);
        K = exp(-K ./ delta);
end
