Classification with the Python version of LIBSVM

Source: Internet
Author: User
Tags: svm

Preface: I recently worked on a multi-class classification problem whose data format requirements are very close to the format LIBSVM accepts. For convenience I decided to try LIBSVM, and since I normally use Python, I went with its Python version.

Prerequisites. Download LIBSVM from http://www.csie.ntu.edu.tw/~cjlin/libsvm/: go to the "Download LIBSVM" section and download the LIBSVM package. It can also be downloaded from a network drive: HTTP://PAN.BAIDU.COM/S/1BNK5HR5. I use Ubuntu 15.04 with the Python that ships with the system, and I have additionally installed Anaconda.

After unpacking, read the README. The package contains a python folder, which holds the Python interface; see the README inside that folder for a detailed introduction.

Open a terminal in this folder and run make to compile. Then start IPython or Python to enter the interactive interpreter and, following the README, run a classification on heart_scale, the heart data set that has already been converted to the LIBSVM data format.

>>> from svmutil import *
>>> # Read data in LIBSVM format
>>> y, x = svm_read_problem('../heart_scale')
>>> m = svm_train(y[:200], x[:200], '-c 4')
>>> p_label, p_acc, p_val = svm_predict(y[200:], x[200:], m)

This is just a small demo. What matters is how the various functions and their parameters are used.
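
The demo trains on the first 200 records and predicts on the rest. As a rough illustration of what a held-out accuracy figure measures, here is a minimal pure-Python sketch of that head/tail split together with accuracy and mean squared error. The helper names split_head_tail and accuracy_and_mse, the toy labels, and the stand-in predictions are all invented for this example; none of them are part of LIBSVM.

```python
def split_head_tail(items, n_train):
    """Split a sequence into the first n_train items and the remainder."""
    return items[:n_train], items[n_train:]


def accuracy_and_mse(true_labels, predicted_labels):
    """Return (accuracy in percent, mean squared error) for two label lists."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    if not true_labels:
        raise ValueError("label lists must not be empty")
    n = float(len(true_labels))
    pairs = list(zip(true_labels, predicted_labels))
    correct = sum(1 for t, p in pairs if t == p)
    mse = sum((t - p) ** 2 for t, p in pairs) / n
    return 100.0 * correct / n, mse


# Toy labels (invented), using +1/-1 classes like heart_scale.
y = [1, -1, 1, 1, -1, -1, 1, -1]
y_train, y_test = split_head_tail(y, 5)   # analogous to y[:200] and y[200:]
fake_predictions = [-1, 1, 1]             # stand-in for a model's output
acc, mse = accuracy_and_mse(y_test, fake_predictions)
print("accuracy = %.2f%%, mse = %.3f" % (acc, mse))
```

Splitting by position is only sensible when the records are not sorted by class; shuffling first is a common precaution.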

First, the svm_read_problem() function. As the name implies, it reads the data, which must be in the format LIBSVM accepts. It returns two values: y holds the labels from the first column, i.e. the classes, and x holds the feature columns that follow. A fellow blogger has explained the data format very clearly:

#=====================================

1) data format used by LIBSVM

The training and test data files used by the software have the following format:

[label] [index1]:[value1] [index2]:[value2] ...

[label] [index1]:[value1] [index2]:[value2] ...

Each line is one record, for example (see libsvm-3.1/heart_scale):

+1 1:0.708 2:1 3:1 4:-0.320 5:-0.105 6:-1

Note: for implementation reasons, the last value on each line must be followed by a space ' ' or a tab '\t' before the line break that starts the next record.

The last line of data must also have a space ' ' or a tab '\t' after its last value!

Here (x, y) -> ((0.708, 1, 1, -0.320, -0.105, -1), +1). (PS: What does that mean? I didn't get it!)
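
One plausible reading of the (x, y) notation is that x is the feature vector of a record and y is its label. Under that assumption, the sketch below turns such a pair into a line of the format shown above. The function name to_libsvm_line is made up for this example, and skipping zero values relies on the format being sparse.

```python
def to_libsvm_line(label, values):
    """Format one record as 'label index:value ...' using 1-based indices."""
    parts = ["%+d" % label]
    for index, value in enumerate(values, start=1):
        if value == 0:
            continue  # the format is sparse, so zero features can be left out
        parts.append("%d:%g" % (index, value))
    return " ".join(parts)


x = (0.708, 1, 1, -0.320, -0.105, -1)   # feature vector from the example record
y = +1                                   # its class label
line = to_libsvm_line(y, x)
print(line)
```

Note that the %g format drops trailing zeros, so -0.320 is written as -0.32.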

The label (or class) is the category you want to predict, usually an integer.

Index is the feature index, usually a sequence of consecutive integers in ascending order.

Value is the feature value used for training, usually a real number.
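
To make the label, index, and value roles concrete, here is a minimal sketch of a parser for a single line in this format. It is not the code inside svm_read_problem; the function name parse_libsvm_line is hypothetical, and returning the features as a dict keyed by index is a choice made for this example.

```python
def parse_libsvm_line(line):
    """Parse 'label index:value ...' into (label, {index: value})."""
    tokens = line.split()          # split() also discards trailing spaces or tabs
    if not tokens:
        raise ValueError("empty line")
    label = float(tokens[0])
    features = {}
    for token in tokens[1:]:
        index_text, value_text = token.split(":", 1)
        features[int(index_text)] = float(value_text)
    return label, features


sample = "+1 1:0.708 2:1 3:1 4:-0.320 5:-0.105 6:-1 "
label, features = parse_libsvm_line(sample)
print(label, features)
```

A whole file could be read by applying this function to every non-empty line and collecting the labels and feature dicts into two lists.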

#=======================================================

Second, the svm_train() function. The best documentation is help(svm_train):

Help on function svm_train in module svmutil:

svm_train(arg1, arg2=None, arg3=None)
    svm_train(y, x [, options]) -> model | ACC | MSE
    svm_train(prob [, options]) -> model | ACC | MSE
    svm_train(prob, param) -> model | ACC | MSE

    Train an SVM model from data (y, x) or a svm_problem prob using
    'options' or an svm_parameter param.
    If '-v' is specified in 'options' (i.e., cross validation)
    either accuracy (ACC) or mean-squared error (MSE) is returned.
    options:
        -s svm_type : set type of SVM (default 0)
            0 -- C-SVC        (multi-class classification)
            1 -- nu-SVC       (multi-class classification)
            2 -- one-class SVM
            3 -- epsilon-SVR  (regression)
            4 -- nu-SVR       (regression)
        -t kernel_type : set type of kernel function (default 2)
            0 -- linear: u'*v
            1 -- polynomial: (gamma*u'*v + coef0)^degree
            2 -- radial basis function: exp(-gamma*|u-v|^2)
            3 -- sigmoid: tanh(gamma*u'*v + coef0)
            4 -- precomputed kernel (kernel values in training_set_file)
        -d degree : set degree in kernel function (default 3)
        -g gamma : set gamma in kernel function (default 1/num_features)
        -r coef0 : set coef0 in kernel function (default 0)
        -c cost : set the parameter C of C-SVC, epsilon-SVR, and nu-SVR (default 1)
        -n nu : set the parameter nu of nu-SVC, one-class SVM, and nu-SVR (default 0.5)
        -p epsilon : set the epsilon in loss function of epsilon-SVR (default 0.1)
        -m cachesize : set cache memory size in MB (default 100)
        -e epsilon : set tolerance of termination criterion (default 0.001)
        -h shrinking : whether to use the shrinking heuristics, 0 or 1 (default 1)
        -b probability_estimates : whether to train a SVC or SVR model for probability estimates, 0 or 1 (default 0)
        -wi weight : set the parameter C of class i to weight*C, for C-SVC (default 1)
        -v n : n-fold cross validation mode
        -q : quiet mode (no outputs)

The above is the English documentation. A fellow blogger has also written up the options in Chinese very clearly, reproduced directly here:

#=================================

The options have the following meanings:

-s svm_type: sets the SVM type, default 0. The available types are:

0 -- C-SVC

1 -- nu-SVC

2 -- one-class SVM

3 -- epsilon-SVR

4 -- nu-SVR

-t kernel_type: sets the kernel function type, default 2. The available types are:

0 -- linear kernel: u'*v

1 -- polynomial kernel: (g*u'*v + coef0)^degree

2 -- RBF kernel: exp(-g*||u-v||^2)

3 -- sigmoid kernel: tanh(g*u'*v + coef0)

-d degree: sets the degree in the kernel function, default 3;

-g gamma: sets gamma (g) in the kernel function, default 1/k;

-r coef0: sets coef0 in the kernel function, default 0;

-c cost: sets the penalty factor C of C-SVC, epsilon-SVR, and nu-SVR, default 1;

-n nu: sets the parameter nu of nu-SVC, one-class SVM, and nu-SVR, default 0.5;

-p epsilon: sets epsilon in the loss function of epsilon-SVR, default 0.1;

-m cachesize: sets the cache memory size in MB (the quoted post gives a default of 40; recent LIBSVM releases use 100);

-e epsilon: sets the tolerance of the termination criterion, default 0.001;

-h shrinking: whether to use the shrinking heuristics, 0 or 1, default 1;

-b probability_estimates: whether to train an SVC or SVR model for probability estimates, 0 or 1, default 0;

-wi weight: sets the penalty parameter C of class i to weight*C (for C-SVC), default 1;

-v n: n-fold cross-validation mode.

Here k in the -g option is the number of attributes (features) in the input data. The -v option randomly splits the data into n parts and computes the cross-validation accuracy (for classification) or the mean squared error (for regression). The options can be combined freely according to the SVM type and the parameters the kernel function supports. If an option has no effect on the chosen kernel function or SVM type, the program ignores it; if an expected option is set incorrectly, it falls back to the default value. For the command-line tool, training_set_file is the data set to train on and model_file is the model file produced by training; if model_file is not given, a default file name is used, or you can choose your own.
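
The four kernel formulas from the help text can be written out directly. The sketch below is a plain-Python illustration on dense lists, not LIBSVM's implementation; the function names and the two sample vectors are assumptions made for this example. It also shows the default gamma of 1/k, with k the number of features.

```python
import math


def dot(u, v):
    """Inner product u'*v of two equal-length dense vectors."""
    return sum(a * b for a, b in zip(u, v))


def linear_kernel(u, v):
    return dot(u, v)


def polynomial_kernel(u, v, gamma, coef0=0.0, degree=3):
    return (gamma * dot(u, v) + coef0) ** degree


def rbf_kernel(u, v, gamma):
    squared_distance = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-gamma * squared_distance)


def sigmoid_kernel(u, v, gamma, coef0=0.0):
    return math.tanh(gamma * dot(u, v) + coef0)


u = [1.0, 2.0]
v = [3.0, 4.0]
gamma = 1.0 / len(u)   # default gamma: 1/k, where k is the number of features

print(linear_kernel(u, v))
print(polynomial_kernel(u, v, gamma))
print(rbf_kernel(u, v, gamma))
print(sigmoid_kernel(u, v, gamma))
```

The defaults used for coef0 and degree mirror the -r and -d defaults listed above.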

#==================================
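
The n-fold cross-validation mode (-v n) can be pictured with a small sketch: shuffle the record indices, deal them into n folds, and let each fold serve once as the validation set. This is only an illustration of the idea; LIBSVM's internal splitting may differ in its details, and the function names and the seed parameter are hypothetical.

```python
import random


def n_fold_indices(n_samples, n_folds, seed=0):
    """Shuffle indices 0..n_samples-1 and deal them into n_folds folds."""
    if not 2 <= n_folds <= n_samples:
        raise ValueError("need 2 <= n_folds <= n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::n_folds] for i in range(n_folds)]


def train_validation_splits(n_samples, n_folds, seed=0):
    """Yield (training indices, validation indices) once per fold."""
    folds = n_fold_indices(n_samples, n_folds, seed)
    for held_out in range(n_folds):
        training = [i for k, fold in enumerate(folds) if k != held_out for i in fold]
        yield training, folds[held_out]


splits = list(train_validation_splits(10, 5))
for training, validation in splits:
    print(sorted(training), sorted(validation))
```

Averaging the accuracy (or squared error) over the n validation folds gives the cross-validation figure.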

Finally, the crux of the problem is the features. I am a rank-beginner graduate student working on NLP, and my data and text look like the following examples:

for the    Queen Yang mi    zhengshuang    Yang Mi Zhao Rmb Zhang Ziyi Zhengshuang who is well-deserved the most beautiful beauty of the former rival Vivian Chow mother Vivian Chow    rival Zhang Mao married foreign boyfriend    Ling Flower    Zeng Yi    Zeng Yi Ken Flower between friendship and love

Fellow practitioners can probably see what I am up to. Yes, you guessed it: extract the celebrity names from news headlines and classify the relationship between celebrity 1 and celebrity 2. The plan is to remove stop words and find the distinguishing features of each class, using chi-square feature selection and so on. I'm still working on it for the time being. ~,~///.
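
As a sketch of the chi-square feature scoring mentioned here, the code below scores each term against one target class using the standard 2x2 contingency formula. The tokenized headlines, the class names, and the function names are all invented for illustration; this is not the author's actual pipeline.

```python
def chi_square(a, b, c, d):
    """Chi-square statistic of a 2x2 table.

    a: documents in the class containing the term
    b: documents outside the class containing the term
    c: documents in the class without the term
    d: documents outside the class without the term
    """
    n = a + b + c + d
    denominator = (a + c) * (b + d) * (a + b) * (c + d)
    if denominator == 0:
        return 0.0
    return float(n) * (a * d - b * c) ** 2 / denominator


def chi_square_scores(documents, labels, target_label):
    """Score every term in the token lists against target_label."""
    vocabulary = set(token for tokens in documents for token in tokens)
    scores = {}
    for term in vocabulary:
        a = b = c = d = 0
        for tokens, label in zip(documents, labels):
            has_term = term in tokens
            in_class = label == target_label
            if has_term and in_class:
                a += 1
            elif has_term:
                b += 1
            elif in_class:
                c += 1
            else:
                d += 1
        scores[term] = chi_square(a, b, c, d)
    return scores


# Toy tokenized headlines and relationship classes (invented).
documents = [["married", "boyfriend"], ["married", "wedding"],
             ["rival", "award"], ["rival", "film"]]
labels = ["couple", "couple", "rival", "rival"]
scores = chi_square_scores(documents, labels, "couple")
for term in sorted(scores, key=scores.get, reverse=True):
    print(term, round(scores[term], 3))
```

Terms with the highest scores are the ones most strongly associated (positively or negatively) with the target class, so they are natural candidates to keep as features.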

Finally, to be continued ...

#==============================

Reference:

1.http://www.csie.ntu.edu.tw/~cjlin/libsvm/

2.http://blog.csdn.net/meredith_leaf/article/details/6714144

3.http://shiyanjun.cn/archives/548.html
