LIBSVM and LIBLINEAR: Respective Features and Usage Experience


LIBSVM and LIBLINEAR were both developed by Dr. Chih-Jen Lin of National Taiwan University. LIBSVM is mainly used to train nonlinear SVM classifiers and has been available for some time, whereas LIBLINEAR was released only last year (at the time of writing). LIBLINEAR mainly targets large-scale data classification: training a linear classifier has much lower computational complexity than training a nonlinear one and takes far less time, while on large-scale data its accuracy is comparable to that of a nonlinear classifier. That is why LIBLINEAR is the tool for big data.


Both are cross-platform, general-purpose libraries that support Windows, Linux, and Mac OS. The code itself is written in C++, with interfaces for MATLAB, Python, Java, and C/C++, which makes them convenient in different language environments; they can fairly be called a first choice for researchers and industry practitioners alike. For example, at school I generally use MATLAB/C++, while my classmates at Baidu mainly use Python/C++. Each of us has a different focus, but at the core we are all using the same SVM library.


The LIBSVM and LIBLINEAR homepages provide downloads (including Windows binaries) in both zip and tar formats. After decompressing the archive, find the matlab subdirectory and read the README file inside it. In MATLAB, change into this directory and run make.m. Using the machine's default C/C++ compiler, MATLAB builds the .c files into .mexw32 files (I am on a 32-bit operating system, hence mexw32; on a 64-bit OS the files are mexw64), which provide the interfaces callable from MATLAB. After copying the generated .mexw32 files into the root directory of your own MATLAB project, you can call the functions of the LIBSVM/LIBLINEAR library from your MATLAB files.


This netizen's post describes the use of LIBSVM in MATLAB in great detail and is worth consulting: http://blog.sina.com.cn/s/blog_5bd2cb260100ev25.html


The respective advantages of LIBLINEAR and LIBSVM can be summarized as follows:

1. LIBSVM is used to solve general-purpose classification problems.

2. LIBLINEAR is designed mainly for linear models on large-scale data:

It can handle large-scale datasets.

It runs much faster than LIBSVM because, with a linear kernel, it does not have to compute kernel values for any of the points.

It uses a trust region method for optimization, which looks new to machine learning people.
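To make the "no kernel values to compute" point concrete, here is a minimal sketch in plain Python. It is not code from either library: the function names and the toy numbers are made up for illustration, and the bias term is omitted. It shows prediction rather than training, but the per-pair kernel evaluations it contains are the kind of work a kernel solver also has to do during training.

```python
import math

def linear_decision(w, x):
    """Linear model: a single dot product, O(d) work per point."""
    return sum(wi * xi for wi, xi in zip(w, x))

def rbf_kernel(u, v, gamma):
    """RBF kernel value exp(-gamma * ||u - v||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(u, v)))

def kernel_decision(support_vectors, coeffs, x, gamma):
    """Kernel model: one kernel evaluation per support vector, O(n_sv * d)."""
    return sum(c * rbf_kernel(sv, x, gamma)
               for sv, c in zip(support_vectors, coeffs))

# Toy values, made up for illustration only.
x = [3.0, 1.0]
print(linear_decision([1.0, -2.0], x))
print(kernel_decision([[3.0, 1.0], [0.0, 0.0]], [2.0, -1.0], x, gamma=0.5))
```

The linear model needs only the weight vector, while the kernel model must keep every support vector around and touch each of them for every point.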

The following is one netizen's description of LIBLINEAR's performance in a data classification experiment:

"Today I tried LIBLINEAR, and it was fast (faster than I expected).
My experimental data:
Training set: 21504 * 1500 (1500 is the number of samples, 21504 is the dimension)
Test set: 21504 * 2985
Each run takes on the order of seconds; 20 experiments took less than 2 minutes in total.

I ran the same problem with LIBSVM, and the speed difference was enormous: I ran LIBSVM 5 times, and each run took nearly 10 minutes. Time is secondary, though. The problem I found is that the LIBSVM and LIBLINEAR results differ by 1%. I have not read the LIBLINEAR paper, so I do not know where the discrepancy comes from. With LIBSVM I simply used the default parameters and a linear model. This inevitably raises an issue: if I want to compare the performance of a linear model against a nonlinear one, I cannot use LIBLINEAR for one and LIBSVM for the other; but if I use LIBSVM for both, the reported performance will certainly have some problems.

So if your problem's dimension is large (where linear models perform very well), consider LIBLINEAR."



A quick look at the LIBSVM and LIBLINEAR documentation shows that their objective functions for the linear problem are not the same, so a difference in accuracy is normal. If both optimized the same objective function, their accuracy should be similar. The speed difference, however, is obvious: LIBLINEAR is much faster.
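As a hedged illustration of that difference, and based on my reading of the two packages' documentation rather than on anything stated in the quoted experiment, the default formulations can be written as follows. LIBSVM's C-SVC with a linear kernel uses the L1 (hinge) loss and includes a bias term b; LIBLINEAR's default solver uses the squared (L2) hinge loss and, by default, no bias term. Here l is the number of training instances.

```latex
% LIBSVM, C-SVC with a linear kernel (-t 0): L1 loss, with bias term b
\min_{w,\,b,\,\xi} \; \frac{1}{2} w^{T} w + C \sum_{i=1}^{l} \xi_i
\quad \text{s.t.} \quad y_i \left( w^{T} x_i + b \right) \ge 1 - \xi_i, \;\; \xi_i \ge 0

% LIBLINEAR, default solver: L2-regularized L2-loss SVC, no bias term by default
\min_{w} \; \frac{1}{2} w^{T} w + C \sum_{i=1}^{l} \max\left( 0,\; 1 - y_i\, w^{T} x_i \right)^{2}
```

With the same C, these two problems generally have different minimizers, which is consistent with a small gap in accuracy such as the one reported in the experiment.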

As for when to use a linear model, I think the example above is a case where a linear classifier is the better choice. Nonlinear classification is not necessarily better than linear classification, especially when the samples are extremely limited while the feature dimension is very high: with so few samples, the kernel map is often inaccurate and may partition the class space incorrectly, which can give worse results than a linear model.

Speaking of scaling, I suggest not using LIBSVM's scale tool (svm-scale): once it is applied, the originally sparse data is converted into a non-sparse format, which not only produces very large data files but also throws away LIBLINEAR's advantage of processing sparse data quickly. Therefore, if you need to scale, write your own scaler that preserves the original sparse format. LIBLINEAR's advantage is speed, especially on sparse features. Its disadvantage is that it consumes too much memory: 10 GB of data needs close to 50 GB of memory, so it cannot cope once the data volume grows very large.
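A minimal sketch of such a home-made scaler follows. It is one possible approach, not the method of either library: each feature is divided by its largest absolute value in the training data, so zeros stay zero and no new nonzero entries appear. It assumes the usual "label index:value ..." text format; all function names are hypothetical.

```python
def parse_line(line):
    """Parse one 'label idx:val idx:val ...' line into (label, {idx: val})."""
    parts = line.split()
    feats = {}
    for item in parts[1:]:
        idx, val = item.split(":")
        feats[int(idx)] = float(val)
    return parts[0], feats

def fit_max_abs(rows):
    """Collect the largest absolute value seen for each feature index."""
    max_abs = {}
    for _, feats in rows:
        for idx, val in feats.items():
            max_abs[idx] = max(max_abs.get(idx, 0.0), abs(val))
    return max_abs

def scale_rows(rows, max_abs):
    """Divide each stored value by its feature's max |value|.

    Only entries already present are touched, so sparsity is preserved.
    Features unseen during fitting (or always zero) are dropped.
    """
    scaled = []
    for label, feats in rows:
        new = {i: v / max_abs[i] for i, v in feats.items()
               if max_abs.get(i, 0.0) > 0.0}
        scaled.append((label, new))
    return scaled

def format_line(label, feats):
    """Write a row back out; ':g' keeps 6 significant digits."""
    return " ".join([label] + [f"{i}:{v:g}" for i, v in sorted(feats.items())])

# Toy data, made up for illustration only.
train = [parse_line(s) for s in ["+1 1:2 3:-4", "-1 1:1 2:10"]]
factors = fit_max_abs(train)
for label, feats in scale_rows(train, factors):
    print(format_line(label, feats))
```

The factors fitted on the training set should be reused for the test set, so that both are scaled in the same way.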


Another frequently mentioned SVM library is SVM-perf (http://www.cs.cornell.edu/people/tj/svm_light/svm_perf.html), designed by people at Cornell University. Its hardware requirements appear to be lower than LIBLINEAR's, and some people working in image processing use SVM-perf instead of LIBLINEAR.


In addition, for multi-class problems and the choice of kernel function, the following rules of thumb can be used for reference:

If the number of features is much larger than the number of samples, a linear kernel is sufficient.

If both the number of features and the number of samples are large, as in document classification, a linear kernel is generally used, and LIBLINEAR is much faster than LIBSVM.

If the number of features is much smaller than the number of samples, the RBF kernel is generally used. However, if you must use a linear kernel, LIBLINEAR is preferable, with the -s 2 option.
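These rules of thumb can be sketched as a small helper. The text does not say what "much larger" or "large" mean numerically, so the thresholds `ratio` and `large` below are assumptions chosen only for illustration, as is the function name and the order in which the rules are checked.

```python
def suggest_kernel(n_samples, n_features, ratio=10, large=10000):
    """Apply the rules of thumb above.

    ratio: hypothetical factor for "much larger / much smaller".
    large: hypothetical cutoff for "large" counts.
    """
    if n_features >= ratio * n_samples:
        return "linear kernel"
    if n_samples >= large and n_features >= large:
        return "linear kernel, LIBLINEAR"
    if n_samples >= ratio * n_features:
        return "RBF kernel (or LIBLINEAR -s 2 if linear is required)"
    return "no rule applies; compare both by cross validation"

# The 1500-sample, 21504-dimension experiment quoted earlier.
print(suggest_kernel(1500, 21504))
```

Any such cutoff is a starting point only; cross validation on the actual data remains the real test.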

For multi-class problems:

Take a 15-class scene problem with 100 training images per class. If a single 15-class classifier is trained directly, use the label values 1~15 in the training file and do not specify the -wi option (the weights default to 1). If instead a separate classifier is trained for each class, the 100 images of that class are the positive samples (say label=1) and all the remaining training images are negative (1400 of them, say label=-1). The positive and negative samples are then unbalanced, so the -wi option should be set; specifically, you can specify -w1 14 -w-1 1 (1 is the default). First specify -wi during cross validation, then use grid.py to determine the optimal (C, g). When actually running the experiments, you can compare how the two approaches differ.
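The weight of 14 in that example is simply the ratio of negative to positive samples (1400 / 100). A minimal sketch of that arithmetic follows; the helper name is hypothetical, and it only builds the option string rather than calling LIBSVM.

```python
from collections import Counter

def one_vs_rest_weight_flags(labels, positive_class):
    """Build '-w1 <ratio> -w-1 1' for a one-vs-rest split.

    The positive class is relabeled 1 and everything else -1;
    the positive weight is n_negative / n_positive.
    """
    counts = Counter(labels)
    n_pos = counts[positive_class]
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative sample")
    return f"-w1 {n_neg / n_pos:g} -w-1 1"

# 15 classes with 100 training images each, as in the example above.
labels = [c for c in range(1, 16) for _ in range(100)]
print(one_vs_rest_weight_flags(labels, positive_class=3))
```

Weighting by the class ratio is one common starting point, not the only choice; the weight can also be tuned during cross validation.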


About LIBSVM, Chih-Jen Lin of National Taiwan University describes it this way: "LIBSVM is an integrated software for support vector classification, (C-SVC, nu-SVC), regression (epsilon-SVR, nu-SVR) and distribution estimation (one-class SVM). It supports multi-class classification." That is, LIBSVM is a single package that covers support vector classification, regression, and distribution estimation, and it supports multi-class classification. For LIBLINEAR, the official web site gives this description: "LIBLINEAR is a linear classifier for data with millions of instances and features", that is, a linear classifier implemented specifically for data on the scale of millions of instances and features.

Both are used for classification. LIBSVM covers a relatively wide range of applications, while LIBLINEAR is designed primarily for training on large volumes of data. In which cases should you choose LIBLINEAR instead of LIBSVM? The authors give some advice: when you face massive amounts of data, usually on the order of millions, where "massive" applies on two levels, the number of samples and the number of features; when models trained with linear and nonlinear mappings achieve similar results; and when the time efficiency of model training is a high priority.

In such cases, it is recommended that you use LIBLINEAR instead of LIBSVM. Text classification is the most typical example: the number of samples is very large and the feature dimension is very high, anywhere from thousands to millions, so LIBLINEAR is the best choice for text classification. The authors give an example comparing the accuracy and training time of LIBLINEAR and LIBSVM. The data contains a total of 20,242 samples, each with 47,236 features.

    % time libsvm-2.85/svm-train -c 4 -t 0 -e 0.1 -m 800 -v 5 rcv1_train.binary
    Cross Validation Accuracy = 96.8136%
    345.569s
    % time liblinear-1.21/train -c 4 -e 0.1 -v 5 rcv1_train.binary
    Cross Validation Accuracy = 97.0161%
    2.944s
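To put those two runs side by side, the short sketch below only does arithmetic on the figures quoted above. The variable names are mine, and the numbers come from that single comparison, not from any new measurement.

```python
# Figures copied from the rcv1_train.binary comparison above.
libsvm_seconds = 345.569
liblinear_seconds = 2.944
libsvm_accuracy = 96.8136      # percent, 5-fold cross validation
liblinear_accuracy = 97.0161   # percent, 5-fold cross validation

speedup = libsvm_seconds / liblinear_seconds
accuracy_gap = liblinear_accuracy - libsvm_accuracy

print(f"LIBLINEAR is about {speedup:.0f}x faster")
print(f"accuracy difference: {accuracy_gap:+.4f} percentage points")
```

In this example LIBLINEAR is more than a hundred times faster while its cross validation accuracy is slightly higher.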


For more information, please refer to:

LIBLINEAR: http://www.csie.ntu.edu.tw/~cjlin/liblinear/

LIBSVM: https://www.csie.ntu.edu.tw/~cjlin/libsvm/
