Preliminary Practice of Liblinear
In the course of revising a related recommendation project, we tried and compared the training performance of mainstream, mature algorithm models such as Liblinear, FM, and XGBoost; Liblinear is what was actually adopted in the first stage of the overhaul. This article introduces the Liblinear models mainly from the angle of engineering application, and gives actual evaluation results for Liblinear/FM/XGBoost as a reference. (Reference: http://blog.csdn.net/ytbigdata/article/details/52909685)

1. Liblinear Instructions
Considering training efficiency, we selected the multi-threaded parallel version of Liblinear, specifically Liblinear-multicore-2.1-4. We first give the train command's description of each supported solver mode; the choice of mode is not only directly relevant to how we use the liblinear tool, but is also very helpful for understanding liblinear. The discussion below mainly revolves around these modes.
Parallel LIBLINEAR is only available for -s 0, 1, 2, 3, 11 now
Usage: train [options] training_set_file [model_file]
options:
-s type : set type of solver (default 1)
  for multi-class classification (dual = dual problem, primal = original problem)
     0 -- L2-regularized logistic regression (primal) --- logistic regression
     1 -- L2-regularized L2-loss support vector classification (dual) --- linear SVM
     2 -- L2-regularized L2-loss support vector classification (primal) --- corresponds to 1
     3 -- L2-regularized L1-loss support vector classification (dual)
     4 -- support vector classification by Crammer and Singer
     5 -- L1-regularized L2-loss support vector classification
     6 -- L1-regularized logistic regression
     7 -- L2-regularized logistic regression (dual)
  for regression
    11 -- L2-regularized L2-loss support vector regression (primal)
    12 -- L2-regularized L2-loss support vector regression (dual)
    13 -- L2-regularized L1-loss support vector regression (dual)
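To make these modes concrete, here is a minimal training sketch using the Python interface bundled with liblinear (liblinearutil); heart_scale is the sample data file shipped with the package, and the -n thread option is specific to the multicore build:

    from liblinearutil import *

    # Data in liblinear's sparse text format: "label index:value ..."
    y, x = svm_read_problem('heart_scale')

    # -s 0: L2-regularized logistic regression (primal)
    # -s 1: L2-regularized L2-loss linear SVM (dual), the default
    # -n 8: use 8 threads (multicore build only)
    m_lr = train(y[:200], x[:200], '-s 0 -c 1 -n 8')
    m_svm = train(y[:200], x[:200], '-s 1 -c 1 -n 8')

    # Evaluate both models on the held-out tail of the data
    p_label, p_acc, p_val = predict(y[200:], x[200:], m_lr)
    p_label, p_acc, p_val = predict(y[200:], x[200:], m_svm)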
1.1 Liblinear or LIBSVM

Since this is about liblinear, the liblinear-versus-LIBSVM question inevitably comes up. It is, of course, a very large topic, so here we only pick out the key points for a brief introduction.
First, both Liblinear and LIBSVM were developed by the team of Chih-Jen Lin at National Taiwan University. LIBSVM was released as early as 2000, while Liblinear only released its first version in 2007.
Second, they differ in principle and implementation. LIBSVM is a complete SVM implementation that includes both the underlying linear SVM and kernel-based non-linear SVMs, while Liblinear is a toolkit implemented and optimized specifically for linear scenarios; it supports both linear SVM and linear logistic regression models. Because LIBSVM supports kernel functions and can thus implement non-linear classifiers, in theory LIBSVM has stronger classification capability and should be able to handle more complex problems.
However, training speed is a big bottleneck for LIBSVM. As a general rule of thumb, LIBSVM becomes quite slow beyond about a million samples, and at another order of magnitude an ordinary machine can no longer handle it. Liblinear, by contrast, is designed precisely for large data volumes: because only linear classification is needed, liblinear can use optimization algorithms completely different from LIBSVM's, greatly reducing training complexity and time consumption while keeping classification quality similar to that of a linear SVM.
At the same time, against a big-data background, linear and non-linear classification do not differ much in effectiveness; in particular, when the feature dimension is high and samples are limited, a kernel method may partition the category space incorrectly and perform poorly. Chih-Jen Lin has also given many practical examples showing that hand-crafted features plus a linear model can match or even exceed the performance of kernel SVMs, while greatly reducing training time and resource consumption.
As for the actual time comparison, the Liblinear author has given the following data: on an instance from the LIBSVM dataset collection (20,242 samples, 47,236 features), Liblinear takes only about 3 seconds to reach a cross-validation accuracy close to LIBSVM's, far less than LIBSVM's 346 seconds.
% time libsvm-2.85/svm-train -c 4 -t 0 -e 0.1 -m 800 -v 5 rcv1_train.binary
Cross validation accuracy = 96.8136%
345.569s
% time liblinear-1.21/train -c 4 -e 0.1 -v 5 rcv1_train.binary
Cross validation accuracy = 97.0161%
2.944s
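The -v option used above runs k-fold cross-validation; the same comparison can be scripted through the bundled Python interface, where train() returns the cross-validation accuracy directly (a sketch, assuming rcv1_train.binary has been downloaded locally from the LIBSVM dataset page):

    from liblinearutil import *

    y, x = svm_read_problem('rcv1_train.binary')
    # With '-v 5', train() performs 5-fold cross-validation and
    # returns the accuracy instead of a model
    acc = train(y, x, '-c 4 -e 0.1 -v 5')
    print('Cross validation accuracy = %g%%' % acc)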
1.2 Specific Solver Options: Linear SVM or Logistic Regression, L1 Regularization or L2 Regularization

Liblinear supports a variety of solver modes. Below we directly enumerate the structural risk functions of several typical solver modes supported by liblinear (a structural risk function is composed of a loss function plus a regularization item/penalty, and training is in fact the optimization problem of minimizing the structural risk function), which makes them convenient to explain and understand.
L2-regularized L1-loss support vector classification
L2-regularized L2-loss support vector classification
L1-regularized L2-loss support vector classification
L2-regularized logistic regression
L1-regularized logistic regression
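The formulas that originally accompanied this list were lost in extraction; reconstructed from the standard LIBLINEAR formulation (in LaTeX notation), each mode minimizes a structural risk of the shared form

    \min_w \; R(w) + C \sum_{i=1}^{l} \xi(w; x_i, y_i)

where C is the cost parameter, the regularization item R(w) is \frac{1}{2} w^T w for L2 and \|w\|_1 for L1, and the loss \xi(w; x_i, y_i) is \max(0,\, 1 - y_i w^T x_i) for L1-loss SVM, \max(0,\, 1 - y_i w^T x_i)^2 for L2-loss SVM, and \log(1 + e^{-y_i w^T x_i}) for logistic regression.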
Liblinear supports both linear SVM and logistic regression; the difference between them is the loss function. The loss function is a non-negative real-valued function describing the gap between the predicted value f(x) and the actual value y, written L(y, f(x)); it is the \xi term in the formula above.
Another important option is the regularization item. Regularization is introduced to reduce model complexity, improve generalization ability, and avoid overfitting. When the data dimension is very high and the samples are few, the model has many parameters and easily becomes very complex; on the surface it may fit all the sample points very well, yet in fact it overfits badly. Introducing an L1/L2 regularization item addresses this.
In general, L1 is the 1-norm, the sum of absolute values, and L2 is the 2-norm, the modulus in the usual sense. L1 tends to keep a small number of non-zero features while the rest become exactly 0, i.e., it achieves so-called sparsity, whereas L2 keeps more features whose weights are all close to 0.
As for the choice of solver, the author's suggestion is to generally use the linear SVM: its training speed is fast and its effect is close to LR. L2 regularization is generally recommended; L1 has somewhat lower accuracy and slower training, unless you want a sparse model (personal note: when the number of features is very large, a sparse model helps reduce the amount of computation for online prediction), as the sketch below illustrates.
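To see the sparsity difference concretely, here is a small sketch, again via the bundled Python interface; get_decfun() exposing the weight vector is, to our understanding, part of the current liblinear Python wrapper, and the file name is hypothetical:

    from liblinearutil import *

    y, x = svm_read_problem('train_data.txt')  # hypothetical file name

    m_l2 = train(y, x, '-s 0 -c 1 -q')  # L2-regularized logistic regression
    m_l1 = train(y, x, '-s 6 -c 1 -q')  # L1-regularized logistic regression

    def zero_ratio(m):
        w, b = m.get_decfun()  # weight vector and bias of the model
        return sum(1 for v in w if v == 0.0) / float(len(w))

    # L1 drives many weights to exactly 0; L2 keeps them small but non-zero
    print('zero weights with L2: %.1f%%' % (100 * zero_ratio(m_l2)))
    print('zero weights with L1: %.1f%%' % (100 * zero_ratio(m_l1)))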
1.3 Primal or Dual?

Primal and dual correspond to solving the original problem and the dual problem respectively. This has no effect on the result, but the dual problem may be slow to solve. The author's suggestion is: for the L2-regularized SVM, first try solving with dual; if it is very slow, switch to the primal solution.
Another reference point from the web: for scenarios with few samples but particularly high dimensions, such as text classification, solving the dual problem is more suitable; conversely, when the number of samples is very large and the feature dimension is not high, the kernel (Gram) matrix becomes too large for the dual problem to be solved conveniently, and solving the original problem is easier. A small comparison sketch follows below.
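A simple way to apply this advice is to time both forms of the same solver on your own data; -s 1 (dual) and -s 2 (primal) optimize the same L2-regularized L2-loss SVM objective, so accuracy should be essentially identical (a sketch; the file name is hypothetical):

    import time
    from liblinearutil import *

    y, x = svm_read_problem('train_data.txt')  # hypothetical file name

    for s, name in [(1, 'dual'), (2, 'primal')]:
        t0 = time.time()
        m = train(y, x, '-s %d -c 1 -q' % s)  # -q: quiet mode
        print('%s solver: %.2fs' % (name, time.time() - t0))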
1.4 Whether the Training Data Should Be Normalized

On this point, the author suggests that in their document classification applications, normalization not only greatly reduced training time but also made training more effective, so we chose to normalize the training data. In practice, normalization also lets us directly compare the weights of each feature in the model formula and see intuitively which features are more important.
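Liblinear itself does not rescale input. One option is the svm-scale tool shipped with LIBSVM; the same effect in the bundled Python interface looks roughly like this (a sketch, assuming per-feature scaling by maximum absolute value; the file name is hypothetical):

    from liblinearutil import *

    y, x = svm_read_problem('train_data.txt')  # hypothetical file name

    # Find the maximum absolute value seen for each feature index
    max_abs = {}
    for xi in x:
        for idx, v in xi.items():
            max_abs[idx] = max(max_abs.get(idx, 0.0), abs(v))

    # Scale every feature into [-1, 1]; with comparable scales, the
    # learned weights can be compared directly across features
    x_scaled = [{idx: v / max_abs[idx] for idx, v in xi.items()} for xi in x]

    m = train(y, x_scaled, '-s 1 -c 1')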
2. Liblinear and FM/XGBoost: Actual Effect Comparison Record
In this round of the overhaul we mainly tried out the effects of the various Liblinear models, and also ran comparison tests against FM/XGBoost, which are commonly used in industry. The results are listed below for reference.
Note: Because Liblinear training is still single-machine and limited by memory, it cannot load the full data for training; accordingly, there are also dedicated experiments on how much training data to use (1/120 -> 1/4 -> 1/2).
5. XGBoost Effect Summary
The full name of XGBoost is eXtreme Gradient Boosting, a C++ implementation of the gradient boosting machine, authored by Tianqi Chen, a noted machine learning researcher from the University of Washington. Whereas traditional GBDT uses CART as the base classifier, XGBoost also supports linear classifiers, and it can automatically use multi-threaded CPU parallelism while improving the algorithm's accuracy; it enjoys high visibility in the Kaggle community and on other data-competition platforms.
In our tests, XGBoost did show its strength: with only the default parameter configuration and a 1/120 slice of the data (about 2 million samples), it reached an AUC of 0.8406, exceeding all the Liblinear results. XGBoost has not yet been adopted directly; colleagues are following up on it further.
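For reference, a default-parameter baseline with AUC evaluation like the one described can be set up with the XGBoost Python package roughly as follows (a sketch; the file names and round count are assumptions, not the settings of the actual experiment):

    import xgboost as xgb

    # DMatrix loads LIBSVM-format text files directly
    # (newer versions may want an explicit '?format=libsvm' suffix)
    dtrain = xgb.DMatrix('train_data.txt')  # hypothetical file names
    dtest = xgb.DMatrix('test_data.txt')

    params = {
        'objective': 'binary:logistic',  # binary click/no-click target
        'eval_metric': 'auc',
        # all other parameters left at XGBoost defaults
    }
    bst = xgb.train(params, dtrain, num_boost_round=100,
                    evals=[(dtest, 'test')])  # prints test AUC per round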