Text classification: multi-class classification

Text classification is the most common problem in natural language processing. The open-source tools are quite handy, but training is slow, so a multi-core version is needed; the open-source multi-core builds only support a limited set of parameters, and the version a colleague provided had a language barrier, so I ended up exploring multi-class classifiers on my own.

There are many classification algorithms, but the ones that work well as baselines are LR and SVM, and the industry-famous open-source implementations of these two are liblinear and LIBSVM. I have not yet looked into whether LIBSVM supports multi-core, but the multi-core version of liblinear only supports three solver types (0, 2, 11), which happens to miss the parameter setting I need, so I dug into liblinear's train code.

1. Classification

Binary classification is the most basic capability of a classifier, and both LR and SVM support it; beyond that comes the multi-class classification problem.

Multi-class classification can be done in two ways: (1) direct multi-class classification, or (2) combining multiple binary classifiers.

1.1 Direct multi-class classification

Softmax is the direct multi-class version of LR. SVM can also do multi-class classification directly by modifying the objective function: the parameter solutions for the multiple separating hyperplanes are merged into one optimization problem, and multi-class classification is achieved by solving that optimization problem "once". This approach looks simple, but its computational complexity is relatively high and it is hard to implement, so it is only suitable for small-scale problems. Combining multiple binary classifiers is therefore more common.
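
For reference, the softmax form of LR assigns class probabilities with one weight vector w_k per class (this is the standard formulation, not tied to any particular tool):

P(y = k | x) = exp(w_k · x) / Σ_{j=1..K} exp(w_j · x)

Training all K weight vectors jointly in one objective is exactly what makes this a single optimization problem, in contrast to the binary-classifier combinations discussed next.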

1.2 Combining multiple binary classifiers

(a) The one-versus-rest method (one-versus-rest, abbreviated OVR SVMs).

During training, the samples of one class are taken as one group and all remaining samples as the other, so K classes yield K binary classifiers. An unknown sample is assigned to the class whose classifier produces the largest decision value.

Suppose I have four categories to separate (that is, four labels): A, B, C, D. When building the training sets, I take the vectors of A as the positive set and the vectors of B, C, D as the negative set; the vectors of B as the positive set and the vectors of A, C, D as the negative set; the vectors of C as the positive set and the vectors of A, B, D as the negative set; and the vectors of D as the positive set and the vectors of A, B, C as the negative set. The four training sets are trained separately, producing four model files. At test time, the test vector is scored against each of the four model files, giving four results f1(x), f2(x), f3(x), f4(x), and the final result is the class with the largest of these four values.

Original Author Note: This method has a flaw: the positive-to-negative ratio in each training set is 1:M, so the training data is imbalanced and the model is biased. Thus, it is not very practical.

I note: liblinear uses this method, so training is fast, but memory usage is very high.
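
A minimal one-versus-rest sketch in Python using scikit-learn (my own illustration; the article itself works with the liblinear command-line tool, and scikit-learn's LinearSVC happens to wrap liblinear):

from sklearn.datasets import make_classification
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import LinearSVC

# Toy data with 4 classes standing in for the labels A, B, C, D.
X, y = make_classification(n_samples=200, n_features=20,
                           n_informative=10, n_classes=4)

ovr = OneVsRestClassifier(LinearSVC())   # trains K = 4 binary classifiers
ovr.fit(X, y)

scores = ovr.decision_function(X[:1])    # one score per class: f1(x) ... f4(x)
pred = scores.argmax(axis=1)             # take the class with the largest value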

(b) The one-versus-one method (one-versus-one, abbreviated OVO SVMs, also called pairwise).

The approach is to build a classifier between every pair of classes, so K classes require K(K-1)/2 binary classifiers. When an unknown sample is classified, the class that receives the most votes is taken as its class.

Again suppose there are four classes A, B, C, D. During training I take the vectors of the pairs (A,B), (A,C), (A,D), (B,C), (B,D), (C,D) as six training sets, which yields six models. At test time, the test vector is run through all six models, the outputs are combined by voting, and the final result is obtained.

The vote works like this:
a = b = c = d = 0
(A, B) classifier: if A wins, then a = a + 1; otherwise, b = b + 1
(A, C) classifier: if A wins, then a = a + 1; otherwise, c = c + 1
...
(C, D) classifier: if C wins, then c = c + 1; otherwise, d = d + 1
The decision is max(a, b, c, d)
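
A minimal sketch of the voting step in Python (my own illustration; pairwise_models and its predict interface are assumptions, not part of liblinear or LIBSVM):

from collections import Counter
from itertools import combinations

def ovo_predict(x, pairwise_models, classes):
    # pairwise_models maps a class pair such as ('A', 'B') to a binary model
    # whose predict(x) returns the winning class of that pair (hypothetical
    # interface). Pairs are keyed in itertools.combinations order.
    votes = Counter()
    for pair in combinations(classes, 2):
        winner = pairwise_models[pair].predict(x)
        votes[winner] += 1
    # The class with the most votes is the final prediction.
    return votes.most_common(1)[0][0]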

Original Author Note: Although this method works well, when there are many classes the number of models is N(N-1)/2, which is quite costly.

I note: LIBSVM uses this method, so its accuracy tends to be higher than liblinear's, but it is much slower.

(c) Hierarchical support vector machines (H-SVMs).

Hierarchical classification first divides all categories into two subclasses, then divides each subclass further into two sub-subclasses, and so on recursively until each leaf contains a single category.

A detailed description of (c) can be found in the paper "Support vector machine in multi-class classification problems" (Computer Engineering and Applications, 2004). # I have not looked into it yet; will pay attention to it when I have time.
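
A rough sketch of how prediction could walk such a class tree (purely illustrative; the Node structure is my assumption, not taken from the cited paper):

from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    # Internal node: model decides between the left and right class groups;
    # leaf node: model is None and label holds the final class.
    model: object = None
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    label: Optional[str] = None

def hsvm_predict(x, node):
    # Walk the tree until a leaf (a single class) is reached.
    while node.label is None:
        side = node.model.predict(x)   # assumed to return 'left' or 'right'
        node = node.left if side == 'left' else node.right
    return node.label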

(d) DAG-SVMs, derived from the decision directed acyclic graph (DDAG) proposed by Platt, addresses the misclassification and refusal-to-classify problems that exist in the one-versus-one SVMs. In the training stage it is the same as the one-versus-one method: a classifier is built between every pair of classes, N(N-1)/2 classifiers in all. In the classification stage, however, the classifiers are arranged into a rooted, directed acyclic graph with N(N-1)/2 internal nodes and N leaves. Each internal node is one of the classifiers and is connected to two nodes (or leaves) in the next layer. To classify an unknown sample, start from the top node (which separates two classes) and, according to that node's decision, move down to the left or right node in the next layer, and so on until a leaf is reached; the class that leaf represents is the class of the unknown sample. DAG-SVM trains N(N-1)/2 classifiers, the same as OVO SVM, but thanks to the directed acyclic graph structure only (N-1) classifiers need to be evaluated at classification time, so efficiency is improved.
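
The DDAG evaluation is commonly implemented as a candidate-list elimination, which makes the (N-1) evaluations explicit (sketch only; pairwise_models uses the same hypothetical interface as the voting sketch above):

def ddag_predict(x, pairwise_models, classes):
    # Platt-style DDAG: keep a list of candidate classes and, at each step,
    # run the classifier for the first and last candidates, dropping the
    # loser. After N-1 comparisons a single class remains.
    candidates = list(classes)
    while len(candidates) > 1:
        a, b = candidates[0], candidates[-1]
        winner = pairwise_models[(a, b)].predict(x)
        if winner == a:
            candidates.pop()      # b loses, remove it
        else:
            candidates.pop(0)     # a loses, remove it
    return candidates[0]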

    

However, DAG-SVMs suffer from error accumulation during classification: once a node misclassifies, the error is carried down to all nodes below it, and the closer the erroneous node is to the root, the more severe the accumulation and the worse the classification performance.

I note: this combination of binary classifiers really amounts to a decision tree ...

References:

1. http://blog.sina.com.cn/s/blog_4c98b96001009b8d.html

2. http://www.doc88.com/p-6092154562202.html
