Today let me tell you about one-class classification, and how to use SVDD (support vector domain description) to do it. I came across one-class classification recently; it is very interesting, and its way of thinking differs quite a bit from the usual classification ideas. Learned something new~
We know that classification problems generally involve two or more classes. A typical two-class problem is identifying whether an email is spam: there are only two classes, "yes" or "no". A typical multi-class problem is face recognition: each person's face is a class, and a face to be identified is assigned to the corresponding class.
So what is one-class classification? There is only one class, and the recognition result is "belongs to this class" or "does not". Hey, that sounds almost the same as the two-class problem; what is the difference between them? The difference is that in a two-class problem the training set contains two classes, usually called positive and negative examples. For the spam identification problem, the positive examples are spam and the negative examples are normal emails. In one-class classification, the training set contains only one class. Sounds a bit magical: what situation leaves you with only one class in the training set? It usually happens when only one kind of sample data is on hand, or when the other kind of data is not usable. For example, suppose there is a pile of historical sales data for a product, recording various information about the users who bought it (used for feature extraction), plus some users who have not bought it, and we want to predict whether they will buy the product. Framed as two-class classification, there are two classes: one is "buy", the other is "not buy". And here the problem comes. If users who bought the product are taken as positive examples and users who did not as negative examples, then (1) for a user who bought, we know clearly that he bought; but for a user who has not bought, we do not know whether he is genuinely uninterested in the product, or wants to buy it but has not done so yet for various reasons; (2) generally speaking, the number of users who have not bought will be far greater than the number who have, which makes the positive and negative samples in the training set unbalanced, so the trained model is biased. At this point the one-class classification method can be used: the training set contains only the data of users who already bought the product, and when deciding whether a new user will buy the product, the recognition result is "will" or "won't".
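As a concrete sketch of this workflow (my own illustration, not from the original post: the feature values are made up, and scikit-learn's OneClassSVM is used as a stand-in one-class method; with an RBF kernel it is equivalent to the SVDD described below):

```python
import numpy as np
from sklearn.svm import OneClassSVM

# Hypothetical features of users who DID buy the product
# (e.g. age, monthly visits). Only this single class is available.
rng = np.random.default_rng(0)
buyers = rng.normal(loc=[35.0, 12.0], scale=[5.0, 3.0], size=(200, 2))

clf = OneClassSVM(kernel="rbf", gamma=0.05, nu=0.05).fit(buyers)

# For new users, predict() returns +1 ("looks like the known buyers",
# i.e. "will buy") or -1 ("falls outside the description", i.e. "won't").
new_users = np.array([[34.0, 11.0],    # similar to known buyers
                      [70.0, 0.0]])    # very different from known buyers
print(clf.predict(new_users))          # e.g. [ 1 -1 ]
```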
How is one-class classification done? For multi-class classification we are familiar with many methods; for example, SVM finds an optimal hyperplane to separate the positive and negative samples. In short, those methods all involve more than one class of samples. It is like telling the algorithm what this kind of thing looks like (what it "looks like" here means the features obtained by feature extraction) and what that kind of thing looks like, and then training a model to distinguish these things.
The problem is that one-class classification has only one class. What to do? Here a method comes in: SVDD (support vector domain description). Its basic idea is: since there is only one class, train a minimal hypersphere that wraps the whole pile of data up (a hypersphere is the analogue of a spherical surface in a space of more than 3 dimensions; the counterpart in 2-dimensional space is a closed curve and in 3-dimensional space a sphere, and in 3 or more dimensions it is called a hypersphere). To identify a new data point, check whether it falls inside the hypersphere: if it does, it belongs to this class; otherwise it does not. For example, 2-dimensional data (the number of dimensions depends on feature extraction: the more features extracted, the higher the dimension. Two dimensions are used here for ease of display; in practice the dimension cannot really be this low) looks roughly like the following:
(Figure from https://kiwi.ecn.purdue.edu/rhea/index.php/One_Class_SVM)
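To make the "wrap the data in a minimal sphere" intuition concrete, here is a tiny sketch of the crudest possible version (my own illustration, assuming 2-D toy data): take the centroid as the center and the farthest training point as the radius. SVDD improves on this by optimizing the center and, as explained below, tolerating outliers.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 2))        # 2-D training data of the single class

center = X.mean(axis=0)              # naive center: the centroid
radius = np.linalg.norm(X - center, axis=1).max()   # wrap every point

def is_in_class(z):
    # Inside the sphere -> belongs to the class; outside -> does not.
    return np.linalg.norm(z - center) <= radius

print(is_in_class(np.array([0.1, -0.2])))   # True: near the data
print(is_in_class(np.array([8.0, 8.0])))    # False: far away
```

Note how fragile this naive sphere is: a single far-away point inflates the radius for everyone. That is exactly the problem the slack variables below are designed to fix.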
Someone may say: the curve in the figure does not wrap up all the points! You will see why once you understand the principle, so let me now walk through the SVDD principle. SVDD stands for support vector domain description; presumably your first reaction is to think of the support vector machine (SVM). Indeed, its principle is very similar to SVM's, and it can be used to do one-class SVM. If you have read the SVM principle, the explanation below will feel very familiar. Generally speaking, a model has an optimization objective. SVDD's optimization objective is to find a center $a$ and a radius $R$ such that the sphere is as small as possible:

$$\min_{R,\,a,\,\xi_i} \; R^2 + C\sum_i \xi_i$$
while making the sphere satisfy, for every training point $x_i$:

$$\|x_i - a\|^2 \le R^2 + \xi_i, \qquad \xi_i \ge 0, \quad \forall i$$

Satisfying this condition means that the data points in the training set are all wrapped inside the sphere.
What is this $\xi_i$? If you have seen SVM, presumably you can guess its meaning: it is a slack (relaxation) variable, playing the same role as the slack variables in classical SVM: it keeps the model from being "destroyed" by a few extreme data points. Imagine most of the data sitting in a small region and just a few abnormal points lying very far away from them. If you insist on a hypersphere that wraps all of them, the hypersphere becomes very large, because it has to reach those few far points, making the model very sensitive to outliers. To put it plainly: although we cannot be sure whether those few unusual points are noise, the great majority of points are together and those few are not, so we would rather treat those few points as abnormal than let the model make too great a sacrifice to accommodate them; otherwise this is overfitting. So we tolerate some data points that do not satisfy the hard constraint, giving them some flexibility, while still making every data point in the training set formally satisfy the (relaxed) constraint; that way the Lagrange multiplier method can be used to solve the problem later, because the Lagrange multiplier method works with the constraints included, and it would not work if the data violated them.

Note that the slack variable carries a subscript $i$: it is tied to each data point, and every data point has its own slack variable. You can understand it this way: for each data point, the sphere is allowed a different amount of give, controlled by its slack variable; if all the slack variables were 0, every point would face the same hard sphere.

And that $C$? It adjusts how much influence the slack variables have; put plainly, it controls how much slack is granted to the data points that need it. If $C$ is large, the cost the slack variables contribute to the objective is large, so training will push the slack variables down; the result is that outliers are hardly tolerated and the sphere stretches to wrap them in. Conversely, if $C$ is small, the outliers get more elasticity and can be left outside. Now do you understand why the figure above does not wrap up all the points? Here are two figures: the first shows the case of a small $C$, and the second the case of a large $C$:
(Figure from https://kiwi.ecn.purdue.edu/rhea/index.php/One_Class_SVM)
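A quick sketch of this trade-off (my own illustration: scikit-learn's OneClassSVM exposes a parameter ν instead of C; in the dual derived below ν plays the role of 1/(nC), so a small ν behaves like a large C and vice versa):

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(2)
X = np.vstack([rng.normal(size=(95, 2)),             # dense cluster
               rng.normal(8.0, 0.5, size=(5, 2))])   # a few far outliers

for nu in (0.01, 0.3):   # small nu ~ large C, large nu ~ small C
    clf = OneClassSVM(kernel="rbf", gamma=0.1, nu=nu).fit(X)
    outside = int((clf.predict(X) == -1).sum())
    print(f"nu={nu}: {outside} training points left outside the boundary")
# Small nu (large C): the boundary stretches to wrap nearly everything in.
# Large nu (small C): the boundary shrinks and leaves the outliers outside.
```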
Now we have the objective to optimize and the constraints; the rest of the solution is almost the same as for SVM. Using the Lagrange multiplier method:

$$L(R, a, \xi_i;\, \alpha_i, \gamma_i) = R^2 + C\sum_i \xi_i - \sum_i \alpha_i \left( R^2 + \xi_i - \|x_i - a\|^2 \right) - \sum_i \gamma_i \xi_i$$
Note that $\alpha_i \ge 0$ and $\gamma_i \ge 0$. Setting the partial derivatives with respect to the parameters $R$, $a$, and $\xi_i$ equal to 0 gives:

$$\frac{\partial L}{\partial R} = 0 \;\Rightarrow\; \sum_i \alpha_i = 1$$

$$\frac{\partial L}{\partial a} = 0 \;\Rightarrow\; a = \frac{\sum_i \alpha_i x_i}{\sum_i \alpha_i} = \sum_i \alpha_i x_i$$

$$\frac{\partial L}{\partial \xi_i} = 0 \;\Rightarrow\; C - \alpha_i - \gamma_i = 0$$
Substituting these back into the Lagrangian function gives:

$$L = \sum_i \alpha_i (x_i \cdot x_i) - \sum_{i,j} \alpha_i \alpha_j (x_i \cdot x_j)$$
Note that at this point $0 \le \alpha_i \le C$, which follows jointly from $\alpha_i \ge 0$, $\gamma_i \ge 0$, and $C - \alpha_i - \gamma_i = 0$. The vector inner products above can also be replaced by a kernel function, just as in SVM:

$$L = \sum_i \alpha_i K(x_i, x_i) - \sum_{i,j} \alpha_i \alpha_j K(x_i, x_j)$$
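As a sketch of how this dual could be solved numerically (my own illustration; the original post gives no code, and the choice of cvxpy and of an RBF kernel are my assumptions), we maximize $L$ over $\alpha$ subject to $0 \le \alpha_i \le C$ and $\sum_i \alpha_i = 1$:

```python
import numpy as np
import cvxpy as cp

def rbf_kernel(A, B, gamma=0.5):
    # K(x, y) = exp(-gamma * ||x - y||^2)
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def svdd_fit(X, C=0.1, gamma=0.5):
    n = X.shape[0]
    K = rbf_kernel(X, X, gamma) + 1e-8 * np.eye(n)  # tiny ridge keeps K numerically PSD
    alpha = cp.Variable(n)
    # Dual: maximize sum_i alpha_i K(x_i, x_i) - sum_ij alpha_i alpha_j K(x_i, x_j)
    objective = cp.Maximize(alpha @ np.diag(K) - cp.quad_form(alpha, K))
    constraints = [alpha >= 0, alpha <= C, cp.sum(alpha) == 1]
    cp.Problem(objective, constraints).solve()
    return alpha.value
```

Points with $\alpha_i = 0$ lie strictly inside the sphere; points with $0 < \alpha_i < C$ sit exactly on its boundary (these are the support vectors); points pushed to $\alpha_i = C$ are the tolerated outliers.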
The solving steps from here on are the same as for SVM and rather involved; please refer to an exposition of the SVM principle.
After training is over, to determine whether a new data point $z$ belongs to this class, check whether it falls inside the trained hypersphere: if it is inside, it is judged to belong to the class. With the center of the sphere expressed through the support vectors as $a = \sum_i \alpha_i x_i$, the criterion for the new data point to belong to this class is:

$$\|z - a\|^2 = (z \cdot z) - 2\sum_i \alpha_i (z \cdot x_i) + \sum_{i,j} \alpha_i \alpha_j (x_i \cdot x_j) \le R^2$$
If a kernel function is used, this becomes:

$$K(z, z) - 2\sum_i \alpha_i K(z, x_i) + \sum_{i,j} \alpha_i \alpha_j K(x_i, x_j) \le R^2$$
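Continuing the sketch from above (same assumptions: cvxpy-fitted $\alpha$, RBF kernel), $R^2$ can be read off any boundary support vector $x_k$ with $0 < \alpha_k < C$, since it lies exactly on the sphere, and the kernelized criterion turns into a small predict function:

```python
def svdd_predict(Z, X, alpha, C=0.1, gamma=0.5, tol=1e-6):
    K = rbf_kernel(X, X, gamma)
    const = alpha @ K @ alpha                 # sum_ij alpha_i alpha_j K(x_i, x_j)
    # R^2 from a boundary support vector x_k with 0 < alpha_k < C:
    k = np.where((alpha > tol) & (alpha < C - tol))[0][0]
    R2 = K[k, k] - 2 * (alpha @ K[:, k]) + const
    # Squared distance of each z to the center, via the kernel expansion:
    Kzx = rbf_kernel(Z, X, gamma)             # K(z, x_i)
    Kzz = np.ones(len(Z))                     # K(z, z) = 1 for the RBF kernel
    d2 = Kzz - 2 * (Kzx @ alpha) + const
    return d2 <= R2                           # True -> inside the sphere -> this class

# Usage: alpha = svdd_fit(X); inside = svdd_predict(Z, X, alpha)
```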
Reference: Tax, David M.J., and Robert P.W. Duin. Support vector domain description [J]. Pattern Recognition Letters, 1999, 20: 1191-1199.
Reposted from: http://blog.sina.com.cn/s/blog_4ff49c7e0102vlbv.html
One Class SVM, SVDD (Support Vector Domain Description) (Repost)