Support Vector Machines: Learning Notes on "Machine Learning in Action"

Last Update:2016-05-11 Source: Internet

Author: User

Tags scalar svm


P.S. SVMs are fairly complex and I have not yet worked through the code in full; I will supplement these notes after further study. What follows is only the core material, summarized from the book "Machine Learning in Action".
Advantages: low generalization error, low computational cost, and results that are easy to interpret.
Disadvantages: sensitive to parameter tuning and to the choice of kernel function; the basic classifier handles only two-class problems.
Applicable data types: numeric and nominal data.
Linearly separable data: data for which a straight line can be drawn that separates the two sets of data points.
Hyperplane (for an N-dimensional dataset, the object that separates it has N-1 dimensions): the decision boundary of the classifier. The farther a data point is from the decision boundary, the more confident we can be in its prediction.
Margin: the distance from a point to the separating surface. (We want the points closest to the separating surface to be as far from it as possible.)
Support vectors are the points closest to the separating hyperplane. The next step is to maximize the distance from the support vectors to the separating surface, which requires solving an optimization problem.
Applying the support vector machine algorithm amounts to constructing a separating hyperplane in the feature space.

Class labels: why use -1 and +1 rather than 0 and 1? Because -1 and +1 differ only in sign, which makes the math easier to handle. (If a data point lies on the positive side (class +1) and far from the separating hyperplane, w^T x + b is a large positive number, so label * (w^T x + b), that is, the margin, is also a large positive number. If the data point lies on the negative side (class -1) and far from the hyperplane, w^T x + b is a large negative number, but label * (w^T x + b) is still a large positive number because the label is -1.)
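The sign argument above can be checked numerically. The following is a minimal sketch; the hyperplane (w, b) and the sample points are made-up values chosen only for illustration, and `functional_margin` is a hypothetical helper name.

```python
import numpy as np

def functional_margin(w, b, x, label):
    """Return label * (w^T x + b) for one data point; label must be -1 or +1."""
    return label * (np.dot(w, x) + b)

# Hypothetical separating hyperplane: x1 + x2 - 1 = 0
w = np.array([1.0, 1.0])
b = -1.0

# A point far on the positive side, labeled +1
far_positive = functional_margin(w, b, np.array([5.0, 5.0]), +1)

# A point far on the negative side, labeled -1
far_negative = functional_margin(w, b, np.array([-4.0, -4.0]), -1)

# The same positive-side point with the wrong label gives a negative margin
wrong_label = functional_margin(w, b, np.array([5.0, 5.0]), -1)

print(far_positive, far_negative, wrong_label)
```

Both correctly labeled points give a large positive value regardless of which side they are on, while the mislabeled point gives a negative one.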

Target:

Find the w and b that maximize the minimum margin, subject to the constraint label * (w^T x + b) ≥ 1.0 for every data point.
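This constrained problem is solved with Lagrange multipliers. (Doing so assumes the data is 100% linearly separable, which is rarely the case, so slack variables are introduced to allow some data points to fall on the wrong side of the separating surface.) With slack variables, the optimization target, written in terms of the multipliers alpha, is:

$$\max_{\alpha} \left[ \sum_{i=1}^{m} \alpha_i - \frac{1}{2} \sum_{i,j=1}^{m} \text{label}^{(i)} \, \text{label}^{(j)} \, \alpha_i \alpha_j \langle x^{(i)}, x^{(j)} \rangle \right]$$

subject to $C \ge \alpha_i \ge 0$ and $\sum_{i=1}^{m} \alpha_i \, \text{label}^{(i)} = 0$.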

The constant C controls the trade-off between two goals: maximizing the margin and ensuring that most points have a functional margin of at least 1.0. In the code for the optimization algorithm, C is a parameter, so different results can be obtained by adjusting it. Once all the alphas are found, the separating hyperplane can be expressed in terms of them, so the main task in an SVM is solving for the alphas. (Most alpha values are 0; the data points whose alphas are nonzero are the support vectors.)

General approach to SVMs:
Collect data: any method can be used.
Prepare data: numeric data is required.
Analyze data: it helps to visualize the separating hyperplane.
Train the algorithm: most of the time spent on an SVM goes into training, which mainly consists of tuning two parameters.
Test the algorithm: a very simple calculation.
Use the algorithm: SVMs can be applied to almost any classification problem. Note that the SVM itself is a binary classifier, so applying it to multi-class problems requires some changes to the code.
Sequential Minimal Optimization (SMO): a large optimization problem is broken into many small optimization problems that are solved in sequence.
Objective of the SMO algorithm: find a set of alphas and b. With the alphas in hand, it is easy to compute the weight vector w and obtain the separating hyperplane.
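To make the last step concrete, here is a minimal sketch of recovering w and b from a solved set of alphas, using the standard relation w = Σ alpha_i * label_i * x_i. The function names `calc_w` and `calc_b` and the three-point dataset are hypothetical; the alphas shown were worked out by hand for this toy dataset, not produced by any code in the book.

```python
import numpy as np

def calc_w(alphas, labels, X):
    """Weight vector w = sum_i alpha_i * label_i * x_i."""
    return (alphas * labels) @ X

def calc_b(w, labels, X, sv_index):
    """Offset b from one support vector lying exactly on the margin:
    label * (w^T x + b) = 1  =>  b = label - w^T x  (label is -1 or +1)."""
    return labels[sv_index] - X[sv_index] @ w

# Toy dataset: the first two points are support vectors, the third is not
X = np.array([[1.0, 1.0], [3.0, 3.0], [5.0, 5.0]])
labels = np.array([-1.0, 1.0, 1.0])
alphas = np.array([0.25, 0.25, 0.0])  # hand-derived; zero for the non-support vector

w = calc_w(alphas, labels, X)
b = calc_b(w, labels, X, sv_index=1)
margins = labels * (X @ w + b)  # functional margin of every point
print(w, b, margins)
```

Only the points with nonzero alpha contribute to w, which is why the third point can be dropped without changing the hyperplane.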
How the SMO algorithm works: in each cycle, select two alphas to optimize. Once a suitable pair is found, increase one and decrease the other. Here "suitable" means the two alphas must meet certain conditions: first, both alphas must lie outside their margin boundary; second, neither alpha has already been clamped or sits on a bound.

Pseudo-code for the first (simplified) version of the SMO algorithm:

Create an alpha vector and initialize it to all 0s
While the number of iterations is less than the maximum number of iterations (outer loop):
    For each data vector in the dataset (inner loop):
        If the data vector can be optimized:
            Randomly select another data vector
            Optimize the two vectors together
            If the two vectors cannot be optimized, exit the inner loop
    If no vectors were optimized, increment the iteration count and continue with the next loop
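The pseudo-code can be turned into a short NumPy sketch. This is one possible implementation under stated assumptions, not the book's listing: it uses a linear kernel, the standard SMO update formulas, a seeded random generator, and it skips to the next data vector (rather than leaving the inner loop) when a pair cannot be optimized. The name `smo_simple`, its parameters, and the toy dataset are all illustrative.

```python
import numpy as np

def smo_simple(X, y, C, tol, max_iter, seed=0):
    """Simplified SMO with a linear kernel. y must hold -1/+1 labels."""
    rng = np.random.default_rng(seed)
    m = X.shape[0]
    K = X @ X.T                  # all pairwise inner products
    alphas = np.zeros(m)         # start from the zero vector
    b = 0.0
    quiet_passes = 0             # consecutive passes in which no alpha changed
    while quiet_passes < max_iter:                     # outer loop
        changed = 0
        for i in range(m):                             # inner loop
            Ei = (alphas * y) @ K[:, i] + b - y[i]     # prediction error on x_i
            # x_i "can be optimized" if it violates the KKT conditions by more than tol
            violates = (y[i] * Ei < -tol and alphas[i] < C) or \
                       (y[i] * Ei > tol and alphas[i] > 0)
            if not violates:
                continue
            j = int(rng.integers(m - 1))               # random partner with j != i
            if j >= i:
                j += 1
            Ej = (alphas * y) @ K[:, j] + b - y[j]
            ai_old, aj_old = alphas[i], alphas[j]
            # Bounds that keep both alphas inside [0, C]
            if y[i] != y[j]:
                low, high = max(0.0, aj_old - ai_old), min(C, C + aj_old - ai_old)
            else:
                low, high = max(0.0, aj_old + ai_old - C), min(C, aj_old + ai_old)
            eta = 2.0 * K[i, j] - K[i, i] - K[j, j]
            if low == high or eta >= 0:
                continue                               # this pair cannot be optimized
            aj_new = float(np.clip(aj_old - y[j] * (Ei - Ej) / eta, low, high))
            if abs(aj_new - aj_old) < 1e-5:
                continue                               # change too small to matter
            # Move alpha_i so that sum(alpha * y) stays unchanged
            ai_new = ai_old + y[i] * y[j] * (aj_old - aj_new)
            b1 = b - Ei - y[i] * (ai_new - ai_old) * K[i, i] - y[j] * (aj_new - aj_old) * K[i, j]
            b2 = b - Ej - y[i] * (ai_new - ai_old) * K[i, j] - y[j] * (aj_new - aj_old) * K[j, j]
            if 0 < ai_new < C:
                b = b1
            elif 0 < aj_new < C:
                b = b2
            else:
                b = (b1 + b2) / 2.0
            alphas[i], alphas[j] = ai_new, aj_new
            changed += 1
        quiet_passes = quiet_passes + 1 if changed == 0 else 0
    return alphas, b

# Toy linearly separable dataset (made up for illustration)
X = np.array([[1.0, 1.0], [2.0, 1.0], [1.0, 2.0],
              [4.0, 4.0], [5.0, 4.0], [4.0, 5.0]])
y = np.array([-1.0, -1.0, -1.0, 1.0, 1.0, 1.0])

alphas, b = smo_simple(X, y, C=10.0, tol=1e-3, max_iter=20)
w = (alphas * y) @ X
preds = np.sign(X @ w + b)
print(alphas, w, b, preds)
```

Here `max_iter` counts consecutive passes over the data in which nothing changed, so the loop ends only after the alphas have stayed still for that many passes in a row.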
Array filtering: this works only for NumPy types. The expression array(list) > 0 yields a Boolean array, and array(list)[array(list) > 0] yields an array of all values in the original list that are greater than 0.
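A short sketch of this filtering, applied to the typical SVM use of picking out the nonzero alphas. The alpha values are made up for illustration. Note that in Python 3 the comparison must be done on a NumPy array; comparing a plain list with 0 raises a TypeError.

```python
import numpy as np

values = [0.0, 0.08, 0.0, 0.16, 0.0]   # hypothetical alpha values
alphas = np.array(values)

mask = alphas > 0                       # Boolean array, one entry per element
support_alphas = alphas[mask]           # only the values greater than 0
support_indices = np.nonzero(mask)[0]   # positions of those values

print(mask)
print(support_alphas)
print(support_indices)
```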
Speeding up optimization with the full Platt SMO algorithm:
1. Choose the first alpha in an outer loop that alternates between two modes: a single pass over the entire dataset, and a single pass over the non-bound alphas (alphas not equal to the bounds 0 or C), which skips alphas that are known to sit on a bound and are unlikely to change.
2. Choose the second alpha in an inner loop so as to maximize the step size. In the simplified SMO algorithm, the error Ej is calculated after j is chosen. Here, a global cache of error values is kept, and the alpha that maximizes the step size, that is, |Ei - Ej|, is chosen.
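The two selection heuristics can be sketched as follows. This is an illustrative sketch only: the cache layout (one row per data point, holding a "valid" flag and the cached error) and the function names `non_bound_indices` and `select_j` are assumptions made for this example.

```python
import numpy as np

def non_bound_indices(alphas, C):
    """Indices whose alpha lies strictly between the bounds 0 and C."""
    return np.nonzero((alphas > 0) & (alphas < C))[0]

def select_j(i, Ei, error_cache):
    """Pick the second alpha so that |Ei - Ej| is as large as possible.

    error_cache is an (m, 2) array: column 0 is a valid flag (nonzero means
    the error has been computed), column 1 is the cached error value.
    Returns (j, Ej), or (None, None) if no other valid entry exists, in which
    case the caller would fall back to a random choice.
    """
    valid = np.nonzero(error_cache[:, 0])[0]
    valid = valid[valid != i]                 # never pair a point with itself
    if valid.size == 0:
        return None, None
    deltas = np.abs(Ei - error_cache[valid, 1])
    best = valid[np.argmax(deltas)]
    return int(best), float(error_cache[best, 1])

# Made-up values for illustration
candidates = non_bound_indices(np.array([0.0, 0.3, 1.0, 0.7]), C=1.0)
cache = np.array([[1, 0.5], [1, -1.2], [0, 0.0], [1, 0.3]])
j, Ej = select_j(0, 0.5, cache)
print(candidates, j, Ej)
```

Choosing the largest |Ei - Ej| is a cheap stand-in for the largest step, since the size of the alpha update is proportional to Ei - Ej.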
Kernel functions: implement a mapping from one feature space to another, converting data from a form that is hard to handle into one that is easier to handle. (By mapping data, which is sometimes nonlinear, from a low-dimensional space to a high-dimensional space, a nonlinear problem in the low-dimensional space can be turned into a linear problem in the high-dimensional space.)
Inner product of two vectors: two vectors are multiplied together to yield a single scalar. Every operation in SVM optimization can be written in terms of inner products, so the inner products can be replaced with a kernel function without making any simplifications. This replacement is called the kernel trick (or kernel substitution).
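A small numeric check helps show why replacing the inner product works. The sketch below uses a degree-2 polynomial kernel, chosen here purely for illustration (it is not discussed in these notes): evaluating the kernel in the original 2-D space gives the same number as mapping both vectors into 3-D with an explicit feature map `phi` and taking the inner product there.

```python
import numpy as np

def poly2_kernel(x, z):
    """Degree-2 polynomial kernel, computed in the original 2-D space."""
    return np.dot(x, z) ** 2

def phi(x):
    """Explicit feature map into 3-D whose inner product equals poly2_kernel."""
    return np.array([x[0] ** 2, np.sqrt(2.0) * x[0] * x[1], x[1] ** 2])

x = np.array([1.0, 2.0])
z = np.array([3.0, 4.0])

via_kernel = poly2_kernel(x, z)          # never leaves the low-dimensional space
via_mapping = np.dot(phi(x), phi(z))     # inner product in the mapped space

print(via_kernel, via_mapping)
```

The kernel version never builds the high-dimensional vectors, which is the practical benefit of the trick.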
Radial basis function (a popular kernel function):

A function that takes a vector as input and outputs a scalar based on the vector's distance, measured either from the origin or from another vector.
Gaussian version of the radial basis function:

$$k(x, y) = \exp\left( \frac{-\|x - y\|^2}{2\sigma^2} \right)$$

(σ is a user-defined parameter that determines the reach, that is, how quickly the function value falls off to 0.)
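As a minimal sketch, the Gaussian radial basis function can be written in NumPy as below, assuming the common form exp(-||x - y||^2 / (2σ^2)). Some implementations scale the denominator differently, so treat the exact constant as an assumption. The names `rbf_kernel` and `rbf_kernel_matrix` are illustrative.

```python
import numpy as np

def rbf_kernel(x, y, sigma):
    """Gaussian RBF between two vectors: exp(-||x - y||^2 / (2 * sigma^2))."""
    diff = x - y
    return np.exp(-np.dot(diff, diff) / (2.0 * sigma ** 2))

def rbf_kernel_matrix(X, sigma):
    """Pairwise Gaussian RBF values for the rows of X."""
    sq_norms = np.sum(X ** 2, axis=1)
    # ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a.b, clipped at 0 against rounding error
    sq_dists = np.maximum(sq_norms[:, None] + sq_norms[None, :] - 2.0 * X @ X.T, 0.0)
    return np.exp(-sq_dists / (2.0 * sigma ** 2))

origin = np.array([0.0, 0.0])
point = np.array([2.0, 0.0])

same = rbf_kernel(origin, origin, sigma=1.0)    # identical vectors
wide = rbf_kernel(origin, point, sigma=2.0)     # larger sigma, longer reach
narrow = rbf_kernel(origin, point, sigma=0.5)   # smaller sigma, falls off faster
K = rbf_kernel_matrix(np.array([[0.0, 0.0], [2.0, 0.0]]), sigma=1.0)
print(same, wide, narrow)
print(K)
```

A smaller σ makes the kernel value drop toward 0 more quickly with distance, so each support vector influences a smaller neighborhood.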

Number of support vectors: there is an optimal value. The strength of an SVM is that it classifies data efficiently. With too few support vectors, the decision boundary may be poor; with too many, every classification effectively uses the entire dataset, which amounts to the k-Nearest Neighbors method.

