One SMO Algorithm Basics
A support vector is one of the points closest to the separating hyperplane. The separating hyperplane is the decision boundary that divides the dataset.
A support vector machine maps vectors into a higher-dimensional space and establishes a maximum-margin hyperplane there. On either side of the hyperplane that separates the data lie two hyperplanes parallel to it, and a suitable separating hyperplane is chosen so that the distance between these two parallel hyperplanes is maximized. The assumption is that the larger the distance, or gap, between the parallel hyperplanes, the smaller the classifier's total error.
In other words, we want to find the points closest to the separating hyperplane and make sure they are as far from it as possible, so that the error is as small as possible.
Unlike logistic regression, the support vector machine uses class labels of -1 and +1. Because -1 and +1 differ only in sign, this makes the mathematical treatment easier.
The separating hyperplane can be written as $w^T x + b = 0$, and the distance from a point $A$ to the hyperplane is the length of the perpendicular from that point to the plane, $|w^T A + b| / \|w\|$. Our goal is to find the $w$ and $b$ that maximize the margin, which is computed with the objective:

$$\arg\max_{w,b}\ \left\{ \min_n \big( \text{label}_n \cdot (w^T x_n + b) \big) \cdot \frac{1}{\|w\|} \right\}$$
where $\|w\|$ is the 2-norm, the square root of the sum of the squares of the components:

$$\|w\| = \sqrt{w_1^2 + w_2^2 + \cdots + w_n^2}$$
But optimizing this product directly is particularly troublesome, so instead the problem is rewritten in terms of the data points (the dual form):

$$\max_\alpha \left[ \sum_{i=1}^{m} \alpha_i - \frac{1}{2} \sum_{i,j=1}^{m} \text{label}_i \cdot \text{label}_j \cdot \alpha_i \alpha_j \langle x_i, x_j \rangle \right]$$
subject to the constraints:

$$\alpha_i \ge 0, \qquad \sum_{i=1}^{m} \alpha_i \cdot \text{label}_i = 0$$
From the solved $\alpha_i$, $w$ and $b$ can then be obtained:

$$w = \sum_{i=1}^{m} \alpha_i \cdot \text{label}_i \cdot x_i$$
But in most cases the data is not 100% linearly separable, so we add a slack variable, and the constraint becomes:

$$C \ge \alpha_i \ge 0, \qquad \sum_{i=1}^{m} \alpha_i \cdot \text{label}_i = 0$$
The derivation is fairly involved; readers who, like me, are new to this material need not get too tangled up in the algebra of the algorithm. Once the principle is clear, the program can still be read and understood. Next comes the complete SMO algorithm implementation.
Two Program Implementation
We use the data in testSet.txt to take a first look at the support vector machine algorithm.
Partial contents of testSet.txt:
For the testSet.txt dataset, the first helper function, loadDataSet(), opens the file and parses it line by line to obtain each row's class label and the full data matrix. In selectJrand(), the parameter i is the index of the first alpha, and m is the total number of alphas. clipAlpha() clips alpha values that are greater than H or less than L.
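A minimal sketch of these three helpers, assuming testSet.txt is tab-delimited with two feature columns and one label column per line (the exact file layout is an assumption here):

```python
import random
import numpy as np

def loadDataSet(fileName):
    """Parse a tab-delimited file line by line into a data matrix and a label list."""
    dataMat, labelMat = [], []
    with open(fileName) as fr:
        for line in fr:
            lineArr = line.strip().split('\t')
            dataMat.append([float(lineArr[0]), float(lineArr[1])])
            labelMat.append(float(lineArr[2]))
    return dataMat, labelMat

def selectJrand(i, m):
    """Pick a random alpha index j != i, where m is the total number of alphas."""
    j = i
    while j == i:
        j = int(random.uniform(0, m))
    return j

def clipAlpha(aj, H, L):
    """Clip the alpha value so it lies within the interval [L, H]."""
    if aj > H:
        aj = H
    if aj < L:
        aj = L
    return aj
```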
Next we build an optStruct class containing only an __init__ method, which populates its member variables. calcEk() computes the error value E for a given sample and returns it. selectJ() selects the second alpha for the inner loop. updateEk() caches the recomputed error value, which is used while the alpha values are being optimized.
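A sketch of the data structure and the error helpers for the linear (kernel-free) case; selectJ, which picks the second alpha to maximize the step size |Ei - Ej|, is omitted for brevity:

```python
import numpy as np

class optStruct:
    """Holds everything the SMO routines need, so it can be passed around as one object."""
    def __init__(self, dataMatIn, classLabels, C, toler):
        self.X = np.mat(dataMatIn)
        self.labelMat = np.mat(classLabels).transpose()
        self.C = C                                    # bound on alpha (slack variable)
        self.tol = toler                              # error tolerance
        self.m = np.shape(self.X)[0]
        self.alphas = np.mat(np.zeros((self.m, 1)))
        self.b = 0
        self.eCache = np.mat(np.zeros((self.m, 2)))   # error cache: [valid flag, Ek]

def calcEk(oS, k):
    """Compute the error E_k = f(x_k) - label_k for the k-th sample."""
    fXk = float(np.multiply(oS.alphas, oS.labelMat).T * (oS.X * oS.X[k, :].T)) + oS.b
    return fXk - float(oS.labelMat[k])

def updateEk(oS, k):
    """Recompute E_k and mark it valid in the cache after alpha_k has changed."""
    oS.eCache[k] = [1, calcEk(oS, k)]
```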
Then comes the innerL() function, which receives all of this data through the data-structure parameter oS. The smoP() function creates the data structure to hold all the data and then initializes the variables that control when the function exits. The for loop in calcWs() iterates over all the data in the dataset and discards every data point that is not a support vector.
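calcWs() can be sketched as follows; because alpha is zero for every point that is not a support vector, those points contribute nothing to the sum and are effectively discarded:

```python
import numpy as np

def calcWs(alphas, dataArr, classLabels):
    """Recover the weight vector w = sum_i alpha_i * label_i * x_i from the solved alphas."""
    X = np.mat(dataArr)
    labelMat = np.mat(classLabels).transpose()
    m, n = np.shape(X)
    w = np.zeros((n, 1))
    for i in range(m):                # only support vectors (alpha > 0) add to w
        w += np.multiply(alphas[i] * labelMat[i], X[i, :].T)
    return w
```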
Finally, we plot the result with matplotlib, as sketched below.
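A minimal plotting sketch, assuming the two-dimensional testSet.txt data and the w and b computed above (plotResult is a hypothetical helper name):

```python
import numpy as np
import matplotlib.pyplot as plt

def plotResult(dataArr, labelArr, w, b):
    """Scatter the two classes and draw the separating line w0*x + w1*y + b = 0."""
    data, labels = np.array(dataArr), np.array(labelArr)
    plt.scatter(data[labels == 1][:, 0], data[labels == 1][:, 1], marker='s', label='+1')
    plt.scatter(data[labels == -1][:, 0], data[labels == -1][:, 1], marker='o', label='-1')
    x = np.linspace(data[:, 0].min(), data[:, 0].max(), 50)
    y = (-float(b) - w[0] * x) / w[1]        # solve w0*x + w1*y + b = 0 for y
    plt.plot(x, np.ravel(y))
    plt.legend()
    plt.show()
```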
Three The Handwriting Recognition Problem
Data: handwriting recognition data stored in two folders, trainingDigits and testDigits.
The SMO algorithm program is the same as before; the img2vector() function is then used to convert each 32x32 binary image into a 1x1024 vector, and the loadImages() function loads the images.
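A sketch of the two loaders; the file-name convention (the digit before the underscore, e.g. 9_45.txt) and the binary split used to label it (9 as -1, everything else as +1) are assumptions of this sketch:

```python
import os
import numpy as np

def img2vector(filename):
    """Flatten one 32x32 text-encoded binary image into a 1x1024 vector."""
    returnVect = np.zeros((1, 1024))
    with open(filename) as fr:
        for i in range(32):
            lineStr = fr.readline()
            for j in range(32):
                returnVect[0, 32 * i + j] = int(lineStr[j])
    return returnVect

def loadImages(dirName):
    """Load every image in a folder; the class label comes from the file name."""
    hwLabels = []
    fileList = os.listdir(dirName)
    trainingMat = np.zeros((len(fileList), 1024))
    for i, fileNameStr in enumerate(fileList):
        classNum = int(fileNameStr.split('_')[0])
        hwLabels.append(-1 if classNum == 9 else 1)   # assumed binary split: 9 vs. rest
        trainingMat[i, :] = img2vector(os.path.join(dirName, fileNameStr))
    return trainingMat, hwLabels
```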
Four Test Method and Results
The testDigits() function is used to measure the number of support vectors, the training-set error rate, and the test-set error rate.
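As a sketch, the three quantities could be measured as follows (evaluate and errorRate are hypothetical names; smoP, calcWs, and loadImages are the routines described above, and the parameter values are assumptions):

```python
import numpy as np

def errorRate(dataMat, labels, w, b):
    """Fraction of samples misclassified by sign(x * w + b)."""
    m = np.shape(dataMat)[0]
    errors = sum(1 for i in range(m)
                 if np.sign(float(dataMat[i, :] * np.mat(w) + b)) != np.sign(labels[i]))
    return errors / m

def evaluate(C=200, toler=0.0001, maxIter=10000):
    dataArr, labelArr = loadImages('trainingDigits')
    b, alphas = smoP(dataArr, labelArr, C, toler, maxIter)   # full SMO from section Two
    print("number of support vectors: %d" % len(np.nonzero(np.asarray(alphas) > 0)[0]))
    w = calcWs(alphas, dataArr, labelArr)
    print("training error rate: %f" % errorRate(np.mat(dataArr), labelArr, w, float(b)))
    testArr, testLabelArr = loadImages('testDigits')
    print("test error rate: %f" % errorRate(np.mat(testArr), testLabelArr, w, float(b)))
```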
After 4 iterations, the following results are obtained:
Five Kernel Functions
The kernel function is the core of the SVM. For samples that are not linearly separable, mapping the original input space into a higher-dimensional feature space can make them linearly separable in the new kernel space. Simply by swapping the kernel function, features can be carried from a low-dimensional space into a high-dimensional one; the computation is still performed in the low-dimensional space, while the separation is effectively achieved in the high-dimensional one.
[Figure: a linearly inseparable input space mapped into a higher-dimensional, linearly separable feature space; image from CSDN post 74418365]
One of the more popular kernel functions is the radial basis (RBF) kernel:

$$k(x, y) = \exp\left( \frac{-\|x - y\|^2}{2\sigma^2} \right)$$
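As a sketch, the RBF kernel value between every row of a data matrix X and a single sample A could be computed like this (rbfKernel is a hypothetical name; sigma is the user-chosen "reach" parameter):

```python
import numpy as np

def rbfKernel(X, A, sigma):
    """Radial basis kernel between each row of X (np.mat, m x n) and row vector A (1 x n)."""
    m = np.shape(X)[0]
    K = np.mat(np.zeros((m, 1)))
    for j in range(m):
        deltaRow = X[j, :] - A
        K[j] = deltaRow * deltaRow.T              # squared Euclidean distance
    return np.exp(K / (-2 * sigma ** 2))          # exp(-||x - y||^2 / (2*sigma^2))
```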
Because I am not yet very familiar with kernel functions, and the dataset used here is fairly simple, no kernel function is used and I will not go into more detail. Readers who want to understand the concrete algorithm implementation can refer to http://www.cnblogs.com/jerrylead/archive/2011/03/18/1988406.html
Six Summary
The support vector machine has good learning ability, and its results generalize well. The complete SMO algorithm improves optimization speed to a great extent. However, it is sensitive to parameter tuning and kernel function selection, and the original classifier must be modified to handle problems beyond binary classification. It is suitable for both numeric and nominal data types.