References: http://www.cppblog.com/sunrise/archive/2012/08/06/186474.html and http://blog.csdn.net/sunanger_wang/article/details/7887218
My data mining algorithm code: https://github.com/linyiqun/DataMiningAlgorithm
Introduction
SVM (Support Vector Machine) is a machine learning algorithm for pattern recognition and pattern classification. Its main idea can be summed up in two points: (1) it handles the linearly separable case directly; (2) for the linearly non-separable case, a kernel function maps the low-dimensional, non-separable data into a higher-dimensional space where it becomes linearly separable, and the analysis then proceeds as before. Good SVM packages already exist; in the latter part of this article I give my implementation of a classifier built on the LIBSVM package.
Principles of the SVM algorithm
The principles of the SVM algorithm fall into two parts: the linearly separable case and the linearly non-separable case. First, the linearly separable case:
The linearly separable case
Here is what it looks like in a two-dimensional space:
The line in the middle is the separating line, which we can write as f(x) = w·x + b, where w and x are vectors. If we nudge such a separating line a little, it may still classify correctly, so there are many candidate lines; what we are looking for is the best one, which means finding the critical condition of the separation.
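As a toy illustration, the decision function f(x) = w·x + b can be coded directly. The weight vector and bias below are made-up values for demonstration, not anything learned by an SVM:

```java
// Toy illustration of the decision function f(x) = w·x + b.
// The weights and bias are hypothetical values, not learned parameters.
public class DecisionLine {
    static double f(double[] w, double[] x, double b) {
        double dot = 0;
        for (int i = 0; i < w.length; i++) {
            dot += w[i] * x[i];
        }
        return dot + b;
    }

    public static void main(String[] args) {
        double[] w = {1.0, -1.0};
        double b = 0.5;
        double[] point = {2.0, 1.0};
        // the sign of f(x) tells which side of the line the point falls on
        System.out.println(f(w, point, b) >= 0 ? "+1" : "-1"); // prints "+1"
    }
}
```

The classifier simply reads off the sign of f(x); everything in the training stage is about choosing w and b well.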
As shown above, the best classification is the one that makes the margin as large as possible, since a larger margin gives the most reliable separation. Skipping some of the mathematical derivation, the quantity to maximize is the margin width 2/||w||.
Conversely, this is the same as minimizing the denominator: minimize (1/2)||w||². Of course there is a constraint: the line must actually classify the samples, meaning that substituting each training sample into the formula puts it on the correct side. The constraint is therefore y_i(w·x_i + b) ≥ 1 for every training sample (x_i, y_i).
"s.t." means "subject to", introducing the constraints that follow; together with the objective this is the final statement of the problem. A series of transformations (introducing Lagrange multipliers α_i and passing to the dual) eventually turns it into: maximize Σ_i α_i − (1/2) Σ_{i,j} α_i α_j y_i y_j (x_i·x_j), subject to α_i ≥ 0 and Σ_i α_i y_i = 0.
This dual is the formula we ultimately need to optimize; with it, the optimal formulation of the linear problem is in hand. If at this point you ask me how to solve it, I am sorry to tell you that I don't know (I regret not studying harder in advanced math...).
The linearly non-separable case
Again, a picture:
No straight line separates A from B; only a curve can divide them. This is where the kernel functions mentioned in the introduction come into play.
Choosing different kernel functions produces different SVMs. Four kernels are in common use:
(1) linear kernel: k(x, y) = x·y;
(2) polynomial kernel: k(x, y) = [(x·y) + 1]^d;
(3) radial basis function (RBF) kernel: k(x, y) = exp(−|x − y|² / d²);
(4) two-layer neural network (sigmoid) kernel: k(x, y) = tanh(a(x·y) + b).
In practice we also add a penalty factor C and an ε threshold to trade off fault tolerance against accuracy.
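These four kernels are simple to compute directly. The class below is a hypothetical helper of my own (the names are not from LIBSVM), just to make the formulas concrete:

```java
// Hypothetical helper class implementing the four common kernels above.
// These names and signatures are my own, not part of any library.
public class Kernels {
    // linear kernel: k(x, y) = x · y
    public static double linear(double[] x, double[] y) {
        double dot = 0;
        for (int i = 0; i < x.length; i++) {
            dot += x[i] * y[i];
        }
        return dot;
    }

    // polynomial kernel: k(x, y) = [(x · y) + 1]^d
    public static double polynomial(double[] x, double[] y, int d) {
        return Math.pow(linear(x, y) + 1, d);
    }

    // radial basis function kernel: k(x, y) = exp(-|x - y|^2 / d^2)
    public static double rbf(double[] x, double[] y, double d) {
        double dist2 = 0;
        for (int i = 0; i < x.length; i++) {
            double diff = x[i] - y[i];
            dist2 += diff * diff;
        }
        return Math.exp(-dist2 / (d * d));
    }

    // sigmoid (two-layer neural network) kernel: k(x, y) = tanh(a(x · y) + b)
    public static double sigmoid(double[] x, double[] y, double a, double b) {
        return Math.tanh(a * linear(x, y) + b);
    }

    public static void main(String[] args) {
        double[] x = {1.0, 2.0};
        double[] y = {3.0, 4.0};
        System.out.println(linear(x, y));        // prints 11.0
        System.out.println(polynomial(x, y, 2)); // prints 144.0
        System.out.println(rbf(x, x, 1.0));      // prints 1.0 (identical inputs)
    }
}
```

Note that a kernel never materializes the high-dimensional mapping; it only returns the inner product that the mapping would have produced.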
The constraints then become (with slack variables ξ_i ≥ 0 allowing some misclassification): y_i(w·x_i + b) ≥ 1 − ξ_i for every training sample.
That is how the kernel function converts a linearly non-separable problem into a linearly separable one. Most of the derivation has been omitted throughout; for details, see the two links at the top of this article.
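The conversion the kernel performs can be seen in a toy example. The data values below are made up for illustration: three inner points of class A sit between two outer points of class B on a line, so no single threshold separates them, but after the hypothetical mapping x → (x, x²) a straight line does:

```java
// Toy demonstration of lifting non-separable 1-D data into 2-D.
// The points and the mapping x -> (x, x^2) are illustrative assumptions.
public class LiftDemo {
    // map a 1-D point into 2-D: (x, x^2)
    static double[] lift(double x) {
        return new double[]{x, x * x};
    }

    public static void main(String[] args) {
        double[] classA = {-1, 0, 1}; // inner points, not separable by a threshold
        double[] classB = {-3, 3};    // outer points
        // after lifting, the horizontal line x^2 = 4 separates the classes
        for (double x : classA) {
            System.out.println("A: second coordinate " + lift(x)[1] + " < 4");
        }
        for (double x : classB) {
            System.out.println("B: second coordinate " + lift(x)[1] + " > 4");
        }
    }
}
```

A real kernel SVM never computes this lifted representation explicitly; it reaches the same separating surface through inner products alone.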
Algorithm implementation of SVM
Here I use the LIBSVM library to do pattern classification. The main steps are:
1. Read in the training set data.
2. Build the svm_problem parameter from the training set data.
3. Set the SVM type and kernel function type in the svm_parameter object.
4. Build the classification model from svm_problem and svm_parameter.
5. Finally, output predicted values for the test data through the model.
SVMTool utility class code:
package DataMining_SVM;

import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.util.ArrayList;
import java.util.List;

import DataMining_SVM.libsvm.svm;
import DataMining_SVM.libsvm.svm_model;
import DataMining_SVM.libsvm.svm_node;
import DataMining_SVM.libsvm.svm_parameter;
import DataMining_SVM.libsvm.svm_problem;

/**
 * SVM support vector machine tool class
 *
 * @author lyq
 */
public class SVMTool {
    // training set data file path
    private String trainDataPath;
    // svm_problem object, used to construct the SVM model
    private svm_problem sProblem;
    // SVM parameters: the SVM type and the kernel function type
    private svm_parameter sParam;

    public SVMTool(String trainDataPath) {
        this.trainDataPath = trainDataPath;
        // initialize the SVM-related variables
        sProblem = initSvmProblem();
        sParam = initSvmParam();
    }

    /**
     * svm_problem object: training set data configuration
     */
    private svm_problem initSvmProblem() {
        List<Double> label = new ArrayList<Double>();
        List<svm_node[]> nodeSet = new ArrayList<svm_node[]>();
        getData(nodeSet, label, trainDataPath);

        int dataRange = nodeSet.get(0).length;
        // the vector table of the training set
        svm_node[][] datas = new svm_node[nodeSet.size()][dataRange];
        for (int i = 0; i < datas.length; i++) {
            for (int j = 0; j < dataRange; j++) {
                datas[i][j] = nodeSet.get(i)[j];
            }
        }
        // the labels corresponding to the vectors
        double[] lables = new double[label.size()];
        for (int i = 0; i < lables.length; i++) {
            lables[i] = label.get(i);
        }

        // define the svm_problem object
        svm_problem problem = new svm_problem();
        problem.l = nodeSet.size(); // number of vectors
        problem.x = datas;          // training set vector table
        problem.y = lables;         // corresponding label array
        return problem;
    }

    /**
     * Initialize the SVM parameters, including the SVM type and the kernel type
     */
    private svm_parameter initSvmParam() {
        svm_parameter param = new svm_parameter();
        param.svm_type = svm_parameter.EPSILON_SVR;
        // set the kernel type of the SVM to linear
        param.kernel_type = svm_parameter.LINEAR;
        // the parameters below are tuned only for this training set
        param.cache_size = 100;
        param.eps = 0.00001;
        param.C = 1.9;
        return param;
    }

    /**
     * Predict data values through the SVM
     *
     * @param testDataPath
     */
    public void svmPredictData(String testDataPath) {
        // read the test data
        List<Double> testLabel = new ArrayList<Double>();
        List<svm_node[]> testNodeSet = new ArrayList<svm_node[]>();
        getData(testNodeSet, testLabel, testDataPath);

        int dataRange = testNodeSet.get(0).length;
        // the vector table of the test set
        svm_node[][] testDatas = new svm_node[testNodeSet.size()][dataRange];
        for (int i = 0; i < testDatas.length; i++) {
            for (int j = 0; j < dataRange; j++) {
                testDatas[i][j] = testNodeSet.get(i)[j];
            }
        }
        // the true values of the test data, compared against the SVM predictions below
        double[] testLables = new double[testLabel.size()];
        for (int i = 0; i < testLables.length; i++) {
            testLables[i] = testLabel.get(i);
        }

        // svm.svm_check_parameter() returns null if the parameters are valid,
        // otherwise an error description; the check matters because some
        // parameters only apply to certain SVM types
        System.out.println(svm.svm_check_parameter(sProblem, sParam));
        System.out.println("------------test parameters-----------");
        // train the SVM classification model
        svm_model model = svm.svm_train(sProblem, sParam);
        // predict the labels of the test data
        for (int i = 0; i < testDatas.length; i++) {
            double trueValue = testLables[i]; // true value of the test data
            System.out.print(trueValue + " ");
            double predictValue = svm.svm_predict(model, testDatas[i]); // predicted value
            System.out.println(predictValue);
        }
    }

    /**
     * Read data from a file
     *
     * @param nodeSet  vector nodes
     * @param label    label values of the nodes
     * @param filename data file path
     */
    private void getData(List<svm_node[]> nodeSet, List<Double> label, String filename) {
        try {
            FileReader fr = new FileReader(new File(filename));
            BufferedReader br = new BufferedReader(fr);
            String line = null;
            while ((line = br.readLine()) != null) {
                String[] datas = line.split(",");
                svm_node[] vector = new svm_node[datas.length - 1];
                for (int i = 0; i < datas.length - 1; i++) {
                    svm_node node = new svm_node();
                    node.index = i + 1;
                    node.value = Double.parseDouble(datas[i]);
                    vector[i] = node;
                }
                nodeSet.add(vector);
                double lableValue = Double.parseDouble(datas[datas.length - 1]);
                label.add(lableValue);
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
The calling class:
/**
 * SVM support vector machine scenario invocation class
 *
 * @author lyq
 */
public class Client {
    public static void main(String[] args) {
        // training set data file path
        String trainDataPath = "C:\\Users\\lyq\\Desktop\\icon\\trainInput.txt";
        // test data file path
        String testDataPath = "C:\\Users\\lyq\\Desktop\\icon\\testInput.txt";

        SVMTool tool = new SVMTool(trainDataPath);
        // classify the test data with the SVM
        tool.svmPredictData(testDataPath);
    }
}
The contents of the input files:
Training set data trainInput.txt:
17.6,17.7,17.7,17.7,17.8
17.7,17.7,17.7,17.8,17.8
17.7,17.7,17.8,17.8,17.9
17.7,17.8,17.8,17.9,18
17.8,17.8,17.9,18,18.1
17.8,17.9,18,18.1,18.2
17.9,18,18.1,18.2,18.4
18,18.1,18.2,18.4,18.6
18.1,18.2,18.4,18.6,18.7
18.2,18.4,18.6,18.7,18.9
18.4,18.6,18.7,18.9,19.1
18.6,18.7,18.9,19.1,19.3
Test data set testInput.txt:
18.7,18.9,19.1,19.3,19.6
18.9,19.1,19.3,19.6,19.9
19.1,19.3,19.6,19.9,20.2
19.3,19.6,19.9,20.2,20.6
19.6,19.9,20.2,20.6,21
19.9,20.2,20.6,21,21.5
20.2,20.6,21,21.5,22
The output is:
null
------------test parameters-----------
..................*
optimization finished, #iter = 452
nu = 0.8563102916247203
obj = -0.8743284941628513, rho = 3.4446523008525705
nSV = ..., nBSV = 9
19.6 19.55027201691905
19.9 19.8455473606175
20.2 20.175593628188604
20.6 20.54041081963737
21.0 20.955769858833488
21.5 21.405899821905447
22.0 21.94590866154817
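As a side note, each input row above is a sliding window over a single numeric series, with the first columns as features and the last as the label. A small hypothetical helper (my own, not part of LIBSVM) that generates such rows from a series:

```java
// Hypothetical helper that turns a numeric series into sliding-window CSV
// rows like the sample files above; this layout is my reading of the data,
// not something LIBSVM itself requires.
import java.util.ArrayList;
import java.util.List;

public class WindowWriter {
    static List<String> windows(double[] series, int width) {
        List<String> rows = new ArrayList<String>();
        for (int start = 0; start + width <= series.length; start++) {
            StringBuilder sb = new StringBuilder();
            for (int j = 0; j < width; j++) {
                if (j > 0) {
                    sb.append(',');
                }
                sb.append(series[start + j]);
            }
            rows.add(sb.toString());
        }
        return rows;
    }

    public static void main(String[] args) {
        double[] series = {17.6, 17.7, 17.7, 17.7, 17.8, 17.8};
        for (String row : windows(series, 5)) {
            System.out.println(row);
        }
        // prints:
        // 17.6,17.7,17.7,17.7,17.8
        // 17.7,17.7,17.7,17.8,17.8
    }
}
```

With this reading, the task solved above is one-step-ahead prediction of the series via ε-SVR, which is why the predicted values track the true values so closely.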