SVM is a commonly used supervised learning model: you are given samples whose features are labeled Class A, and other samples whose features are labeled Class B; after training on them, the model decides which class new, unlabeled samples belong to.
The difference from K-means is that K-means is an unsupervised learning model: it needs no prior training, you simply hand it the features and it groups the samples according to those features alone.
The difference from KNN is that KNN compares every query against the entire training set each time, while SVM does the heavy lifting up front (training) and then classifies each new sample directly from the trained result.
So KNN struggles when the data set is very large or the dimension is very high, because every prediction means many comparisons against the stored data (even with indexing optimizations there are still a lot of them), whereas SVM has a train-once, predict-cheaply feel.
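To make the prediction-cost contrast concrete, here is a rough Python illustration (the data, labels, and hyperplane are all made up for the sketch): 1-NN must scan all N training points per query, while a trained linear SVM evaluates a single dot product.

```python
import numpy as np

rng = np.random.default_rng(0)
N, d = 10_000, 3
train = rng.normal(size=(N, d))
labels = np.sign(train[:, 0])            # toy labels: sign of the first feature
query = rng.normal(size=d)

# KNN (k=1): every prediction scans the whole training set -> O(N*d) per query
dists = np.linalg.norm(train - query, axis=1)
knn_pred = labels[dists.argmin()]

# Linear SVM: training already produced (w, c); prediction is one dot product -> O(d)
w, c = np.array([1.0, 0.0, 0.0]), 0.0    # a made-up "trained" hyperplane
svm_pred = np.sign(query @ w - c)
print(knn_pred, svm_pred)
```

The per-query work is the whole point: the 10,000-row scan happens on every KNN prediction, while the SVM's scan-equivalent effort was paid once, during training.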
General flow of a linear SVM:
1. Standardize the data (subtract the mean, divide by the standard deviation).
2. Find a plane that separates the features, i.e. cast the problem as a linear program and solve it.
3. After training, plug the data to be predicted into the resulting decision function.
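The three steps above can be sketched outside MATLAB as well. Below is a minimal Python version of the same linear-programming formulation (minimize the sum of slacks subject to d_i(w·x_i − c) ≥ 1 − s_i), using scipy.optimize.linprog; the tiny two-class data set is made up and stands in for the spreadsheet.

```python
import numpy as np
from scipy.optimize import linprog

# Tiny synthetic two-class data (illustrative values, not the article's table)
X = np.array([[0.90, 0.34, 1.53],
              [0.88, 0.23, 1.67],
              [0.63, 0.15, 0.88],
              [0.58, 0.22, 1.42]])
d = np.array([1.0, 1.0, -1.0, -1.0])     # class labels +1 / -1

# Step 1: standardize each feature column (ddof=1 matches MATLAB's std)
Xs = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

m, n = Xs.shape
# Step 2: decision variables z = [w (n), c (1), s (m slacks)];
# objective: minimize the sum of slacks
f = np.concatenate([np.zeros(n + 1), np.ones(m)])
# constraint d_i*(w . x_i - c) >= 1 - s_i, rewritten as A z <= -1
A = np.hstack([-Xs * d[:, None], d[:, None], -np.eye(m)])
b = -np.ones(m)
# w and c are free; slacks are non-negative
bounds = [(None, None)] * (n + 1) + [(0, None)] * m
res = linprog(f, A_ub=A, b_ub=b, bounds=bounds)

# Step 3: classify with the trained hyperplane
w, c = res.x[:n], res.x[n]
pred = np.sign(Xs @ w - c)
print(pred)
```

Because this toy data is linearly separable, the optimal slacks are zero and every sample lands on the correct side of the plane.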
Here is a training set. I want to separate the samples into two categories according to each asset's liquidity and profitability.
| Number | Asset liquidity | Earning power | Activity | Category |
|--------|-----------------|---------------|----------|----------|
| 1  | 0.9  | 0.34 | 1.53 | 1  |
| 2  | 0.88 | 0.23 | 1.67 | 1  |
| 3  | 0.92 | 0.28 | 1.43 | 1  |
| 4  | 0.89 | 0.14 | 1.24 | 1  |
| 5  | 0.78 | 0.35 | 1.8  | 1  |
| 6  | 0.81 | 0.26 | 2.01 | 1  |
| 7  | 0.72 | 0.18 | 1.75 | 1  |
| 8  | 0.93 | 0.22 | 0.99 | 1  |
| 9  | 0.82 | 0.26 | 1.4  | 1  |
| 10 | 0.78 | 0.26 | 1.34 | -1 |
| 11 | 0.78 | 0.27 | 1.67 | -1 |
| 12 | 0.72 | 0.18 | 1.53 | -1 |
| 13 | 0.69 | 0.16 | 1.2  | -1 |
| 14 | 0.63 | 0.15 | 0.88 | -1 |
| 15 | 0.58 | 0.22 | 1.42 | -1 |
| 16 | 0.81 | 0.18 | 1.59 | -1 |
| 17 | 0.67 | 0.21 | 1.21 | -1 |
| 18 | 0.65 | 0.16 | 1.37 | -1 |
Save the data as Svm.xls, then run the following code:
clc;
clear all;
x0 = xlsread('Svm.xls', 'B2:E19');
for i = 1:3                          % standardize each input feature
    x(:,i) = (x0(:,i) - mean(x0(:,i))) / std(x0(:,i));
end
[m, n] = size(x);
e = ones(m, 1);
d = x0(:, 4);                        % training labels (+1 / -1)
A  = [-x(:,1).*d, -x(:,2).*d, -x(:,3).*d, d, -eye(m)];  % d_i*(w'*x_i - c) >= 1 - s_i
B  = -e;
f  = [0, 0, 0, 0, ones(1, m)];       % objective: minimize the sum of slacks
lb = [-inf, -inf, -inf, -inf, zeros(1, m)]';
X  = linprog(f, A, B, [], [], lb);   % X = [w1; w2; w3; c; s1..sm]
w  = [X(1,1), X(2,1), X(3,1)];       % hyperplane normal
cc = X(4,1);                         % hyperplane offset
r1 = x*w' - cc;                      % decision values
r2 = sign(r1);                       % predicted classes
disp('Program output:');
r  = [r1, r2]
A few samples come out slightly wrong, but the error is acceptable.
However, problems are not always two-class. The wine problem, for example, has 4 or 5 classes; what then? (That particular problem is unsupervised, so hierarchical clustering or K-means handles it directly.)
Suppose I have three categories to separate: A, B, and C. When building the training sets I extract:
(1) the vectors of class A as the positive set (+1), and the vectors of B and C as the negative set (the rest become -1);
(2) the vectors of class B as the positive set, and the vectors of A and C as the negative set;
(3) the vectors of class C as the positive set, and the vectors of A and B as the negative set.
Train on these three training sets separately to obtain three trained classifiers.
At test time, run each test vector through all three trained classifiers.
Each test then yields three decision values f1(x), f2(x), f3(x).
The final classification is the class whose value is the largest of the three.
In other words, the procedure is:
1. Organize the data into three xls files.
2. In the i-th xls file (representing class i), samples of class i are labeled 1 and the rest -1.
3. Fit a separating plane for each file.
4. Plug in the data to be predicted.
5. For each sample, the xls file (classifier) that gives the maximum decision value determines its class i.
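Step 5 boils down to a per-row arg-max once the three decision values of each sample are collected into a matrix. A small Python illustration (the scores below are made up, loosely shaped like the output further down):

```python
import numpy as np

# Rows: samples; columns: decision values from the class-1, class-2, class-3
# one-vs-rest classifiers.
R = np.array([[91.48, -0.79, -1.22],
              [-0.75,  0.57, -0.31],
              [-1.00, -0.08,  1.19]])

best_value = R.max(axis=1)           # the winning decision value per sample
best_class = R.argmax(axis=1) + 1    # 1-based class index, as in this article
print(best_class)
```

Each sample is assigned to the class whose classifier is most confident it is "positive"; ties are rare in practice and argmax simply picks the first.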
The new data set is as follows:
| Number | Asset liquidity | Earning power | Activity | Category |
|--------|-----------------|---------------|----------|----------|
| 1  | 0.9  | 0.34 | 1.53 | 1 |
| 2  | 0.88 | 0.23 | 1.67 | 1 |
| 3  | 0.92 | 0.28 | 1.43 | 1 |
| 4  | 0.89 | 0.14 | 1.24 | 1 |
| 5  | 0.78 | 0.35 | 1.8  | 1 |
| 6  | 0.81 | 0.26 | 2.01 | 1 |
| 7  | 0.72 | 0.18 | 1.75 | 2 |
| 8  | 0.93 | 0.22 | 0.99 | 2 |
| 9  | 0.82 | 0.26 | 1.4  | 2 |
| 10 | 0.78 | 0.26 | 1.34 | 2 |
| 11 | 0.78 | 0.27 | 1.67 | 2 |
| 12 | 0.72 | 0.18 | 1.53 | 2 |
| 13 | 0.69 | 0.16 | 1.2  | 3 |
| 14 | 0.63 | 0.15 | 0.88 | 3 |
| 15 | 0.58 | 0.22 | 1.42 | 3 |
| 16 | 0.81 | 0.18 | 1.59 | 3 |
| 17 | 0.67 | 0.21 | 1.21 | 3 |
| 18 | 0.65 | 0.16 | 1.37 | 3 |
The three relabeled copies of the data are saved as Svm2.xls, Svm3.xls, and Svm4.xls.
The code is as follows:
clc;
clear all;
files = {'Svm2.xls', 'Svm3.xls', 'Svm4.xls'};   % one file per class (one-vs-rest)
r = [];
for k = 1:3
    x0 = xlsread(files{k}, 'B2:E19');
    for i = 1:3                      % standardize each input feature
        x(:,i) = (x0(:,i) - mean(x0(:,i))) / std(x0(:,i));
    end
    [m, n] = size(x);
    e = ones(m, 1);
    d = x0(:, 4);                    % labels: +1 for class k, -1 for the rest
    A  = [-x(:,1).*d, -x(:,2).*d, -x(:,3).*d, d, -eye(m)];
    B  = -e;
    f  = [0, 0, 0, 0, ones(1, m)];   % minimize the sum of slacks
    lb = [-inf, -inf, -inf, -inf, zeros(1, m)]';
    X  = linprog(f, A, B, [], [], lb);
    w  = [X(1,1), X(2,1), X(3,1)];
    cc = X(4,1);
    r1 = x*w' - cc;                  % decision values of classifier k
    r  = [r, r1];
end
for i = 1:size(r, 1)                 % pick the classifier with the largest value
    [c, idx] = max(r(i, :));
    t(i, 1) = c;                     % winning decision value
    t(i, 2) = idx;                   % predicted class
end
disp('Output:');
t
t =
91.4813 1.0000
105.2576 1.0000
91.4428 1.0000
5.0052 1.0000
10.0521 1.0000
105.9796 1.0000
-0.7916 2.0000
0.5677 2.0000
-0.7547 2.0000
-1.0000 3.0000
-1.2244 2.0000
-0.3183 3.0000
1.1881 3.0000
2.9200 3.0000
1.2706 3.0000
-0.0850 2.0000
1.0000 3.0000
1.1044 3.0000
Most samples are classified correctly; a few near the class boundaries are not, which is acceptable.