Classification algorithms are closely tied to everyday life and are among the most widely used algorithms in data mining today. Typical examples: given a historical record of temperature and humidity together with whether it rained, use that history as a learning set to decide whether it will rain tomorrow; or detect bank credit card fraud.
A classification problem starts from a learning set: a discriminant function is constructed from the learning set, and the individuals we need to classify are then assigned to a class by evaluating that discriminant function.
Common classification models and algorithms
Traditional methods:
1. linear discriminant method
2. distance discriminant method
3. Bayesian classifier
Modern methods:
1. decision tree
2. support vector machine
3. neural networks
Linear discriminant method:
Weather forecast data (x1 and x2 are temperature and humidity respectively; G indicates whether it rained):
G <- c(1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2)
x1 <- c(-1.9,-6.9,5.2,5.0,7.3,6.8,0.9,-12.5,1.5,3.8,0.2,-0.1,0.4,2.7,2.1,-4.6,-1.7,-2.6,2.6,-2.8)
x2 <- c(3.2,0.4,2.0,2.5,0.0,12.7,-5.4,-2.5,1.3,6.8,6.2,7.5,14.6,8.3,0.8,4.3,10.9,13.1,12.8,10.0)
a <- data.frame(G, x1, x2)
plot(x1, x2)                 # scatter plot of the learning set
text(x1, x2, G, adj = -0.5)  # label each point with its class
In the resulting plot, the points labeled 1 lie mainly in the lower-right region and the points labeled 2 mainly in the upper region, so to the naked eye the two groups are fairly well separated. The principle of the linear discriminant method is to find a straight line in the plane such that the points of learning set 1 fall on one side of the line and the points of learning set 2 fall on the other side.
The discriminant is allowed some error, as long as it stays within an acceptable range.
In R this is expressed as follows:
library(MASS)
ld <- lda(G ~ x1 + x2)    # fit a linear discriminant model
z <- predict(ld)
newG <- z$class           # predicted class for each observation
y <- cbind(G, z$x, newG)  # true class, discriminant score, predicted class
Printing ld shows that the prior probabilities are computed first (classes 1 and 2 each account for 50%), followed by the group means of x1 and x2; the algebraic coefficients of the discriminant function are given at the end.
In the table y, the newG column holds the predictions of the discriminant. Comparing it with G shows that only one of the twenty observations is misclassified. When the value of the discriminant function is positive the observation belongs to class 2, and when it is negative it belongs to class 1.
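A quick way to verify this is to cross-tabulate the true classes against the predictions; a minimal check using the variables defined above:

# confusion matrix: off-diagonal entries are misclassifications
table(G, newG)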
Distance discriminant method
Compute the distance between the point to be classified and each class, and assign the point to the nearest class. The choice of distance is critical; the most common one is the Mahalanobis distance.
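For reference, the squared Mahalanobis distance of a point x from a population with mean vector \mu and covariance matrix \Sigma is

d^2(x, \mu) = (x - \mu)^{\top} \Sigma^{-1} (x - \mu)

and this is exactly the quantity returned by R's mahalanobis() function, which the code below relies on.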
R does not ship with a ready-made distance discriminant function, so we write one by hand:
mydiscriminiant <- function(TrnX1, TrnX2, TstX = NULL, var.equal = FALSE) {
  # if no test set is given, classify the training data itself
  if (is.null(TstX) == TRUE) TstX <- rbind(TrnX1, TrnX2)
  if (is.vector(TstX) == TRUE) TstX <- t(as.matrix(TstX))
  else if (is.matrix(TstX) != TRUE) TstX <- as.matrix(TstX)
  if (is.matrix(TrnX1) != TRUE) TrnX1 <- as.matrix(TrnX1)
  if (is.matrix(TrnX2) != TRUE) TrnX2 <- as.matrix(TrnX2)
  nx <- nrow(TstX)
  blong <- matrix(rep(0, nx), nrow = 1, byrow = TRUE,
                  dimnames = list("blong", 1:nx))
  mu1 <- colMeans(TrnX1); mu2 <- colMeans(TrnX2)
  if (var.equal == TRUE || var.equal == T) {
    # pooled covariance when both classes share one covariance matrix
    S <- var(rbind(TrnX1, TrnX2))
    w <- mahalanobis(TstX, mu2, S) - mahalanobis(TstX, mu1, S)
  } else {
    S1 <- var(TrnX1); S2 <- var(TrnX2)
    w <- mahalanobis(TstX, mu2, S2) - mahalanobis(TstX, mu1, S1)
  }
  # w > 0 means the point is closer to class 1
  for (i in 1:nx) {
    if (w[i] > 0) blong[i] <- 1 else blong[i] <- 2
  }
  blong
}
Save this as mydiscriminiant.R in the working directory, then call it from the console:
classX1 <- data.frame(
  X1 = c(6.6, 6.6, 6.1, 6.1, 8.4, 7.2, 8.4, 7.5, 7.5, 8.3, 7.8, 7.8),
  X2 = c(39, 39, 47, 47, 32, 6, 113, 52, 52, 113, 172, 172),
  X3 = c(1, 1, 1, 1, 2, 1, 3.5, 1, 3.5, 0, 1, 1.5))
classX2 <- data.frame(
  X1 = c(8.4, 8.4, 8.4, 6.3, 7, 7, 7, 8.3, 8.3, 7.2, 7.2, 7.2, 5.5, 8.4,
         8.4, 7.5, 7.5, 8.3, 8.3, 8.3, 8.3, 7.8, 7.8),
  X2 = c(32, 32, 32, 11, 8, 8, 8, 161, 161, 6, 6, 6, 6, 113, 113, 52, 52,
         97, 97, 89, 56, 172, 283),
  X3 = c(1, 2, 2.5, 4.5, 4.5, 6, 1.5, 1.5, 0.5, 3.5, 1.0, 1.0, 2.5, 3.5,
         3.5, 1, 1, 0, 2.5, 0, 1.5, 1, 1))
source("mydiscriminiant.R")
mydiscriminiant(classX1, classX2, var.equal = TRUE)
The returned vector blong shows which class each individual is assigned to.
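The TstX argument lets the same function classify fresh observations as well; a minimal sketch, with made-up values for X1, X2 and X3:

# classify one hypothetical new individual
mydiscriminiant(classX1, classX2, TstX = c(7.0, 50, 1.2), var.equal = TRUE)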
Bayesian classifier:
A Bayesian classifier computes the probability that an individual belongs to each class and chooses the class accordingly. Consider discrimination between two populations X1 and X2 with probability density functions f1(x) and f2(x), and let R1 and R2 be the regions of the sample space in which an observation is classified as X1 and X2 respectively. The probability that a sample actually from X1 is misjudged as X2 is:

P(2|1) = \int_{R_2} f_1(x)\,dx

The probability that a sample from X2 is misjudged as X1 follows by the symmetric conversion:

P(1|2) = \int_{R_1} f_2(x)\,dx

The probability that a sample from X1 is correctly judged as X1 is:

P(1|1) = \int_{R_1} f_1(x)\,dx

and P(2|2), the probability that a sample from X2 is correctly judged as X2, is analogous.

Let p1 and p2 denote the prior probabilities of X1 and X2, and let L(1|2) denote the loss incurred when X2 is misjudged as X1 (L(2|1) likewise). To classify as accurately as possible, we want the average misclassification loss, the expected cost of misclassification (ECM), to be as small as possible:

ECM = L(2|1)\,P(2|1)\,p_1 + L(1|2)\,P(1|2)\,p_2

Minimizing the ECM gives the Bayes discriminant rule: classify x into X1 when

\frac{f_1(x)}{f_2(x)} \ge \frac{L(1|2)\,p_2}{L(2|1)\,p_1}

and into X2 otherwise.
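Under the additional assumption that both populations are multivariate normal, taking twice the logarithm of this rule turns it into a comparison of squared Mahalanobis distances, which is what the implementation below relies on:

d^2(x,\mu_2;\Sigma_2) - d^2(x,\mu_1;\Sigma_1) \ge 2\ln\frac{L(1|2)\,p_2}{L(2|1)\,p_1} + \ln\frac{|\Sigma_1|}{|\Sigma_2|}

The right-hand side is the threshold beta computed inside the function, and the ratio (L(1|2) p2) / (L(2|1) p1) is passed in as its rate argument.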
Following the derivation above, we build our own two-population Bayes discriminant function:
mybayes <- function(TrnX1, TrnX2, rate = 1, TstX = NULL, var.equal = FALSE) {
  if (is.null(TstX) == TRUE) TstX <- rbind(TrnX1, TrnX2)
  if (is.vector(TstX) == TRUE) TstX <- t(as.matrix(TstX))
  else if (is.matrix(TstX) != TRUE) TstX <- as.matrix(TstX)
  if (is.matrix(TrnX1) != TRUE) TrnX1 <- as.matrix(TrnX1)
  if (is.matrix(TrnX2) != TRUE) TrnX2 <- as.matrix(TrnX2)
  nx <- nrow(TstX)
  blong <- matrix(rep(0, nx), nrow = 1, byrow = TRUE,
                  dimnames = list("blong", 1:nx))
  mu1 <- colMeans(TrnX1); mu2 <- colMeans(TrnX2)
  if (var.equal == TRUE || var.equal == T) {
    S <- var(rbind(TrnX1, TrnX2))
    # with equal covariances the determinant term vanishes
    beta <- 2 * log(rate)
    w <- mahalanobis(TstX, mu2, S) - mahalanobis(TstX, mu1, S)
  } else {
    S1 <- var(TrnX1); S2 <- var(TrnX2)
    beta <- 2 * log(rate) + log(det(S1) / det(S2))
    w <- mahalanobis(TstX, mu2, S2) - mahalanobis(TstX, mu1, S1)
  }
  for (i in 1:nx) {
    if (w[i] > beta) blong[i] <- 1 else blong[i] <- 2
  }
  blong
}
Taking the weather forecast as a case again, let's see how to use the Bayes classifier.
We enter data in the console:
TrnX1 <- matrix(c(24.8, 24.1, 26.6, 23.5, 25.5, 27.4,
                  -2, -2.4, -3, -1.9, -2.1, -3.1), ncol = 2)
TrnX2 <- matrix(c(22.1, 21.6, 22, 22.8, 22.7, 21.5, 22.1, 21.4,
                  -0.7, -1.4, -0.8, -1.6, -1.5, -1, -1.2, -1.3), ncol = 2)
source("mybayes.R")
mybayes(TrnX1, TrnX2, rate = 8/6)
All of the samples are judged correctly.
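To predict for a new day, pass its measurements via TstX; a minimal sketch with made-up values for the two variables:

# classify one hypothetical new observation
mybayes(TrnX1, TrnX2, rate = 8/6, TstX = c(23.0, -1.8))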