References
[1] Spark MLlib Machine Learning Practice
[2] Statistical Learning Methods
1. Logistic distribution
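Let X be a continuous random variable. X follows a logistic distribution if it has the following distribution function and density function:
$$F(x) = P(X \le x) = \frac{1}{1 + e^{-(x-\mu)/\gamma}}$$
$$f(x) = F'(x) = \frac{e^{-(x-\mu)/\gamma}}{\gamma \left(1 + e^{-(x-\mu)/\gamma}\right)^{2}}$$
where μ is the location parameter and γ > 0 is the shape parameter.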
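The graph of the distribution function F(x) is an S-shaped curve that is point-symmetric about (μ, 1/2), that is:
$$F(-x + \mu) - \frac{1}{2} = -F(x + \mu) + \frac{1}{2}$$
The curve rises fastest near the center and slowly toward both ends. The smaller the shape parameter γ, the faster it rises near the center.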
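The two properties above (symmetry about (μ, 1/2) and a steeper center for smaller γ) can be checked numerically. The following is a minimal sketch in plain Scala, without Spark; the object name `LogisticDistribution`, the function names, and the parameter values are chosen here for illustration and are not from the original post.

```scala
object LogisticDistribution {
  // Distribution function: F(x) = 1 / (1 + exp(-(x - mu) / gamma))
  def cdf(x: Double, mu: Double, gamma: Double): Double =
    1.0 / (1.0 + math.exp(-(x - mu) / gamma))

  // Density function: f(x) = exp(-(x - mu) / gamma) / (gamma * (1 + exp(-(x - mu) / gamma))^2)
  def pdf(x: Double, mu: Double, gamma: Double): Double = {
    val e = math.exp(-(x - mu) / gamma)
    e / (gamma * (1.0 + e) * (1.0 + e))
  }

  def main(args: Array[String]): Unit = {
    val mu = 1.0 // illustrative location parameter

    // The curve passes through (mu, 1/2) whatever gamma is.
    println(cdf(mu, mu, 0.5))

    // Point symmetry about (mu, 1/2): F(mu - d) + F(mu + d) should be 1 up to rounding.
    println(cdf(mu - 2.0, mu, 0.5) + cdf(mu + 2.0, mu, 0.5))

    // The slope of F at the center is f(mu) = 1 / (4 * gamma),
    // so a smaller gamma gives a steeper rise near the center.
    println(pdf(mu, mu, 0.5))
    println(pdf(mu, mu, 2.0))
  }
}
```

Since f(μ) = 1/(4γ), halving γ doubles the slope at the center, which is the sense in which a smaller γ makes the center part increase faster.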
2. Logistic regression model
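The binomial logistic regression model is a classification model represented by the conditional probability distribution P(Y | x), where the input x is a real-valued vector and the output Y takes the value 0 or 1. It is defined as:
$$P(Y = 1 \mid x) = \frac{\exp(w \cdot x + b)}{1 + \exp(w \cdot x + b)}$$
and
$$P(Y = 0 \mid x) = \frac{1}{1 + \exp(w \cdot x + b)}$$
where w is the weight vector, b is the bias, and w · x is the inner product of w and x.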
Given an input x, logistic regression compares the two conditional probabilities and assigns x to the class with the larger one. In essence, the model converts the output of the linear function w · x + b into a conditional probability.
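The decision rule can be sketched in a few lines of plain Scala. This is only an illustration of the two formulas and the comparison step, not how Spark implements it; the object name `BinomialLogistic` and the parameter values w = 1, b = -5 (which place the boundary at x = 5) are assumptions made for the example.

```scala
object BinomialLogistic {
  // Linear part of the model: w . x + b
  private def linear(w: Array[Double], b: Double, x: Array[Double]): Double = {
    require(w.length == x.length, "w and x must have the same dimension")
    w.zip(x).map { case (wi, xi) => wi * xi }.sum + b
  }

  // P(Y = 1 | x) = exp(w . x + b) / (1 + exp(w . x + b))
  def probOne(w: Array[Double], b: Double, x: Array[Double]): Double = {
    val e = math.exp(linear(w, b, x))
    e / (1.0 + e)
  }

  // P(Y = 0 | x) = 1 / (1 + exp(w . x + b))
  def probZero(w: Array[Double], b: Double, x: Array[Double]): Double =
    1.0 / (1.0 + math.exp(linear(w, b, x)))

  // Assign x to the class with the larger conditional probability.
  def classify(w: Array[Double], b: Double, x: Array[Double]): Int =
    if (probOne(w, b, x) > probZero(w, b, x)) 1 else 0

  def main(args: Array[String]): Unit = {
    val w = Array(1.0) // hypothetical weight
    val b = -5.0       // hypothetical bias, so that w . x + b = 0 at x = 5
    for (x <- Seq(2.0, 8.0)) {
      val input = Array(x)
      println(s"x = $x  P(Y=1|x) = ${probOne(w, b, input)}  class = ${classify(w, b, input)}")
    }
  }
}
```

Because P(Y = 1 | x) > P(Y = 0 | x) exactly when w · x + b > 0, the comparison of probabilities is equivalent to checking the sign of the linear function.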
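The multinomial logistic regression model extends the binomial model to multi-class problems. Assuming Y takes values in {1, 2, ..., K}, the model is:
$$P(Y = k \mid x) = \frac{\exp(w_k \cdot x + b_k)}{1 + \sum_{j=1}^{K-1} \exp(w_j \cdot x + b_j)}, \quad k = 1, 2, \ldots, K-1$$
$$P(Y = K \mid x) = \frac{1}{1 + \sum_{j=1}^{K-1} \exp(w_j \cdot x + b_j)}$$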
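As a rough illustration of the multinomial model, the sketch below computes the K class probabilities from K - 1 weight vectors, with class K acting as the reference class. It is plain Scala with no Spark dependency; the object name `MultinomialLogistic` and all parameter values are hypothetical.

```scala
object MultinomialLogistic {
  // ws(k) and bs(k) hold the parameters of class k + 1 for the first K - 1 classes.
  // The returned array has K entries; the last one is the reference class K.
  def probabilities(ws: Array[Array[Double]], bs: Array[Double], x: Array[Double]): Array[Double] = {
    require(ws.length == bs.length, "one bias per weight vector")
    require(ws.forall(_.length == x.length), "each w_k must match the dimension of x")
    // exp(w_k . x + b_k) for k = 1 .. K - 1
    val scores = ws.zip(bs).map { case (w, b) =>
      math.exp(w.zip(x).map { case (wi, xi) => wi * xi }.sum + b)
    }
    val denom = 1.0 + scores.sum
    scores.map(_ / denom) :+ (1.0 / denom)
  }

  // Class label in 1 .. K with the largest conditional probability.
  def classify(ws: Array[Array[Double]], bs: Array[Double], x: Array[Double]): Int =
    probabilities(ws, bs, x).zipWithIndex.maxBy(_._1)._2 + 1

  def main(args: Array[String]): Unit = {
    // Hypothetical parameters for K = 3 classes and a one-dimensional input.
    val ws = Array(Array(1.0), Array(-1.0))
    val bs = Array(0.0, 0.0)
    val x = Array(2.0)
    println(probabilities(ws, bs, x).mkString(", "))
    println(classify(ws, bs, x))
  }
}
```

With a single weight vector (K = 2), the first probability reduces to the binomial formula for P(Y = 1 | x), which is a quick sanity check on the implementation.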
3. Logistic regression Spark MLlib example
    package com.fredric.spark.logistic

    import org.apache.spark.mllib.classification.LogisticRegressionWithSGD
    import org.apache.spark.mllib.linalg.Vectors
    import org.apache.spark.mllib.regression.LabeledPoint
    import org.apache.spark.{SparkContext, SparkConf}

    /*-
     * Logistic regression
     * Fredric
     */
    object Logistic {

      def main(args: Array[String]): Unit = {

        val conf = new SparkConf().setMaster("local").setAppName("Logistic")
        val sc = new SparkContext(conf)

        val array = new Array[LabeledPoint](10)

        // Construct the training data: a synthetic one-variable, binomial
        // classification problem whose class boundary is at the value 5.
        for (i <- 0 to 9) {
          if (i >= 5) {
            array(i) = new LabeledPoint(1, Vectors.dense(i))
          } else {
            array(i) = new LabeledPoint(0, Vectors.dense(i))
          }
        }

        val data = sc.makeRDD(array)
        // Train for 50 iterations.
        // Note: LogisticRegressionWithSGD is deprecated since Spark 2.0.
        val model = LogisticRegressionWithSGD.train(data, 50)

        // model.weights output: [0.20670127500478114]
        println(model.weights)

        val test = -2
        // predict returns the class label:
        // for input -1 it returns 0.0, for input 11 it returns 1.0
        val result = model.predict(Vectors.dense(test))
        println(result)

        // Verify the prediction by hand.
        // P(y=1|x): conditional probability that input x belongs to class 1
        val res1 = math.exp(model.weights(0) * test) / (1 + math.exp(model.weights(0) * test))

        // P(y=0|x): conditional probability that input x belongs to class 0
        val res0 = 1 / (1 + math.exp(model.weights(0) * test))

        // Output: for target: -2 probability for 1 is: 0.3980965348017618 probability for 0 is: 0.6019034651982381
        // Comparing the two conditional probabilities, -2 belongs to class 0.
        println("for target: " + test + " probability for 1 is: " + res1 + " probability for 0 is: " + res0)
      }
    }
Machine Learning Notes 4: Classification Algorithms, Logistic Regression