Contents
  Naive Bayes principle
  Naive Bayes code (Spark Python)
Naive Bayes principle: see the blog post at http://www.cnblogs.com/itmorn/p/7905975.html
Naive Bayes code (Spark Python)
Code and data: https://pan.baidu.com/s/1jHWKG4I (password: acq1)
# -*- coding: utf-8 -*-
from pyspark import SparkConf, SparkContext
from pyspark.mllib.regression import LabeledPoint, LinearRegressionWithSGD, LinearRegressionModel

sc = SparkContext('local')

# Load and parse the data, converting each number to a float.
# The first number on each line is the label; the rest are the features.
def parsePoint(line):
    values = [float(x) for x in line.replace(',', ' ').split(' ')]
    return LabeledPoint(values[0], values[1:])

data = sc.textFile("data/mllib/ridge-data/lpsa.data")
print(data.collect()[0])
# -0.4307829,-1.63735562648104 -2.00621178480549 -1.86242597251066 -1.024....-0.864466507337306

parsedData = data.map(parsePoint)
print(parsedData.collect()[0])
# (-0.4307829,[-1.63735562648,-2.00621178481,-1.86242597251,-1.024....,-0.864466507337])

# Build the model: linear regression trained with stochastic gradient descent
model = LinearRegressionWithSGD.train(parsedData, iterations=1000, step=0.1)

# Evaluate the model on the training data (mean squared error on the training set)
valuesAndPreds = parsedData.map(lambda p: (p.label, model.predict(p.features)))
MSE = valuesAndPreds.map(lambda vp: (vp[0] - vp[1]) ** 2).reduce(lambda x, y: x + y) / valuesAndPreds.count()
print("Mean squared Error = " + str(MSE))
# Mean squared Error = 6.32693963099

# Save the model, then load it back
model.save(sc, "pythonLinearRegressionWithSGDModel")
sameModel = LinearRegressionModel.load(sc, "pythonLinearRegressionWithSGDModel")
print(sameModel.predict(parsedData.collect()[0].features))
# -1.86583391312
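The parsing and mean-squared-error steps in the listing above can be tried without a Spark installation. The following is only a minimal plain-Python sketch of those two steps: the helper names parse_point and mean_squared_error are hypothetical, the sample line and the (label, prediction) pairs are made up for illustration, and nothing here comes from lpsa.data or from a trained model.

```python
# Plain-Python sketch (no Spark) of the parsing and MSE steps.
# parse_point and mean_squared_error are hypothetical helper names.

def parse_point(line):
    """Split a 'label,f1 f2 ...' line into (label, [features])."""
    values = [float(x) for x in line.replace(',', ' ').split(' ')]
    return values[0], values[1:]


def mean_squared_error(values_and_preds):
    """Average of (label - prediction)^2 over all (label, prediction) pairs."""
    pairs = list(values_and_preds)
    return sum((v - p) ** 2 for v, p in pairs) / len(pairs)


# Made-up sample line in the same 'label,features' layout as the data file.
label, features = parse_point("1.5,0.5 -2.0 3.0")
print(label, features)

# Made-up (label, prediction) pairs, standing in for valuesAndPreds.
print(mean_squared_error([(1.0, 0.0), (2.0, 4.0)]))
```

In the Spark version the same arithmetic is spread over map, reduce, and count so that it can run across partitions. The sketch collapses it into one local sum.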
"Spark MLlib Crash Course", Models 04: Naive Bayes (Python version)