Contents
  Random Forest Principle
  Random Forest Code (Spark Python)
Random Forest Principle

to be continued ...
Back to Catalog
Random Forest Code (Spark Python)
Code and data: https://pan.baidu.com/s/1jHWKG4I (password: acq1)
# -*- coding: utf-8 -*-
from pyspark import SparkConf, SparkContext
from pyspark.mllib.tree import RandomForest, RandomForestModel
from pyspark.mllib.util import MLUtils

sc = SparkContext('local')

# Load and parse the data file into an RDD of LabeledPoint.
data = MLUtils.loadLibSVMFile(sc, 'data/mllib/sample_libsvm_data.txt')
"""
Each line represents a labeled sparse feature vector in the following format:
    label index1:value1 index2:value2 ...
For example (from the MLUtils.loadLibSVMFile docstring):
>>> tempFile.write(b"+1 1:1.0 3:2.0 5:3.0\n-1\n-1 2:4 4:5.0 6:6.0")
>>> tempFile.flush()
>>> examples = MLUtils.loadLibSVMFile(sc, tempFile.name).collect()
>>> tempFile.close()
>>> examples[0]
LabeledPoint(1.0, (6,[0,2,4],[1.0,2.0,3.0]))
>>> examples[1]
LabeledPoint(-1.0, (6,[],[]))
>>> examples[2]
LabeledPoint(-1.0, (6,[1,3,5],[4.0,5.0,6.0]))
"""

# Split the data into training and test sets (30% held out for testing).
(trainingData, testData) = data.randomSplit([0.7, 0.3])

# Train a RandomForest model.
#  - An empty categoricalFeaturesInfo indicates all features are continuous.
#  - Note: use a larger numTrees in practice.
#  - Setting featureSubsetStrategy="auto" lets the algorithm choose.
model = RandomForest.trainClassifier(trainingData, numClasses=2,
                                     categoricalFeaturesInfo={},
                                     numTrees=3, featureSubsetStrategy="auto",
                                     impurity='gini', maxDepth=4, maxBins=32)

# Evaluate the model on test instances and compute the test error.
predictions = model.predict(testData.map(lambda x: x.features))
labelsAndPredictions = testData.map(lambda lp: lp.label).zip(predictions)
testErr = labelsAndPredictions.filter(lambda lp: lp[0] != lp[1]).count() / float(testData.count())
print('Test Error = ' + str(testErr))
# Test Error = 0.0
print('Learned classification forest model:')
print(model.toDebugString())
"""
TreeEnsembleModel classifier with 3 trees

  Tree 0:
    If (feature 517 <= 116.0)
     If (feature 489 <= 11.0)
      If (feature 437 <= 218.0)
       Predict: 0.0
      Else (feature 437 > 218.0)
       Predict: 1.0
     Else (feature 489 > 11.0)
      Predict: 1.0
    Else (feature 517 > 116.0)
     Predict: 1.0
  Tree 1:
    If (feature 456 <= 0.0)
     If (feature 471 <= 0.0)
      Predict: 1.0
     Else (feature 471 > 0.0)
      Predict: 0.0
    Else (feature 456 > 0.0)
     Predict: 0.0
  Tree 2:
    If (feature 377 <= 3.0)
     If (feature 212 <= 253.0)
      Predict: 0.0
     Else (feature 212 > 253.0)
      Predict: 1.0
    Else (feature 377 > 3.0)
     If (feature 299 <= 204.0)
      Predict: 1.0
     Else (feature 299 > 204.0)
      Predict: 0.0
"""

# Save and load the model.
model.save(sc, "myRandomForestClassificationModel")
sameModel = RandomForestModel.load(sc, "myRandomForestClassificationModel")
print(sameModel.predict(data.collect()[0].features))
# 0.0
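In addition to the hand-rolled test error above, the same (label, prediction) pairs can be scored with MLlib's built-in evaluation helpers. The short sketch below is not from the original post: it reuses the labelsAndPredictions RDD from the example and assumes Spark 2.0 or later, where MulticlassMetrics exposes an accuracy property alongside weighted precision, weighted recall, and the confusion matrix.

from pyspark.mllib.evaluation import MulticlassMetrics

# MulticlassMetrics expects an RDD of (prediction, label) pairs,
# so swap the (label, prediction) pairs built above.
predictionAndLabels = labelsAndPredictions.map(lambda lp: (float(lp[1]), float(lp[0])))
metrics = MulticlassMetrics(predictionAndLabels)

print('Accuracy           = ' + str(metrics.accuracy))           # fraction of correct predictions
print('Weighted precision = ' + str(metrics.weightedPrecision))
print('Weighted recall    = ' + str(metrics.weightedRecall))
print('Confusion matrix:')
print(metrics.confusionMatrix().toArray())                        # rows = actual, columns = predicted

For a binary problem such as sample_libsvm_data.txt the confusion matrix is 2x2, with actual classes in rows and predicted classes in columns, ordered by ascending class label.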
Back to Catalog

"Spark MLlib Crash Treasure" Model 06: Random Forest [Random Forests] (Python version)