Contents: Decision tree principle; Decision tree code (Spark Python)
For the decision tree principle, see the blog post: http://www.cnblogs.com/itmorn/p/7918797.html
Decision tree code (Spark Python)
Data for the code: https://pan.baidu.com/s/1jHWKG4I (password: acq1)
# -*- coding=utf-8 -*-
from pyspark import SparkConf, SparkContext
from pyspark.mllib.tree import DecisionTree, DecisionTreeModel
from pyspark.mllib.util import MLUtils

sc = SparkContext('local')

# Load and parse the data file into an RDD of LabeledPoint.
data = MLUtils.loadLibSVMFile(sc, 'data/mllib/sample_libsvm_data.txt')
"""
Each row of the file represents a labeled sparse feature vector in this format:
label index1:value1 index2:value2 ...

tempFile.write(b"+1 1:1.0 3:2.0 5:3.0\\n-1\\n-1 2:4 4:5.0 6:6.0")
>>> tempFile.flush()
>>> examples = MLUtils.loadLibSVMFile(sc, tempFile.name).collect()
>>> tempFile.close()
>>> examples[0]
LabeledPoint(1.0, (6,[0,2,4],[1.0,2.0,3.0]))
>>> examples[1]
LabeledPoint(-1.0, (6,[],[]))
>>> examples[2]
LabeledPoint(-1.0, (6,[1,3,5],[4.0,5.0,6.0]))
"""

# Split the data into training and test sets (30% held out for testing).
(trainingData, testData) = data.randomSplit([0.7, 0.3])

# Train a DecisionTree model.
# An empty categoricalFeaturesInfo indicates that all features are continuous.
model = DecisionTree.trainClassifier(trainingData, numClasses=2, categoricalFeaturesInfo={},
                                     impurity='gini', maxDepth=5, maxBins=32)

# Evaluate the model on the test instances and compute the test error.
predictions = model.predict(testData.map(lambda x: x.features))
labelsAndPredictions = testData.map(lambda lp: lp.label).zip(predictions)
testErr = labelsAndPredictions.filter(lambda lp: lp[0] != lp[1]).count() / float(testData.count())
print('Test Error = ' + str(testErr))  # Output of one run: Test Error = 0.04

# Save and load the model.
model.save(sc, "myDecisionTreeClassificationModel")
sameModel = DecisionTreeModel.load(sc, "myDecisionTreeClassificationModel")
print(sameModel.predict(data.collect()[0].features))  # 0.0
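To make the LIBSVM row format in the listing more concrete, here is a minimal pure-Python sketch that parses the same three sample rows without Spark. The helper names parse_libsvm_line and parse_libsvm_text are hypothetical and are not part of pyspark. The sketch assumes that indices in the file are 1-based and that the vector size is taken from the largest index seen, which is consistent with the LabeledPoint output shown in the listing. It illustrates the format only and is not how MLUtils.loadLibSVMFile is implemented.

```python
def parse_libsvm_line(line):
    """Parse one 'label index1:value1 index2:value2 ...' row.

    Hypothetical helper for illustration. Indices in the file are assumed
    to be 1-based and are shifted to 0-based here.
    """
    tokens = line.split()
    label = float(tokens[0])
    indices, values = [], []
    for token in tokens[1:]:
        index, value = token.split(":")
        indices.append(int(index) - 1)
        values.append(float(value))
    return label, indices, values


def parse_libsvm_text(text):
    """Parse several rows and attach a common vector size to each one.

    The size is assumed to be the largest 0-based index plus one.
    """
    rows = [parse_libsvm_line(line) for line in text.splitlines() if line.strip()]
    num_features = max((max(idx) + 1 for _, idx, _ in rows if idx), default=0)
    return [(label, (num_features, idx, vals)) for label, idx, vals in rows]


# The same three rows that appear in the listing's docstring.
examples = parse_libsvm_text("+1 1:1.0 3:2.0 5:3.0\n-1\n-1 2:4 4:5.0 6:6.0")
for example in examples:
    print(example)
```

Each printed tuple has the form (label, (size, indices, values)), the same layout as the LabeledPoint values in the listing. The second row has a label but no features, so its index and value lists are empty.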
"Spark MLlib Crash Course", Models 05: Decision Tree (Python edition)