The prediction result is a single label, one of the classes 1 to 11.
First, load the data: the training features, the training labels, the test (prediction) features and the test labels:
import xlrd

traindata = []
testdata = []
trainlabel = []
testlabel = []


def importtraincontentdata():
    file = 'F:/goverment/myfinalcode/train_big.csv'
    ls = []
    with open(file) as fo:
        for line in fo:
            # turn tabs, newlines and quotes into commas so split(",") separates every field
            line = line.replace("\t", ",")
            line = line.replace("\n", ",")
            line = line.replace("\"", ",")
            ls.append(line.split(","))
    for i in ls:
        li = []
        for j in i:
            if j == '':
                continue  # skip empty fields left by repeated separators
            li.append(float(j))
        traindata.append(li)


def importtestcontentdata():
    file = 'F:/goverment/myfinalcode/test_big.csv'
    ls = []
    with open(file) as fo:
        for line in fo:
            line = line.replace("\t", ",")
            line = line.replace("\n", ",")
            line = line.replace("\"", ",")
            ls.append(line.split(","))
    for i in ls:
        li = []
        for j in i:
            if j == '':
                continue
            li.append(float(j))
        testdata.append(li)


# Load the training and test labels (first column of Sheet1, one class per row)
def importtrainlabeldata():
    file = 'F:/goverment/myfinalcode/train_big_label.xls'
    wb = xlrd.open_workbook(file)
    ws = wb.sheet_by_name("Sheet1")
    for r in range(ws.nrows):
        col = []
        for c in range(1):
            col.append(ws.cell(r, c).value)
        trainlabel.append(col)


def importtestlabeldata():
    file = 'F:/goverment/myfinalcode/test_big_label.xls'
    wb = xlrd.open_workbook(file)
    ws = wb.sheet_by_name("Sheet1")
    for r in range(ws.nrows):
        col = []
        for c in range(1):
            col.append(ws.cell(r, c).value)
        testlabel.append(col)


# call the loaders only after the functions above are defined
if __name__ == "__main__":
    importtraincontentdata()
    importtestcontentdata()
    importtrainlabeldata()
    importtestlabeldata()
The training data and the prediction data are CSV files, and every field is read in as a str. Each row is converted to float and collected into a list li. All the li lists are then appended to traindata or testdata. The CSV is comma-separated, so characters such as "\t" are first replaced with ",", and ls.append(line.split(",")) stores the fields in ls. At that point they are still str, so I convert them to float before passing them on.
You could also skip the conversion here and do it later. Some of the classifier snippets below convert with NumPy's astype(float) instead.
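For comparison, here is a minimal sketch of the same str-to-float conversion that uses the standard csv module instead of manual replace() calls. It rests on my own assumptions, not on anything confirmed about the original files: fields are separated by commas or tabs, may be wrapped in double quotes, and empty fields should be skipped (which is what the replace()/split() approach above effectively does). The function name parse_rows is hypothetical.

```python
import csv


def parse_rows(text):
    """Turn comma/tab-separated text into a list of float rows.

    Empty fields (e.g. from repeated separators) are skipped, mirroring
    the `if j == '': continue` check in the loader above.
    """
    rows = []
    for line in text.splitlines():
        # Normalize tabs to commas so one csv reader handles both separators.
        line = line.replace("\t", ",")
        # csv.reader strips the surrounding double quotes from quoted fields.
        for fields in csv.reader([line]):
            row = [float(field) for field in fields if field.strip() != ""]
            if row:  # ignore blank lines
                rows.append(row)
    return rows


sample = '1.5,2\t3\n"4.0",5,,6\n'
print(parse_rows(sample))  # [[1.5, 2.0, 3.0], [4.0, 5.0, 6.0]]
```

To use it on the real file, read the whole file (for example `with open(path, newline="") as f: traindata = parse_rows(f.read())`).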
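Next, try a variety of classifiers and tune their parameters, using the scikit-learn supervised learning guide as a reference:
http://scikit-learn.org/stable/supervised_learning.html#supervised-learning
Then pick the classifier that gives the best accuracy. Each attempt below is disabled by wrapping it in triple quotes, and the comment at the top of each block records its result.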
'''
# 19%
from sklearn import neighbors
knn = neighbors.KNeighborsClassifier(n_neighbors=75, leaf_size=51, weights='distance', p=2)
knn.fit(traindata, trainlabel)
predict = knn.predict(testdata)
'''
'''
# This one did not work
from sklearn.neural_network import MLPClassifier
import numpy as np
traindata = np.array(traindata)  # without astype(float): TypeError: cannot perform reduce with flexible type
traindata = traindata.astype(float)
trainlabel = np.array(trainlabel)
trainlabel = trainlabel.astype(float)
testdata = np.array(testdata)
testdata = testdata.astype(float)
model = MLPClassifier(activation='relu', alpha=1e-05, batch_size='auto', beta_1=0.9, beta_2=0.999,
                      early_stopping=False, epsilon=1e-08, hidden_layer_sizes=(5, 2),
                      learning_rate='constant', learning_rate_init=0.001, max_iter=200,
                      momentum=0.9, nesterovs_momentum=True, power_t=0.5, random_state=1,
                      shuffle=True, solver='lbfgs', tol=0.0001, validation_fraction=0.1,
                      verbose=False, warm_start=False)
model.fit(traindata, trainlabel)
predict = model.predict(testdata)
'''
'''
# 19%
from sklearn.tree import DecisionTreeClassifier
model = DecisionTreeClassifier(class_weight='balanced', max_features=68, splitter='best', random_state=5)
model.fit(traindata, trainlabel)
predict = model.predict(testdata)
'''
'''
# This one did not work
from sklearn.naive_bayes import MultinomialNB
clf = MultinomialNB(alpha=0.052).fit(traindata, trainlabel)
# clf.fit(traindata, trainlabel)
predict = clf.predict(testdata)
'''
'''
# 17%
from sklearn.svm import SVC
clf = SVC(C=150, kernel='rbf', degree=51, gamma='auto', coef0=0.0, shrinking=False,
          probability=False, tol=0.001, cache_size=300, class_weight=None, verbose=False,
          max_iter=-1, decision_function_shape='ovr',  # None is rejected by newer scikit-learn versions
          random_state=None)
clf.fit(traindata, trainlabel)
predict = clf.predict(testdata)
'''
'''
# 0.5%
from sklearn.naive_bayes import GaussianNB
import numpy as np
gnb = GaussianNB()
traindata = np.array(traindata)  # without astype(float): TypeError: cannot perform reduce with flexible type
traindata = traindata.astype(float)
trainlabel = np.array(trainlabel)
trainlabel = trainlabel.astype(float)
testdata = np.array(testdata)
testdata = testdata.astype(float)
predict = gnb.fit(traindata, trainlabel).predict(testdata)
'''
'''
# 16%
from sklearn.naive_bayes import BernoulliNB
import numpy as np
gnb = BernoulliNB()
traindata = np.array(traindata)  # without astype(float): TypeError: cannot perform reduce with flexible type
traindata = traindata.astype(float)
trainlabel = np.array(trainlabel)
trainlabel = trainlabel.astype(float)
testdata = np.array(testdata)
testdata = testdata.astype(float)
predict = gnb.fit(traindata, trainlabel).predict(testdata)
'''
from sklearn.ensemble import RandomForestClassifier
# build a random-forest multi-class classifier
forest = RandomForestClassifier(n_estimators=500, random_state=5, warm_start=False,
                                min_impurity_decrease=0.0, min_samples_split=15)
predict = forest.fit(traindata, trainlabel).predict(testdata)
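Instead of switching triple-quoted blocks on and off by hand, the comparison can also run in a loop. This is a minimal sketch that assumes scikit-learn is installed. It uses synthetic data from make_classification as a stand-in for train_big.csv, which is not available here, and a few of the classifiers tried above with mostly default parameters. The resulting scores therefore say nothing about the real dataset. The function name compare_classifiers is hypothetical.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier


def compare_classifiers(X_train, y_train, X_test, y_test):
    """Fit each candidate classifier and return {name: test accuracy}."""
    candidates = {
        "knn": KNeighborsClassifier(n_neighbors=15, weights="distance"),
        "tree": DecisionTreeClassifier(class_weight="balanced", random_state=5),
        "gaussian_nb": GaussianNB(),
        "forest": RandomForestClassifier(n_estimators=100, min_samples_split=15, random_state=5),
    }
    scores = {}
    for name, model in candidates.items():
        model.fit(X_train, y_train)
        scores[name] = accuracy_score(y_test, model.predict(X_test))
    return scores


# Synthetic stand-in data: 11 classes labelled 1..11, like the task described above.
X, y = make_classification(n_samples=1100, n_features=20, n_informative=10,
                           n_classes=11, n_clusters_per_class=1, random_state=0)
y = y + 1
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
scores = compare_classifiers(X_train, y_train, X_test, y_test)
for name, acc in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {acc:.3f}")
```

To tune parameters instead of only comparing models, scikit-learn's GridSearchCV can replace the hand-edited parameter values.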
Next, compute the accuracy. I also write the predictions to a TXT file for easier analysis.
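s = len(predict)
# write every prediction on its own line for later analysis
with open('F:/goverment/myfinalcode/predict.txt', 'w') as f:
    for i in range(s):
        f.write(str(predict[i]))
        f.write('\n')
    f.write("it's all written.")

k = 0
print(s)
for i in range(s):
    # each testlabel row is a one-element list read from the Excel sheet
    if testlabel[i][0] == predict[i]:
        k = k + 1
print("The accuracy is:", k * 1.0 / s)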
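The accuracy loop above can also be wrapped in a small function. This is a minimal sketch. It assumes each test label is a one-element list such as [3.0], as the xlrd loader produces, and flattens it before comparing. The name accuracy is hypothetical. For flat label lists, scikit-learn's sklearn.metrics.accuracy_score computes the same value.

```python
def accuracy(true_labels, predicted):
    """Fraction of positions where the true label equals the prediction.

    true_labels may hold one-element lists (as read from the Excel sheet)
    or plain values; predicted holds plain values (a list or NumPy array).
    """
    if len(true_labels) != len(predicted):
        raise ValueError("label and prediction lists differ in length")
    if len(predicted) == 0:
        raise ValueError("no predictions to score")
    flat = [t[0] if isinstance(t, (list, tuple)) else t for t in true_labels]
    correct = sum(1 for t, p in zip(flat, predicted) if t == p)
    return correct / len(predicted)


print(accuracy([[1.0], [2.0], [3.0], [3.0]], [1.0, 2.0, 1.0, 3.0]))  # 0.75
```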
The next step is to output the support (predicted probability) of every label.
print('Starting to output the support levels')
attribute_proba = forest.predict_proba(testdata)
# print(forest.predict_proba(testdata))  # probability of each label
print(type(attribute_proba))
import xlwt
myexcel = xlwt.Workbook()
sheet = myexcel.add_sheet('sheet')
# one row per test record, one column per label
for si, i in enumerate(attribute_proba):
    for sj, j in enumerate(i):
        sheet.write(si, sj, str(j))
myexcel.save("attribute_proba_small.xls")
Running it gives the following result:
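When reading the exported probability table, keep one detail in mind. In scikit-learn, column j of predict_proba corresponds to the label forest.classes_[j] (the sorted unique training labels), not to label j itself. The following minimal sketch writes a header row of class labels above the probabilities. It swaps xlwt for the standard csv module writing to an in-memory buffer, and it uses a tiny made-up dataset purely for illustration.

```python
import csv
import io

from sklearn.ensemble import RandomForestClassifier

# Tiny illustrative data with labels 1, 2, 3 (hypothetical, not the real dataset).
X_train = [[0.0, 0.1], [0.2, 0.0], [5.0, 5.1], [5.2, 4.9], [9.8, 10.0], [10.1, 9.9]]
y_train = [1, 1, 2, 2, 3, 3]
X_test = [[0.1, 0.1], [9.9, 10.2]]

forest = RandomForestClassifier(n_estimators=50, random_state=5)
forest.fit(X_train, y_train)
proba = forest.predict_proba(X_test)  # shape: (n_test_rows, n_classes)

buffer = io.StringIO()
writer = csv.writer(buffer)
# Header row: the class label that each probability column belongs to.
writer.writerow([str(c) for c in forest.classes_])
for row in proba:
    writer.writerow([f"{p:.3f}" for p in row])
print(buffer.getvalue())
```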
That is not enough, though. I also want to output the label number and the support of the top 3 predictions for each record.
I created a class Attri: key holds the label number and weight holds the support.
Then the predicted probabilities (supports) of each record are scanned 3 times. Each pass finds the largest probability, stores its index and value, and sets that entry to 0. The next pass then finds the new largest entry. After 3 passes, the results are saved and written to Excel.
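The repeated "find the maximum, zero it, search again" loop can also be written by sorting the column indices by probability. This leaves the original probabilities untouched, whereas the loop in the script overwrites them with 0. Below is a minimal sketch in plain Python with a dataclass version of Attri. The names top_k and class_labels are hypothetical. In the script, class_labels would be forest.classes_, so that each column index maps to an actual label number.

```python
from dataclasses import dataclass


@dataclass
class Attri:
    key: object    # label number
    weight: float  # support (predicted probability)


def top_k(proba_rows, class_labels, k=3):
    """Return, for each row, the k labels with the highest probability."""
    result = []
    for row in proba_rows:
        # Column indices ordered from most to least probable.
        order = sorted(range(len(row)), key=lambda j: row[j], reverse=True)
        result.append([Attri(class_labels[j], row[j]) for j in order[:k]])
    return result


rows = [[0.1, 0.6, 0.05, 0.25], [0.4, 0.1, 0.3, 0.2]]
for picks in top_k(rows, class_labels=[1, 2, 3, 4]):
    print([(a.key, a.weight) for a in picks])
# [(2, 0.6), (4, 0.25), (1, 0.1)]
# [(1, 0.4), (3, 0.3), (4, 0.2)]
```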
'''Next, output the numbers of the three labels with the largest probability in each group'''
class Attri:
    def __init__(self):
        self.key = 0
        self.weight = 0.0


label = []
for i in attribute_proba:
    lis = []
    k = 0
    while k < 3:
        k = k + 1
        p = 1
        mm = 0
        sj = -1
        for j in i:
            sj = sj + 1
            if j > mm:
                mm = j
                p = sj
        # note: this overwrites attribute_proba in place
        i[p] = 0  # Does the index start from 1? I first wrote i[p-1], but debugging showed that was wrong.
        a = Attri()
        a.key = p
        a.weight = mm
        lis.append(a)
    label.append(lis)

print('Picking the top few outputs')
import xlwt
myexcel = xlwt.Workbook()
sheet = myexcel.add_sheet('sheet')
si = -2
sj = -1
for i in label:
    si = si + 2  # two rows per record: label numbers, then supports
    for j in i:
        sj = sj + 1
        sheet.write(si, sj, str(j.key))
        sheet.write(si + 1, sj, str(j.weight))
    sj = -1
myexcel.save("proba_big.xls")
Running it gives the following result:
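If the probabilities are kept as the NumPy array that predict_proba returns, NumPy's argsort can select the top-3 columns for every row at once. This is a minimal sketch that assumes NumPy is available. The proba and classes arrays are made-up examples standing in for attribute_proba and forest.classes_.

```python
import numpy as np

# Made-up example standing in for attribute_proba and forest.classes_.
proba = np.array([[0.1, 0.6, 0.05, 0.25],
                  [0.4, 0.1, 0.3, 0.2]])
classes = np.array([1, 2, 3, 4])

k = 3
# argsort sorts ascending; reverse the columns and keep the first k.
top_idx = np.argsort(proba, axis=1)[:, ::-1][:, :k]
top_labels = classes[top_idx]                             # label numbers
top_weights = np.take_along_axis(proba, top_idx, axis=1)  # matching supports

print(top_labels)   # [[2 4 1]
                    #  [1 3 4]]
print(top_weights)
```

Unlike the zero-and-rescan loop, this approach does not modify proba, so the full table can still be exported afterwards.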
Teaching yourself this is really hard! These are my results so far, and the accuracy can still be improved. If this helped you, please give it a like. Thanks!
Python single-label classification template: outputting supports, comparing multiple classifiers, converting CSV str to float