Python single-label classification template: outputting support, trying multiple classifiers, converting CSV str values to float

Source: Internet
Author: User

The predicted result is one of the labels 1 to 11.

First load the data: training data, training labels, test data, and test labels:

    import xlrd

    traindata = []
    testdata = []
    trainlabel = []
    testlabel = []

    def importtraincontentdata():
        file = 'F:/goverment/myfinalcode/train_big.csv'
        fo = open(file)
        ls = []
        for line in fo:
            # Turn tabs, newlines and double quotes into commas before splitting.
            line = line.replace("\t", ",")
            line = line.replace("\n", ",")
            line = line.replace("\"", ",")
            ls.append(line.split(","))
        fo.close()
        for i in ls:
            li = []
            for j in i:
                if j == '':
                    continue  # skip the empty fields left by the replacements
                li.append(float(j))  # str -> float
            traindata.append(li)

    def importtestcontentdata():
        file = 'F:/goverment/myfinalcode/test_big.csv'
        fo = open(file)
        ls = []
        for line in fo:
            line = line.replace("\t", ",")
            line = line.replace("\n", ",")
            line = line.replace("\"", ",")
            ls.append(line.split(","))
        fo.close()
        for i in ls:
            li = []
            for j in i:
                if j == '':
                    continue
                li.append(float(j))
            testdata.append(li)

    # Import the class labels for the training and test data
    def importtrainlabeldata():
        file = 'F:/goverment/myfinalcode/train_big_label.xls'
        wb = xlrd.open_workbook(file)
        ws = wb.sheet_by_name("Sheet1")
        for r in range(ws.nrows):
            col = []
            for c in range(1):  # only the first column holds the label
                col.append(ws.cell(r, c).value)
            trainlabel.append(col)

    def importtestlabeldata():
        file = 'F:/goverment/myfinalcode/test_big_label.xls'
        wb = xlrd.open_workbook(file)
        ws = wb.sheet_by_name("Sheet1")
        for r in range(ws.nrows):
            col = []
            for c in range(1):
                col.append(ws.cell(r, c).value)
            testlabel.append(col)

    if __name__ == "__main__":
        importtraincontentdata()
        importtestcontentdata()
        importtrainlabeldata()
        importtestlabeldata()

The training and test data are CSV files, and every value is read as a str. Each row is converted to floats and collected into a list (li), and all of these lists are then appended to traindata or testdata. Because the fields have to be separated by ",", characters such as "\t" are first replaced with ",". Then

ls.append(line.split(",")) stores the split fields in ls, but they are still of type str, so I convert them to float before passing them on.
The conversion can be done at this point or left until later.
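
The str-to-float conversion described above can be condensed into one helper. This is only a minimal sketch: the name rows_to_floats and the in-memory sample are made up for illustration, and it assumes that every non-empty field is a valid number.

```python
import io


def rows_to_floats(lines):
    """Parse delimited text lines into lists of floats, skipping empty fields."""
    rows = []
    for line in lines:
        # Normalise tabs and double quotes to commas, then split on commas.
        cleaned = line.strip("\n").replace("\t", ",").replace('"', ",")
        fields = [f.strip() for f in cleaned.split(",")]
        # Empty fields appear wherever two separators were adjacent.
        rows.append([float(f) for f in fields if f != ""])
    return rows


# Made-up sample standing in for an opened CSV file.
sample = io.StringIO('1,2.5\t3\n"4",5,,6\n')
parsed = rows_to_floats(sample)
print(parsed)
```

An opened file object can be passed in place of the StringIO sample, since both yield one line at a time.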

Next, try a variety of classifiers. For parameter tuning, refer to
http://scikit-learn.org/stable/supervised_learning.html#supervised-learning
Then choose the best-performing classifier to improve the accuracy.

    '''
    # 19%
    from sklearn import neighbors
    knn = neighbors.KNeighborsClassifier(n_neighbors=75, leaf_size=51, weights='distance', p=2)
    knn.fit(traindata, trainlabel)
    predict = knn.predict(testdata)
    '''
    '''
    # this one does not work
    from sklearn.neural_network import MLPClassifier
    import numpy as np
    traindata = np.array(traindata)  # TypeError: cannot perform reduce with flexible type
    traindata = traindata.astype(float)
    trainlabel = np.array(trainlabel)
    trainlabel = trainlabel.astype(float)
    testdata = np.array(testdata)
    testdata = testdata.astype(float)
    model = MLPClassifier(activation='relu', alpha=1e-05, batch_size='auto',
                          beta_1=0.9, beta_2=0.999, early_stopping=False,
                          epsilon=1e-08, hidden_layer_sizes=(5, 2),
                          learning_rate='constant', learning_rate_init=0.001,
                          max_iter=200, momentum=0.9, nesterovs_momentum=True,
                          power_t=0.5, random_state=1, shuffle=True, solver='lbfgs',
                          tol=0.0001, validation_fraction=0.1, verbose=False,
                          warm_start=False)
    model.fit(traindata, trainlabel)
    predict = model.predict(testdata)
    '''
    '''
    # 19%
    from sklearn.tree import DecisionTreeClassifier
    model = DecisionTreeClassifier(class_weight='balanced', max_features=68,
                                   splitter='best', random_state=5)
    model.fit(traindata, trainlabel)
    predict = model.predict(testdata)

    # this one does not work
    from sklearn.naive_bayes import MultinomialNB
    clf = MultinomialNB(alpha=0.052).fit(traindata, trainlabel)
    # clf.fit(traindata, trainlabel)
    predict = clf.predict(testdata)
    '''
    '''
    # 17%
    from sklearn.svm import SVC
    clf = SVC(C=150, kernel='rbf', degree=51, gamma='auto', coef0=0.0,
              shrinking=False, probability=False, tol=0.001, cache_size=300,
              class_weight=None, verbose=False, max_iter=-1,
              decision_function_shape=None, random_state=None)
    clf.fit(traindata, trainlabel)
    predict = clf.predict(testdata)
    '''
    '''
    # 0.5%
    from sklearn.naive_bayes import GaussianNB
    import numpy as np
    gnb = GaussianNB()
    traindata = np.array(traindata)  # TypeError: cannot perform reduce with flexible type
    traindata = traindata.astype(float)
    trainlabel = np.array(trainlabel)
    trainlabel = trainlabel.astype(float)
    testdata = np.array(testdata)
    testdata = testdata.astype(float)
    predict = gnb.fit(traindata, trainlabel).predict(testdata)
    '''
    '''
    # 16%
    from sklearn.naive_bayes import BernoulliNB
    import numpy as np
    gnb = BernoulliNB()
    traindata = np.array(traindata)  # TypeError: cannot perform reduce with flexible type
    traindata = traindata.astype(float)
    trainlabel = np.array(trainlabel)
    trainlabel = trainlabel.astype(float)
    testdata = np.array(testdata)
    testdata = testdata.astype(float)
    predict = gnb.fit(traindata, trainlabel).predict(testdata)
    '''
    from sklearn.ensemble import RandomForestClassifier
    # Build a random forest multi-class classifier
    forest = RandomForestClassifier(n_estimators=500, random_state=5, warm_start=False,
                                    min_impurity_decrease=0.0, min_samples_split=15)
    predict = forest.fit(traindata, trainlabel).predict(testdata)
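
Instead of commenting classifiers in and out by hand, the comparison can be done in a loop. The sketch below is not the code used in this post: pick_best, accuracy, and the two stand-in models (MajorityBaseline, NearestNeighbour1D) are hypothetical names, and the tiny data set is made up. It relies only on the fit/predict interface that the scikit-learn classifiers above also provide.

```python
def accuracy(y_true, y_pred):
    """Fraction of positions where the prediction equals the true label."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def pick_best(candidates, X_train, y_train, X_test, y_test):
    """Fit every (name, model) pair and return (best_name, scores)."""
    scores = {}
    for name, model in candidates:
        model.fit(X_train, y_train)
        scores[name] = accuracy(y_test, list(model.predict(X_test)))
    best = max(scores, key=scores.get)
    return best, scores


class MajorityBaseline:
    """Hypothetical stand-in: always predicts the most common training label."""

    def fit(self, X, y):
        self.label_ = max(set(y), key=list(y).count)
        return self

    def predict(self, X):
        return [self.label_ for _ in X]


class NearestNeighbour1D:
    """Hypothetical stand-in: 1-nearest neighbour on the first feature only."""

    def fit(self, X, y):
        self.points_ = [(row[0], label) for row, label in zip(X, y)]
        return self

    def predict(self, X):
        return [min(self.points_, key=lambda pt: abs(pt[0] - row[0]))[1] for row in X]


# Made-up data: label 1 near 0, label 2 near 1.
X_train = [[0.0], [0.2], [1.0], [1.1], [1.2]]
y_train = [1, 1, 2, 2, 2]
X_test = [[0.1], [0.9]]
y_test = [1, 2]

best, scores = pick_best(
    [("majority", MajorityBaseline()), ("1-nn", NearestNeighbour1D())],
    X_train, y_train, X_test, y_test,
)
print(best, scores)
```

With scikit-learn installed, estimators such as KNeighborsClassifier or RandomForestClassifier could be passed in place of the stand-ins, provided the labels are given as a flat list rather than as one-element lists.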

Then print the accuracy. I also write the predictions to a TXT file to make them easier to analyze.

    s = len(predict)
    f = open('F:/goverment/myfinalcode/predict.txt', 'w')
    for i in range(s):
        f.write(str(predict[i]))
        f.write('\n')
    f.write("it's all written.")
    f.close()
    k = 0
    print(s)
    for i in range(s):
        # Each test label was loaded as a one-element list, so compare its first item.
        if testlabel[i][0] == predict[i]:
            k = k + 1
    print("The accuracy is:", k * 1.0 / s)
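
The same two steps, saving the predictions and computing the accuracy, can be wrapped in a small function. This is a sketch under assumptions: save_and_score is a made-up name, the temporary output path is only for demonstration, and the sample predictions are invented.

```python
import os
import tempfile


def save_and_score(predictions, true_labels, path):
    """Write one prediction per line to `path` and return the accuracy."""
    with open(path, "w") as f:  # the file is closed even if a write fails
        for p in predictions:
            f.write(f"{p}\n")
    # Labels loaded row by row arrive as one-element lists, so unwrap them.
    flat = [t[0] if isinstance(t, (list, tuple)) else t for t in true_labels]
    correct = sum(1 for t, p in zip(flat, predictions) if t == p)
    return correct / len(predictions)


# Hypothetical output location and made-up predictions/labels.
out_path = os.path.join(tempfile.mkdtemp(), "predict.txt")
acc = save_and_score([1.0, 2.0, 4.0, 3.0], [[1.0], [2.0], [3.0], [3.0]], out_path)
print("The accuracy is:", acc)
```

Using a with block means the file does not have to be closed explicitly.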

The next step is to output the support (predicted probability) of every label:

    print("I'm going to start outputting the support level.")
    attribute_proba = forest.predict_proba(testdata)
    # print(forest.predict_proba(testdata))  # print the probability of each label
    print(type(attribute_proba))
    import xlwt
    myexcel = xlwt.Workbook()
    sheet = myexcel.add_sheet('sheet')
    si = -1  # row index
    sj = -1  # column index
    for i in attribute_proba:
        si = si + 1
        for j in i:
            sj = sj + 1
            sheet.write(si, sj, str(j))
        sj = -1  # reset the column for the next record
    myexcel.save("attribute_proba_small.xls")
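
The row and column counters above can also be produced with enumerate. The generator below is only a sketch (cells is a made-up name, and the probabilities are invented); it yields the row, the column, and the text for each cell, so it can be tested without an Excel library.

```python
def cells(matrix):
    """Yield (row, column, text) for every value in a 2-D sequence."""
    for r, row in enumerate(matrix):
        for c, value in enumerate(row):
            yield r, c, str(value)


demo = [[0.1, 0.9], [0.6, 0.4]]  # made-up probabilities for two records
for r, c, text in cells(demo):
    print(r, c, text)
```

Each triple can then be passed to sheet.write(r, c, text), which keeps the index bookkeeping out of the writing loop.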

The results of the operation are as follows:

But that is not enough: I also want to output the index and the support of the top 3 predictions for each record.
I defined a class Attri, whose key holds the index and whose weight holds the support.
The predicted probabilities (supports) of each record are then traversed 3 times. Each pass finds the entry with the highest probability, stores its index and probability, and sets that value to 0 so that the next pass finds the next largest one. After 3 passes the results are saved and written to Excel.

    '''Next, output the indices of the three entries with the largest probability in each group'''
    class Attri:
        def __init__(self):
            self.key = 0
            self.weight = 0.0

    label = []
    for i in attribute_proba:
        lis = []
        k = 0
        while k < 3:
            k = k + 1
            p = 1
            mm = 0
            sj = -1
            for j in i:
                sj = sj + 1
                if j > mm:
                    mm = j
                    p = sj
            # p is a 0-based column index. Is it starting from 1? I wrote i[p-1]
            # at first, but found it was wrong when debugging.
            i[p] = 0
            a = Attri()
            a.key = p
            a.weight = mm
            lis.append(a)
        label.append(lis)
    print('pick a few outputs')
    import xlwt
    myexcel = xlwt.Workbook()
    sheet = myexcel.add_sheet('sheet')
    si = -2
    sj = -1
    for i in label:
        si = si + 2  # each record takes two rows: indices, then supports
        for j in i:
            sj = sj + 1
            sheet.write(si, sj, str(j.key))
            sheet.write(si + 1, sj, str(j.weight))
        sj = -1
    myexcel.save("proba_big.xls")

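The top-3 selection can also be written without setting the chosen probabilities to 0. The following is a sketch under assumptions rather than the method used above: top_k and its classes parameter are made-up names, and the sample row is invented. In scikit-learn, the columns of predict_proba follow the order of the classifier's classes_ attribute, so passing that list as classes should turn the 0-based column index into the actual label.

```python
def top_k(probs, k=3, classes=None):
    """Return the k most probable (label, probability) pairs, largest first.

    Without `classes`, the label is the 0-based column index.
    """
    # Sort by descending probability; ties are broken by the lower index.
    ranked = sorted(enumerate(probs), key=lambda t: (-t[1], t[0]))[:k]
    if classes is None:
        return [(idx, p) for idx, p in ranked]
    return [(classes[idx], p) for idx, p in ranked]


row = [0.1, 0.5, 0.3, 0.1]  # made-up probabilities for one record
print(top_k(row))
print(top_k(row, k=2, classes=[1, 2, 3, 4]))
print(row)  # the input list is left unchanged
```

Because the input is not modified, the full probability table can still be written out afterwards.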
The results of the operation are as follows:

Self-study is really hard. These are the results of what I have learned so far, and the accuracy can still be improved. If this helped you, please give it a like.
