The code in this article comes from "Python Data Analysis and Mining in Action", with a few additions and improvements on top of it ~
The code is a Python implementation of an SVM-based classifier. The original chapter title has little to do with the code; or rather, the chapter hands over already-processed data and leaves out how it was processed. The source is image data, which we never get to see. In a word, this is an exercise in using a classifier (I suppose?).
The source code simply gives k=30 as the good value. Here I try to work out how k could be chosen. The selection rule I set is fairly simple, so if you have a better idea, please share it.
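For context, k is the factor the features are multiplied by before the SVM is fitted. With an RBF kernel and a fixed gamma, multiplying the features by k has the same effect as multiplying gamma by k squared, because the kernel only depends on gamma times the squared distance. The sketch below checks this on synthetic data; the data set, gamma=0.5 and k=4 are all assumptions made for illustration, not values from the book. Note that scikit-learn 0.22 and later default to gamma='scale', which is rescaled by the feature variance and therefore cancels the multiplier altogether.

```python
import numpy as np
from sklearn import svm
from sklearn.datasets import make_classification

# Synthetic stand-in data (assumption: NOT the book's moment.csv)
X, y = make_classification(n_samples=200, n_features=9, n_informative=5,
                           n_classes=3, random_state=0)
# Squeeze every feature into the 0-1 range, like the features in the article
X = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))

k = 4        # feature multiplier (a power of two keeps the float arithmetic exact)
gamma = 0.5  # fixed kernel coefficient (assumed value)

# Model A: scale the features by k, keep gamma
scaled_features = svm.SVC(kernel='rbf', gamma=gamma).fit(X * k, y)
# Model B: keep the features, scale gamma by k**2
scaled_gamma = svm.SVC(kernel='rbf', gamma=gamma * k ** 2).fit(X, y)

same_predictions = np.array_equal(scaled_features.predict(X * k),
                                  scaled_gamma.predict(X))
max_gap = np.abs(scaled_features.decision_function(X * k)
                 - scaled_gamma.decision_function(X)).max()
print(same_predictions, max_gap)
```

Seen this way, searching over k is a one-dimensional search over the kernel width, which is why some values of k fit the data noticeably better than others.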
# -*- coding: utf-8 -*-
"""
Created on Sun 12:19:34 2018

@author: Luove
"""
from sklearn import svm
from sklearn import metrics
import pandas as pd
import numpy as np
from numpy.random import shuffle
# from numpy.random import seed  # needed only if the shuffle should be reproducible
# import pickle  # save and load the model
import os

os.getcwd()
os.chdir('D:/analyze/python Matlab/python/bookcodes/python Data Analysis and mining actual combat/book supporting data, code/chapter9/demo/code')
inputfile = '../data/moment.csv'
data = pd.read_csv(inputfile)

data.head()
data = data.to_numpy()  # as_matrix() was removed in pandas 1.0
# seed(10)
shuffle(data)  # shuffle the rows in place; the result differs on every run unless a seed is set
n = 0.8
train = data[:int(n * len(data)), :]
test = data[int(n * len(data)):, :]

# Prepare the modelling data
# k = 30
m = 100
rows = []  # one (accuracy_train, accuracy_test) pair per k
for k in range(1, m + 1):
    # k is the feature multiplier: the features lie in 0-1 and differ too little
    # from one another, so scaling them up improves sensitivity and accuracy
    x_train = train[:, 2:] * k
    y_train = train[:, 0].astype(int)
    x_test = test[:, 2:] * k
    y_test = test[:, 0].astype(int)

    # gamma='auto' was the default when this was written (scikit-learn < 0.22);
    # the current default 'scale' would cancel out the multiplier k
    model = svm.SVC(gamma='auto')
    model.fit(x_train, y_train)
    # pickle.dump(model, open('../tmp/svm1.model', 'wb'))  # save the model
    # model = pickle.load(open('../tmp/svm1.model', 'rb'))  # load the model
    # Model evaluation: confusion matrices
    cm_train = metrics.confusion_matrix(y_train, model.predict(x_train))
    cm_test = metrics.confusion_matrix(y_test, model.predict(x_test))

    pd.DataFrame(cm_train, index=range(1, 6), columns=range(1, 6))
    accuracy_train = np.trace(cm_train) / cm_train.sum()  # accuracy
    # accuracy_train = model.score(x_train, y_train)  # same result using the model's own method
    pd.DataFrame(cm_test, index=range(1, 6), columns=range(1, 6))
    accuracy_test = np.trace(cm_test) / cm_test.sum()
    rows.append([accuracy_train, accuracy_test])

# DataFrame.append was removed in pandas 2.0, so the table is built in one go
record = pd.DataFrame(rows, columns=['accuracy_train', 'accuracy_test'], index=range(1, m + 1))
find_k = record.sort_values(by=['accuracy_train', 'accuracy_test'], ascending=False)  # returns a copy, record is unchanged
find_k[(find_k['accuracy_train'] > 0.95) & (find_k['accuracy_test'] > 0.95) & (find_k['accuracy_test'] >= find_k['accuracy_train'])]
# len(find_k[(find_k['accuracy_train'] > 0.95) & (find_k['accuracy_test'] > 0.95)])
"""k=33
accuracy_train  accuracy_test
0.950617        0.95122
"""
"""Accuracy on the whole data set
accuracy_data
0.95073891625615758
"""
k = 33
x_train = train[:, 2:] * k
y_train = train[:, 0].astype(int)
model = svm.SVC(gamma='auto')
model.fit(x_train, y_train)
model.score(x_train, y_train)
datax_train = data[:, 2:] * k  # defined before use (the original scored it first)
datay_train = data[:, 0].astype(int)
model.score(datax_train, datay_train)
cm_data = metrics.confusion_matrix(datay_train, model.predict(datax_train))
pd.DataFrame(cm_data, index=range(1, 6), columns=range(1, 6))
accuracy_data = np.trace(cm_data) / cm_data.sum()
accuracy_data
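The rule above picks k from a single random 80/20 split, so the chosen value can change from one run to the next. One possible alternative, sketched here and not taken from the book, is to score every candidate k with stratified cross-validation and keep the one with the best mean accuracy. Everything specific in the sketch is an assumption: the data is synthetic (built with make_classification, five classes, features squeezed into 0-1), the candidate range 1 to 40 is arbitrary, and gamma='auto' is chosen so that the multiplier actually has an effect.

```python
from sklearn import svm
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Synthetic stand-in data (assumption: NOT the book's moment.csv)
X, y = make_classification(n_samples=300, n_features=9, n_informative=6,
                           n_classes=5, n_clusters_per_class=1, random_state=0)
X = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))  # features in 0-1

# Fixed folds so that every k is scored on exactly the same splits
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
candidate_ks = range(1, 41)  # arbitrary search range

mean_scores = {}
for k in candidate_ks:
    model = svm.SVC(gamma='auto')  # fixed gamma rule, so k changes the kernel width
    mean_scores[k] = cross_val_score(model, X * k, y, cv=cv).mean()

# Keep the multiplier with the highest mean cross-validated accuracy
best_k = max(mean_scores, key=mean_scores.get)
print(best_k, round(mean_scores[best_k], 4))
```

Because every k is evaluated on the same five folds, the comparison no longer depends on how one particular shuffle happened to fall.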
REF:
"Python Data Analysis and Mining in Action"
Source code and data, help yourself if you need them: https://github.com/Luove/Data
Python implementation of an SVM-based classifier