This post continues from the previous one.
III. Classify a single sample
The basic idea: first compute the Euclidean distance between the input sample and every sample in the training set, then sort by distance and select the K training samples with the smallest distances. These K samples vote with their labels, and the label with the most votes becomes the label of the input sample.
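As a rough illustration of these steps, here is a minimal sketch. The function name, the toy data, and the use of collections.Counter for the vote are assumptions made for this example; it is not the code from this series.

```python
import numpy as np
from collections import Counter


def classify_sample_by_knn(sample, train_features, train_labels, k):
    """Return the majority label among the k training samples nearest to `sample`."""
    # Euclidean distance from the input sample to every training row
    # (broadcasting subtracts `sample` from each row).
    diffs = train_features - sample
    distances = np.sqrt((diffs ** 2).sum(axis=1))

    # Indices of the k training samples with the smallest distances.
    nearest = distances.argsort()[:k]

    # Each of the k nearest neighbours votes with its label.
    votes = Counter(train_labels[i] for i in nearest)
    return votes.most_common(1)[0][0]


# Toy data (made up): two well-separated clusters labelled 'A' and 'B'.
train_features = np.array([[1.0, 1.1], [1.0, 1.0], [0.0, 0.0], [0.0, 0.1]])
train_labels = ['A', 'A', 'B', 'B']

predicted = classify_sample_by_knn(np.array([0.1, 0.2]), train_features, train_labels, k=3)
print(predicted)  # B
```

With K = 3 the three nearest neighbours of (0.1, 0.2) carry the labels B, B and A, so the vote goes to B.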
One line in my classification function worth pointing out is this one:

    # sort and return the index
    theIndexListOfSortedDist = disValArray.argsort()

disValArray is a one-dimensional NumPy array that stores only the Euclidean distances. argsort does not reorder the array itself; it returns the original indices in the order that would sort the values, which is very convenient. The other point of interest is the call to the sorted function, which sorts the dictionary by value using a lambda expression, in the functional-programming style. The same thing can be done with operator.itemgetter.
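The two idioms mentioned here can be tried in isolation. The following is a small sketch with made-up distances and vote counts; the variable names are illustrative only.

```python
import operator

import numpy as np

# argsort returns the positions that would sort the array;
# the array itself is left unchanged.
dis_val_array = np.array([0.9, 0.1, 0.5])
sorted_index = dis_val_array.argsort()
print(sorted_index.tolist())  # [1, 2, 0]

# Vote counts per label (made-up values for illustration).
label_votes = {'A': 1, 'B': 3, 'C': 2}

# Sort the (label, count) pairs by count, largest first, with a lambda as the key...
by_lambda = sorted(label_votes.items(), key=lambda item: item[1], reverse=True)

# ...or with operator.itemgetter, which selects the same element of each pair.
by_itemgetter = sorted(label_votes.items(), key=operator.itemgetter(1), reverse=True)

print(by_lambda[0][0])  # B
```

Both calls to sorted produce the same list, so the choice between the lambda and operator.itemgetter is mostly a matter of taste.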
IV. Classify the test sample file and compute the error rate

    '''
    Function: classify the samples in test file by KNN algorithm
    Input:  1. the name of training sample file
            2. the name of testing sample file
            3. the K value for KNN
            4. the name of the log file
    '''
    def classifySampleFileByKNN(sampleFileNameForTrain, sampleFileNameForTest, kValue, logFileName):
        logFile = open(logFileName, 'w')

        # load the feature matrices and normalize them
        feaMatTrain, labelListTrain = loadFeatureMatrixAndLabels(sampleFileNameForTrain)
        norFeaMatTrain = autoNormalizeFeatureMatrix(feaMatTrain)
        feaMatTest, labelListTest = loadFeatureMatrixAndLabels(sampleFileNameForTest)
        norFeaMatTest = autoNormalizeFeatureMatrix(feaMatTest)

        # classify each test sample and write the result into the log
        errorNumber = 0.0
        testSampleNum = norFeaMatTest.shape[0]
        for i in range(testSampleNum):
            label = classifySampleByKNN(norFeaMatTest[i, :], norFeaMatTrain, labelListTrain, kValue)
            if label == labelListTest[i]:
                logFile.write("%d:right\n" % i)
            else:
                logFile.write("%d:wrong\n" % i)
                errorNumber += 1
        errorRate = errorNumber / testSampleNum
        logFile.write("the error rate: %f" % errorRate)
        logFile.close()
        return
That is a fair amount of code, but the logic is simple, so there is little to add. One aside: I don't know what the naming conventions are in Python. I found that fully spelled-out variable names get too long and display badly on my MacBook Pro, so I fell back on the abbreviated variable naming I use in C++.
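The error-rate bookkeeping in this section can also be looked at on its own, without any files. The sketch below is an assumption-laden illustration: error_rate and toy_classifier are hypothetical names, and the classifier is a trivial stand-in rather than KNN.

```python
def error_rate(classify, test_features, test_labels):
    """Fraction of test samples whose predicted label differs from the true label."""
    error_number = 0
    for features, label in zip(test_features, test_labels):
        if classify(features) != label:
            error_number += 1
    # float() forces floating-point division under Python 2 as well as Python 3.
    return float(error_number) / len(test_labels)


# Hypothetical stand-in classifier: call a number 'big' when it exceeds 5.
def toy_classifier(x):
    return 'big' if x > 5 else 'small'


rate = error_rate(toy_classifier, [1, 7, 9, 3], ['small', 'big', 'small', 'small'])
print(rate)  # 0.25
```

Only the third sample (9, labelled 'small') is misclassified, so one error out of four samples gives 0.25.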
V. Entry point
This plays the same role as the main function in C++. When you run the knn.py script directly, this code is executed first:

    if __name__ == '__main__':
        print("You are running knn.py")
        classifySampleFileByKNN('datingSetOne.txt', 'datingSetTwo.txt', 3, 'log.txt')

I chose K = 3 for KNN.
To be continued.
"Play machine learning with Python": KNN code, part two