"Play machine learning with Python": KNN, code, part two


This post continues from the previous one.


III. Classify a single sample


The basic idea is to first compute the Euclidean distance between the input sample and every sample in the training set, then sort by distance and select the K samples with the smallest distances. Those K samples vote with their labels, and the label with the most votes becomes the label of the input sample.
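The steps above can be sketched as follows. This is a minimal illustration, not the function from this series: the name classify_sample_by_knn, its parameters, and the toy data are hypothetical, and the vote is counted with collections.Counter rather than with a dictionary and sorted.

```python
import numpy as np
from collections import Counter


def classify_sample_by_knn(sample, train_features, train_labels, k):
    """Return the majority label among the k nearest training samples."""
    # Euclidean distance from the input sample to every training sample
    distances = np.sqrt(((train_features - sample) ** 2).sum(axis=1))
    # Indices of the k training samples with the smallest distances
    nearest = distances.argsort()[:k]
    # Each of the k neighbours votes with its label
    votes = Counter(train_labels[i] for i in nearest)
    return votes.most_common(1)[0][0]


# Toy data (made up for illustration): two samples per class
train_features = np.array([[1.0, 1.1], [1.0, 1.0], [0.0, 0.0], [0.0, 0.1]])
train_labels = ["A", "A", "B", "B"]

predicted = classify_sample_by_knn(np.array([0.1, 0.2]), train_features, train_labels, k=3)
print(predicted)  # B
```

With k = 3 the three nearest neighbours of (0.1, 0.2) carry the labels B, B and A, so B wins the vote.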

In my implementation, one line that stands out is this:

    # Sort and return the index
    theIndexListOfSortedDist = disValArray.argsort()

disValArray is a one-dimensional NumPy array that stores only the Euclidean distances. argsort leaves the array itself untouched and returns the indices that would sort it in ascending order, so the labels of the nearest samples can be looked up directly. Very convenient. The other notable point is the call to the built-in sorted function, which sorts the dictionary's items by value, using a lambda expression in the functional programming style as the key. The same thing can be achieved with operator.itemgetter.
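To make the behaviour of argsort and the two ways of sorting a dictionary by value concrete, here is a small sketch. The variable names and the numbers are made up for illustration.

```python
import operator

import numpy as np

# Hypothetical distances from one input sample to three training samples
dis_val_array = np.array([0.9, 0.1, 0.5])

# argsort returns the indices that would sort the array in ascending order;
# the array itself is not modified
sorted_index = dis_val_array.argsort()
print(sorted_index.tolist())   # [1, 2, 0]
print(dis_val_array.tolist())  # [0.9, 0.1, 0.5]

# Hypothetical vote counts per label
label_votes = {"A": 1, "B": 2}

# Sort the (label, count) pairs by count, largest first, with a lambda ...
by_lambda = sorted(label_votes.items(), key=lambda item: item[1], reverse=True)
# ... or with operator.itemgetter, which gives the same result
by_itemgetter = sorted(label_votes.items(), key=operator.itemgetter(1), reverse=True)

print(by_lambda[0][0])  # B
```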


IV. Classify the test sample file and compute the error rate

    '''
    Function: classify the samples in the test file by the KNN algorithm
    Input:
        1. The name of the training sample file
        2. The name of the testing sample file
        3. The K value for KNN
        4. The name of the log file
    '''
    def classifySampleFileByKNN(sampleFileNameForTrain, sampleFileNameForTest, kValue, logFileName):
        logFile = open(logFileName, 'w')

        # Load the feature matrices and normalize them
        feaMatTrain, labelListTrain = loadFeatureMatrixAndLabels(sampleFileNameForTrain)
        norFeaMatTrain = autoNormalizeFeatureMatrix(feaMatTrain)
        feaMatTest, labelListTest = loadFeatureMatrixAndLabels(sampleFileNameForTest)
        norFeaMatTest = autoNormalizeFeatureMatrix(feaMatTest)

        # Classify the test samples and write the results into the log
        errorNumber = 0.0
        testSampleNum = norFeaMatTest.shape[0]
        for i in range(testSampleNum):
            label = classifySampleByKNN(norFeaMatTest[i, :], norFeaMatTrain, labelListTrain, kValue)
            if label == labelListTest[i]:
                logFile.write("%d:right\n" % i)
            else:
                logFile.write("%d:wrong\n" % i)
                errorNumber += 1
        errorRate = errorNumber / testSampleNum
        logFile.write("the error rate:%f" % errorRate)
        logFile.close()
        return

It's a lot of code, but the logic is simple, so there is not much to say. As an aside, I don't know what the naming convention in Python is. I found that fully spelled-out variable names get too long to fit on my MacBook Pro's screen, which looks ugly. So I fall back on the abbreviated variable naming I'm used to from C++.
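The bookkeeping part of this step, comparing predicted labels with true labels, logging each result, and computing the error rate, can be sketched on its own. This is only an illustration under a few assumptions: it targets Python 3, the function name write_classification_log is hypothetical, the predictions are assumed to have been computed already, and io.StringIO stands in for the real log file.

```python
import io


def write_classification_log(predicted_labels, true_labels, log_file):
    """Log right/wrong for every test sample and return the error rate."""
    error_number = 0
    for i, (predicted, expected) in enumerate(zip(predicted_labels, true_labels)):
        if predicted == expected:
            log_file.write("%d:right\n" % i)
        else:
            log_file.write("%d:wrong\n" % i)
            error_number += 1
    # True division in Python 3, so no float initial value is needed
    error_rate = error_number / len(true_labels)
    log_file.write("the error rate:%f" % error_rate)
    return error_rate


# Made-up labels: one of the four predictions is wrong
log = io.StringIO()
error_rate = write_classification_log([1, 2, 3, 1], [1, 2, 2, 1], log)
print(error_rate)  # 0.25
```

Passing the log as a file-like object keeps the sketch testable without touching the disk; a real file opened with a with statement would work the same way.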


V. Entry point

This plays the role of the main function in C++. When you run the knn.py script directly, this code is executed:

    if __name__ == '__main__':
        print("You are running knn.py")
        classifySampleFileByKNN('datingSetOne.txt', 'DatingSetTwo.txt', 3, 'Log.txt')

I chose k = 3 for KNN.


Not finished, to be continued.
