K-Nearest Neighbor (KNN) Algorithm for Machine Learning

Source: Internet
Author: User
I. The concept

The k-Nearest Neighbor (kNN) algorithm is a simple machine learning method that classifies samples based on the distance between their feature values. This article briefly introduces the kNN algorithm and uses it to implement handwritten digit recognition.
Working principle: There is a set of sample data, called the training set, in which every sample carries a label; that is, we know which class each sample in the set belongs to. When new, unlabeled data arrives, we compare each feature of the new data with the features of the samples in the training set, and then extract the class labels of the most similar (nearest-neighbor) samples. In general, we select only the k most similar samples from the training set; this is the origin of the k in k-nearest neighbors (usually k < 20). Finally, we take the class that appears most frequently among those k most similar samples as the class of the new data.
General flow of the algorithm:

Steps to use the algorithm:
The Python implementation of the kNN classifier is as follows (the original code sorted the class counts by key rather than by vote count, which is fixed here by sorting classCount.items() on the count):

import operator
from numpy import tile

def classify0(inX, dataSet, labels, k):
    dataSetSize = dataSet.shape[0]
    # Euclidean distance from inX to every training sample
    diffMat = tile(inX, (dataSetSize, 1)) - dataSet
    sqDiffMat = diffMat ** 2
    sqDistances = sqDiffMat.sum(axis=1)
    distances = sqDistances ** 0.5
    sortedDistIndicies = distances.argsort()
    # Vote with the labels of the k nearest samples
    classCount = {}
    for i in range(k):
        voteLabel = labels[sortedDistIndicies[i]]
        classCount[voteLabel] = classCount.get(voteLabel, 0) + 1
    # Sort classes by vote count, descending, and return the winner
    sortedClassCount = sorted(classCount.items(), key=operator.itemgetter(1), reverse=True)
    return sortedClassCount[0][0]
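To see the voting idea in action, here is a compact, self-contained sketch of the same nearest-neighbor vote, run on a hypothetical two-cluster toy dataset (the values are made up for illustration):

```python
import numpy as np
from collections import Counter

def knn_classify(in_x, data_set, labels, k):
    # Euclidean distance from in_x to every training sample
    distances = np.sqrt(((data_set - in_x) ** 2).sum(axis=1))
    # Labels of the k nearest samples, then majority vote
    k_labels = [labels[i] for i in distances.argsort()[:k]]
    return Counter(k_labels).most_common(1)[0][0]

# Toy dataset: two clusters labeled 'A' and 'B'
group = np.array([[1.0, 1.1], [1.0, 1.0], [0.0, 0.0], [0.0, 0.1]])
labels = ['A', 'A', 'B', 'B']
print(knn_classify(np.array([0.1, 0.2]), group, labels, 3))  # -> 'B'
```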
Note: different features usually have different value ranges, so the data should normally be normalized. Formula for normalizing to [0, 1]: newValue = (oldValue - min) / (max - min). Formula for normalizing to [-1, 1]: newValue = (oldValue - mid) / mid. Code for normalizing to [0, 1]:

from numpy import zeros, shape, tile

def autoNorm(dataSet):
    # Column-wise minimum, maximum, and range
    minVals = dataSet.min(0)
    maxVals = dataSet.max(0)
    ranges = maxVals - minVals
    normDataSet = zeros(shape(dataSet))
    m = dataSet.shape[0]
    # Scale every value into [0, 1]
    normDataSet = dataSet - tile(minVals, (m, 1))
    normDataSet = normDataSet / tile(ranges, (m, 1))
    return normDataSet, ranges, minVals
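As a quick sanity check of the min-max formula, here is a self-contained sketch that normalizes a tiny made-up matrix column by column (NumPy broadcasting replaces the explicit tile calls):

```python
import numpy as np

def auto_norm(data_set):
    # newValue = (oldValue - min) / (max - min), per column
    min_vals = data_set.min(0)
    ranges = data_set.max(0) - min_vals
    return (data_set - min_vals) / ranges, ranges, min_vals

data = np.array([[10.0, 0.5], [20.0, 1.5], [30.0, 1.0]])
norm, ranges, min_vals = auto_norm(data)
print(norm)  # every column now spans [0, 1]
```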


II. Characteristics

Advantages: high accuracy, insensitive to outliers, no assumptions about the input data.
Disadvantages: high computational complexity, high space complexity.
Applicable data range: numerical and nominal values.


III. Application scenarios

Because the kNN method relies mainly on a limited number of neighboring samples, rather than on discriminating class domains, to determine a sample's category, it is better suited than other methods to sample sets whose class domains heavily cross or overlap.
The kNN algorithm can be used not only for classification but also for regression. By finding the k nearest neighbors of a sample and assigning the average of those neighbors' attribute values to the sample, we obtain an estimate of the sample's attributes. A more useful approach is to give different weights to neighbors at different distances, so that nearer neighbors contribute more than distant ones. (A common weighting scheme assigns each neighbor a weight of 1/d, where d is the distance to that neighbor. This scheme is a generalization of linear interpolation.)
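The 1/d weighting described above can be sketched for one-dimensional regression as follows (a self-contained illustration with made-up data, not a production implementation):

```python
import numpy as np

def knn_regress(x, xs, ys, k):
    # Distance-weighted average of the k nearest targets, weight = 1/d
    d = np.abs(xs - x)
    idx = d.argsort()[:k]
    w = 1.0 / np.maximum(d[idx], 1e-12)  # guard against a zero distance
    return float((w * ys[idx]).sum() / w.sum())

xs = np.array([0.0, 1.0, 2.0, 3.0])
ys = np.array([0.0, 2.0, 4.0, 6.0])  # samples from y = 2x
print(knn_regress(1.5, xs, ys, 2))   # two equidistant neighbors -> 3.0
```

With two equidistant neighbors the weights are equal and the result reduces to plain linear interpolation, which is why the scheme is described as its generalization.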
The k-nearest neighbor algorithm is also suitable for estimating continuous variables, for example using an inverse-distance-weighted average over the k nearest neighbors to determine the value at a test point. The steps of this algorithm are:

Compute the Euclidean or Mahalanobis distance from the test point to the sampled points;
Select the optimal k neighborhood based on the RMSE of cross-validation;
Compute the inverse-distance-weighted average over the k nearest neighbors.



IV. Application example (handwritten digit recognition)

Original image: (figure omitted in the source)
Converted vector image: (figure omitted in the source)

Data preparation: to use the classifier above, we must format each image as a vector. We convert the original 32x32 binary image into a 1x1024 vector. The code for converting an image to a vector is as follows:

from numpy import zeros

def img2vector(filename):
    returnVect = zeros((1, 1024))
    fr = open(filename)
    # Read 32 lines of 32 characters and flatten them row by row
    for i in range(32):
        lineStr = fr.readline()
        for j in range(32):
            returnVect[0, 32 * i + j] = int(lineStr[j])
    return returnVect
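The row-by-row flattening can be illustrated without touching the filesystem, using a hypothetical 4x4 "image" in place of the real 32x32 files (a self-contained sketch of the same indexing idea):

```python
import numpy as np

# Hypothetical 4x4 binary "image" as four text rows
rows = ["0110", "1001", "1001", "0110"]

def img_rows_to_vector(rows):
    # Flatten row-major into a single 1 x (h*w) float vector,
    # mirroring returnVect[0, 32*i + j] in img2vector above
    return np.array([[int(ch) for line in rows for ch in line]], dtype=float)

vec = img_rows_to_vector(rows)
print(vec.shape)  # (1, 16)
```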
Test code (the original accidentally overwrote the error count with 1.0 instead of incrementing it, and its print statements were missing the % operator; both are fixed here):

from os import listdir

def handwritingClassTest():
    hwLabels = []
    trainingFileList = listdir('trainingDigits')  # load the training set
    m = len(trainingFileList)
    trainingMat = zeros((m, 1024))
    for i in range(m):
        fileNameStr = trainingFileList[i]
        fileStr = fileNameStr.split('.')[0]  # take off .txt
        classNumStr = int(fileStr.split('_')[0])
        hwLabels.append(classNumStr)
        trainingMat[i, :] = img2vector('trainingDigits/%s' % fileNameStr)
    testFileList = listdir('testDigits')  # iterate through the test set
    errorCount = 0.0
    mTest = len(testFileList)
    for i in range(mTest):
        fileNameStr = testFileList[i]
        fileStr = fileNameStr.split('.')[0]  # take off .txt
        classNumStr = int(fileStr.split('_')[0])
        vectorUnderTest = img2vector('testDigits/%s' % fileNameStr)
        classifierResult = classify0(vectorUnderTest, trainingMat, hwLabels, 3)
        print("the classifier came back with: %d, the real answer is: %d" % (classifierResult, classNumStr))
        if classifierResult != classNumStr:
            errorCount += 1.0
    print("\nthe total number of errors is: %d" % errorCount)
    print("\nthe total error rate is: %f" % (errorCount / float(mTest)))
