K-Nearest Neighbor (KNN) Classification Algorithm
* KNN is a non-parametric classifier (it makes no assumption about the form of the distribution and estimates the probability density directly from the data); it is a memory-based learning method.
* KNN does not work well on high-dimensional data (the curse of dimensionality).
* Many Python machine learning libraries, such as mlpy (and many other packages), already provide KNN; the implementation here is only meant to help master the method.
* For calling KNN from Matlab, see "Matlab classifier Encyclopedia (SVM, KNN, random forest, etc.)".
* KNN has high computational complexity, since each query is compared against every training sample (it can be optimized with a KD tree; in C, libkdtree or ANN can be used).
* A smaller k makes the classifier more prone to overfitting, while a very large k lowers classification accuracy (consider the limiting cases k=1 and k=n, where n is the number of samples).
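The curse-of-dimensionality point can be made concrete with a small experiment. This is an illustrative sketch, not part of the original article: assuming points drawn uniformly at random from the unit hypercube, it measures the relative contrast (farthest distance minus nearest distance, divided by nearest distance) from one random query to 500 points. As the dimension grows the ratio shrinks, so the "nearest" and "farthest" neighbours become hard to tell apart. The function name `relative_contrast` and the choice of 500 points are assumptions for illustration.

```python
import numpy as np

def relative_contrast(n_points, dim, seed=0):
    """(max - min) / min of the Euclidean distances from one random query
    to n_points random points, all drawn uniformly from [0, 1]^dim."""
    rng = np.random.default_rng(seed)
    points = rng.random((n_points, dim))
    query = rng.random(dim)
    dist = np.sqrt(((points - query) ** 2).sum(axis=1))
    return (dist.max() - dist.min()) / dist.min()

# The contrast drops sharply as the dimension increases
for dim in (2, 10, 100, 1000):
    print(dim, round(relative_contrast(500, dim), 3))
```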
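The KD-tree optimization mentioned above can be sketched in pure Python. This is a minimal illustration, not libkdtree or ANN: the helpers `build_kdtree`, `_search` and `knn_search` are hypothetical names, and the tree is built by splitting at the median along each axis in turn. The search keeps the k best candidates in a max-heap and skips a subtree whenever the splitting plane is at least as far away as the current k-th best distance, which is where the savings over brute force come from.

```python
import heapq
import math

def build_kdtree(items, depth=0):
    """Build a KD tree from a list of (coords, index) pairs.
    Each node is a tuple (coords, index, axis, left_subtree, right_subtree)."""
    if not items:
        return None
    axis = depth % len(items[0][0])              # cycle through the coordinate axes
    items = sorted(items, key=lambda it: it[0][axis])
    mid = len(items) // 2                        # the median point becomes this node
    return (items[mid][0], items[mid][1], axis,
            build_kdtree(items[:mid], depth + 1),
            build_kdtree(items[mid + 1:], depth + 1))

def _search(node, query, k, heap):
    """Depth-first search keeping the k best (-distance, index) pairs in heap."""
    if node is None:
        return
    coords, index, axis, left, right = node
    dist = math.dist(coords, query)
    if len(heap) < k:
        heapq.heappush(heap, (-dist, index))
    elif dist < -heap[0][0]:                     # closer than the current k-th best
        heapq.heapreplace(heap, (-dist, index))
    diff = query[axis] - coords[axis]
    near, far = (left, right) if diff < 0 else (right, left)
    _search(near, query, k, heap)
    # The far side can only hold a closer point if the splitting plane is near enough
    if len(heap) < k or abs(diff) < -heap[0][0]:
        _search(far, query, k, heap)

def knn_search(tree, query, k):
    """Return the indices of the k nearest points, closest first."""
    heap = []
    _search(tree, query, k, heap)
    return [index for _, index in sorted(heap, reverse=True)]

points = [(1.0, 1.1), (1.0, 1.0), (0.0, 0.0), (0.0, 0.1)]
tree = build_kdtree([(p, i) for i, p in enumerate(points)])
print(knn_search(tree, (0.0, 0.0), 3))  # [2, 3, 1]
```

On the four training samples used in knn.py, the three nearest neighbours of (0, 0) are indices 2, 3 and 1, the same ones a brute-force distance sort finds. In Python, `scipy.spatial.KDTree` provides a tested implementation of this idea.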
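The two limiting cases of k can be checked directly. The sketch below uses a hypothetical five-point toy dataset (three 'A' samples, two 'B' samples) and a hypothetical brute-force helper `knn_predict`; it is only meant to show the behaviour at the extremes. With k=1 every training point is its own nearest neighbour, so the training set is reproduced perfectly (overfitting), while with k=n every query receives the majority class no matter where it lies.

```python
from collections import Counter
import math

def knn_predict(train_x, train_y, query, k):
    """Majority vote among the k training points closest to query (Euclidean)."""
    order = sorted(range(len(train_x)), key=lambda i: math.dist(train_x[i], query))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]

# Hypothetical toy data: three 'A' points and two 'B' points
train_x = [(1.0, 1.1), (1.0, 1.0), (0.9, 1.2), (0.0, 0.0), (0.0, 0.1)]
train_y = ['A', 'A', 'A', 'B', 'B']

# k = 1: each training point predicts its own label (100% training accuracy)
k1_train = [knn_predict(train_x, train_y, x, 1) for x in train_x]
# k = n: all points vote, so every query gets the majority class 'A'
kn_preds = [knn_predict(train_x, train_y, q, len(train_x)) for q in [(0.0, 0.0), (1.0, 1.0)]]
print(k1_train)  # ['A', 'A', 'A', 'B', 'B']
print(kn_preds)  # ['A', 'A']
```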
This article does not cover the theory; see the comments in the code.
knn.py
from numpy import array, tile
import operator

class KNN:
    def createdataset(self):
        # Four 2-D training samples and their class labels
        group = array([[1.0, 1.1], [1.0, 1.0], [0, 0], [0, 0.1]])
        labels = ['A', 'A', 'B', 'B']
        return group, labels

    def knnclassify(self, testx, trainx, labels, k):
        n, m = trainx.shape  # n training samples, m features
        # Calculate the Euclidean distance between testx and every training sample
        difference = tile(testx, (n, 1)) - trainx  # tile() repeats testx n times, like repmat in Matlab
        difference = difference ** 2  # square each element, i.e. pow(difference, 2)
        distance = difference.sum(axis=1)  # sum the squared differences over all dimensions
        distance = distance ** 0.5
        sortdiffidx = distance.argsort()  # training indices ordered by increasing distance
        # Find the k nearest neighbours and count their labels
        vote = {}  # create the dictionary: label -> number of votes
        for i in range(k):
            ith_label = labels[sortdiffidx[i]]
            # vote.get(ith_label, 0) returns vote[ith_label] if the key exists, else 0
            vote[ith_label] = vote.get(ith_label, 0) + 1
        # Sort labels by vote count, highest first (equivalent to key=lambda x: x[1])
        sortedvote = sorted(vote.items(), key=operator.itemgetter(1), reverse=True)
        return sortedvote[0][0]

k = KNN()  # create KNN object
group, labels = k.createdataset()
cls = k.knnclassify([0, 0], group, labels, 3)
print(cls)
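The `tile` call and the dictionary vote above can also be written with NumPy broadcasting and `collections.Counter`, which classifies several test points at once. This is an alternative sketch, not the article's code; the function name `knn_classify_batch` is hypothetical. Subtracting arrays of shape (t, 1, m) and (1, n, m) yields every test/train difference in one (t, n, m) array. `Counter.most_common` orders equal counts by first appearance, so because labels are counted from nearest to farthest, a tie goes to the label of the closer neighbour.

```python
from collections import Counter
import numpy as np

def knn_classify_batch(test_x, train_x, labels, k):
    """Classify each row of test_x by a majority vote of its k nearest
    training rows (Euclidean distance), using NumPy broadcasting."""
    test_x = np.atleast_2d(np.asarray(test_x, dtype=float))
    train_x = np.asarray(train_x, dtype=float)
    # (t, 1, m) - (1, n, m) -> (t, n, m): all test/train differences at once
    diff = test_x[:, np.newaxis, :] - train_x[np.newaxis, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))       # shape (t, n)
    nearest = np.argsort(dist, axis=1)[:, :k]     # indices of the k nearest per test row
    return [Counter(labels[j] for j in row).most_common(1)[0][0] for row in nearest]

group = np.array([[1.0, 1.1], [1.0, 1.0], [0, 0], [0, 0.1]])
labels = ['A', 'A', 'B', 'B']
print(knn_classify_batch([[0, 0], [1.0, 0.9]], group, labels, 3))  # ['B', 'A']
```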
-------------------
Run:
1. You can run knn.py from the Python shell (first change to the directory that contains it):
>>> import os
>>> os.chdir("/USERS/MBA/DOCUMENTS/STUDY/MACHINE_LEARNING/PYTHON/KNN")
>>> exec(open("knn.py").read())
Output: B
(B is the predicted class label.)
2. Or run it directly from a terminal:
$ python knn.py
3. Alternatively, leave the print statement out of knn.py and get the result in the shell instead:
>>> import knn
>>> knn.k.knnclassify([0, 0], knn.group, knn.labels, 3)
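Instead of deleting the output line, a common Python idiom is to put the demo code under an `if __name__ == "__main__":` guard: `python knn.py` then still prints a result, while `import knn` only defines the names. The sketch below assumes a hypothetical module `knn_guarded.py` and a hypothetical `classify` function written with the standard library rather than the article's `KNN` class; only the guard pattern is the point.

```python
# knn_guarded.py (hypothetical file name)
from collections import Counter
import math

# Same toy training data as in knn.py
group = [(1.0, 1.1), (1.0, 1.0), (0.0, 0.0), (0.0, 0.1)]
labels = ['A', 'A', 'B', 'B']

def classify(query, train_x, train_labels, k):
    """Brute-force KNN: majority label among the k nearest training points."""
    order = sorted(range(len(train_x)), key=lambda i: math.dist(train_x[i], query))
    return Counter(train_labels[i] for i in order[:k]).most_common(1)[0][0]

if __name__ == "__main__":
    # Runs for `python knn_guarded.py`, but not for `import knn_guarded`
    print(classify((0.0, 0.0), group, labels, 3))
```

After `import knn_guarded` in the shell, the result is available through `knn_guarded.classify((0.0, 0.0), knn_guarded.group, knn_guarded.labels, 3)`.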
From: http://blog.csdn.net/abcjennifer/article/details/19757987