1. Brief description:
Put simply, the k-nearest neighbor (kNN) algorithm classifies data by measuring the distance between feature values.
Advantages: High accuracy, insensitive to outliers, no assumptions about the input data.
Disadvantages: High computational complexity and high space complexity.
Applicable data types: Numerical and nominal.
2. Working principle:
We start with a collection of sample data, also known as the training sample set, in which every item has a label; that is, we know which category each item in the sample set belongs to. When new, unlabeled data is input, each feature of the new data is compared with the corresponding feature of the data in the sample set, and the algorithm extracts the classification labels of the most similar data (the nearest neighbors) in the sample set. In general, we only look at the k most similar items in the sample set, which is where the k in "k-nearest neighbor" comes from; k is usually an integer not greater than 20. Finally, the classification that occurs most frequently among the k most similar items is chosen as the classification of the new data.
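The three steps described above (measure distances, keep the k closest samples, take a majority vote) can be sketched in plain Python. This is only an illustrative sketch: the function name `knn_predict`, the toy samples, and the choice of Euclidean distance are assumptions made for the example, not part of the original description.

```python
import math
from collections import Counter


def knn_predict(query, samples, labels, k):
    """Return the most common label among the k samples closest to query."""
    # Step 1: Euclidean distance from the query to every labeled sample.
    distances = [math.dist(query, s) for s in samples]
    # Step 2: indices of the k samples with the smallest distances.
    nearest = sorted(range(len(samples)), key=lambda i: distances[i])[:k]
    # Step 3: majority vote over the labels of those k neighbors.
    votes = Counter(labels[i] for i in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical toy data: two samples of class "A", two of class "B".
samples = [(1.0, 1.0), (1.2, 0.9), (0.0, 0.1), (0.2, 0.0)]
labels = ["A", "A", "B", "B"]

print(knn_predict((0.1, 0.2), samples, labels, k=3))  # prints B
```

With k = 3 the two "B" samples and one "A" sample are the nearest neighbors, so "B" wins the vote two to one.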
3. Code example:
#!/usr/bin/env python

from numpy import array, tile  # NumPy: Python's numerical array library
import operator


def createdata():
    # Four two-dimensional training samples and their class labels
    group = array([[1.0, 1.2], [1.1, 1.1], [0.1, 0.2], [0.3, 0.1]])
    labels = ['A', 'A', 'B', 'B']
    return group, labels


def classify(intx, dataset, labels, k):
    datasetsize = dataset.shape[0]  # number of training samples (rows)
    # Repeat the input point once per sample, then subtract the samples
    diffmat = tile(intx, (datasetsize, 1)) - dataset
    sqdiffmat = diffmat ** 2  # square each difference
    # Sum along each row: squared Euclidean distance to every sample
    # (the square root is skipped because it does not change the ordering)
    sqdis = sqdiffmat.sum(axis=1)
    sorteddis = sqdis.argsort()  # sample indices, nearest first
    classcount = {}
    for i in range(k):
        votelabel = labels[sorteddis[i]]
        # Each of the k nearest neighbors casts one vote for its label
        classcount[votelabel] = classcount.get(votelabel, 0) + 1
    # Sort the labels by vote count, highest first
    sortclasscount = sorted(classcount.items(), key=operator.itemgetter(1), reverse=True)
    return sortclasscount[0][0]


if __name__ == '__main__':
    group, labels = createdata()
    print(classify([0.5, 0.3], group, labels, 3))
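The listing above uses `tile` to copy the input point once per training sample before subtracting. As a possible alternative, NumPy broadcasting performs the same subtraction without building the repeated matrix, and `collections.Counter` can replace the hand-written vote dictionary. The sketch below assumes the same four training points; the name `classify_broadcast` is made up for this example.

```python
from collections import Counter

import numpy as np


def classify_broadcast(in_x, dataset, labels, k):
    """kNN vote using broadcasting instead of tile (illustrative variant)."""
    dataset = np.asarray(dataset, dtype=float)
    # Broadcasting subtracts in_x from every row of dataset.
    diff = dataset - np.asarray(in_x, dtype=float)
    # Squared Euclidean distance from in_x to each training sample.
    sq_dist = (diff ** 2).sum(axis=1)
    # Indices of the k nearest samples.
    nearest = sq_dist.argsort()[:k]
    # Majority vote among the labels of those samples.
    return Counter(labels[i] for i in nearest).most_common(1)[0][0]


group = np.array([[1.0, 1.2], [1.1, 1.1], [0.1, 0.2], [0.3, 0.1]])
labels = ["A", "A", "B", "B"]

print(classify_broadcast([0.5, 0.3], group, labels, 3))  # prints B
```

For the query point [0.5, 0.3] the three nearest samples are the two "B" points and one "A" point, so the result is "B".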
Brief introduction of K-Nearest neighbor classification algorithm