A brief introduction to the k-nearest neighbor classification algorithm

Source: Internet
Author: User

1. Brief description:

To put it simply, the k-nearest neighbor algorithm classifies data by measuring the distance between feature values.
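As a small illustration of the distance measurement mentioned above, the sketch below computes the Euclidean distance between two feature vectors. The function name `euclidean_distance` and the sample points are assumptions made for this example; Euclidean distance is one common choice, not the only possible one.

```python
import math


def euclidean_distance(a, b):
    # Hypothetical helper: straight-line distance between two
    # feature vectors of equal length.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Illustrative points only: a 3-4-5 right triangle.
print(euclidean_distance([0.0, 0.0], [3.0, 4.0]))  # 5.0
```

The smaller the distance, the more similar two samples are considered to be.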

Advantages: High accuracy, insensitive to outliers, no assumptions about the input data.

Disadvantages: High computational complexity and high space (memory) complexity.

Applicable data types: Numeric and nominal values.

2. Working principle:

We start with a collection of samples, known as the training sample set, in which every item carries a label; that is, we know which category each item in the sample set belongs to. When new, unlabeled data is input, each of its features is compared with the corresponding features of the items in the sample set, and the algorithm extracts the class labels of the most similar items (the nearest neighbors). In general, we look only at the k most similar items in the sample set, which is where the k in k-nearest neighbor comes from; k is usually an integer no greater than 20. Finally, the class that occurs most often among those k most similar items is taken as the class of the new data.
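The steps described above can be sketched in plain Python without any libraries. This is a minimal illustration, not the article's implementation: the name `knn_predict` is hypothetical, squared Euclidean distance is assumed as the similarity measure, and the four sample points are the same ones used in the code example in section 3.

```python
from collections import Counter


def knn_predict(query, samples, labels, k):
    # Hypothetical helper: plain-Python k-nearest-neighbor vote.
    # Step 1: squared Euclidean distance from the query to every sample.
    distances = [
        sum((q - s) ** 2 for q, s in zip(query, sample))
        for sample in samples
    ]
    # Step 2: indices of the k samples with the smallest distance.
    nearest = sorted(range(len(samples)), key=lambda i: distances[i])[:k]
    # Step 3: the most frequent label among those k neighbors wins.
    votes = Counter(labels[i] for i in nearest)
    return votes.most_common(1)[0][0]


samples = [[1.0, 1.2], [1.1, 1.1], [0.1, 0.2], [0.3, 0.1]]
labels = ["A", "A", "B", "B"]
print(knn_predict([0.5, 0.3], samples, labels, 3))  # B
```

With k = 3, the three nearest neighbors of the query point carry the labels B, B and A, so the majority vote returns B.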

3. Code example:

#!/usr/bin/env python

# NumPy is a Python library for numerical computing, widely used in machine learning
from numpy import array, tile
import operator


def createdata():
    # Four 2-D training samples and their class labels
    group = array([[1.0, 1.2], [1.1, 1.1], [0.1, 0.2], [0.3, 0.1]])
    labels = ['A', 'A', 'B', 'B']
    return group, labels


def classify(intx, dataset, labels, k):
    datasetsize = dataset.shape[0]  # number of rows (training samples)
    diffmat = tile(intx, (datasetsize, 1)) - dataset  # repeat intx once per row, then subtract
    sqdiffmat = diffmat ** 2  # square each difference
    sqdis = sqdiffmat.sum(axis=1)  # sum per row: squared distance to each sample
    soreddis = sqdis.argsort()  # indices sorted by increasing distance
    classcount = {}
    for i in range(k):
        votelabel = labels[soreddis[i]]
        # count votes, starting from 0 for a label not seen yet
        classcount[votelabel] = classcount.get(votelabel, 0) + 1
    sortclasscount = sorted(classcount.items(), key=operator.itemgetter(1), reverse=True)
    return sortclasscount[0][0]


if __name__ == '__main__':
    group, labels = createdata()
    print(classify([0.5, 0.3], group, labels, 3))
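As a side note on the distance step in `classify`, the `tile` call can also be replaced by NumPy broadcasting, which subtracts the query from every row without building a repeated copy first. The sketch below is an alternative under that assumption, not part of the original listing; the function name `squared_distances` is hypothetical.

```python
import numpy as np


def squared_distances(query, dataset):
    # Hypothetical helper: broadcasting stretches the 1-D query
    # across every row of the 2-D dataset during subtraction.
    diff = np.asarray(dataset) - np.asarray(query)
    return (diff ** 2).sum(axis=1)


group = np.array([[1.0, 1.2], [1.1, 1.1], [0.1, 0.2], [0.3, 0.1]])
# Indices of the three training samples nearest to the query point
print(squared_distances([0.5, 0.3], group).argsort()[:3])  # [3 2 1]
```

The square root is skipped in both versions because it does not change the ordering of the distances, and only the ordering matters for picking the nearest neighbors.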




