Machine Learning Notes (II): k-Nearest Neighbor Algorithm

Last Update: 2015-03-12 Source: Internet

Author: User

Overview of the k-nearest neighbor algorithm

The k-nearest neighbor algorithm classifies data by measuring the distances between feature values.

Advantages: High accuracy, insensitive to outliers, no assumptions about the input data

Disadvantages: High computational complexity, high space complexity

Applicable data types: Numeric and nominal

How it works: We have a collection of sample data (also known as the training sample set), and every sample in it has a label; that is, we know which category each sample belongs to. When new data without a label comes in, each of its features is compared with the corresponding features of the samples in the training set, and the algorithm extracts the category labels of the most similar samples (the nearest neighbors). In general, only the k most similar samples in the training set are considered, which is where the k in k-nearest neighbor comes from; k is usually a small positive integer. Finally, the category that occurs most often among those k most similar samples is taken as the category of the new data.
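The idea can be sketched in a few lines of plain Python before looking at the NumPy version. This is only an illustration under stated assumptions: the sample points, the labels "red" and "blue", and the helper names euclidean and knn_predict are all made up here and do not come from the article.

```python
import math
from collections import Counter

# Hypothetical labeled training samples: (feature vector, label)
samples = [
    ((1.0, 1.0), "red"),
    ((1.0, 2.0), "red"),
    ((5.0, 5.0), "blue"),
    ((6.0, 5.0), "blue"),
    ((5.0, 6.0), "blue"),
]

def euclidean(p, q):
    """Straight-line distance between two equal-length feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def knn_predict(query, samples, k):
    """Return the majority label among the k samples closest to query."""
    nearest = sorted(samples, key=lambda s: euclidean(query, s[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

print(knn_predict((2.0, 2.0), samples, k=3))  # red
```

For the query (2.0, 2.0) the three closest samples are two "red" points and one "blue" point, so the vote goes to "red".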

k-nearest neighbor algorithm code analysis:

For each point in the dataset whose category is unknown, do the following:

(1) Calculate the distance between every point in the dataset of known categories and the current point;

(2) Sort the points in ascending order of distance;

(3) Select the k points closest to the current point;

(4) Determine how often each category occurs among these k points;

(5) Return the most frequent category among these k points as the predicted classification of the current point.

The code is as follows:

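from numpy import *
import operator

def classify0(inX, dataSet, labels, k):
    dataSetSize = dataSet.shape[0]                       # [1] number of training samples
    diffMat = tile(inX, (dataSetSize, 1)) - dataSet      # [2] difference to every sample
    sqDiffMat = diffMat ** 2                             # [3] element-wise square
    sqDistances = sqDiffMat.sum(axis=1)                  # [4] sum of squares per row
    distances = sqDistances ** 0.5                       # Euclidean distances
    sortedDistIndicies = distances.argsort()             # indices, nearest first
    classCount = {}                                      # [5] votes per category
    for i in range(k):
        voteIlabel = labels[sortedDistIndicies[i]]
        classCount[voteIlabel] = classCount.get(voteIlabel, 0) + 1
    # [6] sort (label, count) pairs by count, largest first;
    # items() replaces the Python 2-only iteritems() and works in Python 2 and 3
    sortedClassCount = sorted(classCount.items(), key=operator.itemgetter(1), reverse=True)
    return sortedClassCount[0][0]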
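One call in classify0 that deserves a closer look is argsort: it does not return the sorted distances, but the indices that would sort them, which is what lets the code look up the matching labels. A small sketch, assuming NumPy is installed; the distance values below are made up for illustration.

```python
import numpy as np

# Hypothetical distances from the query point to four training samples
distances = np.array([1.49, 1.41, 0.0, 0.1])
labels = ['A', 'A', 'B', 'B']

# argsort returns the indices ordered from the smallest distance to the largest
order = distances.argsort()
print(order)                            # [2 3 1 0]

# The first k indices select the labels of the k nearest samples
print([labels[i] for i in order[:3]])   # ['B', 'B', 'A']
```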

Code annotations:

[1] shape[0] gives the number of rows of the matrix; shape[1] gives the number of columns.

[2] tile repeats the array inX dataSetSize times along the rows and once along the columns. For example, if inX is [0, 0], the result of tile is

[0, 0]

... (dataSetSize rows in total)
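A quick way to see what tile produces is to run it on a small array. The sketch below assumes NumPy and reuses the four sample points from the knn.py file. It also shows that NumPy broadcasting gives the same difference matrix without tile; that is an alternative noted here, not what the article's code does.

```python
import numpy as np

inX = np.array([0, 0])   # the point to classify
dataSet = np.array([[1.0, 1.1], [1.0, 1.0], [0, 0], [0, 0.1]])

# Repeat inX once per training sample: 4 rows, each a copy of [0, 0]
tiled = np.tile(inX, (dataSet.shape[0], 1))
diff_tiled = tiled - dataSet

# Broadcasting stretches inX across the rows automatically
diff_broadcast = inX - dataSet

print(tiled.shape)                                  # (4, 2)
print(np.array_equal(diff_tiled, diff_broadcast))   # True
```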

[3] ** is the power operator. diffMat**2 squares each element of diffMat, for example array([1, 2])**2 = array([1, 4])

[4] sqDiffMat.sum(axis=1) sums the elements of each row of the array, and these sums form a new array:

Example:

>>> a = array([[1, 2], [2, 4]])
>>> s = a.sum(axis=1)
>>> s
array([3, 6])

>>> a = array([[1, 2, 3], [2, 3, 4]])
>>> s = a.sum(axis=1)
>>> s
array([6, 9])

However, if the array is one-dimensional, such as array([1, 2]), sum(axis=1) cannot be used because the array has no axis 1; use sum() instead.

[5] classCount = {} creates a new dict. A dict provides the get method: if the key does not exist, it returns None or a value you specify. Here classCount.get(voteIlabel, 0) returns 0 when voteIlabel is not yet a key in the dict.

For example:

>>> d = {'Michael': 95, 'Bob': 75, 'Tracy': 85}
>>> d['Michael']
95
>>> d['Thomas']
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
KeyError: 'Thomas'

To avoid an error when a key does not exist, there are two approaches. The first is to check whether the key exists by using in:

>>> 'Thomas' in d
False

The second is the get method provided by dict; if the key does not exist, it returns None or a value you specify:

>>> d.get('Thomas')
>>> d.get('Thomas', -1)
-1

[6] sorted() sorts the items of the classCount dictionary by their second element (that is, the number of occurrences of each category) from largest to smallest.
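Notes [5] and [6] can be tried together on a small example. This is a sketch in Python 3; the list nearest_labels is a made-up stand-in for the labels of the k nearest points.

```python
import operator

nearest_labels = ['B', 'B', 'A']   # hypothetical labels of the k = 3 nearest points

classCount = {}
for label in nearest_labels:
    # get returns 0 the first time a label is seen, so no KeyError is raised
    classCount[label] = classCount.get(label, 0) + 1

# items() yields (label, count) pairs; itemgetter(1) sorts them by the count
sortedClassCount = sorted(classCount.items(), key=operator.itemgetter(1), reverse=True)

print(classCount)              # {'B': 2, 'A': 1}
print(sortedClassCount)        # [('B', 2), ('A', 1)]
print(sortedClassCount[0][0])  # B
```

The first element of the sorted list is the (label, count) pair with the most votes, so sortedClassCount[0][0] is the predicted category.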

Testing the code:

knn.py file:

from numpy import *
import operator

def classify0(inX, dataSet, labels, k):
    dataSetSize = dataSet.shape[0]                       # number of training samples
    diffMat = tile(inX, (dataSetSize, 1)) - dataSet      # difference to every sample
    sqDiffMat = diffMat ** 2
    sqDistances = sqDiffMat.sum(axis=1)
    distances = sqDistances ** 0.5                       # Euclidean distances
    sortedDistIndicies = distances.argsort()             # indices, nearest first
    classCount = {}
    for i in range(k):
        voteIlabel = labels[sortedDistIndicies[i]]
        classCount[voteIlabel] = classCount.get(voteIlabel, 0) + 1
    # items() replaces the Python 2-only iteritems() and works in Python 2 and 3
    sortedClassCount = sorted(classCount.items(), key=operator.itemgetter(1), reverse=True)
    return sortedClassCount[0][0]

def createDataSet():
    # Four training points with two features each, and their category labels
    group = array([[1.0, 1.1], [1.0, 1.0], [0, 0], [0, 0.1]])
    labels = ['A', 'A', 'B', 'B']
    return group, labels
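The test run itself is not shown, so here is a minimal sketch of one, assuming Python 3 and NumPy. The two functions are restated compactly so that the snippet runs on its own, and the query points [0, 0] and [1.0, 1.2] are picked here only for illustration.

```python
import operator
import numpy as np

def createDataSet():
    group = np.array([[1.0, 1.1], [1.0, 1.0], [0, 0], [0, 0.1]])
    labels = ['A', 'A', 'B', 'B']
    return group, labels

def classify0(inX, dataSet, labels, k):
    # Same logic as knn.py, restated so this snippet is self-contained
    diffMat = np.tile(inX, (dataSet.shape[0], 1)) - dataSet
    distances = ((diffMat ** 2).sum(axis=1)) ** 0.5
    classCount = {}
    for i in distances.argsort()[:k]:
        classCount[labels[i]] = classCount.get(labels[i], 0) + 1
    return sorted(classCount.items(), key=operator.itemgetter(1), reverse=True)[0][0]

group, labels = createDataSet()
print(classify0([0, 0], group, labels, 3))      # B
print(classify0([1.0, 1.2], group, labels, 3))  # A
```

With k = 3, the point [0, 0] has two 'B' neighbors and one 'A' neighbor, so it is classified as 'B'; the point [1.0, 1.2] has two 'A' neighbors and one 'B' neighbor, so it is classified as 'A'.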

This article is an English version of an article originally written in Chinese on aliyun.com.
