Machine Learning: k-Nearest Neighbor (KNN) Algorithm

Last update: 2015-04-17 Source: Internet

Author: User


I. Basic principle

There is a collection of sample data, also called the training sample set, and every item in it carries a label. When a new, unlabeled item arrives, each of its features is compared with the corresponding features of the items in the sample set, and the algorithm extracts the category labels of the most similar items (the nearest neighbors). In general only the k most similar items in the sample set are used (k is usually no greater than 20), and the category that occurs most often among those k items becomes the classification of the new data.

II. Algorithm flow

1) Calculate the distance between every point in the data set of known categories and the current point;
2) Sort the points by distance in increasing order;
3) Select the k points closest to the current point;
4) Count how often each category occurs among those k points;
5) Return the category that occurs most often among the k points as the predicted classification of the current point.

III. Characteristics of the algorithm

Advantages: high accuracy, insensitive to outliers, no assumptions about the input data.
Disadvantages: high computational complexity (every prediction computes the distance to every training sample) and high space complexity (the whole training set must be kept in memory).
Applicable data types: numeric and nominal values.

IV. Python code implementation

1. Create a data set

from numpy import array

def create_data_set():
    # four two-dimensional sample points and their class labels
    group = array([[1.0, 1.1], [1.0, 1.0], [0, 0], [0, 0.1]])
    labels = ['A', 'A', 'B', 'B']
    return group, labels

2. Implement KNN algorithm
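As a point of reference before the NumPy-based classify0, the five steps of the algorithm flow can be followed one by one in plain Python. This is a minimal sketch, not the author's code: the helper name knn_steps is hypothetical, and it assumes Python 3.8+ for math.dist and uses collections.Counter for the vote.

```python
import math
from collections import Counter


def knn_steps(point, samples, labels, k):
    """Hypothetical pure-Python sketch of the five-step KNN flow."""
    # Step 1: Euclidean distance from the current point to every labeled sample
    distances = [math.dist(point, sample) for sample in samples]
    # Step 2: sample indices ordered by increasing distance
    order = sorted(range(len(samples)), key=lambda i: distances[i])
    # Step 3: keep only the k closest samples
    nearest = order[:k]
    # Step 4: count how often each category occurs among them
    votes = Counter(labels[i] for i in nearest)
    # Step 5: return the most frequent category
    return votes.most_common(1)[0][0]


samples = [[1.0, 1.1], [1.0, 1.0], [0, 0], [0, 0.1]]
labels = ['A', 'A', 'B', 'B']
print(knn_steps([0, 0], samples, labels, 3))  # B
```

Each numbered comment corresponds to one step of the flow; classify0 performs the same steps, but computes all distances at once with NumPy arrays.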

import operator

from numpy import tile

##############################
# Function: assign one input vector to a class
# Inputs:   in_x, data_set, labels, k
#           (vector to classify, sample data, sample labels, number of nearest neighbors)
# Output:   sorted_class_count[0][0], the most frequent label among the k nearest neighbors
##############################

def classify0(in_x, data_set, labels, k):
    data_set_size = data_set.shape[0]  # number of rows (samples) in the array

    # tile(in_x, (data_set_size, 1)) stacks data_set_size copies of in_x, one row per sample
    # subtracting data_set gives, row by row, the coordinate differences to each sample point
    # summing the squared differences along each row gives data_set_size squared distances
    # taking the square root gives the Euclidean distances
    diff_mat = tile(in_x, (data_set_size, 1)) - data_set
    sq_diff_mat = diff_mat ** 2
    sq_distances = sq_diff_mat.sum(axis=1)
    distances = sq_distances ** 0.5

    # argsort returns the indices that would sort the distances from smallest to largest
    sorted_dist_indices = distances.argsort()

    class_count = {}
    for i in range(k):
        vote_label = labels[sorted_dist_indices[i]]

        # dict.get acts like an if...else expression: it returns 0 if vote_label
        # is not yet a key, otherwise the current count stored for vote_label
        class_count[vote_label] = class_count.get(vote_label, 0) + 1

    # items() yields the dictionary's (key, value) pairs as tuples, e.g. ('B', 2)
    # operator.itemgetter(0) would fetch field 0 of each tuple, i.e. the key (label)
    # operator.itemgetter(1) fetches field 1, i.e. the value (vote count)
    # itemgetter returns a callable that extracts that field from an object
    # reverse=True sorts in descending order, so the most frequent label comes first
    sorted_class_count = sorted(class_count.items(), key=operator.itemgetter(1), reverse=True)

    return sorted_class_count[0][0]
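The tile call is not strictly necessary: NumPy broadcasting subtracts a 1-D vector from every row of a 2-D array automatically, and numpy.linalg.norm with axis=1 computes the row-wise Euclidean norm. The sketch below is one possible equivalent of classify0 written that way; the name classify_broadcast is hypothetical, and the dictionary-plus-sorted vote is swapped for collections.Counter.

```python
from collections import Counter

import numpy as np


def classify_broadcast(in_x, data_set, labels, k):
    """Hypothetical variant of classify0 using broadcasting instead of tile."""
    # Broadcasting: an (n, d) array minus a (d,) vector subtracts in_x from every row
    diff = data_set - np.asarray(in_x, dtype=float)
    # Row-wise Euclidean norm, same as the square root of the summed squared differences
    distances = np.linalg.norm(diff, axis=1)
    # Indices of the k smallest distances
    nearest = distances.argsort()[:k]
    # Majority vote among the labels of the k nearest samples
    votes = Counter(labels[i] for i in nearest)
    return votes.most_common(1)[0][0]


group = np.array([[1.0, 1.1], [1.0, 1.0], [0, 0], [0, 0.1]])
labels = ['A', 'A', 'B', 'B']
print(classify_broadcast([0, 0], group, labels, 3))  # B
```

Broadcasting avoids materializing the tiled copy of in_x, which matters little for four samples but saves memory on larger training sets.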

3. Code test

def main():
    group, labels = create_data_set()
    sorted_class_labels = classify0([0, 0], group, labels, 3)
    print('sorted_class_labels =', sorted_class_labels)


if __name__ == '__main__':
    main()
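To see why the test classifies [0, 0] as B, it helps to look at the intermediate values. The sketch below recomputes the distances and the argsort order for the same four samples; it only inspects values and is not part of the original program (the variable name query is illustrative).

```python
import numpy as np

group = np.array([[1.0, 1.1], [1.0, 1.0], [0, 0], [0, 0.1]])
labels = ['A', 'A', 'B', 'B']
query = np.array([0.0, 0.0])

# Euclidean distance from the query to each of the four samples
distances = np.sqrt(((group - query) ** 2).sum(axis=1))
# Sample indices ordered from nearest to farthest
order = distances.argsort()

for rank, i in enumerate(order, start=1):
    print(rank, labels[i], round(float(distances[i]), 4))
# 1 B 0.0
# 2 B 0.1
# 3 A 1.4142
# 4 A 1.4866
```

With k = 3 the three nearest labels are B, B and A, so the vote is 2 to 1 and the test prints B.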

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only.