Machine learning essay 01: k-nearest neighbor algorithm

Last Update:2015-07-21 Source: Internet

Author: User

Algorithm name: k-nearest neighbor algorithm (kNN: k-nearest neighbor)

Problem: classify a new object (thing) based on existing objects whose classes are already known.

Core idea:

Decompose the object into features, because an object's features determine its class.
Measure each feature and express it as a number.
Collect all feature values into a tuple, which serves as the coordinates of that object.
Compute the distance between the object to be classified and every known object, and select the k nearest known objects (this is the k in k-nearest neighbor).
The class that occurs most often among these k objects is assigned to the object to be classified.
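
The steps above can be sketched in a few lines of Python. This is only a minimal illustration, not a reference implementation: the function name `knn_classify`, the toy points, and the labels "A" and "B" are all made up for the example, and the distance is assumed to be Euclidean. Ties in the vote are broken by whichever label was counted first.

```python
import math
from collections import Counter

def knn_classify(query, training_points, training_labels, k):
    """Return the majority label among the k training points nearest to query."""
    # Distance from the query to every known object (Euclidean, an assumption).
    distances = [math.dist(query, point) for point in training_points]
    # Indices of the k smallest distances.
    nearest = sorted(range(len(distances)), key=distances.__getitem__)[:k]
    # Majority vote over the labels of those k neighbors.
    votes = Counter(training_labels[i] for i in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical toy data: two features per object, two classes.
points = [(1.0, 1.1), (1.0, 1.0), (0.0, 0.0), (0.0, 0.1)]
labels = ["A", "A", "B", "B"]

print(knn_classify((0.9, 0.8), points, labels, k=3))  # prints A
```

Note that the function does no work ahead of time: every call scans the whole training set, which is why the training data has to stay available.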

Important premise: the algorithm needs a set of objects that have already been classified correctly. This is what is usually called the training data.

Main advantages:

High accuracy.
Insensitive to outliers in the training data: with k of 3 or more, a single unusual training point is outvoted by the other neighbors.
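
The point about outliers can be illustrated with a small experiment. The data set below is invented for the illustration: it contains one point labeled "B" sitting in the middle of a cluster of "A" points. The helper name `vote` and the Euclidean distance are assumptions of this sketch.

```python
import math
from collections import Counter

def vote(query, data, k):
    """Majority label among the k entries of data nearest to query."""
    nearest = sorted(data, key=lambda item: math.dist(query, item[0]))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

# Hypothetical training data: (coordinates, label).
data = [
    ((1.0, 1.0), "A"),
    ((1.2, 0.9), "A"),
    ((0.9, 1.2), "A"),
    ((1.05, 1.05), "B"),  # outlier: a "B" label inside the "A" cluster
    ((5.0, 5.0), "B"),
    ((5.2, 4.9), "B"),
]

query = (1.06, 1.06)
print(vote(query, data, k=1))  # the single nearest point is the outlier
print(vote(query, data, k=3))  # two "A" neighbors outvote the outlier
```

With k=1 the outlier decides the answer on its own; with k=3 it is only one vote out of three.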

Main drawbacks:

Heavy computation: the result of one classification does not help with later ones, so every classification has to recompute the distances against all of the training data.
Heavy storage: because everything is recomputed each time, the full training data must always be kept available.

Practical example: classifying movies.

Algorithm process:

Feature selection: to simplify the problem, assume there are only two categories of movie: romance and action. Each movie can then be described by two features: kissing and fighting.
Feature digitization: for every movie, including the movie to be classified, count the number of kisses and the number of fights; call them x and y respectively.
Coordinates: each movie's kiss count and fight count form its coordinates (x, y).
Distance calculation: dist = sqrt((x0 - x1)**2 + (y0 - y1)**2), where (x0, y0) is the movie to be classified and (x1, y1) is a known movie.
k nearest neighbors: select the k known movies with the smallest dist.
Vote: if most of these k movies are romance movies, the movie to be classified is a romance movie; otherwise it is an action movie.
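
As a rough sketch of the movie example in Python: the kiss and fight counts below are invented for illustration (the article gives no actual numbers), and `known_movies` and `classify_movie` are hypothetical names. Using an odd k with two categories avoids tied votes.

```python
import math
from collections import Counter

# Hypothetical (kisses, fights) counts for movies that are already classified.
known_movies = [
    ((104, 3), "romance"),
    ((100, 2), "romance"),
    ((81, 1), "romance"),
    ((10, 101), "action"),
    ((5, 99), "action"),
    ((2, 98), "action"),
]

def classify_movie(kisses, fights, k=3):
    """Classify a movie from its kiss and fight counts using k nearest neighbors."""
    # dist = sqrt((x0 - x1)**2 + (y0 - y1)**2) against every known movie.
    scored = [
        (math.sqrt((kisses - x) ** 2 + (fights - y) ** 2), genre)
        for (x, y), genre in known_movies
    ]
    # Keep the k movies with the smallest dist.
    nearest = sorted(scored)[:k]
    # The most common genre among them wins.
    votes = Counter(genre for _, genre in nearest)
    return votes.most_common(1)[0][0]

print(classify_movie(90, 18))  # prints romance
```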

Extended example: handwriting recognition

This example is included because, at first glance, handwriting recognition seems unrelated to classifying objects, yet the two are in fact closely related. To simplify the problem, we narrow it down to recognizing handwritten digits. Answering the following questions shows how to apply the kNN algorithm.

What are the known objects and the object to be classified? Answer: the known objects are the pre-collected handwriting samples stored in the system; the object to be classified is whatever the user writes each time.
What exactly is the input, and how is it turned into features? Answer: treat the input area as a two-dimensional matrix in which every cell the stroke passes through is 1 and every other cell is 0. The matrix needs a fixed size, which you can choose yourself, such as 32*64 or 64*128. Every position in the matrix is a feature, so the number of features equals the number of cells in the matrix.
What is the value of each feature? Answer: since each feature corresponds to one position in the matrix, its value is the matrix element at that position, which is 0 or 1.
How are the coordinates formed? Answer: concatenate all rows of the matrix in order into one very long row; that row is the coordinates of the object.
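
The flattening step might look like the following sketch. The 4x4 bitmaps are hypothetical and far smaller than the 32*64 grid mentioned above, and the names `flatten`, `stored_one`, `stored_seven`, and `user_input` are made up for the example. Once every bitmap is a flat tuple, the same distance calculation as in the movie example applies.

```python
import math

def flatten(bitmap):
    """Concatenate the rows of a 0/1 matrix into one long coordinate tuple."""
    return tuple(cell for row in bitmap for cell in row)

# Hypothetical stored samples: a "1" and a "7".
stored_one = [
    [0, 1, 0, 0],
    [0, 1, 0, 0],
    [0, 1, 0, 0],
    [0, 1, 0, 0],
]
stored_seven = [
    [1, 1, 1, 1],
    [0, 0, 0, 1],
    [0, 0, 1, 0],
    [0, 1, 0, 0],
]
# Hypothetical user input: a "1" with one stray cell.
user_input = [
    [0, 1, 0, 0],
    [0, 1, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 0, 0],
]

query = flatten(user_input)
dist_to_one = math.dist(query, flatten(stored_one))
dist_to_seven = math.dist(query, flatten(stored_seven))

print(len(query))                   # 16 features, one per matrix cell
print(dist_to_one < dist_to_seven)  # True: the input is closer to the "1"
```

Because every feature is 0 or 1, the squared distance between two bitmaps is simply the number of cells in which they differ.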

Further thoughts:

Feature weighting: the core process of the algorithm treats every feature as equally important and does not consider how much each feature actually matters.
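
One possible way to add feature weighting, shown here only as an assumption and not as something the article prescribes, is to scale each squared difference by a per-feature weight before summing. The function name `weighted_distance` and the weight values are hypothetical.

```python
import math

def weighted_distance(a, b, weights):
    """Euclidean distance with each squared difference scaled by a weight."""
    return math.sqrt(sum(w * (x - y) ** 2 for x, y, w in zip(a, b, weights)))

# Hypothetical objects with two features each.
a = (3.0, 100.0)
b = (0.0, 104.0)

# Equal weights reproduce the ordinary distance: sqrt(9 + 16).
unweighted = weighted_distance(a, b, (1.0, 1.0))
# Emphasize the first feature and damp the second: sqrt(4*9 + 0.25*16).
weighted = weighted_distance(a, b, (4.0, 0.25))

print(unweighted)  # 5.0
print(weighted)
```

With all weights set to 1 this reduces to the plain distance used earlier, so the unweighted algorithm is a special case.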

For more information, please refer to: https://en.wikipedia.org/wiki/K-nearest_neighbors_algorithm

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only.
