Machine learning essay 01: k-nearest neighbor algorithm

Source: Internet
Author: User

Algorithm name : k-nearest neighbor algorithm (KNN: k-nearest neighbors)

The problem is: classify a new object (thing) using the known classifications of existing objects.

Core Idea :

    1. Decompose the object into features, because the characteristics of an object determine its classification.
    2. Measure each feature and express it as a number.
    3. All feature values together form a tuple, which serves as the coordinates of the object.
    4. Calculate the distance between the object to be classified and every known object, and select the k nearest known objects (this is the k in "k nearest neighbors").
    5. The classification that occurs most often among these k objects becomes the classification of the object to be classified.
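
The five steps above can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: the function name `knn_classify`, the `(coordinates, label)` sample layout, and the toy data are all assumptions made for this example, and the distance used is the ordinary Euclidean distance.

```python
import math
from collections import Counter

def knn_classify(query, samples, k):
    """Return the most common label among the k samples nearest to `query`.

    `samples` is a list of (coordinates, label) pairs (hypothetical layout).
    """
    # Step 4: sort all known objects by their distance to the query.
    by_distance = sorted(samples, key=lambda s: math.dist(query, s[0]))
    nearest = by_distance[:k]
    # Step 5: majority vote over the labels of the k nearest objects.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical toy data: two features per object, two classes.
samples = [
    ((1.0, 1.0), "A"),
    ((1.2, 0.8), "A"),
    ((0.9, 1.1), "A"),
    ((5.0, 5.0), "B"),
    ((5.5, 4.5), "B"),
]

print(knn_classify((1.1, 1.0), samples, k=3))  # A
print(knn_classify((5.2, 4.9), samples, k=3))  # B
```

Note that the whole of `samples` is scanned on every call, and that a tied vote is resolved by whichever label `Counter` happens to list first; a real system would need an explicit tie-breaking rule.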

Important Premise : A set of objects that have already been correctly classified is required. This is what is usually called the training data.

Important Advantages :

    1. High accuracy.
    2. Insensitive to outliers in the training data.

Major defects :

    1. The computational cost is high: the result of one classification does not help with later ones, so every decision has to be recomputed against all of the data.
    2. The storage cost is high: because every decision is recomputed from scratch, the full training data must be kept available at all times.

Realistic Example : classify movies.

Algorithm Process :

    1. Characterization: To simplify the problem, assume that movies fall into only two categories: romance movies and action movies. We can then break a movie down into two features: kissing and fighting.
    2. Feature digitization: For each movie, including the movie to be tested, count the number of kisses and the number of fights; call them x and y, respectively.
    3. Coordinates: The number of kisses and the number of fights give the coordinates of the movie, (x, y).
    4. Distance calculation: dist = sqrt((x0 - x1)**2 + (y0 - y1)**2), where (x0, y0) is the movie to be tested and (x1, y1) is a known movie.
    5. K nearest neighbors: Select the k known movies with the smallest dist.
    6. If romance movies outnumber action movies among these k movies, the movie to be tested is a romance movie; otherwise it is an action movie.
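
As a worked illustration of this process, the sketch below classifies one movie with k = 3. The kiss and fight counts are invented for the example and are not real data, and the name `classify_movie` is likewise hypothetical. The distance line follows the formula from step 4 directly.

```python
from collections import Counter
from math import sqrt

# Hypothetical training data: (kisses, fights, category).
movies = [
    (90, 5, "romance"),
    (100, 2, "romance"),
    (80, 10, "romance"),
    (5, 95, "action"),
    (10, 100, "action"),
    (2, 85, "action"),
]

def classify_movie(x0, y0, movies, k):
    """Classify the movie at (x0, y0) by a vote among its k nearest movies."""
    dists = []
    for x1, y1, category in movies:
        # Step 4: Euclidean distance between the two movies.
        dist = sqrt((x0 - x1) ** 2 + (y0 - y1) ** 2)
        dists.append((dist, category))
    # Step 5: keep the k movies with the smallest dist.
    dists.sort(key=lambda pair: pair[0])
    nearest = [category for _, category in dists[:k]]
    # Step 6: the majority category wins.
    return Counter(nearest).most_common(1)[0][0]

# A movie with 18 kisses and 90 fights lands next to the action movies.
print(classify_movie(18, 90, movies, k=3))  # action
```

Choosing an odd k avoids tied votes when there are only two categories.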

Extension Example : handwriting recognition

The reason for listing this example is that, at first glance, handwriting recognition seems unrelated to classifying objects. In fact, the two are closely related. To simplify the problem, we narrow it down to recognizing handwritten digits. Answering the following questions shows how to apply the KNN algorithm.

    1. What are the known objects and the objects to be classified? Answer: The known objects are the pre-collected handwriting samples stored in the system; the object to be classified is whatever the user writes by hand each time.
    2. What exactly is the input, and how is it characterized? Answer: Treat the input surface as a two-dimensional matrix in which every position the stroke passes over is 1 and every other position is 0. The matrix needs a fixed size, which you can choose yourself, such as 32*64 or 64*128. The features are all the positions in the matrix; that is, there are as many features as the matrix has points.
    3. What is the value of each feature? Answer: Since each feature represents a point at a particular position in the matrix, its value is the matrix element at that position, which is 0 or 1.
    4. How are the coordinates formed? Answer: Concatenate all rows of the matrix in order into one very long row; this is the coordinates of the object.
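
The conversion from a matrix to coordinates can be sketched as follows. The tiny 4*3 bitmaps are hypothetical stand-ins for a real 32*64 input, and `to_coordinates` is a name chosen for this example only.

```python
import math

def to_coordinates(matrix):
    """Concatenate the rows of a 0/1 matrix, in order, into one flat tuple."""
    return tuple(value for row in matrix for value in row)

# Hypothetical 4*3 bitmaps: 1 where the stroke passed, 0 elsewhere.
stroke_a = [
    [0, 1, 0],
    [0, 1, 0],
    [0, 1, 0],
    [0, 1, 0],
]
stroke_b = [
    [0, 1, 0],
    [1, 1, 0],
    [0, 1, 0],
    [0, 1, 0],
]

vec_a = to_coordinates(stroke_a)
vec_b = to_coordinates(stroke_b)

print(len(vec_a))                # 12 features, one per matrix position
print(math.dist(vec_a, vec_b))   # 1.0, the bitmaps differ in one position
```

Once every sample is a flat tuple like this, the distance calculation and the vote work exactly as in the movie example, only with many more dimensions.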

Expand your Mind :

    1. Feature weighting: the core process of the algorithm does not take into account how important each feature is.
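
One possible way to bring feature importance into the distance is to multiply each squared difference by a weight. The sketch below assumes one non-negative weight per feature; the function name `weighted_distance` and the weight values are hypothetical, and how to choose good weights is a separate question not covered here.

```python
from math import sqrt

def weighted_distance(a, b, weights):
    """Euclidean distance in which each feature's squared difference is
    scaled by a weight (one assumed non-negative weight per feature)."""
    return sqrt(sum(w * (p - q) ** 2 for p, q, w in zip(a, b, weights)))

# Equal weights reproduce the plain distance.
print(weighted_distance((0, 0), (3, 4), (1, 1)))  # 5.0
# A weight of 0 makes the second feature irrelevant.
print(weighted_distance((0, 0), (3, 4), (1, 0)))  # 3.0
```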

For more information, please refer to: https://en.wikipedia.org/wiki/K-nearest_neighbors_algorithm
