Algorithm name: k-nearest neighbor algorithm (KNN: k-nearest neighbor)
Problem addressed: classify a new object based on the data of existing objects whose classes are already known.
Core idea:
- Decompose each object into features, since an object's characteristics determine its class.
- Measure each feature and express it as a number.
- The feature values together form a tuple, which serves as the object's coordinates.
- Calculate the distance between the object to be classified and every known object, and select the k nearest known objects (this is the "k" in k-nearest neighbor).
- The class that occurs most often among these k objects is assigned to the object being classified.
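The core idea above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a reference implementation: each known object is assumed to be stored as a (coordinates, label) pair, the distance is Euclidean, and the helper names `euclidean_distance` and `knn_classify` are hypothetical.

```python
import math
from collections import Counter


def euclidean_distance(a, b):
    """Straight-line distance between two equal-length feature tuples."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def knn_classify(query, training_data, k):
    """Return the majority label among the k training objects nearest to `query`.

    `training_data` is a list of (coordinates, label) pairs (an assumed layout).
    """
    # Distance from the object to be classified to every known object.
    distances = [(euclidean_distance(query, coords), label)
                 for coords, label in training_data]
    # Keep only the k nearest known objects.
    distances.sort(key=lambda pair: pair[0])
    nearest_labels = [label for _, label in distances[:k]]
    # The most frequent class among those k objects is the answer.
    return Counter(nearest_labels).most_common(1)[0][0]


# Usage with a tiny, made-up training set.
training = [((1, 1), "A"), ((1, 2), "A"), ((8, 8), "B"), ((9, 8), "B")]
print(knn_classify((2, 1), training, k=3))  # → A
```

Because the neighbors are sorted by distance and `Counter.most_common` lists equal counts in the order first encountered, a tied vote goes to the label that appears first among the sorted neighbors. That tie-break rule is a choice of this sketch, not part of the description above.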
Important premise: a set of objects that have already been correctly classified is required. This is the usual training data.
Important advantages:
- High accuracy.
- Insensitive to outliers in the training data.
Major drawbacks:
- Heavy computation: the result of one classification does not help with later ones, so every new decision must recompute distances against all of the training data.
- Large storage requirement: because every decision recomputes against it, the entire training set must always be kept.
Realistic example: classifying movies.
Algorithm process:
- Characterization: to simplify the problem, assume movies fall into only two categories, romance and action. Each movie can then be described by two features: kissing and fighting.
- Feature digitization: for each movie, including the one to be classified, count the number of kisses and fights, denoted x and y respectively.
- Coordinates: a movie's kiss count and fight count form its coordinates (x, y).
- Distance calculation (Euclidean distance), where (x0, y0) is the movie being tested and (x1, y1) is a known movie: dist = sqrt((x0 - x1)**2 + (y0 - y1)**2)
- k nearest neighbors: select the k known movies with the smallest dist.
- If romance movies are the majority among these k movies, the movie being tested is classified as romance; otherwise it is classified as action.
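The movie steps can be sketched in Python as follows. The (kisses, fights) counts are made up purely for illustration, the name `classify_movie` is hypothetical, and k = 3 is an assumed default chosen to be odd so that a two-class vote cannot tie.

```python
import math

# Made-up training data: ((kisses, fights), category) for each known movie.
MOVIES = [
    ((104, 3), "romance"),
    ((100, 2), "romance"),
    ((81, 1), "romance"),
    ((10, 101), "action"),
    ((5, 99), "action"),
    ((2, 98), "action"),
]


def classify_movie(x, y, k=3):
    """Classify a movie with x kisses and y fights by its k nearest known movies."""
    # dist = sqrt((x0 - x1)**2 + (y0 - y1)**2) for every known movie
    distances = sorted(
        ((math.sqrt((x - x1) ** 2 + (y - y1) ** 2), label)
         for (x1, y1), label in MOVIES),
        key=lambda pair: pair[0],
    )
    nearest = [label for _, label in distances[:k]]
    romance_votes = nearest.count("romance")
    # More romance movies among the k neighbors -> romance; otherwise action.
    return "romance" if romance_votes > len(nearest) - romance_votes else "action"


print(classify_movie(18, 90))  # → action
print(classify_movie(90, 5))   # → romance
```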
Extended example: handwriting recognition
This example is included because, at first glance, handwriting recognition seems unrelated to classifying objects, but it is in fact a classification problem. To simplify, we narrow it down to recognizing handwritten digits. Answering the following questions shows how to apply the KNN algorithm.
- What are the known objects and the objects to be classified? Answer: the known objects are the handwriting samples collected in advance and stored in the system; the object to be classified is each new piece of handwriting the user inputs.
- What exactly is the input, and how is it characterized? Answer: treat the input surface as a two-dimensional matrix in which every cell the pen stroke passes over is 1 and every other cell is 0. The matrix size is up to you, for example 32*64, 64*128, and so on. Every position in the matrix is a feature, so there are as many features as the matrix has cells.
- What is the value of each feature? Answer: since each feature corresponds to a specific position in the matrix, its value is the matrix element at that position, which is either 0 or 1.
- How are the coordinates formed? Answer: concatenate all rows of the matrix in order into one long vector; that vector is the object's coordinates.
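A toy sketch of the handwriting case, under loudly stated assumptions: tiny 5*3 bitmaps stand in for the 32*64 or larger matrices described above, the four training samples are invented, and `flatten` and `classify_digit` are hypothetical helper names. Because every coordinate is 0 or 1, the squared Euclidean distance between two flattened bitmaps is simply the number of cells where they differ.

```python
import math
from collections import Counter


def flatten(matrix):
    """Concatenate the matrix rows, in order, into one long coordinate vector."""
    return [cell for row in matrix for cell in row]


# Invented 5x3 training bitmaps: 1 where the pen passed, 0 elsewhere.
TRAINING_DIGITS = [
    ([[0, 1, 0], [0, 1, 0], [0, 1, 0], [0, 1, 0], [0, 1, 0]], "1"),
    ([[0, 1, 0], [1, 1, 0], [0, 1, 0], [0, 1, 0], [1, 1, 1]], "1"),
    ([[1, 1, 1], [1, 0, 1], [1, 0, 1], [1, 0, 1], [1, 1, 1]], "0"),
    ([[0, 1, 0], [1, 0, 1], [1, 0, 1], [1, 0, 1], [0, 1, 0]], "0"),
]


def classify_digit(matrix, k=3):
    """Label a handwritten bitmap by majority vote of its k nearest training bitmaps."""
    query = flatten(matrix)
    distances = sorted(
        ((math.sqrt(sum((a - b) ** 2 for a, b in zip(query, flatten(sample)))), label)
         for sample, label in TRAINING_DIGITS),
        key=lambda pair: pair[0],
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


# A "1" whose bottom stroke has a small extra tick.
print(classify_digit([[0, 1, 0], [0, 1, 0], [0, 1, 0], [0, 1, 0], [0, 1, 1]]))  # → 1
```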
Going further:
- Feature weighting: the core algorithm treats every feature as equally important; it does not take into account how much each feature matters.
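One common way to add feature weighting is to scale each feature's squared difference by a weight before summing, giving a weighted Euclidean distance. The sketch below is hypothetical: the weights are hand-picked rather than learned, the names are made up, and it uses a single nearest neighbor for brevity. It shows that down-weighting the second feature can change which training object counts as nearest.

```python
import math


def weighted_distance(a, b, weights):
    """Euclidean distance where each squared feature difference is scaled by a weight."""
    return math.sqrt(sum(w * (ai - bi) ** 2 for ai, bi, w in zip(a, b, weights)))


def nearest_label(query, training_data, weights):
    """Label of the single training object closest to `query` under the given weights."""
    _, label = min(training_data,
                   key=lambda item: weighted_distance(query, item[0], weights))
    return label


training = [((5, 9), "A"), ((8, 5), "B")]
print(nearest_label((5, 5), training, (1.0, 1.0)))   # → B (equal weights)
print(nearest_label((5, 5), training, (1.0, 0.25)))  # → A (second feature down-weighted)
```

With all weights equal to 1, this reduces to the plain Euclidean distance used in the core algorithm.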
For more information, please refer to: https://en.wikipedia.org/wiki/K-nearest_neighbors_algorithm
Machine learning essay 01: k-nearest neighbor algorithm