KNN (short for k-nearest neighbor), also called the nearest neighbor algorithm



Machine Learning Notes: KNN Algorithm (1)

Objective

Hello, everyone. I'm Little Flower, a graduating senior who has stayed on at school with a bit of spare time. I'd like to chat with you about our "friends algorithm", the KNN algorithm. Why call it the friends algorithm? Let me keep you in suspense for now; listen as I slowly explain.

1. Introduction to the KNN algorithm

KNN (short for k-nearest neighbor) is also called the nearest neighbor algorithm. It is a nonparametric statistical method for classification and regression, proposed by Cover and Hart in 1968. What is a nonparametric statistical method? A quick supplement: nonparametric statistics is a branch of statistics that applies when the population distribution is unknown, the sample is small, or the population is not normally distributed and cannot easily be transformed to normal. Its hallmark is making minimal or no assumptions about the underlying model, which makes it well suited to small-sample data. (Doesn't that flexibility remind you of Wei Xiaobao? Haha, a bit of a digression; you get the idea.)

How KNN works: there is a collection of sample data, known as the training set, in which every sample carries a label; that is, we know which class each sample in the set belongs to. When new, unlabeled data arrives, we compare each feature of the new data against the features of the samples in the training set, and the algorithm extracts the class label of the most similar (nearest neighbor) samples. How to picture this? Imagine a square full of dogs: several mother dogs, each with a litter of puppies, all breeds mixed together. Every puppy knows who its mother is, except for one absent-minded puppy that has gotten lost. How does it find its mother? We compare that puppy's features with the features of all the other puppies, pick out the most similar one, and conclude that the similar puppy's mother is also our lost puppy's mother. Intuitively, a Chihuahua will end up far away from a Teddy poodle.
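The core idea, find the single most similar labeled sample and borrow its label, can be sketched as a k = 1 classifier. All dog features, numbers, and label names below are invented purely for illustration:

```python
import math

# Each known puppy: (feature vector, label = its mother).
# Features are (height in cm, weight in kg); all values are invented.
training_set = [
    ((30.0, 2.5), "chihuahua-mom"),
    ((28.0, 2.8), "chihuahua-mom"),
    ((50.0, 10.0), "teddy-mom"),
    ((52.0, 11.0), "teddy-mom"),
]

def nearest_label(features, training):
    """Return the label of the single most similar training sample (k = 1)."""
    def dist(a, b):
        return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))
    closest = min(training, key=lambda item: dist(features, item[0]))
    return closest[1]

lost_puppy = (29.0, 2.6)  # looks a lot like a Chihuahua
print(nearest_label(lost_puppy, training_set))  # -> chihuahua-mom
```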

The k-nearest-neighbor classification algorithm (pseudo-code)

1: Let k be the number of nearest neighbors and D the set of training samples
2: for each test sample z = (x', y') do
3:     compute d(x', x), the distance between z and every sample (x, y) ∈ D
4:     select D_z ⊆ D, the set of the k training samples closest to z
5:     y' = argmax_v Σ_{(xi, yi) ∈ D_z} I(v = yi)
6: end for

That was a lot of words. For those of us who hammer out code every day, nothing beats going straight to the pseudo-code above!
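As a sketch, the pseudo-code above translates almost line by line into Python. The toy data and function names here are my own, not part of the original algorithm statement:

```python
import math
from collections import Counter

def knn_classify(z, training, k):
    """Classify test point z using the k nearest training samples (x, y) in D."""
    # Line 3: compute the distance d(x', x) from z to every training sample.
    def euclidean(a, b):
        return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))
    # Line 4: select D_z, the k training samples closest to z.
    d_z = sorted(training, key=lambda item: euclidean(z, item[0]))[:k]
    # Line 5: y' = argmax_v sum of I(v == y_i) over (x_i, y_i) in D_z,
    # i.e. a plain majority vote among the k neighbours' labels.
    votes = Counter(label for _, label in d_z)
    return votes.most_common(1)[0][0]

# Toy training set D, invented for illustration only.
D = [((1.0, 1.0), "A"), ((1.2, 0.8), "A"), ((4.0, 4.0), "B"), ((4.2, 3.9), "B")]
print(knn_classify((1.1, 1.0), D, k=3))  # -> A
```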

Whew, pseudo-code really is refreshing. So let's walk through it. First of all, you will surely ask about line 3: how do we compute d(x', x), the distance between z and each sample (x, y) ∈ D? What formula defines the distance between them? Well, you've asked the right person. The masters before us have already worked out the moves; now open up your meridians and let me pass them on!

2. KNN in detail

2.1 Distance measures in the KNN algorithm

The distance between two instance points in feature space reflects how similar the two points are. (Be sure to understand this well; you can think back to the lost puppy.) The feature space of the k-nearest-neighbor model is generally the n-dimensional real vector space R^n. The distance used is usually the Euclidean distance, but other distances work as well, such as the more general Lp distance or the Minkowski distance.

1. Euclidean distance: the most common distance between two (or more) points, also known as the Euclidean metric, defined in Euclidean space. The distance between points x = (x1, ..., xn) and y = (y1, ..., yn) is:

d(x, y) = sqrt((x1 - y1)^2 + ... + (xn - yn)^2)

2. Manhattan distance: the sum of the absolute differences along each coordinate. Unlike the Euclidean distance, the Manhattan distance changes under rotation of the coordinate system, although it is unchanged by translation along the axes:

d(x, y) = |x1 - y1| + ... + |xn - yn|
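Both distances are special cases of the Lp (Minkowski) distance mentioned above; a minimal sketch:

```python
def minkowski(x, y, p):
    """Lp (Minkowski) distance; p = 2 gives Euclidean, p = 1 gives Manhattan."""
    return sum(abs(xi - yi) ** p for xi, yi in zip(x, y)) ** (1.0 / p)

x, y = (0.0, 0.0), (3.0, 4.0)
print(minkowski(x, y, 2))  # Euclidean: sqrt(3^2 + 4^2) = 5.0
print(minkowski(x, y, 1))  # Manhattan: |3| + |4| = 7.0
```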

2.2 The mystery of k in the KNN algorithm

By now many of you will ask: you called this the friends algorithm, so explain yourself! There is an old Chinese saying that you can judge a person by their friends; look at the friends around someone and you know what kind of person they are. For example, the classmates around me are mostly fools, so what does that say about me? Ouch... I suspect some of you are getting impatient: after all this rambling, what does k actually mean? Simply put, k is the number of nearest "friends" we consult. Say k = 1 and my single nearest friend is a rascal; then the odds are high that I'm a rascal too. If k = 4 and three of my four nearest friends are rascals while one is decent, the vote is 3:1, so Little Flower is judged a rascal. Correct verdict... Now consider the classic picture:

The unknown black point could be either a pentagram or a diamond. When k = 3, it's diamonds vs. pentagrams 2:1, so the diamonds win and the black point is classified as a diamond. When k = 7, it's pentagrams vs. diamonds 5:2, so the pentagrams win and claim the point instead. It's just like a gang brawl: the side with more people wins.

In summary, the choice of k largely determines whether the algorithm classifies correctly. This is one of KNN's weaknesses.
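The diamond/pentagram flip above can be reproduced with a simple majority vote. The neighbour labels below are made up to match the described picture:

```python
from collections import Counter

# Labels of the nearest neighbours, ordered from closest to farthest.
neighbour_labels = ["diamond", "diamond", "pentagram",                     # 3 closest
                    "pentagram", "pentagram", "pentagram", "pentagram"]    # next 4

def vote(labels, k):
    """Majority vote among the k nearest labels."""
    return Counter(labels[:k]).most_common(1)[0][0]

print(vote(neighbour_labels, 3))  # -> diamond    (2:1)
print(vote(neighbour_labels, 7))  # -> pentagram  (5:2)
```

The same point flips class as k grows, which is exactly why choosing k matters.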

2.3 Additions to KNN

2.3.1 Feature weighting

Many sharp readers will ask: isn't there a problem with your distance formula? Suppose I build a recommendation system for a dating site that recommends matches to you by score. A candidate is described by the features (height, weight, looks, education, income, hobbies). If you care mainly about appearance, you pay much more attention to height, weight, and looks than to income and education, a situation we run into in real matchmaking all the time. Yet under the plain Euclidean formula above, income and education affect the score exactly as much as height and weight do. To solve this, we introduce the concept of weights: when you register, the system asks you to enter a weight vector w = (w1, w2, w3, w4, w5, w6) for (height, weight, looks, education, income, hobbies). Enter a larger weight for the features you consider important and a smaller one for those you don't; ideally the weights sum to 1. The formula then becomes:

d(x, y) = sqrt(w1(x1 - y1)^2 + ... + wn(xn - yn)^2)
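A minimal sketch of this weighted distance, with entirely invented profile numbers and weights:

```python
import math

def weighted_euclidean(x, y, w):
    """Weighted Euclidean distance: sqrt(sum_i w_i * (x_i - y_i)^2)."""
    return math.sqrt(sum(wi * (xi - yi) ** 2 for xi, yi, wi in zip(x, y, w)))

# Features: (height, weight, looks, education, income, hobbies); invented values.
me    = (175.0, 65.0, 8.0, 3.0, 9.0, 7.0)
match = (168.0, 50.0, 9.0, 4.0, 3.0, 6.0)

# A "looks first" preference: most weight on height/weight/looks, weights sum to 1.
looks_first = (0.3, 0.2, 0.4, 0.04, 0.03, 0.03)
print(weighted_euclidean(me, match, looks_first))
```

With all weights equal to 1 this reduces to the plain Euclidean distance from section 2.1.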

2.3.2 Normalization

Plug some numbers into that formula and you will notice a problem: with a height of 180 and an income of 8000, income still dominates the distance even when its weight is relatively small, simply because its values are so much larger. When features span very different ranges like this, we usually normalize the values, for example rescaling each feature to the range 0 to 1 (or -1 to 1). The following formula converts a feature of any range into a value in the interval from 0 to 1:

newValue = (oldValue - min) / (max - min)
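A quick sketch of this min-max normalization applied to the height and income example (all numbers invented):

```python
def min_max_normalize(values):
    """Map each value to [0, 1] via (old - min) / (max - min)."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

heights = [150.0, 165.0, 180.0]      # cm
incomes = [3000.0, 8000.0, 20000.0]  # per month

print(min_max_normalize(heights))  # -> [0.0, 0.5, 1.0]
print(min_max_normalize(incomes))
```

After normalization, height and income both live in [0, 1], so neither feature dominates the distance by sheer scale; only the weights decide their influence.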

That wraps up Little Flower's explanation of the algorithm. Note 1 introduced the basic theory, starting from concrete everyday examples. In short, keep learning along with Little Flower step by step. After all, I am also just getting started, so there are surely places where my understanding falls short; please point them out and we'll think them through together. Haha.

Category: Machine learning

