1. The K-Nearest Neighbor Algorithm
The K-nearest neighbor algorithm is a classification algorithm, and classification algorithms are supervised learning algorithms. The biggest difference between supervised and unsupervised learning is that supervised learning requires telling the machine some correct answers in advance, that is, a labeled training data set, while unsupervised learning algorithms, such as clustering, need no such preparation.
Classification means that the value to be predicted is discrete (nominal) rather than continuous (numeric). That is a lot of terminology at once, so to put it simply: data broadly falls into two types, discrete (nominal) and continuous. Discrete data takes values only from a finite set (such as Yes/No, 1/2/3, a/b/c, red/white/black), while continuous data takes values from an infinite set (such as the real numbers). Discrete targets are suited to classifiers, that is, classification algorithms, and continuous targets are suited to regression algorithms such as linear regression, but this is not an absolute rule.
The advantages of the K-nearest neighbor algorithm are high accuracy and insensitivity to outliers (a few noisy data points do not significantly affect the result). The disadvantages are high computational complexity and high space complexity (as the data dimension grows, the matrix distance computation becomes expensive in time and resources). Applicable data types: numeric and nominal (the distance computation itself requires numeric values).
Let's use the textbook example to show briefly how the algorithm works. Suppose I have six movies: A (3, 104, love movie), B (2, 100, love movie), C (1, 81, love movie), D (101, 10, action movie), E (99, 5, action movie), and F (98, 2, action movie), where the first number is the number of fight scenes in the movie and the second is the number of kisses. Now there is a new movie G (18, 90, ?). Is it a love movie or an action movie? We can tell at a glance that it is a love movie. But how does the algorithm reach that conclusion?
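The K-nearest neighbor algorithm works by measuring distance. For two points A (x1, y1, z1) and B (x2, y2, z2), the Euclidean distance is d = √((x2 - x1)² + (y2 - y1)² + (z2 - z1)²). The movie example has only two numeric features, so the z term drops out there.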
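The distance formula translates almost directly into code. The following is a minimal sketch, not taken from the textbook; the function name euclidean_distance is an illustrative assumption, and it accepts points with any number of coordinates as long as both have the same length.

```python
import math

def euclidean_distance(p, q):
    """Straight-line distance between two points with the same number of coordinates."""
    if len(p) != len(q):
        raise ValueError("points must have the same number of coordinates")
    # Sum the squared coordinate differences, then take the square root.
    return math.sqrt(sum((b - a) ** 2 for a, b in zip(p, q)))

# Distance between the 3-D points (0, 0, 0) and (1, 2, 2): sqrt(1 + 4 + 4) = 3
print(euclidean_distance((0, 0, 0), (1, 2, 2)))  # 3.0
```

The standard library's math.dist (Python 3.8 and later) computes the same quantity and could be used instead.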
The distances from G to A through F come out as (20.5, 18.9, 19.2, 115.3, 117.4, 118.9). Sort them from smallest to largest; the "K nearest neighbors" are simply the k most similar samples. With k = 3, the three smallest distances are (18.9, 19.2, 20.5), corresponding to movies B, C and A. The majority of these are love movies (here, all of them are), so we conclude that G is a love movie.
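The movie example can be reproduced in a few lines. This is a rough sketch rather than the textbook's code: the dictionary layout, the variable names and the short label strings "love" and "action" are illustrative assumptions, while the feature values and k = 3 come from the example itself.

```python
import math
from collections import Counter

# name -> (fight scenes, kisses, label) for the six known movies.
movies = {
    "A": (3, 104, "love"),
    "B": (2, 100, "love"),
    "C": (1, 81, "love"),
    "D": (101, 10, "action"),
    "E": (99, 5, "action"),
    "F": (98, 2, "action"),
}
new_movie = (18, 90)  # movie G, label unknown
k = 3

# Distance from G to every known movie, sorted from nearest to farthest.
ranked = sorted(
    (math.dist(new_movie, (fights, kisses)), name, label)
    for name, (fights, kisses, label) in movies.items()
)
for dist, name, label in ranked:
    print(f"{name}: {dist:.1f} ({label})")

# Majority vote among the k nearest neighbors.
votes = Counter(label for _, _, label in ranked[:k])
prediction = votes.most_common(1)[0][0]
print(prediction)  # love
```

The three nearest movies are B, C and A. Computed exactly, B's distance is √356, which rounds to 18.9.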
This is how the K-Nearest neighbor algorithm works.
Next, here is an example from the book, combined with code, to show how to program the algorithm.
Suppose I have four points in the coordinate system, (1.0, 1.1), (1.0, 1.0), (0, 0) and (0, 0.1), which belong to categories A, A, B and B respectively. We need to predict which category the point (0, 0) belongs to. The point (0, 0) is in fact already in the known training data set, but that is fine, since we are only running a small test.
The code given is consistent with the code in the textbook, but beginners may find it hard to see what each step does, so I added some comments. For the detailed procedure, refer to the steps in the textbook. If the final result is "B", the classifier is working correctly.
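As a rough, independent sketch of the same test (not the textbook's listing), the classifier could be written as follows. The function name knn_classify, its parameter names and the choice of k = 3 are assumptions made for illustration; only the four points and their labels come from the example.

```python
import math
from collections import Counter

def knn_classify(query, points, labels, k):
    """Return the majority label among the k training points nearest to query."""
    # Pair each distance with its label and sort nearest first.
    # The key sorts on distance only, so ties keep their training-set order.
    nearest = sorted(
        ((math.dist(query, p), label) for p, label in zip(points, labels)),
        key=lambda pair: pair[0],
    )[:k]
    # Count the labels of the k nearest points; the most common one wins.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Training set from the example: two points labeled A, two labeled B.
points = [(1.0, 1.1), (1.0, 1.0), (0.0, 0.0), (0.0, 0.1)]
labels = ["A", "A", "B", "B"]

print(knn_classify((0.0, 0.0), points, labels, k=3))  # B
```

With k = 3 the nearest points to (0, 0) are (0, 0), (0, 0.1) and (1.0, 1.0), giving two votes for B and one for A, so the result is "B".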