This time I will introduce the basic principle of the k-nearest neighbor method (KNN) and its use in Scikit-learn. It is a machine learning algorithm whose structure and principle look very simple; the main data structure involved is the construction and search of the kd-tree, and there are also a few examples in Scikit-learn at the end.

The principle of the k-nearest neighbor algorithm
The principle of k-nearest neighbors is very simple: given a data set and a new data point, we first fix a number k, find the k points in the data set that are closest to the new point, take the majority category among those k points, and assign that category to the new point.
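As a minimal sketch of this idea (a hand-rolled helper of my own, not the Scikit-learn API used later in this post), it takes only a few lines of NumPy:

import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_new, k):
    """Classify x_new by a majority vote among its k nearest training points."""
    # Euclidean distance from the new point to every training point
    dists = np.linalg.norm(X_train - x_new, axis=1)
    # Indices of the k closest training points
    nearest = np.argsort(dists)[:k]
    # Majority vote among their labels
    return Counter(y_train[nearest]).most_common(1)[0][0]

# Toy usage with made-up data
X_train = np.array([[1.0, 1.0], [1.2, 0.8], [8.0, 8.0], [8.5, 7.5]])
y_train = np.array([0, 0, 1, 1])
print(knn_predict(X_train, y_train, np.array([1.1, 0.9]), k=3))  # -> 0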
One special case is k = 1, which is called the nearest neighbor method: the category of the new point is simply the category of the single data point closest to it.
We can see that KNN has several important features. First of all, it is very simple; there is basically no training step at all, and in practice we just need a data structure to store the points. It is a famous representative of "lazy learning".
Second, the value of k has a great effect on the result.
(The image above is from Wikipedia.)
What category should the newly added point (green) be? If we count the points inside the solid circle (k = 3), it should be red, but if we count the points inside the dashed circle (k = 5), it should be blue. So in practice k should be chosen carefully, as the small sketch below shows.
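To make this sensitivity to k concrete, here is a small hypothetical data set of my own (not the Wikipedia figure itself) where the predicted class flips between k = 3 and k = 5:

import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Five 1-D training points: two "red" points close to the query point at 0,
# and three "blue" points farther away.
X = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])
y = np.array(['red', 'red', 'blue', 'blue', 'blue'])

for k in (3, 5):
    clf = KNeighborsClassifier(n_neighbors=k)
    clf.fit(X, y)
    print(k, clf.predict([[0.0]]))   # k=3 -> ['red'], k=5 -> ['blue']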
KNN implementation principle: the kd-tree

To find the nearest k points, the simplest algorithm is a linear scan, but its cost is too high on large data sets, so we use the kd-tree instead.
The kd-tree is a kind of binary tree that represents a partition of k-dimensional space. Building a kd-tree amounts to recursively splitting the space with hyperplanes until the instance points are separated; in the end, each node corresponds to a hyper-rectangle of the k-dimensional space.
Algorithm
Suppose a data set contains N sample points:
D = \{x_1, x_2, \cdots, x_N\}
where each sample is k-dimensional:
x_i = (x_i^1, x_i^2, \cdots, x_i^k)
(1) Take x^1 as the splitting axis and the median of the x^1 components of all sample points as the splitting point, dividing the whole space into two parts.
(2) Recursively split each sub-region: for a node of depth j, choose x^l as the splitting axis, where l = j (mod k) + 1, and take the median of the x^l coordinates of the samples in that sub-region as the splitting point. Repeat (2) until there are no samples left to split.
The following is an example of a data set:
D = \{(2,3), (5,4), (9,6), (4,7), (8,1), (7,2)\}
The kd-tree that is finally generated divides the plane as shown in the following figure.
(Image from Baidu Encyclopedia)
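As a rough sketch of the construction step (a toy implementation of my own, not what Scikit-learn uses internally), the recursive median split on the example data set looks like this:

def build_kdtree(points, depth=0):
    """Recursively build a kd-tree: cycle through the axes, split at the median."""
    if not points:
        return None
    k = len(points[0])
    axis = depth % k                       # splitting axis cycles 0, 1, ..., k-1, 0, ...
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2                 # the median point becomes this node
    return {
        'point': points[mid],
        'axis': axis,
        'left': build_kdtree(points[:mid], depth + 1),
        'right': build_kdtree(points[mid + 1:], depth + 1),
    }

tree = build_kdtree([(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)])
print(tree['point'])           # (7, 2): the root splits on the x axis
print(tree['left']['point'])   # (5, 4): the next level splits on the y axis

This is consistent with the partition in the figure: the root splits the plane at x = 7, and each deeper level alternates between the x and y axes.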
In fact, in my opinion, the two most important parts of the kd-tree are, on the one hand, the recursive splitting, and on the other, the fact that the splitting dimension cycles through the axes in order.

kd-tree search
The kd-tree is also a binary tree, so searching it is somewhat similar to searching an ordinary binary search tree. The reason the search can be accelerated is that the data stored in the tree has a certain ordering, so at times the relationship between the current node and the query point lets us skip an entire subtree.
The corresponding binary-tree problem is similar; here we consider the nearest neighbor, that is, the case k = 1.
(1) First, starting from the root, we keep comparing the query point with each node on its splitting dimension and walk down the binary tree until we reach a leaf node.
(2) This leaf node is not necessarily the nearest neighbor, but we take it as the current nearest neighbor, draw a hypersphere centered at the query point with radius equal to their distance, and then move up to the parent node. If the hyperplane represented by the parent node does not intersect this hypersphere, we keep backtracking upwards.
(3) If it does intersect, we also search the other child of that parent node; whenever we encounter a closer point, we update the current nearest neighbor.
(4) The search ends when we have backtracked to the root node. A sketch of this procedure is given below.
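Here is a minimal, self-contained sketch of this backtracking search for k = 1 (again a toy implementation of my own, repeating the build function from above so the snippet runs on its own):

import math

def build_kdtree(points, depth=0):
    if not points:
        return None
    axis = depth % len(points[0])
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2
    return {'point': points[mid], 'axis': axis,
            'left': build_kdtree(points[:mid], depth + 1),
            'right': build_kdtree(points[mid + 1:], depth + 1)}

def dist(a, b):
    return math.dist(a, b)

def nearest(node, target, best=None):
    """Recursive nearest-neighbour search with backtracking (the k = 1 case)."""
    if node is None:
        return best
    point, axis = node['point'], node['axis']
    if best is None or dist(point, target) < dist(best, target):
        best = point                                   # step (2): update the current nearest neighbour
    diff = target[axis] - point[axis]
    near, far = (node['left'], node['right']) if diff < 0 else (node['right'], node['left'])
    best = nearest(near, target, best)                 # step (1): descend on the side containing the target
    if abs(diff) < dist(best, target):                 # step (3): the splitting plane cuts the hypersphere,
        best = nearest(far, target, best)              #           so the other subtree must be searched too
    return best

tree = build_kdtree([(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)])
print(nearest(tree, (3, 4.5)))   # -> (2, 3)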
The use of KNN in Scikit-learn

The following describes the use of KNN in Scikit-learn. The basic usage is relatively simple and there is not much to say about it, but Scikit-learn also contains a model called kernel density estimation, which I find very interesting, so it is introduced here as well.

Example One
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.colors import ListedColormap
from sklearn import neighbors, datasets

n_neighbors = 15  # the number of nearest neighbours, i.e. the value of k

# import some data to play with
iris = datasets.load_iris()

# We only take the first two features. We could avoid this ugly
# slicing by using a two-dim dataset.
X = iris.data[:, :2]
y = iris.target

h = .02  # step size in the mesh

# Create color maps
cmap_light = ListedColormap(['#FFAAAA', '#AAFFAA', '#AAAAFF'])
cmap_bold = ListedColormap(['#FF0000', '#00FF00', '#0000FF'])

# 'uniform' and 'distance' are two weight functions: 'uniform' gives every
# neighbour the same weight, while 'distance' weights each neighbour by the
# reciprocal of its distance, so closer points get larger weights.
for weights in ['uniform', 'distance']:
    # we create an instance of the neighbours classifier and fit the data
    clf = neighbors.KNeighborsClassifier(n_neighbors, weights=weights)
    clf.fit(X, y)

    # Plot the decision boundary. For that, we will assign a color to each
    # point in the mesh [x_min, x_max] x [y_min, y_max].
    x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
    y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
    xx, yy = np.meshgrid(np.arange(x_min, x_max, h),
                         np.arange(y_min, y_max, h))
    Z = clf.predict(np.c_[xx.ravel(), yy.ravel()])

    # Put the result into a color plot
    # (the plotting code is quite standard; if anything is unclear,
    #  see my previous blog post on the use of SVM)
    Z = Z.reshape(xx.shape)
    plt.figure()
    plt.pcolormesh(xx, yy, Z, cmap=cmap_light)

    # Plot also the training points
    plt.scatter(X[:, 0], X[:, 1], c=y, cmap=cmap_bold,
                edgecolor='k', s=20)
    plt.xlim(xx.min(), xx.max())
    plt.ylim(yy.min(), yy.max())
    plt.title("3-Class classification (k = %i, weights = '%s')"
              % (n_neighbors, weights))

plt.show()
You can see that the code itself is very simple; the core is the two lines that create and fit the classifier, plus understanding what the two weight functions mean. For the plotting part you can look at my previous blog post, "Support vector machine principle and practice (II): using SVM in Scikit-learn", which covers some of it; even better is to consult the relevant official documentation online.

Example Two
This example is about kernel density estimation (KDE). KDE is a generative model: once training is done, we can use it to generate new data ourselves. In Scikit-learn this algorithm is also implemented on top of a kd-tree or ball tree.
Kernel density estimation also involves the important concept of a "kernel function", but this "kernel function" should not be confused with the one in SVM (I have not gone through the derivation, so I am not entirely sure about the relationship). For a kernel K(x; h), one important parameter is the bandwidth h, and the other is the form of the kernel function itself. The commonly used kernels are shown below.
(Figure: the shapes of the commonly used kernel functions.)
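As a substitute for that figure, here is a small sketch of my own that plots the rough shapes of the six kernels supported by Scikit-learn's KernelDensity ('gaussian', 'tophat', 'epanechnikov', 'exponential', 'linear', 'cosine'); the formulas follow the Scikit-learn documentation, up to normalization:

import numpy as np
import matplotlib.pyplot as plt

h = 1.0
x = np.linspace(0, 2.5, 200)          # x is a distance, so x >= 0

# Unnormalised kernel shapes as listed in the Scikit-learn documentation
kernels = {
    'gaussian':     np.exp(-x**2 / (2 * h**2)),
    'tophat':       (x < h).astype(float),
    'epanechnikov': np.clip(1 - x**2 / h**2, 0, None),
    'exponential':  np.exp(-x / h),
    'linear':       np.clip(1 - x / h, 0, None),
    'cosine':       np.where(x < h, np.cos(np.pi * x / (2 * h)), 0.0),
}

for name, k in kernels.items():
    plt.plot(x, k, label=name)
plt.legend()
plt.title('Kernel shapes (bandwidth h = 1)')
plt.show()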
So, given a kernel function, what does the model look like?
For a given point y, the density estimate is obtained by summing the kernel over all the training points:

\rho_K(y) = \sum_{i=1}^{N} K(y - x_i; h)
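A minimal sketch of how this looks with Scikit-learn's KernelDensity (the toy data here is my own, not the data of Example Two): fit the model, evaluate the density, and, since KDE is a generative model, sample new points from it.

import numpy as np
from sklearn.neighbors import KernelDensity

# Toy 1-D data drawn from two bumps
rng = np.random.RandomState(0)
data = np.concatenate([rng.normal(-2, 0.5, 200),
                       rng.normal(3, 1.0, 200)])[:, np.newaxis]

# Fit a KDE model; 'kernel' and 'bandwidth' (h) are the two knobs discussed above
kde = KernelDensity(kernel='gaussian', bandwidth=0.4).fit(data)

# score_samples returns the log density, so exponentiate to get the density itself
grid = np.linspace(-5, 7, 300)[:, np.newaxis]
density = np.exp(kde.score_samples(grid))

# Because KDE is a generative model, we can also draw new samples from it
new_points = kde.sample(10, random_state=0)
print(density[:3], new_points[:3])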