The K Nearest Neighbor (KNN) method is a classification method, not a clustering method.
It is not an optimal method, but it is quite popular in practice.
The common but not especially rigorous description of the rule is:
1. Calculate the distance (Euclidean or Mahalanobis) between the point to be classified and every point in the training data.
2. Select the K points with the smallest distances; here this is done by sorting.
3. Among these K nearest neighbors, find the class that occurs most often; that class is assigned to the point being classified (a minimal sketch of these steps follows this list).
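The sketch below illustrates the three steps for a single query point; the variable names train_data, train_label, query, and K are illustrative and not taken from the code later in this post:

% train_data: N x 2 matrix of training points, train_label: N x 1 class labels
% query: 1 x 2 point to classify, K: number of neighbors
% Step 1: Euclidean distance from the query to every training point
% (uses implicit expansion, R2016b or later; use bsxfun on older MATLAB versions)
dist = sqrt(sum((train_data - query).^2, 2));
% Step 2: sort the distances and keep the K smallest
[~, idx] = sort(dist);
nearest_labels = train_label(idx(1:K));
% Step 3: majority vote among the K nearest neighbors
predicted_class = mode(nearest_labels);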
The less popular but rigorous formulation is:
Given an unknown feature vector x and a distance measure:
1. Out of the N training vectors, identify the K nearest neighbors, irrespective of class label. For a two-class problem K is chosen to be odd; in general it should not be a multiple of the number of classes M.
2. Out of these K samples, determine the number ki of vectors that belong to class wi, i = 1, 2, ..., M. Obviously sum(ki) = K.
3. Assign x to the class wi with the maximum count ki (a short sketch of this counting step follows).
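In code, steps 2 and 3 amount to counting the labels among the K neighbors and taking the maximum. A minimal sketch, assuming nearest_labels holds the class labels (numbered 1..M) of the K nearest neighbors:

M = 2;                                    % number of classes (assumed for this sketch)
ki = zeros(M, 1);
for c = 1:M
    ki(c) = sum(nearest_labels == c);     % ki(c) = number of the K neighbors in class wc
end
% by construction, sum(ki) == K
[~, predicted_class] = max(ki);           % assign x to the class with the largest ki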
For example, in the classic illustration, should the green point be classified as a triangle or as a rectangle? That depends on how many nearest neighbors are used: with 3-NN it is assigned to the triangle class, and with 5-NN it is assigned to the rectangle class.
K should therefore be chosen differently in different situations, as the toy example below illustrates.
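The dependence on K can be reproduced in a couple of lines. This is a made-up toy example that mirrors the figure, with 1 standing for the triangle class and 2 for the rectangle class, and the labels listed from nearest to farthest:

sorted_labels = [1 1 2 2 2];   % labels of the 5 nearest neighbors, nearest first
mode(sorted_labels(1:3))       % 3-NN vote: returns 1, i.e. triangle
mode(sorted_labels(1:5))       % 5-NN vote: returns 2, i.e. rectangle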
The following is the relevant Matlab code:
clear all; close all; clc;
% first-class data and labels
mu1 = [0 0];               % mean
S1 = [0.3 0; 0 0.35];      % covariance
data1 = mvnrnd(mu1, S1, 100);   % generate Gaussian-distributed data
plot(data1(:,1), data1(:,2), '+');
label1 = ones(100, 1);
hold on;
% second-class data and labels
mu2 = [1.25 0.3];
S2 = [0.35 0; 0 0.3];
data2 = mvnrnd(mu2, S2, 100);
plot(data2(:,1), data2(:,2), 'ro');
label2 = label1 + 1;
data = [data1; data2];
label = [label1; label2];
K = 11;   % two classes, so K is taken odd to avoid ties
% test data: the KNN algorithm decides which class each test point belongs to
for ii = -3:0.1:3
    for jj = -3:0.1:3
        test_data = [ii jj];        % test point
        label = [label1; label2];   % reset the labels (they are reordered below)
        % The KNN algorithm starts here; obviously this is 11-NN.
        % Compute the distance between the test point and every training point,
        % using the Euclidean distance (the Mahalanobis distance could also be used).
        distance = zeros(200, 1);
        for i = 1:200
            distance(i) = sqrt((test_data(1) - data(i,1))^2 + (test_data(2) - data(i,2))^2);
        end
        % Selection sort: only the smallest K distances are needed;
        % the distances and the labels are reordered together.
        for i = 1:K
            ma = distance(i);
            label_ma = label(i);
            tmp = i;
            for j = i+1:200
                if distance(j) < ma
                    ma = distance(j);
                    label_ma = label(j);
                    tmp = j;
                end
            end
            distance(tmp) = distance(i);   % swap the distances
            distance(i) = ma;
            label(tmp) = label(i);         % swap the labels (the labels are what is actually used)
            label(i) = label_ma;
        end
        cls1 = 0;   % count how many of the K nearest neighbors belong to class 1
        for i = 1:K
            if label(i) == 1
                cls1 = cls1 + 1;
            end
        end
        cls2 = K - cls1;   % number of the K nearest neighbors belonging to class 2
        if cls1 > cls2
            plot(ii, jj, 'k.');   % points assigned to class 1 are drawn as small black dots
        end
    end
end
In the code, the training set consists of two Gaussian-distributed classes. The test points sweep x from -3 to 3 and y from -3 to 3 in steps of 0.1, and each one is classified to see which class it belongs to.
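For comparison, the same kind of decision region could be produced much more compactly with fitcknn from the Statistics and Machine Learning Toolbox. This is only a hedged sketch; the toolbox is not used by the code above and is an assumption about the reader's environment:

% assumes data (200 x 2) and label (200 x 1) from the code above are in the workspace
mdl = fitcknn(data, label, 'NumNeighbors', 11);   % 11-NN classifier
[xx, yy] = meshgrid(-3:0.1:3, -3:0.1:3);          % same test grid as above
pred = predict(mdl, [xx(:) yy(:)]);               % classify every grid point
hold on;
plot(xx(pred == 1), yy(pred == 1), 'k.');         % mark the region assigned to class 1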
Running the code produces the result shown below: