Implementing k-Nearest Neighbors (kNN) in C
I have recently been reading about the kNN algorithm.
kNN is a classification algorithm used in data mining. The basic idea is that if the vast majority of a sample's k nearest neighbors in the feature space belong to a certain category, the sample belongs to that category as well. As the saying goes, "follow the crowd."
In short, kNN works like this: you have a pile of data whose categories are already known. When a new data point arrives, you compute its distance to every point in the training set, pick out the k points closest to it, and check their categories. The new point is then assigned to the category held by the majority of those k points.
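Written as formulas, the idea looks like this. The notation is my own, not the source's: each sample has m features, c_j is the category of training sample j, and N_k(x) is the set of the k training samples closest to x under the Euclidean distance (the distance the code computes in computedistance):

```latex
d(\mathbf{x}, \mathbf{y}) = \sqrt{\sum_{i=1}^{m} (x_i - y_i)^2}
\qquad
\hat{c}(\mathbf{x}) = \arg\max_{c} \sum_{j \in N_k(\mathbf{x})} \mathbf{1}[c_j = c]
```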
The algorithm is simple and clear:
The following algorithm steps are taken from Baidu Library (Baidu Wenku), which is a handy resource, and the code is implemented along the same lines:
Code:
#include <stdio.h>
#include <math.h>
#include <stdlib.h>

#define K 3 // number of nearest neighbors k
typedef float type;

// dynamically create a two-dimensional array (one contiguous block of n*m values)
type **createarray(int n, int m)
{
    int i;
    type **array;
    array = (type **)malloc(n * sizeof(type *));
    array[0] = (type *)malloc(n * m * sizeof(type));
    for (i = 1; i < n; i++) array[i] = array[i - 1] + m;
    return array;
}

// read data; the first line must have the form N = number of samples, D = dimension
void loaddata(int *n, int *d, type ***array, type ***karray)
{
    int i, j;
    FILE *fp;
    if ((fp = fopen("data.txt", "r")) == NULL) {
        fprintf(stderr, "can not open data.txt!\n");
        exit(EXIT_FAILURE);
    }
    if (fscanf(fp, "N = %d, D = %d", n, d) != 2) {
        fprintf(stderr, "reading error!\n");
        exit(EXIT_FAILURE);
    }
    *array = createarray(*n, *d);
    *karray = createarray(2, K);

    for (i = 0; i < *n; i++)
        for (j = 0; j < *d; j++)
            fscanf(fp, "%f", &(*array)[i][j]); // read data

    for (i = 0; i < 2; i++)
        for (j = 0; j < K; j++)
            (*karray)[i][j] = 9999.0; // default "maximum" value
    if (fclose(fp)) fprintf(stderr, "can not close data.txt\n");
}

// calculate the Euclidean distance
type computedistance(int n, type *avector, type *bvector)
{
    int i;
    type dist = 0.0;
    for (i = 0; i < n; i++)
        dist += pow(avector[i] - bvector[i], 2);
    return sqrt(dist);
}

// bubble sort the n columns of a by row `choice` (0 = distance, 1 = label);
// both rows are swapped together so each distance stays paired with its label
void bublesort(int n, type **a, int choice)
{
    int i, j;
    type k;
    for (j = 0; j < n; j++)
        for (i = 0; i < n - j - 1; i++)
            if (a[choice][i] > a[choice][i + 1]) {
                k = a[0][i]; a[0][i] = a[0][i + 1]; a[0][i + 1] = k;
                k = a[1][i]; a[1][i] = a[1][i + 1]; a[1][i + 1] = k;
            }
}

// return the most frequent value in a sorted list (the majority vote)
type orderedlist(int n, type *list)
{
    int i, count = 1, maxcount = 1;
    type value = list[0];
    for (i = 0; i < n - 1; i++) {
        if (list[i] != list[i + 1]) {
            if (count > maxcount) {
                maxcount = count;
                value = list[i];
            }
            count = 1; // start counting the next run of equal values
        } else
            count++;
    }
    if (count > maxcount) { // the last run
        maxcount = count;
        value = list[n - 1];
    }
    return value;
}

int main(void)
{
    int i, j, k;
    int D, N;             // dimension (features + label), number of samples
    type **array = NULL;  // data array
    type **karray = NULL; // row 0: distances to the K nearest neighbors, row 1: their labels
    type *testdata;       // test data
    type dist, maxdist;

    loaddata(&N, &D, &array, &karray);
    if (K > N) {
        fprintf(stderr, "K must not exceed the number of samples\n");
        exit(EXIT_FAILURE);
    }
    testdata = (type *)malloc((D - 1) * sizeof(type));

    for (;;) {
        printf("input test data containing %d numbers:\n", D - 1);
        for (i = 0; i < D - 1; i++)
            if (scanf("%f", &testdata[i]) != 1) break;
        if (i < D - 1) break; // end of input (or a non-number): stop

        // start with the first K samples as the current nearest neighbors
        for (i = 0; i < K; i++) {
            karray[0][i] = computedistance(D - 1, testdata, array[i]);
            karray[1][i] = array[i][D - 1];
        }
        bublesort(K, karray, 0);
        maxdist = karray[0][K - 1]; // largest distance among the current k nearest neighbors

        for (i = K; i < N; i++) {
            dist = computedistance(D - 1, testdata, array[i]);
            if (dist < maxdist)
                for (j = 0; j < K; j++) {
                    if (dist < karray[0][j]) {
                        for (k = K - 1; k > j; k--) { // shift the elements after j back one slot to make room
                            karray[0][k] = karray[0][k - 1];
                            karray[1][k] = karray[1][k - 1];
                        }
                        karray[0][j] = dist; // insert at position j
                        karray[1][j] = array[i][D - 1];
                        break; // no need to compare the remaining elements of karray
                    }
                }
            maxdist = karray[0][K - 1];
        }

        bublesort(K, karray, 1); // sort the K labels so that equal labels are adjacent
        printf("\nThe data has a tag: %.0f\n", orderedlist(K, karray[1]));
    }

    free(testdata);
    free(array[0]); free(array);
    free(karray[0]); free(karray);
    return 0;
}
Experiment:
Training data (data.txt):
N = 6, D = 9
1.0 1.1 1.2 2.1 0.3 2.3 1.4 0.5 1
1.7 1.2 1.4 2.0 0.2 2.5 1.2 0.8 1
1.2 1.8 1.6 2.5 0.1 2.2 1.8 0.2 1
1.9 2.1 6.2 1.1 0.9 3.3 2.4 5.5 0
1.0 0.8 1.6 2.1 0.2 2.3 1.6 0.5 1
1.6 2.1 5.2 1.1 0.8 3.6 2.4 4.5 0
Prediction data:
1.0 1.1 1.2 2.1 0.3 2.3 1.4 0.5
1.7 1.2 1.4 2.0 0.2 2.5 1.2 0.8
1.2 1.8 1.6 2.5 0.1 2.2 1.8 0.2
1.9 2.1 6.2 1.1 0.9 3.3 2.4 5.5
1.0 0.8 1.6 2.1 0.2 2.3 1.6 0.5
1.6 2.1 5.2 1.1 0.8 3.6 2.4 4.5
Test results:
1.0 1.1 1.2 2.1 0.3 2.3 1.4 0.5 category: 1
1.7 1.2 1.4 2.0 0.2 2.5 1.2 0.8 category: 1
1.2 1.8 1.6 2.5 0.1 2.2 1.8 0.2 category: 1
1.9 2.1 6.2 1.1 0.9 3.3 2.4 5.5 category: 0
1.0 0.8 1.6 2.1 0.2 2.3 1.6 0.5 category: 1
1.6 2.1 5.2 1.1 0.8 3.6 2.4 4.5 category: 0