Implementation of k-Nearest Neighbor (kNN) in C



Recently, I have been reading up on the kNN algorithm.

 

kNN is a classification algorithm used in data mining. The basic idea: in a distance space, if the great majority of a sample's k nearest neighbors belong to a certain category, the sample is assigned to that category as well. As the saying goes, "go with the flow."

In short, kNN works like this: you have a pile of data whose classes are already known. When a new sample arrives, compute its distance to every point in the training set, pick out the K points closest to it, look at the labels of those K points, and assign the new sample to whichever class holds the majority.
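To make that flow concrete before diving into the full program, here is a minimal sketch of the decision rule in C. The tiny two-feature data set, the fixed K = 3, and names such as knn_classify are illustrative assumptions for this sketch only; they are not part of the program given later.

#include <stdio.h>
#include <math.h>

#define K 3   // number of neighbors to consult
#define N 6   // number of training samples
#define D 2   // features per sample

// Euclidean distance between two D-dimensional points
static double distance(const double a[D], const double b[D])
{
    double sum = 0.0;
    for (int i = 0; i < D; i++)
        sum += (a[i] - b[i]) * (a[i] - b[i]);
    return sqrt(sum);
}

// classify x by majority vote among its K nearest training points
static int knn_classify(double train[N][D], int label[N], double x[D])
{
    double best_dist[K];
    int best_label[K];
    for (int i = 0; i < K; i++) {
        best_dist[i] = 1e30;    // sentinel: "farther than anything real"
        best_label[i] = -1;
    }

    // keep the K smallest distances seen so far, as a sorted list
    for (int i = 0; i < N; i++) {
        double d = distance(train[i], x);
        for (int j = 0; j < K; j++) {
            if (d < best_dist[j]) {
                for (int k = K - 1; k > j; k--) {   // shift to make room at j
                    best_dist[k] = best_dist[k - 1];
                    best_label[k] = best_label[k - 1];
                }
                best_dist[j] = d;
                best_label[j] = label[i];
                break;
            }
        }
    }

    // majority vote between the two classes 0 and 1
    int votes_for_1 = 0;
    for (int i = 0; i < K; i++)
        if (best_label[i] == 1) votes_for_1++;
    return votes_for_1 > K / 2 ? 1 : 0;
}

int main(void)
{
    // two small clusters: class 1 near (1,1), class 0 near (5,5)
    double train[N][D] = { {1.0, 1.1}, {1.2, 1.0}, {0.9, 1.3},
                           {5.0, 5.2}, {5.1, 4.9}, {4.8, 5.0} };
    int label[N] = { 1, 1, 1, 0, 0, 0 };
    double x[D] = { 1.1, 1.2 };          // a new point to classify
    printf("predicted class: %d\n", knn_classify(train, label, x));
    return 0;
}

The full program below follows the same idea, but reads the data from a file, keeps the k-nearest list in a dynamically allocated 2 x K array, and takes the majority by sorting the collected labels.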

The algorithm is simple and clear. The step-by-step outline I followed is taken from Baidu Wenku (the library is a good resource), and the code is implemented with reference to that outline:

 

Code:

#include <stdio.h>
#include <math.h>
#include <stdlib.h>

#define K 3                      // number of nearest neighbors k
typedef float type;

// dynamically create a two-dimensional array
type **createarray(int n, int m)
{
    int i;
    type **array;
    array = (type **)malloc(n * sizeof(type *));
    array[0] = (type *)malloc(n * m * sizeof(type));
    for (i = 1; i < n; i++) array[i] = array[i - 1] + m;
    return array;
}

// read the data; the first line must have the format "N = <data volume>, D = <dimension>"
void loaddata(int *n, int *d, type ***array, type ***karray)
{
    int i, j;
    FILE *fp;
    if ((fp = fopen("data.txt", "r")) == NULL) fprintf(stderr, "can not open data.txt!\n");
    if (fscanf(fp, "N = %d, D = %d", n, d) != 2) fprintf(stderr, "reading error!\n");
    *array = createarray(*n, *d);
    *karray = createarray(2, K);

    for (i = 0; i < *n; i++)
        for (j = 0; j < *d; j++)
            fscanf(fp, "%f", &(*array)[i][j]);   // read the data

    for (i = 0; i < 2; i++)
        for (j = 0; j < K; j++)
            (*karray)[i][j] = 9999.0;            // default "maximum" distance

    if (fclose(fp)) fprintf(stderr, "can not close data.txt");
}

// calculate the Euclidean distance
type computedistance(int n, type *avector, type *bvector)
{
    int i;
    type dist = 0.0;
    for (i = 0; i < n; i++)
        dist += pow(avector[i] - bvector[i], 2);
    return sqrt(dist);
}

// bubble sort: choice 0 sorts both rows by distance (row 0), choice 1 by label (row 1)
void bublesort(int n, type **a, int choice)
{
    int i, j;
    type k;
    for (j = 0; j < n; j++)
        for (i = 0; i < n - j - 1; i++) {
            if (0 == choice) {
                if (a[0][i] > a[0][i + 1]) {
                    k = a[0][i]; a[0][i] = a[0][i + 1]; a[0][i + 1] = k;
                    k = a[1][i]; a[1][i] = a[1][i + 1]; a[1][i + 1] = k;
                }
            }
            else if (1 == choice) {
                if (a[1][i] > a[1][i + 1]) {
                    k = a[0][i]; a[0][i] = a[0][i + 1]; a[0][i + 1] = k;
                    k = a[1][i]; a[1][i] = a[1][i + 1]; a[1][i + 1] = k;
                }
            }
        }
}

// return the most frequent element of a sorted list
type orderedlist(int n, type *list)
{
    int i, count = 1, maxcount = 1;
    type value = list[0];                // initialize so a single run still yields a label
    for (i = 0; i < (n - 1); i++) {
        if (list[i] != list[i + 1]) {
            // printf("count of %f is %d\n", list[i], count);
            if (count > maxcount) {
                maxcount = count;
                value = list[i];
            }
            count = 1;                   // reset the run length at every boundary
        }
        else
            count++;
    }
    if (count > maxcount) {
        maxcount = count;
        value = list[n - 1];
    }
    // printf("value %f has a maxcount: %d\n", value, maxcount);
    return value;
}

int main()
{
    int i, j, k;
    int D, N;                 // dimension, data volume
    type **array = NULL;      // data array
    type **karray = NULL;     // distances of the K nearest points and their labels
    type *testdata;           // test data
    type dist, maxdist;

    loaddata(&N, &D, &array, &karray);
    testdata = (type *)malloc((D - 1) * sizeof(type));
    printf("input test data containing %d numbers:\n", D - 1);
    for (i = 0; i < (D - 1); i++) scanf("%f", &testdata[i]);

    while (1) {
        for (i = 0; i < K; i++) {
            if (K > N) exit(-1);
            karray[0][i] = computedistance(D - 1, testdata, array[i]);
            karray[1][i] = array[i][D - 1];
            // printf("first karray: %6.2f %6.0f\n", karray[0][i], karray[1][i]);
        }

        bublesort(K, karray, 0);
        // for (i = 0; i < K; i++) printf("after bublesort in first karray: %6.2f %6.0f\n", karray[0][i], karray[1][i]);
        maxdist = karray[0][K - 1];      // initialize the maximum distance in the k-nearest-neighbor array

        for (i = K; i < N; i++) {
            dist = computedistance(D - 1, testdata, array[i]);
            if (dist < maxdist)
                for (j = 0; j < K; j++) {
                    if (dist < karray[0][j]) {
                        for (k = K - 1; k > j; k--) {   // shift the elements after j one slot back, to make room for insertion
                            karray[0][k] = karray[0][k - 1];
                            karray[1][k] = karray[1][k - 1];
                        }
                        karray[0][j] = dist;            // insert at position j
                        karray[1][j] = array[i][D - 1];
                        // printf("i: %d karray: %6.2f %6.0f\n", i, karray[0][j], karray[1][j]);
                        break;                          // no need to compare the remaining karray elements
                    }
                }
            maxdist = karray[0][K - 1];
            // printf("i: %d maxdist: %6.2f\n", i, maxdist);
        }
        // for (i = 0; i < K; i++) printf("karray: %6.2f %6.0f\n", karray[0][i], karray[1][i]);
        bublesort(K, karray, 1);
        // for (i = 0; i < K; i++) printf("after bublesort in karray: %6.2f %6.0f\n", karray[0][i], karray[1][i]);
        printf("\nThe data has a tag: %.0f\n", orderedlist(K, karray[1]));

        printf("input test data containing %d numbers:\n", D - 1);
        for (i = 0; i < (D - 1); i++) scanf("%f", &testdata[i]);
    }
    return 0;
}
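A note on building and running it: the program uses sqrt and pow, so it needs the math library at link time. With gcc it can be compiled as, for example, gcc knn.c -o knn -lm (the file name knn.c is just an assumption about how the source is saved). When run, it expects data.txt in the current directory, reads N and D from the header line, then repeatedly prompts for a test vector of D-1 numbers and prints the predicted label; since the loop never exits on its own, stop it with Ctrl+C when you are done.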

 

Experiment:

Training data, data.txt (the first line gives N and D; in each following row the first D-1 = 8 numbers are features and the last number is the class label):

N = 6, D = 9
1.0 1.1 1.2 2.1 0.3 2.3 1.4 0.5 1
1.7 1.2 1.4 2.0 0.2 2.5 1.2 0.8 1
1.2 1.8 1.6 2.5 0.1 2.2 1.8 0.2 1
1.9 2.1 6.2 1.1 0.9 3.3 2.4 5.5 0
1.0 0.8 1.6 2.1 0.2 2.3 1.6 0.5 1
1.6 2.1 5.2 1.1 0.8 3.6 2.4 4.5 0

Prediction data:

1.0 1.1 1.2 2.1 0.3 2.3 1.4 0.5
1.7 1.2 1.4 2.0 0.2 2.5 1.2 0.8
1.2 1.8 1.6 2.5 0.1 2.2 1.8 0.2
1.9 2.1 6.2 1.1 0.9 3.3 2.4 5.5
1.0 0.8 1.6 2.1 0.2 2.3 1.6 0.5
1.6 2.1 5.2 1.1 0.8 3.6 2.4 4.5

Test results:

1.0 1.1 1.2 2.1 0.3 2.3 1.4 0.5  category: 1
1.7 1.2 1.4 2.0 0.2 2.5 1.2 0.8  category: 1
1.2 1.8 1.6 2.5 0.1 2.2 1.8 0.2  category: 1
1.9 2.1 6.2 1.1 0.9 3.3 2.4 5.5  category: 0
1.0 0.8 1.6 2.1 0.2 2.3 1.6 0.5  category: 1
1.6 2.1 5.2 1.1 0.8 3.6 2.4 4.5  category: 0
