When I wrote up the K-nearest neighbor algorithm earlier (http://boytnt.blog.51cto.com/966121/1569629), I did not attach any test data. This time I found a data set to test how well the algorithm works. The data comes from http://archive.ics.uci.edu/ml/machine-learning-databases/breast-cancer-wisconsin/, file breast-cancer-wisconsin.data, a set of breast cancer samples; the attributes are described in breast-cancer-wisconsin.names.
A sample record looks like this:
1000025,5,1,1,1,2,1,3,1,1,2
The first attribute is the sample ID number, which we can ignore. The last attribute is the result: 2 means benign and 4 means malignant. The remaining 9 attributes are the sample features. Note that some values are missing (marked "?"; according to the documentation, 16 rows in total, about 2.3% of the 699 records), so the data has to be cleaned before any computation. Here we simply fill the missing values with 0.
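The missing-value count and the zero fill are easy to sanity-check before training. The following is a minimal sketch, not code from the original post: the class name MissingValueCheck and both method names are made up for illustration, and it assumes each record is a comma-separated line in which a missing field is written as "?".

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper for illustration; not part of the original post.
public static class MissingValueCheck
{
    // Counts the records that contain at least one "?" field.
    // Blank lines are skipped.
    public static int CountRowsWithMissing(IEnumerable<string> lines)
    {
        return lines
            .Where(line => !string.IsNullOrWhiteSpace(line))
            .Count(line => line.Split(',').Any(field => field == "?"));
    }

    // Replaces every "?" field in one record with "0",
    // which is the simple fill described in the text.
    public static string FillMissingWithZero(string line)
    {
        var fields = line.Split(',').Select(field => field == "?" ? "0" : field);
        return string.Join(",", fields);
    }
}
```

For example, calling MissingValueCheck.CountRowsWithMissing(File.ReadLines("breast-cancer-wisconsin.txt")) (with System.IO imported, and assuming the data file was saved under that name) should report the 16 rows mentioned in the documentation if that figure is accurate.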
Test it with the K-nearest neighbor algorithm:
    // Namespaces required at the top of the file
    using System;
    using System.Collections.Generic;
    using System.IO;
    using System.Text;

    public void TestNearestNeighbour()
    {
        var trainingSet = new List<DataVector<double>>();
        var testSet = new List<DataVector<double>>();

        // Read the data
        var file = new StreamReader("breast-cancer-wisconsin.txt", Encoding.Default);
        for (int i = 0; i < 699; ++i)
        {
            string line = file.ReadLine();
            var parts = line.Split(',');
            var p = new DataVector<double>(9);
            for (int j = 0; j < p.Dimension; ++j)
            {
                // Data cleaning: fill missing values with 0
                if (parts[j + 1] == "?")
                    parts[j + 1] = "0";
                p.Data[j] = Convert.ToDouble(parts[j + 1]);
            }
            p.Label = Convert.ToInt32(parts[10]) == 2 ? "benign" : "malignant";

            // Use the first 600 samples for training and keep the remaining 99 for testing
            if (i < 600)
                trainingSet.Add(p);
            else
                testSet.Add(p);
        }
        file.Close();

        // Evaluate
        var nn = new NearestNeighbour();
        nn.Train(trainingSet);
        int error = 0;
        foreach (var p in testSet)
        {
            var label = nn.Classify(p);
            if (label != p.Label)
                ++error;
        }
        Console.WriteLine("error = {0}/{1}, {2}%", error, testSet.Count, error * 100.0 / testSet.Count);
    }
The result: 2 of the 99 test samples are misclassified, an error rate of 2.02%, which is a good result.
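The test method relies on the DataVector<T> and NearestNeighbour types from the earlier post, which are not reproduced here. For readers who want something to compile against, here is a minimal stand-in. It is a sketch under assumptions, not the original implementation: the member names (Data, Label, Dimension, Train, Classify) are inferred from how the test uses them, the distance is plain Euclidean, the vote is a simple majority, and K = 5 is an arbitrary choice. It will therefore not necessarily reproduce the 2/99 figure above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the DataVector<T> type used by the test.
public class DataVector<T>
{
    public T[] Data { get; }
    public string Label { get; set; }
    public int Dimension => Data.Length;

    public DataVector(int dimension)
    {
        Data = new T[dimension];
    }
}

// Hypothetical stand-in for NearestNeighbour: plain k-NN with
// Euclidean distance and a majority vote among the K closest samples.
public class NearestNeighbour
{
    // Assumption: the original post's value of K is not known here.
    private const int K = 5;

    private List<DataVector<double>> samples = new List<DataVector<double>>();

    // k-NN is a lazy learner: "training" only stores a copy of the samples.
    public void Train(List<DataVector<double>> trainingSet)
    {
        samples = new List<DataVector<double>>(trainingSet);
    }

    // Returns the most common label among the K nearest stored samples.
    // Throws InvalidOperationException if Train was never given any samples.
    public string Classify(DataVector<double> vector)
    {
        return samples
            .OrderBy(s => SquaredDistance(s, vector))
            .Take(K)
            .GroupBy(s => s.Label)
            .OrderByDescending(g => g.Count())
            .First()
            .Key;
    }

    // Squared Euclidean distance; it ranks neighbors in the same order
    // as the true distance, so the square root can be skipped.
    private static double SquaredDistance(DataVector<double> a, DataVector<double> b)
    {
        double sum = 0;
        for (int i = 0; i < a.Dimension; ++i)
        {
            double diff = a.Data[i] - b.Data[i];
            sum += diff * diff;
        }
        return sum;
    }
}
```

All nine features in this data set share the same 1 to 10 scale, so the sketch skips feature normalization; on data with mixed scales, that step would usually come first.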
This article is from the "Rabbit Nest" blog. Please keep this source link when reposting: http://boytnt.blog.51cto.com/966121/1572149