Learning OpenCV: the KNN Algorithm


From: http://blog.csdn.net/lyflower/article/details/1728642

 

The idea behind the KNN algorithm in text classification is simple and intuitive: if most of the K samples most similar to a given sample in the feature space (that is, its K nearest neighbors) belong to a certain category, then the sample belongs to that category as well. The method determines the category of the sample to be classified based only on the classes of the one or more samples nearest to it.
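To make the voting idea concrete, here is a minimal brute-force sketch of a KNN classifier in C++ (the Sample struct, the toy points, and K = 3 are illustrative, not from the original post): compute the distance from the query to every training sample, take the K closest, and return the majority label.

#include <algorithm>
#include <cstdio>
#include <map>
#include <vector>

struct Sample { float x, y; int label; };

// Classify a query point by majority vote among its K nearest training samples.
int knnClassify(const std::vector<Sample>& train, float qx, float qy, int K) {
    // Distance from the query to every training sample (squared, to avoid sqrt).
    std::vector<std::pair<float, int>> dist;              // (distance^2, label)
    for (const Sample& s : train) {
        float dx = s.x - qx, dy = s.y - qy;
        dist.push_back({dx * dx + dy * dy, s.label});
    }
    // Move the K smallest distances to the front.
    std::partial_sort(dist.begin(), dist.begin() + K, dist.end());
    // Majority vote among the K neighbors' labels.
    std::map<int, int> votes;
    for (int k = 0; k < K; k++) votes[dist[k].second]++;
    int best = -1, bestCount = 0;
    for (const auto& v : votes)
        if (v.second > bestCount) { best = v.first; bestCount = v.second; }
    return best;
}

int main() {
    std::vector<Sample> train = {
        {1, 1, 1}, {2, 1, 1}, {1, 2, 1},   // class 1 cluster
        {8, 8, 2}, {9, 8, 2}, {8, 9, 2},   // class 2 cluster
    };
    std::printf("query (2,2) -> class %d\n", knnClassify(train, 2, 2, 3));
    std::printf("query (8,7) -> class %d\n", knnClassify(train, 8, 7, 3));
    return 0;
}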

Although the KNN method also depends on the limit theorem in principle, in making classification decisions it consults only a small number of neighboring samples, so it can better avoid problems caused by sample imbalance. Moreover, since KNN relies mainly on the limited neighboring samples around the query rather than on discriminating among class regions to determine the category, it is better suited than other methods to sample sets whose class domains cross or overlap heavily.

The disadvantage of this method is its heavy computational load: the distance from each document to be classified to every known sample must be computed before its K nearest neighbors can be found. A common remedy is to edit the known sample points in advance, removing samples that contribute little to classification (see the condensing sketch below). There are also reverse-KNN variants that reduce the computational complexity of the KNN algorithm and improve classification efficiency.
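One classic way to edit the sample set in advance is Hart's condensed nearest neighbor rule: keep a subset that still classifies every original training point correctly with 1-NN, so interior points far from any decision boundary get dropped. A minimal sketch, reusing the illustrative Sample struct from above (nothing here is from the original post):

#include <vector>

struct Sample { float x, y; int label; };

// Label of the nearest point in 'set' (1-NN), using squared distances.
static int nearestLabel(const std::vector<Sample>& set, const Sample& q) {
    float best = 1e30f; int label = -1;
    for (const Sample& s : set) {
        float dx = s.x - q.x, dy = s.y - q.y, d = dx * dx + dy * dy;
        if (d < best) { best = d; label = s.label; }
    }
    return label;
}

// Hart's condensing: returns a subset of 'train' (assumed non-empty) that
// still 1-NN-classifies every training point correctly.
std::vector<Sample> condense(const std::vector<Sample>& train) {
    std::vector<Sample> keep = { train[0] };   // seed with one point
    bool changed = true;
    while (changed) {                          // sweep until nothing is misclassified
        changed = false;
        for (const Sample& s : train)
            if (nearestLabel(keep, s) != s.label) {
                keep.push_back(s);             // absorb points the subset gets wrong
                changed = true;
            }
    }
    return keep;                               // typically much smaller than 'train'
}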

This algorithm is better suited to the automatic classification of class domains with large sample sizes; class domains with small sample sizes are more prone to misclassification under this algorithm.

The K-nearest neighbor classifier works well for text classification. Statistical analysis of simulation results shows that, as a text classifier, KNN is second only to support vector machines and clearly better than linear least-squares fit (LLSF), Naive Bayes, and neural networks.

Important points:

1. Feature dimensionality reduction (commonly via the chi-square (CHI) statistic; see the sketch after this list)

2. Tail-truncation algorithms (there are three kinds of tail-truncation algorithms)

3. Reducing the computational workload
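For point 1, the chi-square statistic scores how strongly a term is associated with a category from a 2x2 contingency table of document counts; only the top-scoring terms are kept as features. A minimal sketch with hypothetical counts (the function name and numbers are illustrative):

#include <cstdio>

// Chi-square score of term t for category c from document counts:
//   a = docs in c containing t,    b = docs outside c containing t,
//   c_ = docs in c without t,      d = docs outside c without t.
double chiSquare(double a, double b, double c_, double d) {
    double n = a + b + c_ + d;
    double num = n * (a * d - c_ * b) * (a * d - c_ * b);
    double den = (a + c_) * (b + d) * (a + b) * (c_ + d);
    return den > 0 ? num / den : 0.0;
}

int main() {
    // Hypothetical: term in 40 of 50 in-class docs, 10 of 950 out-of-class docs.
    std::printf("chi2 = %.1f\n", chiSquare(40, 10, 10, 940));
    return 0;
}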

DEMO code:

# Include "ML. H "# include" highgui. H "int main (INT argc, char ** argv) {const int K = 10; int I, J, K, accuracy; float response; int train_sample_count = 100; cvrng rng_state = cvrng (-1); // initialize the cvmat * traindata = cvcreatemat (train_sample_count, 2, cv_32fc1); cvmat * trainclasses = cvcreatemat (cost, 1, cv_32fc1); iplimage * IMG = cvcreateimage (cvsize (500,500), 8, 3); flo AT _ sample [2]; cvmat sample = cvmat (1, 2, cv_32fc1, _ sample); cvzero (IMG); cvmat traindata1, traindata2, trainclasses1, trainclasses2; // form the training samples cvgetrows (traindata, & traindata1, 0, train_sample_count/2); // returns a row of an array or a row of cvrandarr (& rng_state, & traindata1, cv_rand_normal, cvscalar (200,200), cvscalar (50, 50); // fill the array with a random number and update the RNG status cvgetrows (traindata, & traindata2, train _ Sample_count/2, rows); cvrandarr (& rng_state, & traindata2, cv_rand_normal, cvscalar (300,300), cvscalar (); cvgetrows (trainclasses, & trainclasses1, 0, values/2); cvset (& trainclasses1, cvscalar (1); cvgetrows (trainclasses, & trainclasses2, train_sample_count/2, train_sample_count); cvset (& trainclasses2, cvscalar (2); // learn classifier cvknearest KNN (traind ATA, trainclasses, 0, false, k); cvmat * nearests = cvcreatemat (1, K, cv_32fc1); for (I = 0; I  height; I ++) {for (j = 0; j  width; j ++) {sample. data. FL [0] = (float) J; sample. data. FL [1] = (float) I; // estimates the response and get the neighbors 'labels response = KNN. find_nearest (& sample, K, 0, 0, nearests, 0); // compute the number of neighbors representing the majority for (k = 0, Ccuracy = 0; k <K; k ++) {If (nearests-> data. FL [k] = Response) Accuracy ++;} // highlight the pixel depending on the accuracy (or confidence) cvset2d (IMG, I, j, response = 1? (Accuracy> 5? Cv_rgb (18180,120, 0): cv_rgb (, 0): (accuracy> 5? Cv_rgb (0,180, 0): cv_rgb (120,120, 0); }}// display the original training samples for (I = 0; I <train_sample_count/2; I ++) {cvpoint pt; PT. X = cvround (traindata1.data. FL [I * 2]); PT. y = cvround (traindata1.data. FL [I * 2 + 1]); cvcircle (IMG, PT, 2, cv_rgb (255, 0), cv_filled); PT. X = cvround (traindata2.data. FL [I * 2]); PT. y = cvround (traindata2.data. FL [I * 2 + 1]); cvcircle (IMG, PT, 2, cv_rgb (0,255, 0), cv_filled);} cvnamedwindow ("classifier result", 1 ); cvshowimage ("classifier result", IMG); cvwaitkey (0); cvreleasemat (& trainclasses); cvreleasemat (& traindata); Return 0 ;}

 

Related reading: http://www.cnblogs.com/xiangshancuizhu/archive/2011/08/06/2129355.html
Improved KNN: http://www.cnblogs.com/xiangshancuizhu/archive/2011/11/11/2245373.html
