Original post address: http://www.cnblogs.com/nobadfish/articles/5244637.html
The original paper is titled "Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories".
The central idea of the paper is a recognition algorithm that combines the bag-of-words model with a pyramid structure. The bag-of-words model is introduced briefly first.
1. Word Bag model
The bag-of-words model is also called the "word bag" model. It was originally used in natural language processing, and Svetlana adopts it for image classification. The main idea is to use the frequency of each word as the feature for classification, ignoring word order, grammar, and syntax.
In image classification, each feature extracted from an image is treated as a word, and an image is treated as an article made up of these features; the order of the features is not considered.
Bag of words has three main steps: first, basic feature extraction; second, dictionary generation (higher-level features); and last, classification with a classifier.
1.1 Basic Feature Extraction
The basic feature Svetlana selects in the paper is the SIFT descriptor, which gives a 128-dimensional feature vector at each SIFT point. SIFT keypoint extraction and the computation of the descriptors are covered in other blog posts and are not repeated here. In this experiment, the SIFT features are extracted with the vl_sift function from the VLFeat library.
1.2 Dictionary Generation
After basic feature extraction we have the "words". Because the raw words carry redundant information and noise, and the volume of data is often very large, using them directly for classification may not work well. So we need to design some "bags", which form the dictionary. This is done by clustering: the clustering method chosen here is the K-means algorithm, and the number of "bags" constructed is 400.
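The clustering step can be sketched as follows. This is a minimal pure-Python version of Lloyd's K-means, not the code used in the experiment; the function names, the random initialization, and the fixed iteration count are assumptions made for illustration. In the setting described above, `points` would be the 128-dimensional SIFT descriptors and `k` would be 400.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_center(point, centers):
    """Index of the center closest to `point` (ties go to the lower index)."""
    return min(range(len(centers)),
               key=lambda i: squared_distance(point, centers[i]))


def kmeans(points, k, iterations=20, seed=0):
    """Plain Lloyd's algorithm; returns a list of k cluster centers.

    Assumption: centers are initialized from k randomly sampled points.
    """
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    for _ in range(iterations):
        # Assignment step: group every point with its nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest_center(p, centers)].append(p)
        # Update step: move each center to the mean of its members.
        for i, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous center
                dim = len(members[0])
                centers[i] = [sum(m[d] for m in members) / len(members)
                              for d in range(dim)]
    return centers


# Toy example with 2-D points standing in for SIFT descriptors.
toy_points = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
toy_centers = sorted(kmeans(toy_points, k=2))
```

For real data with hundreds of thousands of descriptors, a vectorized library implementation would be the practical choice; the loop form here is only meant to show what the clustering does.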
After the "bags" are generated, each word in an image is assigned to its nearest bag, and the frequency of words in each bag is used as the feature description vector for the image.
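A minimal sketch of this histogram step is shown below. The function name `bow_histogram` is hypothetical, and normalizing the counts so they sum to 1 is an assumption; the text only says that frequencies are used. `codebook` stands for the cluster centers produced by the dictionary step.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def bow_histogram(descriptors, codebook):
    """Count how many descriptors fall into each "bag" (nearest center).

    Returns a list with one entry per codebook center. Counts are
    normalized to sum to 1 (an assumption, see the note above).
    """
    counts = [0] * len(codebook)
    for d in descriptors:
        nearest = min(range(len(codebook)),
                      key=lambda i: squared_distance(d, codebook[i]))
        counts[nearest] += 1
    total = sum(counts)
    if total == 0:
        return [0.0] * len(codebook)
    return [c / total for c in counts]


# Toy example: a 2-word codebook and four 2-D "descriptors".
toy_codebook = [[0, 0], [10, 10]]
toy_descriptors = [[1, 0], [0, 1], [9, 9], [0, 0]]
toy_hist = bow_histogram(toy_descriptors, toy_codebook)
```

With a 400-word dictionary, every image is reduced to a 400-dimensional vector regardless of how many SIFT points it contains.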
1.3 Classifier
The classifier chosen is a simple linear SVM.
2. Pyramid structure
The pyramid structure is added on top of the original bag-of-words model.
A typical bag-of-words model computes only one histogram over the whole image. In the pyramid structure, each layer divides the image into a different number of regions and computes a histogram for each region separately. The histogram vectors of the whole pyramid, concatenated together, form the final feature vector used for classification. (The figure in the original post showed the histogram statistics of a 3-layer pyramid structure.)
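The pyramid histogram can be sketched as below. This is an illustration under stated assumptions rather than the code used in the experiment: the function name and parameters are hypothetical, layer l is assumed to split the image into a 2^l x 2^l grid (1, 4, and 16 regions for 3 layers), and the histograms are concatenated without weights. The original paper additionally weights each layer, which is omitted here.

```python
def spatial_pyramid_histogram(keypoints, word_ids, width, height,
                              vocab_size, levels=3):
    """Concatenate per-region word histograms over all pyramid layers.

    keypoints : list of (x, y) positions of the features in the image
    word_ids  : index of the "bag" each feature was assigned to
    levels    : number of layers; layer l uses a 2**l x 2**l grid (assumed)
    """
    feature = []
    for level in range(levels):
        cells = 2 ** level  # grid is cells x cells at this layer
        hist = [0] * (cells * cells * vocab_size)
        for (x, y), word in zip(keypoints, word_ids):
            # Clamp so points on the right/bottom edge stay in the last cell.
            col = min(int(x * cells / width), cells - 1)
            row = min(int(y * cells / height), cells - 1)
            hist[(row * cells + col) * vocab_size + word] += 1
        feature.extend(hist)
    return feature


# Toy example: four features, one in each quadrant of a 100 x 100 image.
toy_keypoints = [(10, 10), (90, 10), (10, 90), (90, 90)]
toy_words = [0, 1, 0, 1]
toy_feature = spatial_pyramid_histogram(toy_keypoints, toy_words,
                                        width=100, height=100,
                                        vocab_size=2, levels=2)
```

Under these assumptions, a 3-layer pyramid has 1 + 4 + 16 = 21 regions, so a 400-word dictionary gives a 400 x 21 = 8400-dimensional feature vector per image.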