Representing Images with Spatial Pyramid Methods

Source: Internet
Author: User
Tags: scale, image, svm

Note: These study notes reflect my own understanding. If you find any errors, please point them out so that we can learn and improve together.

These notes are based on the CVPR papers "Discriminative Spatial Pyramid", "Discriminative Spatial Saliency for Image Classification", and "Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories". Thanks to the authors of these papers.

The spatial pyramid representation is an improvement on the traditional BoF (bag of features) method. Traditional BoF first extracts SIFT descriptors from each image, then clusters the descriptors of all interest points to form a vocabulary of visual words (BoW), and finally counts how often each visual word occurs in each image. BoF therefore describes the distribution of feature points over the whole image and produces a single global histogram, so the spatial layout of the image is lost and the image may not be identified accurately. To overcome this shortcoming of BoF, the spatial pyramid method was proposed: it computes the distribution of feature points at several resolutions and thereby captures the spatial information of the image. At each level of the pyramid the image is divided into a progressively finer grid; features are extracted from each grid cell and concatenated into one large feature vector.
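The BoF counting step described above can be sketched in a few lines. This is a minimal illustration, assuming the vocabulary (`codebook`) has already been built by clustering; the 2-D toy descriptors and the function names are made up for the example (real SIFT descriptors are 128-dimensional).

```python
from collections import Counter

def nearest_word(descriptor, codebook):
    """Return the index of the codebook vector closest to `descriptor`."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(codebook)), key=lambda i: sq_dist(descriptor, codebook[i]))

def bof_histogram(descriptors, codebook):
    """Count how often each visual word occurs among the descriptors."""
    counts = Counter(nearest_word(d, codebook) for d in descriptors)
    return [counts[i] for i in range(len(codebook))]

# Toy data: a 3-word vocabulary and four descriptors (hypothetical values).
codebook = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
descriptors = [(0.5, 0.2), (9.0, 1.0), (0.1, 9.5), (11.0, -1.0)]
print(bof_histogram(descriptors, codebook))  # [1, 2, 1]
```

Note that the histogram records only how many descriptors map to each word, not where they were found, which is exactly the information the spatial pyramid adds back.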

1. Image Scale Space

The image scale space in SIFT can be understood as the image convolved with a Gaussian. The resolution of the image stays the same and the number of pixels stays the same, but the details are averaged (smoothed) away. The reason is that the Gaussian averages each pixel with its neighbors: a strong pixel is averaged with the weaker pixels around it, and the result is necessarily smaller than the strongest signal, which is what produces the smoothing effect.
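A small 1-D sketch can make the smoothing effect concrete. It assumes a sampled, normalized Gaussian kernel and clamped borders; both are choices made for this example, not details taken from SIFT.

```python
import math

def gaussian_kernel(sigma, radius):
    """Sampled 1-D Gaussian, normalized so the weights sum to 1."""
    weights = [math.exp(-(i * i) / (2.0 * sigma * sigma))
               for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def smooth(signal, sigma, radius=3):
    """Convolve `signal` with a Gaussian; borders are handled by clamping indices."""
    kernel = gaussian_kernel(sigma, radius)
    n = len(signal)
    out = []
    for i in range(n):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = min(max(i + j - radius, 0), n - 1)
            acc += w * signal[idx]
        out.append(acc)
    return out

signal = [0, 0, 0, 10, 0, 0, 0]      # one strong pixel among weak ones
blurred = smooth(signal, sigma=1.0)
print(len(blurred) == len(signal))   # True: same number of samples
print(max(blurred) < max(signal))    # True: the peak has been averaged down
```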


Scale variable Gaussian function:
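The formula itself did not survive in this copy. In the standard SIFT formulation, the scale-variable Gaussian and the resulting scale space are usually written as below, where sigma is the scale parameter; the names I (input image) and L (smoothed image) come from that formulation, not from the text above.

```latex
G(x, y, \sigma) = \frac{1}{2\pi\sigma^{2}} \exp\!\left(-\frac{x^{2} + y^{2}}{2\sigma^{2}}\right),
\qquad
L(x, y, \sigma) = G(x, y, \sigma) * I(x, y)
```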


2. Image Pyramid

The pyramid is the main form of multi-scale image representation; an image pyramid is a simple yet effective structure for interpreting an image at multiple resolutions. The pyramid of an image is a collection of versions of that image whose resolution decreases step by step, arranged in the shape of a pyramid.


Building an image pyramid generally takes two steps: (1) smooth the image with a low-pass filter; (2) subsample the smoothed image. Repeating these steps yields a series of progressively smaller images.
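The two steps can be sketched as follows. This is only an illustration: the [1, 2, 1] / 4 binomial kernel stands in for the low-pass filter (it is a common cheap approximation of a Gaussian, chosen here for brevity), and the function names are hypothetical.

```python
def blur(image):
    """Low-pass filter with a separable [1, 2, 1] / 4 kernel (borders clamped)."""
    h, w = len(image), len(image[0])

    def clamp(v, hi):
        return min(max(v, 0), hi - 1)

    # Horizontal pass, then vertical pass.
    rows = [[(row[clamp(x - 1, w)] + 2 * row[x] + row[clamp(x + 1, w)]) / 4.0
             for x in range(w)] for row in image]
    return [[(rows[clamp(y - 1, h)][x] + 2 * rows[y][x] + rows[clamp(y + 1, h)][x]) / 4.0
             for x in range(w)] for y in range(h)]

def downsample(image):
    """Keep every second row and every second column."""
    return [row[::2] for row in image[::2]]

def build_pyramid(image, levels):
    """Return [image, half-size image, quarter-size image, ...]."""
    pyramid = [image]
    for _ in range(levels - 1):
        pyramid.append(downsample(blur(pyramid[-1])))
    return pyramid

image = [[float((x + y) % 2) for x in range(8)] for y in range(8)]  # 8x8 checkerboard
pyramid = build_pyramid(image, levels=3)
print([(len(p), len(p[0])) for p in pyramid])  # [(8, 8), (4, 4), (2, 2)]
```

Smoothing before subsampling matters: dropping every second pixel of the raw checkerboard would alias it into a constant image that depends on which pixels happen to be kept.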

3. Representing Images with a Spatial Pyramid

"Discriminative Spatial Pyramid"

The original method first extracts global features from the whole image, then divides the image into a progressively finer grid at each pyramid level, extracts features from every grid cell at every level, and concatenates them into one large feature vector. However, different local regions of an image carry different amounts of information, so a weighted spatial pyramid method is proposed: a weight is assigned to every cell at every level, and the cell features are scaled by their weights before being concatenated.

In the accompanying figure, the left panel shows the original method and the right panel shows the weighted method.
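The difference between the two variants can be sketched in code. This is a minimal illustration with toy values: the nested-list layout, the function name, and the weights are all assumptions made for the example, and how the weights are actually obtained in the paper is not covered here.

```python
def concat_features(cell_features, weights=None):
    """Concatenate per-cell feature vectors level by level.

    cell_features[l][k] is the feature vector of cell k at level l.
    weights[l][k], if given, scales that cell's vector before concatenation.
    """
    out = []
    for l, level in enumerate(cell_features):
        for k, feat in enumerate(level):
            w = 1.0 if weights is None else weights[l][k]
            out.extend(w * v for v in feat)
    return out

# Two levels: 1 cell, then 4 cells; each cell has a 2-D feature (toy values).
features = [[[1.0, 2.0]],
            [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0], [0.0, 1.0]]]
weights = [[0.5], [1.0, 2.0, 0.0, 1.0]]   # hypothetical weights

plain = concat_features(features)
weighted = concat_features(features, weights)
print(plain)
print(weighted)
```

Both vectors have the same length (5 cells times 2 dimensions); the weighted one simply emphasizes or suppresses individual cells.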

f_k^l denotes the feature vector of the k-th cell at level l, which is a d-dimensional vector, and C(l) denotes the number of cells at level l of the pyramid. In the original method, the spatial pyramid feature vector of an image is denoted F_S and is the concatenation of all cell feature vectors over all levels.
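The equation is missing from this copy. A reconstruction consistent with the definitions above, writing the feature of cell k at level l as f_k^l and assuming the levels are indexed l = 1, ..., L (the indexing is my assumption), would be:

```latex
F_S = \left( f_1^{1\top},\ \ldots,\ f_{C(1)}^{1\top},\ \ldots,\ f_1^{L\top},\ \ldots,\ f_{C(L)}^{L\top} \right)^{\top}
```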


In the weighted method the feature vector is denoted F_W: each cell feature vector is multiplied by its weight before concatenation.
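This equation is also missing from this copy. Writing the feature of cell k at level l as f_k^l, the number of cells at level l as C(l), and the weight of that cell as w_k^l (a symbol introduced here for illustration), and again assuming levels l = 1, ..., L, a consistent reconstruction would be:

```latex
F_W = \left( w_1^{1} f_1^{1\top},\ \ldots,\ w_{C(1)}^{1} f_{C(1)}^{1\top},\ \ldots,\ w_1^{L} f_1^{L\top},\ \ldots,\ w_{C(L)}^{L} f_{C(L)}^{L\top} \right)^{\top}
```

Setting every weight to 1 recovers the unweighted concatenation.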



4. Spatial Pyramid Matching (SPM)

"Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories"

Spatial pyramid matching (SPM) is an algorithm that uses a spatial pyramid for image matching, recognition, and classification.

At level i, the image is divided into 4^i cells (bins). A histogram feature is computed in each cell, and the histograms from all levels are concatenated into a single vector that serves as the feature of the image.
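The size of the resulting vector follows directly from this description. The sketch below just does the arithmetic; the vocabulary size of 200 and the maximum level of 2 are example values, and the function names are made up for the illustration.

```python
def cells_at_level(i):
    """Number of cells when the image is split into a 2**i x 2**i grid."""
    return 4 ** i

def pyramid_dimension(num_words, max_level):
    """Length of the concatenated vector: one num_words-bin histogram per cell."""
    return num_words * sum(cells_at_level(i) for i in range(max_level + 1))

print([cells_at_level(i) for i in range(3)])           # [1, 4, 16]
print(pyramid_dimension(num_words=200, max_level=2))   # 4200
```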





In the figure, the black dots, squares, and crosses each mark a patch in the image that has been assigned to a particular word of the k-means dictionary.

1) Divide the image into grids of fixed size, from left to right 1×1, 2×2, and 4×4, and count the occurrences of each word in every cell.

2) From left to right, compute the histogram of every cell at each level.

3) Finally, concatenate the histograms obtained at all levels, assigning a weight to each level; the weights increase from left to right, that is, from coarse to fine.

4) Feed the resulting SPM feature into an SVM for training and prediction.
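Steps 1 to 3 can be sketched as a single function. This is a rough illustration under stated assumptions: each feature point is given as (x, y, word), level l uses a 2^l × 2^l grid, and the optional per-level weights are supplied by the caller; the function name and the toy points are hypothetical. Step 4 (the SVM) is not shown.

```python
def spatial_pyramid_feature(points, width, height, num_words, max_level,
                            level_weights=None):
    """Build a concatenated spatial pyramid histogram.

    points: list of (x, y, word) with 0 <= x < width and 0 <= y < height.
    Level l splits the image into 2**l x 2**l cells, each with its own
    num_words-bin histogram; level_weights[l] scales every histogram of level l.
    """
    feature = []
    for level in range(max_level + 1):
        n = 2 ** level
        hists = [[0.0] * num_words for _ in range(n * n)]
        for x, y, word in points:
            col = min(int(x * n / width), n - 1)
            row = min(int(y * n / height), n - 1)
            hists[row * n + col][word] += 1.0
        w = 1.0 if level_weights is None else level_weights[level]
        for h in hists:
            feature.extend(w * v for v in h)
    return feature

# Five feature points in a 100x100 image, vocabulary of 2 words (toy values).
points = [(10, 10, 0), (90, 10, 1), (10, 90, 1), (90, 90, 0), (60, 60, 0)]
feat = spatial_pyramid_feature(points, 100, 100, num_words=2, max_level=1)
print(feat)  # [3.0, 2.0, 1.0, 0.0, 0.0, 1.0, 0.0, 1.0, 2.0, 0.0]
```

The first two entries are the global (level 0) histogram, which is all that plain BoF would keep; the remaining eight entries record in which quadrant each word occurred.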

The experimental procedure in the paper is as follows:

1) Extract "strong features", i.e. SIFT descriptors, from 16×16 pixel patches sampled on a dense grid with a spacing of 8 pixels.

2) Build a dictionary of M words with k-means, in the same way as for BoF.


3) Use the image pyramid to divide the image into bins at multiple scales (the layered sub-grids of the spatial pyramid), then count how many occurrences of each word fall into each bin. The final match between images X and Y is the sum of the per-word pyramid matches over all M visual words, where M is the number of keywords. (My own understanding is that this matching function can be used as the kernel of an SVM to decide whether two images belong to the same class.)
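The matching function can be sketched as below. The formula is missing from this copy; the sketch follows the pyramid match kernel as I recall it from the cited paper, where the number of matches at level l is the histogram intersection I^l, level 0 is weighted by 1/2^L, and level l >= 1 is weighted by 1/2^(L-l+1). The data layout (a dict from word index to point coordinates) and the function names are assumptions made for the example.

```python
def cell_histograms(points, size, level):
    """Counts per cell on a 2**level x 2**level grid over a size x size image."""
    n = 2 ** level
    hist = [0] * (n * n)
    for x, y in points:
        col = min(int(x * n / size), n - 1)
        row = min(int(y * n / size), n - 1)
        hist[row * n + col] += 1
    return hist

def intersection(h1, h2):
    """Histogram intersection: number of matches at one level."""
    return sum(min(a, b) for a, b in zip(h1, h2))

def pyramid_match(points_x, points_y, size, max_level):
    """Single-word pyramid match for two point sets of the same visual word."""
    L = max_level
    total = intersection(cell_histograms(points_x, size, 0),
                         cell_histograms(points_y, size, 0)) / 2 ** L
    for l in range(1, L + 1):
        i_l = intersection(cell_histograms(points_x, size, l),
                           cell_histograms(points_y, size, l))
        total += i_l / 2 ** (L - l + 1)
    return total

def spm_kernel(words_x, words_y, size, max_level):
    """Sum the single-word matches over all visual words (dict: word -> points)."""
    return sum(pyramid_match(words_x.get(m, []), words_y.get(m, []), size, max_level)
               for m in set(words_x) | set(words_y))

# Two toy images, each with points for visual words 0 and 1.
x = {0: [(10, 10), (90, 90)], 1: [(50, 20)]}
y = {0: [(20, 20), (10, 90)], 1: [(60, 30)]}
print(spm_kernel(x, y, size=100, max_level=1))  # 2.5
```

Matches found in fine cells count more than matches found only at the coarse level, which is why the weights grow with the level index.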


Copyright notice: This is an original article by the blog author and may not be reproduced without the author's permission.
