Theory of Local Invariant Features and Scale Space
Local invariance: scale invariance and rotation invariance.
Scale invariance: when humans identify an object, they can recognize it correctly whether it is far away or close by. This is what is meant by scale invariance.
The theory of scale space is closely associated with biological vision; some people therefore also refer to the local invariant features of images as invariance methods based on biological vision.
Rotation invariance: when the object rotates, we can still identify it correctly. This is called rotation invariance.
1. Local Invariant Features
Global features: features extracted from the entire image. They are widely used in image retrieval; the color histogram is a typical example.
Local features: features extracted from a local region of the image, typically a pixel and its surrounding neighborhood.
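As a concrete illustration of a global feature, here is a minimal OpenCV sketch of computing a whole-image color histogram; the image path and the number of bins are arbitrary choices for illustration:

#include <opencv2/opencv.hpp>
using namespace cv;

int main() {
    Mat image = imread("../cat.png");        // illustrative path; any BGR image works
    // A 3-D BGR color histogram over the entire image is a simple global feature.
    int channels[] = {0, 1, 2};
    int histSize[] = {8, 8, 8};              // 8 bins per channel (arbitrary choice)
    float range[] = {0, 256};
    const float* ranges[] = {range, range, range};
    Mat hist;
    calcHist(&image, 1, channels, Mat(), hist, 3, histSize, ranges);
    normalize(hist, hist, 1, 0, NORM_L1);    // make the bins sum to 1 so images of different sizes are comparable
    return 0;
}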
A good local feature should have the following properties:
1) Repeatability: when the same object is captured from different viewpoints or at different times, the same features should be detected again; the more of them that are re-detected, the better.
2) Uniqueness: each feature is distinctive on the object and can be distinguished from features of other objects in the scene.
3) Locality: each feature describes only a small part of the object, which avoids mismatches under occlusion.
4) Quantity: the number of detected features should be large, and ideally their density reflects the content of the image to some extent.
5) Accuracy: the detected features should be localized precisely, ideally to pixel accuracy.
6) Efficiency: the feature detection algorithm should run fast.
2. Image Scale Space Theory
When a machine vision system is used to analyze an unknown scene, the computer has no prior knowledge of the scale of the objects in the image. We therefore need to consider descriptions of the image at multiple scales simultaneously in order to obtain the optimal scale of the object of interest.
For this reason, we often turn an image into a set of images at different scales and detect the features of interest at each of those scales.
2.1 Multi-Resolution Pyramid
The general steps for building an image pyramid are as follows: first, the image is smoothed with a low-pass filter (this step blurs the image, mimicking the way distant objects appear less sharp to human vision than nearby ones); then the smoothed image is subsampled (typically by a factor of 1/2 in both the horizontal and vertical directions) to produce a series of progressively smaller images.
Mat image = imread("../cat.png");
Mat kernel = getGaussianKernel(3, 0.5);                        // 3x1 Gaussian kernel with sigma = 0.5
Mat pyrImage;
sepFilter2D(image, pyrImage, image.depth(), kernel, kernel);   // apply the kernel in both directions, i.e. 3x3 Gaussian smoothing
resize(pyrImage, pyrImage, Size(), 0.5, 0.5);                  // downsample by 1/2 in each direction
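To obtain the whole series of reduced images rather than a single level, the smooth-and-subsample step is simply repeated. A minimal sketch of this, using cv::pyrDown, which combines Gaussian smoothing and 1/2 downsampling in one call (the helper name and the number of levels are illustrative; OpenCV also provides cv::buildPyramid for the same purpose):

#include <opencv2/opencv.hpp>
#include <vector>
using namespace cv;

// Build a multi-resolution pyramid: each level is a smoothed, half-sized copy of the previous one.
std::vector<Mat> buildImagePyramid(const Mat& image, int levels) {
    std::vector<Mat> pyramid;
    pyramid.push_back(image);
    for (int i = 1; i < levels; ++i) {
        Mat down;
        pyrDown(pyramid.back(), down);   // Gaussian smoothing followed by 1/2 downsampling
        pyramid.push_back(down);
    }
    return pyramid;
}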
An image pyramid can express the image at multiple scales with high computational efficiency. However, it lacks a solid theoretical foundation and is not sufficient for analyzing objects at arbitrary scales in an image.
The scale space of a signal was originally defined by filtering the original signal with a family of single-parameter Gaussian filters of increasing width, producing a set of progressively smoothed (low-pass) signals. An obvious question is whether low-pass filters other than the Gaussian, parameterized by t, could also be used to generate a scale space.
Later, Koenderink, Lindeberg [Scale-Space Theory in Computer Vision], Florack and others proved, through different approaches and in rigorous mathematical form, that the Gaussian kernel is the only kernel that can be used for scale transformation.
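As an illustration of this construction, the scale space can be written as L(x, y, t) = g(x, y, t) ∗ f(x, y), where f is the original image, g is a Gaussian kernel and t = σ² is the scale parameter. Below is a minimal OpenCV sketch that builds such a Gaussian scale space; the helper name, the number of levels, the base scale sigma0 and the scale ratio k are illustrative choices, not prescribed by the text:

#include <opencv2/opencv.hpp>
#include <vector>
using namespace cv;

// Build a Gaussian scale space: the image resolution stays fixed while the amount of smoothing increases.
std::vector<Mat> buildScaleSpace(const Mat& image, int levels, double sigma0 = 1.0, double k = 1.6) {
    std::vector<Mat> scaleSpace;
    double sigma = sigma0;
    for (int i = 0; i < levels; ++i) {
        Mat smoothed;
        GaussianBlur(image, smoothed, Size(), sigma);   // kernel size is derived automatically from sigma
        scaleSpace.push_back(smoothed);
        sigma *= k;                                     // geometric progression of scales
    }
    return scaleSpace;
}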
Constructing the scale space of an image with Gaussian filters gives the scale space the following properties:
1) Weighted average and finite aperture effect
The representation of the signal at scale t can be viewed as a weighted average of the original signal over space, where the weights are Gaussian kernels with different scale parameters.
The representation of the signal at scale t also corresponds to observing the signal through a non-directional aperture function with characteristic length σ = √t; fine structures in the signal whose characteristic length is smaller than σ are suppressed.
2) Cascade smoothing
g(μ, σ1²) ∗ g(μ, σ2²) = g(μ, σ1² + σ2²)
This property means that Gaussian smoothings of an image at different scales can be cascaded: smoothing at one scale followed by smoothing at another is equivalent to a single smoothing at a larger scale (a numerical check is sketched after this list of properties).
3) Suppression of local extrema
This property can be understood from the way human vision works: the farther away a person is from an object, the fewer details the person can see and the fewer detailed features remain visible.
Filtering an image with a Gaussian kernel suppresses local details in the same way.
4) Invariance under scale changes
This is essentially a matter of formula derivation: if we apply a scaling transform to the original signal and then build the Gaussian scale space of the transformed signal, features such as the extrema of the new signal remain unchanged.
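The cascade smoothing property (2) can be checked numerically: smoothing with σ1 and then with σ2 should give the same result as a single smoothing with σ = √(σ1² + σ2²). A minimal sketch, with an illustrative image path and arbitrary σ values:

#include <opencv2/opencv.hpp>
#include <cmath>
#include <iostream>
using namespace cv;

int main() {
    Mat image = imread("../cat.png", IMREAD_GRAYSCALE);   // illustrative path
    Mat f;
    image.convertTo(f, CV_32F);

    double sigma1 = 1.5, sigma2 = 2.0;                    // arbitrary scales
    Mat a, b, c;
    GaussianBlur(f, a, Size(), sigma1);                   // smooth with sigma1 ...
    GaussianBlur(a, b, Size(), sigma2);                   // ... then with sigma2
    GaussianBlur(f, c, Size(), std::sqrt(sigma1 * sigma1 + sigma2 * sigma2));   // single equivalent smoothing

    // Up to small numerical and boundary effects, the two results coincide.
    std::cout << "mean absolute difference: " << mean(abs(b - c))[0] << std::endl;
    return 0;
}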
Young found through physiological studies that the receptive fields of the retina and visual cortex in mammals can be modeled well by Gaussian derivatives up to fourth order.
2.2 Scale Selection
In general, we do not know in advance the scale of the target of interest in the image, so we cannot choose suitable parameters when analyzing it; for example, edge detection with poorly chosen parameters may produce an excess of local detail.
In practice, we therefore define a feature response function and look for its extrema across different scales.
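A minimal sketch of this idea, assuming the scale-normalized Laplacian of Gaussian as the feature response function (a common choice; the text does not prescribe a particular one). The image path, the pixel of interest and the scale range are all illustrative:

#include <opencv2/opencv.hpp>
#include <cmath>
#include <iostream>
using namespace cv;

int main() {
    Mat image = imread("../cat.png", IMREAD_GRAYSCALE);   // illustrative path
    Mat f;
    image.convertTo(f, CV_32F);

    Point p(100, 100);               // pixel of interest (assumes the image is larger than 100x100)
    double bestScale = 0.0, bestResponse = 0.0;

    // Evaluate the scale-normalized Laplacian of Gaussian over a range of scales
    // and keep the scale at which the response at p is strongest.
    for (double sigma = 1.0; sigma <= 16.0; sigma *= 1.26) {
        Mat smoothed, response;
        GaussianBlur(f, smoothed, Size(), sigma);
        Laplacian(smoothed, response, CV_32F);
        response = response * (sigma * sigma);            // sigma^2 normalization makes responses comparable across scales
        float r = std::abs(response.at<float>(p));
        if (r > bestResponse) { bestResponse = r; bestScale = sigma; }
    }
    std::cout << "characteristic scale at (" << p.x << ", " << p.y << "): sigma = " << bestScale << std::endl;
    return 0;
}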
It should be noted that image structures are often detected at a coarse scale, where their locations are not necessarily the most accurate. Scale analysis of an image therefore usually involves two phases: features (structures) are first detected at a coarse scale, and their positions are then refined at a fine scale.