Computer Vision: Using Image Histograms to Detect Specific Objects (Mean Shift and CamShift Algorithms)

Last Update:2014-10-16 Source: Internet

Author: User


Histogram introduction

A histogram is a simple table that gives the number of pixels in an image (or group of images) that have a given value. The histogram of an 8-bit grayscale image therefore has 256 entries (or bins). Bin 0 gives the number of pixels with value 0, bin 1 gives the number of pixels with value 1, and so on.
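
The counting described above can be sketched in a few lines of standard C++. This is a minimal illustration, not OpenCV's implementation; the function name grayHistogram and the flat pixel vector are assumptions made for the example.

```cpp
#include <array>
#include <cstdint>
#include <vector>

// Count how many pixels take each of the 256 possible 8-bit gray values.
// `pixels` is assumed to hold the image as a flat list of gray levels.
std::array<int, 256> grayHistogram(const std::vector<std::uint8_t>& pixels) {
    std::array<int, 256> bins{};  // zero-initialized, one bin per gray level
    for (std::uint8_t p : pixels) {
        ++bins[p];                // bin p counts the pixels whose value is p
    }
    return bins;
}
```

For an image with pixel values {0, 0, 1, 255}, bin 0 would hold 2, bins 1 and 255 would each hold 1, and every other bin would stay at 0.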

Histogram back projection

A histogram is an important feature of image content. If a region of an image shows a distinctive texture or object, the histogram of that region can be treated as a probability function that gives the probability that a pixel belongs to that texture or object. This lets us detect specific content using an image histogram.
Histogram back projection (also translated as inverse projection) is a simple way to map the target's probability distribution onto the observed image. It replaces each pixel value in the input image with the corresponding probability value from the histogram of the region of interest (ROI).
Its C++ API in OpenCV 2 is:

    cv::calcBackProject(&image,
                        1,          // one image
                        channels,   // channels used
                        histogram,  // histogram to back-project
                        result,     // resulting back-projection image
                        ranges,     // value range of each dimension
                        255.0       // scale factor
    );

HSV Color Space

HSV (hue, saturation, value) is a color space built on the intuitive visual characteristics of color (hue H, saturation S, value V). It is also known as the hexcone model.

Hue H

Hue is measured as an angle from 0° to 360°, with red at 0°, green at 120°, and blue at 240°.

Saturation S

The value range is 0.0 to 1.0; the larger the value, the more saturated the color.

Value (brightness) V

The value range is 0.0 (black) to 1.0 (white).
The three-dimensional representation of the HSV model is derived from the RGB cube. Imagine looking along the diagonal of the cube from the white vertex toward the black vertex: the outline of the cube appears as a hexagon. The boundary of the hexagon represents hue, the horizontal axis represents purity (saturation), and value is measured along the vertical axis.

[Figure: HSV color space]

S is the saturation of a color, a ratio between 0 and 1 that compares the purity of the selected color with the maximum purity of that color. In plain terms, S expresses how "pure" a color is: the larger S is, the purer the color, and the smaller S is, the grayer the color. V indicates how bright the color is and also ranges from 0 to 1. V = 0 is the bottom point of the cone, that is, black. V = 1 is the top of the cone, and V = 1 with S = 0 is pure white.
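
To make the H, S, and V ranges concrete, here is a minimal sketch of the usual hexcone RGB to HSV conversion in standard C++. The names Hsv and rgbToHsv are assumptions for this example, and inputs are assumed to be already scaled to [0, 1]. OpenCV's 8-bit conversion scales the outputs differently (for example, hue is stored as H/2 so that it fits in a byte), so this is an illustration of the model rather than a drop-in replacement.

```cpp
#include <algorithm>
#include <cmath>

struct Hsv { double h; double s; double v; };  // h in degrees [0,360), s and v in [0,1]

// Convert r, g, b in [0,1] to HSV with the usual hexcone formulas.
Hsv rgbToHsv(double r, double g, double b) {
    const double maxc = std::max({r, g, b});
    const double minc = std::min({r, g, b});
    const double delta = maxc - minc;
    Hsv out{0.0, 0.0, maxc};                // V is the largest component
    if (maxc > 0.0) out.s = delta / maxc;   // S is 0 for black and for every gray
    if (delta > 0.0) {                      // hue is undefined for grays; it stays 0 here
        if (maxc == r)      out.h = 60.0 * std::fmod((g - b) / delta, 6.0);
        else if (maxc == g) out.h = 60.0 * ((b - r) / delta + 2.0);
        else                out.h = 60.0 * ((r - g) / delta + 4.0);
        if (out.h < 0.0) out.h += 360.0;    // wrap negative angles into [0,360)
    }
    return out;
}
```

With this sketch, pure red maps to H = 0, pure green to H = 120, pure blue to H = 240, and white to S = 0 with V = 1, which matches the description of the cone above.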

HSV Cone Application Example: establishing a color probability model

Because our goal is to use an algorithm to detect a particular object (a face in particular), this example lays the groundwork for the mean shift algorithm described later.
Although skin tones are fairly concentrated in color space, they are affected by lighting and ethnicity. To reduce the influence of illumination intensity on skin tone, the color space is usually converted from RGB to one that separates luminance from chrominance, such as HSV, and the luminance component is then discarded.
Collect skin color samples, convert the color of each sample pixel from RGB space to HSV space, compute the histogram of the H component, and normalize it. The result is the probability distribution of skin color over H, and this distribution is the model used for tracking.

The steps are as follows:
1. Map each pixel of the observed image from RGB space to HSV space, compute its H component, and build the statistical histogram.
2. Back-project the histogram: replace the value of each pixel in the observed image with the value that its H component has in the histogram.
3. The resulting output image is the color probability distribution image of the observed image.

The following program demonstrates the use of color information with the histogram back projection algorithm.

    #include <opencv2/core/core.hpp>
    #include <opencv2/imgproc/imgproc.hpp>

    class ContentFinder {
    private:
        float hranges[2];
        const float* ranges[3];
        int channels[3];
        float threshold;
        cv::MatND histogram;

    public:
        ContentFinder() : threshold(-1.0f) {
            // All channels have the same value range
            ranges[0] = hranges;
            ranges[1] = hranges;
            ranges[2] = hranges;
        }

        // Set the threshold on histogram values, in [0,1]
        void setThreshold(float t) { threshold = t; }

        // Get the threshold
        float getThreshold() { return threshold; }

        // Set the reference histogram
        void setHistogram(const cv::MatND& h) {
            histogram = h;
            cv::normalize(histogram, histogram, 1.0);  // the input histogram must be normalized
        }

        cv::Mat find(const cv::Mat& image, float minValue, float maxValue,
                     int* channels, int dim);
    };

    cv::Mat ContentFinder::find(const cv::Mat& image, float minValue, float maxValue,
                                int* channels, int dim) {
        cv::Mat result;
        hranges[0] = minValue;
        hranges[1] = maxValue;
        for (int i = 0; i < dim; i++) {
            this->channels[i] = channels[i];
        }
        cv::calcBackProject(&image, 1,   // input image
                            channels,    // list of channels used
                            histogram,   // histogram
                            result,      // back projection result
                            ranges,      // value ranges
                            255.0);      // scale factor
        // Threshold the result to obtain a binary image
        if (threshold > 0.0)
            cv::threshold(result, result, 255 * threshold, 255, cv::THRESH_BINARY);
        return result;
    }

Mean shift algorithm

Suppose we know the approximate position of the object; the probability map can then be used to locate its exact position. The most likely position is the one where the probability inside the known window area is greatest. So if we start from the initial position and move iteratively, we can find the exact position. This is the task performed by the mean shift algorithm.

Principle

The mean shift algorithm locates a local maximum of a probability function iteratively. It finds the center of gravity (the weighted average) of the data points inside a predefined window, moves the center of the window to that center of gravity, and repeats the process until the window center converges to a stable point.
Mathematically speaking, mean shift uses gradient ascent on the probability density to find a local optimum. Given an input window on the image, mean shift iterates on the back projection: at each step the window moves toward its center of gravity, that is, toward higher back-projected probability, so it keeps moving toward the target. Mean shift is therefore a gradient ascent algorithm with a variable step size.
Now consider a point set (which can be the distribution of pixels after histogram back projection) and a small window that needs to be moved to the region where the points are densest, as shown in the figure below:

[Figure: Meanshift_basics]

The initial window is the large blue circle C1, whose original center is "C1_o". The centroid of the points inside the window is "C1_r", and moving the window center there shifts the window toward a region of higher local density. Repeating this process ends with the window C2, which covers the region with the densest distribution and contains the largest number of points.
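
The movement of the circular window over a point set can be sketched as below. This is a minimal illustration with a flat (uniform) kernel; the names Point and meanShiftPoints, the radius parameter, and the stopping tolerance are assumptions made for the example.

```cpp
#include <cmath>
#include <vector>

struct Point { double x; double y; };

// Move a circular window of the given radius toward the densest nearby region.
// Each iteration replaces the center with the mean of the points inside the window.
Point meanShiftPoints(const std::vector<Point>& pts, Point center, double radius,
                      int maxIter = 50, double eps = 1e-3) {
    for (int i = 0; i < maxIter; ++i) {
        double sx = 0.0, sy = 0.0;
        int n = 0;
        for (const Point& p : pts) {
            if (std::hypot(p.x - center.x, p.y - center.y) <= radius) {
                sx += p.x;
                sy += p.y;
                ++n;
            }
        }
        if (n == 0) break;                       // empty window: nothing to follow
        const Point next{sx / n, sy / n};        // centroid of the points in the window
        const double shift = std::hypot(next.x - center.x, next.y - center.y);
        center = next;
        if (shift < eps) break;                  // converged
    }
    return center;
}
```

Started near the edge of a cluster, the window is pulled step by step toward the cluster's center, because each centroid lies on the side of the window where more points are found.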

To sum up, the basic idea of mean shift is this: starting from each pixel, first estimate the gradient of the local density of neighboring pixels with similar color, then use the iterative algorithm to find the peak of the local density (that is, the center of gravity), and finally group all pixels that converge to the same peak into one region.

Basic process

(1) Select the size and initial position of the tracking window. In mean shift tracking, the kernel bandwidth (the size of the kernel function's domain, that is, the size of the search window) plays a very important role: it determines the number of samples that take part in each mean shift iteration and also reflects the size of the tracking window. Typically the kernel bandwidth is set from the size of the initial tracking window and does not change during tracking.

[Figure: Meanshift_step1]

(2) Calculate the centroid (center of gravity) within the tracking window. In a discrete two-dimensional (2D) probability distribution image, the centroid of a window is computed in the same way as the centroid of a physical object: from the window's zeroth-order moment M00 and its first-order moments in x and y (M10, M01).
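
Written out with the standard moment definitions, the centroid in step (2) takes the following form. The symbol I(x, y) for the probability value at pixel (x, y) and the names x_c, y_c for the centroid are notation assumed here for illustration; the sums run over the pixels inside the tracking window.

```latex
M_{00} = \sum_{x}\sum_{y} I(x, y), \qquad
M_{10} = \sum_{x}\sum_{y} x \, I(x, y), \qquad
M_{01} = \sum_{x}\sum_{y} y \, I(x, y)

x_c = \frac{M_{10}}{M_{00}}, \qquad
y_c = \frac{M_{01}}{M_{00}}
```

The centroid is only defined when M00 is greater than 0, that is, when the window contains some probability mass.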

[Figure: centroid formula]
[Figure: Meanshift_step2]

(3) Move the center of the tracking window to the centroid.
(4) Repeat steps (2) and (3) until the center of the tracking window and the centroid "converge", that is, until the window moves less than a given threshold in one iteration.

[Figure: Meanshift_step3]
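
Steps (1) to (4) can be sketched on a plain probability image as follows. This is a simplified illustration and not OpenCV's cv::meanShift; the types Window and ProbImage, the function name meanShift, and the default iteration limit and threshold are assumptions made for the example.

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

struct Window { int x; int y; int w; int h; };  // top-left corner and size, in pixels

// Probability (back projection) image stored row-major, values >= 0.
struct ProbImage {
    int width;
    int height;
    std::vector<double> prob;
    double at(int x, int y) const { return prob[y * width + x]; }
};

// Re-center `win` on its centroid until it moves less than `eps` pixels
// or `maxIter` iterations are used. Returns the number of iterations.
int meanShift(const ProbImage& img, Window& win, int maxIter = 20, double eps = 1.0) {
    int iter = 0;
    while (iter < maxIter) {
        ++iter;
        // Step (2): zeroth- and first-order moments of the window.
        double m00 = 0.0, m10 = 0.0, m01 = 0.0;
        const int x1 = std::min(img.width, win.x + win.w);
        const int y1 = std::min(img.height, win.y + win.h);
        for (int y = std::max(0, win.y); y < y1; ++y) {
            for (int x = std::max(0, win.x); x < x1; ++x) {
                const double p = img.at(x, y);
                m00 += p;
                m10 += x * p;
                m01 += y * p;
            }
        }
        if (m00 <= 0.0) break;  // no probability mass in the window
        // Step (3): top-left corner that puts the window center on the centroid.
        int nx = static_cast<int>(std::lround(m10 / m00 - (win.w - 1) / 2.0));
        int ny = static_cast<int>(std::lround(m01 / m00 - (win.h - 1) / 2.0));
        nx = std::max(0, std::min(nx, img.width - win.w));   // keep the window inside the image
        ny = std::max(0, std::min(ny, img.height - win.h));
        const double shift = std::hypot(static_cast<double>(nx - win.x),
                                        static_cast<double>(ny - win.y));
        win.x = nx;
        win.y = ny;
        if (shift < eps) break;  // step (4): converged
    }
    return iter;
}
```

Note that the window size never changes inside the loop; only its position is updated, which is the limitation that CamShift addresses.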

In general an image is a matrix whose pixels are evenly distributed over the grid, so no region has denser points than another. How to define the probability density at a point is therefore the key question, and we can define it from the color of the pixel.
Its application as a face tracker is shown below:

[Figure: Meanshift_face]

Advantages

Mean shift is an efficient pattern matching algorithm. Because it needs no global search and locates the target with high precision, it has been widely used in pattern recognition, real-time visual tracking, and other fields.

Disadvantages

Mean shift lacks a model update mechanism: the tracking window size stays the same throughout tracking, so when the scale of the target changes, the estimated scale and position become inaccurate.

Continuously Adaptive Mean Shift (CamShift) algorithm

Bradski proposed the CamShift algorithm to address the shortcomings of mean shift. The basic idea of CamShift (Continuously Adaptive Mean-Shift) is to run mean shift on every frame of a video, using the result for the previous frame as the initial value for the next frame, and to keep iterating in this way.
The algorithm uses moments to estimate the size of the target, adapts the size and position of the tracking window continuously, and is applied to fast tracking of moving objects in continuous color image sequences.
Put simply, mean shift looks for the best iteration result in a single picture, whereas CamShift works on a video sequence and calls mean shift on each frame to find the best result for that frame. Because CamShift processes a video sequence, it can keep adjusting the size of the window, so when the size of the target changes, the algorithm adapts the target area and continues tracking.

Principle

The CamShift algorithm first converts each video frame into a probability distribution image (PDI) using the color probability model of the tracked target. It initializes a rectangular search window and, for the PDI of each frame, uses mean shift to search for the region that best matches the target. From the moments of the search region it estimates the center and size of the target, saves and outputs the result, and uses the result for the current frame to initialize the search window for the next frame. Repeating this cycle tracks the target continuously. CamShift is a nonparametric density gradient estimation method that adapts dynamically.

Specific steps

(1) Create the color probability distribution map for the specified window, and initialize a search window W of size s.

[Figure: Camshift_1]

The histogram on the left of the figure shows the most common hue range of the selected area. When a new video frame arrives, the hue of each pixel is examined and the histogram gives that pixel's probability value.
(2) Use the mean shift algorithm to make the search window "converge": compute the centroid of the search window in the 2D probability distribution image, move the center of the search window to the computed centroid, and repeat until convergence (that is, until the displacement of the center is less than the given threshold).
(3) Reset the search window size s, compute the output parameters of the tracked target, and use the new window size to initialize the mean shift search window for the next frame.

[Figure: Camshift_2]

The figure shows the probability image of where a face may be present in a video frame while the face is tracked by the CamShift algorithm. Black pixels have the lowest probability, white pixels the highest, and gray pixels lie in between.
(4) Return to step (2) and continue with the next frame.
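
The loop over frames can be sketched as below. This is a simplified illustration, not OpenCV's cv::CamShift: it keeps the window square and axis-aligned and ignores orientation. The resize rule side = 2 * sqrt(M00), for probabilities in [0, 1], is an assumed heuristic chosen for the example, and the names ProbImage, Moments, cornerFor, camShiftStep, and track are hypothetical.

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

struct Window { int x; int y; int w; int h; };  // top-left corner and size

struct ProbImage {
    int width;
    int height;
    std::vector<double> prob;  // row-major probabilities in [0,1]
};

struct Moments { double m00; double m10; double m01; };

// Zeroth- and first-order moments of the part of `win` inside the image.
Moments windowMoments(const ProbImage& img, const Window& win) {
    Moments m{0.0, 0.0, 0.0};
    const int x1 = std::min(img.width, win.x + win.w);
    const int y1 = std::min(img.height, win.y + win.h);
    for (int y = std::max(0, win.y); y < y1; ++y) {
        for (int x = std::max(0, win.x); x < x1; ++x) {
            const double p = img.prob[y * img.width + x];
            m.m00 += p;
            m.m10 += x * p;
            m.m01 += y * p;
        }
    }
    return m;
}

// Top-left coordinate that centers a span of `size` pixels on `c`.
int cornerFor(double c, int size) {
    return static_cast<int>(std::lround(c - (size - 1) / 2.0));
}

// Steps (2) and (3) for one frame: mean shift to convergence, then resize.
Window camShiftStep(const ProbImage& frame, Window win, int maxIter = 20) {
    for (int i = 0; i < maxIter; ++i) {
        const Moments m = windowMoments(frame, win);
        if (m.m00 <= 0.0) return win;  // target lost: keep the previous window
        const int nx = cornerFor(m.m10 / m.m00, win.w);
        const int ny = cornerFor(m.m01 / m.m00, win.h);
        const bool moved = (nx != win.x) || (ny != win.y);
        win.x = nx;
        win.y = ny;
        if (!moved) break;             // converged
    }
    const Moments m = windowMoments(frame, win);
    if (m.m00 <= 0.0) return win;
    // Assumed heuristic: window side grows with the probability mass it covers.
    const int side = std::max(1, static_cast<int>(std::lround(2.0 * std::sqrt(m.m00))));
    return Window{cornerFor(m.m10 / m.m00, side), cornerFor(m.m01 / m.m00, side), side, side};
}

// Step (4): the result for each frame initializes the search in the next frame.
std::vector<Window> track(const std::vector<ProbImage>& frames, Window win) {
    std::vector<Window> out;
    for (const ProbImage& f : frames) {
        win = camShiftStep(f, win);
        out.push_back(win);
    }
    return out;
}
```

If the target grows from one frame to the next, M00 inside the converged window grows with it, so the window returned for the later frame is larger. This is the adaptive behavior that plain mean shift lacks.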

The following is a human face tracking demo of the Camshift algorithm:

[Figure: Camshift_face]

References

OpenCV 3.0.0-dev Documentation
OpenCV 2 Computer Vision Programming Manual, Science Press
Research on a Face Tracking and Recognition System Based on OpenCV, Reiging, Xidian University, master's thesis, 2010
Meanshift, Clustering algorithm

This article is an English version of an article originally written in Chinese on aliyun.com and is provided for information purposes only.
