Summary of Target Tracking

Source: Internet
Author: User

I. Introduction: In an environment to be monitored, how do we detect a target entering a specific area and then track it? There are two scenarios: target tracking against a static background and target tracking against a dynamic background.

II. Target Tracking Methods in a Static Background

1. Single target: Target tracking can be divided into single-object tracking and multi-object tracking. For a single object against a static background, the camera is fixed at a certain position, so the field of view it observes is also static. Generally the background-difference method is used: first model the background, then read an image from the video stream (call it the foreground image) and subtract the background image from it, which yields the target object that has entered the field of view. To describe the target, its size is usually expressed by the number of pixels in its connected region, or by the aspect ratio of the target area; the target's location can be found by projecting the difference mask onto the image axes.
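The background-difference and projection-localization steps above can be sketched in a few lines of NumPy. This is a minimal illustration with assumed 8-bit grayscale frames; the function name and threshold are illustrative, not from the original text.

```python
# Sketch of background-difference tracking: subtract the background model
# from the current frame, threshold, and locate the target by projection.
import numpy as np

def detect_target(background, frame, threshold=30):
    """Return (mask, bbox) where bbox = (x0, y0, x1, y1), or (mask, None)
    if no pixel differs from the background by more than `threshold`."""
    # Absolute gray-level difference between the foreground frame and background.
    diff = np.abs(frame.astype(np.int16) - background.astype(np.int16))
    mask = (diff > threshold).astype(np.uint8)
    if mask.sum() == 0:
        return mask, None
    # Locate the target by projection: collapse the mask onto each axis.
    cols = mask.sum(axis=0)   # vertical projection
    rows = mask.sum(axis=1)   # horizontal projection
    xs = np.nonzero(cols)[0]
    ys = np.nonzero(rows)[0]
    bbox = (int(xs[0]), int(ys[0]), int(xs[-1]), int(ys[-1]))
    return mask, bbox

# Toy usage: a flat background and a frame containing a bright 4x4 "object".
bg = np.zeros((20, 20), dtype=np.uint8)
fr = bg.copy()
fr[5:9, 10:14] = 200
mask, bbox = detect_target(bg, fr)
```

Real systems would follow this with morphological clean-up and connected-component labeling, but the projection trick alone already yields a bounding box for a single target.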

2. Multi-object: For multi-object tracking against a static background, you must determine each target's features, position, motion direction, speed, and other state information.
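One way to maintain per-target position and speed is to associate each new detection with the nearest predicted track. The original text names no association method, so the greedy nearest-neighbour scheme and the constant-velocity prediction below are assumptions for illustration only.

```python
# Minimal multi-object bookkeeping: each track stores a position and velocity,
# and detections from the current frame are matched to the nearest prediction.
import math

def associate(tracks, detections, max_dist=10.0):
    """Greedy nearest-neighbour association of detections to tracks.

    tracks: dict id -> {"pos": (x, y), "vel": (vx, vy)}
    detections: list of (x, y) centroids from the current frame.
    Returns dict id -> detection index (or None if unmatched).
    """
    assigned = {}
    used = set()
    for tid, t in tracks.items():
        # Predict the track's position with a constant-velocity model.
        px = t["pos"][0] + t["vel"][0]
        py = t["pos"][1] + t["vel"][1]
        best, best_d = None, max_dist
        for i, (dx, dy) in enumerate(detections):
            if i in used:
                continue
            d = math.hypot(dx - px, dy - py)
            if d < best_d:
                best, best_d = i, d
        assigned[tid] = best
        if best is not None:
            used.add(best)
    return assigned

tracks = {1: {"pos": (0.0, 0.0), "vel": (1.0, 0.0)},
          2: {"pos": (10.0, 10.0), "vel": (0.0, 1.0)}}
dets = [(10.2, 11.1), (1.1, 0.0)]
match = associate(tracks, dets)
```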

3. Pre-processing: Because the captured image always contains noise, some pre-processing must be performed on it, such as Gaussian smoothing or mean filtering, and possibly image-enhancement operations such as grayscale stretching.
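As a concrete instance of the pre-processing step, a 3x3 mean filter can be written directly in NumPy (Gaussian smoothing works the same way with a weighted kernel). This is an illustrative sketch, not a production filter.

```python
# 3x3 mean filter with edge replication, for 2-D grayscale arrays.
import numpy as np

def mean_filter3(img):
    """Average each pixel with its 8 neighbours (edges replicated)."""
    padded = np.pad(img.astype(np.float64), 1, mode="edge")
    out = np.zeros(img.shape, dtype=np.float64)
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            out += padded[1 + dy : 1 + dy + img.shape[0],
                          1 + dx : 1 + dx + img.shape[1]]
    return out / 9.0

noisy = np.zeros((5, 5))
noisy[2, 2] = 9.0          # a single noise spike
smooth = mean_filter3(noisy)
```

The spike is spread evenly over its 3x3 neighbourhood, which is exactly why mean filtering suppresses isolated noise before background subtraction.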

III. Target Tracking in a Dynamic Background

Rotation of the camera on its pan-tilt platform causes the images it collects to change. Throughout the tracking process, therefore, both the background and the target are in motion, which makes tracking difficult.

The solution currently proposed by the research team is the following tracking pipeline: capture a number of background images at different camera angles and build a background image library --> with the camera held fixed, acquire the current frame and match it against the backgrounds in the library --> compute the background difference (a grayscale difference?) --> retrieve the target --> extract its features --> acquire the current frame in real time and track the target dynamically with the tracking algorithm.

Feature extraction is a difficult problem. Our research team proposed a multi-color-space analysis method: because the same object appears with consistent (homomorphic) characteristics across different color spaces, the target can be decomposed in several color spaces and the key feature information from each can be fused to identify the target's essential features.
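To make the multi-color-space idea concrete, a single RGB pixel can be decomposed into HSV (via the standard library's `colorsys`) and YUV (via the BT.601 weights), and the channels fused into one feature vector. The original text does not specify a fusion rule, so the simple concatenation below is an assumption for illustration.

```python
# Decompose one RGB pixel into HSV and YUV and fuse the channels.
import colorsys

def color_features(r, g, b):
    """Return a fused (H, S, V, Y, U, V') feature vector for an RGB pixel.

    r, g, b are floats in [0, 1]. YUV uses the BT.601 luma weights.
    """
    h, s, v = colorsys.rgb_to_hsv(r, g, b)
    y = 0.299 * r + 0.587 * g + 0.114 * b
    u = 0.492 * (b - y)
    vv = 0.877 * (r - y)
    return [h, s, v, y, u, vv]   # fusion here is simple concatenation

feat = color_features(1.0, 0.0, 0.0)   # pure red
```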

Descriptions of various methods used in the tracking process:

1) Background images are captured at different camera angles, Gaussian mixture background modeling is applied, and a background image library is created; each background image is tagged with its pitch angle and deflection (yaw) angle;

2) After the background difference is obtained, the difference image must be smoothed, de-noised, and otherwise processed to remove interference;

3) Multi-color-space (HSV and YUV) features are extracted for the target, and the features obtained in the different color spaces are fused, to better locate the target in the current frame image;

4) The current frame is modeled in real time with a Gaussian mixture to eliminate background changes caused by, for example, shaking leaves;

5) The tracking algorithm can use multi-sub-block matching or the CamShift method.
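The background-library lookup in step 1) can be sketched as a dictionary keyed by the camera's (pitch, yaw) pair. The 5-degree quantization step below is an assumption for illustration; the original text does not say how angles are discretized.

```python
# Background image library keyed by quantized (pitch, yaw) camera pose.
def quantize(angle, step=5):
    """Snap an angle to the nearest multiple of `step` degrees."""
    return int(round(angle / step)) * step

library = {}                         # (pitch, yaw) -> background image id
library[(quantize(0), quantize(0))] = "bg_000"
library[(quantize(0), quantize(45))] = "bg_045"

def lookup_background(pitch, yaw):
    """Return the stored background for the nearest quantized pose, if any."""
    return library.get((quantize(pitch), quantize(yaw)))

bg_id = lookup_background(1.7, 44.2)   # nearest stored pose is (0, 45)
```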

IV. Introduction to Relevant Theories

1. CamShift Algorithm

In recent years, a tracking algorithm named CamShift has received increasing attention for its excellent real-time performance and robustness. The CamShift algorithm is already widely used for face tracking in perceptual user interfaces and for semi-automatic moving-target tracking. On the one hand, CamShift is a region-based method that uses the color information within a region to track the target; on the other hand, it is a non-parametric technique that searches for the moving target by clustering.

Simply put, the CamShift algorithm uses the target's color features to locate the position and size of the moving target in a video image. In the next video image, the target's current position and size are used to initialize the search window, and the process repeats, achieving continuous tracking of the target. Because before each search the window is initialized near the area where the target is likely to appear, a great deal of search time is saved, giving CamShift good real-time performance. At the same time, CamShift finds the moving target by color matching; since the target's color information changes little as it moves, the algorithm is also robust. Because the RGB color space is sensitive to changes in illumination brightness, CamShift converts the image from RGB to HSV color space for subsequent processing, reducing the effect of illumination changes on tracking.

The flow of the CamShift algorithm is shown in Figure 3.4. First, select an initial search window so that it exactly contains the entire tracking target, then sample the H (hue) value of each pixel in the window to obtain the target's color histogram, and save this histogram as the target's color histogram model. During tracking, the probability that each pixel in the processing area of the video image is a target pixel is obtained by querying the target color histogram model; areas outside the processing region are assigned probability 0. After this processing, the video image has been converted into a target color probability distribution map, also known as the target color back-projection map. For ease of display, the projection map is converted to an 8-bit grayscale image: pixels with probability 1 are set to 255, pixels with probability 0 are set to 0, and the remaining pixels are mapped to corresponding intermediate gray values. Brighter pixels in the grayscale projection map therefore indicate pixels that are more likely to belong to the target.
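The back-projection step described above can be sketched in pure NumPy: build a hue histogram of the target, query it for each pixel, and rescale so that probability 1 maps to 255. Bin count and hue range (0..179, as in 8-bit HSV) are illustrative assumptions.

```python
# Hue-histogram back-projection producing an 8-bit probability map.
import numpy as np

def backproject(hue_img, target_hues, bins=16):
    """Convert a hue image (values 0..179) to an 8-bit probability map."""
    # Histogram model of the target, normalised so the peak bin is 1.0.
    hist, _ = np.histogram(target_hues, bins=bins, range=(0, 180))
    hist = hist.astype(np.float64)
    if hist.max() > 0:
        hist /= hist.max()
    # Query the model for every pixel and rescale probabilities to 0..255.
    idx = np.clip(hue_img // (180 // bins), 0, bins - 1)
    return (hist[idx] * 255).astype(np.uint8)

hue = np.zeros((4, 4), dtype=np.int64)
hue[1:3, 1:3] = 90                     # a patch matching the target's hue
proj = backproject(hue, target_hues=[88, 90, 92])
```

Pixels whose hue falls in the target's dominant histogram bin come out bright (255); pixels with hues the target never showed come out black.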

The part marked with a dotted line in the figure is the core of the CamShift algorithm; its goal is to find the position of the moving target in the video image. This part is the Mean Shift algorithm. Since Mean Shift is the core of CamShift, a correct understanding of Mean Shift is the key to understanding the CamShift algorithm, so we focus on it next.
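The Mean Shift iteration itself is short: repeatedly move a fixed-size window to the centroid (zeroth and first moments) of the probability map under it until it stops moving. The sketch below is a minimal pure-NumPy version with an assumed fixed window size; real CamShift additionally adapts the window size from the moments.

```python
# Mean Shift over a 2-D probability map with a fixed-size window.
import numpy as np

def mean_shift(prob, window, max_iter=20):
    """prob: 2-D probability map; window: (x, y, w, h). Returns final window."""
    x, y, w, h = window
    for _ in range(max_iter):
        roi = prob[y:y + h, x:x + w].astype(np.float64)
        m00 = roi.sum()                    # zeroth moment (total mass)
        if m00 == 0:
            break                          # nothing under the window
        ys, xs = np.mgrid[0:roi.shape[0], 0:roi.shape[1]]
        cx = (roi * xs).sum() / m00        # centroid inside the window
        cy = (roi * ys).sum() / m00
        nx = int(round(x + cx - w / 2))    # re-center window on centroid
        ny = int(round(y + cy - h / 2))
        nx = max(0, min(nx, prob.shape[1] - w))
        ny = max(0, min(ny, prob.shape[0] - h))
        if (nx, ny) == (x, y):             # converged
            break
        x, y = nx, ny
    return (x, y, w, h)

prob = np.zeros((20, 20))
prob[10:14, 12:16] = 1.0                   # the "target" blob
final = mean_shift(prob, window=(8, 6, 6, 6))
```

Starting from a window that only partially overlaps the blob, the iterations pull the window until it sits over the blob's centroid.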

2. Gaussian Mixture Model

When leaves shake in the background, they repeatedly cover a certain pixel and then move away, so the value of that pixel changes dramatically. To extract only the moving targets of interest, the shaking leaves should also be treated as background. In this situation no single-peak distribution can describe the pixel's background, because a single-mode model assumes the background is static apart from a small amount of noise and therefore cannot describe such complex backgrounds. Among existing background models with good results, some build multi-peak distribution models (such as Gaussian mixture models) for each pixel, while others predict an expected background image; the success of these algorithms lies in defining an appropriate stationarity criterion. Pixel values that satisfy this criterion are treated as background and ignored during moving-target detection. For a specific application scenario, the pixel-level stationarity criterion must be made explicit before the weaknesses and advantages of a particular algorithm can be evaluated.

For a complex, cluttered background, a single Gaussian model cannot estimate the background. Considering that the distribution of background pixel values is multi-peaked, the single-mode approach can be extended: multiple single-mode distributions together describe the changes of pixel values in a complex scene. The Gaussian mixture model uses several single Gaussian functions to describe the background of a multi-modal scene.

The basic idea of the Gaussian mixture model is to define K states for each pixel to describe its color. The value of K is generally 3-5 (depending on available memory and the speed requirements of the algorithm); the larger K is, the better the model handles fluctuation, but the longer the processing takes. Each of the K states is represented by a Gaussian function; some of these states represent the pixel values of the background, while the others represent the pixel values of the moving foreground.
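A per-pixel mixture with K states can be sketched as follows: each state keeps a weight, mean, and variance; an incoming value is matched against the states (within 2.5 sigma, a conventional choice) and the matched state is nudged toward it, while unmatched values spawn a new low-weight state. All parameter values here are illustrative, not from the original text.

```python
# Per-pixel Gaussian mixture background model with K states.
import math

class PixelGMM:
    def __init__(self, k=3, init_var=36.0, alpha=0.05):
        self.k, self.init_var, self.alpha = k, init_var, alpha
        self.states = []           # list of [weight, mean, var]

    def update(self, value):
        """Feed one gray value; return True if it matches a background state."""
        for s in self.states:
            if abs(value - s[1]) <= 2.5 * math.sqrt(s[2]):
                # Matched: raise the weight and move the state toward the value.
                s[0] += self.alpha * (1.0 - s[0])
                s[1] += self.alpha * (value - s[1])
                s[2] += self.alpha * ((value - s[1]) ** 2 - s[2])
                return s[0] > 0.5   # heavy (frequently seen) states are background
        # No match: add a new state (replacing the lightest if K is reached).
        if len(self.states) >= self.k:
            self.states.sort(key=lambda st: st[0])
            self.states.pop(0)
        self.states.append([self.alpha, float(value), self.init_var])
        return False               # a brand-new state is foreground

gmm = PixelGMM()
for _ in range(60):                # a leaf alternating with the sky at one pixel
    gmm.update(100)                # sky gray level
    gmm.update(160)                # leaf gray level
is_bg_leaf = gmm.update(160)       # both recurring values are now background
is_fg_car = gmm.update(30)         # a never-seen value is foreground
```

Note how the two alternating values each earn a heavy Gaussian state, which is exactly why shaking leaves stop triggering foreground detections after the model warms up.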

This article is from a CSDN blog; when reproducing it, please indicate the source: http://blog.csdn.net/tumblerman/archive/2009/04/03/4025627.aspx
