Intelligent Video Retrieval algorithm

Source: Internet
Author: User

Video retrieval relies on video analysis algorithms to analyze video content. Key information is extracted from the video, tagged or otherwise processed, and organized into monitoring patterns for events and alarms, so that people can quickly search the footage by various attribute descriptions. If the camera is seen as the human eye, the intelligent video surveillance system can be understood as the human brain: intelligent video technology uses the processor's computing power to analyze the huge amount of image data at high speed and extract the information people need.

Frame Difference Model


The frame difference model is arguably the simplest background model. A specified video image is taken as the background, and the current frame is compared with it. Small differences are filtered out as needed, and what remains is the foreground.
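As a rough illustration, the frame difference model can be sketched in Python with NumPy. The sketch assumes 8-bit grayscale frames of equal size. The function name `frame_difference_foreground` and the threshold of 25 are illustrative choices, not values from the original.

```python
import numpy as np

def frame_difference_foreground(background, frame, threshold=25):
    """Return a boolean foreground mask by differencing a frame against a fixed background.

    background, frame: 2-D uint8 grayscale arrays of the same shape.
    threshold: smallest absolute difference treated as foreground (assumed value).
    """
    # Cast to a signed type so the subtraction cannot wrap around in uint8.
    diff = np.abs(frame.astype(np.int16) - background.astype(np.int16))
    # Small differences (noise) are filtered out; what remains is foreground.
    return diff > threshold

background = np.full((4, 4), 100, dtype=np.uint8)
frame = background.copy()
frame[1:3, 1:3] = 200   # a bright "object" enters the scene
frame[0, 0] = 110       # a small noise change below the threshold
mask = frame_difference_foreground(background, frame)
print(mask.sum())  # 4
```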

Background Statistics Model

The background statistics model observes the background over a period of time, computes statistics from it (such as the mean, mean deviation, standard deviation, or mean-shift value), and uses those statistics as the background.
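A minimal sketch of a background statistics model, assuming grayscale frames stacked as a NumPy array. It uses the per-pixel mean and standard deviation (two of the statistics listed above) and flags pixels that deviate by more than `k` standard deviations. The function names, `k = 3`, and the `min_std` floor are assumptions for illustration.

```python
import numpy as np

def build_background_stats(frames):
    """Per-pixel mean and standard deviation over a stack of frames shaped (T, H, W)."""
    stack = np.asarray(frames, dtype=np.float64)
    return stack.mean(axis=0), stack.std(axis=0)

def detect_foreground(frame, mean, std, k=3.0, min_std=2.0):
    """Flag pixels more than k standard deviations away from the background mean.

    min_std keeps perfectly static pixels (std == 0) from flagging tiny noise.
    """
    sigma = np.maximum(std, min_std)
    return np.abs(np.asarray(frame, dtype=np.float64) - mean) > k * sigma

rng = np.random.default_rng(0)
history = 100 + rng.normal(0, 1.0, size=(50, 4, 4))   # noisy but static background
mean, std = build_background_stats(history)
frame = mean.copy()
frame[2, 2] += 40                                      # a moving object
mask = detect_foreground(frame, mean, std)
print(mask.sum())  # 1
```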

Codebook Background Model

The basic idea of the codebook is as follows: for each pixel, one or more boxes (codewords, each covering a range of values) are built along the time axis so that together they contain all of the pixel's recent variations. During detection, the current pixel is compared with its boxes; if it falls within the range of any box, it is treated as background.
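The box idea can be sketched for a single grayscale pixel as below. This is a simplified, hypothetical reduction: the published codebook algorithm also keeps color-distortion and access-frequency statistics for each codeword, which are omitted here. `PixelCodebook`, `learn_bound`, and `tolerance` are assumed names and parameters.

```python
class PixelCodebook:
    """Simplified codebook for one grayscale pixel (a hypothetical, reduced form).

    Each codeword is a box [low, high] covering intensities seen recently.
    """

    def __init__(self, learn_bound=10):
        self.learn_bound = learn_bound  # how far outside a box a sample may lie and still update it
        self.boxes = []                 # list of [low, high]

    def learn(self, value):
        for box in self.boxes:
            if box[0] - self.learn_bound <= value <= box[1] + self.learn_bound:
                # Grow the matching box so that it contains the new value.
                box[0] = min(box[0], value)
                box[1] = max(box[1], value)
                return
        # No box matched: start a new box at this value.
        self.boxes.append([value, value])

    def is_background(self, value, tolerance=5):
        # Background if the value falls inside any box (with a small tolerance).
        return any(low - tolerance <= value <= high + tolerance for low, high in self.boxes)

cb = PixelCodebook()
# A pixel that alternates between two states, e.g. a flickering light.
for v in [50, 52, 55, 51, 200, 205, 198, 53]:
    cb.learn(v)
print(cb.boxes)                                      # [[50, 55], [198, 205]]
print(cb.is_background(54), cb.is_background(120))  # True False
```

Keeping several boxes per pixel is what lets a periodically changing pixel remain "background" in both of its states.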

Gaussian Mixture Model

Gaussian mixture background modeling is one of the most successful background modeling methods.

Why? Extracting moving targets in machine vision runs into several basic problems: image jitter, noise, illumination changes, drifting clouds, shadows (both the target's own shadow and shadows cast by other objects), reflective regions (such as water surfaces and monitor screens), slow-moving targets, and so on. So how does Gaussian mixture background modeling deal with these problems?
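The core of the approach is that each pixel is modeled by several weighted Gaussians, so a pixel that regularly alternates between states (a flickering light, rippling water) can have more than one background mode. The sketch below shows a simplified Stauffer-Grimson-style update for one grayscale pixel. The number of components, learning rate `alpha`, background ratio `bg_ratio`, and variance limits are assumed values; real implementations work on whole frames and color channels.

```python
import numpy as np

class PixelGMM:
    """Gaussian mixture background model for a single grayscale pixel (simplified sketch)."""

    def __init__(self, k=3, alpha=0.05, bg_ratio=0.7, init_var=225.0, min_var=4.0):
        self.alpha, self.bg_ratio = alpha, bg_ratio
        self.init_var, self.min_var = init_var, min_var
        self.w = np.zeros(k)             # component weights
        self.mu = np.zeros(k)            # component means
        self.var = np.full(k, init_var)  # component variances

    def _background_components(self):
        # Frequent, low-variance modes (high w / sigma) rank first; the top
        # components that together cover bg_ratio of the weight form the background.
        order = np.argsort(-self.w / np.sqrt(self.var))
        b = int(np.searchsorted(np.cumsum(self.w[order]), self.bg_ratio)) + 1
        return set(order[:b].tolist())

    def update(self, x):
        """Classify value x (True = background), then fold it into the model."""
        matched = (np.abs(x - self.mu) < 2.5 * np.sqrt(self.var)) & (self.w > 0)
        if matched.any():
            score = np.where(matched, self.w / np.sqrt(self.var), -np.inf)
            idx = int(np.argmax(score))
            is_background = idx in self._background_components()
            self.mu[idx] += self.alpha * (x - self.mu[idx])
            self.var[idx] += self.alpha * ((x - self.mu[idx]) ** 2 - self.var[idx])
            self.var[idx] = max(self.var[idx], self.min_var)
        else:
            # No match: replace the weakest component with a wide Gaussian at x.
            idx = int(np.argmin(self.w))
            self.mu[idx], self.var[idx], self.w[idx] = x, self.init_var, 0.0
            is_background = False
        hit = np.zeros_like(self.w)
        hit[idx] = 1.0
        self.w = (1 - self.alpha) * self.w + self.alpha * hit
        self.w /= self.w.sum()
        return is_background

pixel = PixelGMM()
for t in range(300):
    pixel.update(100.0 if t % 2 == 0 else 160.0)  # two alternating background states
print(pixel.update(130.0))  # False: an unseen value is treated as foreground
print(pixel.update(160.0))  # True
print(pixel.update(100.0))  # True
```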

Background modeling and foreground extraction pull the target objects out of the video frames, but they extract every non-background object together. The result is a mixture that may contain many people, vehicles, animals, and other objects. Because image search ultimately compares the similarity between each object and the search target, these mixed objects must be separated by target detection and tracking.
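One simple first step toward separating the mixed foreground is to split the foreground mask into connected regions, each a candidate object. The sketch below uses 4-connected labeling with a breadth-first search. It is a stand-in for illustration, not one of the detection and tracking algorithms discussed next, and `label_components` is a hypothetical name.

```python
from collections import deque

def label_components(mask):
    """Label 4-connected foreground regions in a 2-D mask given as a list of lists.

    Returns (labels, count): labels[r][c] is 0 for background, or 1..count.
    """
    rows, cols = len(mask), len(mask[0])
    labels = [[0] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] and labels[r][c] == 0:
                # Unlabeled foreground pixel: flood-fill a new region from it.
                count += 1
                labels[r][c] = count
                queue = deque([(r, c)])
                while queue:
                    y, x = queue.popleft()
                    for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                        if 0 <= ny < rows and 0 <= nx < cols and mask[ny][nx] and labels[ny][nx] == 0:
                            labels[ny][nx] = count
                            queue.append((ny, nx))
    return labels, count

mask = [
    [1, 1, 0, 0, 0],
    [1, 0, 0, 1, 1],
    [0, 0, 0, 1, 0],
    [1, 0, 0, 0, 0],
]
labels, n = label_components(mask)
print(n)  # 3
```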

For target detection and tracking, common algorithms include the Bayesian method, the Kalman filter, and the particle filter. They are related as follows:

The Bayesian method uses known information to construct the probability density function of the system state, from which the optimal estimate of the state is obtained.

For linear Gaussian estimation problems, the posterior probability density function remains Gaussian and is fully described by its mean and variance; the Kalman filter solves this class of estimation problems well.
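A minimal constant-velocity Kalman filter for a 1-D position shows the predict/update cycle in which the Gaussian state is carried entirely by its mean `x` and covariance `P`. The motion model, the noise levels `q` and `r`, and the initial uncertainty are assumptions chosen for illustration; `kalman_track` is a hypothetical name.

```python
import numpy as np

def kalman_track(measurements, dt=1.0, q=1e-3, r=1.0):
    """Track a 1-D position with a constant-velocity Kalman filter.

    State x = [position, velocity]; only the position is measured.
    q (process noise) and r (measurement noise variance) are assumed values.
    """
    F = np.array([[1.0, dt], [0.0, 1.0]])   # state transition
    H = np.array([[1.0, 0.0]])               # measurement model
    Q = q * np.eye(2)
    R = np.array([[r]])
    x = np.array([measurements[0], 0.0])     # initial state guess
    P = np.eye(2) * 10.0                     # initial uncertainty
    estimates = []
    for z in measurements:
        # Predict: propagate the mean and covariance through the linear model.
        x = F @ x
        P = F @ P @ F.T + Q
        # Update: the Gaussian posterior is fully described by x (mean) and P (covariance).
        y = z - H @ x                        # innovation
        S = H @ P @ H.T + R
        K = P @ H.T @ np.linalg.inv(S)       # Kalman gain
        x = x + K @ y
        P = (np.eye(2) - K @ H) @ P
        estimates.append(x.copy())
    return np.array(estimates)

rng = np.random.default_rng(1)
true_pos = 2.0 * np.arange(50)               # object moving 2 units per frame
est = kalman_track(true_pos + rng.normal(0, 1.0, 50))
print(est[-1])  # position near 98, velocity near 2
```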

The particle filter, also known as sequential importance sampling, is a simulation-based (Monte Carlo) statistical filter suited to strongly nonlinear and non-Gaussian problems.

In general, the particle filter gives better results, especially when the system is nonlinear or non-Gaussian.
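For comparison, a bootstrap (sequential importance resampling) particle filter represents the state by weighted samples instead of a mean and variance, which is why it does not need the linear Gaussian assumption. This sketch tracks a 1-D position; the motion and measurement models, particle count, and noise levels are illustrative assumptions, and the models are kept simple so that the result is easy to check.

```python
import numpy as np

def particle_filter(measurements, n_particles=1000, motion_std=1.0, meas_std=1.0, seed=0):
    """Bootstrap (sequential importance resampling) particle filter for a 1-D position.

    Motion model: x_t = x_{t-1} + 1 + noise (constant drift, assumed).
    Measurement model: z_t = x_t + noise (assumed). Either model could be nonlinear;
    the filter only needs to sample from the first and evaluate the second.
    """
    rng = np.random.default_rng(seed)
    particles = rng.normal(measurements[0], 5.0, n_particles)
    estimates = []
    for z in measurements:
        # 1. Predict: push every particle through the motion model.
        particles = particles + 1.0 + rng.normal(0.0, motion_std, n_particles)
        # 2. Weight: importance weight = likelihood of the measurement.
        weights = np.exp(-0.5 * ((z - particles) / meas_std) ** 2)
        weights += 1e-300                    # avoid an all-zero weight vector
        weights /= weights.sum()
        estimates.append(np.sum(weights * particles))
        # 3. Resample: draw particles in proportion to their weights.
        idx = rng.choice(n_particles, size=n_particles, p=weights)
        particles = particles[idx]
    return np.array(estimates)

rng = np.random.default_rng(42)
true_x = np.arange(30, dtype=float)
est = particle_filter(true_x + rng.normal(0, 1.0, 30))
print(np.mean(np.abs(est[5:] - true_x[5:])))  # mean tracking error, typically well under 1
```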

Illumination processing: the same object looks different under different lighting, so its pixel data differs as well. To improve the accuracy and recall of the analysis, the target object therefore needs illumination processing. The most popular algorithm in the industry for this is intrinsic image decomposition.

Intrinsic Image Decomposition

Of the information carried by each pixel value in a camera image, the most important components are shading (luminance) and reflectance (albedo). Shading corresponds to the illumination in the environment. Reflectance corresponds to the material of the object, that is, how the object reflects incident light, and it mainly shows up as the object's color. Solving for intrinsic images means recovering the shading and reflectance of the scene at every pixel, and forming from the image a shading intrinsic image and a reflectance intrinsic image respectively.

Intrinsic image decomposition can be expressed as $I(x,y) = L(x,y)\,R(x,y)$, where $I(x,y)$ is the input image, $R(x,y)$ is the reflectance (albedo) image, and $L(x,y)$ is the shading (luminance) image. Because the logarithm turns multiplication into addition, which is easier to compute, the calculation is carried out in the log domain. Writing $i(x,y) = \log I(x,y)$, $r(x,y) = \log R(x,y)$ and $l(x,y) = \log L(x,y)$, the multiplicative relationship becomes $i(x,y) = r(x,y) + l(x,y)$. For a video sequence, the reflectance of the scene stays fixed while the illumination changes from frame to frame, so with frame index $t$ this becomes $i(x,y,t) = r(x,y) + l(x,y,t)$.
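The log-domain form makes the decomposition additive, which a 1-D sketch can exploit for a sequence. Assuming the log-reflectance $r$ is fixed over time and each frame's illumination changes (for example, a moving shadow edge) occupy only a few positions, the temporal median of the spatial derivatives of the log frames recovers the derivative of $r$, which is then integrated back. This is a simplified illustration of one known approach for image sequences, not the specific algorithm the source refers to; real methods use 2-D derivative filters and a proper reintegration step. `reflectance_from_sequence` is a hypothetical name.

```python
import numpy as np

def reflectance_from_sequence(log_frames):
    """Recover log-reflectance (up to an additive constant) from a 1-D log-image sequence.

    log_frames: array (T, N) with i_t(x) = r(x) + l_t(x).
    Assumes r is constant over time and illumination edges are sparse, so the
    temporal median of the spatial derivative isolates the reflectance derivative.
    """
    derivs = np.diff(log_frames, axis=1)                 # spatial derivative of each frame
    r_deriv = np.median(derivs, axis=0)                  # illumination edges act as outliers
    return np.concatenate([[0.0], np.cumsum(r_deriv)])   # integrate, fixing r(0) = 0

rng = np.random.default_rng(0)
r = np.log(rng.uniform(0.2, 1.0, 20))                    # log-reflectance of a 1-D scene
frames = []
for shadow_start in (3, 7, 11, 15, 18):
    l = np.zeros(20)
    l[shadow_start:] = np.log(0.5)                       # shadow edge at a different place per frame
    frames.append(r + l)                                 # log domain: i = r + l
r_hat = reflectance_from_sequence(np.array(frames))
print(np.allclose(r_hat, r - r[0]))  # True
```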

Key Frame Extraction

The volume of video collected by security surveillance is enormous. If features were extracted from every frame, and every frame were indexed in a high-dimensional index and searched, video analysis and retrieval would take a very long time. So the first step is to extract key frames from the video stream, and to run feature extraction, high-dimensional indexing, and retrieval only on those key frames, which greatly reduces the amount of computation.

Video key frame extraction selects, according to certain rules, the frames that represent the original video content. It removes most of the redundant information from the video data and keeps only the useful parts. Key frame extraction is the prerequisite for subsequent feature extraction and index building, so the quality of the algorithm directly affects the accuracy and performance of the whole video analysis.

Key Frame Extraction Methods

Key frame extraction methods fall into two main categories: methods based on the full (decoded) image sequence and methods based on compressed video.

Most current key frame extraction research works on the full image sequence. The approaches differ mainly in the detection method applied, the choice of features, and how each frame is partitioned into sub-blocks. They can be divided into the following categories:

Method Based on Shot Boundaries

This method divides the video stream into a number of shots and takes the first, middle, and last frames of each shot as key frames. It is simple and works well for shots whose content changes little or not at all. However, it ignores the complexity of each shot's visual content, the number of key frames per shot is fixed, and the extracted key frames are often neither representative nor stable.
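The text does not say how shot boundaries are found. A common simple choice, assumed here, is to start a new shot when the intensity histograms of consecutive frames differ by more than a threshold. The sketch then takes the first, middle, and last frame of each shot. Function names, bin count, and threshold are illustrative.

```python
import numpy as np

def gray_histogram(frame, bins=16):
    """Normalized intensity histogram of an 8-bit grayscale frame."""
    hist, _ = np.histogram(frame, bins=bins, range=(0, 256))
    return hist / hist.sum()

def split_into_shots(frames, threshold=0.5):
    """Start a new shot whenever consecutive histograms differ by more than threshold (L1 distance)."""
    shots, start = [], 0
    for i in range(1, len(frames)):
        if np.abs(gray_histogram(frames[i]) - gray_histogram(frames[i - 1])).sum() > threshold:
            shots.append((start, i - 1))
            start = i
    shots.append((start, len(frames) - 1))
    return shots

def shot_keyframes(shots):
    """First, middle and last frame index of each shot (duplicates merged for short shots)."""
    return [sorted({s, (s + e) // 2, e}) for s, e in shots]

dark = np.full((8, 8), 30, dtype=np.uint8)
bright = np.full((8, 8), 220, dtype=np.uint8)
frames = [dark] * 5 + [bright] * 4
shots = split_into_shots(frames)
print(shots)                  # [(0, 4), (5, 8)]
print(shot_keyframes(shots))  # [[0, 2, 4], [5, 6, 8]]
```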

Method Based on Content Analysis

This method extracts key frames from changes in visual information such as the color and texture of each frame. The classic examples are the frame averaging method and the histogram averaging method. The frame averaging method computes, for each pixel position, the average pixel value over all frames in the shot, and takes as the key frame the frame whose pixel values are closest to these averages. The histogram averaging method averages the statistical histograms of all frames in the shot and selects the frame whose histogram is closest to this average histogram as the key frame.
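Both averaging methods reduce to picking the frame nearest to an average. The sketch below assumes the grayscale frames of one shot as NumPy arrays, Euclidean distance for raw pixels, and L1 distance for normalized histograms; these distance choices and the function names are assumptions, since the text does not specify them.

```python
import numpy as np

def frame_average_keyframe(frames):
    """Index of the frame whose pixels are closest (Euclidean) to the per-pixel shot average."""
    stack = np.asarray(frames, dtype=np.float64)
    mean_frame = stack.mean(axis=0)
    dists = np.linalg.norm((stack - mean_frame).reshape(len(stack), -1), axis=1)
    return int(np.argmin(dists))

def histogram_average_keyframe(frames, bins=16):
    """Index of the frame whose histogram is closest (L1) to the average shot histogram."""
    hists = np.array([np.histogram(f, bins=bins, range=(0, 256))[0] for f in frames], dtype=np.float64)
    hists /= hists.sum(axis=1, keepdims=True)
    mean_hist = hists.mean(axis=0)
    return int(np.argmin(np.abs(hists - mean_hist).sum(axis=1)))

shot = [np.full((4, 4), v, dtype=np.uint8) for v in (10, 60, 100, 140, 250)]
print(frame_average_keyframe(shot))  # 2 (value 100 is closest to the mean, 112)

mixed = np.full((4, 4), 10, dtype=np.uint8)
mixed[:, 2:] = 200                   # half dark, half bright
shot2 = [np.full((4, 4), 10, dtype=np.uint8), mixed, np.full((4, 4), 200, dtype=np.uint8)]
print(histogram_average_keyframe(shot2))  # 1
```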

Method Based on Motion Analysis

This method extracts key frames based on motion information. A representative algorithm is the motion minimization algorithm proposed by Wolf, which uses optical flow analysis to compute the amount of motion in the shot and selects key frames at local minima of that motion.
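The idea of selecting frames at local motion minima can be sketched as follows. As a labeled substitution, the motion amount here is the mean absolute difference between consecutive frames, not the optical-flow magnitude used by the method described above. Both function names are hypothetical.

```python
import numpy as np

def motion_curve(frames):
    """Per-frame motion amount: mean absolute difference to the previous frame.

    Stand-in for the optical-flow magnitude used by the original method.
    """
    stack = np.asarray(frames, dtype=np.float64)
    diffs = np.abs(np.diff(stack, axis=0)).mean(axis=(1, 2))
    return np.concatenate([[diffs[0]], diffs])  # frame 0 reuses the first difference

def local_minima(values):
    """Indices i where values[i] is strictly lower than both neighbours."""
    return [i for i in range(1, len(values) - 1)
            if values[i] < values[i - 1] and values[i] < values[i + 1]]

frames = [np.full((2, 2), v, dtype=np.uint8) for v in (0, 10, 12, 30, 31, 60)]
curve = motion_curve(frames)   # [10, 10, 2, 18, 1, 29]
print(local_minima(curve))     # [2, 4]: candidate key frames
```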

Method Based on Clustering

This is currently the mainstream key frame extraction technique. The basic idea is as follows: first choose an initial cluster center; then, based on the distance between the current frame and each cluster center, decide whether the current frame joins an existing cluster or becomes the center of a new one. After all frames in the shot have been clustered, the frame closest to each cluster center is taken as a key frame.
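A minimal sketch of the sequential clustering procedure described above. It assumes each frame is already represented by a feature vector (for example a color histogram), that Euclidean distance compared against a fixed `threshold` decides cluster membership, and that a cluster center is the mean of its members. These choices and the name `cluster_keyframes` are assumptions for illustration.

```python
import numpy as np

def cluster_keyframes(features, threshold):
    """Sequentially cluster per-frame feature vectors, then pick one key frame per cluster.

    features: (T, D) array-like; threshold: assumed distance for joining a cluster.
    """
    features = np.asarray(features, dtype=np.float64)
    centers, members = [features[0].copy()], [[0]]   # the first frame seeds the first cluster
    for i in range(1, len(features)):
        dists = [np.linalg.norm(features[i] - c) for c in centers]
        k = int(np.argmin(dists))
        if dists[k] < threshold:
            members[k].append(i)
            centers[k] = features[members[k]].mean(axis=0)  # move the center to the cluster mean
        else:
            centers.append(features[i].copy())              # the frame starts a new cluster
            members.append([i])
    # Key frame = member frame closest to its cluster center.
    return [min(m, key=lambda j: np.linalg.norm(features[j] - c)) for c, m in zip(centers, members)]

features = [[0, 0], [0.1, 0], [0.2, 0], [5, 5], [5.2, 5], [5.1, 5]]
print(cluster_keyframes(features, threshold=1.0))  # [1, 5]
```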

Method Based on Compressed Video

All of the above methods work on the full image sequence: before key frames can be extracted, the video must be decompressed and restored to individual frame images, which requires a large amount of computation. Compressed-domain methods extract key frames directly from the MPEG compressed video stream, with no decompression or only partial decompression, which reduces computational complexity.

In industry, clustering-based methods and compressed-domain methods are the most commonly used for key frame extraction.

Once key frames have been extracted, features are extracted from them, mainly color, texture, and shape features. After feature extraction, a high-dimensional index is built over the target features to speed up retrieval. Finally, the index is searched for the target image and the search results are returned.
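The feature, index, and search pipeline can be sketched end to end with a color-histogram feature and an exhaustive nearest-neighbor search. The brute-force index is a deliberate simplification: it gives exact results but scans every entry, whereas a real high-dimensional index is built to avoid that scan. All class, function, and item names are hypothetical.

```python
import numpy as np

def color_histogram(image, bins=8):
    """Concatenated per-channel histogram of an (H, W, 3) uint8 image, L1-normalized."""
    feats = [np.histogram(image[..., ch], bins=bins, range=(0, 256))[0] for ch in range(3)]
    vec = np.concatenate(feats).astype(np.float64)
    return vec / vec.sum()

class BruteForceIndex:
    """Exhaustive nearest-neighbor search over stored feature vectors (stand-in for a real index)."""

    def __init__(self):
        self.ids, self.vectors = [], []

    def add(self, item_id, vector):
        self.ids.append(item_id)
        self.vectors.append(vector)

    def search(self, query, k=1):
        # Euclidean distance from the query to every stored vector, smallest first.
        dists = np.linalg.norm(np.array(self.vectors) - query, axis=1)
        order = np.argsort(dists)[:k]
        return [(self.ids[i], float(dists[i])) for i in order]

def solid(rgb):
    """A 4x4 image filled with one RGB color."""
    return np.broadcast_to(np.array(rgb, dtype=np.uint8), (4, 4, 3))

index = BruteForceIndex()
index.add("red_car_frame", color_histogram(solid((200, 20, 20))))
index.add("blue_car_frame", color_histogram(solid((20, 20, 200))))
print(index.search(color_histogram(solid((210, 30, 25))))[0][0])  # red_car_frame
```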
