Keywords: motion estimation and compensation, motion analysis, video encoding
Today an idea suddenly struck me. My research direction is computer vision, but over the past year I have worked only on understanding and analyzing static scenes: processing images with OpenCV and point clouds with PCL. I have never seriously dealt with dynamic content, that is, taking the material to be processed directly from video, for tasks such as target detection and tracking, motion history, and trajectory prediction. In fact, if you classify what OpenCV processes by how the content is acquired, there are two big chunks: pictures and video. OpenCV ships many official samples for video processing, whereas PCL has very few. On reflection that makes sense: if a video is made up of a sequence of point clouds, what kind of video is that? (Real-time map reconstruction in SLAM should be such a case.) So in the next stage I will make good use of OpenCV to process video and learn to perceive video. Two further points. First, when processing pictures I have wanted to bring the time dimension into the processing, and since video carries its own time scale it should be a good place to realize that idea. Second, on the design pattern: most methods train offline and test online (see the end of this article), but some applications have real-time requirements, such as real-time reconstruction and real-time display of results. I hope to meet such requirements; we will wait and see ...
For video tracking, see: http://en.wikipedia.org/wiki/Video_tracking
A small overseas site dedicated to visual tracking: http://www.robots.ox.ac.uk/~gk/
Blogger Simple Life ff's posts on the visual tracking literature: http://blog.csdn.net/gxf1027/article/category/858368
A blog post worth studying: http://blog.csdn.net/xiangjiantui/article/details/7871152
(i) Online learning for visual tracking (from http://blog.sina.com.cn/s/blog_5138ac890100geha.html)
Goal:
Online learning is applied in visual tracking mainly to cope with changes in the target's appearance during tracking. The reasons for these changes fall into two categories [1]: first, changes in the target itself, such as pose and shape deformation; second, changes caused by the environment, including illumination changes, camera motion and viewpoint changes, and occlusion. The traditional off-line learning approach trains a classifier on a large number of samples, then detects and tracks the target online. This places high demands on the training samples, which must cover the states the target may take in a variety of situations. Even with such samples, it may be difficult to find good features for detection or classification, because the samples vary so much. Online learning methods instead update the detector or classifier while tracking, so that it adapts to the current target, in an attempt to overcome the difficulties that off-line learning runs into during tracking.
Paper review (to be updated as new work appears):
R. T. Collins and others proposed a method for online selection of tracking features [2]. Their core point is that tracking quality depends on how well the tracked target can be distinguished from the surrounding background. If the target is highly distinguishable, even a simple tracker can solve the problem. They use {w1*R + w2*G + w3*B}, with each w ∈ {-2, -1, 0, 1, 2}, as the feature library. Because the target and the background have different distributions over these features, every pixel can be scored by how target-like or background-like it is, and the mean-shift method is then used to find the target's location and scale. Only the N most discriminative features are used to detect the target at each step. The selection criterion is whether a feature gives the target and background distributions a large variance-based separation, similar to the Fisher discriminant. To prevent model drift during tracking, they build the target and background distributions [p(feature_i | object), p(feature_i | background)] from the first frame together with the most recent frames.
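The feature library and the variance-based selection criterion can be sketched in a few lines. This is only an illustration of the idea as described above, not the code from [2]. The pruning of weight triples that are scalar multiples of each other, the histogram inputs, and the `delta` and `eps` constants are my own assumptions; the function names are hypothetical.

```python
from itertools import product
from math import gcd, log


def feature_pool():
    """Enumerate (w1, w2, w3) with each w in {-2..2}, skipping redundant triples.

    Assumption: (0, 0, 0) is useless, and a triple that is a scalar multiple
    of another one (including its negation) separates the classes equally
    well, so only one representative is kept.
    """
    pool = set()
    for w in product(range(-2, 3), repeat=3):
        if w == (0, 0, 0):
            continue
        g = gcd(gcd(abs(w[0]), abs(w[1])), abs(w[2]))
        w = tuple(c // g for c in w)
        # Flip the sign so that the first non-zero weight is positive.
        if next(c for c in w if c != 0) < 0:
            w = tuple(-c for c in w)
        pool.add(w)
    return sorted(pool)


def feature_value(w, rgb):
    """Linear colour feature w1*R + w2*G + w3*B for one pixel."""
    return w[0] * rgb[0] + w[1] * rgb[1] + w[2] * rgb[2]


def variance_ratio(p, q, delta=1e-3, eps=1e-12):
    """Score one feature from its object (p) and background (q) histograms.

    L is the per-bin log-likelihood ratio. The score is high when L spreads
    widely over both classes together but is tightly clustered within each
    class, which is the Fisher-like behaviour described in the text.
    """
    L = [log(max(pi, delta) / max(qi, delta)) for pi, qi in zip(p, q)]

    def var(a):
        mean = sum(ai * li for ai, li in zip(a, L))
        return sum(ai * li * li for ai, li in zip(a, L)) - mean * mean

    mix = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return var(mix) / (var(p) + var(q) + eps)
```

A tracker along these lines would compute `variance_ratio` for every triple in `feature_pool()` on the current frame, keep the N highest-scoring features, and run mean-shift on the resulting per-pixel likelihood maps.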
H. Grabner and others [3] proposed the on-line boosting method. It differs from off-line boosting as follows: off-line boosting uses all samples at once to select K weak classifiers, whereas on-line boosting updates all classifiers each time a single sample arrives. The classifiers are organized into K levels, and on each level the best classifier is chosen (the criterion is the classification error rate; the classifier with the lowest error rate is selected). The drawback of on-line boosting is that you cannot specify a required detection rate and error rate and have the number of classifiers chosen automatically; the number is fixed in advance, as a combination of K weak classifiers (or fewer than K). In [4], a boosting method based on multiple instance learning is given. Its main concern is that if the tracked location is inaccurate, the sample taken from it will be inaccurate too, and updating the classifier with that sample will degrade the classifier, a vicious cycle that ends with the target being lost. Multiple instance learning instead learns from several samples at once, under the assumption that as long as one of them is a positive sample, the correct result can be learned. Therefore, even if the detected target position is offset, selecting N samples around it means at least one of them covers the target accurately. These samples are learned together so that the classifier update does not go wrong.
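The multiple-instance idea above can be illustrated with a small sketch. It assumes integer pixel positions, a circular sampling radius, and a noisy-OR rule for combining per-sample probabilities into a bag probability, which is how I understand the model in [4]. The function names are hypothetical and the per-sample classifier is left out.

```python
from math import prod


def positive_bag(center, radius):
    """Positions within `radius` pixels of the tracked location.

    All patches cropped at these positions go into ONE positive bag, so a
    slightly offset tracker output still leaves at least one well-aligned
    sample in the bag.
    """
    cx, cy = center
    return [
        (cx + dx, cy + dy)
        for dy in range(-radius, radius + 1)
        for dx in range(-radius, radius + 1)
        if dx * dx + dy * dy <= radius * radius
    ]


def bag_probability(instance_probs):
    """Noisy-OR: the bag is positive if at least one instance is positive."""
    return 1.0 - prod(1.0 - p for p in instance_probs)
```

With this rule a bag whose samples score 0.1 and 0.9 is still rated strongly positive (about 0.91), so one good sample is enough to drive the update.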
I have not yet read the incremental learning method proposed in [1] closely, but its starting point is likewise to use online samples to keep updating the model. The target is tracked with a particle filter.
Challenge [5]:
Visual object tracking is one of the classic tasks in computer vision. However, in the general case of tracking any object, tracking remains challenging. The proposed ETHZ tracker evaluation framework focuses on a well-defined sub-problem in visual tracking:
(1) Single object tracking
(2) Model-free tracking (i.e., only the initial position of the object is known)
An ideal model-free tracking algorithm should be able to track the object of interest accurately, in spite of distraction (e.g., occlusion), while not ending in an unreliable state (e.g., tracking a different object).
Summarizing, the challenge is to track any object which might undergo various appearance changes by using as little prior information as possible. The method should be robust in the presence of partial and full occlusions, changes in illumination and background clutter. (In short: minimal prior information and robustness to occlusion, illumination changes, and background clutter, and of course to non-rigid objects and deformation.)
References:
[1]. D. Ross, J. Lim, R.-S. Lin, and M.-H. Yang, Incremental Learning for Robust Visual Tracking, IJCV, vol. 77, pp. 125-141, 2008.
[2]. R. T. Collins, Y. Liu, and M. Leordeanu, Online Selection of Discriminative Tracking Features.
[3]. H. Grabner and H. Bischof, On-line Boosting and Vision, CVPR 2006.
[4]. B. Babenko, M.-H. Yang, and S. Belongie, Visual Tracking with Online Multiple Instance Learning, CVPR 2009.
[5]. http://www.vision.ee.ethz.ch/trackerEvaluation/
(ii) Motion estimation and compensation
The residual block data is transformed by DCT, quantized, and entropy coded into a bit string for storage or transmission. Many of these steps help improve compressibility. For example, the DCT converts the block to the frequency domain and so provides what the subsequent quantization step needs: the human eye is sensitive to the low-frequency content of a picture and less sensitive to the high-frequency content (please refer to the HVS literature for this knowledge). But all of these steps work on whatever data they are given. How can the data itself be made smaller first? From what I have learned so far, there are only two ways: intra-frame prediction and inter-frame prediction (prediction across frames). Intra-frame prediction uses already decoded units to predict the next unit (the unit here is a block), while inter-frame prediction relies on the correlation between pictures. For example, the previous frame and the current frame are very similar, and we can use this relationship to predict the current picture. The following figure shows intra-frame and inter-frame prediction.
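The transform and quantization steps mentioned above can be sketched as follows. This is a plain orthonormal DCT-II written out by hand so that it needs no libraries. The quantization step that grows with frequency (`base_step * (1 + u + v)`) is a made-up rule for illustration and is not the table of any real codec; entropy coding is left out.

```python
from math import cos, pi, sqrt


def dct_1d(x):
    """Orthonormal DCT-II of a 1-D sequence."""
    n = len(x)
    out = []
    for k in range(n):
        s = sum(x[i] * cos(pi * (i + 0.5) * k / n) for i in range(n))
        out.append(s * (sqrt(1 / n) if k == 0 else sqrt(2 / n)))
    return out


def dct_2d(block):
    """Separable 2-D DCT: transform the rows, then the columns."""
    rows = [dct_1d(r) for r in block]
    cols = [dct_1d(list(c)) for c in zip(*rows)]
    return [list(r) for r in zip(*cols)]  # back to [v][u] layout


def quantize(coeffs, base_step=4):
    """Coarser steps for higher frequencies (hypothetical step rule).

    Small high-frequency coefficients round to zero, which is where most
    of the size reduction comes from before entropy coding.
    """
    return [
        [round(c / (base_step * (1 + u + v))) for u, c in enumerate(row)]
        for v, row in enumerate(coeffs)
    ]
```

For a perfectly flat 8x8 block all the energy ends up in the single DC coefficient at position [0][0], and every other quantized coefficient is zero, which is the best case for the entropy coder.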
This section describes inter-frame prediction and its methods. Inter-frame prediction can greatly improve the compression rate in the first phase of picture compression. However, its computational complexity is very high; some articles I have read put it at 4-50% of the complexity of the entire encoding process. So it can be said that compression capability depends on the computational complexity of inter-frame prediction. The factors that determine video encoding capability are the following.
1. Coding capability: How is the residual data found?
2. Complexity: Does the algorithm take full advantage of the data already encoded?
3. Storage or delay: Is there clock latency in the hardware, or data latency in the software?
4. Side information: Do the vectors produced by inter-frame prediction have to be encoded and sent? How are they transmitted?
In addition to the questions I have listed, there are many other questions that need to be addressed in video encoding. One of the most direct is the trade-off in compression theory: you can compress to a small size at the cost of a large amount of computation, while some methods, such as lossless compression, may achieve only a small amount of compression. (There are two kinds of compression: lossy and lossless.)
What are the requirements for inter-frame prediction?
Inter-frame prediction produces residual data relative to an already encoded frame (the reference frame). This frame can be earlier in time, or a picture that will be displayed later. The design goal is to make the prediction highly accurate, so that the residual left after prediction is small; ideally the prediction matches the current data exactly. How close you get depends on how much computation you spend. When a picture is inter-predicted, the data of the current frame minus the reference frame is passed through several further steps to become the final data. At the same time, the encoder also performs the decoder's work, and the decoded result is used for later prediction, because the decoder has no raw data and can only rebuild pictures from decoded data. The best compression rate depends on the size of the residual blocks.
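The search for a good prediction in the reference frame can be sketched with full-search block matching, the simplest (and most expensive) way to do it. The SAD cost, the 4x4 block size, the search range of 3 pixels, and the synthetic test frames below are my own choices for illustration. Real encoders use faster search patterns and sub-pixel refinement.

```python
def sad(cur, ref, bx, by, dx, dy, n):
    """Sum of absolute differences between a block in the current frame
    and the block displaced by (dx, dy) in the reference frame."""
    return sum(
        abs(cur[by + j][bx + i] - ref[by + dy + j][bx + dx + i])
        for j in range(n)
        for i in range(n)
    )


def full_search(cur, ref, bx, by, n=4, search=3):
    """Try every displacement in [-search, search]^2 and keep the cheapest."""
    h, w = len(ref), len(ref[0])
    best_mv, best_cost = (0, 0), sad(cur, ref, bx, by, 0, 0, n)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            inside = (0 <= bx + dx and bx + dx + n <= w
                      and 0 <= by + dy and by + dy + n <= h)
            if not inside:
                continue  # candidate block would leave the reference frame
            cost = sad(cur, ref, bx, by, dx, dy, n)
            if cost < best_cost:
                best_mv, best_cost = (dx, dy), cost
    return best_mv, best_cost


def residual(cur, ref, bx, by, mv, n=4):
    """Current block minus its motion-compensated prediction."""
    dx, dy = mv
    return [
        [cur[by + j][bx + i] - ref[by + dy + j][bx + dx + i] for i in range(n)]
        for j in range(n)
    ]


# Synthetic frames: the current frame is the reference frame with its
# content displaced by 2 pixels in x and 1 pixel in y (edges clamped).
ref = [[(7 * x + 13 * y) % 251 for x in range(16)] for y in range(16)]
cur = [[ref[min(y + 1, 15)][min(x + 2, 15)] for x in range(16)] for y in range(16)]
mv, cost = full_search(cur, ref, bx=4, by=4)
res = residual(cur, ref, 4, 4, mv)
```

The two nested loops over (dx, dy) are why the complexity is so high: the cost grows with the square of the search range for every block. When the match is exact the residual block is all zeros, and only the motion vector needs to be transmitted.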
Finally, here is the processing block diagram from some of the papers, which is mainly a train-test pipeline:
Visual tracking------Object Tracking && Video tracking