Video Data Preprocessing
Video data preprocessing can be divided into three steps: video shot segmentation, key frame extraction, and feature extraction.
1. Video Shot Segmentation (Shot Boundary Detection)
Shot segmentation is the first step in video processing and the basis for all subsequent processing and analysis. Variation of video features within a single shot is caused mainly by two factors: the motion of objects or the camera, and changes in lighting. There are two main types of transition between shots: cut transitions and gradual transitions.
(1) Pixel Difference Method
First, define a pixel difference measure; then compute the inter-frame difference between two consecutive frames and compare it with a preset threshold. If the difference exceeds the threshold, the scene is considered to have changed.
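The following is a minimal Python sketch of this method, assuming OpenCV and NumPy are available; the mean absolute gray-level difference stands in for the pixel difference measure, and the threshold is an illustrative value that would need tuning per video.

import cv2
import numpy as np

def detect_cuts_pixel_diff(path, threshold=30.0):
    """Flag a shot boundary when the mean absolute pixel difference
    between consecutive frames exceeds a preset threshold."""
    cap = cv2.VideoCapture(path)
    boundaries, prev, idx = [], None, 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY).astype(np.float32)
        if prev is not None:
            diff = np.mean(np.abs(gray - prev))  # inter-frame difference measure
            if diff > threshold:                 # scene considered to have changed
                boundaries.append(idx)
        prev, idx = gray, idx + 1
    cap.release()
    return boundaries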
(2) Histogram-Based Method
The histogram-based algorithm is the most common shot segmentation method. It is simple to compute and achieves good results for most videos. The method quantizes the gray level, brightness, or color of the pixels of adjacent frames into N bins, counts the number of pixels falling into each bin, and compares the resulting histograms. Because it only gathers statistics on the overall gray or color distribution, it tolerates object motion and slow camera motion within a shot; it tends to produce missed or false detections only when the shot content changes rapidly or during gradual transitions.
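A corresponding sketch of the histogram comparison, under the same assumptions (OpenCV and NumPy; the bin count and threshold are illustrative):

import cv2
import numpy as np

def detect_cuts_histogram(path, n_bins=64, threshold=0.4):
    """Compare gray-level histograms of adjacent frames; a large
    normalized histogram distance suggests a shot boundary."""
    cap = cv2.VideoCapture(path)
    boundaries, prev_hist, idx = [], None, 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        hist = cv2.calcHist([gray], [0], None, [n_bins], [0, 256]).ravel()
        hist /= hist.sum() + 1e-9          # normalize so frame size does not matter
        if prev_hist is not None:
            d = np.abs(hist - prev_hist).sum()  # L1 distance between distributions
            if d > threshold:
                boundaries.append(idx)
        prev_hist, idx = hist, idx + 1
    cap.release()
    return boundaries

Because the histogram discards pixel positions, moderate object or camera motion barely changes it, which is exactly the tolerance described above.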
(3) Block Matching
In the block matching method, each frame is first divided into small blocks, and the similarity between consecutive frames is determined by comparing the corresponding blocks. Using local image features in this way suppresses the influence of noise and of camera and object motion.
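A NumPy sketch of the per-block comparison (the block size and per-block threshold are illustrative assumptions):

import numpy as np

def block_difference(frame_a, frame_b, block=16, block_threshold=25.0):
    """Split two grayscale frames into block x block tiles and return the
    fraction of corresponding tiles that differ strongly; a high fraction
    suggests a shot boundary, while isolated noisy tiles are outvoted."""
    h, w = frame_a.shape
    changed, total = 0, 0
    for y in range(0, h - block + 1, block):
        for x in range(0, w - block + 1, block):
            a = frame_a[y:y + block, x:x + block].astype(np.float32)
            b = frame_b[y:y + block, x:x + block].astype(np.float32)
            if np.abs(a - b).mean() > block_threshold:
                changed += 1
            total += 1
    return changed / max(total, 1)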
(4) Motion-Based Method
The motion-based algorithm fully considers the motion of objects and the camera within a shot, and uses motion compensation and related techniques to reduce the intra-shot frame differences caused by object and camera movement.
(5) Contour-Based Method
For simple videos, the contour-based algorithm works well, and it is especially effective at detecting gradual transitions. However, the main objects or backgrounds in most videos can have many complicated, subtle, or constantly changing contours, which interfere with the judgment of shot boundaries and cause false detections. Conversely, when the lighting is dim and contours are indistinct (for example at dusk or in fog), contours are hard to detect and missed detections can occur.
2. Key Frame Extraction
2.1 Key Frame Meaning
A key frame is one or more of the most important and representative images in a shot. Depending on the complexity of the shot content, one or more key frames can be extracted from a single shot. A selected key frame should contain the main information of the current shot, and it should not be so complex that it becomes hard to process.
2.2 Typical Key Frame Extraction Techniques
2.2.1 First/Last Frame Method and Middle Frame Method
The first/last frame method uses the first and last images of a shot as key frames, while the middle frame method selects the temporally central image as the key frame. The disadvantage of both is that they fix the number of key frames per shot and therefore cannot accurately represent the shot's content.
2.2.2 Color, Texture, and Shape Features
(1) Color Feature Extraction
Color is a major physical feature of an image, and few distinct objects share exactly the same color features. Color features include color histograms, dominant colors, and average brightness. Color-based retrieval relies mainly on the color histogram, which represents the frequency distribution of colors in an image; it is essentially a statistical description of the color distribution. To describe how the key frames of a shot change, two further content descriptors can be introduced: the dominant color histogram and the spatial structure histogram. A dominant color is a color that occupies a relatively large proportion of an image; the dominant color histogram captures the longest-lasting colors, which are typically the main colors of the objects or backgrounds of interest in the clip. The spatial structure histogram is a set of features describing the spatial distribution of the image; it reflects the average brightness of the image along each axis of the color space. Together these support an adaptive key frame structure, based on motion detection, that comprehensively represents the content changes of the shot.
Simply put, the current frame is compared with the most recent key frame; if enough features have changed, it is taken as a new key frame. Different video shots therefore yield different numbers of key frames.
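A minimal sketch of this comparison loop, assuming OpenCV; the hue/saturation histogram and the drift threshold are illustrative choices:

import cv2
import numpy as np

def extract_key_frames(path, threshold=0.5):
    """Keep the first frame as a key frame, then declare a new key frame
    whenever the color histogram drifts far from the last key frame's."""
    cap = cv2.VideoCapture(path)
    key_frames, ref_hist, idx = [], None, 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
        hist = cv2.calcHist([hsv], [0, 1], None, [16, 16], [0, 180, 0, 256]).ravel()
        hist /= hist.sum() + 1e-9
        if ref_hist is None or np.abs(hist - ref_hist).sum() > threshold:
            key_frames.append(idx)   # enough features changed: new key frame
            ref_hist = hist          # future frames are compared to this one
        idx += 1
    cap.release()
    return key_frames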
(2) Texture Feature Extraction
Texture is a pattern that is locally irregular but macroscopically regular in an image. Texture features include coarseness, directionality, and contrast. Texture features can be extracted with a gray-level co-occurrence matrix (GLCM). If the gray levels of an image are quantized to N levels, the co-occurrence matrix is N x N and can be written as M_delta(i, j). From the co-occurrence matrix, four statistics that characterize texture are selected as the feature vector: contrast, uniformity, gray-level correlation, and entropy. These four statistics are extracted in four directions (0°, 45°, 90°, and 135°) to form a 16-dimensional feature vector.
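A sketch of this 16-dimensional vector using scikit-image, where "uniformity" is taken to be the angular second moment (ASM) and entropy is computed by hand, since graycoprops does not provide it:

import numpy as np
from skimage.feature import graycomatrix, graycoprops

def glcm_texture_vector(gray, levels=16):
    # Quantize gray levels to N = levels, so the co-occurrence matrix is N x N.
    img = (gray.astype(np.float32) / 256.0 * levels).astype(np.uint8)
    angles = [0, np.pi / 4, np.pi / 2, 3 * np.pi / 4]   # 0, 45, 90, 135 degrees
    glcm = graycomatrix(img, distances=[1], angles=angles,
                        levels=levels, symmetric=True, normed=True)
    feats = []
    for prop in ("contrast", "ASM", "correlation"):
        feats.extend(graycoprops(glcm, prop)[0])        # one value per direction
    for k in range(len(angles)):                        # entropy per direction
        p = glcm[:, :, 0, k]
        feats.append(-np.sum(p * np.log2(p + 1e-12)))
    return np.array(feats)                              # 4 stats x 4 directions = 16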
(3) Shape Feature Extraction
Contour shape is a main feature of an image, and shape feature extraction relies on edge detection. Edge detection here mainly uses moments: moments describe shape compactly and are fast to compute. In this shape feature extraction algorithm, the moments of each quantized color are calculated. Compared with image segmentation methods, the algorithm is more robust and simpler. Among the moments, the zeroth- and first-order moments are selected as the spatial features of the image.
2.2.3 Motion Analysis
Significant motion caused by camera movement is also an important cue for extracting key frames. If the camera's focal length changes (a zoom), the first and last frames are taken as key frames. If the camera angle changes and the overlap with the previous key frame falls below 30%, the current frame is taken as a key frame.
2.2.4 Clustering-Based Method
For a large image database, a clustering algorithm is used to group the images, which greatly reduces the computation required for key frame extraction. This method is computationally efficient and effectively captures the salient visual content of a video shot: only a few key frames are extracted for low-activity shots, while more key frames are extracted for high-activity shots.
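A minimal clustering sketch, assuming scikit-learn: frames are described by per-frame feature vectors (for example, the histograms above), grouped with k-means, and the frame nearest each cluster center is kept as a key frame; in practice k would be adapted to the shot's activity, as described.

import numpy as np
from sklearn.cluster import KMeans

def cluster_key_frames(features, k=3):
    """features: (n_frames, n_dims) array of per-frame descriptors."""
    k = min(k, len(features))
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(features)
    key_idx = []
    for c in range(k):
        members = np.where(km.labels_ == c)[0]
        # The member frame nearest the centroid represents the cluster.
        d = np.linalg.norm(features[members] - km.cluster_centers_[c], axis=1)
        key_idx.append(int(members[np.argmin(d)]))
    return sorted(key_idx)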
3. Video Feature Extraction
The basic features of a video can be classified into static and dynamic features.
3.1 Static Features
Static features are mainly the image features of key frames, and the extraction methods are the same as for ordinary still images. Static features include color features, texture features, and shape features.
3.1.1 Color Features
(1) Advantages of color features: color features have many advantages. They are simple to compute and stable, and they are insensitive to transformations such as rotation, translation, and scaling, which makes them very robust.
(2) Color space: color is usually defined in a three-dimensional color space, such as RGB (red, green, blue), HSV (hue, saturation, value), or HSB (hue, saturation, brightness). The most commonly used color spaces are RGB, HSV, LUV, and YCrCb. The structure of the RGB space does not match people's subjective judgment of color similarity, while the HSV color space is closer to people's subjective perception of color. RGB can be converted to HSV (in Matlab, the built-in rgb2hsv function can be used directly).
(3) Color histogram: the main representation of image color information is the color histogram, which is invariant to scale and rotation. In a color histogram, the x-axis enumerates the (quantized) colors present in the image and the y-axis gives the number of pixels of each color. Color histograms describe the proportions of different colors in the entire image, regardless of the spatial distribution of those colors.
Distance measurement: the distance between two colors can be measured in different ways. For example, a common measure in HSV space maps each color (h, s, v) to the point (v, s cos h, s sin h) and takes the distance

d(q, p) = sqrt( (v_q - v_p)^2 + (s_q cos h_q - s_p cos h_p)^2 + (s_q sin h_q - s_p sin h_p)^2 ).

This similarity measure is equivalent to the Euclidean distance in a cylindrical color space.
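In code, the measure is a short NumPy function; h is assumed to be in radians and s, v normalized to [0, 1]:

import numpy as np

def hsv_distance(c1, c2):
    """Euclidean distance between two HSV colors after mapping each
    (h, s, v) to the cylindrical point (v, s*cos h, s*sin h)."""
    h1, s1, v1 = c1
    h2, s2, v2 = c2
    return np.sqrt((v1 - v2) ** 2
                   + (s1 * np.cos(h1) - s2 * np.cos(h2)) ** 2
                   + (s1 * np.sin(h1) - s2 * np.sin(h2)) ** 2)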
(4) Color moments: any color distribution in an image can be characterized by its moments, and the color distribution information is concentrated mainly in the lower-order moments. Compared with color histograms, this method does not require quantizing the color features into a histogram.
(5) Color set: an approximation of the color histogram method. It expresses the image as a binary color index set, which supports fast retrieval in large-scale image libraries.
(6) Color coherence vector: an evolution of the color histogram. Its core idea is to divide the pixels belonging to each histogram bin into two parts, coherent pixels and incoherent pixels, so that the descriptor also contains spatial information about the color distribution.
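A compact color coherence vector sketch, assuming SciPy for connected-component labeling; the bin count and the coherence threshold tau are illustrative:

import numpy as np
from scipy import ndimage

def color_coherence_vector(gray, n_bins=8, tau=50):
    """For each histogram bin, split its pixels into coherent pixels
    (those in a connected region of at least tau pixels) and incoherent ones."""
    q = (gray.astype(np.int32) * n_bins) // 256      # quantize into n_bins colors
    ccv = np.zeros((n_bins, 2), dtype=np.int64)      # columns: coherent, incoherent
    for b in range(n_bins):
        labels, _ = ndimage.label(q == b)            # connected regions of this color
        sizes = np.bincount(labels.ravel())[1:]      # region sizes, background skipped
        ccv[b, 0] += int(sizes[sizes >= tau].sum())  # coherent pixels
        ccv[b, 1] += int(sizes[sizes < tau].sum())   # incoherent pixels
    return ccv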
3.1.2 Texture Features
Texture carries important information about the structure of object surfaces and describes the relationship between a surface and its surrounding environment. Texture features include coarseness, contrast, directionality, line-likeness, regularity, and roughness (the six Tamura texture features).
Common texture analysis and classification methods:
(1) Wavelet transform: the wavelet transform decomposes a signal over a family of basis functions ψ_mn(x). Applied to an image, one decomposition step yields four sub-bands, which according to their frequency content are called LL, LH, HL, and HH. Two types of wavelet transform are commonly used for texture analysis: the pyramid-structured wavelet transform (PWT) and the tree-structured wavelet transform (TWT).
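A PWT-style sketch, assuming the PyWavelets package: each step decomposes the current LL band into four sub-bands and records the mean energy of the detail bands as texture features.

import numpy as np
import pywt

def pwt_features(gray, wavelet="db1", levels=3):
    feats, band = [], gray.astype(np.float64)
    for _ in range(levels):
        ll, (lh, hl, hh) = pywt.dwt2(band, wavelet)  # four sub-bands: LL, LH, HL, HH
        feats.extend(np.mean(np.abs(sub)) for sub in (lh, hl, hh))
        band = ll                                    # pyramid: keep decomposing LL
    feats.append(np.mean(np.abs(band)))              # final approximation band
    return np.array(feats)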
(2) Co-occurrence matrix: first, a co-occurrence matrix is built from the directions and distances between pixels; then meaningful statistics are extracted from this matrix as the texture features.
3.1.3 Shape Features
A shape can be defined as the contour or surface-structure profile of an object; it is what makes an object region distinguishable from its surroundings. Shape features can be expressed in two ways: contour features and region features. The former use only the outer boundary of the object, while the latter involve the entire shape region. The most typical methods for these two classes are the Fourier descriptor and the shape-invariant moments, respectively.
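As a sketch of region-based moments, OpenCV's moment functions can be used directly; Hu's seven moment invariants are the classic shape-invariant moments, and the Otsu thresholding step here is only one simple way to obtain a region:

import cv2
import numpy as np

def shape_features(gray):
    # Binarize to get an object region, then describe it with moments.
    _, mask = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    m = cv2.moments(mask, binaryImage=True)
    area = m["m00"] or 1.0                        # zeroth-order moment (region area)
    cx, cy = m["m10"] / area, m["m01"] / area     # first-order moments -> centroid
    hu = cv2.HuMoments(m).ravel()                 # seven rotation/scale invariants
    return (cx, cy, area), hu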
3.1.4 Spatial Relationship Features
The positions of objects in an image and the spatial relationships among them are also important features for image retrieval. Spatial relationship features fall into two categories: one approach automatically segments the image into the objects or color regions it contains and then indexes the image by these regions; the other divides the image evenly into sub-blocks and extracts features from each sub-block for indexing.
3.2 Dynamic Features
Dynamic features are unique to video data. They include global motion (camera motion such as panning, zooming, and tracking) and local motion (the movement of objects within the scene: motion trajectories, relative speeds, changes in the relative positions of objects, and so on). Dynamic features are important for video data because the motion of a video sequence is difficult to describe using only the image features of representative frames.
(1) Global Motion
Global motion mainly comprises camera translation, rotation, and zoom, and it can be characterized by building a general parametric model of camera motion. To estimate the parameters of the camera motion model, first select enough observation points in adjacent frames, then use a matching algorithm to find the observed motion vectors of these points, and finally estimate the model parameters by parameter fitting.
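A sketch of this three-step estimation with OpenCV: corner points serve as observation points, pyramidal Lucas-Kanade tracking provides their motion vectors, and a RANSAC-fitted similarity transform plays the role of the camera model.

import cv2
import numpy as np

def estimate_global_motion(prev_gray, curr_gray):
    # 1. Select enough observation points in the previous frame.
    pts = cv2.goodFeaturesToTrack(prev_gray, maxCorners=200,
                                  qualityLevel=0.01, minDistance=8)
    # 2. Match them into the current frame to get observed motion vectors.
    nxt, status, _ = cv2.calcOpticalFlowPyrLK(prev_gray, curr_gray, pts, None)
    good = status.ravel() == 1
    # 3. Fit the camera model by parameter fitting; RANSAC rejects points
    #    that belong to independently moving objects.
    M, _ = cv2.estimateAffinePartial2D(pts[good], nxt[good], method=cv2.RANSAC)
    return M   # 2x3 matrix encoding translation, rotation, and zoom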
(2) Local Motion
A central local motion feature extraction technique is motion vector extraction based on the optical flow field, which uses the temporal variation and correlation of pixel gray levels in an image sequence to determine the motion of image pixels. Note that optical flow is the apparent motion implied by pixel gray levels and is not necessarily equal to the true motion vector.
- Optical flow constraint equation: the basic idea is to treat the image brightness as a function of space and time and, from the assumption that gray levels are conserved, derive the optical flow constraint equation I_x u + I_y v + I_t = 0, where I_x, I_y, and I_t are the partial derivatives of brightness and (u, v) is the flow vector. The motion vector is then computed by solving this constraint equation.
- Horn-Schunck optical flow computation: because each pixel has two unknowns (u, v) but the constraint provides only one equation, the optical flow problem is ill-posed. Horn and Schunck assumed that the optical flow field produced by a single moving object should be continuous and smooth, i.e., neighboring points on the same object have similar velocities, so the flow projected onto the image should also vary smoothly. By adding this extra constraint (a global smoothness constraint) to the flow field, they converted the computation of the optical flow field into an optimization problem.
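Horn-Schunck itself is not shipped in OpenCV's main module, so the sketch below uses the Farneback dense optical flow as a stand-in; it likewise produces a per-pixel (u, v) field for local motion analysis.

import cv2
import numpy as np

def dense_flow(prev_gray, curr_gray):
    flow = cv2.calcOpticalFlowFarneback(prev_gray, curr_gray, None,
                                        pyr_scale=0.5, levels=3, winsize=15,
                                        iterations=3, poly_n=5, poly_sigma=1.2,
                                        flags=0)
    u, v = flow[..., 0], flow[..., 1]      # horizontal / vertical flow components
    magnitude = np.sqrt(u ** 2 + v ** 2)   # apparent speed at each pixel
    return flow, magnitude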