The transformation from analog to digital video is bringing long-awaited benefits to security systems. The main advantage is that digital compression allows more image data to be transmitted and stored. These advantages come at a price, however. The low cost of digital video lets more cameras be deployed, but more people are then needed to monitor them. Storing video can reduce the amount of footage that must be reviewed, because the motion vectors and detectors used in the compression algorithm can be used to discard frames without significant activity. But since motion vectors and detectors reveal nothing about what is actually happening, someone must still sift through the captured video to confirm suspicious behavior.
The industry therefore urgently needs ways to improve the efficiency of video security and surveillance. Video Content Analysis (VCA) can electronically identify important features in a frame. When a specific type of event occurs, the system raises an alarm, speeding the response of real-time security systems. VCA can also automatically search captured video for specific content, freeing people from hours of browsing and reducing the workload of manually sifting through camera footage.
VCA technology is evolving and will be widely used in the near future. What is certain is that VCA requires a great deal of processing power to identify targets of interest within a large stream of video pixel data. In addition, a VCA system must be programmable so it can serve varied applications, identify different content, and adapt to evolving algorithms. The latest video processors provide the performance and programming flexibility needed to fully meet the demands of compression, VCA, and other digital video tasks. The software platforms and tools that support these processors simplify the development of security and surveillance products. As VCA technology develops, these will become key technologies.
VCA Process
There is no international standard for VCA, but a typical VCA procedure is as follows:
1. Divide a long sequence into separate scenes or shots for analysis. Because different scenes have different histograms or color distributions, a significant change in a frame's histogram relative to the previous frame can be taken as a scene change.
2. Separate the foreground from the static background so that changes in foreground objects within the scene can be detected.
3. Extract (segment) foreground objects and track them frame by frame. Tracking involves detecting each object's position and velocity.
4. If necessary, extract an object's features and classify it.
5. If the current event is an event of interest, send an alarm to the management software and/or the relevant personnel.
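Step 1 above is commonly implemented by comparing intensity histograms of consecutive frames. A minimal pure-Python sketch (the bin count, threshold, and function names are illustrative assumptions, not from any particular product):

```python
def histogram(frame, bins=16, max_val=256):
    """Count 8-bit pixel intensities into equal-width bins."""
    hist = [0] * bins
    width = max_val // bins
    for row in frame:
        for px in row:
            hist[px // width] += 1
    return hist

def is_scene_change(prev_frame, cur_frame, threshold=0.5):
    """Flag a scene cut when the histograms of two consecutive
    frames differ by more than `threshold` (fraction of pixels)."""
    h1 = histogram(prev_frame)
    h2 = histogram(cur_frame)
    total = sum(h1)
    diff = sum(abs(a - b) for a, b in zip(h1, h2))
    return diff / (2 * total) > threshold
```

A cut from a dark scene to a bright one moves most pixels into different bins, pushing the normalized difference toward 1.0, while consecutive frames of the same scene stay near 0.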
Figure 1: Two DaVinci processors are used to process high-end VCA, which can encode X high-definition video sources at 30 frames per second.
Foreground/background Detection
VCA detects activity as changes in the foreground against a background that is normally static and of no interest. In the past, foreground/background detection was limited by the available computing power. Today, higher-performance digital signal processors and video processors can run far more sophisticated detection algorithms.
In general, there are two kinds of foreground/background detection methods: non-adaptive methods, which use only a few video frames and maintain no background model, and adaptive methods, which maintain a background model that evolves over time. In an adaptive VCA algorithm, feedback from the detection results is used to update and maintain the background model, which then serves as the input to step 2.
Non-adaptive detection
In the simplest non-adaptive case, the absolute difference between each pixel in the current frame and the corresponding pixel in the previous frame is computed and compared with a preset threshold. The threshold represents the "zero" level after compensating for noise from the scene and from the image sensor. If the absolute difference exceeds the threshold, the pixel belongs to the foreground; otherwise it belongs to the background.
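This simplest non-adaptive test can be sketched in a few lines of pure Python (the threshold value is an illustrative assumption):

```python
def foreground_mask(prev_frame, cur_frame, threshold=25):
    """Non-adaptive detection: a pixel is foreground (1) when its
    absolute difference from the previous frame exceeds a noise
    threshold, and background (0) otherwise."""
    return [[1 if abs(c - p) > threshold else 0
             for p, c in zip(prow, crow)]
            for prow, crow in zip(prev_frame, cur_frame)]
```

Only pixels whose intensity jumps beyond the noise floor are marked as foreground; everything else is treated as background.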
Short-term tracking and recognition of video objects in a controlled environment can be achieved with as few as three frames. Even so, the non-adaptive method is suitable only for tightly controlled, short-term tracking applications in which the video scene does not change significantly. Manual reinitialization is required whenever the scene or background changes; without it, errors accumulate over time and make the detection results unreliable.
Adaptive Detection
Because of the limitations of non-adaptive methods, adaptive foreground/background detection is used in VCA applications. An adaptive method maintains a background model that is continually updated by blending in the data of each new video frame. Adaptive methods demand more processing power than non-adaptive ones, and the complexity of the background model varies. In the basic adaptive method, the algorithm determines the foreground by subtracting the background model, pixel by pixel, from the current frame (rather than subtracting the previous frame, as the non-adaptive method does). The result is also fed back into the model, so the model adapts to a changing background without being reset. This approach suits the many surveillance scenes in which targets move constantly or long-term background noise is present.
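The basic adaptive method described above amounts to a running average: each new frame is blended into the background model, and the foreground is whatever differs from that model. A pure-Python sketch (the blending rate `alpha` and threshold are illustrative assumptions):

```python
def update_background(background, frame, alpha=0.05):
    """Blend the new frame into the background model; alpha controls
    how quickly the model adapts (higher = faster forgetting)."""
    return [[(1 - alpha) * b + alpha * f
             for b, f in zip(brow, frow)]
            for brow, frow in zip(background, frame)]

def detect_foreground(background, frame, threshold=25):
    """Foreground = pixels that differ from the background model,
    not from the previous frame as in the non-adaptive method."""
    return [[1 if abs(f - b) > threshold else 0
             for b, f in zip(brow, frow)]
            for brow, frow in zip(background, frame)]
```

Because the detected frame is also fed back through `update_background`, a parked car or a change in lighting gradually becomes part of the background instead of triggering alarms forever.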
For more complex foreground/background detection, a statistical background model is used. Each background pixel in a given frame is modeled as a random variable with a Gaussian distribution. As video data arrives frame by frame, the mean and standard deviation of each individual pixel are updated over time.
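One way such a per-pixel Gaussian model can be maintained online is with exponentially weighted updates of the mean and variance; a minimal sketch, assuming a single Gaussian per pixel and illustrative values for the learning rate and the deviation threshold `k`:

```python
import math

class GaussianPixel:
    """Per-pixel statistical background model: the background value of
    one pixel is a Gaussian whose mean and variance evolve over time."""
    def __init__(self, mean, var=100.0, alpha=0.05):
        self.mean = mean
        self.var = var          # variance of the Gaussian
        self.alpha = alpha      # learning rate for online updates

    def is_foreground(self, value, k=2.5):
        """Foreground when the observation lies more than k standard
        deviations from the background mean."""
        return abs(value - self.mean) > k * math.sqrt(self.var)

    def update(self, value):
        """Fold the new observation into the running mean/variance."""
        d = value - self.mean
        self.mean += self.alpha * d
        self.var = (1 - self.alpha) * self.var + self.alpha * d * d
```

A pixel that flickers slightly (camera noise) stays within a few standard deviations and remains background, while a genuinely new intensity is flagged as foreground.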
Object Tracking/recognition and Classification
After foreground/background detection, a mask (template) is created. Because of environmental noise, the components of a single target object may not all be connected, so before the components are joined into a whole object, a morphological dilation operation is applied: each foreground pixel's neighborhood is also turned on, merging nearby fragments, and the resulting connected components are counted as independent objects. After dilation and component joining, each object is given a bounding box, the smallest rectangle containing the complete object, which is used when segmenting the object in successive frames.
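The dilation and bounding-box steps can be sketched in pure Python; this is a minimal illustration using a 3x3 structuring element and 8-connected component labeling (both common choices, assumed here rather than taken from the article):

```python
def dilate(mask):
    """3x3 morphological dilation: turn on every pixel that has a
    foreground pixel in its 8-neighbourhood, merging nearby fragments."""
    h, w = len(mask), len(mask[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if mask[y][x]:
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < h and 0 <= nx < w:
                            out[ny][nx] = 1
    return out

def bounding_boxes(mask):
    """Label 8-connected components and return the smallest rectangle
    (x0, y0, x1, y1) enclosing each object."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for y in range(h):
        for x in range(w):
            if mask[y][x] and not seen[y][x]:
                stack, x0, y0, x1, y1 = [(y, x)], x, y, x, y
                seen[y][x] = True
                while stack:                     # flood-fill one component
                    cy, cx = stack.pop()
                    x0, x1 = min(x0, cx), max(x1, cx)
                    y0, y1 = min(y0, cy), max(y1, cy)
                    for dy in (-1, 0, 1):
                        for dx in (-1, 0, 1):
                            ny, nx = cy + dy, cx + dx
                            if 0 <= ny < h and 0 <= nx < w \
                                    and mask[ny][nx] and not seen[ny][nx]:
                                seen[ny][nx] = True
                                stack.append((ny, nx))
                boxes.append((x0, y0, x1, y1))
    return boxes
```

Two noisy fragments of the same object that start out disconnected become one component after dilation, so they receive a single bounding box instead of two.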
Tracking the segmented foreground objects involves three steps: predicting each object's position in the current frame; determining which detected object best matches each description; and correcting the object's trajectory to predict the next frame. Steps 1 and 3 are performed with a recursive Kalman filter. Since only an object's position can be observed in a single frame, matrix computations are used to estimate its velocity and next position on the fly.
At the start of the process, the filter is initialized with each foreground object's position relative to the background model. For every frame in which an object is tracked, the filter predicts the object's position in the subsequent frame; when the video advances to that frame, the filter locates the object and corrects its trajectory.
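The predict/correct cycle described above can be sketched as a constant-velocity Kalman filter for one coordinate of an object's position; the 2x2 matrix algebra is written out by hand, and the process/measurement noise settings are illustrative assumptions:

```python
class Kalman1D:
    """Constant-velocity Kalman filter for one coordinate.
    State = (position, velocity); only position is observed,
    velocity is inferred, as described in the text."""
    def __init__(self, pos, dt=1.0, q=1e-3, r=1.0):
        self.x = [pos, 0.0]                 # state: position, velocity
        self.P = [[1.0, 0.0], [0.0, 1.0]]   # state covariance
        self.dt, self.q, self.r = dt, q, r  # frame interval, noise terms

    def predict(self):
        """Project the state one frame ahead: pos += vel * dt."""
        dt = self.dt
        self.x = [self.x[0] + dt * self.x[1], self.x[1]]
        p00, p01 = self.P[0]
        p10, p11 = self.P[1]
        # P = F P F^T + Q  for F = [[1, dt], [0, 1]]
        self.P = [[p00 + dt * (p01 + p10) + dt * dt * p11 + self.q,
                   p01 + dt * p11],
                  [p10 + dt * p11, p11 + self.q]]
        return self.x[0]                    # predicted position

    def correct(self, z):
        """Fold in the measured position z and correct the trajectory."""
        s = self.P[0][0] + self.r           # innovation covariance
        k0 = self.P[0][0] / s               # Kalman gains (H = [1, 0])
        k1 = self.P[1][0] / s
        y = z - self.x[0]                   # innovation
        self.x = [self.x[0] + k0 * y, self.x[1] + k1 * y]
        p00, p01 = self.P[0]
        p10, p11 = self.P[1]
        # P = (I - K H) P
        self.P = [[(1 - k0) * p00, (1 - k0) * p01],
                  [p10 - k1 * p00, p11 - k1 * p01]]
        return self.x
```

Fed the positions of an object moving at a steady two pixels per frame, the filter's velocity estimate converges toward 2 even though velocity is never measured directly; a full tracker would run one such filter per axis per object.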
The second tracking step involves data association, which matches objects between frames based on feature similarity. Object size, shape, and position can be matched by the overlap between one frame and the next; velocity is a parameter predicted by the Kalman filter; and color histograms distinguish objects by their colors. However, any or all of these features may change.
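Matching by overlap between frames is often scored with intersection-over-union (IoU) of bounding boxes. A simple greedy association sketch (the IoU cutoff and greedy strategy are illustrative choices; real trackers may use optimal assignment instead):

```python
def iou(a, b):
    """Intersection-over-union of two boxes (x0, y0, x1, y1):
    an overlap score in [0, 1] for matching objects between frames."""
    ix0, iy0 = max(a[0], b[0]), max(a[1], b[1])
    ix1, iy1 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix1 - ix0) * max(0, iy1 - iy0)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union else 0.0

def associate(tracks, detections, min_iou=0.3):
    """Greedy data association: pair each tracked box with the
    unclaimed detection it overlaps most, above a minimum IoU."""
    pairs, used = [], set()
    for ti, t in enumerate(tracks):
        best, best_iou = None, min_iou
        for di, d in enumerate(detections):
            score = iou(t, d)
            if di not in used and score > best_iou:
                best, best_iou = di, score
        if best is not None:
            pairs.append((ti, best))
            used.add(best)
    return pairs
```

Boxes that barely shift between frames overlap heavily and stay paired with the same track, while a detection with no sufficient overlap starts a new track.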
Consider a white truck with a red cab that approaches the video camera along the street, enters a driveway, turns around, and heads in the opposite direction. Every feature of the object changes over the course of the scene: size, shape, speed, and color. The software must accommodate these changes to keep identifying the truck accurately. In addition, when tracking multiple objects, the software must be able to tell their features apart.
The complexity of tracking leads on to the problem of object classification. For example, it is much easier for a system to raise an alarm when an object crosses a line in front of a camera than to determine whether that object is a person. An object's size and speed provide vectors for rough classification, but more information is needed for precise classification. Large objects provide more pixel information, yet that may be too much data for fast classification. In such cases, dimensionality-reduction techniques are needed to achieve real-time response, even though later investigation may still use the full pixel information in the stored frames.
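As one crude illustration of reducing the data a classifier must examine, an object's pixel region can be downsampled by block averaging before feature extraction; this stands in for the real dimensionality-reduction techniques the text alludes to (such as PCA), which are considerably more involved:

```python
def downsample(region, block=2):
    """Shrink a pixel region by averaging block x block tiles,
    reducing the data volume a classifier must process. Block
    averaging is only a simple stand-in for true dimensionality
    reduction; it discards fine detail along with the bulk."""
    h, w = len(region), len(region[0])
    return [[sum(region[y + dy][x + dx]
                 for dy in range(block)
                 for dx in range(block)) / (block * block)
             for x in range(0, w - block + 1, block)]
            for y in range(0, h - block + 1, block)]
```

A 2x2 block factor cuts the pixel count by four while preserving the coarse shape used for fast, rough classification.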
Beyond object classification, an efficient VCA implementation must overcome many other challenges: lighting changes caused by dusk, reflective water, clouds, wind, rain, snow, and fog; the paths of multiple tracked objects crossing, so that their foreground pixels merge temporarily and then separate; and tracking objects across different views in multi-camera systems. Solving these problems still requires a great deal of VCA work.
VCA System Design
Implementing VCA together with video encoding demands high-performance processors in various configurations. Emerging analysis techniques require programming flexibility, which can be met by processors that integrate a high-performance programmable DSP, a RISC microprocessor core, and video hardware coprocessors. A suitable processor should also integrate high-speed communication peripherals and the video signal chain to reduce the number and cost of system components.
Using such a solution to integrate VCA into the camera yields a robust and efficient networked implementation. VCA software can also run on a computer, making the computer a centralized processing device for multiple cameras. Beyond the VCA process itself, preprocessing steps such as deinterlacing may be required before foreground/background detection and the other analysis steps.
Application software may require additional processing steps for object recognition or other purposes. Both single-processor and dual-processor designs provide sufficient processing headroom for new software features.
Separating foreground objects from the background, tracking them, and, where necessary, adaptively classifying suspicious activity: that is the essence of VCA, and it demands a high degree of real-time processing and adaptability. DSP-based video processors deliver the performance required for VCA and video encoding, with the programming flexibility to adapt to changing application requirements and technologies.
Author: Cheng Peng
DSP video Application Engineer
Texas Instruments