Original URL: http://www.csdn.net/article/1970-01-01/2826441
1. Chinese Academy of Engineering academician Gao Wen delivered a keynote speech titled "Multimedia Big Data Analysis and Search". He analyzed in depth the three major technical challenges of media big data: it cannot be stored, it cannot be seen clearly, and it cannot be found. The proposed solutions are: ultra-efficient video coding to solve the compression problem, introducing a scene model into the video coding tools to get better coding efficiency; object-oriented detection, tracking and recognition to solve the pattern recognition problem, with support for ROI, GPS and multi-camera correlation; and large-scale visual search to solve the cross-camera search problem, with new visual descriptors developed to support effective search.
2. In modern society, big data comes from many sources and is directly changing transportation, healthcare, education, security and other fields. Within a smart city system, surveillance video is the largest-volume source of big data. On that basis, today I mainly want to share three challenges of media big data. First, the amount of data accumulated around the clock is huge. Second, the video cannot be seen clearly: however you look at it, you still cannot make it out. Sometimes you can guess, but it takes a very experienced person to make out even the rough picture. Why? The video is compressed when it is stored, and at compression time nobody knows what it will be used for later. To save storage it is compressed too hard, and when it is decoded it is basically impossible to see clearly. Third, things cannot be found. Cameras are everywhere now and they are all switched on, but can you find what you are looking for? Not necessarily: even if each picture is clear, the target cannot be found across cameras. So cross-camera search in a camera network is also a problem.
3. As just mentioned, from the coding point of view we use three classes of techniques, removing spatial redundancy, temporal redundancy and coding redundancy, to take the redundancy out of the video stream. These three classes contain many algorithms, such as transforms, filtering, motion compensation, entropy coding and so on.
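To make the idea of temporal redundancy in paragraph 3 concrete, here is a minimal sketch. It is not the codec described in the talk: it uses toy one-dimensional "frames", a zero-motion prediction (plain frame differencing instead of real motion compensation), and Shannon entropy as a rough stand-in for what an entropy coder could achieve. All names are hypothetical.

```python
import math
from collections import Counter

def entropy_bits(symbols):
    """Shannon entropy of a sequence, in bits per symbol."""
    counts = Counter(symbols)
    total = len(symbols)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def temporal_residual(prev_frame, cur_frame):
    """Difference from the previous frame (zero-motion prediction, an assumption)."""
    return [c - p for p, c in zip(prev_frame, cur_frame)]

def reconstruct(prev_frame, residual):
    """Decoder side: add the residual back onto the previous frame."""
    return [p + r for p, r in zip(prev_frame, residual)]

# Toy frames: a static gradient background where a single pixel changes.
frame0 = list(range(16))
frame1 = list(frame0)
frame1[5] = 200  # a small "moving object"

residual = temporal_residual(frame0, frame1)
raw_entropy = entropy_bits(frame1)         # cost of coding the frame directly
residual_entropy = entropy_bits(residual)  # cost of coding only the change
restored = reconstruct(frame0, residual)
```

In this toy case the residual is almost all zeros, so its entropy is far lower than that of the raw frame, which is the effect that prediction followed by entropy coding exploits.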
4. One kind of support we now propose is ROI (region of interest): at encoding time, the encoder identifies which regions are likely to be used for recognition later and defines them as regions of interest. The syntax now has a dedicated description for the region of interest, and it can also carry other information, for example GPS information and camera parameters. With this, when I encode afterwards I adjust the encoding parameters so that the ROI is compressed slightly more lightly, and less of the key information is lost.
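The ROI idea in paragraph 4 can be sketched as a per-block quantization map in which blocks inside the region of interest get a lower quantization parameter (QP), that is, lighter compression. This is only an illustration under stated assumptions: the function name, the QP values, the 0 to 51 clamp range and the metadata fields are all hypothetical, not the syntax of the standard discussed in the talk.

```python
def build_qp_map(blocks_w, blocks_h, roi, base_qp=36, roi_qp_delta=-8,
                 qp_min=0, qp_max=51):
    """Return a blocks_h x blocks_w grid of QP values.

    roi is (x0, y0, x1, y1) in block units, with x1 and y1 exclusive.
    Blocks inside the ROI get base_qp + roi_qp_delta (lighter compression).
    """
    x0, y0, x1, y1 = roi
    qp_map = []
    for by in range(blocks_h):
        row = []
        for bx in range(blocks_w):
            inside = x0 <= bx < x1 and y0 <= by < y1
            qp = base_qp + (roi_qp_delta if inside else 0)
            row.append(max(qp_min, min(qp_max, qp)))  # clamp to the assumed range
        qp_map.append(row)
    return qp_map

# Hypothetical side information carried next to the ROI description.
roi_metadata = {
    "roi": (1, 1, 3, 2),
    "gps": {"lat": 0.0, "lon": 0.0},          # placeholder values
    "camera": {"focal_length_mm": None},      # placeholder values
}

qp_map = build_qp_map(4, 3, roi_metadata["roi"])
```

Here the two blocks inside the ROI end up with QP 28 while the rest of the frame stays at QP 36, so more bits are spent where later recognition is likely to need them.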
5. There is now a dedicated technique called re-identification: a person appears in one camera, and when they walk into a second camera I can still recognize them. Because the face may not always be visible, relying on face recognition alone is no use; identification relies on a combination of color, body shape, gait, silhouette and so on.
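One simple way to picture the "comprehensive identification" in paragraph 5 is to score each cue separately and then fuse the scores. The sketch below assumes each cue (color, shape, gait) has already been turned into a feature vector, and fuses cosine similarities with hand-picked weights. The cue names, vectors and weights are invented for illustration; the talk does not specify how the cues are combined.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def fused_similarity(query, candidate, weights):
    """Weighted average of per-cue cosine similarities."""
    total = sum(weights.values())
    return sum(w * cosine(query[cue], candidate[cue])
               for cue, w in weights.items()) / total

def best_match(query, gallery, weights):
    """Return the gallery identity with the highest fused similarity."""
    return max(gallery, key=lambda pid: fused_similarity(query, gallery[pid], weights))

# Hypothetical features: the query comes from camera 1, the gallery from camera 2.
weights = {"color": 0.4, "shape": 0.3, "gait": 0.3}
query = {"color": [0.9, 0.1, 0.0], "shape": [0.5, 0.5], "gait": [0.2, 0.8]}
gallery = {
    "person_a": {"color": [0.8, 0.2, 0.0], "shape": [0.5, 0.4], "gait": [0.3, 0.7]},
    "person_b": {"color": [0.0, 0.1, 0.9], "shape": [0.9, 0.1], "gait": [0.9, 0.1]},
}

best = best_match(query, gallery, weights)
```

Because no single cue decides the match, a candidate can still be recognized when one cue, such as the face, is missing or unreliable.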
6. The key techniques here are: first select feature points, then select features; the features are aggregated and compressed, the points themselves are compressed, and the result is a very small descriptor. How small? Say you take a photo that is 3 or 4 MB in size; we extract it down to about 500 bits, not even 1 K, and you can already search with it. The size goes up to 16 K, and at 16 K the retrieval performance is higher. We judge whether a feature is good by its recall rate. We want the recall rate to reach 90%; below 90% we consider the feature poorly chosen. What does a 90% recall rate mean? I search the library with the full picture, and I also search with 521; if 90% of the 100 results I got from the full-picture search are in there, then the feature is acceptable. That is the criterion we work to.
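The 90% criterion in paragraph 6 amounts to a set overlap: take the results returned by the full-picture search as the reference, and count how many of them the compact descriptor also returns. A minimal sketch, with made-up result IDs and hypothetical function names:

```python
def recall_against_reference(reference_ids, compact_ids):
    """Fraction of the reference results that the compact search also returned."""
    reference = set(reference_ids)
    if not reference:
        raise ValueError("reference result list must not be empty")
    return len(reference & set(compact_ids)) / len(reference)

# Made-up example: the full-picture search returns images 0..99; the compact
# descriptor search returns 92 of those plus 8 unrelated images.
reference_top100 = list(range(100))
compact_top100 = list(range(8, 100)) + [1000 + i for i in range(8)]

recall = recall_against_reference(reference_top100, compact_top100)
feature_ok = recall >= 0.90  # the acceptance threshold stated in the talk
```

With these made-up lists the recall is 0.92, so the feature would pass the 90% threshold.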
7. The background technologies involved include data compression, computer vision feature extraction, machine learning and visual mining. The visual features are mainly local descriptors. If you know a little about pattern recognition, there is a very good descriptor called the SIFT feature. It is invariant to translation, rotation and scale, which makes it very useful in recognition. However, this feature has some drawbacks: one is the patent issue, and in addition its storage cost and computation time are relatively large.
8. To summarize, multimedia big data, for the smart city or intelligent city, faces three challenges: the compression problem, the pattern recognition problem and the visual search problem. Compression is addressed mainly by improving coding efficiency. For recognition, we add ROI support during coding so that as much of this information as possible is kept and less is lost downstream. For visual search, images can be described in a compact form, which makes the search faster. The current system can basically complete a search over 1 million images in 1 second, which is very fast.
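To illustrate why a compact description makes search fast, the sketch below assumes each image has been reduced to a short binary code and ranks the database by Hamming distance, which needs only an XOR and a bit count per comparison. The binary-code representation and all names here are assumptions for illustration; the talk does not say how its system stores or compares descriptors.

```python
def hamming(a, b):
    """Number of differing bits between two integer bit codes."""
    return bin(a ^ b).count("1")

def search(query_code, database, top_k=3):
    """Return the top_k image IDs whose codes are closest to query_code."""
    ranked = sorted(database.items(), key=lambda item: hamming(query_code, item[1]))
    return [image_id for image_id, _ in ranked[:top_k]]

# Hypothetical 8-bit codes; real descriptors would be much longer.
database = {
    "img_a": 0b10110010,
    "img_b": 0b10110011,
    "img_c": 0b01001101,
}
query_code = 0b10110010

results = search(query_code, database, top_k=2)
```

A linear scan like this is the simplest option; a system at the scale of millions of images would normally add an index on top, which is outside the scope of this sketch.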
Reading summary of "Chinese Academy of Engineering academician Gao Wen: Multimedia Big Data Analysis and Search"