Editor's note: CVPR 2014, the top conference in computer vision, came to a close last month. Microsoft Research Asia had a total of 15 papers accepted at this year's conference. What were the highlights of this year's CVPR, and what trends did it reveal? Let's hear from Microsoft Research Asia intern Vujicic, who attended the conference in the United States.
Vujicic
Microsoft Research Asia Intern
Computer vision has been one of the hottest areas of computer science for more than a decade, and the Conference on Computer Vision and Pattern Recognition (CVPR) is without question the flagship conference of the field.
This year's CVPR was held in Columbus, the capital of Ohio. The location has a bit of a middle-of-nowhere feel, so most attendees simply stayed put at the conference. Yet even in such a remote spot, attendance did not shrink at all, as anyone standing in the lunch line at noon could tell. Of course, the heart of the conference is not the venue or the food, important as those may be. People travel here from all over the world mainly to sense where research is heading and to exchange their latest breakthroughs and ideas. I was honored to have three papers accepted this time, which is a great affirmation of my own research in computer vision and gave me the chance to experience the most cutting-edge, forward-looking work in the field at the conference itself. Below I would like to share my impressions.
First, the boom of deep learning
Deep learning is one of the hottest trends of this year's conference, with 16 papers carrying it in the title, including 4 oral presentations. (At a large conference like CVPR, most papers are presented as posters; the committee selects only a small number of outstanding works, about 5% of all submissions, to be presented orally to the whole audience.) Since the deep learning craze began, many researchers have worked tirelessly to apply it to every task in computer vision, from high-level recognition and classification down to low-level denoising. It is reminiscent of the sparse representation wave of a few years ago, and deep learning now looks even more pervasive. Deep learning also shows a tendency to sweep through the high-level problems: benchmark numbers in many high-level directions are constantly being refreshed. Where previous improvements were one or two points, deep learning now easily brings five or six, which puts great pressure on researchers working with non-deep-learning methods.

Hot as the direction is, researchers in computer vision remain clearly divided in their attitude toward deep learning: some embrace it, others watch and wait, and there are of course other research trends as well (for reasons I will come to). As a powerful feature learning tool it naturally receives plenty of support and promotion; many researchers who were originally on the sidelines have taken it up after witnessing its excellent performance. Among those still waiting and watching, some may not yet understand the mechanism behind deep learning, while others remain conservative about how much it will ultimately contribute to computer vision. Although I admire the strength of deep learning, I can also understand the conservative attitude. Admittedly, deep learning is an exceptionally powerful tool: given a large enough training set, it can learn features with strong discriminative power for the task at hand. However, the training process is almost a black box, and the learned system offers little deeper understanding of the problem being solved.
Second, a vindication of research on fundamental models
Perhaps because of this, I felt that this year's awards pointed in a particular direction. The best paper went to a study of camera motion and shape recovery, "What Camera Motion Reveals About Shape with Unknown BRDF" (a single-author paper!), and the honorable mention went to a paper on recovering shape with structured light, "3D Shape and Indirect Appearance by Structured Light Transport". Both can be regarded as research on 3D geometric models. Nor is it only the awards: senior figures in computer vision are personally taking up research on fundamental models. Jean Ponce himself wrote an oral paper on trinocular geometry, continuing to examine the traditional assumption that the viewing rays of three cameras intersect, something that is in fact not automatically guaranteed. Every one of these papers is an in-depth study of a basic problem or basic model of computer vision, systematically analyzing questions that earlier work had not considered. Rewarding them also encourages us younger researchers to settle down and study fundamental problems rather than lightly chasing the current craze. Chasing the tide is easy; ensuring that one's work is not quickly forgotten, or that it genuinely contributes to the field, is the question every researcher needs to think about. Moreover, do not assume that material already written into textbooks is settled and has no research value. Some theories hold only under idealized assumptions, so one can keep questioning them and pursue further study.
Third, low-level vision: not yet swept up by deep learning
Problems in computer vision can be roughly divided into three categories according to their objects of study and goals: low-level, mid-level, and high-level. Low-level problems concern the analysis and processing of the image itself and its intrinsic properties, such as estimating illumination, reflectance, and light direction, further inferring the geometric structure of the photographed object, and removing undesirable effects such as camera shake and noise introduced when the picture was taken. High-level problems concern the understanding and cognition of image content, such as detecting and tracking specific objects and their behavior in an image, and making deeper inferences based on the recognized objects, for example predicting the scene they are in and the behavior about to unfold. Mid-level sits between the two; my personal understanding is that it focuses on feature representation, for instance how to describe a target object in a high-level problem so that the description distinguishes it from other objects. Broadly speaking, low-level work serves mid-level problems and mid-level work serves high-level problems, although the classification is not strict and there is overlap. Deep learning in computer vision is mainly a feature learning tool and can therefore be thought of as mid-level, which is exactly why the high-level problems mentioned earlier are so strongly affected by it. By comparison, low-level problems have been far less impacted, even though deep learning has of course been applied to low-level tasks such as denoising and deblurring.

For young researchers troubled by the deep learning wave, it may be worth exploring the many interesting studies in low-level vision. Over the years, Bill Freeman's group at MIT has done some fascinating low-level work, such as magnifying small, imperceptible changes in a video so they become visible ("Eulerian Video Magnification for Revealing Subtle Changes in the World"; a simplified sketch of the idea follows below), and their paper at this CVPR, "Camouflaging an Object from Many Viewpoints", asks how to place and paint a cube in a natural environment so that it hides like a chameleon. Studies like these make research itself interesting and fun.
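For readers curious how the video magnification mentioned above works, here is a heavily simplified sketch of the Eulerian idea, not the authors' full method: each pixel's intensity over time is band-pass filtered, and the filtered variation is amplified and added back. The published system also decomposes frames into a spatial pyramid, which is omitted here; the frame-array shape, frequency band, and amplification factor are illustrative choices.

```python
import numpy as np
from scipy.signal import butter, filtfilt

def magnify_motion(frames, fps, low=0.5, high=3.0, alpha=20.0):
    """Simplified Eulerian-style magnification (sketch only).

    `frames` is a (T, H, W) float array. Each pixel's temporal signal is
    band-pass filtered and the amplified variation is added back, making
    subtle periodic changes (pulse, breathing, small motions) visible.
    The real method additionally works on a spatial pyramid.
    """
    nyquist = fps / 2.0
    b, a = butter(2, [low / nyquist, high / nyquist], btype="band")
    variation = filtfilt(b, a, frames, axis=0)   # per-pixel temporal band-pass
    return frames + alpha * variation
```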
The author also works on a low-level problem: deblurring. Deblurring means restoring a blurred photograph to a sharp image by some method. It is in essence a long-studied problem, deconvolution, with well-known work dating back to the 1950s and 1960s. The past decade has brought many breakthroughs, including a great deal of influential work on blur caused by camera shake. In 2013, Adobe shipped such an algorithm as an important feature in Photoshop, which has given the field a further push. The FBI has used Photoshop's deblurring feature to restore images and help solve cases; I saw a thank-you letter from the FBI while interning at Adobe last year.
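To make the deconvolution formulation concrete, here is a minimal sketch, assuming a known and spatially uniform blur kernel: the standard model treats the blurred photo as the sharp image convolved with a blur kernel plus noise, and a simple Wiener-style filter inverts it in the Fourier domain. This is only the non-blind half of the problem; real deblurring, including the work described below, must also estimate the kernel itself, and this is not the author's published algorithm.

```python
import numpy as np

def blur(image, kernel):
    """Simulate the standard blur model B = K * I (+ noise) via FFT convolution."""
    H, W = image.shape
    K = np.fft.fft2(kernel, s=(H, W))      # kernel spectrum, zero-padded to image size
    I = np.fft.fft2(image)
    return np.real(np.fft.ifft2(K * I))

def wiener_deconvolve(blurred, kernel, snr=100.0):
    """Non-blind deconvolution: recover I from B when the kernel K is known.

    Wiener filter: I_hat = conj(K) * B / (|K|^2 + 1/snr). The 1/snr term
    regularizes frequencies that the blur has almost destroyed.
    """
    H, W = blurred.shape
    K = np.fft.fft2(kernel, s=(H, W))
    B = np.fft.fft2(blurred)
    I_hat = np.conj(K) * B / (np.abs(K) ** 2 + 1.0 / snr)
    return np.real(np.fft.ifft2(I_hat))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    sharp = rng.random((64, 64))
    kernel = np.zeros((9, 9))
    kernel[4, :] = 1.0 / 9.0               # simple horizontal motion-blur kernel
    blurred = blur(sharp, kernel) + 0.001 * rng.standard_normal((64, 64))
    restored = wiener_deconvolve(blurred, kernel)
    print("mean error before:", np.abs(blurred - sharp).mean(),
          "after:", np.abs(restored - sharp).mean())
```

Blind deblurring methods alternate between estimating the kernel and solving this kind of non-blind step, usually with stronger image priors than the plain Wiener filter shown here.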
All three of the author's papers accepted to CVPR 2014 are about deblurring. The first targets one major source of blurred photos, images taken under dark illumination, and proposes a deblurring algorithm based on light streaks ("Deblurring Low-light Images with Light Streaks").
The algorithm automatically detects the light streaks that commonly appear in low-light photos and uses them as a constraint on the blur kernel. It is very effective on images blurred in dark conditions, and the light-streak phenomenon is not limited to low light: it also appears in ordinary blurred images whenever the scene contains small objects whose color contrasts with the surroundings. Readers can download the code from my homepage and try it out (a rough sketch of the underlying intuition is given below). The second paper shows that from a single blurred image we can estimate not only the camera shake but also the depth of the scene ("Joint Depth Estimation and Camera Shake Removal from Single Blurry Image"). This may sound impossible, but the blur in fact encodes both the camera shake and the scene depth. Readers may also think of shooting a blurred image as capturing a very short video, which gives us a multi-view stereo input! The third paper presents a simple and effective algorithm designed for blurred text images, which can serve as a preprocessing step before text recognition ("Deblurring Text Images via L0-Regularized Intensity and Gradient Prior").
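As a rough illustration of the light-streak intuition, and emphatically not the paper's detection algorithm, the sketch below assumes the user points at an isolated streak: because a point light in a dark scene traces out the camera's motion, a patch cropped around the streak, thresholded and normalized, already approximates the blur kernel and can be used to constrain its estimation. The function name and parameters here are hypothetical.

```python
import numpy as np

def kernel_from_streak(blurred, center, size=31, threshold=0.1):
    """Crude blur-kernel estimate from a single light streak (intuition only).

    An isolated point light in a dark scene is smeared along the camera's
    motion path, so the streak itself looks like the blur kernel. We crop a
    patch around it, suppress the dark background, and normalize to sum 1.
    """
    y, x = center                      # (row, col) of the streak, user-supplied here
    r = size // 2
    patch = blurred[y - r:y + r + 1, x - r:x + r + 1].astype(np.float64)
    patch = patch - patch.min()
    if patch.max() > 0:
        patch /= patch.max()
    patch[patch < threshold] = 0.0     # keep only the bright streak pixels
    s = patch.sum()
    return patch / s if s > 0 else patch   # a blur kernel should sum to one
```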
Fourth, depth sensors and research on depth images
In recent years, research on depth images obtained from depth sensors has drawn great attention in both academia and industry. In industry especially, many depth-sensor startups have sprung up, attracting wide attention and substantial investment, and several of them sponsored this year's CVPR. Beyond that, many large companies are actively building their own depth sensors or embedding them in their products. Why do depth sensors carry such influence? Readers probably have plenty of ideas already. As a new form of input data, they add a new pixel-level dimension, depth, to traditional 2D input. This not only gives researchers a fresh angle on old problems, with RGBD data at the core, but the depth image also makes many algorithms far more practical, which brings computer vision research closer to industrial products. The maturing of depth sensors, together with the related research in the field, also provides an important interface to augmented reality, a key direction for future technology. All of this shows that depth sensing is a valuable direction that will remain popular for quite some time.
Microsoft Research Asia also has an oral paper in this direction, which uses a depth sensor for hand tracking ("Realtime and Robust Hand Tracking from Depth"). By redesigning the hand model and the energy function, the work achieves real-time hand tracking with a robust algorithm: on a PC it reaches 25 FPS (frames per second) even without a GPU, while reducing the average error on the test data to about 10 mm, roughly 50% better than other methods. Accurate gesture recognition is a very popular problem right now, because advances in technology have brought traditional input devices (such as the mouse) to the verge of replacement, and real-time interaction through depth sensors and gestures could drive the revolution in the next generation of input. The work is therefore both valuable and far-reaching, which is why it earned an oral slot.
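To give a feel for what redesigning the hand model and the energy function means in model-based tracking, here is a toy sketch, far simpler than the actual paper: the hand is approximated by a few spheres, the energy sums squared distances from observed depth points to the nearest sphere surface, and a generic optimizer fits the pose (here only a 3D translation). The sphere layout, point counts, and choice of Nelder-Mead are illustrative assumptions, not the paper's design.

```python
import numpy as np
from scipy.optimize import minimize

# Toy hand model: a few spheres (center offsets in a canonical pose, radius in meters).
SPHERES = np.array([[0.00, 0.00, 0.00, 0.030],   # palm
                    [0.00, 0.06, 0.00, 0.012],   # a "finger" segment
                    [0.03, 0.05, 0.00, 0.012]])

def energy(translation, points):
    """Fitting energy: every observed depth point should lie on the model surface.

    Sum of squared distances from each point to the nearest sphere surface,
    minimized over the pose parameters (only a translation in this sketch).
    """
    centers = SPHERES[:, :3] + translation        # move the model into the scene
    radii = SPHERES[:, 3]
    d = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2) - radii
    return np.sum(np.min(d, axis=1) ** 2)

def track_frame(points, init=np.zeros(3)):
    """Fit the toy model to one frame of depth points; real trackers warm-start
    from the previous frame's pose, which is part of what makes them real-time."""
    res = minimize(energy, init, args=(points,), method="Nelder-Mead")
    return res.x

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    true_t = np.array([0.10, -0.05, 0.60])
    # Fake "depth camera" points sampled on the sphere surfaces at the true pose.
    idx = rng.integers(0, len(SPHERES), 200)
    dirs = rng.standard_normal((200, 3))
    dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)
    points = SPHERES[idx, :3] + true_t + dirs * SPHERES[idx, 3:4]
    print("estimated translation:", track_frame(points))
```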
In addition, the other oral paper Microsoft Research Asia published at this year's CVPR also tackles a problem with very broad applications: face alignment ("Face Alignment at 3000 FPS via Regressing Local Binary Features").
By using a local learning principle to reduce the learning difficulty for the random forests, the method obtains better local features, while learning the global shape structure helps make the algorithm more robust. The project enables very fast face alignment as well as face tracking: at the same accuracy it is dozens of times faster than previous methods, reaching 3000 FPS on a single PC core and 300 FPS on a single mobile-phone core. These results are exciting, because phones and mobile devices have profoundly changed how people live, yet their processing power is limited compared with a PC, so faster and more stable algorithms are needed. This work lays a solid foundation for real-time face processing on phones and mobile devices.
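The paper builds on cascaded shape regression, and the following sketch shows that skeleton under simplifying assumptions: starting from a mean shape, each stage samples features around the current landmark estimates and applies a learned linear update. Raw pixel samples stand in here for the learned local binary features, the training step that fits each stage's regressor is omitted, and the function names are hypothetical.

```python
import numpy as np

def sample_features(image, landmarks, offsets):
    """Sample pixel intensities at fixed offsets around each current landmark.

    The real method learns sparse local binary features with random forests;
    these raw pixel samples merely stand in for that locally indexed feature.
    """
    H, W = image.shape
    pts = (landmarks[:, None, :] + offsets[None, :, :]).reshape(-1, 2)
    xs = np.clip(pts[:, 0].astype(int), 0, W - 1)
    ys = np.clip(pts[:, 1].astype(int), 0, H - 1)
    return image[ys, xs]                          # one long feature vector per face

def cascaded_alignment(image, mean_shape, stages):
    """Cascaded shape regression: start from the mean shape and repeatedly apply
    a learned linear update delta_shape = W_stage @ features(image, shape)."""
    shape = mean_shape.copy()                     # (num_landmarks, 2) array of (x, y)
    for offsets, W_stage in stages:               # each stage = (sampling offsets, regressor)
        feats = sample_features(image, shape, offsets)
        shape = shape + (W_stage @ feats).reshape(-1, 2)
    return shape
```

Training would fit each stage's regressor by linear regression from these features to the ground-truth shape increments on annotated faces; at test time each stage costs only feature lookups and one matrix multiplication, which is what makes speeds on the order of thousands of frames per second plausible.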
Author Profile:
Vujicic is an intern at Microsoft Research Asia. He graduated from Zhejiang University and is currently pursuing a PhD at the University of California, Merced. He has interned at Adobe's research labs, working on computer vision and image processing. He has published 7 papers (2 of them oral) at well-known international conferences such as CVPR, ECCV, and BMVC, and serves as a reviewer for a number of journals and conferences, including TIP, ECCV, and ACCV.
____________________________________________________________________________________
Related reading
Pembroke Welsh Corgi Dogs, computer vision, and the power of deep learning
The 2014 Human Computer Interaction (CHI) Conference
The genesis of computer graphics: written on the occasion of SIGGRAPH's 40th anniversary
Follow us
Microsoft Research Asia Renren homepage:http://page.renren.com/600674137
Microsoft Research Asia Weibo :http://t.sina.com.cn/msra
From:http://blog.sina.com.cn/s/blog_4caedc7a0102uyjj.html
The latest hotspots in computer vision, as seen from CVPR 2014