The Awkward State of Computer Vision, by Lindahua

Last Update:2015-06-19 Source: Internet

Author: User

Computer vision is a very active field of AI. Its conferences, large and small, publish thousands of papers every year (CVPR alone accepts more than 300 a year, and the many second-tier conferences add countless more), with an endless stream of new models, new algorithms, and new applications. But behind the glitz, where is the foundation?
I have not achieved anything great in vision, but after several years in the field I have formed some views of my own. Vision explores a very complex world, yet there is still no universally accepted theoretical system for how to model such a world or how to analyze it. Most research work follows one of a few patterns:
o Take an off-the-shelf method from an upstream discipline (such as stereo geometry, machine learning, or optimization), modify it slightly, and apply it to a specific application.
o Address some shortcoming of an existing model or method, for example by adding or simplifying parameters in the formulation, or by adjusting the solution procedure.
o Combine several methods into an application system.
These efforts have genuinely solved many problems. Their weakness is that each one deals with a single case, so the results are hard to accumulate. As a result, although the new work published each year is voluminous and dazzling, the core academic theory has seen no fundamental breakthrough compared with 10 or 20 years ago.
Over the past year, inspired by my advisor, I have dabbled in several other disciplines and come to appreciate how much there is to learn. I owe this to Alan. He usually does not give very specific guidance, but he often says, "You can look at some other areas; this problem may have been solved there, in another context, decades ago." At first I was not convinced: my literature survey in vision showed that it really was a new problem in vision. But when I read articles from those areas, I had to admire Alan's breadth of knowledge and his insight into similar problems in radically different fields.
I am not going to discuss one specific topic here. Instead, I suggest that friends working in vision find time to look at fields whose surface applications are completely different but whose core theory is the same.
o If you work on sampling or particle filtering, have a look at statistical physics. Physicists have applied the Monte Carlo method for decades and have accumulated deep experience; some of the approaches proposed as new in vision or learning were likely put forward there earlier, in a different form or under a different name.
o If you work on tracking, video, or optimization, have a look at control theory. Control science has studied dynamic systems (and other processes that change over time) extremely thoroughly. Alan originally worked in control, so I went and read up on dynamic system theory and control theory, and after a few chapters things suddenly became clear. I had spent a lot of time on my own working out the solution of a set of matrix differential equations, which under certain conditions takes the form of a Peano-Baker series; control theory had explored this in depth long ago. Many viewpoints and methods from control theory are also helpful for conduction (propagation) models and semi-supervised learning.
o If you work on graphical models and other statistical models, information theory is clearly essential, and I need not belabor that here. A related field called information geometry is also worth a look.
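For readers who have not met the Peano-Baker series mentioned above, here is a brief sketch. The post does not say which equations were being solved, so the setting below (a linear time-varying matrix equation with continuous A(t), and the symbols X, A, and \Phi) is an assumption chosen only to illustrate the standard form of the series.

```latex
% Assumed setting (not stated in the post): a linear time-varying system
%   dX/dt = A(t) X(t),  X(t_0) = X_0,  with A(t) continuous.
% Its solution is given by the state transition matrix \Phi, which the
% Peano-Baker series expresses as a sum of nested integrals:
\begin{align}
X(t) &= \Phi(t, t_0)\, X_0, \\
\Phi(t, t_0) &= I + \int_{t_0}^{t} A(s_1)\, ds_1
  + \int_{t_0}^{t} A(s_1) \int_{t_0}^{s_1} A(s_2)\, ds_2\, ds_1 + \cdots
\end{align}
% Special case: if A is constant, the series sums to the matrix exponential,
%   \Phi(t, t_0) = \exp\bigl(A (t - t_0)\bigr).
```

In the constant-coefficient case the series reduces to the familiar matrix exponential, which is one way to see why the result is textbook material in control theory.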
Only by comparison does the gap become clear. Many friends who work in vision are theory enthusiasts and like to fill their papers with formulas to show "theoretical depth". However, most of the formula derivations I have read in such papers are routine applications of standard rules, at a level not much above solving a classic textbook exercise. Admittedly, such derivations are an integral part of the research, and it is understandable to write them up in the paper. But presenting the derivation alone as the theoretical contribution falls short. A real theoretical contribution lies neither in the number of formulas in the text nor in the depth of the mathematics, but in whether it analyzes the inner principles of the problem deeply, discovers what others have not discovered, says what others have not said, and gives people new inspiration.
Newton's three laws are the foundation of classical physics. Seen from the standpoint of today's vision papers, they are merely summaries of and conclusions from experiments; apart from the second law, which has a simple multiplication formula (at its most sophisticated, a linear second-order ordinary differential equation with constant coefficients), there is not much mathematics in them. Nevertheless, the towering edifice of classical physics was built on them. This analogy to vision research may not be entirely appropriate, but it at least shows that the meaning of a theoretical contribution lies in stripping away the chaff: opening up the complex appearance to find the deep but simple law. Yet among the theoretical contributions that vision papers claim, how many follow this principle, and how many will endure once the ornament is washed away?
The theoretical foundation may be lacking, but vision is an applied discipline; if it were widely used, its significance would still be great. After decades of effort, vision does have many applications in everyday life, but it is dwarfed by other disciplines. Leaving aside long-established global industries such as communications and software engineering, even fields closely connected to vision, such as video coding, signal processing, and medical imaging, are far more widely applied than vision itself. Vision has failed to form a substantial industrial application base for two reasons: first, the problems it faces are hard, and a practical level of performance is not easy to reach; second, our research is largely divorced from reality.
I used to work on face recognition while studying in Hong Kong. It is a strongly application-oriented topic with a fairly long history, but friends who work on it know well how it performs under real conditions. Many people study this topic and many "new methods" are published; a paper whose reported recognition rate does not reach 90% is hardly fit to submit. But how far is performance on those few standard databases (even the latest, FRGC) from performance in practice? Much of the work assumes that the images are well aligned and the lighting is regular; can an algorithm that reaches 100% recognition under those conditions really be applied in an extremely complex environment? To this day, a large number of papers still tirelessly discuss variations on subspace methods, kernels, SVMs, and boosting, without ever asking whether the real obstacles to face recognition lie in such minor refinements at all.
At the same time, many tricks from practical engineering improve performance, but because they lack "theoretical depth" they are not considered presentable; even when they make it into a paper, they are glossed over in the experimental section. Yet if a method can solve the problem, whatever its origin, there must be a reason it works. If we could calm down and temporarily set aside the so-called elegant theories built on speculation, would it not be far more valuable to explore the principles behind the methods that really solve problems? Each such piece of work may be small, but it can truly accumulate, and over time it will bring real progress in some area of application. It may also offer the opportunity to distill some genuinely valuable theory.
Vision has been developing for decades since its inception, yet it is still in its infancy, and compared with many fields its foundations are weak and disorganized. For that very reason it is more challenging for the researchers involved, and every real contribution is particularly valuable. The way of scholarship lies not in chasing trends but in probing the underlying principles in depth. This is my first blog post of the new term, written from a new account; I offer it as mutual encouragement to every friend who loves research.

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.