Computer vision nears an inflection point: Microsoft's algorithms are "one step away" from a breakthrough

Source: Internet
Author: User

In almost every sci-fi movie, artificially intelligent robots use computer vision to identify their surroundings, people, and objects dynamically, just as humans do. BB-8, the new robot in "Star Wars 7", even uses the movements of its head to produce a range of cute looks and expressions.

However, compared with thousands of years of human culture, computer science itself is only a few decades old. The human eye can easily identify all kinds of 3D scenes, but to a computer an image is just a collection of countless pixels. Getting the computer to analyze the captured pixels and "recognize" the objects in them has long been the central difficulty that computer vision must overcome.

As the research arm of one of the largest companies in the world, Microsoft Research, like other technology organizations, has long been trying to open the eyes of machines. At the end of 2015, Microsoft won all three major tasks in ImageNet, the world's top computer vision challenge. At the same time, on the hardware side, Microsoft developed a compression algorithm that adapts visual recognition models to intelligent terminals.

It is fair to say that a comprehensive breakthrough in computer vision is now only one step away.

Hundred-layer convolutional neural networks

One of the ImageNet Challenge tasks is to classify 1.2 million Internet images into 1,000 categories. Each image is manually labeled, and a recognition result counts as correct if the manual label appears among the computer's five best guesses. On this image set the human error rate is about 5.1%; so far only the algorithms of Google, Microsoft, and a few other teams have achieved error rates below 5%.
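The error rates quoted for this task are conventionally "top-5" error rates: a prediction counts as correct when the true label is among the model's five highest-scoring categories. The following is a minimal sketch of that metric, not the challenge's official evaluation code; the function name and the toy scores (8 classes instead of 1,000) are made up for illustration.

```python
def top5_error(scores_per_image, true_labels):
    """Fraction of images whose true label is NOT among the 5 highest-scoring classes."""
    misses = 0
    for scores, label in zip(scores_per_image, true_labels):
        # Rank class indices from highest score to lowest.
        ranked = sorted(range(len(scores)), key=lambda c: scores[c], reverse=True)
        if label not in ranked[:5]:
            misses += 1
    return misses / len(true_labels)


# Hypothetical scores for two images over 8 classes (ImageNet uses 1,000).
scores = [0.10, 0.50, 0.05, 0.02, 0.20, 0.03, 0.06, 0.04]

# Image 1 has true class 4 (ranked 2nd, a hit); image 2 has true class 3 (ranked last, a miss).
error = top5_error([scores, scores], [4, 3])
print(error)
```

With one hit and one miss, the sketch reports an error rate of one half.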

On December 10, 2015, the Visual Computing Group at Microsoft Research Asia again broke the record in the ImageNet computer recognition challenge, taking first place in all three major tasks (image classification, image localization, and image detection) and reducing the system's error rate to 3.57%.

In computer vision recognition, the convolutional neural network is the main recognition algorithm. Sun Jian, chief researcher in the Visual Computing Group at Microsoft Research Asia, explained that the 3.57% breakthrough came from his team's use of an unprecedentedly deep convolutional neural network, with up to 5 times as many layers as any previously successful algorithm.

Neural networks today generally have 20 to 30 layers; in this challenge, the Microsoft team's convolutional neural network had 152 layers. Sun Jian's team also adopted a new "residual learning" principle, which resolves the conflict between the number of layers and the accuracy of a neural network. Sun Jian said the "deep residual network" is very general and will greatly advance computer vision research.
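The core idea of residual learning is that each block outputs its input plus a learned correction, y = x + F(x), instead of y = F(x) alone. The toy sketch below is a simplification under stated assumptions and is not the team's 152-layer network: it uses tiny fully connected layers rather than convolutions, and all names and weights are hypothetical. It shows why stacking many such blocks is safe: a block whose weights are all zero simply passes its input through, whereas a plain stack of the same layers wipes the signal out.

```python
def relu(v):
    return [max(0.0, a) for a in v]


def matvec(w, v):
    """Multiply matrix w (a list of rows) by vector v."""
    return [sum(wij * vj for wij, vj in zip(row, v)) for row in w]


def plain_block(x, w1, w2):
    """Ordinary two-layer block: y = F(x) = W2 . relu(W1 . x)."""
    return matvec(w2, relu(matvec(w1, x)))


def residual_block(x, w1, w2):
    """Residual block: y = x + F(x), with a shortcut that carries x forward."""
    fx = plain_block(x, w1, w2)
    return [xi + fi for xi, fi in zip(x, fx)]


def zeros(n):
    return [[0.0] * n for _ in range(n)]


x = [1.0, -2.0, 3.0]
plain_out, residual_out = x, x
for _ in range(152):  # depth chosen to echo the 152 layers mentioned in the text
    plain_out = plain_block(plain_out, zeros(3), zeros(3))
    residual_out = residual_block(residual_out, zeros(3), zeros(3))

print(plain_out)     # the input signal is gone
print(residual_out)  # the input passes through unchanged
```

In a trained network the weights are of course not zero; the point is only that each block needs to learn a correction to its input rather than a whole new representation.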

Pushing visual recognition into smart terminals

The ImageNet Challenge continues to push participating research institutions to develop and optimize computer vision recognition algorithms. In real commercial applications, however, the algorithms produced by research must be deployed on smartphones and other intelligent terminals using engineering methods that are simple, easy to use, efficient, and reliable.

The Hardware Computing Group at Microsoft Research Asia and the Department of Electronic Engineering at Tsinghua University have worked on hardware acceleration for nearly 10 years. Starting in 2013, the two sides studied how to combine deep convolutional neural networks with intelligent hardware. The result is the A-eye vision chip, based on FPGA (programmable chip) technology.

FPGAs are valuable for the large-scale data processing that convolutional neural networks require in image recognition and compression. Microsoft's research shows that a high-end GPU has 2 to 3 times the image processing power of an FPGA, but an FPGA consumes only about 1/10 of the power of a high-end GPU, so several FPGAs can be combined to match the GPU's processing power at lower power consumption.
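The arithmetic behind this claim can be checked quickly. The sketch below takes the article's figures at face value (GPU throughput 2 to 3 times that of one FPGA, FPGA power about 1/10 of the GPU's) and assumes, as an idealization, that throughput adds up linearly across FPGAs; the function and constant names are hypothetical.

```python
import math
from fractions import Fraction

# Power drawn by one FPGA, relative to one high-end GPU (figure from the text).
FPGA_POWER = Fraction(1, 10)


def fpgas_to_match_gpu(gpu_speedup):
    """Return (FPGA count, total power relative to the GPU) needed to match one GPU.

    gpu_speedup is how many times faster the GPU is than a single FPGA.
    Assumes throughput scales linearly with the number of FPGAs.
    """
    count = math.ceil(gpu_speedup)
    return count, count * FPGA_POWER


low = fpgas_to_match_gpu(2)   # GPU is 2x one FPGA
high = fpgas_to_match_gpu(3)  # GPU is 3x one FPGA
print(low, high)
```

Under these assumptions, two to three FPGAs match one GPU while drawing roughly 20% to 30% of its power.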

Xu Ningyi, who leads the Hardware Computing Group at Microsoft Research Asia, said that the A-eye vision chip is essentially an attempt to embed existing computer vision recognition algorithms into an ordinary camera. Deep neural networks keep gaining layers and demanding more computing power, while the computing, storage, and power budgets of terminal devices such as smartphones are very limited. A-eye therefore uses software compression and custom hardware acceleration to make computer vision recognition algorithms fit ordinary intelligent terminals.

The A-eye vision chip consists of a software compression algorithm and an FPGA-based hardware implementation. It can compress a typical convolutional neural network model by a factor of nearly five and run it on an FPGA chip with an area of 2 cm x 2 cm. The A-eye vision chip is a general-purpose platform that works with any deep convolutional neural network. After compression, a visual recognition model that originally took 500M shrinks to about 100M.
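The article does not say how A-eye compresses models. Purely as a generic illustration, and not as a description of A-eye's method, one widely used family of model compression techniques is weight quantization: storing each weight as a small integer code instead of a 32-bit float. The minimal sketch below uses uniform 8-bit quantization; all names and the sample weights are hypothetical.

```python
def quantize(weights, bits=8):
    """Map float weights to integer codes in [0, 2**bits - 1] on a uniform grid."""
    lo, hi = min(weights), max(weights)
    levels = (1 << bits) - 1
    scale = (hi - lo) / levels if hi > lo else 1.0
    codes = [round((w - lo) / scale) for w in weights]
    return codes, lo, scale


def dequantize(codes, lo, scale):
    """Recover approximate float weights from the integer codes."""
    return [lo + c * scale for c in codes]


weights = [-1.0, -0.5, 0.0, 0.25, 1.0]  # hypothetical layer weights
codes, lo, scale = quantize(weights)
restored = dequantize(codes, lo, scale)

# Worst-case rounding error is half a grid step.
max_error = max(abs(w - r) for w, r in zip(weights, restored))
# Storage shrinks from 32 bits to 8 bits per weight.
ratio = 32 / 8
print(codes, max_error, ratio)
```

Quantizing 32-bit weights to 8 bits alone gives a 4x reduction; reaching the roughly 5x figure quoted above would require other or additional techniques that the article does not describe.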

General-purpose visual recognition is still "one step away"

Today's cameras can record and store images; embedding an A-eye vision chip in the camera gives it the ability to understand what it sees. High-performance intelligent vision chips based on A-eye technology can be used widely in intelligent security, infant and elderly care, driverless cars, unmanned aerial vehicles, and other areas that require computer vision. Security cameras in a city, if equipped with an A-eye chip, could search for "a person wearing a yellow shirt and carrying a black backpack" without missing a single frame.

So, with hundred-layer neural networks and compression technology for intelligent terminals, can intelligent terminals now match human visual recognition? Not yet. Xu Ningyi said that A-eye technology makes it possible to bring visual recognition to smartphones or webcams on lower-cost, smaller general-purpose chips. But because a deep neural network only "recognizes" a particular class of image data, there is still some way to go before machines achieve general image recognition like that of the human eye.

A deep neural network algorithm, or machine learning algorithm, is in itself only a mathematical model of a particular kind of image data; that is, a model is built to suit each application scenario. Take flower identification as an example: in cooperation with the Chinese Academy of Sciences, Microsoft Research Asia trained a deep neural network on 2.6 million flower photos to obtain a general mathematical model for flowers. Based on this model, any flower image scanned by a camera can be identified, naming the flower precisely along with its kingdom, phylum, class, order, family, genus, and species.

In other words, current visual recognition algorithms and compression techniques are just a tool for building mathematical models. Feed a huge number of animal pictures into this tool, and the model that comes out the other end can identify only animal pictures, and so on. Microsoft's research results can therefore be used to develop commercially viable computer vision solutions for specific scenarios.

So, how far are we from general-purpose computer vision intelligence? Xu Ningyi says this vision may not be hard to achieve. To realize the idea of "seeing the world to know what the world is like", one possible approach is to generate a semi-general image recognition model through a search engine, and then combine this semi-general model with the specific images to be identified to quickly obtain a customized, scenario-specific model.

Judging from the current research progress of Microsoft and other companies, commercially usable, general-purpose, real-time computer vision is approaching an inflection point. There is another driving force that cannot be ignored: global 5G research and development was launched in early 2016, mobile network bandwidth will soon no longer be a problem, and the performance of smartphones and other intelligent terminals is also improving rapidly. (Text/Ningchuang; this article first appeared on Titanium Media, Itvalue)

This article is from the "Cloud Technology Age" blog, please be sure to keep this source http://cloudtechtime.blog.51cto.com/10784015/1737783
