On a frosty December 20 in Shanghai, in a fifth-floor conference room at 1189 Wuzhong Road, Professor Alan Yuille stood between a projection screen and a whiteboard, chin in hand, deep in thought. In front of him sat more than 40 admiring students with computer-related backgrounds, selected from Tsinghua University, Zhejiang University and other universities. They admired him because Professor Alan earned his Ph.D. in theoretical physics under Stephen Hawking and is also a leading scholar in the field of computer vision.
At the invitation of his student Leo Zhu, the founder of Yitu Technology, he had traveled all the way to China to teach a two-day computer vision course. With his full head of silver hair, deep-set eyes and pale face, Professor Alan looks like an ordinary elderly white man, but he stands apart in the depth and breadth of his knowledge: he has a deep background in mathematics, theoretical physics, computer science, psychology, psychiatry and behavioral biology.
After that, Professor Alan turned his interest to artificial intelligence, focusing on the branch known as computer vision. He worked at the MIT Artificial Intelligence Laboratory and in Harvard University's computer department, and currently serves in the UCLA Department of Statistics, where he directs UCLA's center for vision recognition and machine learning. For more than 30 years, he has been a leading expert on computer vision in both academia and industry.
Professor Alan came partly to introduce Chinese students to computer vision and the state of the industry, and partly to support his student Leo's entrepreneurial project, Yitu, a startup that focuses on visual understanding, provides information acquisition and human-computer interaction based on image understanding, and is dedicated to building the future of machine vision.
See the world from Mukalin
What is computer vision? What did Professor Alan say in the course of the two days? At the application level, what can computer vision technology do?
In a nutshell, computer vision aims to give computers a human-like ability to process visual information: to let machines see as humans do, so that learning algorithms bring a computer's understanding of images close to a person's and allow it to analyze visual information in depth. Once computers achieve even a basic understanding of images and video, computer vision can help people overcome their own limits and improve their lives.
Unlike computers, humans can see and understand a scene the moment they open their eyes, because at least half of the neurons in the human cortex are involved in visual tasks. Computers and human brains operate under very different physical and biological constraints. Even though the ideal for computer vision is to come closer to human intelligence, in a controlled environment a computer system can already perform a well-defined task better than a human.
The open tasks in computer vision generally include object detection, face recognition, human behavior recognition and scene understanding. These are also the tasks that applied technology is trying to solve. Object recognition is the core problem of computer vision research, but a computer that has not learned anything does not know what to look for and obviously cannot understand what it sees, so a system is needed to teach computers to recognize objects.
In a traditional object recognition system, a computer learning a particular type of digital image first detects the image's salient features, using the edge detection and image segmentation techniques that Professor Alan covered in class. If the system needs to recognize a human face, for example, it looks for the edges of features such as the eyes, nose and mouth, and then determines the spatial relationships between them.
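To make the idea of edge detection concrete, here is a minimal sketch in Python. It is an illustration only, not the method taught in the course or used by any system in this article: the function name `detect_edges`, the tiny test image and the threshold value are all assumptions chosen for the example. It marks a pixel as an edge when the brightness changes sharply between its neighbors.

```python
def detect_edges(image, threshold):
    """Return a 0/1 grid marking pixels with a strong intensity gradient.

    `image` is a list of rows of grayscale values. The function name and
    the thresholding rule are illustrative assumptions, not a real API.
    """
    height, width = len(image), len(image[0])
    edges = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            # Difference between the two horizontal and the two vertical
            # neighbors; at the border the pixel itself stands in for the
            # missing neighbor.
            left = image[y][max(x - 1, 0)]
            right = image[y][min(x + 1, width - 1)]
            up = image[max(y - 1, 0)][x]
            down = image[min(y + 1, height - 1)][x]
            gx = right - left
            gy = down - up
            if (gx * gx + gy * gy) ** 0.5 >= threshold:
                edges[y][x] = 1
    return edges


# A dark region on the left meets a bright region on the right.
image = [
    [0, 0, 0, 9, 9, 9],
    [0, 0, 0, 9, 9, 9],
    [0, 0, 0, 9, 9, 9],
]
edges = detect_edges(image, threshold=5)
for row in edges:
    print(row)
```

Only the two columns on either side of the brightness jump are marked, which is the kind of boundary a traditional system would then use to locate the eyes, nose or mouth.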
Under this approach, a computer system dealing with tens of thousands of objects becomes unwieldy and bloated. Whenever a new object is added to the system's library, its important parts have to be determined from scratch. And although an object's components are intrinsic to it, the object looks different from different angles, so the computer has to keep re-examining edges to determine spatial positions, which takes up a large amount of storage.
In 2010, Professor Alan and Leo Zhu adopted a new approach to the problem. The system they developed uses a recursive, tower-like hierarchy to represent the structure of an object. It does not need to be told in advance which features to look for. Instead, it first identifies fine details, combines these low-level structures into slightly more complex shapes, and then works out how those shapes combine into higher-level parts. The result is a tower in which the top layer represents a model of the whole object.
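The tower structure can be pictured with a small Python sketch. This is a simplified illustration under stated assumptions, not the researchers' system: the part names and the `HIERARCHY` table are invented for the example and written by hand, whereas the system described above learns its parts without being told what to look for. The sketch only shows the recursive idea that a part counts as found when all of its lower-level parts are found.

```python
# Hypothetical hand-written hierarchy: each part lists the lower-level
# parts it is built from. Names not listed as keys are low-level details.
HIERARCHY = {
    "face": ["eye_pair", "nose", "mouth"],
    "eye_pair": ["left_eye", "right_eye"],
    "left_eye": ["arc", "dot"],
    "right_eye": ["arc", "dot"],
    "nose": ["vertical_bar"],
    "mouth": ["horizontal_bar"],
}


def detect(part, observed, hierarchy=HIERARCHY):
    """Recursively check whether `part` can be assembled from `observed`."""
    if part not in hierarchy:
        # Bottom of the tower: a low-level detail is either seen or not.
        return part in observed
    return all(detect(child, observed, hierarchy) for child in hierarchy[part])


def depth(part, hierarchy=HIERARCHY):
    """Number of layers between `part` and the low-level details."""
    if part not in hierarchy:
        return 0
    return 1 + max(depth(child, hierarchy) for child in hierarchy[part])


details = {"arc", "dot", "vertical_bar", "horizontal_bar"}
print(detect("face", details))            # all parts can be assembled
print(detect("face", details - {"dot"}))  # the eyes cannot be assembled
print(depth("face"))
```

Because every layer reuses the layers beneath it, details such as `arc` and `dot` are stored once and shared by both eyes, which hints at why a hierarchy can be more compact than describing each object from scratch.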
Based on these research results, Leo Zhu realized that computer vision was gradually moving from theory to application and would enter a period of explosive technical growth in the next few years. In 2012, he had the idea of starting a company.
Yitu's machine vision
With Professor Alan's support, Leo Zhu and his friend of many years Lin Chenxi (a former technical director at Aliyun and a member of the first team from Asia to win the ICPC world championship in collegiate programming) founded Yitu, a startup focused on image understanding, and based it in Shanghai.
At present, Yitu's main products focus on face recognition and object recognition. Although people hope computer vision will one day approach human intelligence, in face recognition the computer already outperforms the human brain. Sina Technology had the chance to experience this at Yitu's offices, in what felt like being intellectually crushed.
To help laypeople understand how a computer performs face recognition, Yitu built a game with 20 questions. Each question shows a head shot of a person taken in a real-world setting and asks the participant to pick that person from photos of five similar-looking people. Because the correct photo is taken from a different angle and the distractors look so similar, choosing is extremely difficult.
The results show that people choose correctly about 50% of the time, while the computer is right more than 90% of the time. Those who tried the nerve-racking game joked that it would be a godsend for face-blind friends. But this is only a demonstration. The real product can determine a person's identity accurately and quickly, and can be used in security.
At Yitu itself, a high-tech company, the access control system already uses face recognition to match a person's identity quickly. Sina Technology observed that employees arriving at the door take out a phone registered for authentication and snap a selfie from any angle; if the face matches, the door opens automatically. Yitu also has a dedicated demonstration room. When a person approaches the doorway, the camera in the room detects them, locks onto the face in real time, and then searches the system database for a matching face image, a process that takes only about 3 to 5 seconds.
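The matching step can be sketched in a few lines of Python. This is a generic illustration, not the company's algorithm: it assumes each face has already been reduced to a short list of numbers (a feature vector), and the names `match_face` and `database`, the sample vectors and the 0.9 threshold are all hypothetical. The sketch compares a new face against every registered face and accepts the closest one only if it is similar enough.

```python
import math


def cosine_similarity(a, b):
    """Similarity of two feature vectors, close to 1.0 when they align."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def match_face(query, database, threshold=0.9):
    """Return the best-matching registered name, or None if nobody is close.

    `database` maps a name to a feature vector. How such vectors are
    extracted from a photo is outside this sketch.
    """
    best_name, best_score = None, -1.0
    for name, vector in database.items():
        score = cosine_similarity(query, vector)
        if score > best_score:
            best_name, best_score = name, score
    return best_name if best_score >= threshold else None


# Hypothetical registered staff and two faces seen at the door.
database = {"alice": [0.9, 0.1, 0.3], "bob": [0.1, 0.8, 0.5]}
print(match_face([0.88, 0.12, 0.31], database))  # close to alice's vector
print(match_face([0.0, 0.0, 1.0], database))     # not close to anyone
```

The threshold is the design choice that matters for a door: set too low, strangers get in; set too high, employees photographed from an unusual angle are turned away.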
Face recognition has a wide range of uses, including helping public security systems determine the identity of suspects and helping security systems identify suspicious people. Yitu's technology is already used by the Jiangsu public security system and will be extended to other parts of the country. The initial rollout had its twists and turns. According to Yitu partner Lin Chenxi, people at first saw nothing new in it, because face recognition had long been in use, although its accuracy had not been mature enough to be practical.
At first, Leo Zhu and Yitu co-founder Lin Chenxi had machine vision technology but had not identified the industry's pain points. They found application scenarios and a direction gradually, in the course of promoting the product, and won early approval from the Jiangsu Province public security system. Traditional face recognition at public security bureaus was about 40% accurate, and vehicle identification was limited to reading the license plate; it could not pin down the brand, model, purchase date or owner. To solve this problem, Leo Zhu and Lin first had to collect data, so they mobilized their employees to photograph cars on the street every day and build a database.
Chen Binhua, the deputy director in charge of technology at the Suzhou Municipal Public Security Bureau, told them: "If your vehicle brand recognition reaches 70% and you can determine the model, we will consider using your product." Yitu's staff worked day and night for three months to crack this problem, and in the end recognition accuracy for the Santana alone exceeded 90%.
With this system, called "Dragonfly Eye," Suzhou's public security bureau can accurately recognize license plates and identify vehicles, and can automatically spot vehicles on the road that are using cloned plates, a first among public security bureaus in China. After a year of data accumulation and system improvement, screening for one such vehicle, which used to take at least 3 to 4 hours, now takes little more than 10 minutes.
Not long ago, Suzhou police used the system as a breakthrough to quickly break up a criminal gang that specialized in stealing taxi roof lights and meters, greatly improving their efficiency. Yitu can now identify the brand, model and purchase date of every vehicle captured by an operating camera, and check whether its license plate is genuine. Public security systems in Fujian, Chengdu and elsewhere have since begun cooperating with Yitu.
Compared with face recognition, however, recognizing simple rigid objects has a relatively low technical threshold, because such objects do not deform across angles and states the way a human face does. On the harder problem of faces, Yitu has already achieved static face comparison at a scale of 100 million people.
In face recognition, Yitu and the Suzhou Municipal Public Security Bureau jointly developed a static portrait comparison system. In July this year, Yitu ran a batch comparison of a photo library of fugitives against the Suzhou public security portrait library and found leads on 25 fugitives who had been active in Suzhou. Yitu's face recognition products have also been used successfully in the security systems for the Youth Olympic Games and the Zhuhai Airshow.
Yitu's goal is not only to develop security products but to build an internationally leading research and development platform for computer vision. In the future, its image recognition will extend from faces and objects to the human body and clothing, for example helping users identify the brand of the clothes someone is wearing. This will draw on augmented reality technology and requires a seamless combination of hardware and software.
Practitioners themselves do not know what computer vision will be able to do next. Technology is evolving faster than anyone imagined, and it is too early to set limits.
From academia to industry
Computer vision research has been in a critical stage of rapid development over the past five years, academic research has made great progress in the past two years, and applications will enter a stage of concentrated growth in the next year or two.
According to a National Science Foundation white paper, computer vision emerged in the 1960s and 1970s. Since 2010, the field has faced two major problems. First, it is closely related to computer science, engineering, mathematics, statistics, psychology and neuroscience, yet research across these disciplines is often fragmented. Second, research in the field is conducted in an unstructured manner, so ties between academia and industry are not close and there is little interaction.
Geographically, according to the white paper, computer vision research in 1991 was dominated by the United States, with limited activity in Europe and little in Asia. Over the past 20 years, computer vision in the United States has developed steadily, while Europe has seen an undeniable expansion, and in recent years Asia has made great breakthroughs. To a large extent, growth in Asia and Europe has been driven by strong financial support. Even in the United States, most researchers were born abroad.
The development of computer vision has benefited mainly from advances in computing, sensing technology and mathematics. In the view of computer vision expert E. Adelson, progress is coming because "people are learning how to use applied mathematics and engineering to solve visual problems. People are becoming better at control theory, optimization, signal processing and so on."
Today academia and industry are increasingly closely connected, and the gap between theory and practice is narrowing. Professor Alan said that in the past five years especially, much of the work has become more and more practical, and technology companies such as Microsoft, Facebook, Google, Amazon and Baidu are all active in artificial intelligence. "I am very happy to see this progress, because in the end we want it to be turned into practical products."
This is also why Yitu chose this moment to move into applied computer vision. Leo Zhu said: "No one really knows the distance between theory and reality. The next two years will be a period of relatively explosive technical growth. That is what we foresee, and it is why we are doing this. We used to know that computer vision was the future, but we didn't know when it would arrive. Then we worked through a specific derivation, so we decided to start in 2012, and now we're doing it."
As Professor Alan's student, Leo Zhu naturally has a strong academic background, yet he did not stay in research and instead returned to China to start a business. He hopes to make computer vision his lifelong career. A university researcher, however, can pass on knowledge only through his own students, whereas returning home to start a company lets him turn research into practical applications and advance related fields in China, and that is the path he prefers.
Professor Alan says he felt duty-bound to support Leo Zhu's choice: "It's not quite the same as it was 10 years ago. Ten years ago many students wanted to be professors, and in the past five years that has changed. My best students want to start businesses. Computer vision is a good area, and now is a good time." He explains that past academic achievements were still a long way from solving practical problems, but that academic research has since made significant progress and will play a huge role in practice.
Professor Alan and Hawking
Professor Alan sensed the need for academia and industry to advance each other, as well as China's enormous growth potential, so he came to Shanghai to exchange ideas with Chinese students. He lectured continuously over the two days, and said it was his first teaching experience of this kind.
Although the lectures left him parched, Professor Alan still took questions from the students, and he remained patient, methodical and respectful. Beyond the theoretical framework of computer vision, he recalled his own academic career, discussed research methods with the students and shared experiences from his years in research.
While studying under Hawking, Professor Alan worked on relativity and quantum physics; the former describes structure at the macroscopic scale, the latter at the microscopic. He was the second of Hawking's students to take up the subject, in the hope of unifying the two theories. "It is a rather abstract subject, and it is really hard to put the two theories together. Even now, 30 years later, no one seems to have been particularly successful."
In an interview with Sina Technology, Professor Alan shared some stories about his mentor, Stephen Hawking. Hawking, he said, was a very humorous person who would sometimes steer his wheelchair in circles around the room. As his student, Professor Alan was in fairly close contact with Hawking and sometimes helped him with things, including feeding him. After Hawking became more famous, many scientists came to visit, and Professor Alan would listen in. He also often saw Hawking's family, had lunch with them and looked after his children.
In the second half of this year, the film "The Theory of Everything" was released, telling the story of Hawking's life before the onset of Lou Gehrig's disease and of his relationship with his wife. Professor Alan said: "It is based on his wife's book, and the story seems quite true to life. The actor who plays Hawking in particular looks and sounds like him; I felt that was Hawking on the screen. But the other characters are not like the people I knew, so it felt a little strange, like a group of strangers doing things around him." He laughed as he said it.
This year, Hawking has also repeatedly sounded the alarm about the development of artificial intelligence. He warns that humanity faces a threat from intelligent technology, and that as technology begins to think for itself and adapt to its environment, mankind will face an uncertain future. "If we look further ahead, there is almost no limit to the technology," Hawking said. "Even though we may be facing the best or the worst of times in human history, all of us should ask ourselves what we should do to make the ultimate difference."
Professor Alan's view: "I remember that about 30 years ago, when Hawking became the Lucasian Professor (a chair once held by Newton), he gave a speech about theoretical physics being solved, and his conclusion was that machines might do away with the physicists. I think Hawking may have had this concern for 30 years; it is not only the recent rise of AI that worries him. Maybe I'll write to him next week and ask him about it."
As for the future of computer vision, Professor Alan believes the ultimate goal is, of course, to build an intelligent system that understands the world as a human does, or perhaps even surpasses humans and understands the world better than they do. Because so many of the brain's neurons are devoted to processing vision, studying vision is a way to understand the human brain. In the future, computers will help humans understand both the brain and the world we live in.
(Responsible editor: Mengyishan)