Convolutional network training too slow? Yann LeCun: CIFAR-10 is solved, ImageNet is the target
Kaggle recently held a contest on the CIFAR-10 dataset, which contains 60,000 32x32 color images divided into 10 classes, collected by Alex Krizhevsky, Vinod Nair and Geoffrey Hinton.
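For readers who want to poke at the data themselves, here is a minimal sketch (assuming PyTorch and torchvision, which the article itself does not mention) that downloads CIFAR-10 and confirms the numbers above:

```python
# Minimal sketch: download CIFAR-10 and verify its advertised shape:
# 60,000 32x32 color images in 10 classes (50,000 train + 10,000 test).
from torchvision import datasets, transforms

train = datasets.CIFAR10(root="./data", train=True, download=True,
                         transform=transforms.ToTensor())
test = datasets.CIFAR10(root="./data", train=False, download=True,
                        transform=transforms.ToTensor())

print(len(train), len(test))  # 50000 10000
print(train[0][0].shape)      # torch.Size([3, 32, 32]): 3 channels, 32x32
print(train.classes)          # the 10 class names
```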
Many competitors used convolutional networks in the competition, and some achieved scores rivaling human performance on the classification task. In this series of blog posts we interview three of the competitors, as well as the father of convolutional networks, Yann LeCun, director of the Facebook AI Lab and professor at New York University.
A sample of the CIFAR-10 dataset
The following is an interview with Yann LeCun:
What other scientists have contributed significantly to the success of convolutional networks?
There is no doubt that the Neocognitron proposed by the Japanese scholar Kunihiko Fukushima was an inspiration. Although early forms of convolutional networks (convnets) did not borrow much from the Neocognitron, the versions we used (with pooling layers) were influenced by it.
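To make the pattern concrete, here is a minimal sketch of the convolution-plus-pooling motif that descends from the Neocognitron (written in PyTorch by way of illustration; modern max pooling stands in for Fukushima's original subsampling mechanism):

```python
# A convolutional layer detects local features; a pooling layer makes the
# response tolerant to small shifts in position, as in the Neocognitron.
import torch
import torch.nn as nn

x = torch.randn(1, 1, 28, 28)          # one grayscale image
conv = nn.Conv2d(1, 8, kernel_size=5)  # 8 learned 5x5 feature detectors
pool = nn.MaxPool2d(2)                 # halve resolution, keep strongest response

features = pool(torch.relu(conv(x)))
print(features.shape)                  # torch.Size([1, 8, 12, 12])
```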
An illustration of the interconnections between layers of the Neocognitron. Source: Fukushima K., "Neocognitron: a self-organizing neural network model for a mechanism of pattern recognition unaffected by shift in position", 1980.
Can you recall any "aha" moments or breakthroughs from the early days of convolutional network research?
I had been working on multilayer networks with local connections since around 1982 (though without a proper learning algorithm; backpropagation had not appeared yet). During my postdoc in 1988, I began experimenting with shared-weight networks.
The reason I did not work on convolutional networks any earlier is quite simple: a lack of software and data. When I arrived at Bell Labs, I had access to large datasets and fast computers (for the time), so I could try building a full convolutional network, and to my surprise it worked well (though it took two weeks of training).
How do you feel about the recent surge of interest in convolutional networks for object recognition? Did you expect it?
Yes, I knew it would happen; it was only a matter of time until datasets became large enough and computers powerful enough for deep learning algorithms to do better than human engineers at designing vision systems.
MIT held a "Frontiers of Computer Vision" workshop in August 2011, where I gave a talk entitled "In five years, everyone will learn their features (you might as well start now)"; David Lowe (the inventor of SIFT) expressed the same view.
A slide from LeCun Y.'s 2011 talk
But I was still amazed at how quickly the change took place and at how much better convolutional networks turned out to be than everything else; I had expected the transition to be more gradual. Likewise, I had expected unsupervised learning to play a bigger part than it has.
Your character recognition model was not just a simple classifier but a complete pipeline. Can you give an overview of the implementation issues your team faced?
To do this, we had to implement our own programming language and write our own compiler. Back in 1987/1988, Leon Bottou and I wrote a neural network simulator called SN, a LISP interpreter with a numerical library (multidimensional arrays, neural network graphs, ...). We used it at Bell Labs to develop the first convolutional networks.
In the early '90s, we wanted to use our code in products. At first we hired a development team to convert our Lisp code into C++, but the resulting system could not easily be improved (it was not a good research platform), so Leon, Patrice Simard and I wrote a compiler for SN and used it to develop the next-generation OCR engine.
That system was the first to integrate segmentation, convolutional networks and a graphical model, with the whole thing trained end-to-end.
The graphical model is called a "Graph Transformer Network". It is conceptually similar to a conditional random field (CRF) or a structured perceptron (both of which it predates), but it allows nonlinear scoring functions (CRFs and structured perceptrons allow only linear ones).
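Schematically (in my own notation, not the papers'), the distinction is:

```latex
% CRF / structured perceptron: the score of output y for input x is
% linear in a joint feature map \phi
s_{\mathrm{lin}}(x, y) = w^{\top} \phi(x, y)

% Graph Transformer Network: the score may be any differentiable,
% nonlinear function, e.g. a neural network with parameters \theta
s_{\mathrm{GTN}}(x, y) = f_{\theta}(x, y)
```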
The whole infrastructure was written and compiled in SN. The system was deployed in automatic check-reading machines in 1996, and by the end of the '90s it was reading 10% to 20% of all checks.
LeNet 5 Animated Demo
Compared with other methods, convolutional networks are very slow to train. How do you trade off experimentation against longer model training times? What does a typical development iteration look like?
Interestingly, in my experience the best large-scale learning systems always seem to take about three weeks to train, regardless of the task, the method, the hardware or the data.
I am not sure convolutional networks are "too slow": compared to what? They may be slow to train, but the alternative is engineers spending months of largely futile effort designing features by hand. Besides, once trained, convolutional networks are actually very fast to run.
In a practical application, no one really cares how long the training took; what matters is how fast it runs.
What are the most exciting recent papers on convolutional networks? What should we be paying attention to?
Over the past 20-plus years there have been so many ideas about convolutional networks and deep learning that no one cared about. It was always hard to publish papers, so many ideas were never tried or never published, or were published but completely ignored and quickly forgotten. Who remembers the first effective attempt at face detection with convolutional networks (as early as 1993, eight years before Viola-Jones)?
Source: Vaillant R., Monrocq C., LeCun Y., "An original approach for the localization of objects in images", 1993.
It is wonderful to see so many promising young people investing so much in this topic today and proposing so many new ideas and applications. The hardware/software infrastructure is getting better, and training large networks in hours or days is becoming possible, so people can try more ideas.
One idea I am very interested in is the "spectral convolutional network". This is a paper my colleagues at New York University published at ICLR 2014 on generalizing convolutional networks to arbitrary graphs (regular convolutional networks apply to 1D, 2D or 3D arrays, which can be seen as regular grids). There are still some practical problems to solve, but it opens a door to many more applications of convolutional networks to unstructured data.
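To make the idea concrete, here is a toy numpy sketch of my own (not the paper's code): take the eigenvectors of the graph Laplacian as a "Fourier basis" for the graph, and define convolution as multiplication by a filter in that basis.

```python
# Spectral filtering on a tiny 4-node graph: transform a node signal into
# the Laplacian eigenbasis, scale it by a filter, and transform back.
import numpy as np

A = np.array([[0, 1, 0, 1],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [1, 0, 1, 0]], dtype=float)   # adjacency of a 4-cycle
L = np.diag(A.sum(axis=1)) - A              # combinatorial graph Laplacian
eigvals, U = np.linalg.eigh(L)              # U's columns = the "Fourier basis"

signal = np.array([1.0, 0.0, 0.0, 0.0])     # a value on each node
g_hat = np.exp(-eigvals)                    # a smoothing filter (learnable in the paper)

filtered = U @ (g_hat * (U.T @ signal))     # "convolution" on the graph
print(filtered)
```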
MNIST digits mapped onto a sphere
Source: Bruna J., Zaremba W., Szlam A., LeCun Y., "Spectral Networks and Deep Locally Connected Networks on Graphs", 2013.
I am also very interested in the use of convolutional networks and recurrent networks for natural language understanding (following the groundbreaking work of Collobert and Weston).
Given that the human error rate on CIFAR-10 is estimated at about 6%, while Dr. Graham's results reached 4.47%, do you think CIFAR-10 has been solved?
The problem is about as solved as MNIST is, but frankly, people are now more interested in ImageNet than in CIFAR-10. In that sense CIFAR-10 is not a "real" problem, though it is not a bad benchmark for new algorithms.
What is needed for wider adoption of convolutional networks by industry? Easier training? Better software for building them?
Depending on what you mean, convolutional networks are already ubiquitous (or nearly so) in industry, including at Facebook, Google, Microsoft, IBM, Baidu, NEC, Twitter, Yahoo! and more.
Even so, virtually all of these companies have substantial research and development resources; training convolutional networks can still be challenging for small companies or companies with less advanced technology.
Training a convolutional network still takes a good deal of experience and time if you have not done it before, but simple, efficient open-source packages with fast back-ends should appear soon.
How far are we from the limits of convolutional networks? Will CIFAR-100 be "solved" next?
I don't think CIFAR-100 is a good test; ImageNet would be better.
Shallow networks can be trained to perform similarly to complex, well-engineered, deeper convolutional architectures. Do deep networks really need to be deep?
Yes, deep networks do need to be deep. Try training a shallow network to imitate a deep convolutional network trained on ImageNet: in theory a shallow network can come close to a deep one, but on complex tasks the shallow network falls far behind.
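As a hedged illustration of what "shallow versus deep" means here (my own toy construction, not the experiment from the literature being discussed): two networks can be matched in parameter count while differing completely in how many stages of feature composition they have.

```python
# A one-hidden-layer network and a four-layer network with roughly the
# same number of parameters; depth, not size, is what differs.
import torch.nn as nn

def n_params(m):
    return sum(p.numel() for p in m.parameters())

shallow = nn.Sequential(nn.Linear(3 * 32 * 32, 200), nn.ReLU(),
                        nn.Linear(200, 10))

deep = nn.Sequential(nn.Linear(3 * 32 * 32, 180), nn.ReLU(),
                     nn.Linear(180, 180), nn.ReLU(),
                     nn.Linear(180, 180), nn.ReLU(),
                     nn.Linear(180, 10))

print(n_params(shallow), n_params(deep))  # ~617k vs ~620k: comparable sizes
```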
Most of your published work is highly practical in nature. Is that deliberate, or a consequence of working in industry? How do you see the difference between theory and practice?
I have been working in academia since 2003, and I am still a part-time professor at New York University. Theory helps me understand things. It usually tells us what is possible and what is not, and points us toward the most appropriate way of doing things.
But sometimes theory limits thinking. Some people will not use a model because the underlying theory is not yet understood. Yet in general, a technique works well long before people know why it works well; the fuller theoretical understanding comes later.
If you restrict yourself to what is fully understood theoretically, you will be confined to overly simple methods.
Also, theory can sometimes blind us. For example, some people were dazzled by kernel methods because of the cute math attached to them. But, as I have said before, in the end kernel machines are shallow networks that perform "glorified template matching". There is nothing wrong with that (SVMs are a fine method), but they have dire limitations that we should be fully aware of.
Source: slides from LeCun Y. on learning hierarchies of invariant features, 2013
What do you make of the fact that convolutional networks work well even without a theory of why they work well? Do you generally favor making things work over theory? How do you balance the two?
I do not think there is a choice to be made between performance and theory. If there is performance, there will eventually be theory to explain it.
Besides, what kind of theory are you talking about? Generalization bounds? Convolutional networks have a finite VC dimension, so they are consistent and admit the classical VC bounds. What more do you want? Tighter bounds, like those for SVMs? No theoretical bound I know of is tight enough to matter in practice, so I don't quite understand the point. Sure, the generic VC bounds are not tight, but non-generic bounds (like those for SVMs) are only slightly less loose.
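For reference, one standard form of the classical VC bound being alluded to (quoted from statistical learning theory, not from the interview): with probability at least 1 - delta over a sample of size n, every hypothesis h in a class of VC dimension d satisfies

```latex
R(h) \;\le\; \hat{R}_n(h) \;+\; \sqrt{\frac{d\left(\ln\frac{2n}{d} + 1\right) + \ln\frac{4}{\delta}}{n}}
```

where R is the true risk and \hat{R}_n the empirical risk; the point above is that such bounds, while valid, are far too loose to guide practice.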
If what you want is convergence proofs (or guarantees), things are a bit more complicated. The loss function of a multilayer network is not convex, so the easy proofs that assume convexity do not apply. But we all know that in practice a convolutional network almost always converges to the same level of performance regardless of the starting point (if the initialization is done correctly). Theory suggests that there are many equivalent local minima and only a small number of "bad" ones, so convergence is rarely a problem.
What do you think of the hype around AI? What kinds of practice do you consider detrimental to the field (AI in general, and convolutional networks in particular)?
Hype around artificial intelligence is dangerous; it has already set the field back at least four times. Whenever I see hype, whether from the press, from startups looking for investors, from big companies looking for PR, or from academics looking for funding, I call it out loudly.
Of course there is a lot of hype around deep learning now, but I have not seen much exaggerated hype around convolutional networks themselves; it attaches more to things "cortical", "neural" and "neuromorphic". Unlike many of those things, convolutional networks actually deliver good results on useful tasks and are widely used in real industry applications.
Are there any interesting projects related to convolutional networks at Facebook? Can you tell us about them?
Face recognition: we use a convolutional network for face recognition, and also a very large convolutional network for image tagging.
A diagram of the architecture
Source: Taigman Y., Yang M., Ranzato M., Wolf L., "DeepFace: Closing the Gap to Human-Level Performance in Face Verification", 2014.
You recently described "four types of serious researchers". Which type do you classify yourself as?
I am type 3, with a little of types 1 and 4 mixed in. The four types are:
- Type 1: people who want to explain or understand learning (and perhaps intelligence) at a fundamental or theoretical level.
- Type 2: people who want to solve practical problems and have no interest in neuroscience.
- Type 3: people who want to understand intelligence and build intelligent machines, and have some interest in how the brain works.
- Type 4: people whose primary interest is understanding how the brain works, but who believe they need to build working machine models in order to understand it.
Do you have anything to say to the winners of the CIFAR-10 challenge? Any wishes for researchers and hobbyists working on convolutional networks, or for the CIFAR datasets and problems?
I was deeply impressed by the creativity and engineering skill of the participants, and it is gratifying to see them push the relevant science and technology forward.
For independent researchers and hobbyists, it is becoming easier to work on these methods and to attack large datasets. I think the successor to CIFAR-10 should be ImageNet-1K-128x128: the ImageNet classification task with 1,000 classes and images normalized to 128x128. I see several advantages (see the sketch after this list):
- the networks would be small enough for someone with a high-end rig to train in a reasonable time;
- the resulting networks could be put to practical use in useful applications (such as robot vision);
- the networks could run in real time on embedded platforms, such as a smartphone or an NVIDIA Jetson TK1.
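As a rough illustration of the scale being proposed (my own sketch in PyTorch, not an official baseline), here is a compact convolutional network that maps 128x128 color images to 1,000 class scores, small enough to train and run in the settings listed above:

```python
# A small convnet for hypothetical ImageNet-1K-128x128 input.
import torch
import torch.nn as nn

class SmallNet(nn.Module):
    def __init__(self, n_classes=1000):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),    # 128 -> 64
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),   # 64 -> 32
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(),  # 32 -> 16
            nn.Conv2d(128, 256, 3, stride=2, padding=1), nn.ReLU(), # 16 -> 8
            nn.AdaptiveAvgPool2d(1),                                # global average pool
        )
        self.classifier = nn.Linear(256, n_classes)

    def forward(self, x):
        return self.classifier(self.features(x).flatten(1))

net = SmallNet()
print(net(torch.randn(2, 3, 128, 128)).shape)  # torch.Size([2, 1000])
```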
Predictions on ImageNet
Source: Krizhevsky A., Sutskever I., Hinton G.E., "ImageNet Classification with Deep Convolutional Neural Networks", 2012.
The need for large amounts of labeled data can be a problem. What do you think of training on unlabeled data from the web, or of automatically labeling data using image search engines?
For tasks like video understanding and natural language understanding, we plan to use unsupervised learning methods. These modalities have a temporal dimension, which affects how we approach the problem.
Clearly, we need to design algorithms that can learn the structure of the perceptual world without being told the name of everything. Many of us have been working on this for years, even decades, but no one has a perfect solution yet.
What's your latest research about?
There are two answers to this question:
- projects I am personally involved in (enough to be one of the authors on the resulting papers);
- projects in preparation, projects where I am supporting others, and projects I have proposed at the conceptual level, in which I am not involved enough to appear as an author.
The first kind are mostly at NYU; the second kind are mostly at Facebook.
The broad areas include:
- unsupervised learning of invariant features;
- combining deep learning with structured prediction;
- unifying supervised and unsupervised learning;
- learning long-term dependencies;
- building learning systems with short-term memory;
- learning plans and sequences of actions;
- different ways of optimizing functions;
- going from representation learning to reasoning (read Leon Bottou's excellent position paper "From Machine Learning to Machine Reasoning");
- learning efficient inference;
- and many other topics.
Original link: "Convolutional Nets and CIFAR-10: An Interview with Yann LeCun" (translated by Wei Sun; edited by Zhou Jianding)