Since 2006, a topic in machine learning called deep learning has received widespread attention in academia, with its algorithms achieving great success in both theory and application; today it has become a boom in Internet big data and artificial intelligence. By processing input data step by step through a hierarchical structure resembling that of the human brain, deep learning extracts features from the bottom layer up to higher levels, so that the mapping from low-level signals to high-level semantics can be well established. In recent years, Google, Microsoft, IBM, Baidu, and other high-tech companies that hold big data have invested heavily in deep learning research and development and have made significant progress in speech, images, natural language, and online advertising. Measured by its contribution to practical applications, deep learning may be the most successful research direction in machine learning in the past decade.
In June 2012, the New York Times disclosed Google's Google Brain project, which attracted widespread public attention. The project, led by the well-known Stanford University machine learning professor Andrew Ng and the world-class expert in large-scale computer systems Jeff Dean, used a parallel computing platform with 16,000 CPU cores to train a deep neural network (DNN) machine learning model, achieving great success in the fields of speech recognition and image recognition.
In November 2012, Microsoft publicly demonstrated a fully automatic simultaneous interpretation system at an event in Tianjin, China. The speaker gave a speech in English, and the back-end computer automatically performed speech recognition, English-to-Chinese machine translation, and Chinese speech synthesis, all in one smooth flow. According to reports, the key technology behind it is also DNN, or deep learning (DL).
In January 2013, at the annual meeting of Baidu, China's largest Internet search engine company, founder and CEO Li Yanhong announced that Baidu Research would be established, with its first focus on deep learning, and that an Institute of Deep Learning (IDL) would be set up. This is the first research institute Baidu has established since its founding more than 10 years ago. In April 2013, MIT Technology Review magazine ranked deep learning at the top of its list of 2013 breakthrough technologies.
Why is deep learning so highly valued by academia and industry? What scientific and engineering problems does deep learning research and development face? How will the advances in science and technology brought about by deep learning change people's lives? This article briefly reviews the development of machine learning over the past 20 years and introduces deep learning's yesterday, today, and tomorrow.
Two waves of machine learning: from shallow learning to deep learning
Before explaining deep learning, we need to understand what machine learning is. Machine learning is a branch of artificial intelligence, and in many cases it has almost become synonymous with artificial intelligence. Simply put, machine learning algorithms let a machine learn rules from a large amount of historical data in order to intelligently identify new samples or predict the future. Since the late 1980s, the development of machine learning has gone through two waves: shallow learning and deep learning. It should be pointed out that dividing the history of machine learning into stages is a matter of perspective, and different conclusions can be drawn along different dimensions. Here we look at it from the hierarchical depth of machine learning models.
The first wave: shallow learning
In the late 1980s, the invention of the back propagation (BP) algorithm for artificial neural networks brought hope to machine learning and set off a craze for machine learning based on statistical models, a craze that continues to this day. It was found that, using the BP algorithm, an artificial neural network model can learn statistical rules from a large number of training samples and thereby predict unknown events. This statistics-based approach to machine learning showed superiority in many respects over earlier systems based on manually crafted rules.
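As a concrete illustration of the BP idea, the following minimal NumPy sketch trains a one-hidden-layer network on the XOR problem. The architecture, learning rate, and iteration count are arbitrary choices made for this example, not details taken from the article.

```python
import numpy as np

# Toy back-propagation (BP) sketch: a one-hidden-layer network on XOR.
# All sizes and hyperparameters here are illustrative, not canonical.
rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0.0], [1.0], [1.0], [0.0]])

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

W1 = rng.normal(0.0, 1.0, (2, 4)); b1 = np.zeros(4)   # input -> hidden
W2 = rng.normal(0.0, 1.0, (4, 1)); b2 = np.zeros(1)   # hidden -> output

for _ in range(5000):
    # Forward pass: compute activations layer by layer.
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)
    # Backward pass: propagate the output error back through the layers.
    d_out = (out - y) * out * (1.0 - out)   # squared-error grad at output pre-activation
    d_h = (d_out @ W2.T) * h * (1.0 - h)    # error propagated to the hidden layer
    # Gradient-descent updates.
    W2 -= h.T @ d_out; b2 -= d_out.sum(axis=0)
    W1 -= X.T @ d_h;   b1 -= d_h.sum(axis=0)

out = sigmoid(sigmoid(X @ W1 + b1) @ W2 + b2)
loss = float(np.mean((out - y) ** 2))
```

The backward pass is the heart of BP: each layer's error signal is computed from the layer above it, which is what makes training multi-layer networks possible at all.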
The artificial neural network of this period is also called the multi-layer perceptron (MLP), but because multi-layer networks were difficult to train, most models in actual use were shallow ones with only a single layer of hidden nodes.
In the 1990s, various shallow machine learning models were proposed, for example the Support Vector Machine (SVM), Boosting, and maximum-entropy methods such as Logistic Regression (LR). The structures of these models can basically be seen as having one layer of hidden nodes (e.g., SVM, Boosting) or no hidden nodes at all (e.g., LR). These models achieved great success in both theoretical analysis and application. In contrast, because multi-layer artificial neural networks were difficult to analyze theoretically and their training required considerable experience and skill, research on them was relatively quiet during this period.
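To make the "no hidden nodes" point concrete, here is a minimal logistic regression (LR) trained by gradient descent: the input connects directly to the output through a single linear map plus a sigmoid. The synthetic, linearly separable dataset and the hyperparameters are invented purely for illustration.

```python
import numpy as np

# Minimal logistic regression: a shallow model with no hidden layer.
# Data and hyperparameters are made up for this sketch.
rng = np.random.default_rng(1)
n = 200
X = rng.normal(size=(n, 2))
y = (X @ np.array([2.0, -3.0]) > 0).astype(float)   # linearly separable labels

w = np.zeros(2)
b = 0.0
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))          # predicted probability
    w -= 0.5 * (X.T @ (p - y)) / n                  # gradient of the log loss
    b -= 0.5 * float(np.mean(p - y))

acc = float(np.mean(((X @ w + b) > 0) == (y == 1.0)))
```

Because the decision boundary is a single hyperplane, such a model works well when the data are (near-)linearly separable in the chosen feature space, which is why feature engineering mattered so much in the shallow-learning era.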
The rapid development of the Internet since 2000 has placed great demands on intelligent analysis and prediction over big data, and shallow learning models have achieved great success in Internet applications. The most successful applications include search advertising systems (such as click-through-rate estimation in Google's AdWords and Baidu's Fengchao system), web search ranking (such as Yahoo's and Microsoft's search engines), spam filtering systems, and content-based recommendation systems.
The second wave: deep learning
In 2006, Geoffrey Hinton, professor at the University of Toronto and a leading figure in machine learning, together with his student Salakhutdinov, published an article in the top academic journal Science that opened a wave of deep learning in academia and industry. The article delivered two main messages:
An artificial neural network with many hidden layers has excellent feature-learning ability, and the learned features give a more essential characterization of the data, which benefits visualization and classification.
The difficulty of training a deep neural network can be effectively overcome by layer-wise pre-training; in the article, this layer-by-layer initialization is achieved through unsupervised learning.
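The greedy layer-wise idea can be sketched as follows: each layer is trained on unlabeled data to reconstruct its own input, and its codes become the training data for the next layer. Note that this is a simplified autoencoder variant written for illustration (the Science article itself used restricted Boltzmann machines), and all sizes, data, and learning rates here are arbitrary.

```python
import numpy as np

# Sketch of greedy layer-wise unsupervised pre-training with tied-weight
# autoencoders. Simplified stand-in for the RBM-based procedure in the
# Science 2006 paper; all sizes and rates are illustrative.
rng = np.random.default_rng(0)
X = rng.normal(size=(256, 20))              # unlabeled data: 256 samples, 20 dims

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_autoencoder(data, n_hidden, lr=0.1, epochs=200):
    """Train one tied-weight autoencoder layer; return its encoder weights."""
    n_in = data.shape[1]
    W = rng.normal(0.0, 0.1, (n_in, n_hidden))
    for _ in range(epochs):
        h = sigmoid(data @ W)               # encode
        recon = h @ W.T                     # decode (tied weights, linear output)
        err = recon - data
        # Gradient of the reconstruction MSE w.r.t. the shared weights W:
        dW = data.T @ (err @ W * h * (1 - h)) + err.T @ h
        W -= lr * dW / len(data)
    return W

# Greedy stacking: each layer is trained on the previous layer's codes.
layer_sizes = [12, 6]
weights, inp = [], X
for size in layer_sizes:
    W = train_autoencoder(inp, size)
    weights.append(W)
    inp = sigmoid(inp @ W)                  # codes become the next layer's input

codes = inp   # `weights` would now initialize a deep network for fine-tuning
```

After this unsupervised initialization, the stacked weights serve as a starting point for supervised fine-tuning of the whole network, which is what lets training escape the poor local solutions that plagued randomly initialized deep networks.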
Deep learning has continued to heat up in academia since 2006. Stanford University, New York University, and the University of Montreal in Canada have become centers of deep learning research. In 2010, the US Department of Defense's DARPA program funded deep learning projects for the first time; the participants included Stanford University, New York University, and NEC Laboratories America. An important basis of support for deep learning is that the brain's nervous system does have a rich hierarchical structure. One of the most famous examples is the Hubel-Wiesel model, which won the Nobel Prize in Physiology or Medicine for revealing the working mechanism of the visual nerves.
Apart from this bionic perspective, theoretical research on deep learning is still in its infancy, but in applications it has shown tremendous energy. Since 2011, speech recognition researchers at Microsoft Research and Google have used DNN technology to reduce speech recognition error rates by 20%-30%, the biggest breakthrough in the field in more than 10 years. In 2012, DNN technology achieved astonishing results in image recognition, reducing the error rate in the ImageNet evaluation from 26% to 15%. In the same year, DNN was also applied to a pharmaceutical company's drug activity prediction problem and achieved the world's best result, an achievement that was reported by the New York Times.
As described at the beginning of the article, high-tech companies that hold big data, such as Google, Microsoft, and Baidu, are today rushing to seize the technical commanding heights of deep learning, precisely because they have seen that, in the era of big data, more complex and more expressive deep models can deeply reveal the complex and rich information carried in massive data and make more accurate predictions about future or unknown events.
Big data and deep learning
There has long been a popular view in industry that, under big data conditions, simple machine learning models are more effective than complex ones; for example, the simplest linear models are heavily used in many big data applications. But the recent dramatic progress of deep learning has prompted us to rethink this view. In short, under big data conditions, perhaps only a model that is sufficiently complex, or sufficiently expressive, can fully exploit the rich information hidden in massive data. It is now time to re-examine "big data + simple model": using more powerful deep models, we may be able to mine more valuable information and knowledge from big data.
To understand why big data needs deep models, consider an example. Speech recognition is already a big data machine learning problem: its acoustic modeling part alone typically faces training samples numbering in the billions.
In a speech recognition experiment at Google, it was found that a trained DNN's prediction error on the training samples was essentially the same as on the test samples. This is quite counter-intuitive, because a model's prediction error on the training samples is usually significantly smaller than on the test samples. The only explanation is that big data carries so much information that even a high-capacity, complex model such as a DNN remains under-fitted, to say nothing of the traditional GMM acoustic model. In this example, then, we see that big data needs deep learning.
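The under-fitting diagnosis above can be reproduced in miniature: fit a model that is too simple for the data and observe that training and test errors nearly coincide. The synthetic regression task below is an invented stand-in for the (vastly larger) speech experiment, not a reproduction of it.

```python
import numpy as np

# Under-fitting in miniature: a plain linear model fit to nonlinear data.
# When capacity is the bottleneck, train and test errors are almost equal.
rng = np.random.default_rng(0)

def make_data(n):
    x = rng.uniform(-3, 3, size=(n, 1))
    y = np.sin(x[:, 0]) + 0.1 * rng.normal(size=n)   # nonlinear target + noise
    return x, y

x_tr, y_tr = make_data(10000)
x_te, y_te = make_data(10000)

# Least-squares fit of a linear model (far too simple for sin(x)).
A = np.hstack([x_tr, np.ones((len(x_tr), 1))])
coef, *_ = np.linalg.lstsq(A, y_tr, rcond=None)

def mse(x, y):
    pred = np.hstack([x, np.ones((len(x), 1))]) @ coef
    return float(np.mean((pred - y) ** 2))

train_err, test_err = mse(x_tr, y_tr), mse(x_te, y_te)
# Both errors sit well above the 0.01 noise floor, and close to each other:
# the signature of under-fitting rather than over-fitting.
```

In this regime, adding data does not help; only a more expressive model (here, one that can represent sin-like curvature) can drive the error down, which is exactly the article's argument for deep models on big data.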
An important characteristic of shallow models is that they rely on human experience to extract sample features, with the model itself mainly responsible for classification or prediction. Provided the model is applied correctly (say, the Internet company has hired machine learning experts), the quality of the features becomes the bottleneck for the performance of the entire system. Consequently, most of the manpower in a development team is usually devoted to finding better features, and finding a good feature requires developers to understand the problem to be solved deeply. Reaching that level of understanding often takes repeated exploration, sometimes even years of painstaking refinement. Designing sample features by hand is therefore not a scalable approach.
The essence of deep learning, then, is to learn more useful features by building machine learning models with many hidden layers and training them on massive data, thereby ultimately improving the accuracy of classification or prediction. The deep model is thus the means, and feature learning is the end. Deep learning differs from traditional shallow learning in two respects:
It emphasizes the depth of the model structure, usually with 5, 6, or even more than 10 layers of hidden nodes;
It clearly highlights the importance of feature learning: through layer-by-layer feature transformation, the representation of a sample in the original space is transformed into a new feature space, which makes classification or prediction easier.
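The layer-by-layer transformation can be visualized with a bare forward pass: each layer maps the previous representation into a new feature space. The weights below are random placeholders (in a real deep model they would be learned from data), and the layer widths are arbitrary.

```python
import numpy as np

# Layer-by-layer feature transformation, shown as a forward pass.
# Random weights stand in for learned ones; widths are illustrative.
rng = np.random.default_rng(0)
x = rng.normal(size=(1, 100))               # one sample in the original space

layer_dims = [100, 64, 32, 16]              # original space -> successive feature spaces
rep = x
for d_in, d_out in zip(layer_dims[:-1], layer_dims[1:]):
    W = rng.normal(0.0, 1.0 / np.sqrt(d_in), (d_in, d_out))
    rep = np.maximum(0.0, rep @ W)          # ReLU nonlinearity: a new feature space

# `rep` is now the sample's representation in the final 16-dim feature space.
```

The point is structural: each layer composes a linear map with a nonlinearity, so the final representation is a highly nonlinear re-description of the input, and a simple classifier on top of it can separate what no linear model on the raw input could.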
Compared with constructing features by hand, using big data to learn features is better able to capture the rich intrinsic information in the data. In the next few years, therefore, we will see more and more applications of "deep model + big data," rather than of shallow linear models.