
Deep Learning Paper Notes (8): A Recent Survey of Deep Learning

Zouxy09@qq.com

http://blog.csdn.net/zouxy09

 

I read papers from time to time, but I always feel that I slowly forget them afterwards, as if I had never read them at all. So I want to summarize the useful knowledge points from the papers I read. On the one hand, this deepens my own understanding; on the other hand, it makes later review easier. Sharing them on my blog also lets everyone else refer to them. Because my background is limited, some of my understanding of these papers may be wrong; I hope you will not hesitate to point out my mistakes. Thank you.

 

The paper covered in this post is:

Bengio, Y., Courville, A., & Vincent, P. (2012). Representation Learning: A Review and New Perspectives.

This is a recent survey of deep learning, but it is so long that I could not fully digest it in one reading, so I translated the first two sections as I read. I will update the rest as time permits. Also, because of my limited level, the translation and my understanding may be wrong in places; I hope you will correct me. Thank you.

In addition, there is a reading list for deep learning that seems quite good; you can follow it to guide your study:

http://deeplearning.net/reading-list/

 

What follows is my understanding and translation of some of the key points:

Representation Learning: A Review and New Perspectives

Abstract

The success of a machine learning algorithm depends largely on data representation. We hypothesize that this is because different representations can entangle and hide, to varying degrees, the different explanatory factors of variation behind the data. Although specific domain knowledge can help design or select representations, learning representations with generic priors is also effective, and the quest for AI is motivating the design of more powerful representation-learning algorithms that implement such priors.

This paper reviews recent work in the areas of unsupervised feature learning and deep learning, covering advances in probabilistic models, auto-encoders, manifold learning, and deep networks. These analyses motivate longer-term unanswered questions: What makes a representation good? What objective functions are appropriate for learning good representations, and how do we compute the representations? And what are the geometric connections between representation learning, density estimation, and manifold learning?

 

1. Introduction

As we all know, the performance of machine learning methods depends heavily on the choice of data representation (or features). For this reason, to make a machine learning algorithm work, we generally have to spend most of our effort on data preprocessing and transformation. Such feature engineering is important but time-consuming and labor-intensive, and this exposes a weakness of current learning algorithms: they seem powerless to extract and organize the discriminative information in the data by themselves. Feature engineering is a way to use human ingenuity and prior knowledge to compensate for that weakness. To broaden the applicability of machine learning, we need to reduce the dependence of learning algorithms on feature engineering. This would let us build new applications faster and, more importantly, take a huge step forward in AI. A most basic capability of artificial intelligence is to understand the world around us, and we believe this can only be achieved when it learns to identify and disentangle the explanatory factors hidden in the observed low-level sensory data.

This paper focuses on representation learning, that is, learning representations of the data that make it easier to extract information useful for building classifiers or other predictors. In the case of probabilistic models, a good representation is often one that captures the posterior distribution of the underlying explanatory factors of the observed input. A good representation is also one that is useful as input to a supervised predictor. Among the many ways of learning representations, this paper focuses on deep learning methods: those that compose multiple nonlinear transformations to yield more abstract and ultimately more useful representations. Here we survey this rapidly developing field, with an emphasis on specific open questions raised by recent progress. We believe a few fundamental questions drive research in this area. Specifically, what makes one representation better than another? Given an example, how should we compute its representation, i.e., how should we extract features? And what objective functions are appropriate for learning good representations?
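To make "composing multiple nonlinear transformations" concrete, here is a minimal sketch of my own (not from the paper; all layer sizes are hypothetical and nothing is trained): a deep representation is simply a chain of learned nonlinear maps applied to the raw input.

```python
# A toy forward pass: each layer re-represents its input, and stacking
# layers yields progressively more abstract feature vectors.
import numpy as np

rng = np.random.default_rng(0)

def layer(x, w, b):
    """One nonlinear transformation: affine map followed by tanh."""
    return np.tanh(x @ w + b)

# Hypothetical sizes: 64-dim raw input, two successive hidden representations.
sizes = [64, 32, 16]
params = [(rng.normal(scale=0.1, size=(m, n)), np.zeros(n))
          for m, n in zip(sizes[:-1], sizes[1:])]

x = rng.normal(size=(1, 64))   # one raw input vector
h = x
for w, b in params:            # compose the nonlinear transformations
    h = layer(h, w, b)         # each h is a deeper representation

print(h.shape)                 # (1, 16): the final learned feature vector
```

In a real system the weights would be learned (for example by backpropagation or layer-wise pre-training), which is exactly what the objective-function questions above are about.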

 

2. Why Should We Care About Representation Learning?

Representation learning (closely associated with deep learning and feature learning) has carved out its own niche in the machine learning community and become a favorite of academia. It now has its own regular workshops at top conferences such as NIPS and ICML, and this year (2013) a new conference was even created for it, ICLR (International Conference on Learning Representations), which shows how favored it is. Although depth is an important part of the story, it is not the only one: sometimes other prior knowledge, built into the learner, makes good representations easier to learn, as the next chapter discusses in detail. The rapid growth of academic activity around representation learning has been driven by its string of empirical successes in both academia and industry. Below we briefly highlight some of these high points.

 

2.1 Speech Recognition and Signal Processing

Speech was one of the earliest applications of neural networks, for example via convolutional (or time-delay) neural networks (Bengio's work in 1993). Of course, after HMM-based speech recognition succeeded, neural networks went relatively quiet. Recently, with the rise of neural networks, deep learning and representation learning have made great strides in speech recognition, with breakthroughs achieved by several academic and industrial groups (Dahl et al., 2010; Deng et al., 2010; Seide et al., 2011a; Mohamed et al., 2012; Dahl et al., 2012; Hinton et al., 2012), leading to wide deployment and productization of these algorithms. For example, in 2012 Microsoft released a new version of their MAVIS (Microsoft Audio Video Indexing Service) speech system based on deep learning (Seide et al., 2011a). Compared with state-of-the-art acoustic models based on Gaussian mixture models, they reduced the word error rate by about 30% on four major benchmarks (for example, from 27.4% to 18.5% on the RT03S dataset). In 2012, Dahl et al. set another record: on a smaller large-vocabulary speech recognition benchmark (the Bing mobile business search dataset, with 40 hours of speech), they reduced the error rate by 16% to 23% relative.

Representation-learning algorithms have also been applied to music, beating the previous state of the art in polyphonic transcription (Boulanger-Lewandowski et al., 2012) with relative error reductions of between 5% and 30%. Deep learning also won MIREX (Music Information Retrieval Evaluation eXchange) competitions, for example the 2011 audio tagging task (Hamel et al., 2011).

 

2.2 Object Recognition

In 2006, deep learning started out focused on the MNIST handwritten digit classification problem (Hinton et al., 2006; Bengio et al., 2007), breaking the supremacy of SVMs (1.4% error rate) on this dataset. The latest records are still held by deep networks: Ciresan et al. (2012) claim the state of the art on the unconstrained version of the task (e.g., using convolutional architectures) with a 0.27% error rate, and Rifai et al. (2011c) hold the state of the art on the knowledge-free version of MNIST with a 0.81% error rate.

In recent years, deep learning has moved from digit recognition to object recognition in natural images, where the latest breakthrough reduced the state-of-the-art error rate from 26.1% to 15.3% (Krizhevsky et al., 2012).

 

2.3 Natural Language Processing

Besides speech recognition, deep learning has many other applications in natural language processing. Distributed representations for symbolic data were introduced by Hinton in 1986, and in 2003 Bengio et al. first developed them into a statistical language model, the so-called neural net language models (Bengio, 2008). They are all based on learning a distributed representation for each word, called a word embedding (a minimal sketch follows this paragraph). Adding a convolutional architecture, Collobert et al. (2011) developed the SENNA system, which shares representations across the tasks of language modeling, part-of-speech tagging, chunking, named entity recognition, semantic role labeling, and syntactic parsing. SENNA approaches or surpasses the state of the art on these tasks while being simpler and faster than traditional predictors. Learning word embeddings can also be combined with learning image representations in a way that associates text and images. This approach was applied successfully in Google's image search, using huge quantities of data to map images and queries into the same space (Weston et al., 2010). In 2012, Srivastava and others extended this to deeper multi-modal representations.
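Here is a minimal sketch of a word embedding table (a toy vocabulary and dimensions of my own choosing, not from any cited system): each word maps to a learned dense vector, and related words can then be compared geometrically.

```python
# An untrained embedding lookup: in a real neural language model, E is
# trained so that words occurring in similar contexts end up close together.
import numpy as np

rng = np.random.default_rng(1)
vocab = ["the", "cat", "dog", "sat"]                # toy vocabulary
word_to_id = {w: i for i, w in enumerate(vocab)}
E = rng.normal(scale=0.1, size=(len(vocab), 8))     # embedding matrix (V x d)

def embed(word):
    """Look up the distributed representation (embedding) of a word."""
    return E[word_to_id[word]]

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cosine(embed("cat"), embed("dog")))   # meaningless here; high after training
```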

The neural network language model was also improved by adding recurrence to the hidden layers (Mikolov et al., 2011). The improved model not only achieves lower perplexity than the state-of-the-art smoothed n-gram language models, but also reduces the word error rate in speech recognition (since the language model is an important component of a speech recognition system). This model has also been applied to statistical machine translation (Schwenk et al., 2012; Le et al., 2013), improving perplexity and BLEU scores. Recursive auto-encoders (which generalize recurrent networks) have also reached state-of-the-art performance on full-sentence paraphrase detection, almost doubling the F1 score of the previous state of the art (Socher et al., 2011a). Representation learning has likewise been used for word sense disambiguation (Bordes et al., 2012), raising accuracy from 67.8% to 70.2%. Finally, it has been successfully applied to sentiment analysis (Glorot et al., 2011b; Socher et al., 2011b), surpassing the state of the art.
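To illustrate what "adding recurrence to the hidden layer" means, here is a minimal, untrained sketch (toy sizes; my own construction, not the Mikolov et al. model) of one step of a recurrent language model: the hidden state carries the context forward, and a softmax over the vocabulary predicts the next word.

```python
# One recurrent step per word: hidden state summarizes all previous words.
import numpy as np

rng = np.random.default_rng(2)
V, d, h = 50, 16, 32                        # vocab size, embedding dim, hidden dim
E = rng.normal(scale=0.1, size=(V, d))      # word embeddings
W_xh = rng.normal(scale=0.1, size=(d, h))   # input-to-hidden weights
W_hh = rng.normal(scale=0.1, size=(h, h))   # the recurrent (hidden-to-hidden) weights
W_hy = rng.normal(scale=0.1, size=(h, V))   # hidden-to-output weights

def softmax(z):
    z = z - z.max()                         # numerical stability
    e = np.exp(z)
    return e / e.sum()

hidden = np.zeros(h)
for word_id in [3, 7, 1]:                   # a toy word-id sequence
    hidden = np.tanh(E[word_id] @ W_xh + hidden @ W_hh)
    p_next = softmax(hidden @ W_hy)         # distribution over the next word

print(p_next.sum())                         # ~1.0: a valid probability distribution
```

Perplexity, the metric the survey mentions, is just the exponentiated average negative log probability the model assigns to held-out text under these per-step distributions.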

 

2.4 Multi-Task and Transfer Learning, Domain Adaptation

(Background on transfer learning: traditional machine learning assumes that training data and test data follow the same distribution. If we have a large amount of training data from a different distribution, discarding it entirely would be wasteful, and making proper use of it is the main problem transfer learning tries to solve. Transfer learning migrates knowledge from existing data to help later learning; its goal is to apply knowledge learned in one environment to learning tasks in a new environment.) Transfer learning is the ability of a learning algorithm to exploit commonalities between different learning tasks in order to share statistical strength and transfer knowledge across tasks. In the discussion below, we assume that representation learning algorithms can learn representations that capture the underlying explanatory factors, a subset of which is relevant to each particular task, as illustrated in Figure 1 and in the sketch following its caption. This assumption is supported by many empirical results demonstrating that representation learning also excels in the transfer learning setting.

Figure 1: Representation learning discovers the underlying explanatory factors (the red dots in the middle hidden layer). Some factors explain the input (useful in the semi-supervised setting), while some explain the target of each task. Because these subsets overlap, statistical strength is shared, which helps generalization.
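The following minimal sketch (hypothetical shapes; my own illustration of Figure 1, not code from the paper) shows the sharing: several tasks reuse one learned hidden representation, each adding only its own output layer.

```python
# Multi-task sharing: one hidden representation feeds several task heads,
# so the statistical strength of all tasks trains the shared factors.
import numpy as np

rng = np.random.default_rng(3)

W_shared = rng.normal(scale=0.1, size=(64, 32))   # shared by all tasks
task_heads = {                                    # one output layer per task
    "task_a": rng.normal(scale=0.1, size=(32, 5)),
    "task_b": rng.normal(scale=0.1, size=(32, 3)),
}

def predict(x, task):
    shared = np.tanh(x @ W_shared)     # hidden factors reused across tasks
    return shared @ task_heads[task]   # task-specific readout

x = rng.normal(size=(1, 64))
print(predict(x, "task_a").shape, predict(x, "task_b").shape)  # (1, 5) (1, 3)
```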

I was impressed that both transfer learning challenges held in 2011 were won by representation learning algorithms. The first, the Transfer Learning Challenge held at an ICML 2011 workshop, was won by the unsupervised layer-wise pre-training approach (Bengio, 2011; Mesnil et al., 2011). The second challenge, held the same year, was won by Goodfellow et al. (2011). In domain adaptation, the targets stay the same but the input distribution changes (Glorot et al., 2011b; Chen et al., 2012). In multi-task learning, representation learning also shows its particular strength (Krizhevsky et al., 2012; Collobert et al., 2011), because it can share factors across tasks.

 

To be continued...

 
