Papers in the deep learning area, collected for personal study.
1 RNN
1 Recurrent Neural Network Based Language Model
The RNN applied to language modeling.
2 Statistical Language Models Based on Neural Networks
Mikolov's doctoral dissertation, which strings together his work on RNN language models.
3 Extensions of Recurrent Neural Network Language Model
A continuation of the RNN language model with several network improvements, such as using class information to reduce the number of model parameters (see the sketch below).
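A minimal sketch of the class-factorization idea, assuming a toy vocabulary, hypothetical class assignments, and a stand-in hidden state (not code from the paper): instead of one softmax over the full vocabulary, the model first predicts a word class and then a word within that class, which shrinks the output-layer computation per step.

```python
import numpy as np

# Hypothetical sizes: V words grouped into C classes, hidden size H.
V, C, H = 10000, 100, 200
rng = np.random.default_rng(0)

word_to_class = rng.integers(0, C, size=V)        # toy class assignment
W_class = rng.normal(scale=0.01, size=(C, H))     # hidden state -> class scores
W_word = rng.normal(scale=0.01, size=(V, H))      # hidden state -> word scores

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def word_probability(h, w):
    """P(w | h) = P(class(w) | h) * P(w | class(w), h)."""
    c = word_to_class[w]
    p_class = softmax(W_class @ h)[c]
    members = np.where(word_to_class == c)[0]      # words sharing class c
    scores = W_word[members] @ h                   # softmax only over that class
    p_word = softmax(scores)[np.where(members == w)[0][0]]
    return p_class * p_word

h = rng.normal(size=H)                             # stand-in for an RNN hidden state
print(word_probability(h, 42))
```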
4 A Guide to Recurrent Neural Networks and Backpropagation
An introduction to RNNs and their optimization algorithms; a good article for understanding RNNs.
5 Training Recurrent Neural Networks
Ilya Sutskever's doctoral dissertation. Training RNNs has always been a difficult point; this work introduces optimization methods for RNN training.
6 Strategies for Training Large Scale Neural Network Language Models
Introduces some tricks for training RNN language models.
7 Recurrent Neural Networks for Language Understanding
Work applying RNNs to language understanding.
8 Empirical Evaluation and Combination of Advanced Language Modeling Techniques
Introduces experience with combining language modeling techniques, including work on combining the RNN language model with other models.
9 Speech Recognition with Deep Recurrent Neural Networks
Work applying RNNs to speech recognition.
10 A Neural Probabilistic Language Model
Not an RNN; Yoshua Bengio's early use of neural networks to train language models, which laid the foundation for the later RNN language models.
On the Difficulty of Training Recurrent Neural Networks
Introduces the difficulties of RNN training, such as vanishing gradients, and proposes some solutions.
Subword Language Modeling with Neural Networks
Word-level language models cannot handle new words because of the OOV problem; character-level models overcome this but are much more expensive to train. To combine the strengths of both, subword-level RNN language modeling is proposed, and the model parameters are compressed with K-means (see the sketch below).
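One possible reading of the K-means compression step, sketched under the assumption that scalar weights are clustered into a small shared codebook (the paper's exact scheme may differ): each weight is replaced by the index of its nearest centroid, so only the codebook and small integer indices need to be stored.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
W = rng.normal(size=(256, 256))                 # hypothetical RNN weight matrix

k = 16                                          # codebook size (an assumption)
km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(W.reshape(-1, 1))
codebook = km.cluster_centers_.ravel()          # k shared weight values
indices = km.labels_.reshape(W.shape)           # small integer index per weight

W_quantized = codebook[indices]                 # reconstructed (quantized) weights
print("mean absolute error:", np.abs(W - W_quantized).mean())
```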
Performance Analysis of Neural Networks in Combination with N-gram Language Models
Performance analysis of combining N-gram and neural network language models; experiments show that the combination improves performance.
Recurrent Neural Network Based Language Modeling in Meeting Recognition
Uses RNN and N-gram rescoring to improve the performance of a speech recognition system.
2 DNN
1 A Practical Guide to Training Restricted Boltzmann Machines
Introduces the RBM and the many tricks needed when training one; if you want to implement the RBM algorithm, this article is a must-read.
2 A Fast Learning Algorithm for Deep Belief Nets
A Hinton classic, the seminal work of deep learning and the start of the deep learning boom.
3 A Learning Algorithm for Boltzmann Machines
An older paper from 1985 introducing how Boltzmann machines are trained.
4 Greedy Layer-wise Training of Deep Networks
Can be seen as Yoshua Bengio's continuation and summary of Hinton's 2006 work; very complementary to the 2006 paper and essential reading for getting into deep learning.
It also introduces some tricks, such as how to handle the case where the first-layer units take real values.
5 Large Scale Distributed Deep Networks
Work from Google's Jeffrey Dean group on the DistBelief framework; mainly describes how Google uses distributed computation and model partitioning to handle deep networks and accelerate their training.
6 Context-Dependent Pre-trained Deep Neural Networks for Large Vocabulary Speech Recognition
Microsoft's successful application to speech: the recognition system's relative error rate dropped by more than 20%. It was deep learning's first industrial success, and its impact was sensational.
7 Deep Belief Networks for Phone Recognition
Early speech work from the Hinton group using DNNs; it laid the foundation for Microsoft's work.
8 Application of Pretrained Deep Neural Networks to Large Vocabulary Speech Recognition
DNNs for large vocabulary conversational speech recognition, with experiments reported on voice search and YouTube data.
9 An Empirical Study of Learning Rates in Deep Neural Networks for Speech Recognition
Learning-rate tuning experience from Google's DNN-HMM speech recognition system.
10 Acoustic Modeling Using Deep Belief Networks
Early speech work from the Hinton group, mainly about how to apply DNNs to acoustic model training.
Deep Neural Networks for Acoustic Modeling in Speech Recognition
Industry giants such as Microsoft, Google, and IBM share their views on DNN-based speech recognition.
Deep Belief Networks Using Discriminative Features for Phone Recognition
Work by the Hinton group and IBM on training DNNs with discriminative features, using LDA to reduce the feature dimensionality to 40.
A Comparison of Deep Neural Network Training Methods for Large Vocabulary Speech Recognition
Experimental comparisons of DNN training methods, such as discriminative pre-training versus DBN generative pre-training, and changes to the neuron nonlinearity.
Asynchronous Stochastic Gradient Descent for DNN Training
Work from the Chinese Academy of Sciences on asynchronous parallel training; the idea is basically similar to DistBelief, but the hardware is replaced by GPUs and the model is not partitioned.
Improving Deep Neural Networks for LVCSR Using Rectified Linear Units and Dropout
Enhances a DNN-HMM system with ReLU and dropout.
Improving the Speed of Neural Networks on CPUs
Google's work on speeding up neural network forward propagation, e.g. using fixed-point arithmetic and SIMD instructions (see the sketch below).
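A minimal sketch of the fixed-point idea, assuming simple 8-bit quantization of weights and activations with one scale per tensor (the paper's scheme is more elaborate): the expensive multiply-accumulates are done on small integers and the result is rescaled to floating point at the end.

```python
import numpy as np

def quantize(x, bits=8):
    """Map a float array to signed integers plus a single scale factor."""
    scale = np.abs(x).max() / (2 ** (bits - 1) - 1)
    return np.round(x / scale).astype(np.int32), scale

rng = np.random.default_rng(0)
W = rng.normal(size=(512, 256)).astype(np.float32)   # hypothetical layer weights
a = rng.normal(size=256).astype(np.float32)          # hypothetical input activations

Wq, w_scale = quantize(W)
aq, a_scale = quantize(a)

y_fixed = (Wq @ aq) * (w_scale * a_scale)            # integer matmul, rescaled
y_float = W @ a
print("max abs error:", np.abs(y_fixed - y_float).max())
```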
Improved Bottleneck Features Using Pretrained Deep Neural Networks
Related work on Microsoft's DNN-HMM system.
Improved Feature Processing for Deep Neural Networks
Uses feature processing to improve a DNN-HMM system: 13-dimensional MFCC features are spliced over 9 frames and transformed with LDA-MLLT, and a SAT module can also be added; the resulting processed 40-dimensional features serve as input to the DNN-HMM system (see the sketch below).
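A minimal sketch of the frame-splicing step only (the LDA-MLLT and SAT transforms are omitted; sizes follow the description above): each 13-dimensional MFCC frame is concatenated with its 4 left and 4 right neighbours, giving a 9-frame, 117-dimensional context window.

```python
import numpy as np

def splice_frames(feats, left=4, right=4):
    """Concatenate each frame with its neighbours; edges are padded by repetition."""
    T, _ = feats.shape
    padded = np.vstack([np.repeat(feats[:1], left, axis=0),
                        feats,
                        np.repeat(feats[-1:], right, axis=0)])
    return np.hstack([padded[i:i + T] for i in range(left + right + 1)])

mfcc = np.random.default_rng(0).normal(size=(100, 13))  # 100 frames of 13-dim MFCC
spliced = splice_frames(mfcc)
print(spliced.shape)                                     # (100, 117)
```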
Improving Neural Networks by Preventing Co-adaptation of Feature Detectors
Mainly describes the dropout technique and its experimental comparisons, interpreting dropout as a form of model averaging (see the sketch below).
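A minimal sketch of (inverted) dropout on a layer of hidden activations, assuming a keep probability of 0.5: units are randomly zeroed during training and the layer is used unchanged at test time, which approximates averaging the exponentially many "thinned" networks.

```python
import numpy as np

def dropout(h, keep_prob=0.5, train=True):
    """Randomly zero hidden units in training; rescale so expectations match at test time."""
    if not train:
        return h                        # test time: use the full network
    mask = np.random.random(h.shape) < keep_prob
    return h * mask / keep_prob         # inverted dropout

h = np.random.default_rng(1).normal(size=(4, 8))   # a batch of hidden activations
print(dropout(h))
```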
Exploiting Sparseness in Deep Neural Networks for Large Vocabulary Speech Recognition
Uses soft regularization and convex constraints to make the DNN model sparser; the purpose of the sparsification is to reduce model complexity, increase computational speed, and improve the model's generalization ability.
Feature Learning in Deep Neural Networks - Studies on Speech Recognition Tasks
Discusses DNNs from the angle of feature learning: why deeper DNNs are better, why DNNs can learn more robust features, and so on.
Improving Neural Networks with Dropout
The master's thesis of Hinton's student Nitish Srivastava; mainly discusses the role of dropout in neural networks.
Learning Features from Music Audio with Deep Belief Networks
Application of deep networks to music classification; the features are MFCCs and the categories are genres such as hip-hop and blues.
Low-rank Matrix Factorization for Deep Neural Network Training with High-dimensional Output Targets
IBM's work on using low-rank matrix factorization to address the excessive number of weight parameters in the DNN classification (output) layer.
Multilingual Training of Deep Neural Networks
Multilingual application of DNNs; when adapting to a new language, only the classification-layer parameters need to be tuned.
A Cluster-based Multiple Deep Neural Networks Method for Large Vocabulary Continuous Speech Recognition
Uses cluster information to partition the training, then integrates the small models trained on the data within a Bayesian framework; this speeds up the overall training process, but accuracy is somewhat lost and decoding also becomes slower.
Restructuring of Deep Neural Network Acoustic Models with Singular Value Decomposition
Proposes using SVD to compress the weight matrices and reduce the complexity of the model (see the sketch below).
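A minimal sketch of the SVD compression idea, with a hypothetical layer size and retained rank: a large trained weight matrix is replaced by the product of two thin matrices, cutting both parameters and multiply-adds. The random matrix here only shows the mechanics; a real trained matrix tends to have many small singular values, which is what makes the truncation cheap in accuracy.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(2048, 2048))         # hypothetical trained layer weights

k = 256                                   # retained rank (a tunable assumption)
U, s, Vt = np.linalg.svd(W, full_matrices=False)
A = U[:, :k] * s[:k]                      # 2048 x 256
B = Vt[:k, :]                             # 256 x 2048

# One 2048x2048 matrix (~4.2M parameters) becomes two thin ones (~1.0M parameters);
# the layer is then applied as A @ (B @ x) instead of W @ x.
x = rng.normal(size=2048)
print(np.linalg.norm(W @ x - A @ (B @ x)) / np.linalg.norm(W @ x))
```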
Sparse Feature Learning for Deep Belief Networks
Marc'Aurelio Ranzato proposes an unsupervised feature learning method whose advantage is low-dimensional, sparse features; the paper compares it with the RBM and PCA methods.
Training Products of Experts by Minimizing Contrastive Divergence
Hinton's paper proposing the PoE (product of experts) model and discussing how to train it; the RBM is a special case of a PoE, and RBM training evolved from this work. If you want to understand the principle of the CD algorithm, this article is a must-read.
Understanding How Deep Belief Networks Perform Acoustic Modelling
Mainly discusses why DBN models achieve better system performance in acoustic model training, though without theoretical support.
Pipelined Back-propagation for Context-dependent Deep Neural Networks
Uses multiple GPUs to run the network in a pipelined, parallel fashion; other parallelization measures such as data parallelism and model parallelism are also mentioned.
Recent Advances in Deep Learning for Speech Research at Microsoft
Mainly introduces the progress of Microsoft's deep learning work, such as going back to raw (primitive) features, multi-task feature learning, adaptive DNN models, and so on.
Rectified Linear Units Improve Restricted Boltzmann Machines
Introduces the application of ReLU to the RBM model, i.e. substituting the nonlinearity.
Reducing the Dimensionality of Data with Neural Networks
Hinton's paper in Science; mainly introduces how to use neural networks for nonlinear dimensionality reduction, with comparisons against linear PCA.
Data Normalization in the Learning of Restricted Boltzmann Machines
A data-processing trick for RBM training: zero-mean preprocessing makes RBM training more robust.
Connectionist Probability Estimators in HMM Speech Recognition
An early method using neural networks for acoustic model training; it is the foundation of today's DNN-HMM work.
Deep Learning for Robust Feature Generation in Audio-visual Emotion Recognition
Application of deep learning to emotion analysis in audio-visual systems; presents a model trained on a mix of multiple visual and auditory signals.
Improving Training Time of Deep Belief Networks Through Hybrid Pre-training and Larger Batch Sizes
Adopts a combination of generative and discriminative pre-training; a larger minibatch size increases the granularity of data parallelism.
Training Restricted Boltzmann Machines Using Approximations to the Likelihood Gradient
A new algorithm for training RBMs, PCD. Unlike the CD algorithm, it maintains a single persistent Markov chain that is not restarted after each parameter update, under the assumption that the update does not change the model very much; the paper also mentions using a small learning rate (see the sketch below).
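A minimal sketch of PCD for a binary RBM, with hypothetical sizes and random stand-in data: the "fantasy" chains are advanced by one Gibbs step per update and are never reset to the data, unlike CD.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sample(p):
    return (rng.random(p.shape) < p).astype(float)

n_visible, n_hidden, batch = 784, 64, 32
W = rng.normal(scale=0.01, size=(n_visible, n_hidden))
b_v, b_h = np.zeros(n_visible), np.zeros(n_hidden)

data = sample(np.full((batch, n_visible), 0.2))      # stand-in for a real minibatch
v_chain = sample(np.full((batch, n_visible), 0.5))   # persistent fantasy chains
lr = 0.001                                           # small learning rate, as the paper suggests

for step in range(100):
    # Positive phase: hidden probabilities driven by the data.
    h_data = sigmoid(data @ W + b_h)

    # Negative phase: advance the persistent chains by one Gibbs step (no restart).
    h_chain = sample(sigmoid(v_chain @ W + b_h))
    v_chain = sample(sigmoid(h_chain @ W.T + b_v))
    h_model = sigmoid(v_chain @ W + b_h)

    # Approximate likelihood gradient.
    W += lr * (data.T @ h_data - v_chain.T @ h_model) / batch
    b_v += lr * (data - v_chain).mean(axis=0)
    b_h += lr * (h_data - h_model).mean(axis=0)
```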
Classification Using Discriminative Restricted Boltzmann Machines
Proposes the discriminative RBM (DRBM): whereas the generative RBM optimizes P(x, y), the discriminative DRBM optimizes P(y|x), where y is the label; a hybrid version is also proposed.
Learning Multiple Layers of Features from Tiny Images
The master's thesis of Hinton's student Alex Krizhevsky, mainly stringing together some of his DNN work.
Making Deep Belief Networks Effective for Large Vocabulary Continuous Speech Recognition
Discusses how to train DNNs effectively, focusing on parallel training.
Optimization Techniques to Improve Training Speed of Deep Neural Networks for Large Speech Tasks
Tips from IBM's Tara Sainath team on improving parallelism and reducing DNN model parameters; IBM mainly uses low-rank matrix factorization for the classification layer.
Although the CNN is an evolution of the DNN with relatively few parameters, the best CNN results in speech recognition so far are similar to those of a DNN with the same number of parameters.
Parallel Training of Neural Networks for Speech Recognition
Work on parallel training of neural networks, mainly divided into two parts: multithreaded multi-core parallelization and SIMD-based GPU parallelization.
Accurate and Compact Large Vocabulary Speech Recognition on Mobile Devices
Google's practical work on mobile speech recognition, covering DNN and LM optimizations. The DNN optimizations include fixed-point arithmetic, SIMD acceleration, lazy batch computation, and frame skipping; the language model is also compressed. A very valuable practical reference.
Cross-language Knowledge Transfer Using Multilingual Deep Neural Network with Shared Hidden Layers
Multilingual DNN training: all languages share the same hidden-layer features, with a separate classification layer per language. This reduces the error rate by around 3-5%; the reason is somewhat similar to transfer learning, since knowledge can be transferred between languages.
Improving Wideband Speech Recognition Using Mixed-bandwidth Training Data in CD-DNN-HMM
Mixed-bandwidth CD-DNN-HMM training with 8 kHz and 16 kHz data; the key issue is how to align the filter-banks of the different bandwidths. Also covers some filter-bank training choices, such as whether to use dynamic features in addition to static ones.
Robust Visual Recognition Using Multilayer Generative Neural Networks
The master's thesis of Hinton's student Yichuan Tang; a series of DNN work on visual recognition.
Deep Boltzmann Machines
The paper that introduced the DBM model.
Rectified Linear Units for Speech Processing
Performance analysis of ReLU in speech recognition.
3 CNN
1 Deep Convolutional Network Cascade for Facial Point Detection
Work using a CNN cascade for facial keypoint detection.
2 Applying Convolutional Neural Networks Concepts to Hybrid NN-HMM Model for Speech Recognition
CNNs applied to a speech recognition system.
3 ImageNet Classification with Deep Convolutional Neural Networks
The Hinton group's CNN from the 2012 ImageNet competition; not many details, but it introduces the tricks used in the network, especially ReLU.
4 Gradient-based Learning Applied to Document Recognition
Yann LeCun's classic CNN paper; to understand CNNs, read this first.
5 A Theoretical Analysis of Feature Pooling in Visual Recognition
An analysis of the principles behind pooling in visual recognition, with a summary of related methods in vision such as HOG and SIFT.
6 What Is the Best Multi-stage Architecture for Object Recognition
Discusses how to design a multi-stage architecture for better performance on object recognition, and architecture questions such as how to obtain invariant features and how to combine information across stages; anyone working on vision should read this carefully.
7 Deep Convolutional Neural Networks for LVCSR
CNNs actually applied to LVCSR.
8 Learning Mid-level Features for Recognition
Worth reading for its analysis of the current visual recognition pipeline and the relationships between its parts, such as coding and pooling.
9 Convolutional Networks and Applications in Vision
An analysis of convolutional networks in vision applications; recommended for vision work. The idea of layering provides a good internal representation for vision. The paper decomposes the convolutional network into a filter-bank layer, a nonlinearity layer, and a pooling layer and analyzes each.
10 Convolutional Neural Networks Applied to House Numbers Digit Classification
Convolutional networks applied to house-number digit classification; uses Lp pooling, where a Gaussian kernel strengthens stronger features and suppresses weaker ones.
Visualizing and Understanding Convolutional Networks
Very meaningful work on visualizing convolutional network features; a deconvnet is used to visualize the features of each convolutional layer, and these visualizations can help us adjust the model.
Stochastic Pooling for Regularization of Deep Convolutional Neural Networks
Proposes stochastic pooling: unlike max pooling and average pooling, the pooled activation is selected randomly. The paper argues that stochastic pooling is similar to dropout, being equivalent to max-pooling over many noise-perturbed copies of the input image, which effectively prevents overfitting (see the sketch below).
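A minimal sketch of stochastic pooling at training time, assuming non-negative (post-ReLU) activations as in the paper: within each pooling region, one activation is sampled with probability proportional to its value, rather than taking the max or the mean.

```python
import numpy as np

def stochastic_pool(region, rng):
    """Sample one activation from a pooling region, weighted by its magnitude."""
    total = region.sum()
    if total == 0:
        return 0.0                                  # all-zero region
    probs = region.ravel() / total
    return region.ravel()[rng.choice(region.size, p=probs)]

rng = np.random.default_rng(0)
fmap = np.maximum(rng.normal(size=(4, 4)), 0)       # toy post-ReLU feature map

# 2x2 stochastic pooling with stride 2.
pooled = np.array([[stochastic_pool(fmap[i:i + 2, j:j + 2], rng)
                    for j in range(0, 4, 2)]
                   for i in range(0, 4, 2)])
print(pooled)
```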
Adaptive Deconvolutional Networks for Mid and High Level Feature Learning
An unsupervised method for learning mid- and high-level features; image features are learned by reconstructing the image through deconvolution.
Best Practices for Convolutional Neural Networks Applied to Visual Document Analysis
Practical convolutional network work; the methods it mentions for dealing with limited training data are worth consulting.
Multi-column Deep Neural Networks for Image Classification
Combines multiple deep network models by averaging their outputs.
Differentiable Pooling for Hierarchical Feature Learning
Presents a differentiable, Gaussian-based pooling method; it helps to read the adaptive deconvolutional networks paper above first. Compared with max pooling and average pooling, it has some advantages for reconstruction in the deconvolutional setting.
Notes on Convolutional Neural Networks
A fairly detailed treatment of convolutional neural networks, including the gradient computations.
Fast Inference in Sparse Coding Algorithms with Applications to Object Recognition
The unsupervised learning algorithm PSD: on top of the sparse coding framework, it constrains the sparse codes to stay close to the output of a nonlinear transformation (an encoder). When optimizing the objective, some parameters are fixed first; the idea is somewhat similar to coordinate descent.
Deep Neural Networks for Object Detection
Google uses DNN-based (actually CNN) regression for object detection: object masks are predicted first and the locations are then refined.
Multi-GPU Training of ConvNets
Some engineering techniques for multi-GPU parallel training of convolutional networks.
Flexible, High Performance Convolutional Neural Networks for Image Classification
An early, hands-on article on training CNNs on GPUs.
Multi-digit Number Recognition from Street View Imagery Using Deep Convolutional Neural Networks
Recognizes digits in Google Street View images by casting the task as sequence recognition with a CNN. Traditional OCR digit recognition usually segments the digits first, whereas here the whole sequence is recognized at once; the paper also reports the proposed model's recognition rates on several datasets. The training framework is again Google's DistBelief.
4 Other
1 An Introduction to Deep Learning
A brief, relatively short overview of deep learning that simply mentions some commonly used deep learning models.
2 The Difficulty of Training Deep Architectures and the Effect of Unsupervised Pre-training
Mainly discusses the difficulties of training deep architectures and analyzes the advantages of pre-training from experimental data; the paper argues that pre-training behaves similarly to regularizing the weight matrices.
3 Why Does Unsupervised Pre-training Help Deep Learning
Discusses several ways in which unsupervised learning helps deep learning, puts forward the view of pre-training as a regularizer, and supports it with experimental data. There is no theoretical basis, which is also what deep learning is most criticized for at this stage: the lack of a complete supporting theory.
4 Learning Deep Architectures for AI
Yoshua Bengio's review of deep learning; if you want to get to know the field, look at this first, even if only to skim it.
5 Representation Learning: A Review and New Perspectives
Yoshua Bengio's survey of representation learning.
6 On Optimization Methods for Deep Learning
Discusses several optimization methods for deep learning: SGD, L-BFGS, and CG, with experiments on the advantages and disadvantages of each.
7 Using Very Deep Autoencoders for Content-based Image Retrieval
The global image feature is taken from the autoencoder's middle (bottleneck) layer and used for image retrieval.
8 Deep Learning for Signal and Information Processing
Li Deng's lecture material from the 2013 Dragon Star machine learning course; mainly focused on deep learning for speech, fairly detailed.
9 On the Importance of Initialization and Momentum in Deep Learning
Introduces the importance of initialization and momentum in deep learning, mostly through experimental analysis (see the sketch below).
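A minimal sketch of the classical momentum update on a toy quadratic, with hypothetical hyperparameters (the paper studies how such choices, together with initialization, affect deep network training):

```python
import numpy as np

A = np.diag([1.0, 50.0])          # badly conditioned toy objective f(w) = 0.5 * w'Aw
w = np.array([1.0, 1.0])          # a (hypothetical) initialization
v = np.zeros_like(w)              # velocity
lr, mu = 0.01, 0.9                # hypothetical learning rate and momentum coefficient

for step in range(200):
    grad = A @ w                  # gradient of the quadratic
    v = mu * v - lr * grad        # classical momentum update
    w = w + v

print("final w:", w)              # approaches the minimum at the origin
```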
10 Dropout Training as Adaptive Regularization
Analyzes the dropout technique from first principles, showing it to be equivalent to an adaptive regularization technique.
11 Deep Learning via Hessian-free Optimization
Most deep learning optimization is currently based on stochastic gradient algorithms; this paper proposes a second-order, Hessian-free optimization algorithm.
Deep Stacking Networks for Information Retrieval
Work applying DSN networks to information retrieval.
Deep Convex Net: A Scalable Architecture for Speech Pattern Classification
A model Microsoft designed to overcome the difficulty of parallelizing DNNs; it has great advantages in computational scalability.
Parallel Training of Deep Stacking Networks
Parallelization of DSN training.
Scalable Stacking and Learning for Building Deep Architectures
DSN-related articles; the several related ones can be read together.
Paper List about Deep Learning