Reprints are welcome; please credit the source:
http://blog.csdn.net/neighborhoodguo/article/details/47387229
Before I knew it we have reached the third part, and the whole course is almost over. This isn't the first public course I've followed all the way through, but I'm still a little excited about it!
Without further ado, let's start summarizing!
This lecture introduces CNNs (convolutional neural networks), a model that is hugely popular in computer vision, but here it is presented through its applications in NLP. The same model can evidently serve quite different applications; maybe that's just how our brains work, ha, but I digress.
From RNNs to CNNs
An RNN (recursive neural network) first builds a complete parse tree and then climbs it step by step (like a snail?), whereas a CNN needs no parse tree: in the paper's words, it induces a Feature Graph, generating its own topological structure, which looks quite nice. In an RNN, a vector used once at a given layer is not used again at that layer; in a CNN, a vector used at a given layer can be used again at the same layer without any problem.
A CNN works very much like convolution: pick a window, slide it from left to right, and build each layer on top of the previous one.
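The sliding-window idea can be sketched in a few lines; this is a minimal illustration with my own function and variable names, not code from the lecture:

```python
import numpy as np

def conv_features(word_vecs, W, b, width):
    # word_vecs: (n_words, d); W: (width * d,); b: scalar bias
    n, d = word_vecs.shape
    feats = []
    for i in range(n - width + 1):                    # slide the window left to right
        window = word_vecs[i:i + width].reshape(-1)   # concatenate the window's vectors
        feats.append(np.tanh(window @ W + b))         # one feature per window position
    return np.array(feats)

rng = np.random.default_rng(0)
sent = rng.standard_normal((5, 4))   # a 5-word sentence, 4-dim embeddings
W = rng.standard_normal(2 * 4)       # one filter over windows of width 2
c = conv_features(sent, W, 0.0, width=2)
print(c.shape)   # (4,): one value per window position
```

With a width-2 window over 5 words there are 5 − 2 + 1 = 4 window positions, hence 4 features.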
The first part introduces the simplest CNN, which has a pyramid structure:
It then recurses step by step, using the same computation as the RNN.
This model is simple, but not very effective.
CNN and pooling
So people thought of a better approach. The first step is the same as the pyramid structure above: generate the bottom two layers of the pyramid:
This produces a vector c, followed by a pooling step: c_hat = max{c}
But that way only one feature is extracted. The fix is to build several pyramids, or rather several pyramid bases: in professional terms, use windows of different widths to extract feature maps and then pool each one, so that more features are extracted.
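The multiple-width idea can be sketched as follows; the widths and names here are my own illustrative choices:

```python
import numpy as np

def conv_max(word_vecs, W, b, width):
    n, d = word_vecs.shape
    c = [np.tanh(word_vecs[i:i + width].reshape(-1) @ W + b)
         for i in range(n - width + 1)]
    return max(c)          # c_hat = max{c}: keep only the strongest response

rng = np.random.default_rng(1)
sent = rng.standard_normal((6, 3))
features = [conv_max(sent, rng.standard_normal(w * 3), 0.0, w)
            for w in (2, 3, 4)]        # one "pyramid base" per window width
print(len(features))   # 3 pooled features, one per width
```

Each window width contributes one pooled feature, so widening the set of windows directly widens the feature vector fed to the classifier.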
Tricks
One trick is said to improve accuracy.
In the final step of the training phase, z is first multiplied element-wise by a vector r, where r consists of Bernoulli random variables.
This deliberately injects noise during training, which keeps features from co-adapting, so the model is pushed to learn as many independently useful features as possible.
At test time, of course, we no longer multiply by r, but then the activations would be too large on average, so:
we scale them down (by the keep probability) and then compute the final result.
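This is the standard dropout scheme; a small sketch with my own variable names, where p is the keep probability:

```python
import numpy as np

rng = np.random.default_rng(2)
p = 0.5
z = rng.standard_normal(8)        # penultimate-layer activations

# Training: multiply z element-wise by a Bernoulli(p) mask r.
r = rng.random(8) < p
z_train = z * r                   # randomly zeroed-out units

# Test time: no mask; scale by p so the expected magnitude matches training.
z_test = p * z
print(z_train.shape, z_test.shape)
```

Since E[z * r] = p * z, multiplying by p at test time reproduces the average activation seen during training.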
There is another trick to avoid overfitting:
Because our model is very complex, the dataset is very large, and training takes a very long time, the weights at the end of training are not necessarily the best. So at each step, record the current weights together with the corresponding accuracy, and when training finishes, use the weights with the highest accuracy.
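The record-the-best-weights idea looks like this in sketch form; the epochs and accuracy numbers below are made-up placeholders:

```python
# (weights, validation accuracy) pairs as they would arrive during training
history = [("weights_epoch0", 0.61), ("weights_epoch1", 0.74), ("weights_epoch2", 0.70)]

best_acc, best_weights = -1.0, None
for weights, val_acc in history:
    if val_acc > best_acc:                # record whenever accuracy improves
        best_acc, best_weights = val_acc, weights

print(best_weights, best_acc)   # use the highest-accuracy checkpoint at the end
```

Note that the final epoch (0.70) is not the one kept; the epoch-1 checkpoint wins.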
Complex Pooling schemes
At the end of the lecture, the most complicated CNN, the one from the paper, is discussed; after reading the paper I finally figured it out.
First layer
The first layer uses wide convolution.
On the left is narrow convolution; on the right is wide convolution.
How exactly is the convolution done? First choose a weight matrix m.
Then generate M.
Finally, compute the first layer:
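The narrow/wide distinction on a single row of the sentence matrix can be checked with numpy's one-dimensional convolution, whose 'valid' and 'full' modes correspond to the two cases (the numbers below are arbitrary):

```python
import numpy as np

s = np.array([1., 2., 3., 4., 5.])   # one row of the sentence matrix, length 5
m = np.array([1., 1., 1.])           # filter m, width 3

narrow = np.convolve(s, m, mode='valid')  # length 5 - 3 + 1 = 3
wide   = np.convolve(s, m, mode='full')   # length 5 + 3 - 1 = 7 (zero-padded ends)
print(narrow, wide)
```

Narrow convolution only keeps positions where the filter fits entirely inside the sentence; wide convolution also keeps the partially overlapping positions at both ends, so no boundary words are dropped.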
Second layer
The k-max pooling used in this model is not the same as before: here the top k maxima are taken, unlike the earlier model, which extracts only the single largest value.
First, how k is computed: k_l = max(k_top, ⌈((L − l) / L) · s⌉)
where L is the total number of layers, l is the current layer, and s is the number of words in the sentence; k_top is a hyperparameter optimized on its own.
The pooling computation itself is similar to before; the difference is that the top k maxima are taken out instead of just one.
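Both pieces can be sketched together; a key detail of k-max pooling is that the k largest values keep their original order in the sentence (names below are mine):

```python
import numpy as np
from math import ceil

def dynamic_k(L, l, s, k_top):
    # k_l = max(k_top, ceil((L - l) / L * s)) -- the formula above
    return max(k_top, ceil((L - l) / L * s))

def k_max_pool(row, k):
    idx = np.sort(np.argsort(row)[-k:])   # indices of the k largest, kept in order
    return row[idx]

row = np.array([3., 1., 5., 2., 4.])
k = dynamic_k(L=3, l=1, s=5, k_top=3)   # max(3, ceil(2/3 * 5)) = 4
print(k, k_max_pool(row, k))
```

Lower layers (small l) get a larger k, so more of the sentence survives early pooling, tapering down to k_top at the top.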
Folding
The simplest scheme: add every two adjacent rows together, so that d rows become d/2 rows.
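Folding is a one-liner in numpy; a sketch on a toy matrix (assuming d is even):

```python
import numpy as np

def fold(M):
    # Sum each pair of adjacent rows: row 0 + row 1, row 2 + row 3, ...
    d = M.shape[0]
    return M[0:d:2] + M[1:d:2]

M = np.arange(8.0).reshape(4, 2)   # d = 4 rows
print(fold(M))                     # 2 rows remain
```

This halves the number of feature-map rows while mixing information across adjacent embedding dimensions, at no extra parameter cost.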
Last Layer
Once the desired features are obtained, a fully connected layer makes the final prediction!
Applying CNNs to machine translation
Roughly: use a CNN like the one above to generate the topmost features, then use an RNN to generate the corresponding target-language sentence.
The pros and cons of the various models
Bag of vectors: good for simple classification.
Window model: good for classifying single words, but not for classifying longer segments.
CNNs: currently hyped to the skies.
Recursive NN: semantically plausible, but it has to build a parse tree (which, I feel, is not so reliable).
Recurrent NN: plausible from a cognitive-science standpoint, but its performance is currently not the best.
Copyright notice: this is the blogger's original article; please do not reproduce it without the blogger's permission.
CS224D Lecture 13 Notes