Reprints welcome; please credit the source:
http://www.cnblogs.com/NeighborhoodGuo/p/4720985.html
This lecture featured a guest speaker, said to be a PhD from Stanford. The topic was Neural Networks in Speech Recognition.
Speech recognition is a deep field, though. As the lecturer said in his opening overview, this class only gives a bird's-eye view,
so that everyone gets a brief sense of how this area works, without going into detail. The lecture covered many models and recommended papers, and some of the model parts I am not familiar with, which made them genuinely painful to follow. I will take a moment to record the difficult points and the things I do not yet understand, so that if I ever get the chance to work in this area I can dig them out and study them in detail.
The lecturer gave an overview at the start of class, and the whole lecture then followed that outline, so the structure was clear.
Speech Recognition Systems Overview
Speech recognition is divided into three main parts. The first part is noise reduction, which is not covered in this lesson; the second part is transcription, which is the subject of this lesson; and the third part is understanding, which is what everything covered before this lecture has been about.
Finally, the lecturer recommended a speech dataset named Switchboard.
http://www.isip.piconepress.com/projects/switchboard/
I found this corpus online; I am not sure whether it is the same one the instructor mentioned.
HMM-DNN (Hybrid) Acoustic Modeling
To introduce HMM-DNN, the lecturer first presented HMM-GMM, which was once all the rage in speech recognition.
I learned about GMMs while watching CS229, and I saw HMMs in the PGM class, but I still do not fully understand the two put together.
The upper layer is the HMM; the lower layer computes p(x|s), the probability of the observed acoustic features given a state. This puzzled me a little: our aim in an HMM is to infer the hidden states, so if s is unknown, how can we condition on it, and what is the point of computing p(x|s)? (As far as I can tell, these emission probabilities p(x|s) are exactly what HMM decoding combines with the transition probabilities to find the most likely hidden state sequence.)
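As I understand it, the GMM side simply assigns each HMM state its own Gaussian mixture, so p(x|s) is the likelihood of the acoustic features x under state s's mixture. A minimal one-dimensional sketch (the function name and all numbers are my own illustration, not from the lecture):

```python
import math

def gmm_log_likelihood(x, weights, means, variances):
    """Log p(x|s) under a 1-D Gaussian mixture for one HMM state.

    Each HMM state s would have its own (weights, means, variances);
    decoding combines these emission scores with transition probabilities.
    """
    total = 0.0
    for w, mu, var in zip(weights, means, variances):
        # Weighted density of one Gaussian component
        total += w * math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)
    return math.log(total)

# Hypothetical two-component mixture for a single state:
score = gmm_log_likelihood(0.5, [0.6, 0.4], [0.0, 1.0], [1.0, 1.0])
```

In a real system x would be a vector of spectral features (e.g. MFCCs) and the Gaussians would be multivariate, but the idea is the same.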
Then the lecture introduced the HMM-DNN hybrid acoustic model.
Compared to HMM-GMM, the GMM part of the acoustic model is replaced with a DNN.
This model is said to have a long history, so why has it only recently come to the fore?
One of the most important reasons is that computers are now fast enough to support running such a large model, and to run multiple experiments, which makes tuning practical.
Earlier models were mostly single-layer NNs, while today's are multi-layer models. The choice of non-linearity also matters a great deal: the ones used now work much better than before.
In his experiments, the lecturer used a dataset called TIMIT as a testbed.
If the model has too few layers it cannot capture enough features, but with too many layers it overfits very easily and performance falls as well.
As for the choice of non-linear function, the rectifier (ReLU) passes error gradients through more cheaply during backpropagation, which makes it perform better than tanh.
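The gradient behavior is easy to see numerically: tanh's derivative decays toward zero for large pre-activations, while ReLU's derivative stays at exactly 1 wherever the unit is active. A small sketch (my own illustration, not from the lecture):

```python
import math

def tanh_grad(x):
    # Derivative of tanh: 1 - tanh(x)^2, shrinks toward 0 for large |x|,
    # which weakens the error signal in backpropagation.
    return 1.0 - math.tanh(x) ** 2

def relu_grad(x):
    # Derivative of ReLU: exactly 1 for x > 0, 0 otherwise,
    # so active units pass the error through undiminished.
    return 1.0 if x > 0 else 0.0

print(tanh_grad(3.0))  # ~0.0099: the tanh gradient has nearly vanished
print(relu_grad(3.0))  # 1.0: the ReLU gradient is intact
```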
Two methods for optimizing the DNN
The first is to replace the ordinary DNN with a CNN, which is better at extracting information from distorted speech.
The second is to replace the ordinary DNN with a recurrent NN (RNN).
HMM-free RNN Recognition
This replaces the traditional sub-phoneme extraction with a collapsing function.
The output no longer takes the whole word as a unit, but rather word fragments (characters) as units.
For stretches of audio where nothing is being pronounced, a blank symbol "_" occupies the position.
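Putting those two points together, my understanding is that the collapsing function merges repeated symbols and then drops the blank "_" placeholders, turning the frame-by-frame RNN output into a word. A sketch of how I understand it (the function name and example string are my own):

```python
def collapse(path, blank="_"):
    """Collapse a frame-level output path: merge consecutive repeats,
    then drop the blank symbol that marks un-pronounced gaps."""
    out = []
    prev = None
    for sym in path:
        # Keep a symbol only when it differs from the previous frame
        # and is not the blank placeholder.
        if sym != prev and sym != blank:
            out.append(sym)
        prev = sym
    return "".join(out)

print(collapse("hh_e_ll_lo"))  # "hello"
```

Note that the blank also lets the model emit genuine double letters: "ll" survives here because the two l-runs are separated by a blank.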
Using an RNN gives a much lower error rate than an ordinary NN.
One advantage of this model is that it can produce words that are not in the corpus, since it outputs word fragments rather than whole words.
Conclusion
HMM-DNN is currently the best-performing speech recognition model.
The instructor predicted at the end that the speech recognition models in all our electronic devices will be replaced by HMM-DNN in the near future.
Links:
An open-source speech recognition project:
http://kaldi.sourceforge.net/
Two datasets:
TIMIT
http://blog.163.com/gz_aaa/blog/static/37834532201471881923177/
http://www.fon.hum.uva.nl/david/ma_ssp/2007/TIMIT/
Switchboard
http://www.isip.piconepress.com/projects/switchboard/
CS224D Lecture 14 Notes