CS224D Lecture 14 Notes


Reprints are welcome; please credit the source:

http://www.cnblogs.com/NeighborhoodGuo/p/4720985.html

This lecture was again given by a guest speaker, said to be a Stanford PhD. The topic is neural networks in speech recognition.

Speech recognition is a deep field, though, and as the lecturer said in the overview at the start, this lecture only offers a high-level view: it gives everyone a brief idea of what the area involves and how it works, without going into detail. I was unfamiliar with parts of many of the models and recommended papers mentioned in class, and I found them hard going. Here I briefly record the difficult points and the things I did not understand; if I ever get the chance to work in this area, I will come back and study them in detail.

The lecture began with an overview and then followed that outline throughout, so the structure was clear.

Speech Recognition Systems Overview

Speech recognition is mainly divided into three parts. The first is noise reduction, which this lecture does not cover. The second is transcription, which is the subject of this lecture. The third is understanding, which is what all the previous lectures were about.

Finally, the lecturer recommended a speech dataset named Switchboard.

http://www.isip.piconepress.com/projects/switchboard/

I found this corpus online, but I am not sure whether it is the same one the lecturer mentioned.

HMM-DNN (Hybrid) Acoustic Modeling

To motivate HMM-DNN, the lecture first introduced HMM-GMM, the approach that was once all the rage in speech recognition.

I learned about GMMs when I was watching CS229 and saw HMMs in the PGM course, but I still do not quite understand how the two fit together.

The upper layer is the HMM; the lower layer computes p(x|s), the probability of the observed features given the state. This puzzles me a little. Our goal is to find the hidden states of the HMM. First, s is unknown, so how can anything be conditioned on it? Second, if the goal is to find the hidden states, what is the point of computing p(x|s)?
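One way to look at the question above: in standard HMM decoding the state is never assumed to be known. Instead, p(x|s) is evaluated for every candidate state at every frame, and a search such as the Viterbi algorithm picks the state sequence that scores best overall. The following is a minimal sketch of that idea, not the lecturer's system. It assumes a single 1-D Gaussian per state in place of a full GMM, and the names (gaussian_pdf, viterbi, the two states "a" and "b") and all numbers are made up for illustration.

```python
import math

def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian; a stand-in for a per-state GMM (assumption)."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def viterbi(observations, states, start_p, trans_p, emit_params):
    """Return the most likely hidden-state sequence for the observations."""
    # score[s] = log-probability of the best path so far that ends in state s
    score = {
        s: math.log(start_p[s]) + math.log(gaussian_pdf(observations[0], *emit_params[s]))
        for s in states
    }
    backpointers = []
    for x in observations[1:]:
        new_score, pointers = {}, {}
        for s in states:
            # p(x|s) is computed for each hypothesised state s, not a known one
            emit = math.log(gaussian_pdf(x, *emit_params[s]))
            best_prev = max(states, key=lambda r: score[r] + math.log(trans_p[r][s]))
            new_score[s] = score[best_prev] + math.log(trans_p[best_prev][s]) + emit
            pointers[s] = best_prev
        score = new_score
        backpointers.append(pointers)
    # Trace the best path backwards from the best final state
    path = [max(states, key=lambda s: score[s])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    return path[::-1]

# Hypothetical two-state model: state "a" emits values near 0, state "b" near 5
states = ["a", "b"]
start_p = {"a": 0.5, "b": 0.5}
trans_p = {"a": {"a": 0.9, "b": 0.1}, "b": {"a": 0.1, "b": 0.9}}
emit_params = {"a": (0.0, 1.0), "b": (5.0, 1.0)}  # (mean, variance) per state
path = viterbi([0.1, -0.2, 4.9, 5.3], states, start_p, trans_p, emit_params)
print(path)
```

Here the first two observations are close to the mean of "a" and the last two close to the mean of "b", so the decoded path switches state once, even though no state was ever observed directly.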

Next came HMM-DNN hybrid acoustic models.

Compared with HMM-GMM, the acoustic model (the GMM) is replaced by a DNN.
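These notes do not record how the DNN output is plugged into the HMM. A common approach in hybrid systems, which I am assuming here rather than quoting from the lecture, is that the DNN predicts the state posterior p(s|x), and Bayes' rule turns it into a scaled likelihood p(x|s) ∝ p(s|x) / p(s) that takes the place of the GMM score. A minimal sketch with made-up logits and state priors:

```python
import math

def softmax(logits):
    """Turn raw network outputs into a probability distribution over states."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def scaled_log_likelihoods(logits, state_priors):
    """log p(x|s) up to a constant shared by all states: log p(s|x) - log p(s)."""
    posteriors = softmax(logits)
    return [math.log(p) - math.log(q) for p, q in zip(posteriors, state_priors)]

# Hypothetical DNN outputs for one frame and hypothetical state priors
logits = [2.0, 0.5, -1.0]
priors = [0.7, 0.2, 0.1]
scores = scaled_log_likelihoods(logits, priors)
print(scores)
```

The dropped term log p(x) is the same for every state in a given frame, so it does not change which state sequence a decoder prefers.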

This model is said to have a long history, but why has it only recently come to the fore?

One of the most important reasons is that computers are now fast enough to run such a large model, and to run many experiments with it, which makes tuning practical.

Earlier models were mostly single-layer NNs, whereas today's models are multi-layer. It also matters that the non-linearities used now work much better than the earlier ones.

In the experiments, the lecturer used a dataset called TIMIT for evaluation.

With too few layers the model cannot capture the features, but with too many it overfits easily and performance drops as well.

As for the choice of non-linear function, the rectifier (ReLU) attenuates the error less during backpropagation (BP), so it performs better than tanh.
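The point about error loss during BP can be illustrated with the local derivatives alone. The sketch below is a simplification under stated assumptions: it ignores the weight matrices and only multiplies the activation derivatives that the error passes through, one per layer, with every unit given the same made-up pre-activation value of 2.0.

```python
import math

def relu_grad(z):
    """Derivative of max(0, z): exactly 1 for active units, 0 otherwise."""
    return 1.0 if z > 0 else 0.0

def tanh_grad(z):
    """Derivative of tanh(z): 1 - tanh(z)^2, which shrinks as |z| grows."""
    return 1.0 - math.tanh(z) ** 2

def backprop_factor(grad_fn, pre_activations):
    """Product of the activation derivatives along one path, one per layer."""
    factor = 1.0
    for z in pre_activations:
        factor *= grad_fn(z)
    return factor

# Five layers, each with a hypothetical pre-activation of 2.0
pre_activations = [2.0] * 5
relu_factor = backprop_factor(relu_grad, pre_activations)
tanh_factor = backprop_factor(tanh_grad, pre_activations)
print(relu_factor, tanh_factor)
```

Along an active path the rectifier passes the error through unchanged, while tanh multiplies it by a number below 1 at every layer, so the product becomes very small after a few layers.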

Two ways to improve on the DNN

The first is to replace the ordinary DNN with a CNN, which is better at extracting information from distorted audio.

The second is to replace the ordinary DNN with a recurrent NN.

HMM-Free RNN Recognition

This approach replaces the traditional sub-phone extraction with a collapsing function.

The output unit is no longer the whole word but a fragment of a word.

Stretches of time with no pronunciation, such as the gaps between sounds, are filled with the placeholder "_".
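The notes do not spell out the collapsing function. The sketch below assumes it behaves like the one used in connectionist temporal classification (CTC): consecutive repeated labels are merged first, and the "_" placeholders are removed afterwards. The function name collapse and the per-frame label string are made up for illustration.

```python
from itertools import groupby

BLANK = "_"  # placeholder emitted for frames where nothing is pronounced

def collapse(frame_labels):
    """Merge consecutive repeated labels, then drop the blank placeholder."""
    merged = [label for label, _ in groupby(frame_labels)]
    return "".join(label for label in merged if label != BLANK)

# One hypothetical label per audio frame, as a network might emit them
text = collapse("hh_e_ll_lll_oo")
print(text)  # hello
```

Note that the blank between the two runs of "l" is what keeps the double letter: without it the repeats would be merged into a single "l".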

Using an RNN gives a much lower error rate than an ordinary NN.

One advantage of this model is that it can produce words that are not in the corpus.

Conclusion

HMM-DNN is currently the best speech recognition model.

Finally, the lecturer predicted that the speech recognition models in all our electronic devices will be replaced by HMM-DNN in the near future.

Links:

An open-source speech recognition project (Kaldi)

http://kaldi.sourceforge.net/

Two datasets:

TIMIT

http://blog.163.com/gz_aaa/blog/static/37834532201471881923177/

http://www.fon.hum.uva.nl/david/ma_ssp/2007/TIMIT/

Switchboard

http://www.isip.piconepress.com/projects/switchboard/
