Reprints welcome; please credit the source:
http://www.cnblogs.com/NeighborhoodGuo/p/4720985.html
This lecture featured a guest speaker, said to be a PhD from Stanford. The topic was Neural Networks in Speech Recognition.
Speech recognition is a deep field, though. As the lecturer said in his opening overview, this class only gives a bird's-eye view,
so that everyone gets a brief sense of how this area works, without going into detail. The lecture covered many models and recommended papers, and some of the model parts I am not familiar with, which made them genuinely painful to follow. I will take a moment to record the difficult points and the things I do not yet understand, so that if I ever get the chance to work in this area I can dig them out and study them in detail.
The lecturer gave an overview at the start of class, and the whole lecture then followed that outline, so the structure was clear.
Speech Recognition Systems Overview
Speech recognition is divided into three main parts. The first part is noise reduction, which is not covered in this lesson; the second part is transcription, which is the subject of this lesson; and the third part is understanding, which is what everything covered before this lecture has been about.
Finally, the lecturer recommended a speech dataset named Switchboard.
http://www.isip.piconepress.com/projects/switchboard/
I found this corpus online; I am not sure whether it is the same one the instructor mentioned.
HMM-DNN (Hybrid) Acoustic Modeling
To introduce HMM-DNN, the lecturer first presented HMM-GMM, which was once all the rage in speech recognition.
I learned about GMMs while watching CS229, and I saw HMMs in the PGM class, but I still do not fully understand the two put together.
The upper layer is the HMM; the lower layer computes p(x|s), the probability of the observed acoustic features given a state. This puzzled me a little: our aim in an HMM is to infer the hidden states, so if s is unknown, how can we condition on it, and what is the point of computing p(x|s)? (As far as I can tell, these emission probabilities p(x|s) are exactly what HMM decoding combines with the transition probabilities to find the most likely hidden state sequence.)
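As I understand it, the GMM side simply assigns each HMM state its own Gaussian mixture, so p(x|s) is the likelihood of the acoustic features x under state s's mixture. A minimal one-dimensional sketch (the function name and all numbers are my own illustration, not from the lecture):

```python
import math

def gmm_log_likelihood(x, weights, means, variances):
    """Log p(x|s) under a 1-D Gaussian mixture for one HMM state.

    Each HMM state s would have its own (weights, means, variances);
    decoding combines these emission scores with transition probabilities.
    """
    total = 0.0
    for w, mu, var in zip(weights, means, variances):
        # Weighted density of one Gaussian component
        total += w * math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)
    return math.log(total)

# Hypothetical two-component mixture for a single state:
score = gmm_log_likelihood(0.5, [0.6, 0.4], [0.0, 1.0], [1.0, 1.0])
```

In a real system x would be a vector of spectral features (e.g. MFCCs) and the Gaussians would be multivariate, but the idea is the same.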
Then the lecture introduced the HMM-DNN hybrid acoustic model.
Compared to HMM-GMM, the GMM part of the acoustic model is replaced with a DNN.
This model is said to have a long history, so why has it only recently come to the fore?
One of the most important reasons is that computers are now fast enough to support running such a large model, and to run multiple experiments, which makes tuning practical.
Earlier models were mostly single-layer NNs, while today's are multi-layer models. The choice of non-linearity also matters a great deal: the ones used now work much better than before.
In his experiments, the lecturer used a dataset called TIMIT as a testbed.
If the model has too few layers it cannot capture enough features, but with too many layers it overfits very easily and performance falls as well.
As for the choice of non-linear function, the rectifier (ReLU) passes error gradients through more cheaply during backpropagation, which makes it perform better than tanh.
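The gradient behavior is easy to see numerically: tanh's derivative decays toward zero for large pre-activations, while ReLU's derivative stays at exactly 1 wherever the unit is active. A small sketch (my own illustration, not from the lecture):

```python
import math

def tanh_grad(x):
    # Derivative of tanh: 1 - tanh(x)^2, shrinks toward 0 for large |x|,
    # which weakens the error signal in backpropagation.
    return 1.0 - math.tanh(x) ** 2

def relu_grad(x):
    # Derivative of ReLU: exactly 1 for x > 0, 0 otherwise,
    # so active units pass the error through undiminished.
    return 1.0 if x > 0 else 0.0

print(tanh_grad(3.0))  # ~0.0099: the tanh gradient has nearly vanished
print(relu_grad(3.0))  # 1.0: the ReLU gradient is intact
```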
Two methods for optimizing the DNN
The first is to replace the ordinary DNN with a CNN, which is better at extracting information from distorted speech.
The second is to replace the ordinary DNN with a recurrent NN (RNN).
HMM-free RNN Recognition
This replaces the traditional sub-phoneme extraction with a collapsing function.
The output no longer takes the whole word as a unit, but rather word fragments (characters) as units.
For stretches of audio where nothing is being pronounced, a blank symbol "_" occupies the position.
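Putting those two points together, my understanding is that the collapsing function merges repeated symbols and then drops the blank "_" placeholders, turning the frame-by-frame RNN output into a word. A sketch of how I understand it (the function name and example string are my own):

```python
def collapse(path, blank="_"):
    """Collapse a frame-level output path: merge consecutive repeats,
    then drop the blank symbol that marks un-pronounced gaps."""
    out = []
    prev = None
    for sym in path:
        # Keep a symbol only when it differs from the previous frame
        # and is not the blank placeholder.
        if sym != prev and sym != blank:
            out.append(sym)
        prev = sym
    return "".join(out)

print(collapse("hh_e_ll_lo"))  # "hello"
```

Note that the blank also lets the model emit genuine double letters: "ll" survives here because the two l-runs are separated by a blank.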
Using an RNN gives a much lower error rate than an ordinary NN.
One advantage of this model is that it can produce words that are not in the corpus, since it outputs word fragments rather than whole words.
Conclusion
HMM-DNN is currently the best-performing speech recognition model.
The instructor predicted at the end that the speech recognition models in all our electronic devices will be replaced by HMM-DNN in the near future.
Links:
An open-source speech recognition project:
http://kaldi.sourceforge.net/
Two datasets:
TIMIT
http://blog.163.com/gz_aaa/blog/static/37834532201471881923177/
http://www.fon.hum.uva.nl/david/ma_ssp/2007/TIMIT/
Switchboard
http://www.isip.piconepress.com/projects/switchboard/
CS224D Lecture 14 Notes