End-to-end speech recognition (ii): CTC

Source: Internet
Author: User
Related notes

CTC Learning Notes (i) Introduction
CTC Learning Notes (ii) training and formula derivation
CTC Learning Notes (iii) decoding
CTC Learning Notes (iv) decoding-WFST
CTC Learning Notes (v) Eesen training source code

History

ICML-2006. Graves et al. [1] introduced the Connectionist Temporal Classification (CTC) objective function for phone recognition.
ICML-2014. Graves [2] demonstrated that character-level speech transcription can be performed by a recurrent neural network with minimal preprocessing.
Baidu. 2014 [3] Deep Speech, 2015 [4] Deep Speech 2.
ASRU-2015. Yajie Miao [5] presented the Eesen framework.
ASRU-2015. Google [6] extended the application of context-dependent (CD) LSTM models trained with CTC and sMBR loss.
ICASSP-2016. Google [7] presented a compact large-vocabulary speech recognition system that runs efficiently on mobile devices, accurately and with low latency.
NIPS-2016. Google [8] used whole words as acoustic units.
2017. IBM [9] employed direct acoustics-to-word models.

References

[1]. A. Graves, S. Fernández, F. Gomez, and J. Schmidhuber. Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In ICML, 2006.
[2]. A. Graves and N. Jaitly. Towards end-to-end speech recognition with recurrent neural networks. In Proceedings of the 31st International Conference on Machine Learning (ICML-14), pp. 1764–1772, 2014.
[3]. A. Hannun, C. Case, J. Casper, B. Catanzaro, G. Diamos, E. Elsen, R. Prenger, S. Satheesh, S. Sengupta, A. Coates, et al. Deep Speech: scaling up end-to-end speech recognition. arXiv preprint arXiv:1412.5567, 2014.
[4]. D. Amodei, R. Anubhai, E. Battenberg, C. Case, J. Casper, B. Catanzaro, J. Chen, M. Chrzanowski, A. Coates, G. Diamos, et al. Deep Speech 2: end-to-end speech recognition in English and Mandarin. CoRR arXiv:1512.02595, 2015.
[5]. Y. Miao, M. Gowayyed, and F. Metze. EESEN: end-to-end speech recognition using deep RNN models and WFST-based decoding. In 2015 Automatic Speech Recognition and Understanding Workshop (ASRU 2015).
[6]. A. Senior, H. Sak, F. de Chaumont Quitry, T. Sainath, and K. Rao. Acoustic modelling with CD-CTC-sMBR LSTM RNNs. In ASRU, 2015.
[7]. I. McGraw, R. Prabhavalkar, R. Alvarez, M. Gonzalez Arenas, K. Rao, D. Rybach, O. Alsharif, H. Sak, A. Gruenstein, F. Beaufays, and C. Parada. Personalized speech recognition on mobile devices. In Proc. of ICASSP, 2016.
[8]. H. Soltau, H. Liao, and H. Sak. Neural speech recognizer: acoustic-to-word LSTM model for large vocabulary speech recognition. arXiv preprint arXiv:1610.09975, 2016.
[9]. K. Audhkhasi, B. Ramabhadran, G. Saon, M. Picheny, and D. Nahamoo. Direct acoustics-to-word models for English conversational speech recognition. arXiv preprint arXiv:1703.07754, 2017.
