Related notes
CTC Learning Notes (i) Introduction
CTC Learning Notes (ii) training and formula derivation
CTC Learning Notes (iii) decoding
CTC Learning Notes (iv) decoding-WFST
CTC Learning Notes (v) Eesen training source
History
ICML-2006. Graves et al. [1] introduced the connectionist temporal classification (CTC) objective function for phone recognition.
ICML-2014. Graves and Jaitly [2] demonstrated that character-level speech transcription can be performed by a recurrent neural network with minimal preprocessing.
Baidu. Deep Speech (2014) [3] and Deep Speech 2 (2015) [4].
ASRU-2015. Yajie Miao et al. [5] presented the Eesen framework.
ASRU-2015. Google [6] extended the application of context-dependent (CD) LSTM models trained with CTC and sMBR loss.
ICASSP-2016. Google [7] presented a compact large-vocabulary speech recognition system that runs efficiently on mobile devices, accurately and with low latency.
NIPS-2016. Google [8] used whole words as acoustic units.
2017. IBM [9] employed direct acoustics-to-word models.
Reference
[1]. A. Graves, S. Fernandez, F. Gomez, and J. Schmidhuber. Connectionist temporal classification: Labelling unsegmented sequence data with recurrent neural networks. In ICML, 2006.
[2]. Graves, Alex and Jaitly, Navdeep. Towards end-to-end speech recognition with recurrent neural networks. In Proceedings of the 31st International Conference on Machine Learning (ICML-14), pp. 1764–1772, 2014.
[3]. Hannun, A., Case, C., Casper, J., Catanzaro, B., Diamos, G., Elsen, E., Prenger, R., Satheesh, S., Sengupta, S., Coates, A., et al. (2014). Deep Speech: Scaling up end-to-end speech recognition. arXiv preprint arXiv:1412.5567.
[4]. D. Amodei, R. Anubhai, E. Battenberg, C. Case, J. Casper, B. Catanzaro, J. Chen, M. Chrzanowski, A. Coates, G. Diamos et al., "Deep Speech 2: End-to-end speech recognition in English and Mandarin," CoRR arXiv:1512.02595, 2015.
[5]. Yajie Miao, Mohammad Gowayyed, Florian Metze. EESEN: End-to-end speech recognition using deep RNN models and WFST-based decoding. 2015 Automatic Speech Recognition and Understanding Workshop (ASRU 2015).
[6]. A. Senior, H. Sak, F. de Chaumont Quitry, T. Sainath, and K. Rao, "Acoustic modelling with CD-CTC-sMBR LSTM RNNs," in ASRU, 2015.
[7]. I. McGraw, R. Prabhavalkar, R. Alvarez, M. Gonzalez Arenas, K. Rao, D. Rybach, O. Alsharif, H. Sak, A. Gruenstein, F. Beaufays, and C. Parada, "Personalized speech recognition on mobile devices," in Proc. of ICASSP, 2016.
[8]. H. Soltau, H. Liao, and H. Sak, "Neural speech recognizer: Acoustic-to-word LSTM model for large vocabulary speech recognition," arXiv preprint arXiv:1610.09975, 2016.
[9]. K. Audhkhasi, B. Ramabhadran, G. Saon, M. Picheny, D. Nahamoo, "Direct acoustics-to-word models for English conversational speech recognition," arXiv preprint arXiv:1703.07754, 2017.