1-GMM-HMMs Speech Recognition System: Framework Chapter

Source: Internet
Author: User

This article introduces the traditional speech recognition system based on GMM-HMMs.

Outline: recognition principle, statistical model, system framework

First, note that this article discusses continuous speech recognition (CSR). Isolated word recognition based on DTW (dynamic time warping) is out of date and outside the scope of the discussion. The article also focuses on the decoding (recognition) process of automatic speech recognition.

1. Principle of recognition

First, understand that speech is a sound wave, that is, an analog signal. On a computer it is generally stored as a WAV file (an uncompressed format) or captured directly from a microphone (online).

The signal first needs preprocessing and digitizing: noise filtering, pre-emphasis (boosting the high frequencies), endpoint detection, and windowing and framing, which cut the speech signal into many short segments (speech frames). Typically each frame is 25 ms long and consecutive frames start 10 ms apart, so adjacent frames overlap by 15 ms. This is the commonly quoted "frame length 25 ms, frame shift 10 ms".
Then we analyze each frame to compress the data further. This step is called feature extraction, and the common feature parameters are MFCC and PLP. After feature extraction, each frame is reduced from the original several hundred sample points to a 39-dimensional MFCC feature vector, which is much easier to work with.
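The framing step described above can be sketched in a few lines of Python. This is only a minimal illustration under assumptions that are not in the original text: a 16 kHz sampling rate, a pre-emphasis coefficient of 0.97, a Hamming window, and a synthetic sine tone standing in for recorded speech. The function names `pre_emphasize` and `frame_signal` are hypothetical, and the MFCC computation itself is not shown.

```python
import math

def pre_emphasize(signal, coeff=0.97):
    """Boost high frequencies: y[n] = x[n] - coeff * x[n-1] (coeff is an assumed value)."""
    return [signal[0]] + [signal[n] - coeff * signal[n - 1] for n in range(1, len(signal))]

def frame_signal(signal, sample_rate=16000, frame_ms=25, shift_ms=10):
    """Split a signal into overlapping frames and apply a Hamming window to each."""
    frame_len = sample_rate * frame_ms // 1000   # 400 samples at 16 kHz
    shift = sample_rate * shift_ms // 1000       # 160 samples at 16 kHz
    window = [0.54 - 0.46 * math.cos(2 * math.pi * n / (frame_len - 1))
              for n in range(frame_len)]
    frames = []
    # Only keep frames that fit entirely inside the signal.
    for start in range(0, len(signal) - frame_len + 1, shift):
        chunk = signal[start:start + frame_len]
        frames.append([s * w for s, w in zip(chunk, window)])
    return frames

# One second of a 440 Hz tone as a stand-in for recorded speech.
sample_rate = 16000
signal = [math.sin(2 * math.pi * 440 * n / sample_rate) for n in range(sample_rate)]
frames = frame_signal(pre_emphasize(signal), sample_rate)
print(len(frames), len(frames[0]))  # 98 400
```

Under these assumptions, one second of audio yields 98 frames of 400 samples each, and each frame would then be turned into one 39-dimensional feature vector.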

The next question is how to turn a sequence of feature vectors into a sentence. This is where the acoustic model (GMM-HMMs) and the language model come in. First we need to know that a sentence consists of a sequence of words, and a word consists of a string of phonemes (for example, ball: /b/ /ɔː/ /l/). In English the phoneme is usually chosen as the modeling unit for the hidden Markov model (for Chinese the unit is usually the initial or final of a syllable). That is, one phoneme corresponds to one HMM, and an HMM usually consists of three states.

Recognition goes in the opposite direction. We start from a sequence of feature vectors, and the recognition process has to decide which state each feature vector belongs to, and then go from states to phonemes, from phonemes to words, and from words to a word sequence (a sentence). Feature vectors to states is solved by GMMs (Gaussian mixture models); three states to a phoneme is solved by the HMM; phonemes to a word is solved by the pronunciation dictionary; words to a word sequence is solved by the language model. Throughout the process we work in a state network (time by state) that is built entirely on HMMs, which is why HMMs are said to have solved the speech recognition problem.
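The frame-to-state step can be illustrated with a small sketch: each state scores a frame with its GMM, and a Viterbi search finds the best state sequence through a three-state, left-to-right HMM for a single phoneme. Everything concrete here is an assumption for illustration only: one-dimensional features instead of 39-dimensional MFCCs, one Gaussian per state, hand-picked means and transition probabilities, and the function names `log_gauss`, `log_gmm` and `viterbi`. A real system trains these parameters and decodes over a much larger network.

```python
import math

NEG_INF = float("-inf")

def log_gauss(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at vector x."""
    return sum(-0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))

def log_gmm(x, components):
    """Log-likelihood of x under a GMM given as (weight, mean, var) tuples."""
    terms = [math.log(w) + log_gauss(x, m, v) for w, m, v in components]
    top = max(terms)
    return top + math.log(sum(math.exp(t - top) for t in terms))  # log-sum-exp

def viterbi(frames, gmms, log_trans, log_init):
    """Return the most likely HMM state index for each frame."""
    n = len(gmms)
    score = [log_init[s] + log_gmm(frames[0], gmms[s]) for s in range(n)]
    back = []
    for x in frames[1:]:
        # Best predecessor of every state, then the updated path scores.
        prev = [max(range(n), key=lambda p: score[p] + log_trans[p][s]) for s in range(n)]
        score = [score[prev[s]] + log_trans[prev[s]][s] + log_gmm(x, gmms[s])
                 for s in range(n)]
        back.append(prev)
    state = max(range(n), key=lambda s: score[s])
    path = [state]
    for prev in reversed(back):   # trace the back-pointers to the first frame
        state = prev[state]
        path.append(state)
    return path[::-1]

# Toy phoneme HMM: three states, each with a single-component "GMM" (assumed values).
gmms = [[(1.0, [0.0], [1.0])], [(1.0, [5.0], [1.0])], [(1.0, [10.0], [1.0])]]
half = math.log(0.5)
log_trans = [[half, half, NEG_INF],      # state 0: stay or move to state 1
             [NEG_INF, half, half],      # state 1: stay or move to state 2
             [NEG_INF, NEG_INF, 0.0]]    # state 2: stay
log_init = [0.0, NEG_INF, NEG_INF]       # always start in state 0
frames = [[0.1], [-0.2], [4.8], [5.3], [9.9], [10.2]]
path = viterbi(frames, gmms, log_trans, log_init)
print(path)  # [0, 0, 1, 1, 2, 2]
```

In this toy run the six frames are aligned to the three states of one phoneme; the dictionary and language model would then take over to map phonemes to words and words to a sentence.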

2. Statistical model

The task of automatic speech recognition (ASR) is to map a segment of acoustic signal to a string of text. (Modeling is the first step in actually solving the problem.)

$$W^* = \mathop{\arg\max}_{W} P(W \mid X) \tag{1}$$

Here $X = x_1^T = x_1 x_2, \cdots, x_t, \cdots, x_T$ represents an acoustic signal of length $T$ (a sequence of speech frames), $W = w_1^n = w_1 w_2, \cdots, w_n$ represents a word sequence of length $n$, and $W^*$ is the most likely of all word sequences, that is, our recognition result.
However, with generative models formula (1) is difficult to calculate directly, so we apply Bayes' rule:

$$P(W \mid X) = \frac{P(X \mid W)\,P(W)}{P(X)} \propto P(X \mid W)\,P(W) \tag{2}$$
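Because P(X) is the same for every candidate W, it can be ignored when taking the argmax in formula (1), so decoding amounts to maximizing the product of the acoustic score P(X|W) and the language model score P(W). The toy sketch below shows this in the log domain. The candidate sentences and all the scores are made-up numbers for illustration, and the `decode` function is hypothetical, not part of any real toolkit.

```python
import math

# Hypothetical log scores for three candidate word sequences W given one utterance X.
candidates = {
    "recognize speech":   {"log_acoustic": -120.0, "log_lm": math.log(1e-3)},
    "wreck a nice beach": {"log_acoustic": -118.5, "log_lm": math.log(1e-6)},
    "recognise peach":    {"log_acoustic": -125.0, "log_lm": math.log(1e-5)},
}

def decode(cands):
    """Pick the W that maximizes log P(X|W) + log P(W); P(X) is constant and dropped."""
    return max(cands, key=lambda w: cands[w]["log_acoustic"] + cands[w]["log_lm"])

best = decode(candidates)
acoustic_only = max(candidates, key=lambda w: candidates[w]["log_acoustic"])
print(best)           # recognize speech
print(acoustic_only)  # wreck a nice beach
```

With these numbers the acoustic model alone slightly prefers the wrong sentence, and the language model prior is what tips the decision toward the more plausible word sequence.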
