A C++ implementation of the in-memory storage model for a speech recognition dictionary

Last Update: 2013-12-08 Source: Internet

Author: User

Consider a dictionary such as the following, in which each line holds a word followed by its phoneme sequence (many entries are omitted here):

```
...
1111 ii i4 ii i1 ii i4 ii i1
1117 ii i1 ii i1 ii i1 q i1
1113 ii i1 ii i1 ii i1 s an1
1119 ii i1 ii i1 ii i1 j iu3
1112 ii i1 ii i1 ii i1 ee er4
1115 ii i1 ii i1 ii i1 uu u3
1118 ii i1 ii i1 ii i1 b a1
1116 ii i1 ii i1 ii i1 l iu4
1114 ii i1 ii i1 ii i1 s iy4
1110 ii i1 ii i1 ii i1 l ing2
...
```

In speech recognition, the dictionary is used for both training and decoding. This article describes an in-memory storage model for processing such a dictionary.

1. Model requirements

The model has to hold the following information:

- The number of words in the dictionary, covering both normal words and special words.
- The start and end words of a sentence or grammar.
- The pause word, that is, the silence word used for silence modeling.
- The number of pronunciations of each word. Some words have several pronunciations; the Chinese word for "and" is a common polyphonic word.
- Whether each word takes part in the language model.

2. Model implementation

```cpp
#include <cstddef>  // NULL

class Vocabulary
{
public:
    int nWords;          // total number of words
    char **words;        // all words, normal and special; the array index is the word's ordinal number
    int nNormWords;      // number of normal words
    int *normWordInds;   // ordinal numbers of all normal words
    char specWordChar;   // character that marks a special word when it is the word's first byte, e.g. '!'
    int nSpecWords;      // number of special words
    int *specWordInds;   // ordinal numbers of all special words
    int sentStartIndex;  // ordinal number of the word that starts a sentence or grammar
    int sentEndIndex;    // ordinal number of the word that ends a sentence or grammar
    int silIndex;        // ordinal number of the silence word
    bool fromBinFile;

    // Constructors and destructor
    Vocabulary();
    Vocabulary(const char *lexFName, char specWordChar_ = '\0',
               const char *sentStartWord = NULL, const char *sentEndWord = NULL,
               const char *silWord = NULL);
    virtual ~Vocabulary();

    // Returns the word with the given ordinal number, read from the words array.
    char *getWord(int index);
    // Returns the number of pronunciations of the word with the given ordinal
    // number; 1 if the word is not polyphonic.
    int getNumPronuns(int index);
    // Whether the word with the given ordinal number is a special word.
    bool isSpecial(int index);
    // Whether the word is ignored by the language model. Words normally take
    // part in language modeling, which improves the recognition rate.
    bool getIgnoreLM(int index);
    // Returns the ordinal number of the given word; guess optionally gives the
    // position at which to start searching.
    int getIndex(const char *word, int guess = -1);

private:
    int nWordsAlloc;  // allocated capacity of the in-memory dictionary (nWords is the number actually stored)
    bool *special;    // for each word, whether it is a special word
    int *nPronuns;    // for each word, how many different pronunciations it has

    /**
     * Adds a word to the in-memory dictionary; registerPronun says whether
     * to count one more pronunciation for it.
     **/
    int addWord(const char *word, bool registerPronun = true);
};
```

3. Construction process

- Open the dictionary file named by the constructor parameter lexFName, which gives a FILE *fd.
- Loop with while (fgets(line, 1000, fd) != NULL) to read one line at a time from fd, split off the first field (the word), and call the member function addWord to add it to the dictionary.
- Add the sentence start and end words to the dictionary.
- Decide whether each word is special by checking whether its first byte equals specWordChar.
- Count the special and normal words and store their ordinal numbers in the corresponding arrays (see the class definition above).

4. Code for adding words

```cpp
#include <cstdlib>    // realloc
#include <cstring>    // strlen, strcpy
#include <strings.h>  // strcasecmp (POSIX)

int Vocabulary::addWord(const char *word, bool registerPronun)
{
    int cmpResult = 0, ind = -1;

    // Grow the arrays in steps of 100 entries when they are full.
    if (nWords == nWordsAlloc)
    {
        nWordsAlloc += 100;
        words = (char **)realloc(words, nWordsAlloc * sizeof(char *));
        nPronuns = (int *)realloc(nPronuns, nWordsAlloc * sizeof(int));
        for (int i = nWords; i < nWordsAlloc; i++)
        {
            words[i] = NULL;
            nPronuns[i] = 0;
        }
    }

    if ((word == NULL) || (word[0] == '\0'))
        return -1;

    // Compare with the last word first: a sorted input file always appends.
    if (nWords > 0)
        cmpResult = strcasecmp(words[nWords - 1], word);

    // Make sure the new word ends up in the right (sorted) position.
    if ((cmpResult < 0) || (nWords == 0))
    {
        // The new word belongs at the end of the list.
        words[nWords] = new char[strlen(word) + 1];
        nPronuns[nWords] = 0;
        strcpy(words[nWords], word);
        ind = nWords;
        nWords++;
    }
    else if (cmpResult > 0)
    {
        // Linear search for the insertion point.
        for (int i = 0; i < nWords; i++)
        {
            cmpResult = strcasecmp(words[i], word);
            if (cmpResult > 0)
            {
                // Shift the tail up by one slot and insert the word at i.
                nWords++;
                for (int j = (nWords - 1); j > i; j--)
                {
                    words[j] = words[j - 1];
                    nPronuns[j] = nPronuns[j - 1];
                }
                words[i] = new char[strlen(word) + 1];
                strcpy(words[i], word);
                nPronuns[i] = 0;
                ind = i;
                break;
            }
            else if (cmpResult == 0)
            {
                // The word is already in the list.
                ind = i;
                break;
            }
        }
    }
    else
    {
        // The new word equals the last word in the list. Without this
        // branch, ind would stay -1 and trigger the error below.
        ind = nWords - 1;
    }

    if (ind < 0)
        error("failed to add word: ind < 0");

    if (registerPronun)
        (nPronuns[ind])++;  // register one more pronunciation of the word

    return ind;  // ordinal number of the word
}
```
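The construction steps described in section 3 can be sketched roughly as follows. This is an illustrative sketch under stated assumptions, not the article's constructor: SketchVocabulary, lowered, lowerBound, loadLexicon and the free-function forms of getIndex and addWord are hypothetical names. It reads from a std::istream rather than a FILE *, uses std::vector and lower-cased string comparison in place of realloc and strcasecmp, and assumes the start and end words are added without registering a pronunciation.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for the article's Vocabulary class.
struct SketchVocabulary {
    std::vector<std::string> words;  // kept sorted case-insensitively
    std::vector<int> nPronuns;       // pronunciations registered per word
    std::vector<int> normWordInds;   // ordinal numbers of normal words
    std::vector<int> specWordInds;   // ordinal numbers of special words
    int sentStartIndex = -1;
    int sentEndIndex = -1;
};

// Lower-cased copy used for case-insensitive ordering (ASCII only).
std::string lowered(const std::string &s) {
    std::string out(s);
    std::transform(out.begin(), out.end(), out.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return out;
}

// Position of the first stored word that is not ordered before `word`.
std::size_t lowerBound(const SketchVocabulary &v, const std::string &word) {
    const std::string key = lowered(word);
    auto it = std::lower_bound(
        v.words.begin(), v.words.end(), key,
        [](const std::string &w, const std::string &k) { return lowered(w) < k; });
    return static_cast<std::size_t>(it - v.words.begin());
}

// Ordinal number of `word`, or -1 if it is not stored.
int getIndex(const SketchVocabulary &v, const std::string &word) {
    const std::size_t pos = lowerBound(v, word);
    const bool found = pos < v.words.size() && lowered(v.words[pos]) == lowered(word);
    return found ? static_cast<int>(pos) : -1;
}

// Inserts `word` at its sorted position if it is new; returns its ordinal number.
int addWord(SketchVocabulary &v, const std::string &word, bool registerPronun = true) {
    assert(!word.empty());
    int ind = getIndex(v, word);
    if (ind < 0) {
        ind = static_cast<int>(lowerBound(v, word));
        v.words.insert(v.words.begin() + ind, word);
        v.nPronuns.insert(v.nPronuns.begin() + ind, 0);
    }
    if (registerPronun) ++v.nPronuns[ind];
    return ind;
}

SketchVocabulary loadLexicon(std::istream &in, char specWordChar,
                             const std::string &sentStartWord,
                             const std::string &sentEndWord) {
    SketchVocabulary v;
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream fields(line);
        std::string word;
        if (fields >> word)  // the first field is the word; blank lines are skipped
            addWord(v, word);
    }
    // Assumption: start/end words do not count as a pronunciation.
    if (!sentStartWord.empty()) addWord(v, sentStartWord, false);
    if (!sentEndWord.empty()) addWord(v, sentEndWord, false);
    // Look the indices up only now, because every insertion can shift ordinals.
    if (!sentStartWord.empty()) v.sentStartIndex = getIndex(v, sentStartWord);
    if (!sentEndWord.empty()) v.sentEndIndex = getIndex(v, sentEndWord);
    // A word is special when its first byte equals specWordChar.
    for (std::size_t i = 0; i < v.words.size(); ++i) {
        const bool isSpecial = specWordChar != '\0' && v.words[i][0] == specWordChar;
        (isSpecial ? v.specWordInds : v.normWordInds).push_back(static_cast<int>(i));
    }
    return v;
}
```

To try it, wrap a few lexicon lines in a std::istringstream and pass it to loadLexicon together with a marker character such as '!'. Resolving the start and end indices after all insertions matters here because a sorted insert can move words that were added earlier.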

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
