A C++ in-memory storage model for a speech recognition dictionary

Source: Internet
Author: User

Consider a dictionary such as the following, where each line holds a word followed by its phoneme sequence:

[html]
i1 ii i1
one ii i1 i1
one ii i1 ii i1 ii i1 i1
1111 ii i4 ii i1 ii i4 ii i1
1117 ii i1 ii i1 i1 q i1
1113 ii i1 ii i1 ii i1 s an1
1119 ii i1 ii i1 ii i1 j iu3
1112 ii i1 ii i1 ii i1 i1 ee er4
1115 ii i1 ii i1 ii i1 i1 uu u3
1118 ii i1 ii i1 ii i1 b a1
1116 ii i1 ii i1 ii i1 l iu4
1114 ii i1 ii i1 ii i1 s iy4
1110 ii i1 ii i1 ii i1 i1 l ing2
a seven ii i1 ii i1 q i1

(Many more entries are omitted here.)

In speech recognition, such a dictionary is used for both training and decoding. This article describes an in-memory storage model for processing the dictionary.

1. Model requirements

The model holds the following information:

  - The number of words in the dictionary, including normal words and special words.
  - The start word and end word of a sentence or grammar.
  - The pause word, that is, the silence word used for silence modeling.
  - The number of pronunciations of each word. Some words have several pronunciations, for example the Chinese word for "and", a common polyphonic word.
  - Whether each word is taken into account by the language model.

2. Model implementation

[cpp]
class Vocabulary
{
public:
    int nWords;           // total number of words
    char **words;         // all words (normal and special); the array index is the word's index
    int nNormWords;       // number of normal words
    int *normWordInds;    // indices of all normal words
    char specWordChar;    // a word whose first character equals this one (e.g. '!') is a special word
    int nSpecWords;       // number of special words
    int *specWordInds;    // indices of all special words
    int sentStartIndex;   // index of the sentence/grammar start word
    int sentEndIndex;     // index of the sentence/grammar end word
    int silIndex;         // index of the silence word
    bool fromBinFile;

    // Constructors and destructor
    Vocabulary();
    Vocabulary(const char *lexFName, char specWordChar_ = '\0',
               const char *sentStartWord = NULL, const char *sentEndWord = NULL,
               const char *silWord = NULL);
    virtual ~Vocabulary();

    char *getWord(int index);       // returns the word stored at the given index
    int getNumPronuns(int index);   // number of pronunciations of the word; 1 if it is not polyphonic
    bool isSpecial(int index);      // whether the word at the given index is a special word
    bool getIgnoreLM(int index);    // whether the language model ignores this word; words are normally
                                    // not ignored, because using the language model improves recognition
    int getIndex(const char *word, int guess = -1);  // returns the index of a word; guess optionally
                                                     // gives the position at which to start searching

private:
    int nWordsAlloc;   // allocated capacity of the arrays (nWords is the number of words actually stored)
    bool *special;     // whether each word is a special word
    int *nPronuns;     // number of different pronunciations of each word

    /**
     * Adds a word to the in-memory dictionary. registerPronun says
     * whether to update the word's pronunciation count.
     */
    int addWord(const char *word, bool registerPronun = true);
};

3. Construction process

The constructor opens the dictionary file named by its lexFName parameter as FILE *fd and then proceeds as follows:

  - It calls while (fgets(line, 1000, fd) != NULL) to read one line at a time from fd, extracts the first field of the line, and calls the member function addWord to add that word to the dictionary.
  - It adds the sentence start and end words to the dictionary.
  - It decides whether each word is special by checking whether the word's first byte equals specWordChar.
  - It counts the special and normal words and stores their indices in the corresponding arrays (see the class definition above).

4. Code for adding words

[cpp]
#include <cstdlib>     // realloc
#include <cstring>     // strlen, strcpy
#include <strings.h>   // strcasecmp (POSIX)

int Vocabulary::addWord(const char *word, bool registerPronun)
{
    int cmpResult = 0, ind = -1;

    // Grow the arrays when they are full, so there is always room for one more word
    if (nWords == nWordsAlloc) {
        nWordsAlloc += 100;
        words = (char **)realloc(words, nWordsAlloc * sizeof(char *));
        nPronuns = (int *)realloc(nPronuns, nWordsAlloc * sizeof(int));
        for (int i = nWords; i < nWordsAlloc; i++) {
            words[i] = NULL;
            nPronuns[i] = 0;
        }
    }

    if ((word == NULL) || (word[0] == '\0'))
        return -1;

    // Compare with the last word to find out where the new word belongs
    if (nWords > 0)
        cmpResult = strcasecmp(words[nWords - 1], word);

    if ((cmpResult < 0) || (nWords == 0)) {
        // The new word belongs at the end of the list
        words[nWords] = new char[strlen(word) + 1];
        nPronuns[nWords] = 0;
        strcpy(words[nWords], word);
        ind = nWords;
        nWords++;
    } else if (cmpResult > 0) {
        // Find the first word that sorts after the new word and insert before it
        for (int i = 0; i < nWords; i++) {
            cmpResult = strcasecmp(words[i], word);
            if (cmpResult > 0) {
                nWords++;
                for (int j = nWords - 1; j > i; j--) {
                    words[j] = words[j - 1];
                    nPronuns[j] = nPronuns[j - 1];
                }
                words[i] = new char[strlen(word) + 1];
                strcpy(words[i], word);
                nPronuns[i] = 0;
                ind = i;
                break;
            } else if (cmpResult == 0) {
                // The word is already in the dictionary
                ind = i;
                break;
            }
        }
    } else {
        // The word equals the last word in the list (for example, a second
        // pronunciation on the next line); without this branch ind stays -1
        ind = nWords - 1;
    }

    if (ind < 0)
        error("failed to add a word: ind < 0");

    if (registerPronun)
        (nPronuns[ind])++;   // register one more pronunciation of the word

    return ind;   // return the word's index
}
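The construction steps and the sorted insertion described above can also be sketched with standard containers. This is only an illustrative sketch, not the article's implementation: the SketchVocabulary type, its load function, and the lessNoCase helper are hypothetical names introduced here, the input is taken from a std::istream instead of a FILE *, and std::lower_bound replaces the linear scan. It also assumes that the start and end words are added without registering a pronunciation, which the article does not state.

```cpp
#include <algorithm>
#include <cassert>
#include <istream>
#include <sstream>
#include <string>
#include <strings.h>   // strcasecmp (POSIX)
#include <vector>

// Hypothetical, simplified stand-in for the article's Vocabulary class.
struct SketchVocabulary {
    std::vector<std::string> words;   // kept sorted, case-insensitively
    std::vector<int> nPronuns;        // pronunciations registered per word
    std::vector<int> normWordInds;    // indices of normal words
    std::vector<int> specWordInds;    // indices of special words
    int sentStartIndex = -1;
    int sentEndIndex = -1;

    static bool lessNoCase(const std::string& a, const std::string& b) {
        return strcasecmp(a.c_str(), b.c_str()) < 0;
    }

    // Returns the index of word, or -1 if it is not stored.
    int getIndex(const std::string& word) const {
        auto pos = std::lower_bound(words.begin(), words.end(), word, lessNoCase);
        if (pos == words.end() || lessNoCase(word, *pos)) return -1;
        return static_cast<int>(pos - words.begin());
    }

    // Inserts word at its sorted position if it is new and optionally
    // registers one more pronunciation. Returns the word's index, or -1.
    int addWord(const std::string& word, bool registerPronun = true) {
        if (word.empty()) return -1;
        auto pos = std::lower_bound(words.begin(), words.end(), word, lessNoCase);
        int ind = static_cast<int>(pos - words.begin());
        if (pos == words.end() || lessNoCase(word, *pos)) {
            words.insert(pos, word);
            nPronuns.insert(nPronuns.begin() + ind, 0);
        }
        if (registerPronun) ++nPronuns[ind];
        return ind;
    }

    int getNumPronuns(int index) const {
        assert(index >= 0 && index < static_cast<int>(words.size()));
        return nPronuns[index];
    }

    // Mirrors the construction steps: read lines, add the first field of
    // each line, add the start/end words, then classify every word.
    void load(std::istream& in, char specWordChar,
              const std::string& sentStartWord, const std::string& sentEndWord) {
        std::string line, first;
        while (std::getline(in, line)) {
            std::istringstream fields(line);
            if (fields >> first) addWord(first);
        }
        // Assumption: start/end words do not register a pronunciation.
        addWord(sentStartWord, false);
        addWord(sentEndWord, false);
        // Indices are looked up only now, because insertions shift them.
        sentStartIndex = getIndex(sentStartWord);
        sentEndIndex = getIndex(sentEndWord);
        for (int i = 0; i < static_cast<int>(words.size()); ++i) {
            bool special = specWordChar != '\0' && words[i][0] == specWordChar;
            (special ? specWordInds : normWordInds).push_back(i);
        }
    }
};
```

To try it, one could pass a std::ifstream opened on a dictionary file to load, together with a marker character such as '!' and start and end words that begin with that character. Because each insertion shifts the indices of later words, the sketch records the start and end indices and builds the index arrays only after all words have been added.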
