Brief introduction
Many sophisticated software projects have used generic text configuration and resource files for years without major problems. As a project grows in size and complexity, however, so does the need for greater rigor and adaptability. With XML, and with XML applications that follow specific standards, you gain cross-project and cross-platform compatibility, robustness, and extensibility in areas such as Unicode support.
Common abbreviations
HTK: Hidden Markov Model Toolkit
PLS: Pronunciation Lexicon Specification
XML: Extensible Markup Language
You can also improve flexibility and reliability by converting plain text files to a relevant open standard. A good example is the dictionary (lexicon) used in speech recognition work. Even if your open source project has not moved to XML-formatted resource files, you can use XML standards in your own work without losing functionality.
In this article, learn how to convert easily between plain text and the Pronunciation Lexicon Specification (PLS) format. Several examples show how to store a custom dictionary in PLS format and extract the data into the flat files that you need.
Example: Dictionary
A dictionary, or lexicon, is a list of words used by speech recognition tools. It contains information about how each word is to be printed or rendered on screen and how it is pronounced, expressed as phonemes. The dictionary used with the Hidden Markov Model Toolkit (HTK) is widely used in speech control projects. Listing 1 is an excerpt from the VoxForge HTK dictionary.
Listing 1. Excerpt from the VoxForge HTK dictionary
AGENCY	[AGENCY]	ey jh ih n s iy
AGENDA	[AGENDA]	ax jh eh n d ax
AGENT	[AGENT]	ey jh ih n t
AGENTS	[AGENTS]	ey jh ih n t s
AGER	[AGER]	ey g er
AGES	[AGES]	ey jh ih z
The file in Listing 1 contains three tab-delimited fields:
A general label for the word
The word in square brackets, as it is to be printed or rendered on screen (the grapheme)
A series of individual, space-delimited phonemes from the Arpabet set (see Resources) that describe the pronunciation of the word
In the example above, the English pronunciations are expressed almost entirely in American Standard Code for Information Interchange (ASCII) characters.
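The three-field layout described above can be read with a few lines of Python. This is only a minimal sketch: the names HtkEntry and parse_htk_line are hypothetical, and it assumes each line holds exactly three tab-separated fields with the second field wrapped in square brackets.

```python
from typing import NamedTuple


class HtkEntry(NamedTuple):
    label: str      # general label for the word
    grapheme: str   # display form, without the square brackets
    phonemes: list  # Arpabet phonemes, one string per phoneme


def parse_htk_line(line: str) -> HtkEntry:
    """Split one tab-delimited HTK dictionary line into its three fields."""
    # Unpacking raises ValueError if the line does not have exactly three fields.
    label, bracketed, pronunciation = line.rstrip("\n").split("\t")
    if not (bracketed.startswith("[") and bracketed.endswith("]")):
        raise ValueError(f"display field is not in square brackets: {bracketed!r}")
    return HtkEntry(label, bracketed[1:-1], pronunciation.split())


entry = parse_htk_line("AGENCY\t[AGENCY]\tey jh ih n s iy")
print(entry.grapheme, entry.phonemes)
```

Stripping the brackets at parse time keeps the grapheme in a form that can later be written to another format unchanged.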
The CMU Sphinx project stores its lexicon (called a dictionary in CMU Sphinx terminology) in a similar manner. Listing 2 gives an excerpt.
Listing 2. Excerpt from a CMU Sphinx dictionary
agency EY JH AH N S IY
agenda AH JH EH N D AH
agendas AH JH EH N D AH Z
agent EY JH AH N T
agents EY JH AH N T S
ager EY JH ER
Listing 2 has only two fields: the word (grapheme) and its phonemes. The two dictionary examples differ in subtle ways:
The words and phonemes use different letter case.
The phonemes themselves differ slightly.
Punctuation (commas, exclamation marks, and so on) is treated slightly differently.
You can see the entire dictionary in the cmu07a.dic file in the current PocketSphinx download.
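To make the differences between the two formats concrete, the following sketch parses a Sphinx-style line and compares it with an HTK-style pronunciation after folding case. The function names are hypothetical, and the sample lines are written for illustration in the shape of the two listings, assuming whitespace-separated fields.

```python
def parse_sphinx_line(line: str):
    """Split a CMU Sphinx dictionary line into (word, phonemes)."""
    word, *phonemes = line.split()
    return word, phonemes


def normalize(word, phonemes):
    """Fold case so entries from either format can be compared."""
    return word.lower(), [p.lower() for p in phonemes]


htk = normalize("AGENT", "ey jh ih n t".split())
sphinx = normalize(*parse_sphinx_line("agent EY JH AH N T"))

# Pair phonemes position by position; zip stops at the shorter list,
# so pronunciations of different lengths are only partly compared.
differences = [(a, b) for a, b in zip(htk[1], sphinx[1]) if a != b]
print(htk[0] == sphinx[0], differences)  # True [('ih', 'ah')]
```

Once case is folded, the remaining mismatches are genuine phoneme differences between the two dictionaries rather than formatting differences.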
Because the dictionary gives the pronunciation of each word, you may need to edit the file to fit a particular person or dialect. Over time, your custom dictionary becomes a valuable knowledge asset. Editing flat files with a text editor is easy, but it is just as easy to introduce errors, such as using a delimiter other than the one the file format expects, inserting non-ASCII characters, placing fields in the wrong order, sorting entries incorrectly, omitting square brackets where they are required, and so on.
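Several of these editing mistakes can be caught mechanically. The sketch below checks HTK-style lines for some of them; it is illustrative only. The function name check_htk_lines and the regular expression are assumptions about what a well-formed line looks like, and the sort check uses plain string comparison, which may not match the ordering a given tool expects.

```python
import re

# Assumed shape of a valid line:
# LABEL <tab> [GRAPHEME] <tab> phonemes separated by single spaces
LINE_PATTERN = re.compile(r"^[^\t]+\t\[[^\t\]]+\]\t\S+( \S+)*$")


def check_htk_lines(lines):
    """Return a list of (line_number, problem) tuples for suspicious lines."""
    problems = []
    previous_label = None
    for number, line in enumerate(lines, start=1):
        text = line.rstrip("\n")
        if not text.isascii():
            problems.append((number, "non-ASCII character"))
        if not LINE_PATTERN.match(text):
            problems.append((number, "wrong delimiter, field order, or brackets"))
            continue  # the label cannot be trusted, so skip the sort check
        label = text.split("\t")[0]
        if previous_label is not None and label < previous_label:
            problems.append((number, "entry out of sort order"))
        previous_label = label
    return problems


sample = [
    "AGENT\t[AGENT]\tey jh ih n t",
    "AGENCY [AGENCY] ey jh ih n s iy",    # spaces instead of tabs
    "AGENDA\t[AGENDA]\tax jh eh n d ax",  # sorts before AGENT
]
for problem in check_htk_lines(sample):
    print(problem)
```

A check like this reports problems but does not repair them, so the file still has to be corrected by hand.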
Flat files have other shortcomings as well. When you build a custom file, it is inevitably incompatible with other speech projects. By contrast, once two projects recognize a dictionary in a standard XML format (such as PLS), their dictionaries are immediately compatible with each other.
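As a rough illustration of what such a standard file looks like, the following sketch writes dictionary entries as a PLS document using Python's standard xml.etree.ElementTree module. The element names (lexicon, lexeme, grapheme, phoneme) and the namespace come from the PLS 1.0 specification. The function name build_pls and the default alphabet value "x-arpabet" are assumptions made for this example, not values prescribed by either speech project.

```python
import xml.etree.ElementTree as ET

PLS_NS = "http://www.w3.org/2005/01/pronunciation-lexicon"
XML_NS = "http://www.w3.org/XML/1998/namespace"


def build_pls(entries, alphabet="x-arpabet", lang="en-US"):
    """Build a PLS <lexicon> element from (grapheme, phonemes) pairs."""
    # Serialize PLS elements without a prefix (default namespace).
    ET.register_namespace("", PLS_NS)
    lexicon = ET.Element(
        f"{{{PLS_NS}}}lexicon",
        {
            "version": "1.0",
            "alphabet": alphabet,  # hypothetical vendor-style alphabet name
            f"{{{XML_NS}}}lang": lang,
        },
    )
    for grapheme, phonemes in entries:
        lexeme = ET.SubElement(lexicon, f"{{{PLS_NS}}}lexeme")
        ET.SubElement(lexeme, f"{{{PLS_NS}}}grapheme").text = grapheme
        ET.SubElement(lexeme, f"{{{PLS_NS}}}phoneme").text = " ".join(phonemes)
    return lexicon


lexicon = build_pls([("AGENCY", ["ey", "jh", "ih", "n", "s", "iy"])])
print(ET.tostring(lexicon, encoding="unicode"))
```

Because the output is ordinary XML, any tool that reads PLS can consume it, and the flat file for a particular project can be regenerated from it when needed.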