Getting to know open source speech recognition software: HTK, Julius, Kaldi, and other systems
I. Installation by Source Tarball
The installation process is simple and consists of the following steps:
1. Download the newest source tarball from the official Julius site.
2. Unpack the archive, for example into your home directory.
3. Configure and install Julius with the following commands:
cd ~/julius-4.2.2/
sudo ./configure
sudo make
sudo make install
II. Installation using apt-get in Ubuntu
Do not install Julius with sudo apt-get install julius. That route installs an old version, which causes some problems.
4. Try it by typing on the command line:
julius-4.2.1
// output:
Julius rev.4.2.2 - based on JuliusLib rev.4.2.2 (fast) built for i686-pc-linux
Copyright (c) 1991-2012 Kawahara Lab., Kyoto University
Copyright (c) 1997-2000 Information-technology Promotion Agency, Japan
Copyright (c) 2000-2005 Shikano Lab., Nara Institute of Science and Technology
Copyright (c) 2005-2012 Julius project team, Nagoya Institute of Technology
Try '-setting' for built-in engine configuration.
Try '-help' for run time options.
5. The last thing needed to run Julius smoothly is the julius-voxforge package, which can be installed via apt-get by typing on the command line:
sudo apt-get install julius-voxforge
That's all. Now you can start configuring it, because it does not come configured out of the box. I'll write a post in the next few days about the basic configuration, and a link to it will be added here.
Precautions:
The commands in 1.readme have changed:
sudo mkdfa.pl sample
export TMP=/tmp
Using Julius to build a speech recognition engine. This part covers the dictation program, which can be used for continuous speech recognition, mainly for Chinese:
I. Basic structure of the speech recognition engine
Basically all open source speech recognition engines, including Sphinx, Julius, and so on, share the following structure. The rest of this section takes Julius as the main example to explain the relevant content:
1. Acoustic model: used for the recognition of phonemes.
The technologies used:
1) HMM (Hidden Markov Model):
2) GMM (Gaussian Mixture Model):
3) DFA & NFA:
Build your own acoustic model:
The tools you can use:
2. Phoneme dictionary (pronunciation dictionary): used for the recognition of words.
Build your own pronunciation dictionary:
The tools you can use:
3. Language model: used for the recognition of sentences.
Build your own language model:
The tools you can use:
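The three components above form a dependency chain, 3 => 2 => 1: the language model depends on the pronunciation dictionary, which in turn depends on the acoustic model. Once all three are configured, the speech recognition engine is set up. What remains is to use the APIs provided by the development package for custom development.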
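The dependency chain can be pictured with a toy sketch. This is only an illustration, not how Julius is implemented: the grammar, the words, and the phoneme spellings below are all made up for the example, and the acoustic model itself (layer 1) is left out. A small DFA stands in for the language model, a plain mapping stands in for the pronunciation dictionary, and the resulting phoneme sequence is what an acoustic model would then have to match against audio.

```python
# Toy illustration of the 3 => 2 => 1 chain. All data here is hypothetical.

# Layer 3, language model: a tiny DFA over words.
# Each state maps an allowed word to the next state.
GRAMMAR = {
    0: {"call": 1, "phone": 1},
    1: {"steve": 2, "young": 2},
}
START_STATE = 0
ACCEPTING_STATES = {2}

# Layer 2, pronunciation dictionary: word -> phoneme sequence.
DICTIONARY = {
    "call": ["k", "ao", "l"],
    "phone": ["f", "ow", "n"],
    "steve": ["s", "t", "iy", "v"],
    "young": ["y", "ah", "ng"],
}


def accepts(words):
    """Return True if the word sequence is a sentence allowed by the DFA."""
    state = START_STATE
    for word in words:
        next_state = GRAMMAR.get(state, {}).get(word)
        if next_state is None:
            return False
        state = next_state
    return state in ACCEPTING_STATES


def to_phonemes(words):
    """Expand a grammatical sentence into the phonemes layer 1 would score."""
    if not accepts(words):
        raise ValueError("sentence is not allowed by the grammar")
    return [phoneme for word in words for phoneme in DICTIONARY[word]]


if __name__ == "__main__":
    print(to_phonemes(["call", "steve"]))
```

The point of the sketch is the direction of the lookups: a sentence is only meaningful once its words exist in the dictionary, and a dictionary entry is only meaningful once its phonemes exist in the acoustic model.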
II. Specific operation
Taking the sample from julius-voxforge as an example, using Julius mainly involves three files:
1. sample.grammar: the language grammar.
2. sample.voca: the pronunciation dictionary.
3. julian.jconf: the primary configuration file for the speech recognition engine. Here you specify the three main elements described above.
As for the other files:
sample.dfa
sample.dict
sample.term
they are generated automatically by the command mkdfa.pl sample.
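To get a feel for what a pronunciation dictionary such as sample.voca holds, here is a minimal Python sketch that reads a .voca-style text. It assumes the commonly seen layout in which a line starting with % opens a word category and each following line lists a word and then its phonemes. The sample text and the function name parse_voca are made up for the illustration and are not copied from the julius-voxforge package.

```python
# Minimal sketch: parse .voca-style text into categories of (word, phonemes).
# The layout handled here is an assumption, and SAMPLE_VOCA is hypothetical.

SAMPLE_VOCA = """\
% NS_B
<s> sil
% NS_E
</s> sil
% CALL_V
PHONE f ow n
CALL k ao l
"""


def parse_voca(text):
    """Return {category: [(word, [phoneme, ...]), ...]} in file order."""
    categories = {}
    current = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith("%"):
            # A "% NAME" line opens a new word category.
            current = line[1:].strip()
            categories[current] = []
        else:
            if current is None:
                raise ValueError("word entry found before any % category line")
            # First token is the word, the rest are its phonemes.
            word, *phonemes = line.split()
            categories[current].append((word, phonemes))
    return categories


if __name__ == "__main__":
    for category, entries in parse_voca(SAMPLE_VOCA).items():
        print(category, entries)
```

The category names are what a grammar file would refer to, which is why the grammar and the dictionary have to be compiled together by mkdfa.pl.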