- The Sphinx-core project is a Java project that ships with two demos. The first, HelloWorld, covers several functions such as recording and alignment (not yet tested).
- The second, HelloNGram, does speech recognition. It can use the configuration files hellongram0.xml, hellongram9.xml and hellongram1.xml. hellongram1.xml does not use a language model. Instead it uses JSGF to define the sentence grammar, which, much like a regular expression, appears to restrict the recognizable sentences to the pattern (hello) (jim|kate|tom).
- hellongram0.xml is an N-gram language model example. It specifies the storage paths of the acoustic model, the language model and the dictionary.
- Recognition starts by loading all available models, in the following stages:
  - The loader reads the model definition file (mdef).
  - It allocates a pool, with its size, for each of the means, the variances and the transformation matrices.
  - It creates a pool of senones using two floor values: distFloor, the minimum value, which appears to act as a lower threshold, and varianceFloor, the minimum variance.
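```java
// Loader excerpt as quoted in these notes; text lost in the original is marked "..."
variancePool = loadDensityFile(dataLocation + "variances", varianceFloor);
mixtureWeightsPool = loadMixtureWeights(dataLocation + "mixture_weights", ...); // remaining argument(s) lost in the original notes
... = loadTransitionMatrices(dataLocation + "transition_matrices");             // assignment target lost in the original notes
transformMatrix = loadTransformMatrix(dataLocation + "feature_transform");
senonePool = createSenonePool(distFloor, varianceFloor);
```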
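The notes describe varianceFloor only as "the lowest variance". A variance floor usually works by raising every variance component below the floor up to the floor. This keeps near-zero variances from making Gaussian scores blow up.

The sketch below assumes Sphinx4 applies varianceFloor the same way. The class and method names are hypothetical and are not Sphinx4's API.

```java
import java.util.Arrays;

/** Hypothetical sketch: applying a variance floor to a pool of Gaussian variances. */
public class VarianceFloorDemo {

    /**
     * Returns a copy of the variance vectors in which every component below
     * varianceFloor has been raised to varianceFloor (assumed floor semantics).
     */
    static float[][] applyVarianceFloor(float[][] variances, float varianceFloor) {
        float[][] floored = new float[variances.length][];
        for (int i = 0; i < variances.length; i++) {
            floored[i] = variances[i].clone();
            for (int d = 0; d < floored[i].length; d++) {
                if (floored[i][d] < varianceFloor) {
                    floored[i][d] = varianceFloor;
                }
            }
        }
        return floored;
    }

    public static void main(String[] args) {
        // Two toy 3-dimensional Gaussians; real acoustic models have many more
        float[][] pool = { {0.5f, 1e-6f, 2.0f}, {1e-9f, 0.3f, 0.0f} };
        for (float[] v : applyVarianceFloor(pool, 1e-4f)) {
            System.out.println(Arrays.toString(v));
        }
    }
}
```

The method returns a copy rather than modifying the input, so the raw pool stays available for comparison when debugging a model.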
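Returning to the hellongram1.xml demo described earlier: the rule (hello) (jim|kate|tom) has no recursion, so the set of sentences it admits can be mimicked by a plain Java regular expression. This is only an illustration of which word sequences the grammar allows. Sphinx4 itself does not match text this way; it turns the grammar into a search graph. The class name is hypothetical.

```java
import java.util.regex.Pattern;

/**
 * Illustration only: the sentences admitted by the rule "(hello) (jim|kate|tom)"
 * written as a Java regular expression over words.
 */
public class GreetingGrammar {

    // Word-level equivalent of the grammar rule, tolerant of extra whitespace
    private static final Pattern RULE =
            Pattern.compile("\\s*hello\\s+(jim|kate|tom)\\s*", Pattern.CASE_INSENSITIVE);

    /** Returns true only if the whole sentence is one the rule allows. */
    static boolean accepts(String sentence) {
        return RULE.matcher(sentence).matches();
    }

    public static void main(String[] args) {
        String[] samples = {"hello kate", "hello tom", "hello bob", "kate hello"};
        for (String s : samples) {
            System.out.println(s + " -> " + accepts(s));
        }
    }
}
```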
- Current problem: with the demo's WSJ model directory, loading succeeds, but loading the model I trained fails. Debugging showed that the two models differ as follows:

| Parameter | WSG-0.XML (WSJ demo model) | male_result (my model) |
|---|---|---|
| Senones | 4147 | 186 |
| Gaussians per senone | 8 | (not listed) |
| Means | 33176 = 4147 × 8 | 1024 |
| Variances | 33176 | (not listed) |
| Streams | 1 | 4 |

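The WSJ numbers fit a fully continuous model, in which every senone owns its own Gaussians in every stream: 4147 senones × 8 Gaussians × 1 stream = 33176 means, and the same number of variances. My model's counts do not fit this product, which is consistent with the semi format identified in the solution below.

A minimal sketch of the arithmetic, with hypothetical names:

```java
/** Hypothetical sketch: expected number of mean vectors in a continuous model. */
public class ModelShape {
    final int senones;
    final int gaussiansPerSenone;
    final int streams;

    ModelShape(int senones, int gaussiansPerSenone, int streams) {
        this.senones = senones;
        this.gaussiansPerSenone = gaussiansPerSenone;
        this.streams = streams;
    }

    /** In a continuous model each senone has its own Gaussians in every stream. */
    int expectedContinuousMeans() {
        return senones * gaussiansPerSenone * streams;
    }

    public static void main(String[] args) {
        ModelShape wsj = new ModelShape(4147, 8, 1);
        System.out.println("expected means: " + wsj.expectedContinuousMeans());
    }
}
```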
Analysis: Sphinx4 may treat some model parameters as fixed when loading a model, such as the number of streams and the number of Gaussians per senone.
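If that suspicion is right, a pre-load check could report the mismatch clearly instead of surfacing it as an obscure loading error. The sketch below encodes the hypothesis from these notes, not a documented Sphinx4 requirement. The expected values (1 stream, 8 Gaussians per senone) are simply those of the WSJ model that loads. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical pre-load check for the parameters suspected above to be fixed. */
public class ModelCompatibility {

    // Assumed expectations, copied from the WSJ model that loads successfully
    static final int EXPECTED_STREAMS = 1;
    static final int EXPECTED_GAUSSIANS_PER_SENONE = 8;

    /** Returns human-readable mismatches; an empty list means "looks compatible". */
    static List<String> check(int streams, int gaussiansPerSenone) {
        List<String> problems = new ArrayList<>();
        if (streams != EXPECTED_STREAMS) {
            problems.add("streams = " + streams + ", expected " + EXPECTED_STREAMS);
        }
        if (gaussiansPerSenone != EXPECTED_GAUSSIANS_PER_SENONE) {
            problems.add("gaussians per senone = " + gaussiansPerSenone
                    + ", expected " + EXPECTED_GAUSSIANS_PER_SENONE);
        }
        return problems;
    }

    public static void main(String[] args) {
        System.out.println("WSJ-like model: " + check(1, 8));
        System.out.println("4-stream model: " + check(4, 8));
    }
}
```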
Solution
Modify the Sphinx-config training parameter file, changing semi to cont. Note that semi is the format used with PocketSphinx, while cont is the format used with Sphinx3. For cont, the number of streams is 1 and the number of Gaussians is 8. With these settings, train a cont model, load it into the Sphinx4 environment and compile: it works! It runs smoothly. Testing my own model with my own voice gives the following results:
```
Start speaking. Press Ctrl-C to Quit.
resultList.size=1
Bestfinaltoken=0050-6.8291255e06 0.0000000e00-1.0008177e04 lt-wordnode </s> (*sil) p 0.0-10008.177{[Great Wall][</s>]}
</s>-10008.177 0.050 Great Wall 68886.47 0.04 <sil> 0.0 0.00 <s> 0.0 0.00
result=<s> <sil> Great Wall </s>
resultlist.size=1
Bestfinaltoken=0050-6.8291255e06 0.0000000e00-1.0008177e04 lt-wordnode </s> (*sil) p 0.0-10008.177{[Great Wall][</s>]}
</s>-10008.177 0.050 Great Wall 68886.47 0.04 <sil> 0.0 0.00 <s> 0.0 0.0
resultlist.size=2
Bestfinaltoken=0077-7.2286605e06 0.0000000e00-1.0008177e04 lt-wordnode </s> (*sil) p 0.0-10008.177{[Great Wall][</s>]}
</s> 10008.177 0.059 Great Wall 68886.47 0.04 <sil> 0.0 0.00 <s> 0.0 0.01
result=<s> <sil> Great Wall </s>
resultlist.size=2
Bestfinaltoken=0077-7.2286605e06 0.0000000e00-1.0008177e04 lt-wordnode </s> (*sil) p 0.0-10008.177{[Great Wall][</s>]}
best Token=0077-7.2286605e06 0.0000000e00-1.0008177e04 lt-wordnode </s> (*sil) p 0.0-10008.177{[Great Wall][</s>]}
</s> 10008.177 0.059 Great Wall 68886.47 0.04 <sil> 0.0 0.00 <s> 0.0 0.0
You said: [Great Wall]
Start speaking. Press Ctrl-C to quit.
```
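The "You said:" line of the log shows only the words, without the `<s>`, `<sil>` and `</s>` markers that appear after `result=`. In the stock Sphinx4 demos, that text comes from `Result.getBestFinalResultNoFiller()`.

The sketch below reproduces the same effect on the printed hypothesis string. The filler set is an assumption, taken only from the markers visible in this log. The class name is hypothetical.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.stream.Collectors;

/** Hypothetical sketch: drop sentence markers and fillers from a printed hypothesis. */
public class StripFillers {

    // Assumed filler/marker set, limited to the tokens seen in the log above
    private static final Set<String> FILLERS = Set.of("<s>", "</s>", "<sil>");

    static String stripFillers(String hypothesis) {
        return Arrays.stream(hypothesis.trim().split("\\s+"))
                .filter(word -> !FILLERS.contains(word))
                .collect(Collectors.joining(" "));
    }

    public static void main(String[] args) {
        System.out.println(stripFillers("<s> <sil> Great Wall </s>"));
    }
}
```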
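The semi-to-cont change described in the solution is a one-line edit in the training configuration. The sketch below assumes a SphinxTrain-style file in which the model type is set by a line such as `$CFG_HMM_TYPE = '.semi.';`. That variable name is an assumption, so check your own file. The helper itself is hypothetical.

```java
import java.util.regex.Pattern;

/** Hypothetical helper: switch an assumed SphinxTrain-style setting from '.semi.' to '.cont.'. */
public class SwitchHmmType {

    // Assumed form of the active (uncommented) setting: $CFG_HMM_TYPE = '.semi.';
    private static final Pattern HMM_TYPE =
            Pattern.compile("(?m)^(\\s*\\$CFG_HMM_TYPE\\s*=\\s*')\\.semi\\.(';)");

    /** Returns the config text with every active '.semi.' setting replaced by '.cont.'. */
    static String toContinuous(String cfgText) {
        return HMM_TYPE.matcher(cfgText).replaceAll("$1.cont.$2");
    }

    public static void main(String[] args) {
        String cfg = "$CFG_HMM_TYPE = '.semi.';\n";
        System.out.print(toContinuous(cfg));
    }
}
```

Commented-out lines (starting with `#`) are deliberately left alone, so only the setting that is actually in effect changes.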