How to use speech recognition in a simple way?

Last Update:2015-05-03 Source: Internet

Author: User


Speech recognition is not a new topic, but the last two years have seen a sudden surge of applications, such as Siri on the iPhone and voice search on search engines and shopping sites. This recent large-scale adoption is probably rooted in the growth of the mobile Internet: the popularity of smartphones and the network connectivity they rely on (such as 3G/4G) are important preconditions.

Applications currently on the market, including those from Google and Apple, generally upload the recorded voice and analyze it on the server. Their recognition speed and accuracy are indeed impressive. I have tried the Google Speech API (https://www.google.com/speech-api/v2/recognize?), and its recognition rate is equally amazing.
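
To make the "upload, then analyze on the server" pattern concrete, here is a minimal Python sketch that only builds such an upload request and never sends it. The endpoint is the one quoted above; the query parameters (output, lang, key), the audio/l16 content type and the function name build_recognize_request are assumptions for illustration, not details confirmed by this article, and the service may no longer accept requests in this form.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Endpoint quoted in the article; everything after "?" below is an assumption.
ENDPOINT = "https://www.google.com/speech-api/v2/recognize"


def build_recognize_request(audio_bytes, api_key, lang="en-us", sample_rate=16000):
    """Build (but do not send) an HTTP POST that would upload raw audio.

    The parameter names and the content type are assumed, not verified.
    """
    query = urlencode({"output": "json", "lang": lang, "key": api_key})
    headers = {"Content-Type": f"audio/l16; rate={sample_rate}"}
    return Request(
        f"{ENDPOINT}?{query}",
        data=audio_bytes,
        headers=headers,
        method="POST",
    )


if __name__ == "__main__":
    # One second of 16-bit silence at 16 kHz stands in for a real recording.
    request = build_recognize_request(b"\x00\x00" * 16000, api_key="YOUR_KEY")
    print(request.get_method(), request.full_url)
    print(len(request.data), "bytes of audio would leave the device")
```

The point of the sketch is that the whole recording is the request body, which is exactly why this approach needs a network connection and raises the privacy concerns discussed next.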

Server-side parsing has some drawbacks. Some time ago, the product description of a TV told users that everything they said would be uploaded to the cloud, which scared some of them. At this stage, however, it is the only feasible approach, because high-quality speech recognition requires a lot of storage space and computing power.

CMU Sphinx is the only successful open source speech recognition project I know of. Kai-Fu Lee is the author of the first version. When I first downloaded CMU Sphinx, I was very disappointed with its recognition accuracy. After some searching I realized that speech recognition is very complex. It involves acoustic models (voice models), language models, and so on. A general-purpose acoustic model requires recordings from hundreds of people, with dozens of hours of speech per person. The language model is also very complex: as the vocabulary grows, the difficulty of guessing which text corresponds to a given sound increases rapidly. In other words, it is very hard to recognize many different speakers and a large vocabulary at the same time. Word segmentation and context are involved as well, not to mention that our everyday speech often mixes in English words. The systems at Google and Apple probably use several terabytes of storage, or even more, for their acoustic and language models, and have enough computing power to run the calculation quickly or in real time.
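
A rough piece of arithmetic shows why vocabulary size matters so much. The Python sketch below counts how many distinct word sequences of a given length exist for a given vocabulary. It is only an upper bound for illustration: a real recognizer prunes most of these with its language model, and the vocabulary sizes used here are assumed example values, not figures from any particular system.

```python
def candidate_sequences(vocabulary_size, utterance_length):
    """Count the distinct word sequences of the given length."""
    return vocabulary_size ** utterance_length


if __name__ == "__main__":
    # Illustrative vocabulary sizes: a toy command set, a small domain,
    # and a large general-purpose dictionary.
    for vocabulary_size in (5, 1000, 60000):
        count = candidate_sequences(vocabulary_size, 3)
        print(f"{vocabulary_size:>6} words -> {count} possible 3-word utterances")
```

With 5 words there are only 125 possible three-word utterances, while 60,000 words give 216,000,000,000,000 of them, so the search space grows far faster than the vocabulary itself.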

Therefore, high-quality, large-vocabulary speech recognition on a mobile device is unrealistic at this stage. We may have to wait until our phones have several terabytes of storage. By then, the whole field of AI may have had a huge impact on our lives.

But this does not mean that CMU Sphinx has no practical use. The applications described above perform full-vocabulary recognition, but in many situations such a large vocabulary is not needed. For example, if we build a small car with a Raspberry Pi, it only has to support a few words such as "forward", "back", "left", "right" and "stop". The language model then becomes much simpler, and the recognition rate increases correspondingly. As for the acoustic model (voice model), Sphinx ships with some ready-made ones. You can also train one for your own voice: Sphinx provides training tools, and all you have to do is record the words shown on the screen.
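
As a sketch of how small this problem becomes, the Python code below describes the five commands as a JSGF grammar (a grammar format that CMU Sphinx can load in place of a statistical language model) and maps a recognized sentence to a command. The names COMMANDS, build_jsgf and parse_command are hypothetical helpers written for this example; the recognizer that produces the hypothesis text is assumed to exist and is not shown.

```python
# The five commands mentioned above for the Raspberry Pi car.
COMMANDS = ("forward", "back", "left", "right", "stop")


def build_jsgf(commands, grammar_name="commands"):
    """Return a JSGF grammar that accepts exactly one of the given words."""
    alternatives = " | ".join(commands)
    return (
        "#JSGF V1.0;\n"
        f"grammar {grammar_name};\n"
        f"public <command> = {alternatives};\n"
    )


def parse_command(hypothesis, commands=COMMANDS):
    """Return the first known command in the recognized text, or None."""
    for word in hypothesis.lower().split():
        if word in commands:
            return word
    return None


if __name__ == "__main__":
    print(build_jsgf(COMMANDS))
    # The hypothesis string stands in for the output of a real recognizer.
    print(parse_command("please turn LEFT"))
```

Here parse_command("please turn LEFT") returns "left", and anything outside the command set returns None, so the car can simply ignore speech it does not understand.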

CMU Sphinx offers a simple online tool for generating language models (http://www.speech.cs.cmu.edu/tools/lmtool-new.html).
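
The tool works from a plain text corpus, which to my knowledge means one phrase per line. A minimal Python sketch for preparing such a file follows; the helper name write_corpus, the file name corpus.txt and the lowercasing and deduplication steps are assumptions made for this example, not requirements stated by the tool.

```python
from pathlib import Path


def write_corpus(phrases, path):
    """Write one normalized phrase per line and return the lines written.

    Phrases are lowercased, inner whitespace is collapsed, and empty or
    repeated phrases are dropped (an assumed clean-up, not a tool rule).
    """
    lines = []
    for phrase in phrases:
        cleaned = " ".join(phrase.lower().split())
        if cleaned and cleaned not in lines:
            lines.append(cleaned)
    Path(path).write_text("\n".join(lines) + "\n", encoding="utf-8")
    return lines


if __name__ == "__main__":
    written = write_corpus(
        ["forward", "back", "left", "right", "stop", "Forward"],
        "corpus.txt",
    )
    print(f"wrote {len(written)} phrases to corpus.txt")
```

The resulting file can then be uploaded through the web form, and the generated model files downloaded for use with Sphinx.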

Detailed instructions on getting started with CMU Sphinx are available on the official website. Java programmers can look at this tutorial: (http://cmusphinx.sourceforge.net/wiki/tutorialsphinx4)

Recently, the CMU Sphinx site published an article on the current state of offline recognition, with a detailed account of where offline use stands:

http://cmusphinx.sourceforge.net/2015/02/current-state-of-offline-speech-recognition-and-smarttv/


This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
