Advanced Python Tutorial: Speech Recognition in Python

Source: Internet
Author: User

▌ An overview of how speech recognition works

Speech recognition originated in research done at Bell Labs in the early 1950s. Early speech recognition systems could recognize only a single speaker and had vocabularies of only about a dozen words. Modern speech recognition systems have made great strides: they can recognize multiple speakers and have large vocabularies covering multiple languages.

▌ Choosing a Python speech recognition package

There are several ready-made speech recognition packages on PyPI, including:

apiai

google-cloud-speech

pocketsphinx

SpeechRecognition

watson-developer-cloud

wit

This article uses SpeechRecognition, which can be installed with pip:

$ pip install SpeechRecognition

After the installation is complete, open an interpreter session and enter the following to verify the installation:

>>> import speech_recognition as sr

>>> sr.__version__
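If you need to verify the installation from inside a script, where a failed import would stop the program, the standard library can report whether a package is installed and which version, without importing it. The following is a minimal sketch; the helper name `installed_version` is hypothetical, and it assumes the pip name `SpeechRecognition` and the import name `speech_recognition` used in this article.

```python
import importlib.util
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str, module_name: str) -> Optional[str]:
    """Return the installed version of a package, or None if it is missing.

    dist_name is the name given to pip (e.g. "SpeechRecognition");
    module_name is the name given to import (e.g. "speech_recognition").
    """
    # find_spec locates the module without importing (executing) it.
    if importlib.util.find_spec(module_name) is None:
        return None
    try:
        return version(dist_name)
    except PackageNotFoundError:
        # Importable, but no installed distribution under that pip name.
        return None


print(installed_version("SpeechRecognition", "speech_recognition"))
```

The same call with `("PyAudio", "pyaudio")` checks the package needed later for microphone input.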

The Recognizer class provides seven methods for recognizing speech, each using a different API. Of these seven, only recognize_sphinx() works offline, using the CMU Sphinx engine; the other six require an Internet connection.

SpeechRecognition ships with a default API key for the Google Web Speech API, so you can use that API right away through recognize_google(). The other online APIs require authentication with either an API key or a username/password combination, which is why this article uses the Web Speech API.
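The choice between the online Web Speech API and the offline Sphinx engine can be wrapped in a small helper. This is a hypothetical sketch: `transcribe` and its `online` flag are not part of SpeechRecognition. It only assumes a Recognizer-like object with `recognize_google()` and `recognize_sphinx()` methods; note that `recognize_sphinx()` also needs the pocketsphinx package.

```python
def transcribe(recognizer, audio, online: bool = True) -> str:
    """Transcribe `audio` with Google Web Speech (online) or CMU Sphinx (offline).

    Hypothetical helper. recognize_google() called without a key argument
    uses the default key bundled with SpeechRecognition; recognize_sphinx()
    runs locally but requires the pocketsphinx package.
    """
    if online:
        return recognizer.recognize_google(audio)
    return recognizer.recognize_sphinx(audio)
```

With the real library, `recognizer` would be `sr.Recognizer()` and `audio` an AudioData object. Both methods raise `sr.UnknownValueError` when the speech cannot be understood, and `recognize_google()` raises `sr.RequestError` when the API cannot be reached, so calls are usually wrapped in try/except.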

▌ Working with audio files

First, download the audio file into the directory where your Python interpreter session is running.

The AudioFile class is initialized with the path to an audio file and provides a context manager interface for reading and working with the file's contents.

The context manager opens the file and reads its contents, storing the data in the AudioFile instance. The Recognizer's record() method then records the data from the entire file into an AudioData instance. You can confirm this by checking the type of the object that record() returns.
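The steps just described look roughly like this in code. This is a sketch assuming SpeechRecognition is installed; `load_audio` is a hypothetical wrapper, and the `offset` and `duration` values (in seconds) are passed straight through to `Recognizer.record()`.

```python
def load_audio(path: str, offset=None, duration=None):
    """Read (part of) an audio file into a speech_recognition AudioData object."""
    # Imported here so this module can be loaded even without the package.
    import speech_recognition as sr

    recognizer = sr.Recognizer()
    # The context manager opens the file; record() reads from the current
    # stream position, skipping `offset` seconds and stopping after `duration`.
    with sr.AudioFile(path) as source:
        return recognizer.record(source, duration=duration, offset=offset)
```

For a downloaded file, `type(load_audio(path)).__name__` should be `'AudioData'`.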

Each call to record() inside a with block moves the file stream forward. This means that if you record for four seconds and then record for four seconds again, the second call returns the second four seconds of audio, not the first.
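The forward-moving stream can be reproduced with the standard-library `wave` module. This is an analogy, not SpeechRecognition's own code: it builds a hypothetical 10-second silent WAV file in memory and reads two 4-second chunks from one open file, reporting where each chunk started.

```python
import io
import wave

RATE = 16_000  # hypothetical sample rate for the generated test file


def make_silent_wav(seconds: float, rate: int = RATE) -> io.BytesIO:
    """Build an in-memory mono 16-bit WAV file containing silence."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)  # 2 bytes = 16-bit samples
        w.setframerate(rate)
        w.writeframes(b"\x00\x00" * int(seconds * rate))
    buf.seek(0)  # rewind so the buffer can be read back
    return buf


def chunk_start_times(buf: io.BytesIO, chunk_seconds: float, count: int) -> list:
    """Read `count` consecutive chunks from one open file; return each start time.

    Like repeated record(source, duration=...) calls in one `with` block,
    every read continues from where the previous one stopped.
    """
    starts = []
    with wave.open(buf, "rb") as w:
        rate = w.getframerate()
        for _ in range(count):
            starts.append(w.tell() / rate)  # current position, in seconds
            w.readframes(int(chunk_seconds * rate))
    return starts


print(chunk_start_times(make_silent_wav(10), 4, 2))  # [0.0, 4.0]
```

In SpeechRecognition the equivalent is two `r.record(source, duration=4)` calls inside the same with block; the second one returns seconds 4 to 8.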

Starting the recording at the 4.7-second mark (with record()'s offset argument) cuts off the "it t" at the beginning of the phrase "it takes heat to bring out the odor". The API then receives only "akes heat", which it matches to "mesquite".

Similarly, at the end of the recording the API captures only "a co" from the phrase "a cold dip restores health and zest", which is incorrectly matched to "Aiko".

Noise is another common source of errors: background noise in a recording can make the transcription inaccurate. To deal with it, you can try calling the adjust_for_ambient_noise() method of the Recognizer class.
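adjust_for_ambient_noise() is called inside the with block before record() or listen(). It reads `duration` seconds of audio (1 second by default) from the start of the stream and uses their loudness to set the recognizer's energy_threshold. Because those seconds are consumed, a following record() call starts after them. The threshold matters most for listen(), which uses it to tell speech from silence. The idea can be sketched in plain Python. This is a simplified model, not the library's code: the rule "threshold = ratio × background RMS" and the ratio of 1.5 are assumptions for illustration, and the library smooths its estimate over successive buffers.

```python
import math
from typing import Sequence


def rms(samples: Sequence[int]) -> float:
    """Root-mean-square amplitude of integer PCM samples (0.0 for no samples)."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def calibrate_threshold(noise: Sequence[int], ratio: float = 1.5) -> float:
    """Simplified calibration: place the threshold a fixed ratio above the noise level.

    The default ratio is an illustrative assumption.
    """
    return ratio * rms(noise)


def is_speech(chunk: Sequence[int], threshold: float) -> bool:
    """Treat a chunk as speech if it is louder than the calibrated threshold."""
    return rms(chunk) > threshold


background = [3, -3, 3, -3]  # made-up quiet background samples
threshold = calibrate_threshold(background)
print(threshold)  # 4.5
print(is_speech([120, -110, 95, -130], threshold))  # True
print(is_speech([2, -1, 2, -2], threshold))  # False
```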

This time the transcription picks up the "the" at the beginning of the phrase, but new problems appear: sometimes the signal is simply too noisy for the effect of the noise to be removed.

If you run into these problems frequently, you will need to preprocess the audio. This can be done with audio editing software or with a Python package (such as SciPy) that can apply filters to the file. When working with noisy files, it can also help to look at the actual API response. Most APIs return a JSON string containing multiple possible transcriptions, but recognize_google() returns only the most likely transcription unless you ask for the full response by passing show_all=True.
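With show_all=True, recognize_google() returns the parsed response instead of a single string. The helper below pulls every candidate transcript out of such a response. It is a sketch that assumes the response is a dict whose "alternative" key holds a list of {"transcript": ...} entries (the first one usually also carries a "confidence"), and that an unrecognized clip yields an empty list; the sample data is invented.

```python
from typing import Any, List


def candidate_transcripts(response: Any) -> List[str]:
    """Return every transcript in a recognize_google(..., show_all=True) response.

    Assumes the shape {"alternative": [{"transcript": ...}, ...], ...};
    anything else (such as an empty list) yields no candidates.
    """
    if not isinstance(response, dict):
        return []
    return [
        alt["transcript"]
        for alt in response.get("alternative", [])
        if "transcript" in alt
    ]


# Invented example in the assumed shape, not a real API response.
sample = {
    "alternative": [
        {"transcript": "a cold dip restores health and zest", "confidence": 0.9},
        {"transcript": "a cold dip restores health and rest"},
    ],
    "final": True,
}
print(candidate_transcripts(sample))
print(candidate_transcripts([]))  # []
```

Printing all candidates for a noisy clip shows how close the runner-up transcriptions are, which helps judge whether preprocessing is needed.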

▌ Working with a microphone

To access a microphone with SpeechRecognition, you must install the PyAudio package. Close the current interpreter session and do the following:

Installing PyAudio

The process for installing PyAudio varies by operating system.

Installation test

After installing PyAudio, you can test the installation from the console:

$ python -m speech_recognition

Make sure your default microphone is turned on and unmuted. If the installation worked, you should see something like the following:

A moment of silence, please ...

Set minimum energy threshold to 600.4452854381937

Say something!

Speak into the microphone and watch how SpeechRecognition transcribes your speech.

Microphone class

Open another interpreter session and create an instance of the Recognizer class:

>>> import speech_recognition as sr

>>> r = sr.Recognizer()

This time, instead of an audio file, the source is the default system microphone. You can access it by creating an instance of the Microphone class:

>>> mic = sr.Microphone()

To handle ambient noise, call the Recognizer's adjust_for_ambient_noise() method, just as you would with a noisy audio file. Because microphone input is far less predictable than an audio file, it is a good idea to do this every time you listen for microphone input.
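In code, the calibration and the listening share one with block on the Microphone instance. This is a minimal sketch: `listen_once` and its 0.5-second default are hypothetical choices, while adjust_for_ambient_noise(source, duration=...) and listen(source) are Recognizer methods. Because the helper only receives the objects, it can be exercised without a microphone by passing stand-ins.

```python
def listen_once(recognizer, microphone, noise_seconds: float = 0.5):
    """Calibrate for background noise, then capture a single phrase.

    With SpeechRecognition, pass sr.Recognizer() and sr.Microphone();
    the returned value is then an AudioData object.
    """
    with microphone as source:
        # Sample the room first so the energy threshold matches the noise level.
        recognizer.adjust_for_ambient_noise(source, duration=noise_seconds)
        # listen() waits for speech, then records until it detects a pause.
        return recognizer.listen(source)
```

With the `r` and `mic` created above, `audio = listen_once(r, mic)` followed by `r.recognize_google(audio)` transcribes one spoken phrase.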
