Analysis of mobile phone voice interaction design

Source: Internet
Author: User

Speech recognition technology, also known as automatic speech recognition, aims to convert the lexical content in human speech into machine-readable input, such as keystrokes, binary encodings, or character sequences.

As an input mode, speech recognition is faster than key input and gesture input, and its learning cost is very low. The recognition rate of speaker-independent continuous speech recognition systems has reached 98.73%, which meets practical requirements and gives the technology broad application prospects. On the handset side, applications include voice dialing, voice input, voice commands, voice search, and voice translation.

The technical principles of speech recognition are fairly complex, and can be understood by following the process of a voice interaction:

1. Turn on the speech recognition function. Generally the user starts it manually by tapping a button. The phone cannot start it automatically, for example in response to a voice command or by judging from the volume level that recognition should begin.

2. Enter the speaking interface. The program interface will visually reflect the volume change.

3. When the user finishes speaking, the system begins to analyze. There are two ways to end the input: one is automatic, usually when the user stops speaking, and the other is manual, when the user ends it on the phone. System processing can be divided into the following steps:

a) Front-end processing. The main task of this module is to remove noise from the input signal and extract features for the acoustic model. Endpoint detection comes before the rest of the signal processing: it distinguishes the speech periods from the non-speech periods in the audio signal and precisely identifies the starting point of the speech. After endpoint detection, subsequent processing can be performed on the speech segments only, which plays an important role in improving model accuracy and the recognition rate. The main task of speech enhancement is to eliminate the influence of ambient noise on speech. At present, a common method is the Wiener filter, which performs better than other filters when the noise is strong.
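The endpoint detection described above can be illustrated with a minimal energy-based sketch. This is not the method of any particular product: the frame length, the threshold ratio, and the synthetic test signal are all assumptions chosen for illustration, and real systems use more robust detectors.

```python
import math

def frame_energies(samples, frame_len):
    """Split samples into non-overlapping frames; return mean energy per frame."""
    energies = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        energies.append(sum(x * x for x in frame) / frame_len)
    return energies

def detect_endpoints(samples, frame_len=160, threshold_ratio=0.1):
    """Return (start_sample, end_sample) of the speech region, or None.

    A frame counts as speech when its energy exceeds threshold_ratio times
    the energy of the loudest frame. Both defaults are illustrative only.
    """
    energies = frame_energies(samples, frame_len)
    if not energies or max(energies) == 0:
        return None
    threshold = threshold_ratio * max(energies)
    speech = [i for i, e in enumerate(energies) if e > threshold]
    return speech[0] * frame_len, (speech[-1] + 1) * frame_len

# Synthetic signal: near-silence, a 440 Hz tone, near-silence (8 kHz assumed).
rate = 8000
silence = [0.001 * math.sin(i) for i in range(1600)]
tone = [0.5 * math.sin(2 * math.pi * 440 * i / rate) for i in range(3200)]
signal = silence + tone + silence
print(detect_endpoints(signal))  # (1600, 4800)
```

Only the samples between the two returned positions would be passed on to feature extraction, which is why good endpoint detection helps the later stages.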

b) Acoustic feature extraction. Extracting acoustic features is both a process of heavy information compression and a process of signal deconvolution, so that the pattern classifier can separate the classes better. For example, uploading audio uses speech codec technology, which reduces the audio file size, the storage space, or the transmission bit rate.
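The compression aspect of feature extraction can be sketched as follows. Production recognizers typically use richer features such as MFCCs. This example uses only log energy and zero-crossing rate to keep the idea visible, and the frame length, hop size, and test tones are assumptions.

```python
import math

def frame_features(samples, frame_len=200, hop=80):
    """Return one (log_energy, zero_crossing_rate) pair per overlapping frame."""
    features = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = samples[start:start + frame_len]
        energy = sum(x * x for x in frame) / frame_len
        log_energy = math.log(energy + 1e-10)  # small floor avoids log(0)
        crossings = sum(
            1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0)
        )
        features.append((log_energy, crossings / (frame_len - 1)))
    return features

# Two synthetic tones at an assumed 8 kHz sampling rate.
rate = 8000
low = [math.sin(2 * math.pi * 200 * i / rate) for i in range(800)]
high = [math.sin(2 * math.pi * 1900 * i / rate) for i in range(800)]
low_feats = frame_features(low)
high_feats = frame_features(high)
print(len(low), "samples ->", len(low_feats), "frames of 2 values each")
print("zero-crossing rate, low tone :", round(low_feats[0][1], 3))
print("zero-crossing rate, high tone:", round(high_feats[0][1], 3))
```

Each 200-sample frame is reduced to two numbers, yet those numbers still separate the low tone from the high tone, which is the kind of class-separating compression the paragraph describes.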

c) Statistical acoustic model. The acoustic features of each frame are scored, using techniques such as context modeling. Because of the way speech is produced, sound can only change gradually: a preceding sound affects the one that follows, so the spectrum of the following sound differs from its spectrum under other conditions. Modeling this context lets the model describe speech more accurately.
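One common way to model context, though not necessarily the one the author has in mind, is to give each sound a label that includes its neighbors (often called a triphone). The sketch below assumes the `left-center+right` naming convention and uses `sil` as a boundary symbol. The phone sequences are made up for illustration.

```python
def to_triphones(phones):
    """Expand a phone sequence into context-dependent 'left-center+right' labels.

    'sil' is an assumed boundary symbol added at both ends.
    """
    padded = ["sil"] + list(phones) + ["sil"]
    return [
        f"{padded[i - 1]}-{padded[i]}+{padded[i + 1]}"
        for i in range(1, len(padded) - 1)
    ]

print(to_triphones(["n", "i", "h", "ao"]))
# ['sil-n+i', 'n-i+h', 'i-h+ao', 'h-ao+sil']

# The same phone "i" gets a different label in a different context:
print(to_triphones(["l", "i", "n"]))
# ['sil-l+i', 'l-i+n', 'i-n+sil']
```

Because `n-i+h` and `l-i+n` are separate units, the model can learn a different spectrum for "i" in each context.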

d) Pronunciation dictionary. A pronunciation dictionary contains the set of words the system can handle together with their pronunciations, similar to the lexicon of a pinyin input method. As with an input method, updating the dictionary with hot words and grouping the lexicon by category improves matching accuracy.

e) Language model. The language model models the language that the system handles, for example by analyzing the context of the utterance.
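How the pronunciation dictionary and the language model work together can be sketched with a toy example. Everything here is hypothetical: the pronunciation keys, the word lists, the bigram counts, and the smoothing constant are invented for illustration, and the score is a log of smoothed counts standing in for a real log-probability.

```python
import itertools
import math

# Hypothetical toy lexicon: pronunciation -> candidate words (homophones).
LEXICON = {
    "ai": ["I", "eye"],
    "wont": ["want"],
    "tu": ["to", "two", "too"],
    "kaps": ["cups"],
}

# Hypothetical bigram counts, as if collected from a text corpus.
BIGRAM_COUNTS = {
    ("<s>", "I"): 8,
    ("I", "want"): 6,
    ("want", "to"): 5,
    ("want", "two"): 3,
    ("two", "cups"): 4,
}

def log_score(words, alpha=0.1):
    """Sum of log(count + alpha) over adjacent word pairs (unnormalized)."""
    total = 0.0
    for prev, cur in zip(["<s>"] + words, words):
        total += math.log(BIGRAM_COUNTS.get((prev, cur), 0) + alpha)
    return total

def decode(pronunciations):
    """Try every homophone combination; keep the best-scoring word sequence."""
    candidates = itertools.product(*(LEXICON[p] for p in pronunciations))
    return max((list(c) for c in candidates), key=log_score)

print(decode(["ai", "wont", "tu", "kaps"]))
# ['I', 'want', 'two', 'cups']
```

The dictionary alone cannot choose between "to", "two", and "too". The context supplied by the language model ("two cups" was seen, "to cups" was not) resolves the ambiguity.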

Because of size limits, only a small number of dictionaries can be stored locally, so complex speech has to be sent to a server for analysis. Google Voice Search only tells the user that there is no network connection after the input has been completed. The network connection status should be checked before input starts.

4. The system outputs the analysis result. There are two approaches: one is to display results automatically based on the recognized text, as Bing search does. The other is to offer options for the user to choose from, which depends on the probability of each output result. The user's selections influence the ranking in the dictionary, improve the adaptivity and robustness of recognition, and help to form personalized input.
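The feedback loop between user selections and result ranking can be sketched as below. The `AdaptiveRanker` class, the boost value, and the candidate list with its probabilities are all hypothetical. A real system would adapt its models in a more principled way.

```python
from collections import defaultdict

class AdaptiveRanker:
    """Rank recognition candidates, boosting ones the user picked before."""

    def __init__(self, boost=0.1):
        self.boost = boost             # assumed bonus per past selection
        self.picks = defaultdict(int)  # candidate text -> times chosen

    def rank(self, candidates):
        """candidates: list of (text, recognizer_probability) pairs."""
        return sorted(
            candidates,
            key=lambda c: c[1] + self.boost * self.picks[c[0]],
            reverse=True,
        )

    def record_choice(self, text):
        """Remember that the user selected this candidate."""
        self.picks[text] += 1

ranker = AdaptiveRanker()
n_best = [("call Lee", 0.48), ("call Li", 0.42), ("call Leigh", 0.10)]

before = [text for text, _ in ranker.rank(n_best)]
ranker.record_choice("call Li")  # the user picks the second option
after = [text for text, _ in ranker.rank(n_best)]

print(before)  # ['call Lee', 'call Li', 'call Leigh']
print(after)   # ['call Li', 'call Lee', 'call Leigh']
```

After one selection, the user's preferred candidate moves to the top the next time the same ambiguity appears, which is the personalization effect described above.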

The recognizable vocabulary depends on the product. With specific voice commands, such as searching for a contact name, the user can only enter words that match the commands. An input method has a larger vocabulary. Sentence search needs not only a huge vocabulary but also the ability to handle the liaison and tone of continuous speech, and it must output the most reasonable result according to context and hot words. The fewer the restrictions, the harder speech recognition becomes. Because similar-sounding words are avoided to some extent, the smaller the dictionary, the higher the accuracy when specific words are input.

Chinese speech input differs from English. In English, a word that cannot be matched in the dictionary cannot be recognized. Chinese words are composed of characters, so Chinese may be recognized character by character.

The iOS 5 input method has added a voice function, and voice will gradually become a general feature of mobile phone input. The accuracy of the final output and the fluency of operation are important measures of the quality of the interaction.

Source: http://daichuanqing.com/index.php
