Why is it so hard to get users to speak?

Source: Internet
Author: User

Lao Luo once said: "The applications of speech recognition technology, whether Siri or its imitators, got it wrong at the root. So once the initial excitement passed, almost no one seriously uses them (because they are hard to use and make you look really silly)." That is somewhat extreme, but the fact that no one uses them seriously does mean that people do not want to speak, and this is worth thinking about. Speech recognition has come a long way and keeps trying to work its way into our lives, yet we have also found that it is not as marvelous as the legend promised. From voice robots to Siri to Google Glass, each related product drew great attention, which then faded as time passed and understanding grew. What makes us feel so silly, and makes it so hard to open our mouths?

Market situation

First, look at the voice products that are common today:

1. Mobile phones: WeChat, voice assistants, song-recognition search

2. PCs: voice chat, foreign language teaching software, assistive software for the blind

3. Other devices: Google Glass, in-car systems

Figure 1: BMW's in-car voice control system. The driver presses a control button on the steering wheel to activate the voice assistant, then sends messages, makes phone calls, and issues other commands by voice.

Figure 2: Google Glass.

Figure 3: A reader for the blind.

Figure 4: Duolingo, foreign language learning software that trains listening through voice exercises.

Usage habit analysis

Some of these products involve speech recognition and some do not, but together they show several interesting phenomena:

1. On mobile phones, the song-recognition search niche achieves very accurate recognition;

2. Because WeChat has educated users, we see more and more people speaking to their phones in public. Voice communication no longer looks so unnatural, and user habits are gradually forming;

3. Foreign language teaching and software for the blind each have their own distinct market, where barriers to competition are high and results are easier to achieve;

4. The other device categories are still emerging, but they represent a development trend because of the particular, forward-looking nature of their hardware and the scenes in which they are used.

Problems and Solutions

From these findings it is not difficult to see the problems people run into in voice-based human-computer interaction, which in turn suggests design measures that strengthen the user's desire to speak:

Recognition accuracy

Environmental noise, hardware conditions, and technical constraints all reduce recognition accuracy. People also differ widely in how they express themselves, and a machine's ability to understand cannot compare with a person's. So after a few failed attempts, we become very careful about what we say.

From an interaction point of view, we can break this "carefulness" down into separate problems, each of which can be eased or even solved:

1. Not knowing where to speak: place the app's most prominent call-to-action voice button near the microphone. For example, the iPhone's microphone is at the bottom of the phone, and Siri's button and waveform feedback are also at the bottom, so users naturally form the reflex of speaking toward the bottom of the phone;

2. Not knowing when to start or stop speaking: use a long press for voice input. First, the long press is already an established habit for voice input, so it applies not only to social software but also to speech recognition scenes. Second, with a long press the person controls where the speech begins and ends, which is more accurate than the machine's judgment and helps shut out unwanted noise (for example: the Android version of Baidu Voice Assistant);

3. Unclear recognition results: give effective hints and guidance. When the recognition result is not unique, the product can use its confidence in each result to offer the user several suggestions, or mark the parts that can be modified and offer candidates for them, which reduces the user's frustration and the cost of correction (for example: after voice input of "Peng Atlas Building" in Baidu Maps, a list of search suggestions appears).
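The third point, choosing between accepting a result and showing suggestions, can be sketched roughly as follows. This is only an illustration, not how Baidu Maps or any real product works: the `Candidate` type, the `choose_response` function, and the threshold values are all hypothetical, and it assumes the recognizer returns a confidence score between 0 and 1 for each candidate.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    text: str
    confidence: float  # assumed recognizer score in [0, 1] (hypothetical scale)


def choose_response(candidates, accept_threshold=0.85, margin=0.15, max_suggestions=3):
    """Decide whether to accept the top result or show a suggestion list.

    Returns a (mode, texts) pair where mode is "accept", "suggest", or "retry".
    The thresholds are illustrative defaults, not values from any real product.
    """
    ranked = sorted(candidates, key=lambda c: c.confidence, reverse=True)
    if not ranked:
        # Nothing was recognized: ask the user to speak again.
        return ("retry", [])

    best = ranked[0]
    runner_up = ranked[1].confidence if len(ranked) > 1 else 0.0

    # Accept only when the best result is both strong and clearly ahead.
    if best.confidence >= accept_threshold and best.confidence - runner_up >= margin:
        return ("accept", [best.text])

    # Otherwise the result is not unique: offer a short list to pick from.
    return ("suggest", [c.text for c in ranked[:max_suggestions]])
```

A product might call `choose_response` once per utterance and render the "suggest" case as a tappable list, so that correcting a misrecognition costs one tap instead of a second attempt at speaking.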

Emotional factors

The human-machine conversation can be divided into three stages: human voice input → speech recognition and analysis → machine response.

Looking for solutions from an experience design perspective: in the first stage, speaking to a machine, perhaps with an accent, feels slightly odd, especially in public. From an interaction point of view, we can:

1. Provide an alternative input mode: keyboard input;

2. Converge in layers: enter a sufficiently vertical scene to cut out unnecessary interference. For example, saying "Call small yellow chicken" to Baidu Voice Assistant enters the yellow chicken dialogue scene, where the point is joking and everything unrelated to joking is discarded. With some imagination, "joking" can be replaced by any other scene;

3. Imitate existing usage habits. For example, raising the phone to the ear is a behavior that already points clearly to one scene. The voice call function of the Hammer system works this way: simply raise the phone to your ear and say a name to start the call, which skips operation steps and also removes the worry that others will find you strange.

In the third stage, the machine's answer, the machine's fixed tone and lack of emotion make the reply feel slightly cold. A personified scene design or a richer variety of voice output can soothe tense nerves (for example: Weather Pass offers weather reports read in various dialects or by celebrities, which makes them more fun).

Other

All of the above affects whether users come back to the product. A few other points are also worth starting from:

1. In choosing a product's positioning, whichever angle you start from, whether education, helping people with disabilities, or purely playful exploration, you can find an entry point and strive to make the product a must-have for home and travel;

2. On occasions when the body or the eyes are occupied, voice communication is a better solution, such as operating car functions by voice while driving or consulting a recipe while cooking;

3. Optimize specifically for narrow scenes, such as setting reminders or listening to broadcasts.
