Jasper Voice Assistant Introduction

Last Update:2018-01-06 Source: Internet

Author: User

1. Introduction

Jasper is an open-source voice control assistant that runs on the Raspberry Pi.

Jasper works as follows: the device passively monitors the microphone, and when it hears the wake-up keyword it enters active listening mode. It then runs speech recognition on the voice command it receives, parses the resulting text to extract its meaning, and returns the result of processing to the user as synthesized speech.

The technologies involved are audio recording and playback, speech recognition (ASR/STT), semantic understanding (NLU/NLP), and speech synthesis (TTS).
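The passive/active loop described above can be sketched as a small state machine. This is only an illustration, not Jasper's actual code: each input line stands in for one utterance that a speech recognizer has already turned into text, and the names WAKE_WORD, handle_command, and run_assistant are hypothetical.

```shell
#!/bin/sh
# Sketch of the wake-word loop. Speech recognition and synthesis are
# simulated with plain text; all names here are hypothetical.

WAKE_WORD="jasper"

# Parse a command transcript and print the reply a TTS engine would speak.
handle_command() {
    case "$1" in
        *time*)  echo "reply: it is time to check the clock" ;;
        *hello*) echo "reply: hello" ;;
        *)       echo "reply: sorry, I did not understand" ;;
    esac
}

# Read transcripts line by line. Stay passive until the wake word is heard,
# treat the next utterance as a command, then return to passive mode.
run_assistant() {
    state="passive"
    while IFS= read -r utterance; do
        if [ "$state" = "passive" ]; then
            case "$utterance" in
                *"$WAKE_WORD"*) state="active" ;;
            esac
        else
            handle_command "$utterance"
            state="passive"
        fi
    done
}

# Only the utterance that follows the wake word is handled.
printf 'background noise\njasper\nhello there\nwhat time is it\n' | run_assistant
# prints: reply: hello
```

In a real assistant the text lines would come from an STT engine and the replies would be passed to a TTS engine instead of being printed.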

2. Audio System

2.1 Hardware

The hardware of the audio system is the sound card, which uses an ADC (analog-to-digital converter) for audio input and a DAC (digital-to-analog converter) for audio output.

The following commands show the sound card devices on Linux:

$ lspci | grep -i audio
00:05.0 Audio device: Intel Corporation 82801FB/FBM/FR/FW/FRW (ICH6 Family) High Definition Audio Controller (rev.)

$ cat /proc/asound/cards
 0 [Intel          ]: HDA-Intel - HDA Intel
                      HDA Intel at 0xf0804000 irq

2.2 Software

In Linux, the audio system is structured as follows:

On desktop Linux systems, the audio system typically consists of a driver layer, a service layer (sound server), and an application layer.
On embedded systems, the audio system consists of only the driver layer and the application layer.

Linux has two audio driver systems: OSS and ALSA.

2.2.1 OSS

OSS (Open Sound System) is a unified audio architecture for Unix-like and POSIX-compatible systems; applications written against the OSS API are easy to port.
The OSS API mainly exposes the following device files:

/dev/mixer: mixing and control
/dev/dsp: audio input and output

Most Linux systems today use ALSA and no longer ship OSS drivers, so OSS is not covered in detail here.

For more information, see <OSS--cross-platform audio interface introduction>.
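To see which of the two driver systems a machine exposes, one can look for the files each provides. The sketch below is a rough heuristic, not a definitive test: the function name detect_audio_api and its two optional directory arguments are hypothetical, and ALSA is checked first because ALSA's OSS emulation can also create /dev/dsp.

```shell
#!/bin/sh
# detect_audio_api [DEV_ROOT] [PROC_ROOT]
# Prints "alsa", "oss", or "none" depending on which interface files exist.
# The directory arguments default to /dev and /proc and exist only so the
# function can be tried against a scratch directory.
detect_audio_api() {
    dev_root="${1:-/dev}"
    proc_root="${2:-/proc}"
    if [ -d "$proc_root/asound" ]; then
        echo "alsa"     # ALSA information interface is present
    elif [ -e "$dev_root/dsp" ]; then
        echo "oss"      # only the OSS audio device file is present
    else
        echo "none"
    fi
}

detect_audio_api
```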

2.2.2 ALSA

ALSA (Advanced Linux Sound Architecture) is the successor to OSS and has become the mainstream audio architecture on Linux.
ALSA consists of drivers, libraries, and utilities.

alsa-driver: the driver part, integrated into the kernel, mostly in the form of modules.

The driver can be divided into the following three layers:

- Hardware control layer (bottom): handles access to the hardware; this is the main part of a sound card driver that manufacturers implement.
- Core layer (middle): provides audio device components with various functions. It offers a number of predefined components (such as PCM, AC97, sequencer, and controller), and users can also define their own device components.
- Sound card object description layer: an abstract description of the sound card hardware, from which the kernel learns the card's hardware functions, device components, and operating methods.

The driver exposes the following interfaces to user space, where CX is the card number and DX is the device number:

/proc/asound: information interface
/dev/snd/controlCX: control interface
/dev/snd/mixerCXDX: mixer interface
/dev/snd/pcmCXDX: PCM interface
/dev/snd/midiCXDX: MIDI interface
/dev/snd/seq: sequencer interface
/dev/snd/timer: timer interface
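The naming scheme above can be checked against a live system by classifying the entries under /dev/snd. The following is a minimal sketch; describe_node is a hypothetical helper, and it relies on the fact that PCM nodes carry a trailing p (playback) or c (capture), for example pcmC0D0p.

```shell
#!/bin/sh
# describe_node PATH: print the kind of ALSA interface a device node
# name refers to, based only on its file name.
describe_node() {
    name="${1##*/}"     # strip the directory part
    case "$name" in
        controlC[0-9]*)      echo "control interface" ;;
        pcmC[0-9]*D[0-9]*p)  echo "PCM playback" ;;
        pcmC[0-9]*D[0-9]*c)  echo "PCM capture" ;;
        midiC[0-9]*D[0-9]*)  echo "MIDI interface" ;;
        seq)                 echo "sequencer interface" ;;
        timer)               echo "timer interface" ;;
        *)                   echo "other" ;;
    esac
}

# List whatever nodes exist; prints nothing if /dev/snd is absent.
for node in /dev/snd/*; do
    [ -e "$node" ] || continue
    printf '%s: %s\n' "$node" "$(describe_node "$node")"
done
```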

alsa-lib: the user-space library. It wraps the interfaces provided by the driver and exposes an API to applications through libasound.so.

alsa-utils: a set of utilities built on alsa-lib, such as aplay for audio playback and arecord for recording.
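As a quick check that recording and playback both work, the two alsa-utils tools can be chained. This is a sketch that assumes a working default capture and playback device; the wrapper name record_and_play and its defaults (test.wav, 5 seconds) are hypothetical.

```shell
#!/bin/sh
# record_and_play [FILE] [SECONDS]
# Record from the default capture device, then play the file back.
record_and_play() {
    out="${1:-test.wav}"
    secs="${2:-5}"
    # Both tools ship with alsa-utils; stop early if either is missing.
    for tool in arecord aplay; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool (install alsa-utils)" >&2
            return 1
        fi
    done
    # -d sets the duration in seconds; -f cd selects 16-bit, 44.1 kHz, stereo.
    arecord -d "$secs" -f cd "$out" && aplay "$out"
}
```

For example, `record_and_play hello.wav 3` records three seconds and plays them back immediately.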

2.2.3 Service Layer

The sound server sits between ALSA and the application. An application plays sound by calling the sound server's API and sending it the audio data; the sound server mixes the concurrent playback requests and passes the result to the underlying sound card driver (ALSA/OSS), which then drives the sound card to play the mixed data.

ESD (Enlightened Sound Daemon, or ESounD): the sound server of the GNOME desktop environment; it has been replaced by PulseAudio.
aRts (analog Real time synthesizer): the sound server of the KDE desktop environment; it has been replaced by Phonon.
PulseAudio: a newer-generation sound server intended to provide better audio handling than its predecessors; it is currently the default sound server for the GNOME desktop.
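To find out whether a PulseAudio-compatible sound server is running, the pactl client can be queried. The sketch below assumes pactl is installed and relies on the "Server Name:" line that `pactl info` prints; the helper name server_name is hypothetical.

```shell
#!/bin/sh
# server_name: read `pactl info` output on stdin and print only the
# value of the "Server Name:" line.
server_name() {
    sed -n 's/^Server Name: //p'
}

if command -v pactl >/dev/null 2>&1; then
    # Prints nothing if no sound server is reachable.
    pactl info 2>/dev/null | server_name
else
    echo "pactl not found; PulseAudio is probably not installed" >&2
fi
```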

2.2.4 Other software

In addition to sound servers, there are some common middleware and sound libraries:
JACK Audio Connection Kit: a professional sound server that provides real-time, low-latency connections for audio and MIDI data between applications.
GStreamer: an open-source multimedia framework for building streaming media applications.
Phonon: Qt's cross-platform multimedia framework.
PortAudio: a cross-platform, open-source audio I/O library; on Linux its ALSA backend requires alsa-lib.

3. Speech recognition

Speech recognition, also known as automatic speech recognition (ASR), is also called speech-to-text (STT).

4. Speech synthesis

TTS (text-to-speech) solutions currently fall into two groups: open-source and proprietary.

Open-source solutions include:
- eSpeak/eSpeak NG, Ekho, Festival/Festvox/Flite

Proprietary solutions include:
- Google TTS, Amazon Polly, NeoSpeech TTS, and more
- Baidu TTS, Ali TTS, iFlytek TTS, etc.

4.1 eSpeak

eSpeak supports Chinese output, but the quality is poor.

"I am Chinese, I love China"

4.2 Ekho

Ekho is a free, open-source, multilingual text-to-speech program.
It supports Cantonese, Mandarin, and other languages.
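The two Chinese-capable engines mentioned so far can be wrapped in one helper that prefers Ekho and falls back to eSpeak. This is a sketch under assumptions: say_chinese is a hypothetical name, the Ekho voice names (Mandarin, Cantonese) are taken from its help text as best I recall, and zh and zh-yue are eSpeak's Mandarin and Cantonese voices.

```shell
#!/bin/sh
# say_chinese TEXT [VOICE]
# Speak TEXT with Ekho if available, otherwise with eSpeak.
# VOICE is "Mandarin" (default) or "Cantonese".
say_chinese() {
    text="$1"
    voice="${2:-Mandarin}"
    if command -v ekho >/dev/null 2>&1; then
        ekho -v "$voice" "$text"
    elif command -v espeak >/dev/null 2>&1; then
        # Map the voice name to eSpeak's voice identifiers.
        case "$voice" in
            Cantonese) v="zh-yue" ;;
            *)         v="zh" ;;
        esac
        espeak -v "$v" "$text"
    else
        echo "no Chinese-capable TTS engine found" >&2
        return 1
    fi
}
```

For example, `say_chinese "你好" Cantonese` speaks the greeting with the Cantonese voice of whichever engine is installed.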

4.3 Festival

Festival is a general-purpose multilingual speech synthesis system developed at the Centre for Speech Technology Research (CSTR) at the University of Edinburgh.
Festvox is a toolkit developed at CMU for building synthetic voices.
Flite (Festival-lite) is a lightweight speech synthesis engine developed at CMU and derived from Festival.
Simply put, Festival and Flite perform the TTS conversion and produce output using voices built with Festvox.

Festival

# yum install festival
$ echo "Hello, you are using festival" | festival --tts
$ festival --tts myfile

Flite

# yum install flite
$ flite "Hello, you are using Flite" A.wav
$ aplay A.wav

Festival sounds slightly better than eSpeak, but it does not support Chinese.

This article is an English version of an article originally published in Chinese on aliyun.com.
