Jasper Voice Assistant Introduction

Source: Internet
Author: User

1. Introduction

Jasper is an open source voice-controlled assistant that runs on the Raspberry Pi.

Jasper works as follows: the device passively monitors the microphone, and when it hears the wake-up keyword it enters active listening mode. The voice command it receives next is passed through speech recognition, the resulting text is parsed and processed for its semantic content, and the result is returned to the user through speech synthesis.

The technologies involved are sound recording and playback, speech recognition (ASR/STT), semantic understanding (NLU/NLP), and speech synthesis (TTS).
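
The passive/active listening flow described above can be sketched in Python. This is only an illustration, not Jasper's actual code: the wake word value, the function names run_assistant and handle_command, and the keyword-matching "semantic" step are all assumptions made for this sketch, and already-transcribed strings stand in for microphone audio and speech recognition.

```python
from typing import Callable, Iterable, List

WAKE_WORD = "jasper"  # assumed wake-up keyword for this sketch


def run_assistant(
    utterances: Iterable[str],
    handle_command: Callable[[str], str],
    speak: Callable[[str], None],
) -> None:
    """Drive the passive/active listening loop over transcribed text.

    Each item in `utterances` stands in for one chunk of microphone audio
    that has already gone through speech recognition (STT).
    """
    active = False
    for text in utterances:
        if not active:
            # Passive mode: ignore everything except the wake-up keyword.
            active = WAKE_WORD in text.lower().split()
            continue
        # Active mode: treat the utterance as a command, answer it,
        # then fall back to passive monitoring.
        speak(handle_command(text))
        active = False


def handle_command(text: str) -> str:
    """Toy semantic step: keyword matching instead of real NLU."""
    if "time" in text.lower():
        return "It is time to build something."
    return "Sorry, I did not understand."


if __name__ == "__main__":
    spoken: List[str] = []  # stands in for the TTS output
    run_assistant(
        ["background chatter", "hey jasper", "what time is it"],
        handle_command,
        spoken.append,
    )
    print(spoken)
```

In a real assistant, each of the three callables would be backed by one of the technologies listed above: a recorder plus STT engine feeding the utterances, an NLU module as the handler, and a TTS engine as the speaker.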

2. Audio System

2.1 Hardware

The hardware of the audio system is the sound card, which uses an ADC (analog-to-digital converter) for audio input and a DAC (digital-to-analog converter) for audio output.
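
What an ADC does can be illustrated in software: it samples a continuous signal at fixed time steps and quantizes each sample to an integer level. The sketch below assumes a unit-amplitude sine wave as the analog input and signed 16-bit output; the function name sample_and_quantize is made up for this example.

```python
import math
from typing import List


def sample_and_quantize(freq_hz: float, sample_rate: int, n_samples: int,
                        bits: int = 16) -> List[int]:
    """Sample a unit-amplitude sine wave and quantize it to signed integers."""
    max_level = 2 ** (bits - 1) - 1  # 32767 for 16-bit audio
    samples = []
    for n in range(n_samples):
        t = n / sample_rate  # sampling: discrete points in time
        value = math.sin(2 * math.pi * freq_hz * t)
        samples.append(round(value * max_level))  # quantization: discrete levels
    return samples


if __name__ == "__main__":
    # One period of a 1 kHz tone sampled at 8 kHz gives 8 samples.
    print(sample_and_quantize(1000, 8000, 8))
```

A DAC performs the reverse step, turning such integer samples back into a continuous voltage for the speaker.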

Use the following commands to view the sound card devices on Linux:

$ lspci | grep -i audio
00:05.0 Audio device: Intel Corporation 82801FB/FBM/FR/FW/FRW (ICH6 Family) High Definition Audio Controller (rev.)

$ cat /proc/asound/cards
 0 [Intel          ]: HDA-Intel - HDA Intel
                      HDA Intel at 0xf0804000 irq
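
The same card list can be read from a program. The following sketch parses text in the /proc/asound/cards layout with a regular expression; the Card type, the parse_cards name, and the SAMPLE text (including its address and IRQ number) are made up for illustration and are not taken from the machine shown above.

```python
import re
from pathlib import Path
from typing import List, NamedTuple


class Card(NamedTuple):
    index: int
    card_id: str
    driver: str
    name: str


# Matches the first line of each entry: " 0 [ID   ]: DRIVER - NAME"
CARD_LINE = re.compile(r"^\s*(\d+)\s+\[(.+?)\s*\]:\s*(\S+)\s+-\s+(.+)$")

# Hypothetical sample in the same layout as /proc/asound/cards.
SAMPLE = (
    " 0 [PCH            ]: HDA-Intel - HDA Intel PCH\n"
    "                      HDA Intel PCH at 0xf7f10000 irq 33\n"
    " 1 [NVidia         ]: HDA-Intel - HDA NVidia\n"
    "                      HDA NVidia at 0xf7080000 irq 17\n"
)


def parse_cards(text: str) -> List[Card]:
    """Return one Card per entry; continuation lines are skipped."""
    cards = []
    for line in text.splitlines():
        m = CARD_LINE.match(line)
        if m:
            index, card_id, driver, name = m.groups()
            cards.append(Card(int(index), card_id, driver, name.strip()))
    return cards


if __name__ == "__main__":
    proc_file = Path("/proc/asound/cards")
    text = proc_file.read_text() if proc_file.exists() else SAMPLE
    for card in parse_cards(text):
        print(card)
```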

2.2 Software

In Linux, the audio system is structured as follows:

On desktop Linux systems, the audio system typically consists of a driver layer, a service layer (the sound server), and an application layer.
On embedded systems, the audio system consists of only the driver layer and the application layer.

Linux has two audio driver systems: OSS and ALSA.

2.2.1 OSS

OSS (Open Sound System) is a unified sound architecture for Unix-like and POSIX-compatible systems. Applications written against the OSS API can be ported easily between these systems.
The OSS API mainly provides the following device file interfaces:

/dev/mixer: mixing and control
/dev/dsp: audio input and output

Most Linux systems today use ALSA and no longer provide OSS drivers, so OSS is not covered in detail here.

For more information, refer to <OSS: Cross-Platform Audio Interface Introduction>

2.2.2 ALSA

ALSA (Advanced Linux Sound Architecture) is the successor to OSS and has become the mainstream audio architecture on Linux.
ALSA consists of drivers, libraries, and utilities.

alsa-driver: the driver part, integrated into the kernel, mostly in the form of modules.

The driver can be divided into the following three layers:

- Hardware control layer (bottom): responsible for accessing and operating the hardware. This is the main part of a sound card driver that the manufacturer has to implement.
- Core layer (middle): the functional components of an audio device. It provides a number of predefined components (such as PCM, AC97, sequencer, and control), and users can also define their own device components.
- Sound card object description layer: an abstract description of the sound card hardware. From this description the kernel learns the functions, device components, and operating methods of the sound card.

The driver provides the following interfaces to user space (X stands for the card or device number):

/proc/asound: information interface
/dev/snd/controlCX: control interface
/dev/snd/mixerCXDX: mixer interface
/dev/snd/pcmCXDX: PCM interface
/dev/snd/midiCXDX: MIDI interface
/dev/snd/seq: sequencer interface
/dev/snd/timer: timer interface
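
The naming scheme of the nodes under /dev/snd can be decoded mechanically. The sketch below classifies node names such as controlC0 or pcmC0D0p; on real systems PCM nodes carry a trailing p (playback) or c (capture). The SndNode type and the classify function are names invented for this example, and only the node kinds listed in the regular expression are handled.

```python
import re
from typing import NamedTuple, Optional


class SndNode(NamedTuple):
    kind: str
    card: Optional[int]
    device: Optional[int]
    direction: Optional[str]


# <kind>C<card>[D<device>][p|c], e.g. controlC0, midiC1D0, pcmC0D0p
NODE = re.compile(r"^(control|pcm|midi|hw)C(\d+)(?:D(\d+))?([pc])?$")


def classify(name: str) -> Optional[SndNode]:
    """Decode a /dev/snd node name; return None if it is not recognized."""
    if name in ("seq", "timer"):
        # Global nodes that do not belong to a single card.
        return SndNode(name, None, None, None)
    m = NODE.match(name)
    if m is None:
        return None
    kind, card, device, suffix = m.groups()
    direction = {"p": "playback", "c": "capture"}.get(suffix)
    return SndNode(
        kind,
        int(card),
        int(device) if device is not None else None,
        direction,
    )


if __name__ == "__main__":
    for node in ("controlC0", "pcmC0D0p", "pcmC0D0c", "timer"):
        print(node, "->", classify(node))
```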

alsa-lib: the user-space library. It wraps the interfaces provided by the driver and exposes an API to applications through libasound.so.

alsa-utils: a set of utilities built on alsa-lib, such as aplay for playing audio and arecord for recording.
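
A program can drive these utilities without linking against alsa-lib itself. The sketch below builds aplay and arecord command lines and runs them only when the tool is installed; the helper names and the default of 16 kHz mono are assumptions chosen for this example, while -d, -f, -r, and -c are standard arecord options.

```python
import shutil
import subprocess
from typing import List


def arecord_cmd(path: str, seconds: int, rate: int = 16000,
                channels: int = 1) -> List[str]:
    """Command line that records `seconds` of signed 16-bit audio to `path`."""
    return [
        "arecord",
        "-d", str(seconds),   # duration in seconds
        "-f", "S16_LE",       # sample format
        "-r", str(rate),      # sample rate in Hz
        "-c", str(channels),  # number of channels
        path,
    ]


def aplay_cmd(path: str) -> List[str]:
    """Command line that plays back a WAV file."""
    return ["aplay", path]


def run_if_available(cmd: List[str]) -> bool:
    """Run the command if its executable is on PATH; report whether it ran."""
    if shutil.which(cmd[0]) is None:
        return False
    subprocess.run(cmd, check=True)
    return True


if __name__ == "__main__":
    if run_if_available(arecord_cmd("command.wav", 3)):
        run_if_available(aplay_cmd("command.wav"))
    else:
        print("arecord not found; is alsa-utils installed?")
```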

2.2.3 Service Layer

The sound server sits between ALSA and the application. When an application calls the sound server's API to play a sound, the audio data is sent to the sound server, which mixes the playback requests from multiple applications and sends the result to the underlying sound card driver (ALSA or OSS). The driver then plays the mixed data on the sound card.
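
The mixing step can be illustrated with a few lines of Python. This is a deliberately simplified sketch, assuming all streams already share the same sample rate and signed 16-bit format: samples are summed position by position and clipped to the 16-bit range. Real sound servers also handle resampling, format conversion, and per-stream volume, none of which is shown here; the function name mix is made up.

```python
from itertools import zip_longest
from typing import List, Sequence

INT16_MIN, INT16_MAX = -32768, 32767


def mix(streams: Sequence[Sequence[int]]) -> List[int]:
    """Sum several 16-bit PCM streams sample by sample, clipping overflow.

    Shorter streams are padded with silence (zeros) so the result is as
    long as the longest input.
    """
    mixed = []
    for frame in zip_longest(*streams, fillvalue=0):
        total = sum(frame)
        mixed.append(max(INT16_MIN, min(INT16_MAX, total)))
    return mixed


if __name__ == "__main__":
    music = [1000, 2000, 30000]
    beep = [500, -2000, 10000, 7]
    print(mix([music, beep]))
```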

ESD (Enlightened Sound Daemon, or ESound): the sound server of the GNOME desktop environment, since replaced by PulseAudio.
aRts (analog Real time synthesizer): the sound server of the KDE desktop environment, since replaced by Phonon.
PulseAudio: a newer-generation sound server that improves on its predecessors and is currently the default sound server for GNOME desktops.

2.2.4 Other software

In addition to sound servers, there are some common middle-layer sound libraries and frameworks:
JACK (JACK Audio Connection Kit): a professional sound server that provides real-time, low-latency connections for audio and MIDI data between applications.
GStreamer: an open source multimedia framework used to build streaming media applications.
Phonon: the cross-platform multimedia framework for Qt.
PortAudio: a cross-platform open source audio I/O library; on Linux it requires alsa-lib support.

3. Speech recognition

Speech recognition, also known as automatic speech recognition (ASR) or speech-to-text (STT), converts spoken audio into text.

4. Speech synthesis

TTS (text-to-speech) is currently available in both open source and proprietary solutions.

Open source solutions include
- eSpeak/eSpeak NG, Ekho, Festival/Festvox/Flite

Proprietary solutions include
- Google TTS, Amazon Polly, NeoSpeech TTS, and more
- Baidu TTS, Ali TTS, iFlytek TTS, etc.

4.1 eSpeak

eSpeak supports Chinese output, but the result does not sound good.

"I am Chinese, I love China"

4.2 Ekho

Ekho is a free, open source, multilingual text-to-speech program.
It supports Cantonese, Mandarin, and other languages.

4.3 Festival

Festival is a general-purpose multilingual speech synthesis system developed at the University of Edinburgh's CSTR.
Festvox is a toolkit developed at CMU for building synthetic voices.
Flite (Festival-lite) is a lightweight speech synthesis engine based on Festival, also developed at CMU.
Simply put, Festival and Flite perform the TTS conversion and produce their output using voices built with Festvox.

Festival

# yum install festival
$ echo "Hello, you are using festival" | festival --tts
$ festival --tts myfile

Flite

# yum install flite
$ flite "Hello, you are using Flite" A.wav
$ aplay A.wav

Festival sounds slightly better than eSpeak, but it does not support Chinese.
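
An assistant usually calls these engines from code rather than from a shell. The sketch below builds the command line for rendering text to a WAV file with either Flite or eSpeak and picks whichever engine is installed. The function names and the preference order are assumptions for this example; the options used are flite's -t (text) and -o (output file) and espeak's -w (write WAV file).

```python
import shutil
from typing import List, Optional, Sequence


def tts_command(engine: str, text: str, wav_path: str) -> List[str]:
    """Command line that renders `text` into a WAV file with the given engine."""
    if engine == "flite":
        return ["flite", "-t", text, "-o", wav_path]
    if engine == "espeak":
        return ["espeak", "-w", wav_path, text]
    raise ValueError(f"unsupported TTS engine: {engine}")


def pick_engine(candidates: Sequence[str] = ("flite", "espeak")) -> Optional[str]:
    """Return the first candidate engine found on PATH, or None."""
    for name in candidates:
        if shutil.which(name) is not None:
            return name
    return None


if __name__ == "__main__":
    engine = pick_engine()
    if engine is None:
        print("No supported TTS engine found")
    else:
        print(tts_command(engine, "Hello, you are using a TTS engine", "A.wav"))
```

The resulting WAV file can then be played with aplay, as in the Flite example above.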

