Discussion on the Basics of Speech Signal Processing, Series II

Last Update:2018-12-05 Source: Internet

Author: User

The following briefly summarizes several basic concepts. If you want to learn more, please let me know or directly refer to the relevant literature.

1. Generation of a speech signal

Generally, sound is produced by vibration. Speech is produced in the same way: air from the lungs forms an airflow that passes through the vocal tract and then radiates from the mouth and nose. Speech signals consist of three main types of sound: voiced sounds, unvoiced sounds, and plosives.

How a sound is articulated depends on the position and state of the vocal cords and the articulatory organs (the oral cavity and the nasal cavity). From a signals-and-systems perspective, the airflow passing through the glottis (vocal cords) forms the excitation source, the cavity from the glottis to the mouth and nose is a time-varying system, and the speech signal is the output of that time-varying system. Only by clarifying the characteristics of the excitation source and of the time-varying system can we truly understand the speech signal and study it in more depth.
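
The excitation-plus-system view above can be sketched in a few lines of Python. This is only an illustrative sketch under simplifying assumptions that are not from the original article: the glottal excitation is idealized as a unit pulse train, the vocal tract is reduced to a single fixed (not time-varying) two-pole resonator, and the names `impulse_train` and `resonator` as well as all numeric values are hypothetical.

```python
import math

def impulse_train(f0_hz, sample_rate, duration_s):
    """Idealized voiced excitation: one unit pulse per pitch period."""
    period = round(sample_rate / f0_hz)      # samples per pitch period
    n = int(sample_rate * duration_s)
    return [1.0 if i % period == 0 else 0.0 for i in range(n)]

def resonator(x, center_hz, bandwidth_hz, sample_rate):
    """Two-pole resonator: y[n] = x[n] + a1*y[n-1] + a2*y[n-2]."""
    r = math.exp(-math.pi * bandwidth_hz / sample_rate)   # pole radius (< 1, stable)
    theta = 2.0 * math.pi * center_hz / sample_rate       # pole angle
    a1, a2 = 2.0 * r * math.cos(theta), -r * r
    y1 = y2 = 0.0
    out = []
    for sample in x:
        y = sample + a1 * y1 + a2 * y2
        out.append(y)
        y2, y1 = y1, y
    return out

SAMPLE_RATE = 8000   # Hz, assumed for illustration
# Excitation source: 100 Hz pulse train (assumed pitch).
excitation = impulse_train(f0_hz=100, sample_rate=SAMPLE_RATE, duration_s=0.05)
# "Vocal tract": one resonance near 500 Hz (assumed values).
speech_like = resonator(excitation, center_hz=500, bandwidth_hz=80,
                        sample_rate=SAMPLE_RATE)
```

In a real vocal tract the filter coefficients change over time and several resonances act together; the fixed single resonator here only shows how an excitation source and a system combine to produce the output signal.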

2. Several Concepts Describing Speech Features
1. Physical attributes:
1) Pitch: determined by the frequency of sound vibration;
2) Intensity: the volume, determined by the strength (amplitude) of sound vibration;
3) Duration: the length of the sound;
4) Timbre: the quality and character of the sound, which is related to the vocal cord vibration frequency, the excitation source, the vocal tract shape, etc.
2. Basic Unit
1) The most basic unit is the phoneme, which can be voiced or unvoiced.
2) The smallest unit of pronunciation is the syllable, which consists of phonemes. Syllable = vowels + consonants, but one cannot write syllable = voiced + unvoiced, because the two pairs of terms do not belong to the same domain: one describes linguistic structure, the other describes the acoustic composition of speech. In addition, consonants are divided into unvoiced and voiced consonants. Vowels and voiced consonants are produced with vocal cord vibration, while unvoiced consonants are produced without it.
3) Chinese speech = initials + finals + tones
3. Resonance Characteristics
Resonance occurs when the vibration frequency matches the natural frequency of the system. The vocal tract is a resonant cavity that can resonate with speech at multiple frequencies. These resonance frequencies are called formants, and they have a huge impact on the speech signal.
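
To make the idea of a resonance peak concrete, the following minimal sketch evaluates the gain of a single two-pole resonator over frequency and locates its maximum. The resonator model, the function name `resonator_gain`, and the values of 700 Hz and 100 Hz are assumptions chosen for illustration; a real vocal tract has several such peaks whose positions depend on its shape.

```python
import cmath
import math

def resonator_gain(freq_hz, center_hz, bandwidth_hz, sample_rate):
    """Gain |H| of H(z) = 1 / (1 - a1*z^-1 - a2*z^-2) at freq_hz."""
    r = math.exp(-math.pi * bandwidth_hz / sample_rate)   # pole radius
    theta = 2.0 * math.pi * center_hz / sample_rate       # pole angle
    a1, a2 = 2.0 * r * math.cos(theta), -r * r
    z_inv = cmath.exp(-2j * math.pi * freq_hz / sample_rate)
    return 1.0 / abs(1.0 - a1 * z_inv - a2 * z_inv ** 2)

SAMPLE_RATE = 8000                      # Hz, assumed
freqs = list(range(0, 4001, 50))        # 0 Hz up to the Nyquist frequency
gains = [resonator_gain(f, 700, 100, SAMPLE_RATE) for f in freqs]

# The gain is largest close to the resonator's natural frequency.
peak_hz = freqs[gains.index(max(gains))]
print("largest gain near", peak_hz, "Hz")
```

Frequencies near the resonance are strongly amplified while others are attenuated, which is how formants shape the spectrum of the excitation.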

4. Masking Effect
Masking is a psychoacoustic phenomenon that arises from the perceptual characteristics of the human ear. It will be described in detail later.

3. Relationship between speech signals and audio signals
The frequency range of the speech signal starts at around 200 Hz, while the range of audio that people can hear is 20 Hz ~ 20 kHz, so the speech signal is clearly a kind of audio signal. Why, then, do we emphasize the study of the speech signal?
1. The objects of processing are different. Speech signal processing mainly targets human speech, whereas audio signal processing takes all sounds in nature as its research object;
2. The research methods are different. Speech signal research is mainly based on the human speech production mechanism: it builds a model of the speech production system and analyzes the features of that system. Audio signals, however, come from too many kinds of sources, so audio research is based on human auditory characteristics: it builds a model of the human ear and analyzes the features of that system.
3. Speech signals have greater practical research and application value.

4. Common Technologies of Speech Signal Processing
1. Time Domain Analysis
The speech signal is divided into short frames so that the time-varying signal can be treated as approximately stationary within each frame and processed frame by frame.
1) Short-time energy
2) Short-time average zero-crossing rate
3) Short-time autocorrelation
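
The three time-domain measures listed above can be sketched as follows. This is a minimal illustration using only the Python standard library. The helper names, the 30 ms frame length, the 10 ms hop, and the synthetic 200 Hz test tone are assumptions made for the example and are not prescribed by the original article.

```python
import math

def split_frames(signal, frame_len, hop):
    """Cut the signal into overlapping frames of frame_len samples."""
    return [signal[s:s + frame_len]
            for s in range(0, len(signal) - frame_len + 1, hop)]

def short_time_energy(frame):
    """Sum of squared samples in one frame."""
    return sum(x * x for x in frame)

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:])
                    if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)

def autocorrelation(frame, max_lag):
    """r[k] = sum over i of frame[i] * frame[i + k], for k = 0..max_lag."""
    n = len(frame)
    return [sum(frame[i] * frame[i + k] for i in range(n - k))
            for k in range(max_lag + 1)]

SAMPLE_RATE = 8000   # Hz, assumed
# Synthetic stand-in for a voiced sound: a 200 Hz sine, 0.1 s long.
tone = [math.sin(2 * math.pi * 200 * n / SAMPLE_RATE) for n in range(800)]

frames = split_frames(tone, frame_len=240, hop=80)   # 30 ms frames, 10 ms hop
energy = [short_time_energy(f) for f in frames]
zcr = [zero_crossing_rate(f) for f in frames]
r = autocorrelation(frames[0], max_lag=60)
# For a periodic frame, r[k] peaks again at the period (here 40 samples).
```

Voiced frames typically show high energy and a low zero-crossing rate, while unvoiced frames show the opposite, which is why these simple measures are often used together.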
2. Frequency Domain Analysis
1) Fourier transform (computed with the FFT)
2) Filter bank
3) Mel-frequency cepstrum analysis, based on auditory characteristics
4) Cepstrum analysis based on linear prediction (LPC)
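
As a rough illustration of the first and third items, the sketch below computes the magnitude spectrum of one Hamming-windowed frame and converts a frequency to the mel scale. It uses a naive DFT from the standard library so that it runs without extra packages; an FFT routine would give the same values much faster. The mel formula shown is one commonly used variant, and the function names and test signal are assumptions for this example.

```python
import cmath
import math

def hamming(n):
    """Hamming window of length n."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]

def magnitude_spectrum(frame):
    """Magnitudes of DFT bins 0..N/2 of the windowed frame (naive O(N^2) DFT)."""
    n = len(frame)
    xw = [x * w for x, w in zip(frame, hamming(n))]
    return [abs(sum(xw[t] * cmath.exp(-2j * math.pi * k * t / n)
                    for t in range(n)))
            for k in range(n // 2 + 1)]

def hz_to_mel(freq_hz):
    """One commonly used Hz-to-mel mapping."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)

SAMPLE_RATE = 8000   # Hz, assumed
N = 256              # frame length in samples, assumed
frame = [math.sin(2 * math.pi * 1000 * t / SAMPLE_RATE) for t in range(N)]

spectrum = magnitude_spectrum(frame)
peak_bin = max(range(len(spectrum)), key=lambda k: spectrum[k])
peak_hz = peak_bin * SAMPLE_RATE / N     # bin index converted to Hz
print("spectral peak at", peak_hz, "Hz, i.e.", round(hz_to_mel(peak_hz)), "mel")
```

Mel-frequency cepstrum analysis builds on this by grouping the spectrum into bands spaced evenly on the mel scale, which approximates how the ear resolves frequency.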
3. Two key parameters
1) Pitch
2) Linear prediction coefficients (LPC)
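
Both parameters can be estimated from the short-time autocorrelation. The sketch below shows one common approach, assumed here for illustration rather than taken from the original article: pitch from the strongest autocorrelation peak within a plausible lag range, and LPC coefficients from the Levinson-Durbin recursion. The function names, the 60 to 400 Hz search range, and the synthetic test signals are hypothetical.

```python
import math

def autocorr(x, max_lag):
    """r[k] = sum over i of x[i] * x[i + k], for k = 0..max_lag."""
    n = len(x)
    return [sum(x[i] * x[i + k] for i in range(n - k))
            for k in range(max_lag + 1)]

def estimate_pitch(frame, sample_rate, fmin=60.0, fmax=400.0):
    """Pitch in Hz from the largest autocorrelation value in the lag range."""
    lag_min = int(sample_rate / fmax)
    lag_max = int(sample_rate / fmin)        # frame must be longer than this
    r = autocorr(frame, lag_max)
    best_lag = max(range(lag_min, lag_max + 1), key=lambda k: r[k])
    return sample_rate / best_lag

def lpc(frame, order):
    """Levinson-Durbin recursion (autocorrelation method).

    Returns (a, err) where x[n] is predicted as sum of a[k-1] * x[n-k]
    for k = 1..order, and err is the final prediction error energy.
    """
    r = autocorr(frame, order)
    a = [0.0] * (order + 1)                  # a[0] is unused
    err = r[0]
    for i in range(1, order + 1):
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= 1.0 - k * k
    return a[1:], err

SAMPLE_RATE = 8000   # Hz, assumed
tone = [math.sin(2 * math.pi * 200 * n / SAMPLE_RATE) for n in range(240)]
pitch_hz = estimate_pitch(tone, SAMPLE_RATE)   # close to 200 Hz for this tone

# Impulse response of a known system: h[n] = 1.3*h[n-1] - 0.6*h[n-2].
h = [1.0, 1.3]
for _ in range(198):
    h.append(1.3 * h[-1] - 0.6 * h[-2])
coeffs, err = lpc(h, order=2)                  # close to [1.3, -0.6]
```

Real speech needs extra care that this sketch omits, such as windowing, a voiced/unvoiced decision before pitch estimation, and a higher LPC order (often around 10 to 16 depending on the sampling rate).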

5. Common Software for Speech Signal Processing
1. Matlab
2. Cool Edit
......

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
