The following briefly summarizes several basic concepts. If you want to learn more, please let me know or directly refer to the relevant literature.
1. Generation of a voice signal
Generally, sound is produced by vibration. Likewise, voice is produced when air from the lungs forms an airflow that passes through the vocal tract
and radiates from the mouth and nose. Voice signals consist of three main types of sound: voiced, unvoiced, and plosive.
Which sound is produced depends on the position and state of the vocal cords and the articulators (oral cavity, nasal cavity, etc.). From the perspective of signals and systems,
the airflow passing through the glottis forms the excitation source, and the cavity from the glottis to the mouth and nose is a time-varying system. The voice
is the output of that time-varying system. Only by clarifying the characteristics of the excitation source and of the time-varying system can we truly understand the voice signal
and study it in more depth.
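The excitation-plus-system view above can be sketched in a few lines. This is only a rough illustration under stated assumptions: the function names, the 8 kHz sampling rate, the 100 Hz pulse rate, and the single resonance at 700 Hz are all chosen for the example, and the system is held fixed here rather than time-varying.

```python
import math

def impulse_train(n_samples, fs, f0):
    """Idealized voiced excitation: one unit pulse per pitch period."""
    period = int(round(fs / f0))
    return [1.0 if n % period == 0 else 0.0 for n in range(n_samples)]

def resonator(x, fs, fc, bandwidth):
    """Two-pole filter y[n] = x[n] + a1*y[n-1] + a2*y[n-2], resonant near fc."""
    r = math.exp(-math.pi * bandwidth / fs)          # pole radius from bandwidth
    a1 = 2.0 * r * math.cos(2.0 * math.pi * fc / fs)
    a2 = -r * r
    y = []
    y1 = y2 = 0.0                                    # previous two outputs
    for sample in x:
        out = sample + a1 * y1 + a2 * y2
        y.append(out)
        y2, y1 = y1, out
    return y

fs = 8000                                            # assumed sampling rate (Hz)
excitation = impulse_train(400, fs, f0=100)          # excitation source
voice_like = resonator(excitation, fs, fc=700, bandwidth=100)  # system output
```

Changing `f0` changes the pitch of the output, while changing `fc` changes its resonance, which is why the excitation and the system are studied separately.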
2. Several Concepts describing Speech Features
1. Physical attributes:
1) Pitch: the frequency of the sound vibration;
2) Intensity: volume, the strength of the sound vibration;
3) Duration: the length of the sound;
4) Timbre: sound quality, the content and character of the sound; it is related to the vocal cord vibration frequency, the excitation source, the shape of the vocal tract, etc.
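Two of these attributes can be measured directly from the samples. The sketch below is an illustration only: the helper names, the 8 kHz sampling rate, and the 200 Hz test tone are assumptions, and the level is expressed in dB relative to an amplitude of 1.0.

```python
import math

def duration_seconds(samples, fs):
    """Duration: number of samples divided by the sampling rate."""
    return len(samples) / fs

def rms_level_db(samples):
    """Intensity as an RMS level in dB relative to amplitude 1.0.
    Assumes the signal is not all zeros (log of zero is undefined)."""
    mean_square = sum(s * s for s in samples) / len(samples)
    return 10.0 * math.log10(mean_square)

fs = 8000                                             # assumed sampling rate (Hz)
tone = [0.5 * math.sin(2.0 * math.pi * 200 * n / fs)  # 200 Hz tone, amplitude 0.5
        for n in range(4000)]
length = duration_seconds(tone, fs)                   # 0.5 seconds
level = rms_level_db(tone)                            # about -9 dB
```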
2. Basic Unit
1) The most basic unit is the phoneme, which can be voiced or unvoiced.
2) The minimum unit of pronunciation is the syllable, which consists of phonemes. Syllable = vowels + consonants, but we cannot write syllable = voiced + unvoiced,
because the two classifications do not belong to the same domain: one describes linguistic structure, and the other describes how the sound is produced.
In addition, consonants are divided into unvoiced and voiced consonants. For vowels and voiced consonants the vocal cords vibrate, while for unvoiced consonants they do not.
3) Chinese speech = initials + finals + tones
3. Resonance Characteristics
Resonance occurs when the vibration frequency matches the natural frequency of the system. The vocal tract is a cavity with resonance characteristics
of its own, and it can resonate at multiple frequencies. These resonance frequencies are called formants, and they have
a large impact on the resulting voice signal.
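A single resonance can be illustrated with a two-pole filter whose gain peaks near its center frequency. This is a minimal sketch, not a vocal tract model: the function name, the 8 kHz sampling rate, the 700 Hz center frequency, and the 100 Hz bandwidth are assumptions made for the example.

```python
import cmath
import math

def resonator_gain(f, fs, fc, bandwidth):
    """Gain at frequency f of the filter y[n] = x[n] + a1*y[n-1] + a2*y[n-2]."""
    r = math.exp(-math.pi * bandwidth / fs)           # pole radius from bandwidth
    a1 = 2.0 * r * math.cos(2.0 * math.pi * fc / fs)
    a2 = -r * r
    z_inv = cmath.exp(-2j * math.pi * f / fs)         # z^-1 on the unit circle
    return 1.0 / abs(1.0 - a1 * z_inv - a2 * z_inv * z_inv)

fs = 8000                                             # assumed sampling rate (Hz)
freqs = list(range(50, 4000, 50))                     # test frequencies in Hz
gains = [resonator_gain(f, fs, fc=700, bandwidth=100) for f in freqs]
peak_freq = freqs[gains.index(max(gains))]            # 700 on this 50 Hz grid
```

A real vocal tract has several such resonances at once, and their positions move as the articulators move.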
4. Masking Effect
This is a psychoacoustic phenomenon that follows from the perceptual characteristics of the human ear. It will be described in detail later.
3. Relationship between voice signals and audio signals
The frequency range of the voice signal is roughly 200 ~ … Hz, while the audio range that people can hear is 20 Hz ~ 20 kHz, so the voice
signal is clearly a kind of audio signal. Why, then, do we single out the voice signal for study?
1. The objects being processed are different: voice signal processing deals mainly with the human voice, while audio signal processing
takes all sounds in nature as its research object;
2. The research methods are different: voice signal research starts from the mechanism of human speech production, building a model of the speech production system and analyzing its characteristics.
Audio signals have too many kinds of sources, so audio research starts instead from human auditory characteristics, building a model of the human ear and analyzing
its characteristics.
3. Voice signals have greater practical research and application value.
4. Common Technologies of Speech Signal Processing
1. Time Domain Analysis
The voice signal is divided into short frames so that the time-varying signal can be treated as approximately unchanging within each frame.
1) Short-term energy
2) Short-term average zero-crossing rate
3) Short-term autocorrelation
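The three time-domain measures above can be sketched as follows. This is a minimal illustration under assumptions: the function names are made up for the example, no window function is applied, and the tiny test signal is artificial.

```python
def split_frames(signal, frame_len, hop):
    """Cut the signal into overlapping frames; an incomplete tail is dropped."""
    return [signal[i:i + frame_len]
            for i in range(0, len(signal) - frame_len + 1, hop)]

def short_term_energy(frame):
    """Sum of squared samples in one frame."""
    return sum(s * s for s in frame)

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:])
                    if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)

def autocorrelation(frame, max_lag):
    """R[k] = sum over n of frame[n] * frame[n + k], for k = 0..max_lag."""
    n = len(frame)
    return [sum(frame[i] * frame[i + k] for i in range(n - k))
            for k in range(max_lag + 1)]

signal = [1, -1, 1, -1, 1, -1, 1, -1]       # artificial alternating signal
frames = split_frames(signal, frame_len=4, hop=2)
energy = short_term_energy(frames[0])        # 4
zcr = zero_crossing_rate(frames[0])          # 1.0
acf = autocorrelation(frames[0], max_lag=2)  # [4, -3, 2]
```

In practice, high energy with a low zero-crossing rate tends to indicate voiced sound, and low energy with a high zero-crossing rate tends to indicate unvoiced sound.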
2. Frequency Domain Analysis
1) Fourier transform (FFT)
2) Filter bank
3) Mel-frequency cepstrum analysis based on auditory characteristics
4) Cepstrum analysis based on linear prediction (LPC)
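As a small illustration of the first and third items, the sketch below computes the spectrum of one frame and converts frequencies to the mel scale. Two things are assumptions here: the spectrum is computed with a direct DFT rather than an FFT (the result is the same, only slower), and the mel formula shown is one common convention among several. The function names and the 500 Hz test tone are also made up for the example.

```python
import cmath
import math

def dft_magnitudes(frame):
    """Magnitude of DFT bins 0..N/2, evaluated directly in O(N^2)."""
    n = len(frame)
    mags = []
    for k in range(n // 2 + 1):
        acc = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n)
                  for t in range(n))
        mags.append(abs(acc))
    return mags

def hz_to_mel(f):
    """One common mel-scale convention: 2595 * log10(1 + f / 700)."""
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

fs = 8000                                    # assumed sampling rate (Hz)
n = 160                                      # one 20 ms frame
tone = [math.sin(2.0 * math.pi * 500 * t / fs) for t in range(n)]
mags = dft_magnitudes(tone)
peak_bin = mags.index(max(mags))
peak_hz = peak_bin * fs / n                  # 500.0
```

With this convention, 1000 Hz maps to roughly 1000 mel, and equal steps in mel correspond to wider and wider steps in Hz as frequency increases.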
3. Two key parameters
1) Pitch
2) Linear prediction coefficients (LPC)
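Both parameters can be estimated from the short-term autocorrelation of a frame. The sketch below is one possible approach, not the only one: pitch is taken from the autocorrelation peak, and the LPC coefficients come from the Levinson-Durbin recursion. The function names, the 60 to 400 Hz search range, and the test signals are assumptions made for the example.

```python
import math

def autocorr(frame, max_lag):
    """R[k] = sum over n of frame[n] * frame[n + k], for k = 0..max_lag."""
    n = len(frame)
    return [sum(frame[i] * frame[i + k] for i in range(n - k))
            for k in range(max_lag + 1)]

def estimate_pitch(frame, fs, f_min=60.0, f_max=400.0):
    """Pick the lag with the largest autocorrelation in the allowed range.
    The frame must be longer than fs / f_min samples."""
    lag_min = int(fs / f_max)
    lag_max = int(fs / f_min)
    r = autocorr(frame, lag_max)
    best_lag = max(range(lag_min, lag_max + 1), key=lambda k: r[k])
    return fs / best_lag

def lpc(frame, order):
    """Levinson-Durbin recursion. Returns a[1..order] such that
    x[n] is predicted by sum over k of a[k] * x[n - k]."""
    r = autocorr(frame, order)
    a = [0.0] * (order + 1)
    err = r[0]                               # prediction error energy
    for i in range(1, order + 1):
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err                        # reflection coefficient
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= (1.0 - k * k)
    return a[1:]

fs = 8000                                    # assumed sampling rate (Hz)
sine = [math.sin(2.0 * math.pi * 100 * n / fs) for n in range(400)]
pitch = estimate_pitch(sine, fs)             # 100.0 Hz

decay = [0.9 ** n for n in range(200)]       # x[n] = 0.9 * x[n - 1]
coeffs = lpc(decay, order=2)                 # close to [0.9, 0.0]
```

Real speech frames are noisier than these test signals, so a practical pitch estimator usually adds a voiced/unvoiced decision and some smoothing across frames.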
5. Common Software for Speech Signal Processing
1. Matlab
2. Cool Edit
......