I. Basic Knowledge
The frequency range of speech is generally 300 Hz ~ 3400 Hz, while the range of human hearing is generally 20 Hz ~ 20000 Hz.
II. Sampling Rate
In reality, the voice signal produced by a person is an analog signal. To process it digitally, it must be converted into a digital signal through sampling, quantization, and encoding. The first step is sampling, that is, analog-to-digital conversion. Put simply, sampling records the waveform as a fixed number of data points for each second of sound. According to the Nyquist sampling theorem, a waveform can be completely restored if it is sampled at a rate of at least twice the highest frequency it contains. Therefore, for sound signals covering the audible range up to 20 kHz, the sampling frequency must be set to 40 kHz or above for the original signal to be restored from the discrete samples. In practice, it is generally set to 44.1 kHz. Sound at a 44.1 kHz sampling rate takes 44,100 data points to describe one second of waveform. In principle, the higher the sampling rate, the better the sound quality. Sampling frequencies generally fall into three levels: 22.05 kHz, 44.1 kHz, and 48 kHz. 22.05 kHz only reaches FM broadcast quality, 44.1 kHz is the theoretical CD quality limit, and 48 kHz reaches DVD quality.
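The relationship between the highest frequency, the minimum sampling rate, and the number of data points per second can be checked with a few lines of Python. This is only a minimal sketch: the helper names `nyquist_rate` and `sample_sine` and the 1 kHz test tone are assumptions made for illustration, not part of any library.

```python
import math

def nyquist_rate(max_frequency_hz):
    """Minimum sampling rate for content up to max_frequency_hz (hypothetical helper)."""
    return 2 * max_frequency_hz

def sample_sine(frequency_hz, sample_rate_hz, duration_s=1.0):
    """Sample a unit-amplitude sine wave at evenly spaced instants (hypothetical helper)."""
    count = round(sample_rate_hz * duration_s)
    return [math.sin(2 * math.pi * frequency_hz * n / sample_rate_hz)
            for n in range(count)]

# Upper limit of human hearing from section I.
audible_limit_hz = 20_000
print(nyquist_rate(audible_limit_hz))  # 40000

# One second of an assumed 1 kHz tone at the CD sampling rate.
samples = sample_sine(frequency_hz=1_000, sample_rate_hz=44_100)
print(len(samples))  # 44100
```

The second result is the figure used in the text: one second of sound at 44.1 kHz is described by 44,100 data points per channel.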
III. Bit Rate
Audio signals must also be encoded. Here, encoding refers to source encoding, that is, data compression. If the quantized samples are transmitted directly without data compression, the format is called PCM (pulse code modulation). The bit rate of a PCM audio stream is easy to calculate: sampling rate × sample size × number of audio channels, in bps. For a PCM-encoded two-channel WAV file with a sampling rate of 44.1 kHz and a sample size of 16 bits, the bit rate is 44.1k × 16 × 2 = 1411.2 kbps. The kbps figure commonly quoted for an MP3 file is the same kind of parameter as this 1411.2 kbps for the WAV. It is also called data bandwidth, and it is the same concept as bandwidth in ADSL. Dividing the bit rate by 8 gives the data rate of this WAV in bytes, that is, 176.4 KB/s. In other words, one second of two-channel PCM audio at a 44.1 kHz sampling rate and a 16-bit sample size requires 176.4 KB of space, and one minute requires about 10.34 MB. This is unacceptable for most users, especially those who like to listen to music on their computers. To reduce disk usage there are only two options: lower the sampling parameters or compress the data. Lowering the parameters is undesirable, so experts have developed various compression schemes. The most primitive are DPCM and ADPCM, and the best known is MP3. After data compression, the bit rate is therefore much smaller than that of the original PCM stream.
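The arithmetic above can be reproduced with a short Python sketch. The function name `pcm_bit_rate` is a hypothetical helper introduced here; the unit conversion follows the text's own convention (1000 bytes per KB, 1024 KB per MB) so that the figures match.

```python
def pcm_bit_rate(sample_rate_hz, bits_per_sample, channels):
    """Bit rate of an uncompressed PCM stream in bits per second (hypothetical helper)."""
    return sample_rate_hz * bits_per_sample * channels

# CD-style parameters from the text: 44.1 kHz, 16 bits, 2 channels.
bit_rate_bps = pcm_bit_rate(44_100, 16, 2)
bytes_per_second = bit_rate_bps // 8
bytes_per_minute = bytes_per_second * 60

# Assumed unit convention, chosen to match the text: 1 KB = 1000 bytes, 1 MB = 1024 KB.
megabytes_per_minute = bytes_per_minute / 1000 / 1024

print(f"{bit_rate_bps / 1000:.1f} kbps")        # 1411.2 kbps
print(f"{bytes_per_second / 1000:.1f} KB/s")    # 176.4 KB/s
print(f"{megabytes_per_minute:.2f} MB/minute")  # 10.34 MB/minute
```

Changing any of the three arguments shows how quickly the uncompressed size grows, which is the motivation for compression given in the text.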
IV. Summary
For human voice signals, the actual processing generally goes through the following steps:
Person speaks --> acoustic-to-electrical conversion --> sampling (analog-to-digital conversion) --> quantization (representing the samples with appropriate digital values) --> encoding (data compression) -->
transmission (network or other methods)
--> decoding (data restoration) --> reconstruction (digital-to-analog conversion) --> electrical-to-acoustic conversion --> person listens.
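The digital part of this chain can be sketched in Python. This is a simplified illustration under stated assumptions, not a real codec: the "encoding" step is plain 16-bit PCM with no compression, the 8 kHz sampling rate and the 440 Hz test tone are assumed values, and all function names are hypothetical. The microphone, the network, and the loudspeaker are left out.

```python
import math
import struct

SAMPLE_RATE_HZ = 8_000  # assumed rate; more than twice the 3400 Hz upper limit of speech
FULL_SCALE = 32_767     # largest signed 16-bit value

def sample(frequency_hz, count):
    """Sampling: read a sine wave at evenly spaced instants."""
    return [math.sin(2 * math.pi * frequency_hz * n / SAMPLE_RATE_HZ)
            for n in range(count)]

def quantize(values):
    """Quantization: map each value in [-1, 1] to a 16-bit integer code."""
    return [round(v * FULL_SCALE) for v in values]

def encode(codes):
    """Encoding: pack codes as little-endian 16-bit PCM (no compression here)."""
    return struct.pack(f"<{len(codes)}h", *codes)

def decode(payload):
    """Decoding: unpack the bytes back into integer codes."""
    return list(struct.unpack(f"<{len(payload) // 2}h", payload))

def reconstruct(codes):
    """Digital-to-analog step, reduced here to scaling codes back to [-1, 1]."""
    return [c / FULL_SCALE for c in codes]

original = sample(440, 80)             # 10 ms of an assumed 440 Hz tone
payload = encode(quantize(original))   # bytes that would be transmitted
restored = reconstruct(decode(payload))

max_error = max(abs(a - b) for a, b in zip(original, restored))
print(len(payload))                           # 160
print(max_error <= 0.5 / FULL_SCALE + 1e-12)  # True
```

The only loss in this sketch comes from quantization, which is bounded by half a quantization step. A compressing encoder such as MP3 would replace `encode` and `decode` and trade additional loss for a lower bit rate.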