1.1. Basic concepts:
1.2. Multimedia system:
1.3. Multimedia data compression and encoding:
(1). Entropy coding: Lossless data compression that does not depend on the characteristics of the data source. The core idea is to assign each symbol a code whose length depends on the symbol's probability: frequently used symbols get short codes (fewer bits), and rarely used symbols get long codes (more bits). The most common entropy coding techniques are Huffman coding and arithmetic coding.
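As a rough illustration of the idea above, the following Python sketch builds a Huffman code for a short string. The function name `huffman_code` and the sample text are assumptions made for this example; it is a minimal sketch, not a complete encoder.

```python
import heapq
from collections import Counter

def huffman_code(text):
    """Return a Huffman code table {symbol: bit string} for the symbols in text."""
    freq = Counter(text)
    if len(freq) == 1:
        # A single distinct symbol still needs one bit per occurrence.
        return {next(iter(freq)): "0"}
    # Heap entries: (weight, unique tie-breaker, partial code table).
    heap = [(w, i, {s: ""}) for i, (s, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        # Merge the two least frequent subtrees; prefix their codes with 0 and 1.
        w1, _, t1 = heapq.heappop(heap)
        w2, _, t2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in t1.items()}
        merged.update({s: "1" + c for s, c in t2.items()})
        heapq.heappush(heap, (w1 + w2, next_id, merged))
        next_id += 1
    return heap[0][2]

text = "aaaaaaaabbbbccd"          # hypothetical sample: a x8, b x4, c x2, d x1
codes = huffman_code(text)
total_bits = sum(len(codes[s]) for s in text)
print(codes)
print(total_bits, "bits, versus", 2 * len(text), "bits with a fixed 2-bit code")
```

In this sample the most frequent symbol receives a 1-bit code and the two rarest symbols receive 3-bit codes, which is the behaviour the paragraph describes.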
(2). Source coding: Data compression that takes the characteristics of the data source into account. The encoding considers the characteristics of the signal source and the content of the signal, so it is also called semantic-based coding.
(3). Hybrid coding: Lossy data compression that combines source coding and entropy coding. It is used for almost all video, image, and sound media, for example in JPEG, MPEG-Video, and MPEG-Audio.
2. Digital Sound Coding
2.1 Digital Sound Signal
(1). Frequency of Sound:
(2). Sampling, quantization, coding:
Sampling frequency: The Nyquist theorem states that the sampling frequency must be at least twice the highest frequency in the sound signal. A sound sampled this way can be restored from its digital representation to the original sound, which is called lossless digitization.
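The Nyquist condition can be checked numerically. The sketch below is illustrative only: the function names and the example frequencies (a 3400 Hz band limit and an 8000 Hz sampling rate) are assumptions, not values taken from the text above. It shows the minimum sampling rate and the frequency that a pure tone appears to have once it has been sampled.

```python
def min_sampling_rate(f_max_hz):
    """Lowest sampling rate (Hz) allowed by the Nyquist criterion."""
    return 2 * f_max_hz

def apparent_frequency(f_hz, fs_hz):
    """Frequency (Hz) that a pure tone at f_hz appears to have after sampling at fs_hz."""
    # Sampling cannot distinguish f from f shifted by a multiple of fs,
    # so the tone folds back into the range 0..fs/2.
    return abs(f_hz - fs_hz * round(f_hz / fs_hz))

print(min_sampling_rate(3400))          # 6800
print(apparent_frequency(3000, 8000))   # 3000 (below fs/2, preserved)
print(apparent_frequency(5000, 8000))   # 3000 (above fs/2, aliased)
```

A 5000 Hz tone sampled at 8000 Hz becomes indistinguishable from a 3000 Hz tone, which is why components above half the sampling frequency have to be removed before sampling.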
Sample precision: the number of bits per sound sample (bits per sample, bps).
Data rate (bytes/s) = (sampling frequency (Hz) * sample precision (bits) * number of channels) / 8
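The data-rate formula above translates directly into code. In this sketch the function name and the two example configurations (8 kHz, 8-bit mono and 44.1 kHz, 16-bit stereo) are assumptions chosen for illustration.

```python
def pcm_data_rate(sample_rate_hz, bits_per_sample, channels):
    """Uncompressed PCM data rate in bytes per second."""
    return sample_rate_hz * bits_per_sample * channels / 8

print(pcm_data_rate(8000, 8, 1))     # 8000.0   (telephone-style mono)
print(pcm_data_rate(44100, 16, 2))   # 176400.0 (16-bit stereo at 44.1 kHz)
```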
2.2. Storage format for sound files:
2.3. Speech codecs
(1). Waveform codecs:
A waveform codec tries to produce a reconstructed signal whose waveform matches the original speech waveform as closely as possible, without using any knowledge of how the speech signal was generated. In general, these codecs have low complexity and deliver quite high quality at data rates above [...] kb/s; below this data rate, quality drops sharply. The simplest form of waveform coding is pulse code modulation (PCM), which simply samples and quantizes the input signal.
(2). Source codecs
Source coding tries to extract the parameters of speech production from the speech waveform and uses these parameters, together with a speech production model, to reconstruct the speech. A source codec for speech is called a vocoder. In the speech production model, the vocal tract is treated as a filter whose characteristics change over time, called a time-varying filter. It is excited by white noise for unvoiced speech segments and by a pulse train for voiced speech segments. The information that must be sent to the decoder is therefore the filter specification, a voiced/unvoiced flag, and the pitch period of voiced speech, all updated every 10~20 ms. The model parameters can be determined with either time-domain or frequency-domain methods, and this task is performed by the encoder. Vocoders run at data rates of around 2.4 kb/s. The speech they produce is intelligible, but its quality is far below that of natural speech, and raising the data rate does not improve the synthesized speech because quality is limited by the speech production model. Despite the relatively low quality, vocoders offer good confidentiality, so they have been used in military applications.
(3). Hybrid codecs
Hybrid codecs try to fill the gap between waveform codecs and source codecs. Waveform codecs provide high-quality speech, but the problem of maintaining that quality at data rates below [...] kb/s has not been solved technically. Vocoders can reduce the data rate to 2.4 kb/s or even lower, but their quality cannot compare with natural speech. Many forms of hybrid codec have been developed to achieve high quality at a low data rate; the most successful and widely used are the time-domain analysis-by-synthesis (AbS) codecs.
3. Pulse code modulation (PCM)
3.1. Concept
Pulse code modulation (PCM) is the simplest and theoretically most complete coding system, and also the most successful and widely used. It is, however, also the coding system that produces the most data. The principle of PCM coding is intuitive and simple: the input is an analog sound signal and the output is a sequence of PCM samples.
The anti-aliasing filter is a low-pass filter that removes signal components outside the audio band.
The waveform encoder can, for now, be thought of as a sampler.
The quantizer can be thought of as a generator of quantization step sizes, or quantization intervals.
3.2. Quantization
Digitizing sound takes two steps. The first is sampling, which reads the amplitude of the sound at regular intervals. The second is quantization, which converts each sampled amplitude into a numeric value. There are two classes of quantization: uniform and non-uniform. Different quantization methods produce different amounts of data, so quantization can also be regarded as a method of data compression.
(1). Uniform quantization
If the sampled signal is quantized using equal quantization intervals, the quantization is called uniform quantization. Uniform quantization measures every sample amplitude with the same "ruler" and is also known as linear quantization. The difference between the quantized sample value y and the original value x, e = y - x, is called the quantization error or quantization noise.
With this method, the same quantization interval is used for both large and small input signals. To cover the large amplitude range of the input signal while still meeting the accuracy requirement, the number of bits per sample has to be increased. In speech signals, however, large amplitudes occur rarely, so the extra bits are not fully used. Non-uniform quantization, also called nonlinear quantization, was introduced to overcome this shortcoming.
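A uniform quantizer and its error e = y - x can be sketched as follows. The sketch assumes the input is normalized to -1 ≤ x ≤ 1 and reconstructs each sample at the midpoint of its interval; the function name and the 3-bit example are illustrative assumptions.

```python
import math

def quantize_uniform(x, bits):
    """Quantize x in [-1, 1] with 2**bits equal intervals.

    Returns (index, y): the interval index and the reconstructed value y,
    taken at the midpoint of the interval.
    """
    levels = 2 ** bits
    step = 2.0 / levels                       # width of every quantization interval
    index = math.floor((x + 1.0) / step)
    index = min(levels - 1, max(0, index))    # keep x = 1.0 inside the top interval
    y = -1.0 + (index + 0.5) * step
    return index, y

for x in (0.30, 0.02, -0.70):
    index, y = quantize_uniform(x, bits=3)
    print(f"x={x:+.2f}  index={index}  y={y:+.4f}  e=y-x={y - x:+.4f}")
```

With 3 bits the interval width is 0.25, so the error magnitude never exceeds 0.125 regardless of how large or small the sample is. For a small sample such as 0.02 that error is several times larger than the signal itself, which is the weakness the text attributes to uniform quantization.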
(2). Non-uniform quantization
The basic idea of nonlinear quantization is to use a large quantization interval for large input signals and a small quantization interval for small input signals, so that fewer bits are needed to meet the same accuracy requirement. The same rule is applied when the sound data is restored. Two mappings between the sampled input amplitude and the quantized output are defined for nonlinear quantization: one is called the μ-law companding algorithm, and the other the A-law companding algorithm.
μ-law companding
μ-law companding (G.711) is mainly used in digital telephone communication in North America, Japan, and other regions. The relationship between the quantizer input and output is given by:
F(x) = sgn(x) * ln(1 + μ|x|) / ln(1 + μ)
where x is the input signal amplitude, normalized to -1 ≤ x ≤ 1;
sgn(x) is the polarity (sign) of x;
μ is the parameter that determines the amount of compression. It reflects the ratio of the largest to the smallest quantization interval, with 100 ≤ μ ≤ 500.
Because the input-output relationship of μ-law companding is logarithmic, this coding is also called logarithmic PCM. In practice, μ = 255 is used and the logarithmic curve is approximated by 8 straight-line segments to simplify the computation.
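The logarithmic relationship can be tried out with a short sketch. It assumes the standard continuous μ-law characteristic F(x) = sgn(x) ln(1 + μ|x|) / ln(1 + μ) with μ = 255, not the 8-segment approximation used in practice, and the function names are hypothetical.

```python
import math

def mu_law_compress(x, mu=255.0):
    """Map x in [-1, 1] to the compressed value F(x) in [-1, 1]."""
    return math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)

def mu_law_expand(y, mu=255.0):
    """Inverse of mu_law_compress: recover x from the compressed value y."""
    return math.copysign(math.expm1(abs(y) * math.log1p(mu)) / mu, y)

for x in (0.01, 0.1, 0.5, 1.0):
    y = mu_law_compress(x)
    print(f"x={x:.2f}  F(x)={y:.4f}  expanded={mu_law_expand(y):.4f}")
```

Small amplitudes are stretched over a large part of the output range (an input of 0.01 maps to a value above 0.2), so a uniform quantizer applied after compression spends most of its levels on small signals.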
A-law companding
A-law companding (G.711) is mainly used in digital telephone communication in Europe, mainland China, and other regions. The relationship between the quantizer input and output is given by:
F(x) = sgn(x) * A|x| / (1 + ln A), for 0 ≤ |x| ≤ 1/A
F(x) = sgn(x) * (1 + ln(A|x|)) / (1 + ln A), for 1/A < |x| ≤ 1
where x is the input signal amplitude, normalized to -1 ≤ x ≤ 1;
sgn(x) is the polarity (sign) of x;
A is the parameter that determines the amount of compression. It reflects the ratio of the largest to the smallest quantization interval.
The first segment of the A-law characteristic is linear, and the rest has the same logarithmic form as μ-law companding. In practice, A = 87.56 is used, and the logarithmic part of the curve is likewise approximated by straight-line segments to simplify the computation. Detailed calculations can be found in reference [17].
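The two-part A-law characteristic can be sketched in the same way. The code assumes the standard continuous form (linear for |x| ≤ 1/A, logarithmic above it) with A = 87.56 as quoted in the text; the function names are hypothetical, and the segment approximation used in practice is not modelled.

```python
import math

A = 87.56  # compression parameter quoted in the text

def a_law_compress(x, a=A):
    """Map x in [-1, 1] to the compressed value F(x) in [-1, 1]."""
    ax = abs(x)
    denom = 1.0 + math.log(a)
    if ax <= 1.0 / a:
        y = a * ax / denom                      # linear part
    else:
        y = (1.0 + math.log(a * ax)) / denom    # logarithmic part
    return math.copysign(y, x)

def a_law_expand(y, a=A):
    """Inverse of a_law_compress: recover x from the compressed value y."""
    ay = abs(y)
    denom = 1.0 + math.log(a)
    if ay <= 1.0 / denom:
        x = ay * denom / a
    else:
        x = math.exp(ay * denom - 1.0) / a
    return math.copysign(x, y)

for x in (0.005, 1.0 / A, 0.1, 1.0):
    y = a_law_compress(x)
    print(f"x={x:.5f}  F(x)={y:.4f}  expanded={a_law_expand(y):.5f}")
```

Both branches give 1 / (1 + ln A) at |x| = 1/A, so the curve is continuous where the linear and logarithmic parts meet.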
For input signals sampled at 8 kHz with a sample precision of 13, 14, or 16 bits, μ-law or A-law coding yields 8 bits per sample at the output of the PCM encoder, so the output data rate is 64 kb/s. This is the CCITT Recommendation G.711: Pulse code modulation (PCM) of voice frequencies.
3.3. PCM applications
PCM coding was first used mainly for multiplexing in voice communication. In a telecommunications network, transmission media typically account for about 65% of the total cost and equipment for about 35%, so improving line utilization is an important issue. The following two methods are commonly used to improve line utilization:
(1). Frequency-division multiplexing (FDM)
This method divides the frequency band of the transmission channel into several narrower bands, each of which carries one signal. For example, a channel with a bandwidth of 1400 Hz can be divided into 4 sub-channels: 820~990 Hz, 1230~1400 Hz, 1640~1810 Hz, and 2050~2220 Hz. Adjacent sub-channels are separated by 240 Hz so that they do not interfere with each other. Each pair of users occupies only one sub-channel. This is the main technique of analog carrier communication.
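The arithmetic behind this example can be verified with a few lines of Python. The sub-channel edges are taken from the text; the variable names are illustrative.

```python
# Sub-channel band edges in Hz, as listed in the example above.
sub_channels = [(820, 990), (1230, 1400), (1640, 1810), (2050, 2220)]

# Width of each sub-channel.
widths = [high - low for low, high in sub_channels]

# Guard band between each sub-channel and the next one.
guards = [nxt[0] - cur[1] for cur, nxt in zip(sub_channels, sub_channels[1:])]

# Total span from the lowest to the highest band edge.
span = sub_channels[-1][1] - sub_channels[0][0]

print(widths)   # [170, 170, 170, 170]
print(guards)   # [240, 240, 240]
print(span)     # 1400
```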
(2). Time-division multiplexing (TDM)
This method divides the transmission channel in time: each user is assigned a time slot and transmits part of its signal in each slot, so many users can share one transmission line at the same time. This is the main technique of digital communication. For example, a voice signal sampled at f = 8000 Hz has a sampling period of T = 125 μs, and this period is called one frame. Two systems define how many channels fit into this time: the 24-channel system and the 30-channel system.
Frame structure of 24-channel PCM
The important parameters of the 24-channel system are as follows:
8000 frames are transmitted per second, 125 μs per frame.
12 frames make up 1 multiframe (used for synchronization).
Each frame consists of 24 time slots (channels) and 1 synchronization bit.
Each channel transmits an 8-bit code each time, so 1 frame has 24 x 8 + 1 = 193 bits.
Data rate: R = 8000 x 193 = 1544 kb/s.
Data rate per channel: 8000 x 8 = 64 kb/s.
The important parameters of the 30-channel system are as follows:
8000 frames are transmitted per second, 125 μs per frame.
16 frames make up 1 multiframe (used for synchronization).
Each frame consists of 32 time slots (channels).
Each channel transmits an 8-bit code each time.
Data rate: R = 8000 x 32 x 8 = 2048 kb/s.
Data rate per channel: 8000 x 8 = 64 kb/s.
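The parameters of both systems follow from the frame rate and the frame layout. The sketch below recomputes them; the constant and function names are assumptions for illustration, while the numbers come from the two parameter lists above.

```python
FRAMES_PER_SECOND = 8000   # one frame per sampling period
BITS_PER_SLOT = 8          # each channel sends an 8-bit code per frame

def line_rate_bps(slots, framing_bits=0):
    """Line data rate in bit/s for a frame with the given number of time slots."""
    bits_per_frame = slots * BITS_PER_SLOT + framing_bits
    return FRAMES_PER_SECOND * bits_per_frame

frame_period_us = 1_000_000 / FRAMES_PER_SECOND
rate_24 = line_rate_bps(24, framing_bits=1)   # 24-channel system, 193-bit frame
rate_30 = line_rate_bps(32)                   # 30-channel system, 32 time slots
channel_rate = FRAMES_PER_SECOND * BITS_PER_SLOT

print(frame_period_us)   # 125.0
print(rate_24)           # 1544000
print(rate_30)           # 2048000
print(channel_rate)      # 64000
```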
Time-division multiplexing (TDM) is widely used in digital telephone networks. The degree of multiplexing of PCM signals is usually expressed by the term "group", also known as the digital network hierarchy. PCM communication has developed rapidly, and transmission capacity has grown from the primary group (basic group) of 30 (or 24) channels to the secondary group of 120 (or 96) channels, the tertiary group of 480 (or 384) channels, and so on. In the figure, N denotes the number of channels. Whether N = 30 or N = 24, each channel has a data rate of 64 kb/s, and after first-level multiplexing the data rate becomes 2048 kb/s (N = 30) or 1544 kb/s (N = 24). In North America, a line with this data rate is called a T1 long-distance digital communication line and this level of service is called the T1 level; in Europe they are called the E1 long-distance digital communication line and the E1 level. The hierarchy levels are T1/E1, T2/E2, T3/E3, T4/E4, and T5/E5.