Knowledge and technical parameters of audio coding

Source: Internet
Author: User

White paper: an introduction to and comparison of common audio protocols for conference TV

First, the principle of digital audio: Sound is an energy wave, so it has both frequency and amplitude. Frequency corresponds to the time axis and amplitude to the level axis. Sound waves between 20 Hz and 20 kHz, which the human ear can normally hear, are called audible sound; waves below 20 Hz are infrasound and waves above 20 kHz are ultrasound. Multimedia technology deals only with the audible part.

Within the audible range, speech signals occupy roughly 80 Hz to 3,400 Hz, while music signals span 20 Hz to 20 kHz. Speech and music are the main objects that multimedia technology processes.

Because analog sound is continuous in time, the signal picked up by a microphone must be digitized before a computer can process it. The usual method is PCM (pulse code modulation) encoding, which converts the continuously varying analog signal into digital codes in three steps: sampling, quantization, and coding.

1. Sampling

Sampling reads the amplitude of the sound at regular time intervals. The number of samples taken per unit time is the sampling frequency. The higher the sampling frequency, the more closely the discrete amplitude points approximate the continuous analog audio curve, and the larger the amount of sampled data.

To ensure that digitized audio can be accurately (reversibly) restored to an analog output, the sampling theorem requires the sampling frequency to be at least twice the highest frequency in the spectrum of the analog signal.

Commonly used audio sampling rates are: 8kHz, 11.025kHz, 22.05kHz, 16kHz, 37.8kHz, 44.1kHz, 48kHz.

For example, a voice signal occupies 0.3 to 3.4 kHz, so a sampling frequency (fs) of 8 kHz is enough for the sampled signal to stand in for the original continuous voice signal. CD audio generally uses a sampling frequency of 44.1 kHz.
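
The sampling theorem can be checked with a few lines of arithmetic. The following is a minimal sketch, not part of the original white paper; the helper names `min_sample_rate` and `alias_frequency` are hypothetical, and the signals are assumed to be pure tones.

```python
def min_sample_rate(max_signal_hz: float) -> float:
    """Lowest sampling rate that satisfies the sampling theorem."""
    return 2.0 * max_signal_hz


def alias_frequency(signal_hz: float, sample_rate_hz: float) -> float:
    """Frequency at which a pure tone appears after sampling.

    Tones above half the sampling rate fold back into the band
    0 .. sample_rate_hz / 2.
    """
    folded = signal_hz % sample_rate_hz
    return min(folded, sample_rate_hz - folded)


# Telephone speech tops out at 3.4 kHz, so 8 kHz sampling is sufficient.
print(min_sample_rate(3400))        # 6800.0
# A 3 kHz tone is preserved at 8 kHz sampling ...
print(alias_frequency(3000, 8000))  # 3000
# ... but a 5 kHz tone folds down and is indistinguishable from 3 kHz.
print(alias_frequency(5000, 8000))  # 3000
```

The last line shows why content above half the sampling rate has to be filtered out before sampling: once folded, it cannot be told apart from a genuine lower-frequency tone.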

2. Quantization

Quantization is the conversion of a sampled sound signal amplitude into a numeric value that represents the signal strength.

Quantization precision: the number of binary digits used to represent each sample value, also called the quantization bit depth. Sound signals are typically quantized with 4, 6, 8, 12, or 16 bits.

Sampling frequency and quantization precision together show that, relative to the natural signal, digital audio encoding can only approach the original ever more closely. In computer applications PCM achieves the highest level of fidelity, so PCM is conventionally treated as lossless coding.
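
To make the effect of bit depth concrete, the sketch below lists the number of quantization levels for several bit depths, including the 16 bits used for CD audio later in this document. The signal-to-quantization-noise figure uses the textbook approximation 6.02 N + 1.76 dB for a full-scale sine wave; that formula is an addition for illustration and does not come from the white paper.

```python
def quantization_levels(bits: int) -> int:
    """Number of distinct values an N-bit sample can take."""
    return 2 ** bits


def ideal_sqnr_db(bits: int) -> float:
    """Textbook approximation for a full-scale sine wave: 6.02 * N + 1.76 dB."""
    return 6.02 * bits + 1.76


for bits in (4, 6, 8, 12, 16):
    print(f"{bits:2d} bits: {quantization_levels(bits):6d} levels, "
          f"about {ideal_sqnr_db(bits):.1f} dB")
```

Each extra bit doubles the number of levels and adds roughly 6 dB of headroom between the signal and the quantization noise.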

3. Coding

With a sampling rate of 44.1 kHz, 16-bit quantization, and two channels, PCM output has a data rate of 44.1k × 16 × 2 = 1411.2 kbit/s. Storing one second takes 176.4 KB and one minute takes about 10.58 MB, so to reduce transmission or storage cost the digital audio signal must be compressed.

To date, audio signals can be compressed to bit rates of 32 to 256 kbit/s, and speech to as low as 8 kbit/s.

The purpose of compressing digital audio is to minimize the amount of data without affecting how people use it. Compression schemes are usually measured by the following six properties:
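
The arithmetic above can be reproduced directly. This is a minimal sketch; the function name `pcm_bit_rate` is hypothetical, and storage is expressed in decimal units (1 kB = 1000 bytes, 1 MB = 10^6 bytes).

```python
def pcm_bit_rate(sample_rate_hz: int, bits_per_sample: int, channels: int) -> int:
    """Uncompressed PCM bit rate in bits per second."""
    return sample_rate_hz * bits_per_sample * channels


cd_bps = pcm_bit_rate(44_100, 16, 2)
bytes_per_second = cd_bps // 8
bytes_per_minute = bytes_per_second * 60

print(cd_bps / 1000)            # 1411.2  (kbit/s)
print(bytes_per_second / 1000)  # 176.4   (kB per second)
print(bytes_per_minute / 1e6)   # 10.584  (MB per minute)
```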

-Bit rate;

-The bandwidth of the signal;

-Subjective/objective voice quality;

-Delay;

-Computational complexity and memory requirements;

-Sensitivity to channel error;

For encoded audio to be widely usable, it must be encoded with standard algorithms. Traditional conference TV equipment mainly uses ITU-T recommended audio standards such as G.711, G.722, and G.728, together with AAC-LD.

Second, an introduction to common audio protocols:

1. ITU-T G.728

In 1992 the ITU-T issued this recommendation for coding telephone voice signals. It uses the LD-CELP encoding method with a sampling rate of 8 kHz and transmits speech at 16 kbit/s. Its delay is very short: the algorithmic coding delay is only 0.625 ms.

2. ITU-T G.711

This standard was published in 1972. It encodes voice signals as PCM with non-uniform quantization. Speech is sampled at 8 kHz and each sample is quantized to 8 bits, giving an output data rate of 64 kbit/s. This narrowband encoding covers audio from 300 to 3,400 Hz. The quality is good, but the bandwidth consumption is relatively large. It is mainly used in digital PBX and ISDN digital telephony.
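
Non-uniform quantization spends more of the 8-bit code space on quiet samples than on loud ones. The sketch below illustrates the idea with the continuous mu-law curve (mu = 255). It is an illustration only: G.711 itself specifies A-law and mu-law using segmented approximations of such curves, and the helper names and the mid-point decoding used here are assumptions, not the standard's bit-exact tables.

```python
import math

MU = 255  # companding constant of the mu-law curve


def mu_law_compress(x: float) -> float:
    """Map a sample in [-1, 1] onto the mu-law curve, also in [-1, 1]."""
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)


def mu_law_expand(y: float) -> float:
    """Inverse of mu_law_compress."""
    return math.copysign(math.expm1(abs(y) * math.log1p(MU)) / MU, y)


def encode_8bit(x: float) -> int:
    """Compress, then quantize uniformly to one of 256 codes (0..255)."""
    y = mu_law_compress(x)
    return min(255, int((y + 1.0) / 2.0 * 256))


def decode_8bit(code: int) -> float:
    """Take the middle of the code's interval and expand it again."""
    y = (code + 0.5) / 256 * 2.0 - 1.0
    return mu_law_expand(y)


for x in (0.01, 0.1, 1.0):
    restored = decode_8bit(encode_8bit(x))
    print(f"input {x:5.2f} -> code {encode_8bit(x):3d} -> error {abs(restored - x):.5f}")

print(8_000 * 8)  # 64000 bit/s: 8 kHz sampling with 8 bits per sample
```

The absolute error grows with the input level, so quiet speech keeps a roughly constant relative accuracy even though only 8 bits are stored per sample.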

3. ITU-T G.722

ITU-T G.722 is the first standardized wideband speech coding algorithm, operating at a sampling rate of 16 kHz. It was standardized by the CCITT in 1988 and is still in use today. The G.722 codec takes 16-bit data sampled at 16 kHz (audio bandwidth from 50 Hz to 7 kHz) and compresses it to 64, 56, or 48 kbit/s with an overall delay of about 3 ms, providing better call quality.

The advantages of G.722 are that its delay and transmission bit error rate are very low and that it involves no patented technology, so its cost is low. G.722 is therefore widely used in wireless communication systems, by VoIP manufacturers, and in personal communication services and video conferencing applications.

4. G.722.1

G.722.1 is based on Polycom's third-generation Siren 7 compression technology and was approved by the ITU-T as the G.722.1 standard in 1999. It uses a sampling frequency of 16 kHz, covers an audio bandwidth of 50 Hz to 7 kHz, and compresses the audio to 24 or 32 kbit/s. It uses 20 ms frames, which gives an algorithmic delay of 40 ms.

G.722.1 achieves lower bit rates and greater compression than the G.722 codec. The goal is roughly the same quality as G.722 at about half the bit rate. Using this codec requires a license from Polycom.

5. G.722.1 Annex C

G.722.1 Annex C is based on Polycom's Siren 14 compression technology. It uses a 32 kHz sampling frequency, supports audio bandwidth up to 14 kHz, and compresses it to 24, 32, or 48 kbit/s. With 20 ms frames it has an algorithmic delay of 40 ms.

In 2005 the International Telecommunication Union (ITU) approved Polycom Siren 14™ technology as a new standard for 14 kHz ultra-wideband audio coding and adopted it as ITU-T Recommendation G.722.1 Annex C. G.722.1 Annex C has low computational requirements and low bandwidth, and is well suited to voice, music, and natural sounds.
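
Frame length, sampling rate, and bit rate together determine how many samples and how many coded bits make up one frame. The sketch below works this out for the 20 ms frame and 32 kHz sampling rate quoted for G.722.1 Annex C, using the 24, 32, and 48 kbit/s rates listed for it in this document; the function names are hypothetical.

```python
def samples_per_frame(sample_rate_hz: int, frame_ms: int) -> int:
    """Number of audio samples covered by one frame."""
    return sample_rate_hz * frame_ms // 1000


def bits_per_frame(bit_rate_bps: int, frame_ms: int) -> int:
    """Number of coded bits the encoder may spend on one frame."""
    return bit_rate_bps * frame_ms // 1000


FRAME_MS = 20

print(samples_per_frame(32_000, FRAME_MS))  # 640 samples per frame
for kbps in (24, 32, 48):
    print(kbps, bits_per_frame(kbps * 1000, FRAME_MS))  # 480, 640, 960 bits
```

The quoted 40 ms algorithmic delay corresponds to two such 20 ms frames.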

6. AAC-LD

AAC (Advanced Audio Coding) is an audio compression format jointly developed by the Fraunhofer Institute (creator of the MP3 format), Dolby Laboratories, and AT&T (American Telephone and Telegraph). It is part of the MPEG-2 specification and became an international standard in March 1997. When the MPEG-4 standard took shape in 2000, MPEG-2 AAC was adopted as its core audio coding technology with some new coding features added; this version is known as MPEG-4 AAC.

The MPEG-4 AAC family currently has nine coding profiles. AAC-LD (Low Delay) is the profile for encoding at low bit rates with low latency. It supports sampling rates from 8 kHz to 48 kHz, delivers audio close to CD quality at a bit rate of 64 kbit/s, and supports multiple channels. The AAC-LD algorithmic delay is only 20 ms.

AAC owes much of its power to its modular design. The frame structure itself leaves room for new additions, which lets cores developed separately be combined and borrow from one another.

7. Comparison of the main parameters of various audio protocols:

Codec       Sampling frequency   Audio bandwidth      Output bit rate      Algorithmic delay
G.711       8 kHz                300 Hz to 3,400 Hz   64 kbit/s            <1 ms
G.722       16 kHz               50 Hz to 7 kHz       48, 56, 64 kbit/s    3 ms
G.722.1     16 kHz               50 Hz to 7 kHz       24, 32 kbit/s        40 ms
G.722.1 C   32 kHz               50 Hz to 14 kHz      24, 32, 48 kbit/s    40 ms
AAC-LD      48 kHz               20 Hz to 20 kHz      48 to 64 kbit/s      20 ms
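
One way to read the comparison is to ask how many coded bits each codec spends per input sample. The sketch below does this for one bit rate per codec taken from this document, treating every stream as a single channel and using uncompressed 16-bit PCM as the reference. The `CODECS` list and the function name are illustrative assumptions.

```python
# (codec, sampling rate in Hz, one bit rate quoted in this document in bit/s)
CODECS = [
    ("G.711", 8_000, 64_000),
    ("G.722.1", 16_000, 24_000),
    ("G.722.1 C", 32_000, 48_000),
    ("AAC-LD", 48_000, 64_000),
]

PCM_BITS = 16  # reference: uncompressed 16-bit PCM, one channel


def coded_bits_per_sample(sample_rate_hz: int, bit_rate_bps: int) -> float:
    """Average number of coded bits spent on each input sample."""
    return bit_rate_bps / sample_rate_hz


for name, fs, rate in CODECS:
    bits = coded_bits_per_sample(fs, rate)
    print(f"{name:10s} {bits:5.2f} bits/sample, {PCM_BITS / bits:5.1f}:1 vs 16-bit PCM")
```

G.711 keeps 8 bits for every sample, while the transform codecs get by with about 1.3 to 1.5 bits per sample, which is how they cover a wider audio bandwidth at a similar or lower bit rate.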

Third, a comparison of the pros and cons of AAC-LD and G.722.1 Annex C:

Sampled audio frequency range
G.722.1 C: Supports up to 14 kHz, close to CD quality, but loses the high-frequency portion.
AAC-LD: Supports full-band sampling from 20 Hz to 20 kHz, with audio closer to CD quality.

Output bit rate
G.722.1 C: 24, 32, or 48 kbit/s. Bandwidth use is lower than AAC-LD, but at the expense of high frequencies.
AAC-LD: 48 to 64 kbit/s, with support for output above 64 kbit/s, leaving room for better audio quality.

Algorithm complexity
G.722.1 C: Low complexity; CPU usage is slightly better than AAC-LD.
AAC-LD: Modular design, more powerful, supported by dedicated chips from TI and others.

Minimum delay
G.722.1 C: 20 ms frame, 40 ms algorithmic delay.
AAC-LD: 20 ms algorithmic delay, better than G.722.1 C.

Multichannel
AAC-LD: AAC supports up to 48 tracks and 15 low-frequency audio tracks.

Standard versatility
G.722.1 C: Developed by Polycom; used by Polycom and a handful of conference TV vendors.
AAC-LD: A core MPEG-4 standard, supported by Apple, Nokia, Panasonic, and others, and adopted by Ted and many other conference TV manufacturers, so its prospects for application are broader.

According to a comparison chart from a Fraunhofer Institute survey, AAC-LD provides better sound quality than G.722.1 C, MP3, and others at the same sampling frequency. AAC-LD achieves the shortest delay in ultra-wideband audio coding while keeping sound quality close to that of CD. It offers the best combination of sound quality, bit rate, and delay and is the best choice for conference TV.

Sounds in nature are very complex, with extremely complex waveforms. The usual way to digitize them is pulse code modulation, or PCM coding, which converts a continuously varying analog signal into digital codes in three steps: sampling, quantization, and coding.

1. What are sample rate and sample size (bit depth)?

Sound is an energy wave, so it has both frequency and amplitude; frequency corresponds to the time axis and amplitude to the level axis. The wave is infinitely smooth and can be thought of as a curve made of countless points. Because storage space is limited, digital encoding has to sample points along that curve. Sampling reads the value of the wave at a given instant. Clearly, the more points taken per second, the richer the information obtained. To reconstruct a waveform, each cycle of vibration needs at least 2 sample points. The highest frequency the human ear can perceive is 20 kHz, so meeting the needs of human hearing takes at least 40k samples per second, written as 40 kHz; this 40 kHz is the sample rate. Common CDs use a sample rate of 44.1 kHz.

Timing information alone is not enough. The energy value at each sample point must also be captured and quantized to express the signal strength. The number of quantization levels is an integer power of 2; the common CD uses a 16-bit sample size, that is, 2 to the 16th power levels.

Sample size is harder to grasp than sample rate because it is more abstract, so here is an example. Suppose a wave is sampled 8 times and the sample points have 8 different energy values, A1 to A8. With a sample size of only 2 bits there are just 4 levels, so only 4 distinct values can be kept and the 8 samples must be rounded onto them, losing the rest of the detail. With a 3-bit sample size there are 8 levels, just enough to record all 8 values. The larger the sample rate and sample size, the closer the recorded waveform is to the original signal.
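
The A1 to A8 example can be played through in code. This is a minimal sketch with eight made-up, evenly spaced sample values; `quantize` is a hypothetical helper that rounds each value to the nearest available level.

```python
def quantize(samples, bits):
    """Round samples in [0, 1] to the nearest of 2**bits evenly spaced levels."""
    top = 2 ** bits - 1
    return [round(s * top) / top for s in samples]


# Eight hypothetical sample values A1..A8, all different.
a = [i / 7 for i in range(8)]

two_bit = quantize(a, 2)
three_bit = quantize(a, 3)

print(len(set(two_bit)))    # 4: the eight values collapse onto four levels
print(len(set(three_bit)))  # 8: every value keeps its own level
```

With 2 bits, neighbouring samples become indistinguishable after quantization; with 3 bits, this particular set of eight values survives unchanged.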

2. Lossy and Lossless

From the sample rate and sample size it follows that, relative to the natural signal, audio encoding can only approach the original ever more closely; at least with current technology that is all it can do. Relative to the natural signal, every digital audio coding scheme is lossy, because the signal cannot be completely restored. In computer applications the highest fidelity level is achieved by PCM coding, which is widely used for archiving material and for music listening; CDs, DVDs, and the common WAV file all use it. PCM is therefore conventionally treated as lossless coding, because it represents the best fidelity level in digital audio. That does not mean PCM guarantees an absolutely faithful signal; PCM only gets as close as possible. We habitually class MP3 as lossy audio coding, and that is relative to PCM coding. The point of stressing that lossy and lossless are relative is that true losslessness is hard to achieve. It is like expressing pi with digits: no matter how high the precision, the number only approaches pi and never truly equals it.

3. Why use audio compression technology

Calculating the bit rate of a PCM audio stream is easy: sample rate × sample size × number of channels, in bit/s. A PCM-encoded WAV file with a sample rate of 44.1 kHz, a sample size of 16 bits, and two channels has a data rate of 44.1k × 16 × 2 = 1411.2 kbit/s. The "128K" in "128K MP3" is the same parameter as this 1411.2 kbit/s for WAV. It is also called data bandwidth, the same concept as the bandwidth of an ADSL line. Dividing the bit rate by 8 gives the data rate of this WAV in bytes, 176.4 KB/s. In other words, storing one second of PCM audio at 44.1 kHz, 16 bits, and two channels takes 176.4 KB, and one minute takes about 10.58 MB. That is unacceptable to most users, especially those who like listening to music on a computer. There are only 2 ways to reduce disk usage: lower the sampling parameters or compress. Lowering the parameters is undesirable, so experts have developed a variety of compression schemes. Because their uses and target markets differ, the various audio compression codings achieve different sound quality and compression ratios, as discussed later in this article. One thing is certain: they all compress the data.
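
To put the two bit rates side by side, the sketch below compares a 128 kbit/s stream with CD-quality PCM. The function names are hypothetical, and storage is expressed with 1 MB taken as 10^6 bytes.

```python
PCM_KBPS = 44.1 * 16 * 2  # CD-quality stereo PCM, about 1411.2 kbit/s


def compression_ratio(coded_kbps: float) -> float:
    """How many times smaller the coded stream is than CD-quality PCM."""
    return PCM_KBPS / coded_kbps


def megabytes_per_minute(kbps: float) -> float:
    """Storage for one minute of audio, with 1 MB taken as 10**6 bytes."""
    return kbps * 1000 / 8 * 60 / 1e6


print(round(compression_ratio(128), 1))          # 11.0
print(round(megabytes_per_minute(128), 2))       # 0.96
print(round(megabytes_per_minute(PCM_KBPS), 2))  # 10.58
```

A 128 kbit/s stream therefore needs roughly one eleventh of the space of the uncompressed WAV.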

4. The relationship between frequency and sampling rate

The sample rate is the number of times per second the original signal is sampled. The audio files we commonly see have a sample rate of 44.1 kHz. What does that mean? Suppose we have 2 sine wave signals, one at 20 Hz and one at 20 kHz, each one second long, corresponding to the lowest and highest frequencies we can hear, and we sample each of them at 40 kHz. What do we get? The 20 Hz signal is sampled 40k/20 = 2000 times per cycle, while the 20 kHz signal is sampled only 2 times per cycle. Clearly, at the same sample rate, low frequencies are recorded in far more detail than high frequencies. This is why some audio enthusiasts complain that CDs have an unrealistic digital sound: the 44.1 kHz sample rate of CD does not guarantee that high-frequency signals are recorded well. Recording high frequencies better would seem to call for a higher sample rate, so some people use a 48 kHz sample rate when ripping CD tracks. This is not advisable! It does nothing for the sound quality. For ripping software, keeping the same 44.1 kHz sample rate as the CD is one of the guarantees of best quality; raising it is not. A higher sample rate is only useful when capturing an analog signal. If the signal being sampled is already digital, do not try to raise the sample rate.

5. Streaming characteristics

As networks have developed, people want to listen to music online, which requires that audio files can be played while they are being read, rather than only after the whole file has been read. Listening then no longer requires a download. Audio can also be played while it is being encoded. This is the feature that makes live online broadcasting possible and makes setting up your own digital radio station a reality.

Fourth, an introduction to the mainstream audio codings (or formats)
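
The samples-per-cycle figures are a single division. A minimal sketch, with a hypothetical function name:

```python
def samples_per_cycle(sample_rate_hz: float, tone_hz: float) -> float:
    """How many samples fall within one cycle of a pure tone."""
    return sample_rate_hz / tone_hz


for tone in (20, 1_000, 20_000):
    print(tone, samples_per_cycle(40_000, tone))
# 20 Hz -> 2000.0, 1 kHz -> 40.0, 20 kHz -> 2.0

print(samples_per_cycle(44_100, 20_000))  # 2.205 at the CD sample rate
```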
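
Playing while reading comes down to consuming the audio in small blocks as they arrive. The sketch below imitates this with an in-memory byte stream standing in for a network connection; the block size, the silent test signal, and the `stream_blocks` helper are all illustrative assumptions, and no real decoder or sound device is involved.

```python
import io


def stream_blocks(source, block_bytes):
    """Yield fixed-size blocks as soon as they can be read from `source`."""
    while True:
        block = source.read(block_bytes)
        if not block:
            break
        yield block


# Stand-in for a network stream: one second of silent 8 kHz, 16-bit mono PCM.
SAMPLE_RATE, BYTES_PER_SAMPLE = 8_000, 2
source = io.BytesIO(bytes(SAMPLE_RATE * BYTES_PER_SAMPLE))

block_bytes = SAMPLE_RATE * BYTES_PER_SAMPLE * 20 // 1000  # 20 ms of audio
played_ms = 0
for block in stream_blocks(source, block_bytes):
    # A real player would hand `block` to a decoder or sound device here.
    played_ms += len(block) * 1000 // (SAMPLE_RATE * BYTES_PER_SAMPLE)

print(played_ms)  # 1000
```

Playback can begin after the first 20 ms block has arrived, without waiting for the rest of the stream.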

1. PCM Code

PCM is the abbreviation of pulse code modulation. The general PCM workflow was described earlier. We do not need to care how the final PCM code is computed; we only need to know the advantages and disadvantages of PCM-encoded audio streams. The biggest advantage of PCM coding is good sound quality, and the biggest drawback is large size. The common audio CD uses PCM coding, and a single disc can hold only about 72 minutes of music.

2. WAVE

This is an old audio file format developed by Microsoft. WAV is a file format that conforms to the RIFF (Resource Interchange File Format) specification. Every WAV file has a file header that carries the encoding parameters of the audio stream. WAV does not mandate a particular audio stream encoding: besides PCM, almost any encoding that supports the ACM specification can be used for a WAV audio stream. Many people are unaware of this, so take AVI as an illustration, because AVI and WAV have very similar file structures except that AVI also carries a video stream. There are many kinds of AVI files, so we often have to install a decoder to watch some of them. DivX, which we meet often, is a video encoding; an AVI can compress its video stream with DivX and, of course, with other encodings as well. Similarly, a WAV can compress its audio stream with various audio encodings. The WAV files we usually see carry a PCM-encoded audio stream, but that does not mean WAV can only use PCM. MP3 encoding can also be used in a WAV and, as with AVI, once the corresponding decoder is installed you can play such WAV files.
On the Windows platform, PCM-encoded WAV is the best-supported audio format, and all audio software handles it perfectly. Because it meets high sound quality requirements, WAV is also the preferred format for music editing and is well suited to storing music material. PCM-encoded WAV is therefore often used as an intermediate format when converting between other encodings, for example from MP3 to WMA.
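
The file header of a PCM WAV can be inspected directly. The sketch below writes a tiny WAV into memory with Python's standard `wave` module and then unpacks the first 36 bytes of the RIFF header with `struct`. It assumes the simple layout that `wave` produces, in which the `fmt ` chunk immediately follows the RIFF header; WAV files from other tools may place extra chunks in between.

```python
import io
import struct
import wave

# Write a tiny PCM WAV into memory: 2 channels, 16-bit, 44.1 kHz.
buf = io.BytesIO()
with wave.open(buf, "wb") as w:
    w.setnchannels(2)
    w.setsampwidth(2)
    w.setframerate(44_100)
    w.writeframes(bytes(100 * 2 * 2))  # 100 silent stereo frames

# RIFF header followed by the 16-byte PCM "fmt " chunk, little-endian.
fields = struct.unpack_from("<4sI4s4sIHHIIHH", buf.getvalue(), 0)
(riff_id, riff_size, wave_id, fmt_id, fmt_size,
 audio_format, channels, sample_rate, byte_rate,
 block_align, bits_per_sample) = fields

print(riff_id, wave_id)                        # b'RIFF' b'WAVE'
print(audio_format)                            # 1 means PCM
print(channels, sample_rate, bits_per_sample)  # 2 44100 16
print(byte_rate)                               # 176400 bytes per second
```

The `audio_format` field is what lets a WAV carry encodings other than PCM: a player reads it to decide which decoder it needs.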

3. MP3 Code

  

4. OGG Code

An audio coding called Ogg Vorbis has appeared on the network, billed as the MP3 killer! What exactly is Ogg Vorbis? Ogg is the project name of a large multimedia development program that covers the development of video, audio, and other codings. The whole Ogg project aims to give everyone a completely free multimedia coding scheme. Ogg's creed is: open and free! Vorbis is a character in Terry Pratchett's fantasy novel "Small Gods", and the name became the official name of the audio coding in the Ogg project. Vorbis has now been developed successfully and an encoder is available.
Ogg Vorbis is a high-quality audio coding scheme. Official data show that Ogg Vorbis achieves better sound quality than MP3 at relatively low data rates! Ogg Vorbis is also far more advanced than MP3, the success story of the 1990s, in that it supports multiple channels. What does that mean? It means that, given ripping software for SACD, DTS CD, and DVD-Audio (which is not yet available), Ogg Vorbis can encode every channel, whereas MP3 can encode only 2 channels. The rise of multichannel music is bringing a revolutionary change to music appreciation, especially for symphonic music, which gains a greater sense of realism. This is a change that MP3 cannot adapt to.
Like MP3, Ogg Vorbis is a flexible and open audio coding: after the coding scheme has been fixed, the sound quality can still be tuned and new algorithms can still be added. Its sound quality will therefore keep improving. Also like MP3, Ogg Vorbis is more of an audio coding framework into which new technologies can be brought step by step. Like MP3, Ogg also supports VBR.

5. MPC Code

MPC is another strong contender. It has spread very quietly, with no complicated backstory, and it exists for one purpose only: better sound in a smaller size! MPC was formerly known as MP+, which makes it clear which competitor it was aimed at. Anyone who has used this coding comes away with a deep impression of its outstanding sound quality.

6. mp3PRO Code

On June 14, 2001, Thomson Multimedia SA and the Fraunhofer Institute released a new music format named mp3PRO. It is an improved scheme based on MP3 coding technology, and judging from the features in the official announcement it is quite appealing. According to various sources, mp3PRO is not a completely new format. It builds on traditional MP3 coding technology, and its main technical highlight is SBR (Spectral Band Replication), a new audio coding enhancement algorithm. SBR offers a way to improve the performance of audio and speech coding at low bit rates: it can increase the audio bandwidth or improve the coding efficiency at a given bit rate. The biggest advantage of SBR is very efficient coding at low data rates. Unlike traditional coding techniques, SBR is more like a post-processing technique, so the quality of the decoder's algorithm directly affects the sound quality. The high frequencies are actually generated by the decoder (the player); the SBR-encoded data is more like a set of instructions for producing them, or guidance for the signal source, which is somewhat like the way MIDI works. As we can see, mp3PRO is in effect a mixed data stream made up of an MP3 signal stream and an SBR signal stream. The data show that SBR improves high-frequency sound quality at low data rates by about 30%. However this 30% was obtained, we can expect the improvement to bring a 64 kbit/s MP3 up to the sound quality of a 128 kbit/s MP3 (note: under the same encoding conditions, sound quality does not rise in proportion to data rate, at least not as the ear hears it). This is basically consistent with the official claim that 64 kbit/s mp3PRO is comparable to 128 kbit/s MP3.

7. WMA

WMA is the Windows Media Audio encoded file format, developed by Microsoft. WMA is aimed not at the stand-alone market but at the network! Its competitor is the well-known RealNetworks in the online media market. Microsoft claims that WMA reaches near-CD sound quality at a bit rate of only 64 kbit/s. Unlike earlier encodings, WMA supports copy protection: protection can be added through Windows Media Rights Manager to limit the playback period, the number of plays, and even the machines allowed to play the file. WMA supports streaming, that is, playing while reading, so WMA can easily be used for online broadcasting. Because it is Microsoft's own work, Microsoft built WMA support into Windows. With its excellent technical characteristics and Microsoft's strong promotion, the format is being accepted by more and more people.

8. RA

RA is the RealAudio format, which many heavy internet users encounter all the time; most music websites use RealAudio for online previews. The format is aimed squarely at the network media market and supports a rich set of features. Its most striking feature is that it can adjust its own bit rate to the listener's bandwidth, maximizing sound quality while keeping playback smooth. RA supports several audio encodings, including ATRAC3. Like WMA, RA not only supports playing while reading, it also supports special protocols that hide the real network address of a file, so that the file can be played online but not downloaded. This matters a great deal to record labels and record retailers. RA and WMA are currently the most popular audio formats on the Internet for online listening.

9. APE

APE is a lossless compression format provided by Monkey's Audio. Monkey's Audio provides a Winamp plug-in, which means the compressed file is no longer merely an archive but an audio file format that can be played just like MP3. The compression ratio of this format is much lower than that of other formats, but it is truly lossless, which has won it the favor of many audiophiles. Among the many existing lossless compression schemes, APE stands out, with a satisfactory compression ratio and fast compression speed, and has become the only choice for many people who privately exchange audiophile music.
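
What lossless means in practice is that decoding returns exactly the bytes that went in. The sketch below demonstrates that property with Python's general-purpose `zlib` compressor as a stand-in. It is not the algorithm used by Monkey's Audio, and the 440 Hz test tone is made up for the example.

```python
import math
import struct
import zlib

# One second of a 440 Hz tone as 16-bit mono PCM at 8 kHz.
samples = [int(12_000 * math.sin(2 * math.pi * 440 * n / 8_000)) for n in range(8_000)]
pcm = struct.pack(f"<{len(samples)}h", *samples)

packed = zlib.compress(pcm, 9)
restored = zlib.decompress(packed)

print(restored == pcm)         # True: every byte is recovered
print(len(packed) < len(pcm))  # True for this periodic test tone
```

A lossy coder such as MP3 cannot pass the first check, which is the distinction this section draws between lossy and lossless formats.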
