AAC Algorithm Summary


I Introduction

 

This paper summarizes the AAC (Advanced Audio Coding) audio coding algorithm. It first briefly introduces the development history of MPEG audio (including AAC) and the AAC profiles, and then analyzes the individual modules of the AAC encoding algorithm in detail with reference to the FAAC (Freeware Advanced Audio Coder) source code.

II AAC Overview

1 Development history of MPEG audio and AAC

 

In 1988, ISO/IEC established the Moving Picture Experts Group (MPEG; official name ISO/IEC JTC1/SC29/WG11) to develop generic international standards for the coding of moving pictures, the associated audio, and their combination. Since then, ISO/MPEG has carried out a great deal of standardization work on video and audio coding, and its standards are widely used.

At the end of 1992, MPEG completed the MPEG-1 coding standard, which was adopted as ISO/IEC IS 11172. Its audio part defines three modes of operation called layers, Layer-1 to Layer-3, which offer progressively better audio quality at the cost of increasing complexity. Layer-3, the most complex layer with the best audio quality, is widely known as MP3.

The audio part of MPEG-2 extends MPEG-1 in two directions while remaining backward compatible: it adds support for 5.1 channels, to suit what is commonly called cinema sound, and it adds support for the lower sampling rates of 16 kHz, 22.05 kHz, and 24 kHz. This is MPEG-2 BC (Backward Compatible).

Verification tests in 1994 showed that introducing new algorithms and giving up compatibility could significantly improve coding efficiency. MPEG therefore dropped the original compatibility requirement and set up a new work item, defined as AAC (Advanced Audio Coding), which became the international standard ISO/IEC 13818-7 in 1997. Because this standard is not compatible with MPEG-1, it is also known as MPEG-2 NBC (Non-Backward Compatible) coding.

MPEG-4 provides two methods for coding the audio part. Medium to high bit rates are handled by an improved version of AAC, while low bit rates use the TwinVQ coding method developed by NTT (Nippon Telegraph and Telephone) and others.

MPEG-7 and MPEG-21, which followed MPEG-4, do not focus on improving quality or reducing bit rate; they address the description and retrieval of multimedia data instead. AAC is therefore, so far, the highest-quality audio coding standard in MPEG.

AAC combines a number of new techniques and has many new features. It supports sampling rates from 8 kHz to 96 kHz and a variety of channel configurations. Compared with MPEG Layer-3, AAC improves the frequency resolution, adds linear prediction and temporal noise shaping (TNS), improves joint stereo coding and the Huffman codebooks, and uses adaptive switching between long and short windows in the time-frequency transform to raise the compression ratio and improve audio quality. Together, these features give AAC better coding quality and efficiency than the earlier MPEG audio coding standards.

2 Introduction to the AAC algorithm

 

An AAC system comprises a number of efficient coding tools, including the filter bank, the psychoacoustic model, quantization and coding, prediction, TNS, stereo processing, and gain control. Combined, these modules form the basic AAC encoding and decoding process. In practice, not all modules are required; the following table lists which modules are required and which are optional:

 

Tool (module)                                        | Required or optional
Bitstream formatting                                 | Required
Noiseless coding                                     | Required
Quantizer                                            | Required
Scale factor processing                              | Required
M/S stereo processing                                | Optional
Prediction                                           | Optional
Intensity stereo (IS) / coupling channel processing  | Optional
TNS                                                  | Optional
Filter bank                                          | Required
Gain control (preprocessing)                         | Optional
Psychoacoustic model (perceptual model)              | Required

Table: Required and optional modules of an MPEG-2 AAC encoder

To suit different applications, the AAC standard defines three profiles of different complexity:

Main profile: This profile has the highest complexity and is suited to applications where memory and computing power are plentiful. It uses all coding tools except gain control to maximize compression efficiency.

LC (Low Complexity) profile: This profile is intended for applications with limited memory and computing power. It does not use the prediction and gain control tools, and the TNS order is lower.

SSR (Scalable Sampling Rate) profile: This profile uses the gain control tool but does not allow the prediction and coupling tools, and it has a lower bandwidth and TNS order. Gain control is not applied to the lowest of the PQF (polyphase quadrature filter) sub-bands. When the bandwidth is reduced, the complexity of the SSR profile decreases accordingly, which is particularly useful when the network bandwidth varies.
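The tool selection of the three profiles can be summarized as a small lookup table. The following Python sketch is only illustrative: the names `OPTIONAL_TOOLS`, `PROFILE_TOOLS`, and `tools_enabled` are hypothetical, and the sets encode only the restrictions stated in the text above, not the full normative constraints of the standard (for example, the reduced TNS order is not modeled).

```python
# Illustrative summary of the optional tools each profile may use.
# All names are hypothetical; only restrictions stated in the text are encoded.
OPTIONAL_TOOLS = {
    "ms_stereo",
    "prediction",
    "intensity_stereo",
    "coupling_channel",
    "tns",
    "gain_control",
}

PROFILE_TOOLS = {
    # Main: all coding tools except gain control
    "main": OPTIONAL_TOOLS - {"gain_control"},
    # LC: no prediction and no gain control (TNS stays, with a lower order)
    "lc": OPTIONAL_TOOLS - {"prediction", "gain_control"},
    # SSR: gain control is used; prediction and coupling are not allowed
    "ssr": OPTIONAL_TOOLS - {"prediction", "coupling_channel"},
}


def tools_enabled(profile):
    """Return the set of optional tools a profile may use (case-insensitive name)."""
    return set(PROFILE_TOOLS[profile.lower()])
```

For example, `tools_enabled("LC")` contains `"tns"` but neither `"prediction"` nor `"gain_control"`.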

The Main and LC profiles are transform coders that use the MDCT as their time/frequency analysis module. The SSR profile uses a hybrid filter bank: the signal bandwidth is first divided into 4 sub-bands, and an MDCT is then applied to each sub-band. Each of the three profiles strikes a different tradeoff between coding quality and algorithm complexity by selecting different modules.
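To make the MDCT step more concrete, the following is a minimal sketch of a windowed MDCT and its inverse with 50% overlap-add, written directly from the textbook definition in plain Python. It is not the FAAC implementation: the function names are hypothetical, a sine window is assumed, and the direct O(N^2) summation is used instead of a fast algorithm. AAC itself transforms 2048 samples into 1024 coefficients for long blocks and 256 samples into 128 coefficients for short blocks; a tiny block size is enough to show the principle.

```python
import math


def sine_window(m):
    """Sine window of length 2*m; satisfies w[n]^2 + w[n+m]^2 = 1."""
    return [math.sin(math.pi * (n + 0.5) / (2 * m)) for n in range(2 * m)]


def mdct(block, window):
    """Windowed MDCT: 2*m time samples -> m spectral coefficients."""
    m = len(block) // 2
    xw = [x * w for x, w in zip(block, window)]
    return [
        sum(xw[n] * math.cos(math.pi / m * (n + 0.5 + m / 2) * (k + 0.5))
            for n in range(2 * m))
        for k in range(m)
    ]


def imdct(coeffs, window):
    """Windowed inverse MDCT: m coefficients -> 2*m time-aliased samples."""
    m = len(coeffs)
    return [
        window[n] * (2.0 / m) * sum(
            coeffs[k] * math.cos(math.pi / m * (n + 0.5 + m / 2) * (k + 0.5))
            for k in range(m))
        for n in range(2 * m)
    ]


def analysis_synthesis(signal, m):
    """MDCT every 50%-overlapped block, then IMDCT and overlap-add."""
    assert len(signal) % m == 0, "sketch assumes a whole number of hops"
    window = sine_window(m)
    # Zero padding so that every input sample is covered by two blocks.
    padded = [0.0] * m + list(signal) + [0.0] * m
    out = [0.0] * len(padded)
    for start in range(0, len(padded) - 2 * m + 1, m):
        spectrum = mdct(padded[start:start + 2 * m], window)
        for n, y in enumerate(imdct(spectrum, window)):
            out[start + n] += y
    return out[m:m + len(signal)]
```

Each block produces only half as many coefficients as it has input samples, so a single inverse transform contains time-domain aliasing. Because the window satisfies w[n]^2 + w[n+m]^2 = 1, the aliasing terms of neighbouring blocks cancel in the overlap-add, and without quantization the input is recovered up to rounding error.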

AAC is a perceptual audio coder. Like all perceptual audio coders, it exploits the masking effect of the human ear when coding the spectral lines in the transform domain: it removes the information that will be masked and controls the quantization noise so that it remains inaudible.
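The idea of keeping the quantization noise at the masking threshold can be illustrated with a deliberately simplified model. The sketch below assumes a uniform quantizer, whose noise power per spectral line is approximately step^2/12, and picks the step size so that the expected noise energy in one band equals a given threshold. The real AAC quantizer is non-uniform, and the function names, the test signal, and the threshold value are all assumptions made for illustration.

```python
import math
import random


def step_for_threshold(threshold_energy, num_lines):
    """Step size of a uniform quantizer whose expected noise energy over
    num_lines lines equals threshold_energy (noise per line ~ step^2 / 12)."""
    return math.sqrt(12.0 * threshold_energy / num_lines)


def quantize_band(lines, step):
    """Uniform quantization: map each spectral line to an integer index."""
    return [round(x / step) for x in lines]


def noise_energy(lines, indices, step):
    """Energy of the quantization error in the band."""
    return sum((x - q * step) ** 2 for x, q in zip(lines, indices))


random.seed(0)
band = [random.gauss(0.0, 100.0) for _ in range(1000)]  # synthetic spectral lines
threshold = 1000.0  # hypothetical allowed noise energy (masking threshold) of the band
step = step_for_threshold(threshold, len(band))
indices = quantize_band(band, step)
noise = noise_energy(band, indices, step)  # ends up close to the threshold
```

A larger threshold permits a coarser step and therefore fewer bits for the band; this is how masking translates into bit-rate savings.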

In the encoding process, the time-domain signal is decomposed into the frequency domain by the filter bank (a windowed MDCT). In parallel, the time-domain signal is passed through the MPEG psychoacoustic model 2, which provides the masking threshold, the control information needed for M/S and intensity stereo coding, and the window selection for the filter bank. The temporal noise shaping (TNS) module controls the distribution of the quantization noise by shaping it to resemble the energy envelope of the signal. Intensity stereo coding, prediction, and M/S stereo coding effectively reduce the number of bits required for coding. The quantization module then uses two nested loops to allocate bits and keep the quantization noise below the masking threshold, and the quantized values are coded with improved Huffman codebooks. Finally, together with the side information from the preceding modules, the coded data form the AAC bitstream.
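The two nested quantization loops can be sketched as follows. This is a simplified illustration under stated assumptions, not the FAAC code or the normative procedure: the quantizer uses the power-law form commonly given for AAC (|x|^(3/4) with a rounding offset of 0.4054) but omits the scale factor offset, `count_bits` is a crude stand-in for Huffman bit counting, and all function and parameter names are hypothetical.

```python
import math

MAGIC = 0.4054  # rounding offset of the power-law quantizer


def quantize(x, gain_exp):
    """q = int((|x| * 2^(-gain_exp/4))^(3/4) + 0.4054), with the sign of x."""
    q = int((abs(x) * 2.0 ** (-gain_exp / 4.0)) ** 0.75 + MAGIC)
    return q if x >= 0 else -q


def dequantize(q, gain_exp):
    """Inverse power law: |x| = |q|^(4/3) * 2^(gain_exp/4)."""
    mag = abs(q) ** (4.0 / 3.0) * 2.0 ** (gain_exp / 4.0)
    return mag if q >= 0 else -mag


def count_bits(values):
    """Crude bit-count proxy (NOT the AAC Huffman codebooks)."""
    return sum(1 if q == 0 else 2 * abs(q).bit_length() + 1 for q in values)


def encode_frame(bands, thresholds, bit_budget, max_outer=32):
    """bands: spectral lines per scale factor band;
    thresholds: allowed noise energy per band."""
    scalefac = [0] * len(bands)
    result = None
    for _ in range(max_outer):  # outer loop: distortion control
        for common in range(256):  # inner loop: rate control
            quant = [[quantize(x, common - sf) for x in band]
                     for band, sf in zip(bands, scalefac)]
            if sum(count_bits(q) for q in quant) <= bit_budget:
                break  # coarse enough to fit the bit budget
        noise = [sum((x - dequantize(q, common - sf)) ** 2
                     for x, q in zip(band, qb))
                 for band, qb, sf in zip(bands, quant, scalefac)]
        result = (common, list(scalefac), quant, noise)
        too_noisy = [i for i, (n, t) in enumerate(zip(noise, thresholds)) if n > t]
        if not too_noisy or len(too_noisy) == len(bands):
            break  # all bands fine, or amplifying every band cannot help
        for i in too_noisy:
            scalefac[i] += 1  # finer quantization for bands above threshold
    return result
```

The inner loop coarsens the global step until the frame fits the bit budget; the outer loop then amplifies the bands whose noise still exceeds the masking threshold and runs the inner loop again. The sketch simply returns the state of the last iteration, whereas a practical encoder would also keep track of the best solution found.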

The algorithm principles of the main modules and their corresponding implementation code are analyzed separately.
