MP3 file structure and encoding and decoding process

Source: Internet
Author: User

* MP3 introduction
* MP3 file structure
--TAG_V2 (ID3v2) tag frame
--Data frame
--TAG_V1 (ID3v1) tag frame
* MP3 encoding and decoding process
* MP3 file playback process

I. MP3 introduction:
MP3 stands for MPEG-1 Audio Layer 3. The MPEG (Moving Picture Experts Group) standards cover both video and audio,
and the audio standards developed so far are MPEG-1, MPEG-2, MPEG-2 AAC, and MPEG-4.
MPEG-1 and MPEG-2 use the same family of audio codecs: Layer 1, Layer 2, and Layer 3. The layers differ in compression quality
and coding complexity and correspond to the three sound file types MP1, MP2, and MP3; which layer is used depends on the
intended purpose. The higher the layer, the more complex the encoder and the higher the compression ratio.
MPEG-2 adds two features: a low-sample-rate extension that reduces the data rate, and a multichannel extension that raises the number of main channels to 5.
Layer 1, Layer 2, and Layer 3 share the same filter bank, bitstream structure, and header information. The sampling frequencies are
32 kHz, 44.1 kHz, and 48 kHz.
Layer 1 was designed for the Digital Compact Cassette: data rate 384 kbps, compression ratio 4:1;
Layer 2 is a trade-off between complexity and performance: data rate 256 kbps to 192 kbps, compression ratio 6:1 to 8:1;
Layer 3 was designed for low data rates: 128 kbps to 112 kbps, compression ratio 10:1 to 12:1.
Layer 3 adds an MDCT transform, which makes its frequency resolution 18 times that of Layer 2, and it also uses
entropy coding similar to that of MPEG video to reduce redundant information. Most MP3 files use the MPEG-1 standard.

The audio quality of an MP3 file depends on its bitrate and sampling frequency, as well as on the quality of the encoder. Typical MP3 bitrates lie
between 128 and 320 kbit/s. The sampling frequency is one of 32, 44.1, or 48 kHz; the most common is the CD
sampling frequency, 44.1 kHz. A commonly used encoder is LAME, an MP3 encoder released under the LGPL that offers good speed and sound quality.
MP3 is a lossy compression method for audio signals. To keep audible distortion low, MP3 uses perceptual coding:
it discards the parts of the pulse code modulation (PCM) audio data that matter least to human hearing, and so achieves a high compression ratio. The encoder first
performs a spectral analysis of the audio, then filters out the components that fall below the noise level, then quantizes what remains, and finally
packs the result into an MP3 file with a high compression ratio. When played back, the compressed file sounds close to the original source.

II. Overall MP3 file structure:

An MP3 file is made up of frames, the smallest units of the file. Each data frame begins with a frame header from which the length of the frame can be calculated. By the nature of their frames, the file falls into three parts: the TAG_V2 (ID3v2) tag, the data frames, and the TAG_V1 (ID3v1) tag. Only the data frames are required; both tags are optional, and not every MP3 file has them. ID3v2 sits at the start of the file and is marked by the string "ID3". It holds the artist, composer, album, and similar information, has no fixed length, and extends the information available in ID3v1. ID3v1 sits at the end of the file and is marked by the string "TAG". Its length is fixed at 128 bytes, and it holds the song title, artist, album, year, and similar information.

1. ID3v2 tag
ID3v2 has had four versions so far, but popular playback software generally supports only the third, ID3v2.3. Each ID3v2.3 tag has a tag header and several tag frames, and optionally an extended header. Information about the track, such as the title and the artist, is stored in separate tag frames. The extended header is not required, but each tag must contain at least one tag frame. The tag header and the tag frames are stored in order at the start of the MP3 file.
--Tag header
The tag header is 10 bytes long and is located at the very start of the file, with the following data structure:
char header[3];  /* the string "ID3" */
char ver;        /* major version number; 3 for ID3v2.3 */
char revision;   /* minor version number; 0 for this version */
char flag;       /* flag byte; this version defines only three bits, rarely used, can be ignored */
char size[4];    /* tag size: the size of everything after the 10-byte tag header */
The tag size takes four bytes, but only the low 7 bits of each byte are used; the highest bit is always 0. To obtain the size, drop the highest bit of each byte and join the remaining 28 bits. The format is as follows:
0xxxxxxx 0xxxxxxx 0xxxxxxx 0xxxxxxx
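The size rule above can be sketched in C as follows. This is a minimal illustration, not a full tag reader; the function name `id3v2_tag_size` is made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decode the 4-byte ID3v2 tag size: 7 useful bits per byte, most
 * significant byte first. The result is the size of everything that
 * follows the 10-byte tag header. (Function name is hypothetical.) */
uint32_t id3v2_tag_size(const unsigned char size[4])
{
    assert(size != NULL);
    return ((uint32_t)(size[0] & 0x7F) << 21) |
           ((uint32_t)(size[1] & 0x7F) << 14) |
           ((uint32_t)(size[2] & 0x7F) << 7)  |
            (uint32_t)(size[3] & 0x7F);
}
```

For example, the size bytes 00 00 02 01 decode to 2 * 128 + 1 = 257, so the tag frames occupy the 257 bytes that follow the header.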

--Tag frame
Each tag frame has a 10-byte frame header followed by at least one byte of variable-length content. Frames are stored one after another in the file, and each frame's header marks where that frame begins. The frame header is structured as follows:
char frameid[4]; /* four-character identifier that states what the frame contains */
char size[4];    /* size of the frame content, not counting the frame header; must be at least 1 */
char flags[2];   /* flags; only 6 bits are defined, not described here */
Common frame identifiers:
TIT2: Title
TPE1: Artist
TALB: Album
TRCK: Track, format N/M, where N is the track's number on the album and M is the total number of tracks on the album
TYER: Year
TCON: Genre
COMM: Comment, format "eng\0comment text", where eng indicates the language used
The frame size is a plain 32-bit integer stored in the four bytes, most significant byte first.
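A minimal C sketch of reading one tag frame header, assuming ID3v2.3, where the four size bytes form an ordinary big-endian integer. The struct and function names are hypothetical, chosen only for this illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical holder for one parsed ID3v2.3 frame header. */
struct id3v2_frame_header {
    char     id[5];   /* four-character frame ID plus terminating NUL */
    uint32_t size;    /* content size in bytes, frame header excluded */
    uint16_t flags;   /* the two flag bytes */
};

/* Parse the 10-byte frame header at buf.
 * Returns 1 on success, 0 if buf starts with a zero byte (treated here
 * as padding after the last frame) or the size is less than 1. */
int id3v2_parse_frame_header(const unsigned char buf[10],
                             struct id3v2_frame_header *out)
{
    assert(buf != NULL && out != NULL);
    if (buf[0] == 0)
        return 0;
    memcpy(out->id, buf, 4);
    out->id[4] = '\0';
    out->size = ((uint32_t)buf[4] << 24) | ((uint32_t)buf[5] << 16) |
                ((uint32_t)buf[6] << 8)  |  (uint32_t)buf[7];
    out->flags = (uint16_t)((buf[8] << 8) | buf[9]);
    return out->size >= 1;
}
```

A reader would call this repeatedly, skipping `size` content bytes after each header, until it returns 0 or the tag size is used up.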

2. Data frame
A file usually contains many data frames; how many depends on the file size and the frame size. Each frame begins with a 4-byte (32-bit) frame header. The header may be followed by a 2-byte CRC checksum; whether it is present is determined by the 16th bit of the header, the protection bit: 0 means a CRC follows the header, 1 means there is none. After the header (and the CRC, if present) comes the frame's payload, Main_data. The layout is as follows:
Frame header (4 bytes) | CRC (optional, 2 bytes) | Main_data

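--The detailed structure of the frame header (32 bits, most significant bit first) is:
sync word: 11 bits, all set to 1
MPEG version: 2 bits
layer: 2 bits
protection bit: 1 bit (0 means a CRC follows the header)
bitrate index: 4 bits
sampling frequency index: 2 bits
padding bit: 1 bit
private bit: 1 bit
channel mode: 2 bits
mode extension: 2 bits (used with joint stereo)
copyright: 1 bit
original: 1 bit
emphasis: 2 bits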

Whether the length of Main_data varies from frame to frame depends on whether the bitrate in the header varies. A given MP3 song might exist in three versions, for example: 96 kbps (96,000 bits per second), 128 kbps, and 192 kbps. The bitrate (kbps) is the amount of data used per second of music; the higher it is, the better the sound quality and the larger the file. An MP3 file whose bitrate stays constant is called CBR, and most MP3 files are CBR. A file whose bitrate changes is called VBR, and the length of each of its frames may differ.
--LAME tag frame
When you actually open an MP3 file, you will often find that the first frame is not a real audio data frame but an information frame written by the encoder, for example LAME.
It stores the total number of frames in the MP3 file, a 100-byte index of seek points spread evenly over the playing time, and some other parameters. Two concepts are involved here: CBR and VBR.

1) CBR: the bit rate is constant, so every frame has the same length (apart from the padding byte). Such a file is marked with the string "Info". Knowing only the total length of the file
and the frame length, the total playing time can be calculated at about 26 ms per frame, and fast forward, rewind,
and slow playback can be controlled by counting frames.
2) VBR: short for variable bitrate; the bit rate, and with it the frame length, changes from frame to frame. The scheme was introduced by the company Xing,
so the keyword "Xing" appears in the MP3 file (many small tools can now produce VBR files, and whether they
follow this convention is unknown). The keyword is stored in the first valid frame of the MP3 file and identifies the
file as VBR. The same frame stores the total number of frames in the file, which makes the total playing time easy to obtain, and
100 bytes that index 100 seek points spread evenly over the playing time. Take a 4-minute MP3 song,
240 s: divided into 100 segments, the time difference between two consecutive index entries is 2.4 s, so with this index, and a little processing around the target position,
we can quickly find the frame header needed for a fast forward. Besides the index there are some other parameters. Together these form Zone A, the traditional
Xing VBR tag data, 120 bytes in total. In a binary editor we can also see the string "LAME",
clearly followed by a version number. This is the 20-byte Zone B, the initial LAME information, which shows that the file was encoded with LAME.
From there to the end of the frame is Zone C, the LAME tag.
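The two uses of this frame described above, total playing time and fast seeking, can be sketched in C as follows. The function names are hypothetical. The sketch assumes the common convention that each of the 100 index bytes gives a file position as a fraction of the file size in units of 1/256, and it does not interpolate between entries.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Total playing time in milliseconds from the stored frame count.
 * samples_per_frame is 1152 for MPEG-1 Layer 3. */
uint32_t mp3_duration_ms(uint32_t frame_count, uint32_t samples_per_frame,
                         uint32_t sample_rate)
{
    assert(sample_rate > 0);
    return (uint32_t)((uint64_t)frame_count * samples_per_frame * 1000u
                      / sample_rate);
}

/* Approximate byte offset to jump to for a seek to `percent` (0..99)
 * of the playing time. Assumption: toc[i] / 256 is the fraction of the
 * file size at which time point i percent is found. */
uint64_t xing_seek_offset(const unsigned char toc[100], unsigned percent,
                          uint64_t file_size)
{
    assert(toc != NULL && percent < 100);
    return (uint64_t)toc[percent] * file_size / 256u;
}
```

After jumping to the returned offset, a player would still scan forward for the next valid frame header before decoding, which is the "little processing" mentioned above.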

The frame length is calculated from the frame header:
1) Playback time per frame: an MPEG-1 Layer 3 frame always holds 1152 samples, so whatever its length in bytes, each frame plays for the same time: 1152 / 44100 s, about 26 ms, at 44.1 kHz;
2) Data frame size:
Size = ((MPEGVersion == MPEG1 ? 144 : 72) * Bitrate) / SamplingRate + PaddingBit
For example, with Bitrate = 128000, SamplingRate = 44100, and PaddingBit = 1:
Size = (144 * 128000) / 44100 + 1 = 417 + 1 = 418 bytes (the division is an integer division)
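The size formula translates directly into C. This is a small sketch with a hypothetical function name; the bitrate is taken in bits per second and the division is an integer division, as in the formula.

```c
#include <assert.h>
#include <stdint.h>

/* Size in bytes of one Layer 3 data frame, frame header included.
 * is_mpeg1: nonzero for MPEG-1 (factor 144), zero otherwise (factor 72).
 * bitrate is in bits per second, sample_rate in Hz,
 * padding is the padding bit from the frame header. */
uint32_t mp3_frame_size(int is_mpeg1, uint32_t bitrate,
                        uint32_t sample_rate, int padding)
{
    assert(sample_rate > 0);
    uint32_t factor = is_mpeg1 ? 144u : 72u;
    return factor * bitrate / sample_rate + (padding ? 1u : 0u);
}
```

With 128000 bit/s at 44100 Hz the integer division gives 417, so a frame is 417 bytes without padding and 418 bytes with the padding bit set.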

3. ID3v1 tag
The ID3v1 tag occupies the last 128 bytes of the file. Its data structure is as follows:
char header[3];   /* tag marker; must be "TAG", otherwise the file is treated as having no tag */
char title[30];   /* title */
char artist[30];  /* artist */
char album[30];   /* album */
char year[4];     /* year */
char comment[28]; /* comment */
char reserve;     /* reserved */
char track;       /* track number */
char genre;       /* genre */
In fact there is another version of the last 31 bytes, in which they hold 30 bytes of comment and one byte of genre, with no track number.
With this information we can write our own code to extract information from an MP3 file and, for instance, rename the file accordingly. To write real playback software, however, you still need to read the data frames and decode them.
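A minimal C sketch of extracting this information from the last 128 bytes of a file, already read into a buffer. The struct and function names are hypothetical. The sketch treats the track number as present only when the reserved byte is 0 and the track byte is nonzero, which is the usual way of telling the two versions apart.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical holder for the decoded ID3v1 fields. */
struct id3v1_info {
    char title[31];
    char artist[31];
    char album[31];
    char year[5];
    int  track;   /* 0 if the tag has no track number */
    int  genre;
};

/* Copy a fixed-width field and strip trailing spaces and NUL bytes. */
static void copy_field(char *dst, const unsigned char *src, size_t n)
{
    memcpy(dst, src, n);
    dst[n] = '\0';
    while (n > 0 && (dst[n - 1] == ' ' || dst[n - 1] == '\0'))
        dst[--n] = '\0';
}

/* Parse the last 128 bytes of an MP3 file.
 * Returns 1 if they hold an ID3v1 tag, 0 otherwise. */
int id3v1_parse(const unsigned char tail[128], struct id3v1_info *out)
{
    assert(tail != NULL && out != NULL);
    if (memcmp(tail, "TAG", 3) != 0)
        return 0;
    copy_field(out->title,  tail + 3,  30);
    copy_field(out->artist, tail + 33, 30);
    copy_field(out->album,  tail + 63, 30);
    copy_field(out->year,   tail + 93, 4);
    /* Bytes 97..124 are the comment, 125 is reserved, 126 is the track. */
    out->track = (tail[125] == 0 && tail[126] != 0) ? tail[126] : 0;
    out->genre = tail[127];
    return 1;
}
```

To use it on a real file, seek to 128 bytes before the end, read 128 bytes, and pass the buffer to `id3v1_parse`.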

III. MP3 encoding and decoding process

MP3 audio compression consists of two parts, encoding and decoding. Encoding converts the data in a WAV file into a highly compressed bitstream; decoding takes the bitstream and rebuilds a WAV file from it. MP3 uses perceptual audio coding, which is a lossy algorithm. The human ear perceives sound in the range 20 Hz to 20 kHz, and MP3 removes a large amount of redundant and irrelevant signal. The encoder transforms the original sound into the frequency domain through a hybrid filter bank, uses a psychoacoustic model to estimate the noise level that is just perceptible, then quantizes the data and applies Huffman coding to form the MP3 bitstream. The decoder is much simpler: its task is to recover the sound signal from the encoded spectral components through inverse quantization and inverse transformation.
MP3 decoding can be divided into 9 stages: bitstream parsing, Huffman decoding, inverse quantization, stereo processing, spectral reordering, anti-aliasing, IMDCT transform, subband synthesis, and PCM output.
A brief description of the MP3 compression process: sound is an analog signal; sampling, quantizing, and encoding it yields PCM data. PCM (pulse code modulation) data is the most basic form of audio data a computer can play, and it is the input to MP3 compression. To achieve a higher compression ratio, MPEG uses subband coding to split the PCM data into 32 subbands, each coded independently. The data is then transformed into the frequency domain for analysis; MPEG uses the modified discrete cosine transform, although a Fourier transform could also be used. Next, the frequency-domain data is reordered according to specific rules so that the stereo signal can be reconstructed, stereo processing is applied, and the processed data is quantized as the standard defines. To compress further, Huffman coding is then applied. Finally, the side information is combined with the main data to form the MP3 file.
Decoding is the inverse of encoding, roughly as follows:
Bitstream parsing means opening the MP3 file in binary mode and, following the format definition, extracting in turn the header information, the side information, the scale factor information, and so on. This information is needed in the later decoding stages. Huffman coding is a lossless compression coding and a kind of entropy coding. In MP3 decoding the Huffman codes can be decoded by computation at run time, but decoding is usually done by table lookup, which saves CPU time.
MP3's technical highlight: the MDCT (modified discrete cosine transform):
The modified discrete cosine transform (MDCT) converts a block of time-domain data into frequency-domain data so that its frequency content can be analyzed. The MDCT is an improvement on the DCT. The earlier fast algorithm is the fast Fourier transform (FFT), but the FFT works with complex numbers, whereas the MDCT uses only real arithmetic, which makes it easier to program.
When audio data is compressed, the original sound data is divided into blocks of fixed size, and a forward MDCT converts the samples of each block into MDCT coefficients, for example 512 of them. On decompression, an inverse MDCT (IMDCT) restores the 512 coefficients to sound data. The sound data before and after is not identical, because redundant and irrelevant data was removed during compression. The forward MDCT formula is:
$$X(k) = \sum_{n=0}^{N-1} x(n) \cos\left[\frac{2\pi}{N}\,(n + n_0)\left(k + \frac{1}{2}\right)\right], \quad k = 0, 1, \ldots, \frac{N}{2} - 1$$
Here N is the transform window length, that is, the number of samples per block, N = 8, 16, ..., 1024, 2048;
n_0 = (N/2 + 1)/2, x(n) is the time-domain value, and X(k) is the frequency-domain value. If N is 1024 samples, the block is converted into 512 frequency-domain values.
The IMDCT formula is:
$$x(n) = \sum_{k=0}^{N/2-1} X(k) \cos\left[\frac{2\pi}{N}\,(n + n_0)\left(k + \frac{1}{2}\right)\right], \quad n = 0, 1, \ldots, N - 1$$
(the constant scale factor in front of the sum depends on the convention used).
The MDCT itself does not compress data; it only maps the signal into another domain, and it is the quantization that follows which compresses the data. Quantizing the transformed values means allocating bits so that the quantization error over the whole block is as small as possible, and this step is what makes the compression lossy.
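A small worked example may help show how N time-domain samples become N/2 frequency-domain values. It assumes the common cosine kernel of the MDCT with n_0 = (N/2 + 1)/2 as defined above, and uses the smallest convenient block, N = 4, with the input x = (1, 0, 0, 0). The block size and input are chosen only for illustration and are not values used by MP3.

```latex
% Assumed kernel: X(k) = \sum_{n=0}^{N-1} x(n) \cos[ (2\pi/N) (n + n_0)(k + 1/2) ]
\begin{aligned}
N &= 4, \qquad n_0 = \frac{N/2 + 1}{2} = 1.5, \qquad \frac{2\pi}{N} = \frac{\pi}{2} \\
X(0) &= x(0)\cos\!\left[\frac{\pi}{2}\,(0 + 1.5)(0 + 0.5)\right]
      = \cos\frac{3\pi}{8} \approx 0.3827 \\
X(1) &= x(0)\cos\!\left[\frac{\pi}{2}\,(0 + 1.5)(1 + 0.5)\right]
      = \cos\frac{9\pi}{8} \approx -0.9239
\end{aligned}
```

Four input samples yield only two coefficients, X(0) and X(1). A single block therefore cannot be inverted on its own; the original samples are recovered by overlapping and adding the IMDCT outputs of neighbouring blocks.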

IV. Playback flow of MP3 files

A complete MP3 player consists of several parts: the central processing unit, the decoder, the storage device, the host communication port, the audio DAC and amplifier, and the display and control keys. The central processing unit and the decoder are the core of the whole system. The central processing unit here is what is usually called an MCU (microcontroller unit), or single-chip microcomputer. It runs the player's entire control program, also known as the firmware, and controls the work of every part of the player: reading data from the storage device and sending it to the decoder, exchanging data with the host when connected, handling key presses, displaying the system's operating status, and so on. The decoder is a hardware module in the chip, which is called hardware decoding (some MP3 players decode in software instead, on a fast central processing unit). It directly decodes MP3 data streams in their various formats and outputs a digital audio signal in PCM or I2S format.

The storage device is an important part of the MP3 player. A portable MP3 player usually stores data either in semiconductor memory (flash memory) or on a hard disk drive (HDD). The storage device receives data from the host through the communication port (usually in the form of files), and during playback the MCU reads the data from it and sends it to the decoder. Stored data has to follow some format. A PC manages disk data as files, and an MP3 player is no exception: the most common approach is to use a PC file system directly to manage the storage. Microsoft operating systems use the FAT file system, which is the most widely used. One of the player's tasks is therefore to implement the FAT file system, that is, to be able to find and read files by name on a FAT-formatted disk.

The host communication port is the channel through which the MP3 player and the PC exchange data. Through this port the PC operates on the data in the player's storage device: copying files, deleting files, and so on. The most widely used port today is the USB bus, following the USB mass storage specification, so that the host sees the MP3 player as a removable storage device. Several specifications have to be followed: the USB communication protocol, the mass storage specification, and the SCSI protocol.

The audio DAC converts the digital audio signal into an analog audio signal that drives analog sound devices such as headphones and amplifiers. A digital audio signal is defined in contrast to an analog one. Sound is in essence a wave, and the waves people can hear, with frequencies between 20 Hz and 20 kHz, are called sound waves. An analog signal represents the wave as a continuous function, built up from waves of different frequencies and amplitudes superimposed on one another. A digital audio signal is a quantized version of the analog signal: typically the signal is sampled at equal time intervals and the amplitude of each sample is quantized. The number of samples per unit time is called the sampling frequency. In this way a sound wave is digitized into a series of values, each the amplitude at the corresponding sampling point, and these numbers arranged in order are the digital audio signal. This is the ADC (analog-to-digital conversion) process. The DAC (digital-to-analog conversion) process is the reverse: it converts the successive numbers into the corresponding voltages at the sampling frequency. What the MP3 decoder outputs is a digital audio signal (digital audio signals come in different formats, the most common being PCM and I2S), which a DAC must convert into an analog signal to drive the amplifier before the human ear can perceive it.
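The amplitude quantization step can be sketched in C as follows. The sketch assumes samples already taken at equal time intervals and normalized to the range -1.0 to 1.0, and a 16-bit PCM target; both the function names and the 16-bit width are choices made for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Quantize one sample from the range [-1.0, 1.0] to signed 16-bit PCM.
 * Values outside the range are clamped; the result is rounded to the
 * nearest integer step. */
int16_t pcm16_quantize(double x)
{
    if (x > 1.0)
        x = 1.0;
    if (x < -1.0)
        x = -1.0;
    double scaled = x * 32767.0;
    return (int16_t)(scaled >= 0.0 ? scaled + 0.5 : scaled - 0.5);
}

/* Quantize a block of equally spaced samples. */
void pcm16_quantize_block(const double *in, int16_t *out, size_t count)
{
    assert(in != NULL && out != NULL);
    for (size_t i = 0; i < count; i++)
        out[i] = pcm16_quantize(in[i]);
}
```

A DAC performs the reverse mapping in hardware, turning each 16-bit number back into a voltage at the sampling frequency.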

The display of an MP3 player is usually an LCD or OLED that shows the system's working status. The control keys are usually push-button switches. Together, the keys and the display form the player's human-machine interface.

The software structure of the MP3 player mirrors the hardware: every piece of hardware has corresponding software code, because most of the hardware is digitally programmable.

To summarize, the working principle of an MP3 player in its most simplified form is: the MP3 file is read from the storage device → the decoder chip decodes the data → the digital-to-analog converter turns the decoded digital signal into an analog signal → the analog audio is amplified → it is low-pass filtered and sent to the headphone output, and the output is the music we hear.
