[Repost] Principle and structure analysis of the MP3 file

Last Update:2015-09-30 Source: Internet

Author: User

Tags id3


1. Introduction
Ever-improving file compression technology has made MP3 the most popular music format today, spreading high-quality music around the world as a stream of 0s and 1s. What is MP3? Its full name is MPEG Audio Layer 3, an efficient audio coding scheme that converts audio files into much smaller files with the .mp3 extension at a high compression ratio while largely preserving the sound quality of the original. MP3 is part of the ISO/MPEG standard, which describes audio compression using high-performance perceptual coding. The standard has been updated continually in pursuit of "small size, high quality" and now defines three audio codec schemes: MPEG Layer 1, Layer 2, and Layer 3. MPEG Layer 3 reaches compression ratios of 1:10 to 1:12: a 1 MB MP3 file plays for about 1 minute, whereas 1 minute of CD-quality WAV audio (44,100 Hz, 16 bits, two channels, 60 seconds) occupies about 10 MB. By this calculation, a 650 MB disc of MP3 files plays for more than 10 hours, while a CD of the same capacity plays for about 70 minutes. This is an advantage the CD can hardly match.
2. Analysis of the MP3 principle
2.1 MPEG Audio Standard
MPEG (Moving Picture Experts Group) is an expert group on moving pictures under ISO. It has developed the MPEG standards, which are widely used in multimedia applications. The MPEG standards cover both video and audio; the audio standards developed so far are MPEG-1, MPEG-2, MPEG-2 AAC, and MPEG-4.
The MPEG-1 and MPEG-2 standards use the same family of audio codecs: Layer 1, Layer 2, and Layer 3. One new feature of MPEG-2 is a low-sample-rate extension that reduces the data rate; the other is a multi-channel extension that increases the number of main channels to 5. The MPEG-2 AAC (MPEG-2 Advanced Audio Coding) standard was introduced in 1997 by Fraunhofer IIS together with AT&T to reduce the data rate significantly. MPEG-2 AAC uses the MDCT (Modified Discrete Cosine Transform) algorithm; its sample rate can range from 8 kHz to 96 kHz and its channel count from 1 to 48.
MPEG Audio Layers 1, 2, and 3 use the same filter bank, bitstream structure, and header information, with a sampling frequency of 32 kHz, 44.1 kHz, or 48 kHz. Layer 1 was designed for the Digital Compact Cassette (DCC) and has a data rate of 384 kbps. Layer 2 is a compromise between complexity and performance, and its data rate drops to 256-192 kbps. Layer 3 was designed from the start for low data rates, 128-112 kbps. Layer 3 adds the MDCT transform, which makes its frequency resolution 18 times that of Layer 2, and it also uses entropy coding similar to that of MPEG video to reduce redundant information. Most MP3 files use the MPEG-1 standard.
2.2 Purpose of audio compression
Work on the MP3 format began in the 1980s, when the Fraunhofer Institute in Erlangen, Germany, set out to develop high-quality sound coding at low data rates. Consider an example: you want to sample a favorite song of about 4 minutes and store it on disk as a CD-quality WAV file, at a sample rate of 44.1 kHz (44,100 values per second), in stereo, with 16 bits (2 bytes) per sample. The song then occupies:
44,100 samples/s × 2 channels × 2 bytes × 60 s × 4 min = 42,336,000 bytes ≈ 40.4 MB
If you download this song from the Internet at a transfer rate of 56 kbps, the download time is approximately:
(40.4 × 10^6 × 8) ∕ (56 × 10^3 × 60) ≈ 96 min
Even on a 1 Mbps broadband connection the download takes more than 5 minutes. Audio compression is therefore essential for reducing both the storage space and the transfer time of audio data.
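The arithmetic above can be checked with a few lines of C. This is only a sketch: the function names pcm_size_bytes and transfer_minutes are illustrative and do not come from any library. Note that it works from the exact byte count, so the 56 kbps download comes out at about 100.8 minutes rather than the rounded 96 minutes quoted in the text.

```c
#include <assert.h>

/* Bytes needed to store uncompressed PCM audio of the given duration. */
unsigned long pcm_size_bytes(unsigned long sample_rate_hz,
                             unsigned long channels,
                             unsigned long bytes_per_sample,
                             unsigned long seconds)
{
    return sample_rate_hz * channels * bytes_per_sample * seconds;
}

/* Minutes needed to move `bytes` over a link of `bits_per_second`. */
double transfer_minutes(unsigned long bytes, double bits_per_second)
{
    assert(bits_per_second > 0.0);
    return (double)bytes * 8.0 / bits_per_second / 60.0;
}
```

For the 4-minute song, pcm_size_bytes(44100, 2, 2, 240) returns 42336000, and transfer_minutes(42336000UL, 1000000.0) gives roughly 5.6 minutes for the 1 Mbps case.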
2.3 MP3 encoding and decoding
MP3 audio compression consists of two parts, encoding and decoding. Encoding converts the data in a WAV file into a highly compressed bitstream; decoding takes the bitstream and reconstructs a WAV file from it.
MP3 uses perceptual audio coding, a lossy algorithm. The human ear perceives sound in the range 20 Hz to 20 kHz, and MP3 discards a large amount of redundant and irrelevant signal. The encoder transforms the original sound into the frequency domain with a hybrid filter bank, uses a psychoacoustic model to estimate the noise level that is just perceptible, then quantizes the data and applies Huffman coding to form the MP3 bitstream. The decoder is much simpler: its task is to reconstruct the sound signal from the encoded spectral components through inverse quantization and inverse transformation. The MP3 encoding and decoding process is shown in Figure 1.
2.4 Modified discrete cosine transform
The modified discrete cosine transform (MDCT) converts a block of time-domain data into frequency-domain data so that the variation of the signal can be analyzed. The MDCT is an improvement on the DCT algorithm. The earlier fast algorithm was the fast Fourier transform (FFT), but the FFT requires complex arithmetic, whereas the MDCT uses only real arithmetic, which makes it easier to program.
When audio data is compressed, the original sound data is first divided into blocks of fixed size, and the forward MDCT (FMDCT) converts the values of each block into 512 MDCT coefficients. On decompression, the inverse MDCT (IMDCT) restores the 512 coefficients to sound data. The restored data is not identical to the original, because redundant and irrelevant data were removed during compression. The FMDCT formula is:
$$X(k) = \sum_{n=0}^{N-1} x(n)\cos\left[\frac{2\pi}{N}\,(n+n_0)\left(k+\frac{1}{2}\right)\right], \qquad k = 0, 1, \ldots, N/2-1$$
In the formula, N is the transform window length, that is, the number of samples per block, N = 8, 16, ..., 1024, 2048; $n_0 = (N/2+1)/2$; x(n) is the time-domain value and X(k) is the frequency-domain value. If N is 1024 points, the transform produces 512 frequency-domain values.
The IMDCT formula, up to a constant scale factor that depends on the normalization convention, is:
$$x(n) = \sum_{k=0}^{N/2-1} X(k)\cos\left[\frac{2\pi}{N}\,(n+n_0)\left(k+\frac{1}{2}\right)\right], \qquad n = 0, 1, \ldots, N-1$$
The MDCT itself does not compress data; it merely maps the signal into another domain, and compression is achieved by quantizing the result. When the transformed sample values are quantized, bits are allocated so as to minimize the quantization error over the whole block, which makes the compression lossy.
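As a small worked example, assume the standard forward MDCT definition, X(k) = Σ x(n) cos[(2π/N)(n + n0)(k + 1/2)] summed over n = 0, ..., N-1, with n0 = (N/2+1)/2 as defined above. The block length N = 4 and the input block are chosen purely for illustration; they show how N time-domain samples map to N/2 coefficients.

```latex
\begin{aligned}
N &= 4, \qquad n_0 = \frac{N/2 + 1}{2} = 1.5, \qquad x = (1,\ 0,\ 0,\ 0) \\
X(k) &= \sum_{n=0}^{3} x(n)\cos\left[\frac{2\pi}{4}\,(n + 1.5)\left(k + \tfrac{1}{2}\right)\right]
      = \cos\left[\frac{\pi}{2}\cdot 1.5\cdot\left(k + \tfrac{1}{2}\right)\right], \qquad k = 0, 1 \\
X(0) &= \cos\left(\tfrac{3\pi}{8}\right) \approx 0.383 \\
X(1) &= \cos\left(\tfrac{9\pi}{8}\right) \approx -0.924
\end{aligned}
```

Four input samples yield two coefficients, in the same way that a 1024-point block yields 512 frequency-domain values.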
3. MP3 file format analysis
MP3 file data consists of multiple frames; the frame is the smallest unit of an MP3 file. Each frame is made up of a frame header, side information, and sound data. Each frame plays for about 0.026 seconds (1152 samples at 44.1 kHz), and its length in bytes varies with the bit rate. Some MP3 files carry additional bytes at the end that hold descriptive, non-sound information. The MP3 file structure is shown in Figure 2.

3.1 Frame Header Format
The frame header is 4 bytes long. For an MP3 file with a constant bit rate, the header of every frame has the following format:
// Note: the packing order of bit-fields is implementation-defined in C,
// so this struct documents the layout rather than guaranteeing it.
typedef struct frameheader {
    unsigned int sync : 11;          // synchronization information
    unsigned int version : 2;        // MPEG version
    unsigned int layer : 2;          // layer
    unsigned int protection : 1;     // CRC checksum
    unsigned int bitrate : 4;        // bit rate index
    unsigned int frequency : 2;      // sampling frequency index
    unsigned int padding : 1;        // frame length adjustment
    unsigned int private : 1;        // reserved bit
    unsigned int mode : 2;           // channel mode
    unsigned int mode_extension : 2; // mode extension
    unsigned int copyright : 1;      // copyright
    unsigned int original : 1;       // original flag
    unsigned int emphasis : 2;       // emphasis mode
} header, *lpheader;
The use of the 4 header bytes is described in Table 1.
Table 1 MP3 frame header fields
Name              Length (bits)  Description
Sync              11             Spans bytes 1 and 2; all 11 bits are 1, so byte 1 is always FF.
Version           2              00 = MPEG 2.5, 01 = undefined, 10 = MPEG 2, 11 = MPEG 1
Layer             2              00 = undefined, 01 = Layer 3, 10 = Layer 2, 11 = Layer 1
CRC checksum      1              0 = checksum present, 1 = no checksum
Bit rate          4              In byte 3; the bit rate in kbps. For example, with MPEG-1 Layer 3, 64 kbps is encoded as 0101.
Frequency         2              Sampling frequency. For MPEG-1: 00 = 44.1 kHz, 01 = 48 kHz, 10 = 32 kHz, 11 = undefined
Padding           1              Adjusts the frame length: 0 = no adjustment, 1 = adjustment. The calculation is given below.
Private           1              Reserved; not used.
Channel mode      2              In byte 4; 00 = stereo, 01 = joint stereo, 10 = dual channel, 11 = mono
Mode extension    2              Used only when the channel mode is 01.
Copyright         1              Whether the file is legal (copyrighted): 0 = no, 1 = yes
Original          1              0 = copy, 1 = original
Emphasis          2              Classifies the sound for noise reduction and later compensation; seldom used. 00 = no emphasis, 01 = 50/15 µs, 10 = reserved, 11 = CCITT J.17
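Because the C standard leaves the packing order of bit-fields to the compiler, reading a header by overlaying a bit-field struct on the file bytes is not portable. A more robust approach is to assemble the 4 bytes into one 32-bit value and extract each field with shifts and masks. The sketch below assumes the field widths listed in Table 1; the type Mp3FrameHeader and the function parse_frame_header are illustrative names, not part of any existing library.

```c
#include <assert.h>
#include <stdint.h>

/* Decoded fields of a 4-byte frame header (raw bit values, not yet
 * translated into kbps or Hz). */
typedef struct {
    unsigned sync;            /* 11 bits, all ones in a valid header   */
    unsigned version;         /* 2 bits: 3 = MPEG 1, 2 = MPEG 2        */
    unsigned layer;           /* 2 bits: 1 = Layer 3, 2 = L2, 3 = L1   */
    unsigned protection;      /* 1 bit:  1 = no CRC                    */
    unsigned bitrate_index;   /* 4 bits                                */
    unsigned frequency_index; /* 2 bits                                */
    unsigned padding;         /* 1 bit                                 */
    unsigned private_bit;     /* 1 bit                                 */
    unsigned mode;            /* 2 bits: 0 stereo ... 3 mono           */
    unsigned mode_extension;  /* 2 bits                                */
    unsigned copyright;       /* 1 bit                                 */
    unsigned original;        /* 1 bit                                 */
    unsigned emphasis;        /* 2 bits                                */
} Mp3FrameHeader;

/* Fills *out from 4 header bytes. Returns 1 if the sync bits are all
 * set, otherwise 0. */
int parse_frame_header(const uint8_t b[4], Mp3FrameHeader *out)
{
    /* Assemble the bytes most-significant first, as they appear in the file. */
    uint32_t h = ((uint32_t)b[0] << 24) | ((uint32_t)b[1] << 16) |
                 ((uint32_t)b[2] << 8)  |  (uint32_t)b[3];

    assert(out != 0);
    out->sync            = (h >> 21) & 0x7FF;
    out->version         = (h >> 19) & 0x3;
    out->layer           = (h >> 17) & 0x3;
    out->protection      = (h >> 16) & 0x1;
    out->bitrate_index   = (h >> 12) & 0xF;
    out->frequency_index = (h >> 10) & 0x3;
    out->padding         = (h >> 9)  & 0x1;
    out->private_bit     = (h >> 8)  & 0x1;
    out->mode            = (h >> 6)  & 0x3;
    out->mode_extension  = (h >> 4)  & 0x3;
    out->copyright       = (h >> 3)  & 0x1;
    out->original        = (h >> 2)  & 0x1;
    out->emphasis        =  h        & 0x3;
    return out->sync == 0x7FF;
}
```

Applied to the bytes FF FB 52 8C, the function yields version 3 (MPEG 1), layer 1 (Layer 3), bit rate index 5, frequency index 0, padding 1, and mode 2 (dual channel).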
The length of an MP3 frame depends on the bit rate and the sampling frequency. The formula is:
Frame length (bytes) = 144 × bitrate ∕ frequency + padding
Here bitrate is in bits per second, frequency is in Hz, and the quotient is truncated to an integer. For example, with a bit rate of 64 kbps, a frequency of 44.1 kHz, and padding of 1, 144 × 64000 ∕ 44100 = 208.98, which truncates to 208, so the frame length is 209 bytes. The frame header is followed by variable-length side information, which is 32 bytes long in a standard (MPEG-1, two-channel) MP3 file. The compressed sound data comes next, and this is where the decoder starts decoding.
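The frame length formula can be sketched in C as follows. Two assumptions are made here that the text does not spell out: the division truncates (the usual convention for Layer 3, which gives 209 bytes for the 64 kbps example), and the lookup tables hold the MPEG-1 Layer 3 bit rates and sampling frequencies. All function and table names are illustrative.

```c
#include <assert.h>

/* MPEG-1 Layer 3 bit rates in kbps, indexed by the 4-bit header field.
 * Index 0 (free format) and 15 (invalid) are mapped to 0 here. */
static const unsigned kBitrateKbps[16] = {
    0, 32, 40, 48, 56, 64, 80, 96, 112, 128, 160, 192, 224, 256, 320, 0
};

/* MPEG-1 sampling frequencies in Hz, indexed by the 2-bit header field. */
static const unsigned long kSampleRateHz[4] = { 44100, 48000, 32000, 0 };

/* Frame length in bytes: 144 * bitrate / frequency (truncated) + padding. */
unsigned frame_length_bytes(unsigned long bitrate_bps,
                            unsigned long sample_rate_hz,
                            unsigned padding)
{
    assert(sample_rate_hz > 0);
    return (unsigned)(144UL * bitrate_bps / sample_rate_hz) + padding;
}

/* Same calculation starting from the raw header indices.
 * Returns 0 when an index has no defined value. */
unsigned frame_length_from_indices(unsigned bitrate_index,
                                   unsigned frequency_index,
                                   unsigned padding)
{
    unsigned long bps;
    unsigned long hz;

    assert(bitrate_index < 16 && frequency_index < 4);
    bps = kBitrateKbps[bitrate_index] * 1000UL;
    hz = kSampleRateHz[frequency_index];
    if (bps == 0 || hz == 0)
        return 0;
    return frame_length_bytes(bps, hz, padding);
}

/* Playback time of one frame in milliseconds, assuming the
 * 1152 samples per frame used by MPEG-1 Layer 3. */
double frame_duration_ms(unsigned long sample_rate_hz)
{
    assert(sample_rate_hz > 0);
    return 1152.0 * 1000.0 / (double)sample_rate_hz;
}
```

With these definitions, frame_length_bytes(64000, 44100, 1) returns 209, and frame_duration_ms(44100) is about 26.1 ms, the 0.026 seconds per frame mentioned earlier.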
In an MP3 file with a constant bit rate (CBR), the frames are not all the same length: because of padding, some frames are one or a few bytes longer than others. There are also variable bit rate (VBR) MP3 files, which minimize the file size while preserving sound quality. Compared with a CBR file, only the first frame differs; the remaining frames have the same structure. The first frame of a VBR file contains no sound data. It is 156 bytes long and holds the standard sound frame header (4 bytes), the VBR file identifier, the number of frames, and the number of bytes in the file, as described in Table 2.
Table 2 Structure of the first frame of a VBR file
Bytes      Description
1-4        Standard sound frame header, the same as in a CBR file.
5-40       Holds the VBR file identifier "Xing" (58 69 6E 67). Its 4-byte position, given below as an offset from the start of the frame counted from 0, depends on the MPEG version and channel mode. The bytes before and after the identifier are not used.
             36-39   MPEG-1, non-mono (the common case)
             21-24   MPEG-1, mono
             21-24   MPEG-2, non-mono
             13-16   MPEG-2, mono
41-44      Flags indicating whether the frame count, file length, table of contents, and VBR scale are stored; the flag values are 01, 02, 04, and 08 respectively.
45-48      Number of frames (including the first frame)
49-52      File length in bytes
53-152     Table of contents, used to find a byte position by time.
153-156    VBR scale, describing the bit rate variation
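Reading the first frame of a VBR file can be sketched as follows. The sketch assumes the identifier offsets listed in Table 2 (counted from 0 at the start of the frame), that the 4-byte fields are stored most-significant byte first, and that the frame count and file length are present only when their flag bits (01 and 02) are set. XingInfo, xing_offset, and parse_xing are illustrative names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef struct {
    uint32_t flags;   /* 01 frames, 02 bytes, 04 table of contents, 08 scale */
    uint32_t frames;  /* number of frames, 0 if not stored                   */
    uint32_t bytes;   /* file length in bytes, 0 if not stored               */
} XingInfo;

/* Offset of the "Xing" identifier from the start of the first frame. */
size_t xing_offset(int is_mpeg1, int is_mono)
{
    if (is_mpeg1)
        return is_mono ? 21 : 36;
    return is_mono ? 13 : 21;
}

/* Reads a 32-bit value stored most-significant byte first. */
static uint32_t read_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Returns 1 and fills *out if the identifier is found at the expected
 * offset, otherwise returns 0. */
int parse_xing(const uint8_t *frame, size_t len,
               int is_mpeg1, int is_mono, XingInfo *out)
{
    size_t pos = xing_offset(is_mpeg1, is_mono);

    assert(frame != NULL && out != NULL);
    if (len < pos + 8 || memcmp(frame + pos, "Xing", 4) != 0)
        return 0;

    out->flags = read_be32(frame + pos + 4);
    out->frames = 0;
    out->bytes = 0;
    pos += 8;

    if (out->flags & 0x01) {            /* frame count stored */
        if (len < pos + 4)
            return 0;
        out->frames = read_be32(frame + pos);
        pos += 4;
    }
    if (out->flags & 0x02) {            /* file length stored */
        if (len < pos + 4)
            return 0;
        out->bytes = read_be32(frame + pos);
    }
    return 1;
}
```

For an MPEG-1 stereo file the identifier sits at offset 36, so the flags occupy bytes 41-44 of Table 2 and the frame count bytes 45-48.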

3.2 ID3 Standard
Apart from a few simple items such as private, copyright, and original, the MP3 frame header makes no provision for storing richer information such as the song title, artist, album name, and year, which MP3 applications need. In 1996, Eric Kemp, in his "Studio 3" project, appended a block of data to the end of the MP3 file to hold song information, which gave rise to the ID3 standard. So far the ID3 v1.0, v1.1, v2.0, v2.3, and v2.4 standards have been developed. The higher the version, the more detailed the information that can be recorded.
The ID3 v1.0 standard is not comprehensive: it stores little information, cannot store lyrics, and cannot hold album covers or other pictures. V2.0 is a fairly complete standard, but it makes software harder to write; although many people favor the format, few programs actually implement it. Most MP3 files still use the ID3 v1.0 standard, which stores the ID3 information in the last 128 bytes of the MP3 file. The use of these 128 bytes is described in Table 3.
Table 3 ID3 v1.0 file trailer
Bytes     Length (bytes)  Description
1-3       3               The characters "TAG", indicating the ID3 v1.0 standard; the song information follows immediately.
4-33      30              Song title
34-63     30              Artist
64-93     30              Album name
94-97     4               Year
98-127    30              Comment
128       1               Music genre; 147 genres are defined.
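The 128-byte layout in Table 3 maps directly onto a small parser. The sketch below assumes the whole file is already in memory; Id3v1Tag and parse_id3v1 are illustrative names. Each text field gets one extra byte so that it can always be terminated, since a field that uses all 30 bytes has no terminator in the file.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ID3V1_SIZE 128

typedef struct {
    char title[31];
    char artist[31];
    char album[31];
    char year[5];
    char comment[31];
    uint8_t genre;
} Id3v1Tag;

/* Copies n raw bytes and appends a terminator. */
static void copy_field(char *dst, const uint8_t *src, size_t n)
{
    memcpy(dst, src, n);
    dst[n] = '\0';
}

/* Looks for an ID3 v1.0 tag in the last 128 bytes of the file.
 * Returns 1 and fills *out if the "TAG" marker is present, otherwise 0. */
int parse_id3v1(const uint8_t *file, size_t file_len, Id3v1Tag *out)
{
    const uint8_t *t;

    assert(file != NULL && out != NULL);
    if (file_len < ID3V1_SIZE)
        return 0;

    t = file + file_len - ID3V1_SIZE;
    if (memcmp(t, "TAG", 3) != 0)
        return 0;

    copy_field(out->title,   t + 3,  30);  /* bytes 4-33   */
    copy_field(out->artist,  t + 33, 30);  /* bytes 34-63  */
    copy_field(out->album,   t + 63, 30);  /* bytes 64-93  */
    copy_field(out->year,    t + 93, 4);   /* bytes 94-97  */
    copy_field(out->comment, t + 97, 30);  /* bytes 98-127 */
    out->genre = t[127];                   /* byte 128     */
    return 1;
}
```

A caller would typically read the file into a buffer, call parse_id3v1, and then treat everything before the final 128 bytes as frame data when the function returns 1.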

3.3 File Instances
Opening a file named Test.mp3 in VC++ shows the following content:
000000 FF FB 8C (C5) 2A C1
000010 A6 00 00 05 96 41 34 18 20 80 08 26 48 29
000020------C1 21 41 50 64
......
0000d0 FE FF FB 8C 6E 08 20 02 30
0000E0 0C CD C0 B8 01 00 08 36 48
0000f0 B7 (F4) E1 FF FF FF FF, 2F FF FF
......
0001A0 DF FF FF FB 8C (FE) 6E A0 02
0001b0 B0 CA-E1-F6 (BC) 7C
0001C0 AC B4 (94 FF FF FF FF FF FF FF FF FF)
......
001390 7F FF FF FF FD 4E 00 54 41 47 54 45 53 54 00 00
0013a0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
......
0013f0 00 00 00 00 04 19 14 03 00 00 00 00 00 00 00 00
001400 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
001410, XX, XX, 4E
The file length is 1416H (5,142 bytes) and the frame header is FF FB 52 8C, which in binary is:
11111111 11111011
01010010 10001100
Referring to Table 1, the frame header information of Test.mp3 is as shown in Table 4.
Table 4 Frame header information of Test.mp3
Name              Bit value     Description
Sync              11111111111   Byte 1 is the constant FF; all 11 bits are 1.
Version           11            MPEG 1
Layer             01            Layer 3
CRC checksum      1             No checksum
Bit rate          0101          64 kbps
Frequency         00            44.1 kHz
Padding           1             Adjusted; the frame length is 209 bytes (144 × 64000 ∕ 44100 truncated to 208, plus 1).
Private           0             Not used
Channel mode      10            Dual channel
Mode extension    00            Not used
Copyright         1             Legal
Original          1             Original
Emphasis          00            No emphasis

The three bytes starting at offset 1397H are 54 41 47, the characters "TAG", which indicates that the file carries ID3 v1.0 information.
The 30 bytes starting at 139AH hold the song title; the first 4 non-zero bytes are 54 45 53 54, which spell "TEST".
The 4 bytes starting at 13F4H are 04 19 14 03, which store the date "04/25/2003".
The last byte is 4E, the music genre; it is code 78, "Rock & Roll".
All other bytes are 00 and store no information.
4. Concluding remarks
Sound is an important kind of multimedia data, and people are always looking for more efficient compression methods and new sound file formats. MP3 uses the MDCT, a near-optimal transform with a simple structure that is easy to program; it avoids the difficulty of the optimal (Karhunen-Loève, K-L) transform, which requires solving for the eigenvalues and eigenvectors of a covariance matrix. Analysis of the MP3 file format also reveals its shortcomings. Every frame of an MP3 file carries a 4-byte header of the same form, which adds up to a noticeable space overhead in a file containing a large number of frames. ID3 stores descriptive information about the music, yet the private, copyright, and other fields in the frame header are descriptive information too, so the description of the music feels somewhat scattered.
In any case, the momentum of MP3 is unstoppable. MP3 has become a recognized sound data format and, together with JPEG images and PDF documents, is a hot topic in multimedia information processing.

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
