WMA format file header parsing

Source: Internet
Author: User

http://blog.csdn.net/chenmeimei_8899/archive/2009/01/20/3839463.aspx

WMA is a media file format defined by Microsoft and designed for streaming. Because the WMA format is not publicly documented, we can only work it out by examining a large number of files. Simple analyses can be found on the Internet, and I quote some of them below:

The ASF file format is similar to the WMA file format. For details, see the code I wrote below. The file analysis is based on the code in asfhead.c in MPlayer. MPlayer only extracts the standard WMA header information and does not parse the extended information. The code is as follows.

 

Every WMA file begins with the same 16 bytes, in hexadecimal: "30 26 B2 75 8E 66 CF 11 A6 D9 00 AA 00 62 CE 6C". They identify the file as a WMA file. The next 8 bytes are an integer giving the size of the entire WMA file header. The header contains all non-audio information, such as the tag information; the audio data follows it and is not covered here. The purpose of the 6 bytes after that integer is not clear to me (the ASF specification defines them as a 4-byte count of header objects followed by 2 reserved bytes), but they do not affect reading or writing tag information. The following is the WMA file structure:
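
The check described above can be sketched in Python. This is a minimal illustration, not the author's code: the function name parse_top_header is hypothetical, and the little-endian byte order of the 8-byte size is an assumption based on how other Microsoft formats store integers.

```python
import struct

# The 16 identifier bytes quoted in the text.
HEADER_GUID = bytes.fromhex("3026B2758E66CF11A6D900AA0062CE6C")


def parse_top_header(data: bytes):
    """Return (header_size, six_trailing_bytes) for the first 30 bytes of a file.

    Raises ValueError if the data is too short or the identifier does not match.
    """
    if len(data) < 30:
        raise ValueError("need at least 30 bytes")
    if data[:16] != HEADER_GUID:
        raise ValueError("not a WMA file")
    # Assumption: the 8-byte header size is stored little-endian.
    (header_size,) = struct.unpack_from("<Q", data, 16)
    # The 6 bytes after the size are returned untouched; tag handling ignores them.
    return header_size, data[24:30]
```

In practice the first 30 bytes would be read from the file in binary mode and passed to this function; the returned header size tells you how many bytes to load before walking the frames.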


That is, starting at byte offset 30 (the 31st byte of the file), the header stores a sequence of frames, including the standard tag information, the extended tag information, and WMA file control information. The frames vary in length, but every frame header is fixed at 24 bytes: the first 16 bytes identify the frame, and the last 8 bytes give the size of the frame (including the frame header). This is similar to the ID3v2 information in an MP3 file.

We only need to read and write tag information, and that information is stored in two frames: the standard tag frame and the extended tag frame. So we only have to process these two frames; every other frame can be skipped using its frame size.
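
Skipping frames by their size can be sketched as a small generator. The name iter_frames and the default start offset of 30 are illustrative assumptions; the sizes are again assumed to be little-endian.

```python
import struct


def iter_frames(data: bytes, start: int = 30, end=None):
    """Yield (identifier_bytes, offset, size) for each frame in data[start:end]."""
    end = len(data) if end is None else end
    pos = start
    while pos + 24 <= end:
        guid = data[pos:pos + 16]
        # Assumption: the 8-byte frame size is little-endian and includes the header.
        (size,) = struct.unpack_from("<Q", data, pos + 16)
        if size < 24 or pos + size > end:
            break  # corrupt or truncated frame: stop instead of looping forever
        yield guid, pos, size
        pos += size  # jump straight to the next frame
```

A caller compares each yielded identifier with the two tag identifiers and ignores everything else.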

The standard tag frame contains only the song title, artist, copyright, and comment. Its frame identifier in hexadecimal is "33 26 B2 75 8E 66 CF 11 A6 D9 00 AA 00 62 CE 6C". The 24-byte frame header is followed by five 2-byte integers. The first four give the sizes of the song title, artist, copyright, and comment respectively. The purpose of the fifth is unclear to me (the ASF specification uses it for a rating field); in most cases it is unused, that is, its size is 0.

These 10 bytes are followed by the contents of the fields, stored in the same order. Remember that all text in a WMA file is stored as Unicode wide characters (2 bytes per character), and each string is terminated by a 0 character.
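
A minimal sketch of decoding the standard tag frame follows. The function and field names are hypothetical, and it assumes little-endian sizes and UTF-16LE text, which matches the "wide character" description above but is not stated explicitly in the original.

```python
import struct

STANDARD_TAG_GUID = bytes.fromhex("3326B2758E66CF11A6D900AA0062CE6C")
FIELDS = ("title", "artist", "copyright", "comment", "fifth")


def parse_standard_tag(frame: bytes) -> dict:
    """Decode a complete standard tag frame (24-byte frame header included)."""
    if frame[:16] != STANDARD_TAG_GUID:
        raise ValueError("not a standard tag frame")
    sizes = struct.unpack_from("<5H", frame, 24)  # five 2-byte sizes
    pos = 34  # 24-byte frame header + 10 bytes of sizes
    result = {}
    for name, size in zip(FIELDS, sizes):
        raw = frame[pos:pos + size]
        # Assumption: text is UTF-16LE; drop the trailing 0 terminator.
        result[name] = raw.decode("utf-16-le").rstrip("\x00")
        pos += size
    return result
```

A field whose size is 0 simply decodes to an empty string.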

The extended tag frame is a little more complicated. The number of items it contains is not fixed, and every item is organized in the same way. Its frame identifier in hexadecimal is "40 A4 D0 D2 07 E3 D2 11 97 F0 00 A0 C9 5E A8 50". The 24-byte frame header is followed by a 2-byte integer giving the total number of extended information items (exno) in the frame.

Each extended information item consists of a name and a corresponding value. It starts with a 2-byte integer giving the size of the name, followed by the name itself. Next comes a 2-byte integer flag (FLAG), discussed below, then a 2-byte integer giving the size of the value, and finally the value itself.

When the name is WMFSDKVersion, the value is the version of the WMA file; when the name is WM/AlbumTitle, the value is the album name; when the name is WM/Genre, the value is the genre. In general, the purpose of a value is easy to see from its name. Names and values are almost always stored as Unicode strings. So far, only the following two exceptions have been found:

Now for the flag. It is mostly unimportant (usually 0) and matters only for the names WM/TrackNumber and WM/Track: when the flag is 3, the value (the track information) is a 4-byte integer; when the flag is 0, the track information is an ordinary string.
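
The item layout and the flag handling can be sketched as follows. The function name is hypothetical; little-endian integers and UTF-16LE strings are assumptions, and any flag value other than 0 or 3 is returned as raw bytes because the text does not describe it.

```python
import struct

EXTENDED_TAG_GUID = bytes.fromhex("40A4D0D207E3D21197F000A0C95EA850")


def parse_extended_tag(frame: bytes) -> dict:
    """Decode a complete extended tag frame into a {name: value} dict."""
    if frame[:16] != EXTENDED_TAG_GUID:
        raise ValueError("not an extended tag frame")
    (count,) = struct.unpack_from("<H", frame, 24)  # number of items
    pos = 26
    items = {}
    for _ in range(count):
        (name_len,) = struct.unpack_from("<H", frame, pos)
        pos += 2
        name = frame[pos:pos + name_len].decode("utf-16-le").rstrip("\x00")
        pos += name_len
        flag, value_len = struct.unpack_from("<HH", frame, pos)
        pos += 4
        raw = frame[pos:pos + value_len]
        pos += value_len
        if flag == 0:
            value = raw.decode("utf-16-le").rstrip("\x00")  # ordinary string
        elif flag == 3 and value_len == 4:
            (value,) = struct.unpack("<I", raw)  # 4-byte integer, e.g. track number
        else:
            value = raw  # flag values not covered by the text: keep raw bytes
        items[name] = value
    return items
```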

This is the general picture, but WMA exists in many versions, so this description covers only the cases encountered during testing.

I am mainly concerned with writing WMA headers. When writing, make sure the size of the file header does not change; otherwise the file can no longer be played. I ran into this myself and could not get the file to play, which was frustrating. In general, do not change the size of the header. A WMA header usually contains a long run of 0 bytes that can safely be overwritten; we call this the padding. The padding frame is identified by the bytes 0x74, 0xD4, 0x06, 0x18, 0xDF, 0xCA, 0x09, 0x45, 0xA4, 0xBA, 0x9A, 0xAB, 0xCB, 0x96, 0xAA, 0xE8.

If you are interested, you can make an analysis on this.

 

A WMA file starts with a 16-byte identifier that marks it as WMA: 30 26 B2 75 8E 66 CF 11 A6 D9 00 AA 00 62 CE 6C. After the identifier and the 8-byte header size come 6 bytes whose purpose is unclear; many people say they hold the number of WMA frames. They are generally not needed and can be skipped.

The WMA frames follow. A WMA file has many frames. The first 16 bytes of each frame identify the frame, and the following 8 bytes give the frame size. The standard tag and the extended tag are the frames most often used.

1. Standard tag: 33 26 B2 75 8E 66 CF 11 A6 D9 00 AA 00 62 CE 6C. The standard tag contains five pieces of information: title, artist, copyright, comment, and a fifth item whose purpose is unclear. The structure of the standard tag is as follows:

Frame header ID (16)
Frame size (8)
Title size (2)
Artist size (2)
Copyright size (2)
Comment size (2)
Fifth information size (2)
Title
Artist
Copyright
Comment
Fifth information

 

2. Extended tag: 40 A4 D0 D2 07 E3 D2 11 97 F0 00 A0 C9 5E A8 50. It contains information such as the album, genre, and album cover. The frame structure is as follows:

Frame header ID (16)
Frame size (8)
Information count (2)
Name length (2)
Name
Flag (2)
Value length (2)
Value

When the name is WMFSDKVersion, the value is the version of the WMA file; when the name is WM/AlbumTitle, the value is the album name; when the name is WM/Genre, the value is the genre; and WM/Picture holds the album cover. Here we focus on reading pictures from music files on ARM.

The image storage structure is as follows:

WM/Picture
Flag (2)
Tag size (2)
Unknown flag (1)
Image length
Unknown flag (2)
Image type
Data

When reading an image on ARM, an error occurs if the image type is wrong. You can use ishell_gethandle to obtain the classid of the iimage interface.

3. Extended frame of the WMA header: B5 03 BF 5F 2E A9 CF 11 8E E3 00 C0 0C 20 53 65. This frame carries a lot of information and may contain many sub-frames. One frame that is useful when writing WMA is the padding frame. It can appear either as a sub-frame of the extended frame or as an independent frame, and its contents can be changed without rewriting the WMA header. Wherever the padding frame is, its identifier is the same: 74 D4 06 18 DF CA 09 45 A4 BA 9A AB CB 96 AA E8. The 8 bytes after the identifier give the frame size, and all remaining bytes are 00.

The extended frame structure is as follows:

Extended frame header ID (16)
Frame size (8)
Reserved (16)
Reserved (2)
Data size (4)
Data (including multiple frames)

 

4. File attributes frame: A1 DC AB 8C 47 A9 CF 11 8E E4 00 C0 0C 20 53 65. This frame contains the file size, the creation time, and some playback data, such as the number of data packets and the bit rate, which we do not need to examine in detail. The file size information comes after the 24-byte frame header (in the ASF specification a 16-byte file ID comes first, so the 8-byte file size itself starts at offset 40 of the frame).

Modify the WMA header.

Most WMA files have a padding frame, so you do not need to rewrite the whole file: the modified content can take its space from the padding. The method is described in detail below, using a change of the music name as the example.

The music name is stored in the standard tag, so the standard tag has to be modified. If the padding frame comes after the standard tag, move the content between the standard tag and the padding frame backward or forward so that there is exactly enough room for the new standard tag. Then adjust the padding. If the padding frame is inside the extended frame, you must update the size of the extended frame, the data size in the extended frame header, and the size of the padding frame; if the padding frame is not a sub-frame of the extended frame, you only need to update the size of the padding frame. Next update the size in the standard tag header, and then the length of the music name. In the other case, where there is no standard tag, you need to create one; it is generally written at the beginning of the file. Be careful when writing the WMA header: a mistake may damage the file so that it can no longer be played.
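
The size bookkeeping for a title change can be sketched as below. This only computes the new sizes; it does not move any bytes. The function names are hypothetical, and the minimum padding size of 24 bytes (a bare frame header) is an assumption.

```python
FRAME_HEADER = 24  # 16-byte identifier + 8-byte size


def encoded_len(text: str) -> int:
    """Bytes needed to store text as 2-byte characters plus a 0 terminator."""
    return len((text + "\x00").encode("utf-16-le"))


def plan_title_change(old_title: str, new_title: str, tag_size: int, padding_size: int):
    """Return (new_tag_size, new_padding_size) keeping the total header size fixed."""
    delta = encoded_len(new_title) - encoded_len(old_title)
    new_padding = padding_size - delta  # the padding absorbs the difference
    if new_padding < FRAME_HEADER:
        # Assumption: a padding frame cannot shrink below its own 24-byte header.
        raise ValueError("not enough padding; the header would have to be rewritten")
    return tag_size + delta, new_padding
```

For example, replacing "Old" with "Longer title" needs 18 more bytes, so a 100-byte standard tag grows to 118 bytes and a 200-byte padding frame shrinks to 182 bytes. If the padding frame is a sub-frame of the extended frame and the standard tag is not, the extended frame size and its data size change by the same amount as the padding.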

This article is from the CSDN blog. When reproducing it, please cite the source: http://blog.csdn.net/chenmeimei_8899/archive/2009/01/20/3839463.aspx

 

 

Explanation of the MPEG audio file format (reproduced)

MP3 files are composed of frames, which are the smallest unit of an MP3 file. The full name of MP3 is MPEG-1 Layer 3 audio. MPEG (Moving Picture Experts Group) here refers to the group's standards for compressing moving pictures and audio, and MPEG audio is the audio part of the MPEG-1 standard, also called the MPEG audio layers. It is divided into three layers by compression quality and encoding complexity, Layer 1, Layer 2, and Layer 3, which correspond to MP1, MP2, and MP3 audio files respectively; the layer is chosen according to the intended use. The higher the layer, the more complex the encoder and the higher the compression ratio. The compression ratios of MP1 and MP2 are 4:1 and 6:1 to 8:1 respectively, while the compression ratio of MP3 is as high as 10:1 to 12:1. In other words, one minute of CD-quality music needs about 10 MB of storage uncompressed, but only about 1 MB after MP3 compression.
However, MP3 applies lossy compression to the audio signal. To reduce audible distortion, MP3 uses perceptual coding: the audio is analyzed spectrally before encoding, components at the noise level are filtered out, and the remaining information is quantized and packed into an MP3 file with a high compression ratio, so that on playback the compressed file sounds close to the original source.
I. MPEG audio compression Basics

All audio compression methods aim to compress digital audio as much as possible while preserving sound quality, so that it occupies less storage space. MPEG compression is the best in this field. It is lossy, which means that part of the audio information is lost during compression. However, because of how the compression is controlled, the loss is hard to notice. Using several very complex and demanding mathematical algorithms, it discards only the parts of the original audio that are almost inaudible, which leaves more space for the important information. In this way audio can be compressed by a factor of about 12 (the compression ratio is selectable), with remarkable results. It is precisely because of this quality that MPEG audio has become popular.
MPEG-1, MPEG-2, and MPEG-4 are the familiar MPEG standards. MP3 involves only the first two, plus an unofficial standard, MPEG-2.5, which extends MPEG-2/LSF to lower sampling rates.
MPEG-1 audio (ISO/IEC 11172-3) describes three layers of audio encoding with the following attributes:
1 or 2 audio channels
Sampling frequency of 32 kHz, 44.1 kHz, or 48 kHz
Bit rate from 32 kbps to 448 kbps
Each layer has its own advantages.
MPEG-2 audio (ISO/IEC 13818-3) adds two extensions to MPEG-1, usually called MPEG-2/LSF and MPEG-2/Multichannel.
MPEG-2/LSF has the following features:
1 or 2 audio channels
Sampling frequencies half those of MPEG-1
Bit rate from 8 kbps to 256 kbps
MPEG-2/Multichannel has the following features:
Up to 5 full channels and 1 LFE channel (low-frequency enhancement, which is not the same as a subwoofer channel)
Same sampling frequencies as MPEG-1
The maximum bit rate for 5.1 may reach 1 Mbps.

II. Basic principles of MPEG Layer 3 encoding/decoding

A music CD has 44.1 kHz, 16-bit stereo audio quality, and one CD can store 74 minutes of songs (about 15 songs). How to compress these songs losslessly or nearly losslessly, so that more songs fit on the same medium, long troubled the software industry, until the MPEG group proposed MPEG audio Layer 1 to Layer 3. With MPEG-1 Layer 3 encoding, producers can record lossy music signals with a bandwidth of 16 kHz at a compression ratio of about 12:1; the result differs only slightly from the CD source. The human hearing system performs remarkably well, with a dynamic range of more than 96 dB. You can hear a sound as faint as a button falling on the ground, and you can hear the powerful roar of a Boeing 747. But when you stand at the airport listening to the roar of a Boeing 747, can you pick out the sound of a button hitting the ground? Impossible. The human hearing system adapts to dynamic changes in sound. From studying this adaptability and the masking characteristics of hearing, people arrived at a theory that is very useful for sound compression. This property has long been used to reduce tape noise (the hiss is noticeable when no music is playing, but hard to hear when the music signal is strong). When a sound is strong, it produces a masking effect: noise or weak signals below the masking threshold curve cannot be heard by the human ear. While a strong signal is present, more noise can be tolerated, so the data for the weak signals can be compressed to some extent (bits that would otherwise be wasted are used to carry more information). In general, MP3 compression transforms the original sound into the frequency domain through an FFT (fast Fourier transform), and then uses an algorithm to determine which frequency components can carry more information. During restoration, the decoder only needs to transform back from the frequency domain.
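The storage figures follow directly from the CD parameters mentioned above (44.1 kHz, 16 bits, stereo) and a compression ratio of about 12:1. A small worked calculation, with illustrative names:

```python
SAMPLE_RATE = 44_100      # samples per second
BYTES_PER_SAMPLE = 2      # 16 bits
CHANNELS = 2              # stereo


def cd_bytes(seconds: int) -> int:
    """Uncompressed size in bytes of CD-quality audio of the given duration."""
    return SAMPLE_RATE * BYTES_PER_SAMPLE * CHANNELS * seconds


raw = cd_bytes(60)        # one minute of CD audio: 10,584,000 bytes, about 10 MB
compressed = raw // 12    # at roughly 12:1: 882,000 bytes, a little under 1 MB
print(raw, compressed)
```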

III. Overall MP3 file structure:

An MP3 file is divided into three parts: tag_v2 (ID3v2), the frames, and tag_v1 (ID3v1).

ID3v2
    Contains the author, composer, album, and other information. The length is not fixed. It extends the information in ID3v1.

Frame
...
Frame
    The number of frames is determined by the file size and the frame length.
    The length of each frame may be fixed or variable, depending on the bit rate.
    Each frame is divided into two parts: the frame header and the data.
    The frame header records information such as the bit rate, sampling rate, and MPEG version. Frames are independent of each other.

ID3v1
    Contains the author, composer, and album information. The length is 128 bytes.
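
Because the ID3v1 tag is a fixed 128 bytes at the end of the file, reading it is straightforward. The sketch below uses the commonly documented ID3v1 field layout ("TAG" marker, then 30-byte title, artist, and album, a 4-byte year, a 30-byte comment, and a 1-byte genre); that layout is not given in the original text, and the function name and Latin-1 decoding are assumptions.

```python
def parse_id3v1(data: bytes):
    """Return the ID3v1 fields from the last 128 bytes of data, or None if absent."""
    if len(data) < 128:
        return None
    tag = data[-128:]
    if tag[:3] != b"TAG":
        return None  # no ID3v1 tag at the end of the file

    def text(raw: bytes) -> str:
        # Fields are padded with 0 bytes or spaces; decoding as Latin-1 is an assumption.
        return raw.split(b"\x00", 1)[0].decode("latin-1").strip()

    return {
        "title": text(tag[3:33]),
        "artist": text(tag[33:63]),
        "album": text(tag[63:93]),
        "year": text(tag[93:97]),
        "comment": text(tag[97:127]),
        "genre": tag[127],  # numeric genre index
    }
```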

IV. MPEG audio frame format

An MPEG audio file is made up of many smaller parts called frames. Generally, frames are independent units: each frame has its own header and audio information, and there is no file header. Therefore, any part of an MPEG file can be cut out and played normally (the cut should of course fall on frame boundaries, although many programs cope with damaged headers). This is not 100% true for Layer III, because in MPEG-1 Layer III files the data of neighboring frames is often interdependent, so they cannot be cut as freely.
When you want to read information about an MPEG file, it is usually enough to find the first frame, read its header, and assume that the other frames are the same. But this is not always the case. An MPEG file may use a variable bit rate (VBR), where the bit rate of each frame changes according to its content. This method uses a lower bit rate for frames where doing so does not reduce the sound quality, which gives better compression while maintaining high sound quality.
The frame header is the first 4 bytes (32 bits) of each frame. The first 11 bits (or the first 12 bits) of the frame header are always set and are called the frame sync. You can therefore search the file for the first frame sync (that is, a byte with the value 255 followed by a byte whose top three or four bits are set), then read the whole header and check that its values are valid. See the following table for the meaning of each bit in the header, against which the values should be validated. If a value is defined as reserved, invalid, or not allowed, the header is damaged. Remember that this is still not enough: the frame sync pattern occurs frequently in arbitrary binary files, and an MPEG file may contain garbage at the beginning, so two or more frames must be checked to be sure the file being read is an MPEG file.
A frame may also carry a CRC check. If present, the 16-bit CRC immediately follows the frame header, and the audio data follows the CRC. Once the frame length has been calculated, you can use it to read the following headers or to compute the CRC of the frame and compare it with the value read from the file. This is a good way to verify the validity of an MPEG header.
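The sync search and the basic validity checks can be sketched as follows. The bit positions and the reserved values used here come from the standard MPEG audio header layout, which this text does not reproduce, and the function names are hypothetical. The sketch validates a single header only; confirming a second frame requires the frame length, which depends on bit rate tables not included here.

```python
def parse_frame_header(four: bytes):
    """Return basic header fields, or None if the 4 bytes are not a plausible header."""
    if len(four) != 4:
        return None
    h = int.from_bytes(four, "big")
    if (h >> 21) & 0x7FF != 0x7FF:      # 11 sync bits must all be set
        return None
    version = (h >> 19) & 0x3           # 0 = MPEG-2.5, 1 = reserved, 2 = MPEG-2, 3 = MPEG-1
    layer = (h >> 17) & 0x3             # 0 = reserved, 1 = Layer III, 2 = Layer II, 3 = Layer I
    bitrate_index = (h >> 12) & 0xF     # 15 is not allowed
    rate_index = (h >> 10) & 0x3        # 3 is reserved
    if version == 1 or layer == 0 or bitrate_index == 0xF or rate_index == 3:
        return None
    return {
        "version": {0: "MPEG-2.5", 2: "MPEG-2", 3: "MPEG-1"}[version],
        "layer": {1: 3, 2: 2, 3: 1}[layer],
        "has_crc": ((h >> 16) & 1) == 0,  # protection bit 0 means a 16-bit CRC follows
        "bitrate_index": bitrate_index,
        "sample_rate_index": rate_index,
    }


def find_first_frame(data: bytes, start: int = 0) -> int:
    """Return the offset of the first plausible frame header, or -1 if none is found."""
    pos = data.find(b"\xff", start)
    while pos != -1 and pos + 4 <= len(data):
        if parse_frame_header(data[pos:pos + 4]) is not None:
            return pos
        pos = data.find(b"\xff", pos + 1)
    return -1
```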

(Reference: http://mpgedit.org/mpgedit/index.html)

 
