This article is a quick study of what you need to know to fuzz the MP3 format. The contents are as follows:
0x0. Composition of the MP3 format
0x1. Vulnerable points in the MP3 format
0x0. Composition of the MP3 format
MP3 is the popular name for MPEG-1 Audio Layer III. An MP3 file has three parts, in order: the ID3v2 tag, the frames, and the ID3v1 tag. The frame is the basic unit of the MP3 format, and the audio of the whole file is a sequence of frames. Start with the simplest structure, ID3v1. Its size is fixed at 128 bytes, and it holds descriptive information about the MP3 such as the artist and the title.
typedef struct tagid3v1 {
    char header[3];    // fixed to "TAG"
    char title[30];    // title
    char artist[30];   // artist
    char album[30];    // album
    char year[4];      // year
    char comment[28];  // comment
    char reserve;      // reserved
    char track;        // track number
    char genre;        // genre
} id3v1, *pid3v1;
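As a small illustration of the fixed 128-byte layout, here is a minimal sketch that looks for the ID3v1 tag at the end of a file already loaded into memory and pulls out the title. The helper names `find_id3v1` and `id3v1_title` are made up for this example, and the 30-byte title at offset 3 follows the usual ID3v1 layout.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define ID3V1_SIZE 128

/* Return a pointer to the ID3v1 tag in the last 128 bytes of buf,
   or NULL if the buffer is too short or the "TAG" marker is missing.
   (Hypothetical helper, not part of any library.) */
const unsigned char *find_id3v1(const unsigned char *buf, size_t len)
{
    assert(buf != NULL);
    if (len < ID3V1_SIZE)
        return NULL;
    const unsigned char *tag = buf + len - ID3V1_SIZE;
    return memcmp(tag, "TAG", 3) == 0 ? tag : NULL;
}

/* Copy the 30-byte title field (offset 3) into out as a C string.
   The field is not guaranteed to be NUL-terminated, so we add one. */
void id3v1_title(const unsigned char *tag, char out[31])
{
    assert(tag != NULL);
    memcpy(out, tag + 3, 30);
    out[30] = '\0';
}
```

A fuzzer can use the same offsets in reverse to decide which bytes of the tag to mutate.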
Next is the frame structure, where the audio data is stored. The frame header is 4 bytes, 32 bits in total, and it is used as a set of bit fields, so each group of bits has its own meaning. The CRC field may be absent; a bit in the frame header decides whether it is present. The frame header also carries the information that determines the size of the main data.
typedef struct _frame {
    char frameheader[4];  // every frame starts with a 4-byte frame header
    char crc[2];          // the 16th bit of the frame header determines whether there is a CRC
    char maindata[?];     // size determined by the frame header
} Frame, *pframe;
The frame header is laid out as AAAAAAAAAAA BB CC D EEEE FF G H II JJ K L MM, where each letter stands for one field. The fields mean the following.
A - 11 bits: frame sync
B - 2 bits: MPEG version, one of three: 1, 2, or 2.5
C - 2 bits: layer, that is, MP1, MP2, or MP3
D - 1 bit: whether a CRC is present
E - 4 bits: bit rate
F - 2 bits: sample rate
G - 1 bit: padding bit
H - 1 bit: reserved
I - 2 bits: channel mode
J - 2 bits: mode extension, used for stereo
K - 1 bit: copyright
L - 1 bit: original
M - 2 bits: emphasis
As mentioned above, the size of a frame is determined by the frame header.
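The bit layout of the frame header can be unpacked with shifts and masks. The following is only a sketch: the struct `mp3_header_fields` and the function `decode_header` are names invented for this example, and the four header bytes are assumed to be read most significant byte first.

```c
#include <assert.h>
#include <stdint.h>

/* One member per field of AAAAAAAAAAA BB CC D EEEE FF G H II JJ K L MM. */
struct mp3_header_fields {
    unsigned sync;        /* A: 11 bits, all ones in a valid header */
    unsigned version;     /* B: 2 bits */
    unsigned layer;       /* C: 2 bits */
    unsigned protection;  /* D: 1 bit, 0 means a CRC follows the header */
    unsigned bitrate_idx; /* E: 4 bits, index into a bit rate table */
    unsigned srate_idx;   /* F: 2 bits, index into a sample rate table */
    unsigned padding;     /* G: 1 bit */
    unsigned private_bit; /* H: 1 bit */
    unsigned channel;     /* I: 2 bits */
    unsigned mode_ext;    /* J: 2 bits */
    unsigned copyright;   /* K: 1 bit */
    unsigned original;    /* L: 1 bit */
    unsigned emphasis;    /* M: 2 bits */
};

/* Split the 4 header bytes into their fields (hypothetical helper). */
struct mp3_header_fields decode_header(const unsigned char h[4])
{
    assert(h != 0);
    uint32_t w = ((uint32_t)h[0] << 24) | ((uint32_t)h[1] << 16) |
                 ((uint32_t)h[2] << 8)  |  (uint32_t)h[3];
    struct mp3_header_fields f;
    f.sync        = (w >> 21) & 0x7FF;
    f.version     = (w >> 19) & 0x3;
    f.layer       = (w >> 17) & 0x3;
    f.protection  = (w >> 16) & 0x1;
    f.bitrate_idx = (w >> 12) & 0xF;
    f.srate_idx   = (w >> 10) & 0x3;
    f.padding     = (w >> 9)  & 0x1;
    f.private_bit = (w >> 8)  & 0x1;
    f.channel     = (w >> 6)  & 0x3;
    f.mode_ext    = (w >> 4)  & 0x3;
    f.copyright   = (w >> 3)  & 0x1;
    f.original    = (w >> 2)  & 0x1;
    f.emphasis    =  w        & 0x3;
    return f;
}
```

For the common header bytes FF FB 90 64 this yields a full sync field, bit rate index 9, and sample rate index 0.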
Now for the ID3v2 tag. It is the first thing in the MP3 file and consists of one tag header followed by several tag frames. The tag header looks like this:
typedef struct tagid3v2 {
    char header[3];  // fixed to "ID3"
    char Ver;        // version number, 3 means ID3v2.3
    char Revision;   // revision, 0 here
    char Flag;       // flag bits
    char Size[4];    // tag size, not counting these 10 header bytes
} Id3v2, *pid3v2;
The size is computed by taking the low 7 bits of each of the 4 bytes and joining them into a 28-bit number.
Each tag frame starts with its own header:
typedef struct header {
    char frameid[4];  // frame ID, says what this frame means
    char size[4];     // size of the frame content, not counting the frame header; must not be less than 1
    char flags[2];    // flags
} HEADER;
Here the frame size is an ordinary 32-bit value. The ID3v2 tag size, in contrast, uses only the low 7 bits of each byte, and the result is the size of the ID3v2 block.
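The two size encodings are easy to mix up, so here is a minimal sketch of both. It assumes ID3v2.3 (the version named in the tag header comment) and big-endian byte order; `id3v2_tag_size` and `id3v2_frame_size` are names chosen for this example.

```c
#include <assert.h>
#include <stdint.h>

/* ID3v2 tag size: 4 bytes, only the low 7 bits of each byte count,
   giving a 28-bit number. */
uint32_t id3v2_tag_size(const unsigned char s[4])
{
    assert(s != 0);
    return ((uint32_t)(s[0] & 0x7F) << 21) |
           ((uint32_t)(s[1] & 0x7F) << 14) |
           ((uint32_t)(s[2] & 0x7F) << 7)  |
            (uint32_t)(s[3] & 0x7F);
}

/* ID3v2.3 tag frame size: an ordinary big-endian 32-bit value. */
uint32_t id3v2_frame_size(const unsigned char s[4])
{
    assert(s != 0);
    return ((uint32_t)s[0] << 24) | ((uint32_t)s[1] << 16) |
           ((uint32_t)s[2] << 8)  |  (uint32_t)s[3];
}
```

The same bytes 00 00 02 01 decode to 257 as a tag size but 513 as a frame size, which is exactly the kind of mismatch a fuzzing sample has to keep straight.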
As shown above, every tag frame inside the ID3v2 tag has its own small header, and each frame carries one piece of information whose meaning is given by the frame ID. Some common frame IDs are listed below. A frame's size is read as a 4-byte DWORD. More values for the flags and frame IDs can be found in the attached PDF.
TIT2  title
TPE1  artist
TALB  album
TRCK  track number
TYER  year
TCON  genre
COMM  comment
0x1. Vulnerable points in the MP3 format
The first part introduced the MP3 file format. Before fuzzing, we need to understand the format well enough to work out where the weak points of an MP3 parser are, that is, where problems are likely to occur. Knowing this is also a big help when generating fuzzing samples. One issue is the MP3 frame size. We already know that an MP3 is made up of many frames, but note that the frame header has no size field: the size of a frame is calculated from the bit rate and the sample rate, which are stored in the frame header as fields E and F respectively. The formula is as follows.
frame length (bytes) = samples per frame / sample rate (Hz) * bit rate (bps) / 8 + padding
Example: Layer III, bit rate 128000, sample rate 44100, no padding: 1152 / 44100 * 128000 / 8 = 417 bytes (rounded down).
There is also a simpler formula for the case of 1152 samples per frame: (144 * bit rate) / sample rate + padding
Note that bit rates are normally quoted in kbps, so 128 means 128 kbps, which is 128000 bps in the formulas.
       | MPEG1 | MPEG2 | MPEG2.5
Layer1 | 384   | 384   | 384
Layer2 | 1152  | 1152  | 1152
Layer3 | 1152  | 576   | 576
"Samples per frame" table
So here is the question: should E and F be generated randomly? If they are, the frame sizes will be thrown off. If they are not, these fields stay fixed and never get tested. I'm not sure yet how to handle that.
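To see how directly the bit rate and sample rate drive the frame size, the formula and the samples-per-frame table can be sketched as below. The enum `mpeg_version` and the function `frame_length` are names invented for this example. The Layer I branch, which counts in 4-byte slots, is not from this article; it is the usual rule for that layer and is included only for completeness.

```c
#include <assert.h>

/* Column order of the samples-per-frame table (hypothetical enum). */
enum mpeg_version { MPEG_1, MPEG_2, MPEG_2_5 };

/* Samples per frame, indexed [layer - 1][version]. */
static const int samples_per_frame[3][3] = {
    { 384,  384,  384  },  /* Layer 1 */
    { 1152, 1152, 1152 },  /* Layer 2 */
    { 1152, 576,  576  },  /* Layer 3 */
};

/* Frame length in bytes, rounded down.
   bitrate_bps is in bit/s (128000, not 128), sample_rate_hz in Hz,
   padding is the padding bit from the header (0 or 1). */
int frame_length(enum mpeg_version ver, int layer,
                 int bitrate_bps, int sample_rate_hz, int padding)
{
    assert(layer >= 1 && layer <= 3);
    assert(sample_rate_hz > 0);
    int samples = samples_per_frame[layer - 1][ver];
    if (layer == 1)  /* assumption: Layer I uses 4-byte slots */
        return (samples / 32 * bitrate_bps / sample_rate_hz + padding) * 4;
    return samples / 8 * bitrate_bps / sample_rate_hz + padding;
}
```

With these definitions, `frame_length(MPEG_1, 3, 128000, 44100, 0)` gives 417, and changing either rate changes where the next frame header is expected.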
The steps for parsing an MP3 file are given below.
Parsing method: to read the information in an MPEG file, parse the first three bytes to determine whether there is an ID3v2 tag. If there is, calculate the total size of the ID3v2 tag with the method above, which locates the first audio data frame. Read its header to get the bit rate, sample rate, MPEG version, layer, and other information, then use the formula above to calculate the length and duration of the frame. In a constant bit rate file every other frame has the same length and duration, so parsing the first frame is enough. But that is not always the case. A variable bit rate MPEG file uses so-called bit rate switching, which means the bit rate of each frame changes with the content. Then you have to parse every frame.
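The parsing steps above might look roughly like this in code. This is a deliberately narrow sketch, not how any particular decoder works: it handles only MPEG1 Layer III, assumes an ID3v2.3 tag whose size field does not count its own 10-byte header, and uses the standard MPEG1 Layer III bit rate and sample rate tables, which are not listed in this article. All function names are made up for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* MPEG1 Layer III lookup tables; 0 marks values this sketch rejects. */
static const int bitrate_kbps[16] = {
    0, 32, 40, 48, 56, 64, 80, 96, 112, 128, 160, 192, 224, 256, 320, 0
};
static const int sample_rate_hz[4] = { 44100, 48000, 32000, 0 };

/* Offset of the first audio frame: 0 without an ID3v2 tag, otherwise
   the 10-byte tag header plus the 7-bits-per-byte size. */
size_t skip_id3v2(const unsigned char *buf, size_t len)
{
    assert(buf != NULL);
    if (len < 10 || memcmp(buf, "ID3", 3) != 0)
        return 0;
    size_t size = ((size_t)(buf[6] & 0x7F) << 21) |
                  ((size_t)(buf[7] & 0x7F) << 14) |
                  ((size_t)(buf[8] & 0x7F) << 7)  |
                   (size_t)(buf[9] & 0x7F);
    return 10 + size;
}

/* Length of the MPEG1 Layer III frame whose header starts at h,
   or 0 if h does not look like such a header. */
size_t mp3_frame_length(const unsigned char *h)
{
    assert(h != NULL);
    if (h[0] != 0xFF || (h[1] & 0xFE) != 0xFA)  /* sync, MPEG1, Layer III */
        return 0;
    int bitrate = bitrate_kbps[h[2] >> 4] * 1000;
    int srate   = sample_rate_hz[(h[2] >> 2) & 0x3];
    int padding = (h[2] >> 1) & 0x1;
    if (bitrate == 0 || srate == 0)
        return 0;
    return (size_t)(144 * bitrate / srate + padding);
}

/* Walk the file frame by frame and count the complete frames found. */
size_t count_frames(const unsigned char *buf, size_t len)
{
    size_t pos = skip_id3v2(buf, len);
    size_t count = 0;
    while (pos + 4 <= len) {
        size_t n = mp3_frame_length(buf + pos);
        if (n == 0 || pos + n > len)
            break;  /* bad header or truncated frame: stop walking */
        pos += n;
        count++;
    }
    return count;
}
```

Because every frame is parsed, the walk works for variable bit rate files too. It also shows the fragility discussed here: a single wrong size sends `pos` to a place where no valid header is found and the walk stops.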
I found these parsing steps on the internet and don't know which decoders actually work this way, but we can infer a few things from them. First, the size field of the ID3v2 tag cannot be generated randomly. If it is, there is no way to locate the first frame and the whole MP3 file becomes a mess; the player will most likely just report a corrupt file, which achieves nothing as a test. Similarly, the bit rate and sample rate of a data frame cannot be generated arbitrarily, because these two values are used to calculate the frame size. With random values the calculated size will be wrong, and since the parsing method above moves from one frame to the next by that size, one wrong frame size breaks the parsing of the rest of the MP3. That does not achieve the test goal either. But what about building these values dynamically? I think it is possible, but the data logic gets too complicated, and it is a real pain. Bitwise operations are involved, which seem hard to express directly with the tags alone; it would probably need an embedded Python script. I don't know how to do that at the moment, so for now I will generate fixed-length content.
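For what it is worth, the "dynamic build" idea can be sketched outside the fuzzer's templates: pick the E and F indices first, then size the payload to match, so the frame stays self-consistent. The sketch below is only one possible approach, limited to MPEG1 Layer III without CRC. `build_frame` and its parameters are invented for this example, and the lookup tables are the standard MPEG1 Layer III values rather than anything given in this article.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* MPEG1 Layer III bit rates in kbps for E = 1..14 (E = 0 is unused here). */
static const int bitrate_kbps[15] = {
    0, 32, 40, 48, 56, 64, 80, 96, 112, 128, 160, 192, 224, 256, 320
};
/* MPEG1 sample rates in Hz for F = 0..2. */
static const int sample_rate_hz[3] = { 44100, 48000, 32000 };

/* Write one frame into out: a header built from the chosen indices,
   followed by a payload of `fill` bytes sized to match the header.
   Returns the frame length, or 0 if the indices are invalid or the
   frame does not fit in cap bytes. (Hypothetical helper.) */
size_t build_frame(unsigned bitrate_idx, unsigned srate_idx, unsigned padding,
                   unsigned char fill, unsigned char *out, size_t cap)
{
    assert(out != NULL);
    if (bitrate_idx < 1 || bitrate_idx > 14 || srate_idx > 2 || padding > 1)
        return 0;
    size_t len = (size_t)(144 * bitrate_kbps[bitrate_idx] * 1000
                          / sample_rate_hz[srate_idx]) + padding;
    if (len > cap)
        return 0;
    out[0] = 0xFF;  /* first 8 sync bits */
    out[1] = 0xFB;  /* last 3 sync bits, MPEG1, Layer III, no CRC */
    out[2] = (unsigned char)((bitrate_idx << 4) | (srate_idx << 2) | (padding << 1));
    out[3] = 0x00;  /* stereo, no mode extension, remaining flags clear */
    memset(out + 4, fill, len - 4);
    return len;
}
```

A generator could call this in a loop with randomly chosen indices and random payload bytes, so fields E and F get exercised while every frame boundary remains where a parser expects it.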
MP3 Fuzz Study