Knowledge Overview
The binary formats used by Kingsoft's WPS suite (dps, wps, et) are the same as those of the corresponding Microsoft Office applications (PowerPoint, Word, Excel). OpenOffice is also compatible with Microsoft's binary format. The reason is simple: Microsoft dominates the market. Under antitrust pressure, however, Microsoft has had to disclose its binary format to other vendors. Let's start with the basics.
You need to understand the following concepts before attempting any detection on the binary file.
1. Storage (including the very important root storage)
2. Standard stream
3. Short streams and how short streams are stored
4. Sector
5. Short sector
6. Master sector allocation table (MSAT), sector allocation table (SAT), and short-sector allocation table (SSAT)
The relationships between them can be summarized as follows:
1. A storage contains streams (either standard streams or short streams), just as drive D contains files.
2. A storage can contain other storages, just as drive D can contain folders.
3. The smallest unit of storage is the sector. A stream is a chain of sectors, and the sector allocation table specifies how those sectors are chained together.
4. A short stream is a stream smaller than the standard stream size. A short stream is also stored in a chain of sectors, but those sectors are subdivided into short sectors.
5. The master sector allocation table specifies which sectors are used to store the sector allocation table.
6. Both the sector allocation table and the short-sector allocation table are used to look up the sector chain that belongs to a stream.
If the relationships above are unclear, see the explanation.
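The relationship between sectors, streams, and the allocation table can be sketched in code. This is only a minimal illustration, not the implementation used later in this post: the names `sector_offset` and `follow_chain` are hypothetical, and it assumes that the allocation table has already been loaded into memory as an array of 32-bit sector IDs, that -2 marks the end of a chain, and that the 512-byte header sits in front of sector 0.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Special sector ID that terminates a chain in the allocation table.
constexpr std::int32_t END_OF_CHAIN = -2;

// File offset of a sector: the 512-byte header comes first, then sectors 0, 1, ...
inline std::uint64_t sector_offset(std::int32_t sid, std::uint32_t sector_size)
{
    return 512u + static_cast<std::uint64_t>(sid) * sector_size;
}

// Walk the allocation table starting at the first sector of a stream.
// sat[i] holds the ID of the sector that follows sector i in its stream.
inline std::vector<std::int32_t> follow_chain(const std::vector<std::int32_t>& sat,
                                              std::int32_t first_sid)
{
    std::vector<std::int32_t> chain;
    std::int32_t sid = first_sid;
    // Stop at END_OF_CHAIN (or any other negative ID), at an out-of-range ID,
    // or when the chain is as long as the table (a cycle in a corrupt file).
    while (sid >= 0 && static_cast<std::size_t>(sid) < sat.size()
           && chain.size() < sat.size())
    {
        chain.push_back(sid);
        sid = sat[static_cast<std::size_t>(sid)];
    }
    return chain;
}
```

The same walk applies to a short stream, except that the table is the short-sector allocation table and the IDs refer to short sectors.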
Encryption Detection
The first 512 bytes of the compound document form a fixed-format header. From it we can read the sector size, the entry address of the root storage, and the master sector allocation table.
Before starting the analysis, we first need to find the entry address of the root storage (that is, the entry address of the directory).
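As a rough sketch of what reading the header involves, the following picks the relevant fields out of a 512-byte buffer. The struct `HeaderInfo` and the function `parse_header` are hypothetical names used only for illustration; the field offsets (30, 32, 48, 56, 60, 76) follow the published compound document layout, and all integers are assumed to be little-endian.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Read little-endian integers from a byte buffer.
inline std::uint16_t read_u16(const unsigned char* p)
{
    return static_cast<std::uint16_t>(p[0] | (p[1] << 8));
}

inline std::uint32_t read_u32(const unsigned char* p)
{
    return static_cast<std::uint32_t>(p[0])
         | (static_cast<std::uint32_t>(p[1]) << 8)
         | (static_cast<std::uint32_t>(p[2]) << 16)
         | (static_cast<std::uint32_t>(p[3]) << 24);
}

// Hypothetical container for the header fields this post relies on.
struct HeaderInfo {
    bool valid;                              // signature and size exponents look sane
    std::uint32_t sector_size;               // 2^ssz, usually 512
    std::uint32_t short_sector_size;         // 2^sssz, usually 64
    std::int32_t dir_first_sector;           // entry sector of the directory (root storage)
    std::uint32_t min_standard_stream_size;  // usually 4096
    std::int32_t ssat_first_sector;          // first sector of the SSAT
    std::size_t msat_offset;                 // where the first MSAT entries start
};

// h must point at the first 512 bytes of the file.
inline HeaderInfo parse_header(const unsigned char* h)
{
    static const unsigned char magic[8] = {0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1};
    HeaderInfo info{};
    const std::uint16_t ssz = read_u16(h + 30);   // sector size exponent
    const std::uint16_t sssz = read_u16(h + 32);  // short sector size exponent
    info.valid = std::memcmp(h, magic, 8) == 0 && ssz < 31 && sssz < 31;
    if (!info.valid)
        return info;
    info.sector_size = 1u << ssz;
    info.short_sector_size = 1u << sssz;
    info.dir_first_sector = static_cast<std::int32_t>(read_u32(h + 48));
    info.min_standard_stream_size = read_u32(h + 56);
    info.ssat_first_sector = static_cast<std::int32_t>(read_u32(h + 60));
    info.msat_offset = 76;  // the first 109 MSAT entries live inside the header
    return info;
}
```

With `dir_first_sector` and `sector_size` known, the directory's first sector starts at byte 512 + dir_first_sector * sector_size.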
Each directory entry is 128 bytes in size, and its first 64 bytes hold the entry name.
1. Word files
First, find the "WordDocument" entry in the directory. Its name corresponds to the following bytes:
    u_bits_8 word[23] = {0x57, 0x00, 0x6f, 0x00, 0x72, 0x00, 0x64, 0x00, 0x44, 0x00, 0x6f, 0x00, 0x63, 0x00, 0x75, 0x00, 0x6d, 0x00, 0x65, 0x00, 0x6e, 0x00, 0x74};
Note: mind the byte order here. The name is stored as 16-bit characters with the low byte first.
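To make the directory lookup concrete, here is a minimal sketch that decodes one 128-byte directory entry. `DirEntryInfo` and `parse_dir_entry` are hypothetical names, not the helper classes used in the code of this post. The sketch assumes the published entry layout (name length in bytes at offset 64, first sector ID at offset 116, stream size at offset 120) and only handles names made of ASCII characters.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical result type, for illustration only.
struct DirEntryInfo {
    std::string name;           // entry name converted to a narrow string
    std::int32_t first_sector;  // entry sector of the stream
    std::uint32_t size;         // stream size in bytes
};

inline std::uint32_t le32(const unsigned char* p)
{
    return static_cast<std::uint32_t>(p[0])
         | (static_cast<std::uint32_t>(p[1]) << 8)
         | (static_cast<std::uint32_t>(p[2]) << 16)
         | (static_cast<std::uint32_t>(p[3]) << 24);
}

// e must point at a 128-byte directory entry.
inline DirEntryInfo parse_dir_entry(const unsigned char* e)
{
    DirEntryInfo info;
    // Length of the name area in bytes, including the terminating 16-bit zero.
    std::size_t name_bytes = static_cast<std::size_t>(e[64] | (e[65] << 8));
    if (name_bytes > 64)
        name_bytes = 64;
    for (std::size_t i = 0; i + 1 < name_bytes; i += 2)
    {
        if (e[i] == 0 && e[i + 1] == 0)
            break;  // terminator reached
        // Low byte first; anything outside one byte is replaced by '?'.
        info.name.push_back(e[i + 1] == 0 ? static_cast<char>(e[i]) : '?');
    }
    info.first_sector = static_cast<std::int32_t>(le32(e + 116));
    info.size = le32(e + 120);
    return info;
}
```

Scanning the directory then amounts to calling this on every 128-byte slot and comparing `name` with "WordDocument".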
Next, look up the stream's entry sector in the directory entry. Note that if the stream size is greater than or equal to the standard stream size, the sector chain is looked up in the sector allocation table (SAT); if it is smaller than the standard stream size, the sector chain is looked up in the short-sector allocation table (SSAT). Once the right sector has been located, check the flag at a fixed offset inside it. The simple code is as follows:
    // Determine whether a .doc file is encrypted.
    // Header, DirectoryEntry, sid_32, bits_8 and the file_* result codes
    // are assumed to be defined elsewhere.
    #include <fstream>
    using namespace std;

    int is_encrypted_doc(const char* file_path)
    {
        ifstream ifs(file_path, ios_base::binary);
        if (ifs)
        {
            unsigned int stream_address;
            sid_32 stream_sector;
            int stream_length;
            bits_8 tmp[20] = {0};
            Header header(ifs); // read the MSAT and SAT chains
            DirectoryEntry d_entry(&header);
            if (!d_entry.get_stream_address("WordDocument", stream_address, stream_sector, stream_length))
            {
                ifs.close();
                return file_error;
            }
            // Read the first 20 bytes of the stream.
            ifs.seekg(stream_address);
            ifs.read(reinterpret_cast<char*>(tmp), 20);
            ifs.close();
            // The lowest bit of byte 11 is the encryption flag.
            if (tmp[11] & 0x01) return file_encrypted;
            return file_common;
        }
        else
            return file_no_found;
    }
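The two decisions involved here, which allocation table to consult and which bit to test, can be tried out in isolation without the helper classes. The following is a minimal sketch with hypothetical names (`ChainTable`, `table_for_stream`, `fib_is_encrypted`); it assumes the start of the "WordDocument" stream is already in a buffer and that the standard stream size threshold has been read from the file header (4096 is the usual value).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Which allocation table holds the sector chain of a stream.
enum class ChainTable { SAT, SSAT };

// Streams at or above the standard stream size use the SAT,
// smaller ones use the SSAT.
inline ChainTable table_for_stream(std::uint32_t stream_size,
                                   std::uint32_t min_standard_size)
{
    return stream_size >= min_standard_size ? ChainTable::SAT : ChainTable::SSAT;
}

// Test the encryption flag at the start of the "WordDocument" stream:
// the lowest bit of the byte at offset 11, as in is_encrypted_doc.
inline bool fib_is_encrypted(const unsigned char* stream_start, std::size_t len)
{
    return len >= 12 && (stream_start[11] & 0x01) != 0;
}
```

Keeping the bit test separate from the file handling makes it easy to check against a hex dump of a known encrypted document.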
2. Excel files
Excel files work on basically the same principle as Word files; here we look for the "Workbook" entry in the directory. The records in Excel are stored in the format "record identifier, length, content". Therefore, to find the encryption record you have to read from the beginning, record by record, and stop once the encryption record is reached. The simple code is as follows:
    // Determine whether an .xls file is encrypted.
    // Header, DirectoryEntry, sid_32, bits_8, bits_16, convert_chars_to_bits,
    // FILEPASS (identifier of the encryption record) and the file_* result
    // codes are assumed to be defined elsewhere.
    #include <fstream>
    using namespace std;

    int is_encrypted_xls(const char* file_path)
    {
        ifstream ifs(file_path, ios_base::binary);
        if (ifs)
        {
            unsigned int stream_address;
            sid_32 stream_sector;
            int stream_length;
            bits_8 tmp[64] = {0};
            Header header(ifs); // read the MSAT and SAT chains
            DirectoryEntry d_entry(&header);
            if (!d_entry.get_stream_address("Workbook", stream_address, stream_sector, stream_length))
            {
                ifs.close();
                return file_error;
            }
            // Only the first 64 bytes of the stream are examined.
            ifs.seekg(stream_address);
            ifs.read(reinterpret_cast<char*>(tmp), 64);
            ifs.close();
            unsigned int count = 0;
            // Walk the records: 2-byte identifier, 2-byte length, then content.
            while (count + 4 < 64)
            {
                bits_16 flag = convert_chars_to_bits(tmp[count], tmp[count + 1]);
                if (flag != FILEPASS)
                {
                    // Not the encryption record: read its length and skip it.
                    flag = convert_chars_to_bits(tmp[count + 2], tmp[count + 3]);
                    count += flag + 4;
                }
                else
                    return file_encrypted;
            }
            return file_common;
        }
        else
            return file_no_found;
    }
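The record walk can also be written as a small self-contained function that works on a buffer, which is convenient for testing. This is a sketch under stated assumptions rather than the code above: `has_filepass_record` is a hypothetical name, the identifier and length are assumed to be 16-bit little-endian values, and 0x002F is the identifier of the FILEPASS record in the BIFF format.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Identifier of the BIFF record that marks an encrypted workbook.
constexpr std::uint16_t RECORD_FILEPASS = 0x002F;

// Walk "identifier, length, content" records from the start of the buffer
// and report whether a FILEPASS record is reached before the buffer ends.
inline bool has_filepass_record(const unsigned char* buf, std::size_t len)
{
    std::size_t pos = 0;
    while (pos + 4 <= len)
    {
        const std::uint16_t id = static_cast<std::uint16_t>(buf[pos] | (buf[pos + 1] << 8));
        const std::uint16_t size = static_cast<std::uint16_t>(buf[pos + 2] | (buf[pos + 3] << 8));
        if (id == RECORD_FILEPASS)
            return true;
        pos += 4u + size;  // skip the 4-byte record header and the content
    }
    return false;
}
```

Passing the first 64 bytes of the "Workbook" stream mirrors the window used in is_encrypted_xls; a larger window simply lets the walk continue further.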
3. PPT files
PPT files are more troublesome. In PowerPoint 2003, a value of 0xf3d1c4df in the encryption field indicates that the file is encrypted, but in PowerPoint 2002 the same value indicates that the file is not encrypted. This is frustrating. Later I found a workaround: one field in the document (0x0ff50000 --> RT_UserEditAtom) has a different value in an encrypted document, mainly because an encrypted document carries extra encryption information. Checking this field is enough to complete the encryption detection. Because the field identifier occupies four bytes, you can search the binary file for it directly. Once it is found, check the next byte: if it is 0x1c, the file is a normal file; if it is 0x20, the file is encrypted.
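The search described above can be sketched as follows. This is an illustration only: `PptStatus` and `check_user_edit_atom` are hypothetical names, the file content is assumed to be in memory already, and the four-byte identifier 0x0ff50000 is assumed to appear in the file as the byte sequence 00 00 F5 0F (low byte first). Unlike the description, the sketch keeps searching when a match is followed by neither 0x1c nor 0x20, to tolerate accidental matches.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

enum class PptStatus { Plain, Encrypted, Unknown };

// Search the buffer for the RT_UserEditAtom identifier and inspect the
// byte that follows it: 0x1C means a normal file, 0x20 an encrypted one.
inline PptStatus check_user_edit_atom(const unsigned char* data, std::size_t len)
{
    // Assumed on-disk form of 0x0ff50000.
    static const unsigned char pattern[4] = {0x00, 0x00, 0xF5, 0x0F};
    const unsigned char* end = data + len;
    const unsigned char* it = std::search(data, end, pattern, pattern + 4);
    while (it != end)
    {
        if (it + 4 < end)  // there is a byte after the identifier
        {
            if (it[4] == 0x1C) return PptStatus::Plain;
            if (it[4] == 0x20) return PptStatus::Encrypted;
        }
        it = std::search(it + 1, end, pattern, pattern + 4);
    }
    return PptStatus::Unknown;
}
```

Returning a separate Unknown value keeps "identifier not found" distinct from "file is not encrypted".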
Note: All rights reserved. If you repost this article, please credit the source.