Binary Structure Analysis of Excel Files

I only recently started researching the structure of Excel files, purely out of curiosity, much like being curious about how to modify a game. Many people seem interested in the file format, so it is a pity that I have not found like-minded people willing to share their experience and learn together.
Walking a road alone is lonely, and the spark of interest easily goes out. Besides, I am lazy and busy at work, and I find it hard to resist outside temptations such as entertainment, games, and novels. So please forgive me in advance: if this article is slow to appear or is never finished, please don't scold me. Of course, this is just an aside.

Also, since there is no official documentation and this is only my own research, the following content may be completely wrong. Criticism is welcome, so throw your bricks as needed!

Chapter 1 Preface
I. Previous studies

There is little prior material available. MSDN Online covers part of the format, but the material is very fragmented, and Microsoft has also removed some important content. Some people say that the MSDN Library shipped with MS Office Developer describes the format, but I don't have it, so I can't comment on it. If anyone has these materials, I would be grateful if you could send them to me. Thank you!

Still, I will write down what I know.

In 1996, the German Martin Schwartz wrote a little-known "Hacking Guide" in which he studied the storage format of Word 6 files, which he called laola. He regarded laola as a file system containing subdirectories and subfiles. Applying this idea to Excel, the workbook and the VBA code in an Excel file should each be a subfile. Microsoft seems to call these OLE2 storage: an Excel file contains a directory structure and several subfiles, each of which is an OLE2 storage. Below, I use the term laola as Martin Schwartz defined it.
However, Martin Schwartz's greatest contribution was a Perl 4 package (similar to a class in VBA) that provides methods for reading the laola file system. (I am rewriting this package in VBA and adding a write module, but since I am busy and can only work on it intermittently, it is not finished yet.)

In 1997, Guy Boertje published an article in "Excel Digest" presenting a VBA program that reads the user-defined number formats in a workbook. His method searches the Excel file directly for the workbook's BOF record (beginning of file, hex 0809) to find the offset where the workbook starts. For a VBA program of this kind, see my article on reading the background image of an Excel worksheet, which I posted on the forum. I did not yet know about laola at the time, but the approach still worked for reading the image.
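Boertje's direct-search idea can be sketched in a few lines of Python. This is not his VBA code, only a minimal sketch assuming that the record type 0x0809 appears on disk as the bytes 09 08 (low byte first); `find_bof_offsets` and its `limit` parameter are hypothetical. A plain byte search can also match unrelated data, and it only locates where a workbook might begin; it does not reassemble a workbook whose blocks are scattered through the file.

```python
import sys

BOF_RECORD_TYPE = 0x0809  # BOF record type, as given in the text


def find_bof_offsets(data: bytes, limit: int = 10) -> list[int]:
    """Return up to `limit` offsets where the BOF record type occurs.

    The WORD 0x0809 is stored little-endian, so it appears as 09 08.
    """
    pattern = BOF_RECORD_TYPE.to_bytes(2, "little")  # b"\x09\x08"
    offsets = []
    start = 0
    while len(offsets) < limit:
        pos = data.find(pattern, start)
        if pos == -1:
            break
        offsets.append(pos)
        start = pos + 1  # keep searching after this match
    return offsets


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python find_bof.py book.xls
    with open(sys.argv[1], "rb") as f:
        raw = f.read()
    for off in find_bof_offsets(raw):
        print(f"possible BOF at 0x{off:X}")
```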

II. Basic tools for analyzing the binary structure of Excel files
To simplify the work, we can use three basic tools. The first is UltraEdit-32, which can edit binary files in hexadecimal. The second is WinHex, whose main features are binary comparison and direct editing of binary data in memory; it is a powerful tool for modifying games and for memory dumps. The third is the Windows Calculator, which converts between decimal, hexadecimal, and binary.
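If no hex editor is at hand, a few lines of Python can print a file in an offset-plus-bytes layout similar to the dumps in this article. This is only a viewer, not an editor like UltraEdit or WinHex, and `hex_dump` is a hypothetical helper rather than part of any tool mentioned above.

```python
import sys


def hex_dump(data: bytes, base: int = 0, width: int = 16) -> str:
    """Format bytes as lines of 'offset: hex bytes', `width` bytes per line."""
    lines = []
    for i in range(0, len(data), width):
        chunk = data[i:i + width]
        hex_part = " ".join(f"{b:02X}" for b in chunk)
        lines.append(f"0x{base + i:05X}: {hex_part}")
    return "\n".join(lines)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python hexdump.py book.xls  (prints the first 0x50 bytes)
    with open(sys.argv[1], "rb") as f:
        print(hex_dump(f.read(0x50)))
```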

Well, with these tools, we can start.
In addition, here are three pieces of basic knowledge:
1. Number notation:
Decimal: written the usual way, for example 1234.
Hexadecimal: written as hex 1234 or 0x1234.
Binary: written as bin 1001.

2. Byte order of numbers
On Intel machines, and in these binary files, multi-byte numbers are stored with the low byte first (little-endian). For example, hex 0809 is stored as the bytes 0x09 0x08: the low byte comes first, followed by the high byte.

3. Basic Data Types
We mainly deal with three data types: BYTE, WORD, and DWORD. A BYTE is 8 bits, while WORD and DWORD are equivalent to 16-bit unsigned integers and 32-bit unsigned long integers.
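The three points above can be checked in Python. This sketch converts between the notations, shows that hex 0809 is stored as 09 08, and reads BYTE, WORD, and DWORD values little-endian with the standard `struct` module; the helper names `read_byte`, `read_word`, and `read_dword` are hypothetical.

```python
import struct

# 1. Notation: the same values in decimal, hexadecimal and binary.
print(int("1234", 16))  # hex 1234 -> 4660
print(hex(4660))        # 0x1234
print(int("1001", 2))   # bin 1001 -> 9
print(bin(9))           # 0b1001

# 2. Byte order: a WORD is stored low byte first (little-endian).
stored = struct.pack("<H", 0x0809)
print(stored.hex(" "))  # 09 08


# 3. Basic types, all unsigned: BYTE = 8 bits, WORD = 16 bits, DWORD = 32 bits.
def read_byte(buf: bytes, offset: int) -> int:
    return buf[offset]


def read_word(buf: bytes, offset: int) -> int:
    return struct.unpack_from("<H", buf, offset)[0]


def read_dword(buf: bytes, offset: int) -> int:
    return struct.unpack_from("<I", buf, offset)[0]
```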

Chapter 2 laola Structure
In this second chapter, we first study the laola structure. Chapter 3 covers the structure of the workbook, that is, the material xiaog posted. Chapter 4 looks at VBA, and Chapter 5, an interesting one, covers the laola concepts within an Excel file. Chapter 6 gives examples of application tools: I plan to collect there some VBA programs that read Excel files, together with VBA and C++ programs that use ADO.
If there are no objections, let's continue.

The first 512-byte (0x200) block of an Excel file, covering offsets 0x00 to 0x1FF, is the laola initial block. Let's examine its first 80 bytes.
0x00000 H: D0 CF 11 E0 A1 B1 1A E1 00 00 00 00 00 00 00 00
0x00010 H: 00 00 00 00 00 00 00 00 3E 00 03 00 FE FF 09 00
0x00020 H: 06 00 00 00 00 00 00 00 00 00 00 00 01 00 00 00
0x00030 H: 19 00 00 00 00 00 00 00 00 10 00 00 FE FF FF FF
0x00040 H: 00 00 00 00 FE FF FF FF 00 00 00 00 18 00 00 00

Offset 0x00 holds the 8-byte laola signature "D0 CF 11 E0 A1 B1 1A E1", which identifies the file as a laola file system.
Offset 0x18 is a WORD, possibly a minor version number (0x003E here).
Offset 0x1A is a WORD, possibly the version number, usually 3.
Offset 0x1E is a WORD giving the block size as a power of 2; it is fixed at 9 (2^9 = 512 bytes).
Offset 0x20 is a WORD giving the small file block size, also as a power of 2: the value 6 means 2^6 = 64 bytes.
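The fields above can be decoded with a short Python sketch. It reads only the offsets discussed so far (signature at 0x00, WORDs at 0x18, 0x1A, 0x1E, and 0x20), treats 0x1E and 0x20 as powers of two as described, and rejects data without the signature. `parse_initial_block` and the dictionary keys are hypothetical names.

```python
import struct

LAOLA_SIGNATURE = bytes.fromhex("D0CF11E0A1B11AE1")


def parse_initial_block(block: bytes) -> dict:
    """Decode the initial-block fields described in the text."""
    if len(block) < 0x22:
        raise ValueError("initial block is too short")
    if block[:8] != LAOLA_SIGNATURE:
        raise ValueError("not a laola file: signature mismatch")
    minor, major = struct.unpack_from("<HH", block, 0x18)
    block_shift = struct.unpack_from("<H", block, 0x1E)[0]
    small_shift = struct.unpack_from("<H", block, 0x20)[0]
    return {
        "minor_version": minor,                 # 0x18, possibly a minor version
        "major_version": major,                 # 0x1A, usually 3
        "block_size": 1 << block_shift,         # 0x1E: 9 -> 512 bytes
        "small_block_size": 1 << small_shift,   # 0x20: 6 -> 64 bytes
    }


# Usage: read the first 512 bytes of an .xls file and decode them.
# with open("book.xls", "rb") as f:
#     print(parse_initial_block(f.read(512)))
```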

Next, let's go through some concepts in laola.
1. Initial Block
This is the first 512 bytes of an Excel file. The table defining this block's data structure is given at the end of this chapter, and each part of this chapter gradually works out the meaning of the initial block's fields.

2. Large file block table
The large file block table stores the index numbers of file blocks, for example:
0x0003200: 01 00 00 00 02 00 00 00 03 00 00 00 05 00 00 00
0x0003210: FE FF FF FF 06 00 00 00 07 00 00 00 FE FF FF FF
0x0003220: 09 00 00 00 0A 00 00 00 0B 00 00 00 0C 00 00 00
0x0003230: 0D 00 00 00 0E 00 00 00 0F 00 00 00 FE FF FF FF
0x0003240: 11 00 00 00 12 00 00 00 13 00 00 00 14 00 00 00
0x0003250: 15 00 00 00 16 00 00 00 17 00 00 00 FE FF FF FF
0x0003260: FD FF FF FF

Block indexes start from 0, and each table entry is one DWORD holding an index.
Index values:
0xFFFFFFFD: special block (for example, a block that holds the large file block table itself)
0xFFFFFFFE: end-of-chain mark
0xFFFFFFFF: unused block
0 to (total number of file blocks - 1): index of the next block in the chain
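The index values listed above can be decoded with a small Python sketch that splits one block of the large file block table into DWORD entries and labels the special values. It also includes an offset formula the text does not spell out but which fits the example: with 512-byte blocks, block n starts at file offset (n + 1) * 512 because the initial block comes first, so the table at 0x3200 occupies block 24, and entry 24 (at 0x3200 + 24*4 = 0x3260) is 0xFFFFFFFD, consistent with that block being special. All names are hypothetical.

```python
import struct

SPECIAL_BLOCK = 0xFFFFFFFD  # special block, e.g. one holding the table itself
END_OF_CHAIN = 0xFFFFFFFE   # index chain end mark
UNUSED = 0xFFFFFFFF         # unused block

BLOCK_SIZE = 512  # 2 ** 9, from offset 0x1E of the initial block


def block_offset(index: int) -> int:
    """File offset of large file block `index` (the initial block comes first)."""
    return (index + 1) * BLOCK_SIZE


def table_entries(table_block: bytes) -> list[int]:
    """Split one block of the large file block table into DWORD entries."""
    count = len(table_block) // 4
    return list(struct.unpack(f"<{count}I", table_block[:count * 4]))


def describe(value: int) -> str:
    """Label one table entry according to the index value ranges."""
    if value == SPECIAL_BLOCK:
        return "special block"
    if value == END_OF_CHAIN:
        return "end of chain"
    if value == UNUSED:
        return "unused"
    return f"next block is {value}"
```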

The large file block table is organized as index chains. By following a chain, we can gather large file blocks that are not contiguous in the Excel file into one sequence.
For example, start with an empty large file block list "{}". Read the DWORD at offset 0x0003200 + 0*4 (0 is the index): its value is 1, so put 0 and 1 into the list, giving {0, 1}. The DWORD at 0x0003200 + 1*4 is 2, so 2 goes into the list: {0, 1, 2}. The DWORD at 0x0003200 + 2*4 is 3, and the DWORD at 0x0003200 + 3*4 is 5, so the chain skips block 4. Continuing this way until offset 0x0003200 + 7*4, whose value 0xFFFFFFFE marks the end of the chain, the list becomes {0, 1, 2, 3, 5, 6, 7}. These are the large file blocks that, read in this order, form one continuous piece of data.
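The walk described above can be sketched in Python. `follow_chain` is a hypothetical name; it takes the table already decoded into a list of DWORD values (entries 0 to 7 below come from the dump at 0x0003200) and stops at 0xFFFFFFFE. The guard against revisiting a block or leaving the table is my own addition, so that a damaged file cannot loop forever.

```python
END_OF_CHAIN = 0xFFFFFFFE  # index chain end mark


def follow_chain(table: list[int], start: int) -> list[int]:
    """Collect block indexes by following the index chain from `start`."""
    blocks = []
    seen = set()
    index = start
    while index != END_OF_CHAIN:
        # Special values and out-of-range indexes cannot continue a chain.
        if index in seen or index >= len(table):
            raise ValueError(f"broken index chain at block {index}")
        seen.add(index)
        blocks.append(index)
        index = table[index]  # the entry holds the next block's index
    return blocks


# Entries 0..7 of the large file block table at 0x0003200 in the example.
example_table = [1, 2, 3, 5, 0xFFFFFFFE, 6, 7, 0xFFFFFFFE]
print(follow_chain(example_table, 0))  # [0, 1, 2, 3, 5, 6, 7]
```

To read the data itself, each index in the result would be turned into a file offset with (index + 1) * 512, and the blocks concatenated in list order.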
 
