This is a summary of what I learned while writing a simple compression shell (packer). Along the way I read a lot of material, studied a lot of code, and took a lot of notes. I picked up some insights and experience, and I hope to share them and learn together with you. Once again, thanks to all the predecessors who published so much material.
The series is divided into four articles, running from the most basic concepts to the final code I learned from and practiced with. The articles are:
1. Data and instructions, and concepts to know before loading
2. Parsing the PE file structure
3. The PE file loading process
4. Shell (packer) processing
Any advice is welcome. I hope this series offers some reference to friends who want to learn how to write a compression shell. Even if it helps only a little, I will be satisfied :)
1. Data and instructions
In a von Neumann computer, data and instructions are essentially the same thing, much like Go stones, which are the same stones whether they sit in the box or on the board. In a broad sense data is instructions, and at times it is hard to tell which is which. To borrow an analogy from Wang Shuang's "Assembly Language", data is like the stones on a board. During a game the stones are placed one by one according to the rules of the board; after the game they are all packed back into the box. The two states of the stones, in play and put away, correspond to the two states of data and instructions. During a game the stones must be placed on the grid: the grid has a fixed size, each point holds exactly one stone, and stones cannot be stacked. When no game is in progress, the stones are simply kept in the box. A stone is always a stone; the only difference is the rule it follows at the moment. Data and instructions are the same: on disk they are laid out according to the disk storage rules, and when loaded into memory to run they are laid out according to the memory loading rules.
So how do these rules show up? When no game is being played, the stones follow the "rules" of the box: you can pour them in any way you like and keep squeezing them in wherever there is a gap until nothing more fits (that is how I fit in more than one board's worth). You can also lay them out by the "rules" of the board, one stone per intersection, but then you can place at most 361, because a board has only 361 intersections. If you buy another set and want to place a 362nd stone, you need a second board. From the rule the stones follow, we can tell whether a game is in progress or the stones are at rest. This leads to the concept of alignment.
Alignment:
To understand the PE loading process, think of loading into memory as playing a game. Every game has its rules, so the stones cannot be packed as tightly as they are in the box; each one must sit on its own point of the grid. In the same way, when a program is loaded from the hard disk into memory, the system requires it to be aligned to a certain size, and that size differs from the alignment used on disk. This is the origin of alignment. As a result, a program's size on disk and its size in memory differ: to satisfy the memory loading rules, space that holds no data must be filled in as padding. In the end, everything has to follow the rules :)
For example, suppose the minimum unit for loading a program into memory is 1000 bytes. Even if I have only 1 byte of data, it still occupies 1000 bytes, because that is the minimum unit. 1001 bytes occupy 2000 bytes: the first 1000 bytes fill one unit, and the remaining 1 byte takes up a whole second unit even though it holds only 1 byte.
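The rounding rule in this example can be sketched in a few lines of C. This is only an illustration of the arithmetic; the helper name align_up is my own and does not come from any Windows header.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: round size up to the next multiple of alignment.
   Works for any non-zero unit, not only powers of two. */
static uint32_t align_up(uint32_t size, uint32_t alignment)
{
    assert(alignment != 0);
    return (size + alignment - 1) / alignment * alignment;
}
```

With a unit of 1000, align_up(1, 1000) gives 1000 and align_up(1001, 1000) gives 2000, matching the two cases described above.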
Returning to the figure: the alignment size on disk is 200h, that is, the minimum alignment unit for data stored on disk is 200h, while the alignment in memory is 1000h, that is, the minimum alignment unit for data loaded into memory is 1000h.
When the disk file is mapped into memory, an image is produced, which we call the memory image. What the image corresponds to is the disk file, but we cannot call it an exact copy: because it is loaded into memory, it must follow the memory loading rules. Different loading rules give the same content a different shape and size.
So although the disk file and the memory image hold the same content, they follow different rules: the alignment granularity (the minimum alignment unit for loading data) differs, and therefore the "image" differs.
PE loading preparation
To load a program from disk into memory, the PE loader must know what to load and where to load it. When you produce a PE file, you have to tell the loader how to load it. That is the purpose of the information carried in every part of a PE file: like a map for the loader, it describes the PE piece by piece, so that the loader can locate the contents in the disk file and load them into memory.
Apart from the actual data, most structures in the PE format exist to tell the loader how and where to load things. So when you see the many structures in a PE file, don't be intimidated. When you think about it, they are mostly just a collection of addresses :)
In addition, some of the information in a PE file is not needed by the loader at load time, and some is used only for validation. A PE file loads normally as long as it has the form the loader expects and stays within the loader's fault tolerance :)
Note that, because of the different alignment granularity, the PE loader fills the extra space left over after loading with zeros.
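The zero filling can be pictured with a small sketch. This is not how the Windows loader is implemented; it only shows the idea of copying raw bytes into a zeroed buffer whose size is rounded up to the memory alignment. The function name load_padded and its parameters are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical sketch: copy raw_size bytes into a newly allocated buffer whose
   size is raw_size rounded up to mem_align. calloc zeroes the buffer, so the
   padding after the copied data is all zeros. The caller frees the result. */
static uint8_t *load_padded(const uint8_t *raw, size_t raw_size,
                            size_t mem_align, size_t *out_size)
{
    assert(mem_align != 0);
    size_t padded = (raw_size + mem_align - 1) / mem_align * mem_align;
    uint8_t *buf = calloc(padded ? padded : 1, 1);
    if (buf == NULL)
        return NULL;
    if (raw_size != 0)
        memcpy(buf, raw, raw_size);
    *out_size = padded;
    return buf;
}
```

Loading 3 bytes with a memory alignment of 0x1000 yields a 0x1000-byte buffer in which everything after the third byte is zero.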
ImageBase, RVA, and FileOffset
A PE file cannot hand the loader absolute addresses for where it will sit in memory, because the loader cannot know in advance whether such an address is available or can be loaded. So the file instead gives the loader offsets relative to IMAGEBASE, called RVAs. Once the IMAGEBASE at which the PE is loaded is known, every absolute address (VA = IMAGEBASE + RVA) can be calculated from these offsets.
RVA and FileOffset are, respectively, the offset relative to the base address in memory and the offset relative to the start of the file on disk. They are two kinds of offset that follow two different loading rules.
Addressing through the combination of IMAGEBASE, RVA, and FileOffset makes the loader's job easier: it only needs to obtain IMAGEBASE when loading the file, and can then follow the RVAs and offsets stored in the PE's internal structures.
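As a small illustration of the relation VA = IMAGEBASE + RVA, here is a sketch for a 32-bit image. The helper names rva_to_va and va_to_rva are my own, and the numbers in the note below are only sample values.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers for a 32-bit image. */

/* VA = IMAGEBASE + RVA */
static uint32_t rva_to_va(uint32_t image_base, uint32_t rva)
{
    return image_base + rva;
}

/* RVA = VA - IMAGEBASE; the VA must not lie below the image base. */
static uint32_t va_to_rva(uint32_t image_base, uint32_t va)
{
    assert(va >= image_base);
    return va - image_base;
}
```

For instance, with an assumed image base of 0x400000, the RVA 0x1000 corresponds to the VA 0x401000, and converting back gives 0x1000 again.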
Conversion of RVA and FileOffset
A memory offset and a disk offset give different values for the same item. The fundamental reason is that the alignment granularity differs. To convert between them, you must bring both to a common basis. In general, first determine which section the RVA to be converted falls in, then remove the section's memory start offset and add its disk start offset.
Back to the figure: suppose we want the disk offset of an address in the .text section. The .text section starts at absolute address 401000h in the memory image, and IMAGEBASE is 400000h. The address has a relative address (RVA) of 1010h. Subtract 1000h, the start offset of .text after alignment to the memory granularity, then add the disk start offset of .text, and the result is the file offset 410h.
As another example, suppose we want the disk offset of an address in the .data section. The .data section starts at absolute address 403000h in the memory image, and IMAGEBASE is 400000h. The address has an RVA of 3030h. Subtract 3000h, the start offset of .data after alignment to the memory granularity, which leaves 30h, then add the disk start offset of .data to obtain the file offset.
From this we can state a general conversion formula:
FileOffset = VA - IMAGEBASE - memory start offset of the section containing the address + disk start offset of the section containing the address
Many parsing tools use another version of the conversion formula:
k = memory start offset of the section containing the address - disk start offset of the section containing the address
FileOffset = VA - IMAGEBASE - k
The two mean the same thing. Personally I find the first version more intuitive; take your pick :)
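Both versions of the formula can be checked against the .text example with a short sketch. The struct and function names are hypothetical. The sample values in the note below are assumptions chosen to be consistent with that example: an IMAGEBASE of 0x400000 and a .text disk start offset of 0x400, so that RVA 1010h maps to file offset 410h.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical description of one section: where it starts in memory (as an
   RVA) and where it starts on disk. */
typedef struct {
    uint32_t mem_start;   /* memory start offset of the section */
    uint32_t disk_start;  /* disk start offset of the section   */
} section_info;

/* First version: VA - IMAGEBASE - memory start + disk start. */
static uint32_t va_to_file_offset(uint32_t va, uint32_t image_base, section_info s)
{
    uint32_t rva = va - image_base;
    assert(rva >= s.mem_start);   /* the address must lie inside the section */
    return rva - s.mem_start + s.disk_start;
}

/* Second version: precompute k = memory start - disk start. */
static uint32_t va_to_file_offset_k(uint32_t va, uint32_t image_base, section_info s)
{
    uint32_t k = s.mem_start - s.disk_start;
    return va - image_base - k;
}
```

With mem_start = 0x1000 and disk_start = 0x400, the VA 0x401010 converts to 0x410 with either function, which is why the choice between the two versions is a matter of taste.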
Microsoft provides the ImageRvaToVa function for this purpose. Note the different naming here: the VA in the function name refers to the address of the corresponding byte in the mapped file, that is, the mapping base plus the offset in the file. Here is the description of this function from MSDN.
Reference:
LPVOID ImageRvaToVa (
IN PIMAGE_NT_HEADERS NtHeaders, // NT headers
IN LPVOID Base, // Base address of the disk file mapped with MapViewOfFile
IN DWORD Rva, // RVA to convert
IN OUT PIMAGE_SECTION_HEADER * LastRvaSection // Last section header used; may be NULL
);
This function has a flaw: in PE files deformed by some shells, some information is placed in the header, and in that case the offset the function returns is wrong. Of course, this only concerns abnormal PE files; for files that follow the PE specification the function works correctly :)
Based on the first version of the formula, we can write the following conversion function, RVAToOffset:
PS: if you are not yet familiar with the PE structure, read the next article first and then come back :)
Reference:
// ImageBase is the base address at which the disk file is mapped (for example by MapViewOfFile),
// VirtualAddress is the RVA to be converted, and the function returns the FileOffset (0 if it cannot be located)
DWORD RVAToOffset(LPVOID ImageBase, DWORD VirtualAddress)
{
    PIMAGE_DOS_HEADER pDH;
    PIMAGE_NT_HEADERS pNTH;
    PIMAGE_SECTION_HEADER pSH;
    int NumOfSections;
    // Find the PE header inside the file
    pDH = (PIMAGE_DOS_HEADER)ImageBase;
    pNTH = (PIMAGE_NT_HEADERS)((LPBYTE)ImageBase + pDH->e_lfanew);
    // Obtain the number of sections in the PE file
    NumOfSections = pNTH->FileHeader.NumberOfSections;
    // Walk all sections to determine whether the RVA falls into the address space of a section
    for (int i = 0; i < NumOfSections; i++)
    {
        // Obtain the i-th section header (IMAGE_FIRST_SECTION skips the NT headers, then index by section)
        pSH = IMAGE_FIRST_SECTION(pNTH) + i;
        // Check whether the RVA lies inside this section
        if (VirtualAddress >= pSH->VirtualAddress && VirtualAddress < pSH->VirtualAddress + pSH->SizeOfRawData)
        {
            // The address is in the space of this section
            // RVA - memory start offset of the section containing the address
            DWORD VA_in_Section_RVA = VirtualAddress - pSH->VirtualAddress;
            // Adding the disk start offset of the section gives the file offset
            DWORD FileOffset = pSH->PointerToRawData + VA_in_Section_RVA;
            return FileOffset;
        }
        // The RVA lies before the first section, that is, in the headers
        else if (i == 0 && VirtualAddress < pSH->VirtualAddress)
        {
            // An RVA in the headers equals its file offset, provided it has corresponding data on disk
            if (VirtualAddress < pSH->PointerToRawData)
                return VirtualAddress;
            // Otherwise there is no corresponding data and it cannot be located
            else
                return 0;
        }
    }
    return 0;
}
Note that in some deformed PE files the SizeOfRawData and PointerToRawData fields hold unusual values, while the loader applies its own rules to them when loading. To get the values of SizeOfRawData and PointerToRawData that the loader actually uses, I recommend the method described earlier by linxer:
Reference:
SectionAlignmentMask = SectionAlignment - 1;
FileAlignmentMask = 0x200 - 1;
SizeOfRawData = (SizeOfRawData + FileAlignmentMask) & (0xffffffff ^ FileAlignmentMask);
PointerToRawData = PointerToRawData & (0xffffffff ^ FileAlignmentMask);
This way you obtain the same offsets as the loader, and can then apply the algorithm above :)
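The masking above can be wrapped in two small functions to see what it does to concrete values. This is a sketch of the same arithmetic with the 0x200 granularity taken from the snippet; the function names are hypothetical, and it does not claim to reproduce every rule the real loader applies.

```c
#include <assert.h>
#include <stdint.h>

/* Mask for the 0x200 file alignment used in the snippet above. */
#define FILE_ALIGNMENT_MASK (0x200u - 1u)

/* Hypothetical helper: round SizeOfRawData up to a multiple of 0x200. */
static uint32_t effective_size_of_raw_data(uint32_t size_of_raw_data)
{
    /* Sizes close to 4 GiB would wrap around; treat them as out of scope. */
    assert(size_of_raw_data <= 0xffffffffu - FILE_ALIGNMENT_MASK);
    return (size_of_raw_data + FILE_ALIGNMENT_MASK) & (0xffffffffu ^ FILE_ALIGNMENT_MASK);
}

/* Hypothetical helper: round PointerToRawData down to a multiple of 0x200. */
static uint32_t effective_pointer_to_raw_data(uint32_t pointer_to_raw_data)
{
    return pointer_to_raw_data & (0xffffffffu ^ FILE_ALIGNMENT_MASK);
}
```

A SizeOfRawData of 0x201 becomes 0x400 (rounded up), while a PointerToRawData of 0x3FF becomes 0x200 (rounded down); values that are already multiples of 0x200 are unchanged.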
Thanks to the reminder from the poster on the third floor: handling for deformed PE files, such as the attached TEST1 and TEST2, has been added to the address translation. It covers RVAs that are located in the header of a deformed PE and the retrieval of the correct SizeOfRawData and PointerToRawData. If you have a better solution, please share it :)
Summary: this article introduced several important concepts that run through the whole PE file format, such as RVA and FileOffset and the conversion between them, and provided code for the conversion. I hope it is helpful to beginners. I tried to keep it easy to understand, so it may be a bit long-winded; thank you for your patience :) My knowledge is limited, so corrections are welcome.
Author's little heart