Detailed description of input table (import table) (PE description 07)

Source: Internet
Author: User
A shortcut does not straighten the detour; it only blocks your road!
The length of the detour is directly proportional to how fast you grow! Don't be afraid to take a detour. The detour will teach you more, and the roads will eventually meet at the end!
The wrong road will lead you into an endless abyss, and the deeper you go ......


Before we explain the concept of the input table (import table), let's use a few short sentences to summarize what we have learned so far and tighten up our thinking. Take note!

First, we know that when a PE file is loaded into memory, its data is divided into blocks (sections) according to page attributes, and the block table (section table) describes these blocks. One point deserves attention here: data ends up in the same block only because it shares the same attributes, not necessarily because it serves the same purpose. For example, the import and export tables discussed below may be placed in the same block as read-only constants, because all of them are readable but not writable.

Second, because data with different purposes may share a block, the block table alone cannot identify or locate it. So what do we do? The data directory table in the IMAGE_OPTIONAL_HEADER32 structure of the PE file header gives the locations. The 15 kinds of data that a data directory table can locate include the import table, export table, resources, relocation table, and TLS. (Don't assume the data directory table is unimportant just because it appeared early in the series.)
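As a quick reference, the slots of the data directory table can be listed by index. The short Python sketch below uses the suffixes of the IMAGE_DIRECTORY_ENTRY_* constants from the Windows SDK headers; the list name DATA_DIRECTORY_NAMES is only an illustrative choice. There are 16 slots in total, and the last one is reserved, which leaves the 15 usable kinds mentioned above.

```python
# Index -> name of each data directory slot (16 slots; the last one is reserved).
DATA_DIRECTORY_NAMES = [
    "EXPORT",          # 0: export (output) table
    "IMPORT",          # 1: import (input) table
    "RESOURCE",        # 2: resources
    "EXCEPTION",       # 3
    "SECURITY",        # 4
    "BASERELOC",       # 5: relocation table
    "DEBUG",           # 6
    "ARCHITECTURE",    # 7
    "GLOBALPTR",       # 8
    "TLS",             # 9: thread local storage
    "LOAD_CONFIG",     # 10
    "BOUND_IMPORT",    # 11
    "IAT",             # 12: import address table
    "DELAY_IMPORT",    # 13
    "COM_DESCRIPTOR",  # 14
    "RESERVED",        # 15: reserved slot
]

for index, name in enumerate(DATA_DIRECTORY_NAMES):
    print(index, name)
```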

This lesson is about the import table. Why do we need to study it? Because the data directory table gives us only the RVA of the specified data and the size of the data block. The way data is organized (its structure) differs from one data block to another; for example, the import table and the resource data are laid out completely differently. So, to understand PE files in depth, we must understand how this data is organized and how the system processes and uses it.
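To make "only an RVA and a size" concrete, here is a minimal Python sketch that decodes one 8-byte data directory entry (two little-endian DWORDs). The function name parse_data_directory_entry and the sample bytes are made up for illustration; they are not taken from hello.exe.

```python
import struct

def parse_data_directory_entry(raw):
    """Decode one 8-byte data directory entry into (rva, size)."""
    rva, size = struct.unpack("<II", raw[:8])
    return rva, size

# Made-up sample bytes: RVA 0x0002A000 followed by size 0x0000003C.
sample = bytes.fromhex("00a002003c000000")
rva, size = parse_data_directory_entry(sample)
print(hex(rva), hex(size))
```

Nothing in these eight bytes says how the data at that RVA is laid out, which is why each kind of data block has to be studied separately.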


Import Functions

In code analysis and programming we often run into the term "import function". An import function is one that the program calls but whose executable code is not in the program itself. The code lives in a DLL file; the calling program keeps only the related information, such as the function name and the DLL file name. A PE file on disk cannot know where these import functions will be in memory. Only when the PE file is loaded into memory does the Windows loader load the related DLLs and connect the instructions that call an import function with the function's actual address. This is the concept of "dynamic linking". Dynamic linking is carried out through the import table defined in the PE file, which stores each function name together with the name of the DLL it resides in.
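The idea can be pictured with a toy model in Python. This is not how the Windows loader is implemented; it only shows names recorded on disk being turned into addresses at load time. The addresses are invented, and ExitProcess is just an assumed second import for the illustration.

```python
# Toy model of dynamic linking: names are stored in the file,
# addresses become known only after the DLLs are loaded.

# What the PE file on disk knows: DLL names and function names only.
imports_on_disk = [
    ("user32.dll", "MessageBoxA"),
    ("kernel32.dll", "ExitProcess"),  # assumed example import
]

# What exists only after loading. These addresses are invented.
loaded_dlls = {
    "user32.dll": {"MessageBoxA": 0x10001000},
    "kernel32.dll": {"ExitProcess": 0x20002000},
}

def resolve_imports(imports, modules):
    """Map each (dll, function) pair to its run-time address."""
    table = {}
    for dll_name, func_name in imports:
        table[(dll_name, func_name)] = modules[dll_name][func_name]
    return table

address_table = resolve_imports(imports_on_disk, loaded_dlls)
for (dll_name, func_name), address in address_table.items():
    print(f"{dll_name}!{func_name} -> {address:#010x}")
```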


Hands-on rehearsal (demonstrated in the video; only outlined here)

This part is mainly a warm-up: we have not dissected the import table yet, so first let's try things out and get a feel for it. Otherwise, learning is always boring and depressing. (The demo programs and tools used here can be downloaded from the corresponding posts in the decryption courseware series and the source code download area. You can also watch the video demonstration in the decryption video lecture series.)


The guinea pig for this experiment is a simple program: double-click it and it displays a single dialog window, then exits. For experiments we prefer small programs with as little internal structure as possible, which makes debugging more convenient. The purpose of this exercise is to try to find the address of MessageBox in memory using what we have learned so far.

Note: MessageBox is a function from the dynamic link library user32.dll. Through static disassembly of the PE file, we observe how the test program hello.exe locates and calls the MessageBox function, which lives in a different module.
(MessageBox comes in two versions: MessageBoxA for ANSI strings and MessageBoxW for Unicode strings. That is a piece of history for another time.)

Walkthrough:
1. We disassemble hello.exe using W32Dasm, once nicknamed the "dragon-slaying saber" (tu long dao).



We can see that this program has only two import modules: it imports several functions from two dynamic link libraries (user32.dll and kernel32.dll). We can also clearly see that the MessageBoxA we want to track is in user32.dll. The tool has already located it at the relative virtual address 2a2dc, but we will not use that, because this time we are exploring on our own and working mostly by hand ......

Using the W32Dasm search function, we can find where the code calls MessageBox and look at how the assembly jumps to it.
Why look there? Let's start with how assembly calls a subroutine. In short, assembly language uses the CALL and RET pair to call subroutines. Students who have worked through "Getting Started with Zero Basics and Learning Assembly Language" have seen this before. Do you still remember the Turtle's cartoon demonstration?! ......



Did you see it? PUSH XXXX, PUSH XXXX, and then CALL XXXX ......
That's right. This is the standard form of a function call. Each PUSH puts one of the parameters required by the called function onto the stack. Why must they go onto the stack? That story starts with the origin of the earth ...... Put simply: the invention of the stack is what made functions and subroutines possible! The properties of a stack suit local variables and parameters best. If this does not make sense yet, don't worry. It belongs to assembly language and compiler principles, and we will run into it again later and work through it in detail then.

int MessageBox(
    HWND hWnd,
    LPCTSTR lpText,
    LPCTSTR lpCaption,
    UINT uType    // style of message box
);
As defined, the MessageBox function takes four parameters. Under the stdcall convention we therefore push parameters onto the stack four times and then call MessageBox. Since this CALL targets our function, the disassembly shows its address as [0042a2ac]. Is it really that simple? Is 42a2ac the address of the target function?
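The four pushes can be modelled in a few lines of Python. This is only a toy illustration of the stdcall rule that arguments are pushed from right to left, so the last parameter is pushed first; the function name push_stdcall_args is made up for this sketch.

```python
def push_stdcall_args(args):
    """Return the order in which stdcall pushes the arguments (right to left)."""
    stack = []
    for value in reversed(args):
        stack.append(value)  # each append stands for one PUSH instruction
    return stack

# Parameters in declaration order: hWnd, lpText, lpCaption, uType.
declared = ["hWnd", "lpText", "lpCaption", "uType"]
pushes = push_stdcall_args(declared)
print(pushes)  # ['uType', 'lpCaption', 'lpText', 'hWnd']
```

After the fourth push, hWnd sits on top of the stack, and the CALL instruction follows.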

Let's scroll down through the listing and try to find address 42a2ac.
But ......


Tragically, the listing never reaches 42a2ac: it ends at 421ff8! What is going on?
Maybe ...... maybe ...... we simply cannot give the answer directly without first understanding the import table in more detail ......

Okay, let's ask a question. What kind of address is this? A file offset or a virtual address?

Well, that's right: it is a VA. As we said in the last lesson, a VA can be converted to the place where the data is physically stored in the file. Specifically, we compare the VA with the starting VA of each block of the program, one by one. Because the block table in the PE header records both the starting VA of each block and its physical offset in the file, we can determine which block the VA falls in, take the difference from that block's starting VA, and add it to the block's physical offset to get the physical offset of the VA. Now that the road ahead is clear, let's try the conversion!

Okay, I know that was a bit of a tongue twister. Let's demonstrate it with an example, otherwise I will get flamed ......



We can see that our address 42a2ac falls in the part of the file that is loaded between 2a000 and 2b000 (the program was built with VC, and the image base is 400000). So we can place the address in the .idata block. The starting VA of this block is 42a000, so 42a2ac - 42a000 = 2ac. The Raw Data Offs field tells us that the physical offset of this block is 28000, so the physical offset of VA 42a2ac is 28000 + 2ac = 282ac.
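The same arithmetic can be written as a short Python sketch. The image base, the .idata starting address, and the raw data offset come from the example above; the block size of 0x1000 is an assumption based on the 2a000 to 2b000 range, and the names SECTIONS and va_to_file_offset are made up for illustration.

```python
IMAGE_BASE = 0x400000

# (name, starting RVA, size in memory, raw data offset in the file).
# The 0x1000 size is assumed from the 2a000..2b000 range in the text.
SECTIONS = [(".idata", 0x2A000, 0x1000, 0x28000)]

def va_to_file_offset(va, image_base=IMAGE_BASE, sections=SECTIONS):
    """Convert a virtual address to a physical offset in the file."""
    rva = va - image_base
    for name, section_rva, size, raw_offset in sections:
        if section_rva <= rva < section_rva + size:
            # Distance from the block start is the same in memory and on disk.
            return raw_offset + (rva - section_rva)
    raise ValueError(f"VA {va:#x} is not inside any known section")

print(hex(va_to_file_offset(0x42A2AC)))  # 0x282ac
```

A complete tool would read every entry of the block table from the PE header instead of hard-coding a single block.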

Let's open the file with UE (UltraEdit) and check what is stored at offset 282ac ......



Offset 282ac stores the bytes dca20200, which mean nothing when read as ASCII codes.
But if we read dca20200 as a DWORD, we get the value 0002a2dc (remember the difference between big-endian and little-endian byte order).
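The byte-order step can be checked with Python's standard struct module. This is just a small verification of the value quoted above; the variable names are arbitrary.

```python
import struct

# The four bytes as they appear in the file at offset 282ac.
raw = bytes.fromhex("dca20200")

# "<I" means: one unsigned 32-bit integer, little-endian (x86 byte order).
(value,) = struct.unpack("<I", raw)
print(f"{value:08x}")  # 0002a2dc

# Reading the same bytes as big-endian gives a very different number.
print(f"{int.from_bytes(raw, 'big'):08x}")  # dca20200
```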
Hold on, doesn't that look familiar? It starts with 2a***. Isn't it similar to our previous address? Well, let's convert it to a file offset using the same method as before. The converted offset is 282dc. Let's take a look at what mystery lies at 282dc.



Haha, is that a miracle you are seeing? Starting at offset 282dc, the ASCII text reads MessageBoxA, followed by user32.dll.
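Reading such names out of a buffer can be sketched in Python as follows. The buffer here is synthetic, not the real bytes of hello.exe. It assumes the standard PE layout in which an import-by-name entry begins with a 2-byte hint followed by a NUL-terminated ASCII name; the hint value 0 and the helper name read_c_string are made up for illustration.

```python
import struct

def read_c_string(data, offset):
    """Read a NUL-terminated ASCII string starting at offset."""
    end = data.index(b"\x00", offset)
    return data[offset:end].decode("ascii")

# Synthetic stand-in for the bytes at offset 282dc (hint value is made up).
blob = struct.pack("<H", 0) + b"MessageBoxA\x00" + b"user32.dll\x00"

(hint,) = struct.unpack_from("<H", blob, 0)
func_name = read_c_string(blob, 2)                      # skip the 2-byte hint
dll_name = read_c_string(blob, 2 + len(func_name) + 1)  # skip name and its NUL
print(hint, func_name, dll_name)
```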
How about that? A mysterious sense of accomplishment, isn't it?

But we still have not solved the root problem: what is the address of MessageBox? In fact, we cannot solve it at this point, because we do not yet understand the import table in depth. So stay tuned: the next lesson will unveil the mystery of the import table ......
