Two ways to load DLLs (pend) + delayload

Source: Internet
Author: User

After reading an example of calling a dynamic library, I decided to try it myself.
The header that declares the DLL's exported interface, MyDll.h:

// MyDll.h
#include <stdio.h>
#include <stdlib.h>
#include "Public.h"
#define DLL_EXPORT /*extern "C"*/ __declspec(dllexport)  // export
#define CUST_API _stdcall                                // standard call
DLL_EXPORT void CUST_API DisplayVersion(TCHAR *info);    // show version
DLL_EXPORT int CUST_API Calc(int ia, int ib);
// DLL_EXPORT int CUST_API MetiCalc(int ia, int ib);     // newly added interface

// MyDll.cpp
#include "MyDll.h"
void CUST_API DisplayVersion(TCHAR *info)
{
    // VERSION is #defined as the string "ver 1.0"; the caller's buffer must be large enough to hold it.
    wcscpy_s(info, wcslen(VERSION) + 1, VERSION);
    return;
}
int CUST_API Calc(int ia, int ib) { return ia + ib; }
int CUST_API MetiCalc(int ia, int ib) { return ia * ib; }


Compiling this produces DllTest.lib and DllTest.dll.
The first method: static invocation
Understanding: the .lib (import library) describes the DLL and the entry points of its exported functions. It is linked into the executable at build time, and the DLL itself is loaded when the program starts.
If the DLL adds a new API, you must relink against the updated .lib before you can use the new interface; otherwise the linker cannot find the address of the new function. In other words, the .lib holds the description of the DLL's interface.

// DllTest.h
#include <iostream>
#include <windows.h>
#include <tchar.h>
using namespace std;
#pragma comment(lib, "...\\ApDll\\DllTest.lib")          // link the import library
#define DLL_IMPORT /*extern "C"*/ __declspec(dllimport)  // import on the consumer side
#define CUST_API _stdcall                                // standard call
DLL_IMPORT void CUST_API DisplayVersion(TCHAR *info);    // the DLL's show-version function
DLL_IMPORT int CUST_API Calc(int ia, int ib);
DLL_IMPORT int CUST_API MetiCalc(int ia, int ib);

int _tmain(int argc, _TCHAR* argv[])
{
    TCHAR version[50] = {0};
    int a = 10, b = 12;
    DisplayVersion(version);
    wcout << version << endl;
    wcout << Calc(a, b) << endl;
    return 0;
}

Result: ver 1.0


Second method: Dynamic load
First, define a function pointer type that matches the function exported by the DLL. A function pointer definition reads as follows:
typedef void (_stdcall *funName)(ParamType1, ParamType2);  // defines funName as a pointer to a _stdcall function that takes parameters of types ParamType1 and ParamType2 and returns void
Note that a function pointer definition has three parts: the return type, (*pName), and (the parameter list).
Then call LoadLibrary(path), where path is the location of the DLL, either the system directory or another specified directory. A successful load returns an HMODULE module handle.
Then use the module handle with GetProcAddress to get the address of the function you need.
When calling through a function pointer, unlike an ordinary pointer, no explicit dereference with "*" is required.
When you are done with the DLL, remember to call FreeLibrary().
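The three-part shape of a function pointer type, and the fact that the call needs no "*", can be tried without any DLL. The sketch below is portable C++ under a few assumptions: Calc is a local stand-in rather than the exported function, CallBothWays is a hypothetical helper, and the calling-convention macro is left out so that it builds with any compiler.

```cpp
#include <cassert>

// Local stand-in for a function that would normally live in the DLL.
int Calc(int ia, int ib) { return ia + ib; }

// Three parts: return type, (*name), (parameter list).
typedef int (*CalcOprt)(int ia, int ib);

// Calls the target through the pointer in both spellings and returns the result.
int CallBothWays(CalcOprt fn, int a, int b) {
    assert(fn != nullptr);            // mirrors the NULL check after GetProcAddress
    int viaPlainCall   = fn(a, b);    // no explicit '*' needed
    int viaDereference = (*fn)(a, b); // explicit dereference is also legal
    assert(viaPlainCall == viaDereference);
    return viaPlainCall;
}
```

Calling CallBothWays(&Calc, 10, 12) goes through the pointer twice and returns the sum; with a real DLL the pointer would come from GetProcAddress instead of &Calc.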

#include <iostream>
#include <windows.h>
#include <tchar.h>
using namespace std;

#define CUST_API _stdcall  // must match the calling convention used by the DLL
typedef void (CUST_API *DisVer)(TCHAR *info);
typedef int  (CUST_API *CalcOprt)(int ia, int ib);

int _tmain(int argc, _TCHAR* argv[])
{
    TCHAR version[50] = {0};
    int a = 10, b = 12;
    HMODULE hModle = LoadLibrary(_T(".\\ApDll\\DllTest.dll"));  // load the DLL dynamically
    if (NULL == hModle)
    {
        wcout << "Load DLL failed!" << endl;
        return -1;
    }
    // Get the function address from the module handle and the exported name.
    // The name must match the export table exactly; with extern "C" commented out
    // in MyDll.h the export is C++-decorated, so this lookup can return NULL.
    DisVer DisplayVer = (DisVer)::GetProcAddress(hModle, "DisplayVersion");
    if (NULL == DisplayVer)
    {
        wcout << _T("Load function error!") << endl;
    }
    else
    {
        DisplayVer(version);  // call the function through the function pointer
        wcout << version << endl;
    }
    FreeLibrary(hModle);  // release the DLL when done
    system("pause");
    return 0;
}

Reading something ten times is no substitute for typing it out once; even a small dynamic-library call has details that need attention.

The following covers the basics of delay load:

We know there are two basic ways to load a DLL: implicit loading and explicit loading. Implicit loading is the method described in the previous article: the loader maps the DLL into the process through the PE import table before the entry function runs. Explicit loading uses LoadLibrary and GetProcAddress to load the DLL into the process when it is needed. These two methods are the most common. So what is delay load?

A DLL designated as delay-loaded is only loaded into the process when it is actually needed; if no function in the DLL is ever called, the DLL is never loaded. The loading itself is done with LoadLibrary and GetProcAddress, and all of it is transparent to the programmer.

What are the benefits? The program starts faster, because many DLLs are not needed at startup. Some DLLs may never be used during the whole lifetime of the process, and with delay load such a DLL never occupies any memory.

So how do you use delay load? You need to do two things:

1. Add #pragma comment(lib, "DelayImp.lib") at the top of the .cpp file; in Visual Studio you can instead add DelayImp.lib to the linker's Input settings. The reason is explained later.

2. When compiling, add /link /delayload:xxx.dll; in Visual Studio, find the delay-load setting in the project properties and add the name of the DLL.

(Many posts online add #pragma comment(linker, "/delayload:xxx.dll") to the .cpp file, but in my tests this did not work; the option only worked as a command-line parameter.)

Now let's use what we already know to reason about what must change once a DLL is delay-loaded:

1. The DLL's entries in the import table must be gone. Otherwise the DLL would still be loaded unconditionally when the program starts.

2. Some fields must record the DLL's load address and the function addresses. Otherwise every call would go through LoadLibrary + GetProcAddress, which would not be very smart.

3. Who calls LoadLibrary + GetProcAddress and fills in these fields? The code must be embedded at link time. Where does the embedded code come from? Remember the earlier #pragma comment(lib, "DelayImp.lib")? That is where.

Next we use a simple example to analyze the whole process, which once again taught me a lesson: a programmer who has not learned assembly really has a hard time. Before we start, can you guess which PE structure is related to delay load? That's right: the IMAGE_DIRECTORY_ENTRY_DELAY_IMPORT entry in the DATA DIRECTORY holds all the information related to delay load.

Let's take a look at the corresponding ImgDelayDescr structure:

Type | Member | Description
DWORD | grAttrs | Attributes of this structure. The only flag currently defined is dlattrRva, which indicates that the fields in the structure are RVAs rather than virtual addresses
RVA | rvaDLLName | RVA of the name of the imported DLL. This string is passed to LoadLibrary
RVA | rvaHmod | RVA of an HMODULE-sized memory location. When the delay-loaded DLL is loaded into memory, its module handle (HMODULE) is stored here
RVA | rvaIAT | RVA of this DLL's import address table, in the same format as a regular IAT
RVA | rvaINT | RVA of this DLL's import name table, in the same format as a regular INT
RVA | rvaBoundIAT | RVA of the optional bound IAT. It points to a bound copy of this DLL's import address table, in the same format as a regular IAT. Currently this IAT copy is not actually bound, but the feature may be added in a future version of the bind program
RVA | rvaUnloadIAT | RVA of an optional copy of the original IAT. It points to the unbound copy of this DLL's import address table, in the same format as a regular IAT. Usually set to 0
DWORD | dwTimeStamp | Time/date stamp of the delay-loaded DLL. Usually set to 0
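As a rough aid for reading the descriptor in a hex view, the fields above can be mirrored in a plain struct. This is a portable sketch, not the declaration from delayimp.h: ImgDelayDescrMirror and PtrFromRva are hypothetical names, and the layout assumes that every field is 32 bits wide, as the table suggests.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

using RVA = std::uint32_t;  // relative virtual address, assumed 32 bits wide

// Portable mirror of the descriptor; field order follows the table above.
struct ImgDelayDescrMirror {
    std::uint32_t grAttrs;      // attribute flags
    RVA           rvaDLLName;   // name of the DLL
    RVA           rvaHmod;      // where the module handle is stored
    RVA           rvaIAT;       // import address table
    RVA           rvaINT;       // import name table
    RVA           rvaBoundIAT;  // optional bound IAT
    RVA           rvaUnloadIAT; // optional copy of the original IAT
    std::uint32_t dwTimeStamp;  // time/date stamp
};
static_assert(sizeof(ImgDelayDescrMirror) == 32, "eight 32-bit fields");

// Turns an RVA into a pointer, given where the image is mapped in memory.
inline const std::uint8_t* PtrFromRva(const std::uint8_t* imageBase, RVA rva) {
    assert(imageBase != nullptr);
    return imageBase + rva;
}
```

With the image mapped at imageBase, PtrFromRva(imageBase, d.rvaDLLName) would point at the DLL name string of a descriptor d.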

Does it look familiar? It resembles IMAGE_IMPORT_DESCRIPTOR: both have a DLL name, an INT, and an IAT. Before reading on, make sure you have a basic understanding of the INT and IAT; if they are not clear yet, I suggest reading the previous article first.

That covers the basics. Before the example, a word about the tools I used: OllyDbg and Stud_PE. OllyDbg is a powerful assembly-level debugger that I am still learning; there are many tutorials online, but understanding them takes some skill in itself. Stud_PE is a very small PE analyzer that lets you jump quickly to any part of a PE file, which makes it a handy tool for PE beginners. Much of what follows comes from these two tools.

Our sample is kept as simple as possible while still covering everything about delay load:

You also need a delayLoad.dll, which only has to export two functions, export1 and export2. The function parameters are omitted as well, since unnecessary parameters only complicate the assembly code and do not help the analysis. I won't explain how to create delayLoad.dll again; if you don't know how, I suggest brushing up on the basics first.

Compile + link: cl sample.cpp /link /delayload:delayLoad.dll

Before studying the assembly, let's look at what the ImgDelayDescr in Sample.exe contains now:

Let's look at the most important fields (how to locate the corresponding content in the file from a virtual address is not repeated here; see the previous article):

rvaDLLName: 64 65 6C 61 79 4C 6F 61 64 2E 64 6C 6C (delayLoad.dll)

rvaIAT: 34 10 40 00 19 10 40 00 (by normal reasoning, these two entries will be used to store the function addresses)

rvaINT: 72 7A 7A 00 00

00 00 65 78 70 6F 72 74 31 (..export1)

00 00 65 78 70 6F 72 74 32 (..export2)
rvaINT is used exactly like the INT in the import table: each entry points to an IMAGE_IMPORT_BY_NAME, whose first two bytes are the hint (see the PE format for its purpose; it is not relevant here), followed directly by the function name in ASCII.
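To make the layout of such an entry concrete, here is a small decoder for a two-byte hint followed by an ASCII name. It is only a sketch: ImportByName and ParseImportByName are hypothetical names, the hint is assumed to be little-endian, and the name is assumed to end with a zero byte.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Decoded form of one name-table entry.
struct ImportByName {
    std::uint16_t hint;
    std::string   name;
};

// Reads a 2-byte little-endian hint, then ASCII characters up to a zero byte
// or the end of the buffer, whichever comes first.
ImportByName ParseImportByName(const std::uint8_t* bytes, std::size_t size) {
    assert(bytes != nullptr && size >= 3);
    ImportByName entry;
    entry.hint = static_cast<std::uint16_t>(bytes[0] | (bytes[1] << 8));
    for (std::size_t i = 2; i < size && bytes[i] != 0; ++i) {
        entry.name.push_back(static_cast<char>(bytes[i]));
    }
    return entry;
}
```

Fed the bytes 00 00 65 78 70 6F 72 74 31 00, the decoder yields hint 0 and the name export1.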

rvaIAT differs a little from the IAT of the import table. Before the program is loaded, the IAT of the import table points to the same content as the INT, which is not the case here. In addition, all entries in the import table's IAT are resolved and filled in before the program starts running, whereas an rvaIAT entry is updated when its function is first called. So what does rvaIAT contain before the program is loaded? Don't worry, you will find out soon.

Next, open Sample.exe in OllyDbg, find the entry function, and start the manual analysis T_T:

00031000  55             PUSH EBP
00031001  8BEC           MOV EBP,ESP
00031003  FF15 209B0300  CALL DWORD PTR DS:[39B20]  // export1
00031009  FF15 209B0300  CALL DWORD PTR DS:[39B20]  // export1
0003100F  FF15 249B0300  CALL DWORD PTR DS:[39B24]  // export2
00031015  33C0           XOR EAX,EAX
00031017  5D             POP EBP
00031018  C3             RETN
00031019  B8 249B0300    MOV EAX,test.00039B24
0003101E  E9 00000000    JMP test.00031023
00031023  51             PUSH ECX
00031024  52             PUSH EDX
00031025  50             PUSH EAX
00031026  68 1C7A0300    PUSH test.00037A1C
0003102B  E8 0E000000    CALL test.0003103E
00031030  5A             POP EDX
00031031  59             POP ECX
00031032  FFE0           JMP EAX
00031034  B8 209B0300    MOV EAX,test.00039B20
00031039  E9 E5FFFFFF    JMP test.00031023
0003103E  // don't care about the code here for now

Let's take the first export1 call as an example: CALL DWORD PTR DS:[39B20]

What is 39B20? 30000 is the load address. (Shouldn't an EXE be loaded at 400000? Why is it loaded at 30000 when debugging with OllyDbg? It puzzled me; after some testing it turned out that the load address changes randomly on every run.) So what we actually need is the RVA 9B20, and looking it up, the value of rvaIAT is 9B20! In other words, 39B20 is the first entry in rvaIAT. That answers the earlier question: the first entry in rvaIAT holds 401034 (RVA 1034), which means the call actually goes to 31034:

00031034  B8 209B0300    MOV EAX,test.00039B20

00031039  E9 E5FFFFFF    JMP test.00031023
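The address arithmetic used here is easy to get wrong by hand, so here it is spelled out. The two helpers are hypothetical names for illustration; the numbers are the ones from the text (actual load address 30000, preferred load address 400000).

```cpp
#include <cassert>
#include <cstdint>

// Converts a runtime address to an RVA, given the module's actual load address.
std::uint32_t RvaFromAddress(std::uint32_t address, std::uint32_t loadBase) {
    assert(address >= loadBase);
    return address - loadBase;
}

// Moves an address that was written for preferredBase to where it really is
// when the module is loaded at actualBase.
std::uint32_t Rebase(std::uint32_t address, std::uint32_t preferredBase,
                     std::uint32_t actualBase) {
    assert(address >= preferredBase);
    return address - preferredBase + actualBase;
}
```

RvaFromAddress(0x39B20, 0x30000) gives 0x9B20, the rvaIAT value, and Rebase(0x401034, 0x400000, 0x30000) gives 0x31034, the address the first call really jumps to.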

Combined with the code at 00031023, we can already draw the following conclusions:

1. Each rvaIAT entry holds an address located in the code section, and CALL DWORD PTR DS:[xxx] jumps to that code. (xxx is the address of an entry in rvaIAT.)

2. The code reached through CALL DWORD PTR DS:[xxx] has a uniform format:

1. Save the address of the rvaIAT entry in EAX.

2. Jump to a common address (31023 in this program).

3. Call a function from there (3103E in this program; its code is not shown yet and is analyzed in detail later). This function does all the work of delay load and writes the correct function address into the corresponding rvaIAT entry.

4. After the call returns, JMP EAX. EAX holds the function address returned by that function, and by now the corresponding rvaIAT entry also holds the correct function address.
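The flow in these conclusions can be modelled in a few lines of portable C++. This is a simplified model, not the real helper: RealExport1, DelayLoadHelper, Export1Thunk and CallExport1 are hypothetical names, and the "resolution" is a plain assignment where the real code would use LoadLibrary and GetProcAddress.

```cpp
#include <cassert>

using Fn = int (*)();

int RealExport1() { return 1; }     // stand-in for export1 inside the DLL
int Export1Thunk();                 // forward declaration of the thunk

Fn  g_iat[1] = { &Export1Thunk };   // one slot, initially pointing at the thunk
int g_helperCalls = 0;              // counts how often the helper ran

// Role of the function at 3103E: resolve the target and patch the slot.
Fn DelayLoadHelper(Fn* slot) {
    assert(slot != nullptr);
    ++g_helperCalls;
    *slot = &RealExport1;           // the real helper resolves this at run time
    return *slot;
}

// Role of the thunk at 31034: hand the slot address to the helper, then jump.
int Export1Thunk() {
    Fn target = DelayLoadHelper(&g_iat[0]);
    return target();                // corresponds to JMP EAX
}

// Role of CALL DWORD PTR DS:[39B20]: always an indirect call through the slot.
int CallExport1() { return g_iat[0](); }
```

The first CallExport1() goes through the thunk and the helper; every later call reads the patched slot and reaches RealExport1 directly, so g_helperCalls stays at 1.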

Next we focus on the code in 3103E:

This is basically what happens on the first call to a function in the DLL. The comments make most of it clear; there are still many places I do not fully understand, but the core is visible at a glance. Just as I was about to keep studying the parts I did not understand, I discovered that Microsoft provides the source code for this part T_T: (delayhlp.cpp)

Comparing it with the assembly, let's see what new information we get:

1. The last parameter of RaiseException points to a DelayLoadInfo. You can get the relevant information from it in an exception filter.

2. The 39B4C that puzzled us for so long finally makes sense! It is a hook, __pfnDliNotifyHook2, provided for us: we can define this function in our own code, and it is called at specific times. There is also a second hook, __pfnDliFailureHook2, which corresponds to 39B48 in the assembly code. The purpose of these two hooks is described later.

3. Remember the series of conditional jumps around 000311DC? They determine whether binding information exists; if everything checks out, the bound address is used directly and GetProcAddress is not needed. To make sure the bound address is valid, a series of conditions must hold:

1. rvaBoundIAT and dwTimeStamp are both non-zero

2. The signature is IMAGE_NT_SIGNATURE, the timestamps match, and the load address equals the preferred load address

So where does the bound address come from? Remember rvaBoundIAT? Here the assembly is simpler to read than the C++: MOV EBX,DWORD PTR DS:[ECX+EAX], where EAX is the address of rvaBoundIAT and ECX is the offset (the offset of the function's slot from the start of rvaIAT).

4. If the DLL fails to load or the function address cannot be found, the program does not simply crash; an exception is raised for the developer to handle.

5. If the DLL may be unloaded (with __FUnloadDelayLoadedDLL2), some data structures need to be prepared: new ULI(pidd);
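For point 4, it helps to know what the raised exception codes look like. The sketch below rebuilds them the way the VcppException macro in delayimp.h composes them (severity, Visual C++ facility 0x6D, Win32 error). The constant names with the k prefix, VcppExceptionCode and Classify are hypothetical names of mine; the numeric values are the ones from the Windows headers.

```cpp
#include <cassert>
#include <cstdint>

const std::uint32_t kSeverityError     = 0xC0000000u;  // ERROR_SEVERITY_ERROR
const std::uint32_t kFacilityVisualCpp = 0x6Du;        // FACILITY_VISUALCPP
const std::uint32_t kErrorModNotFound  = 126u;         // ERROR_MOD_NOT_FOUND
const std::uint32_t kErrorProcNotFound = 127u;         // ERROR_PROC_NOT_FOUND

// Composes severity | facility << 16 | error, like VcppException(sev, err).
std::uint32_t VcppExceptionCode(std::uint32_t severity, std::uint32_t error) {
    assert(error <= 0xFFFFu);  // the error code occupies the low 16 bits
    return severity | (kFacilityVisualCpp << 16) | error;
}

enum class DelayLoadFailure { NotDelayLoad, ModuleNotFound, ProcNotFound };

// What an exception filter might do with the code it receives.
DelayLoadFailure Classify(std::uint32_t code) {
    if (code == VcppExceptionCode(kSeverityError, kErrorModNotFound))
        return DelayLoadFailure::ModuleNotFound;
    if (code == VcppExceptionCode(kSeverityError, kErrorProcNotFound))
        return DelayLoadFailure::ProcNotFound;
    return DelayLoadFailure::NotDelayLoad;
}
```

A missing DLL therefore shows up as 0xC06D007E and a missing export as 0xC06D007F; in a real filter the code would come from GetExceptionCode().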

By now we have a fairly clear picture of how delay load works. So what happens the second time we call export1? CALL DWORD PTR DS:[39B20] is still executed, but this time 39B20 already holds the real address of export1, so the call goes straight to the function.

Looking back from the beginning, a few things have not been covered yet:

1. __pfnDliNotifyHook2 & __pfnDliFailureHook2

2. Unload ...

3. How do we get binding to work? Obviously there is no binding in this example.

Next we cover these three topics:

__pfnDliNotifyHook2 & __pfnDliFailureHook2

The example above already gives a rough idea; now let's combine it with the implementation of __delayLoadHelper2 to see what we can customize with a delay hook:

1. dliStartProcessing: if the hook returns a function address here, the helper jumps directly to the end of __delayLoadHelper2.

2. dliNotePreLoadLibrary: sent before LoadLibrary is called. At this point we can load the DLL ourselves and return its handle; if we return 0, __delayLoadHelper2 calls LoadLibrary.

3. dliNotePreGetProcAddress: when we receive this notification we can look up the function address ourselves. If we return 0, __delayLoadHelper2 does it.

4. dliNoteEndProcessing: all the work is done and __delayLoadHelper2 is about to return.

The usage of __pfnDliFailureHook2 is similar:

1. dliFailLoadLib: the LoadLibrary call made by __delayLoadHelper2 failed. Here we can try again to load the DLL or do some error handling.

2. dliFailGetProc: the GetProcAddress call made by __delayLoadHelper2 failed. The same options apply.
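The order in which the helper consults the notify hook can be modelled without Windows headers. This is a deliberately simplified model, not delayhlp.cpp: the enum repeats the notification names in the order of delayimp.h, while Address, Hook, LoggingHook and ModelHelper are hypothetical names, the failure hook is left out, and the two default arguments stand in for the results of LoadLibrary and GetProcAddress.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Notification codes, in the same order as the enum in delayimp.h.
enum DliNotify : unsigned {
    dliStartProcessing,
    dliNotePreLoadLibrary,
    dliNotePreGetProcAddress,
    dliFailLoadLib,
    dliFailGetProc,
    dliNoteEndProcessing
};

using Address = std::uintptr_t;        // stands in for HMODULE / FARPROC
using Hook    = Address (*)(unsigned notify);

std::vector<unsigned> g_seen;          // notifications observed by the hook

// Example hook: record every notification and override nothing.
Address LoggingHook(unsigned notify) {
    g_seen.push_back(notify);
    return 0;                          // 0 = let the helper do the work itself
}

// Simplified model of the order in which the helper asks the hook.
Address ModelHelper(Hook hook, Address defaultModule, Address defaultProc) {
    Address proc = hook(dliStartProcessing);
    if (proc == 0) {
        Address module = hook(dliNotePreLoadLibrary);
        if (module == 0) module = defaultModule;  // LoadLibrary in the real helper
        assert(module != 0);
        proc = hook(dliNotePreGetProcAddress);
        if (proc == 0) proc = defaultProc;        // GetProcAddress in the real helper
    }
    hook(dliNoteEndProcessing);
    return proc;
}
```

With LoggingHook installed, one resolution produces the sequence start, pre-LoadLibrary, pre-GetProcAddress, end. In a real program the hook has the signature FARPROC WINAPI hook(unsigned dliNotify, PDelayLoadInfo pdli) and is installed by defining __pfnDliNotifyHook2.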

UNLOAD

By default, delay-loaded DLLs do not support unloading. What does that mean?

1. FreeLibrary must not be used, because FreeLibrary does not reset the function addresses. The next call to a function in the DLL can cause an access violation.

2. Since FreeLibrary cannot be used, there is no way to unload. This is the default behavior.

Of course, Microsoft is not that careless: delayhlp.cpp contains a function named __FUnloadDelayLoadedDLL2 that exists specifically to unload a delay-loaded DLL, but using it in your own program requires a linker switch: /delay:unload. Without this switch, calling __FUnloadDelayLoadedDLL2 does nothing. You also need #include <delayimp.h> and #include <windows.h> so that the code compiles.

Let's see what __FUnloadDelayLoadedDLL2 does:

1. Walk all ImgDelayDescr entries to find the one whose DLL name matches.

2. If that entry's rvaUnloadIAT is not 0, overwrite rvaIAT with the data in rvaUnloadIAT.

A small question: what does rvaUnloadIAT store? Interested readers can try it for themselves, but we can also work it out without trying. After an unload we can still call functions in the delay-loaded DLL, so the overwritten rvaIAT must be identical to the original rvaIAT (the state in which no function in the DLL has been called yet). In other words, rvaUnloadIAT stores a copy of rvaIAT as it is before the program is loaded.
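The overwrite step can be pictured with two small tables. This is only a model of the idea: DelayTables, MakeTables and RestoreIat are hypothetical names, and the tables hold plain 32-bit values where the real ones hold thunk and function addresses.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

struct DelayTables {
    std::vector<std::uint32_t> iat;        // live table (rvaIAT), patched as functions resolve
    std::vector<std::uint32_t> unloadIat;  // copy of the initial table (rvaUnloadIAT)
};

// Builds both tables from the initial thunk addresses.
DelayTables MakeTables(const std::vector<std::uint32_t>& thunks) {
    return DelayTables{thunks, thunks};
}

// Models the overwrite step: every slot goes back to its initial thunk address.
void RestoreIat(DelayTables& tables) {
    assert(tables.iat.size() == tables.unloadIat.size());
    tables.iat = tables.unloadIat;
}
```

After a slot has been patched with a resolved address, RestoreIat puts the thunk address back, so the next call goes through the delay-load helper again and reloads the DLL.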

Binding

There is a lot that could be said about binding. Since it is not the focus of this article, here is just a short introduction:

We know that normally, when a function is imported from a DLL, its address is looked up at load time and written into the IAT. This inevitably makes loading take longer. What binding does is move this work ahead of time. That raises a problem: the DLL's load address is not known in advance, so how can the correct function address be obtained? In fact, binding has a precondition: it only takes effect if the DLL is loaded at the load address defined in its PE header. Otherwise, the function addresses are looked up again by name through the INT at load time. A series of other checks is also required, such as the DLL's timestamp, because function addresses may change when the DLL is recompiled, which invalidates the earlier binding.
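The preconditions for using a bound address can be written down as a single predicate. This is a sketch of the checks listed in this article, not the code from delayhlp.cpp: BoundCheckInput, CanUseBoundAddress and PickAddress are hypothetical names, and the inputs are passed in as plain numbers where the real helper reads them from the descriptor and from the loaded DLL's PE headers.

```cpp
#include <cassert>
#include <cstdint>

const std::uint32_t kImageNtSignature = 0x00004550u;  // "PE\0\0", IMAGE_NT_SIGNATURE

struct BoundCheckInput {
    std::uint32_t rvaBoundIAT;     // from the delay descriptor
    std::uint32_t descrTimeStamp;  // dwTimeStamp from the delay descriptor
    std::uint32_t ntSignature;     // signature found in the loaded DLL
    std::uint32_t imageTimeStamp;  // timestamp found in the loaded DLL
    std::uint32_t preferredBase;   // load address stored in the DLL's PE header
    std::uint32_t actualBase;      // where the DLL was really loaded
};

// True only if every condition for trusting the bound address holds.
bool CanUseBoundAddress(const BoundCheckInput& in) {
    return in.rvaBoundIAT != 0 && in.descrTimeStamp != 0 &&
           in.ntSignature == kImageNtSignature &&
           in.imageTimeStamp == in.descrTimeStamp &&
           in.actualBase == in.preferredBase;
}

// Uses the bound address when allowed, otherwise the address looked up by name.
std::uint32_t PickAddress(const BoundCheckInput& in, std::uint32_t boundAddress,
                          std::uint32_t resolvedAddress) {
    assert(resolvedAddress != 0);
    return (CanUseBoundAddress(in) && boundAddress != 0) ? boundAddress
                                                         : resolvedAddress;
}
```

If the DLL was rebuilt (timestamp mismatch) or loaded somewhere other than its preferred address, the predicate is false and the address found by name is used.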

So how do you bind? Microsoft provides a tool called bind. Use it as follows: bind -u Sample.exe delayLoad.dll

After running bind we can see a change in the DATA DIRECTORY: IMAGE_DIRECTORY_ENTRY_BOUND_IMPORT. If the DLL were not delay-loaded, I believe the binding would take effect after running the command. Unfortunately, since this example uses delay load, my attempt to bind with bind did not succeed (the timestamp in the ImgDelayDescr did not change, so the timestamp comparison fails and the function address ends up being obtained through GetProcAddress). I do not know the exact reason, so I will leave it here for now.

When it comes to load addresses, one more tool has to be mentioned: rebase. Its job is to adjust the preferred load address of each DLL so that every DLL can be loaded at its preferred address, which gives a certain amount of optimization. Microsoft usually recommends running rebase before bind, which ensures that bind is effective. There is a lot of related material that interested readers can look up and study on their own; it would be even better to write a program that measures how much bind and rebase speed up load time :)
