Detours: Binary Interception of Win32 Functions

Galen Hunt and Doug Brubacher

Microsoft Research

One Microsoft Way

Redmond, WA 98052

detours@microsoft.com

http://research.microsoft.com/sn/detours

Note: First publication rights for this paper are granted to USENIX. The authors retain copyright. Copying is permitted for non-commercial purposes such as education and research. First published in Proceedings of the 3rd USENIX Windows NT Symposium, Seattle, WA, July 1999.

 

Abstract

Innovative systems research depends on being able to instrument and extend existing operating system and application functionality easily. With access to the source code, new instrumentation or extensions can be inserted simply by rebuilding the operating system or application. In today's world of commercial software and binary-only releases, however, researchers seldom have access to the relevant source code.

We present Detours, a library for intercepting arbitrary Win32 functions on x86 machines. Detours intercepts Win32 functions by rewriting the in-memory image of the target function. The Detours package also contains utilities for attaching arbitrary DLLs and data sections (called payloads) to any Win32 binary file.

Although earlier researchers have used binary rewriting to add debugging and profiling code to applications, Detours is, to our knowledge, the first package on any platform that logically preserves the unmodified target function as a subroutine, callable through a trampoline, for use by the interception code. Our unique trampoline design is the key to extending existing binary software.

We describe our experience using Detours to build an automatic distributed partitioning system, to instrument and analyze the DCOM protocol stack, and to create a thunking layer for a COM-based OS API. Micro-benchmarks demonstrate the efficiency of the Detours library.

1 Introduction
Innovative systems research depends on being able to instrument and extend existing operating system and application functionality easily, whether that functionality lives in an application, a library, or an operating system DLL. Typical reasons for intercepting a function are to add functionality, to modify return values, or to insert code for debugging and profiling. With access to the source code, new instrumentation or extensions can be inserted simply by rebuilding the operating system or application. In today's world of commercial development and binary-only releases, however, researchers seldom have access to the relevant source code.

Detours is a library for intercepting arbitrary Win32 binary functions on x86 machines. The interception code is applied dynamically at run time. Detours replaces the first few instructions of the target function with an unconditional jump to a user-provided interception function (the detour). The instructions removed from the target function are preserved in a function called the trampoline, which consists of those instructions followed by an unconditional branch to the remainder of the target function. The interception function can either replace the target function or extend its behavior by calling the target function as a subroutine through the trampoline.

Interceptions are inserted at execution time. The code of the target function is modified in memory, not on disk, which makes it possible to intercept binary functions at a very fine granularity. For example, the procedures in a DLL can be intercepted (detoured) in one execution of an application while the same DLL runs unmodified in another application executing at the same time. Because the DLL file itself is not changed, the interception does not affect how the DLL is loaded into other processes. Unlike DLL re-linking or static redirection, the interception technique used in the Detours library works regardless of the method that application or system code uses to locate the target function.

While others have used binary rewriting for debugging or to inline instrumentation, Detours is a general-purpose package. To our knowledge, Detours is the first package on any platform that logically preserves the unmodified target function as a subroutine callable through the trampoline. Earlier systems logically prepended the interception code to the target, but did not make the original target available as an ordinary subroutine. Our unique trampoline design is crucial for extending existing binary software.

In addition to basic function interception, Detours also provides functions to edit the DLL import table of any binary, to attach arbitrary data sections to existing binaries, and to inject a DLL into either a new or a running process. Once injected into a process, the DLL can intercept any Win32 function, whether in the application or in the system libraries.

The next section describes how Detours works. Section 3 outlines how to use the Detours library. Section 4 describes alternative function interception techniques and evaluates Detours with a micro-benchmark. Section 5 details how Detours has been used to produce distributed applications from local applications, to quantify the cost of DCOM, to create a thunking layer for a new COM-based Win32 API, and to implement first-chance exception handling. We compare Detours with related work in Section 6 and summarize in Section 7.

2 Implementation
Detours provides three important sets of functionality: the ability to intercept arbitrary Win32 binary functions on x86 machines, the ability to edit the import tables of binary files, and the ability to attach arbitrary data sections to binary files.

We describe the implementation of each in turn.

2.1 Interception of Binary Functions
The Detours library makes it easy to intercept function calls, and the interception code is applied dynamically at run time. Detours replaces the first few instructions of the target function with an unconditional jump to a user-provided interception function. The instructions removed from the target function are preserved in a function called the trampoline, which consists of those instructions followed by an unconditional branch to the remainder of the target function.

When execution reaches the target function, control jumps directly to the user-supplied interception function. The interception function performs whatever preprocessing is appropriate. It can then return control directly to the caller, or it can call the trampoline function, which invokes the target function without interception. When the target function completes, it returns control to the interception function. The interception function performs any postprocessing and returns control to the caller. Figure 1 shows the logical flow of control for a function call with and without interception.
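
The control flow described above can be modeled in a few lines of Python, used here purely as a portable stand-in for machine code. This is an illustrative sketch, not Detours itself: the names `target`, `trampoline`, `detour`, `entry`, and `call` are hypothetical, and the dictionary `entry` merely plays the role of the patched function entry point.

```python
# Illustrative model only: real Detours patches machine code, not Python objects.
log = []

def target(x):
    """The original function (stands in for a Win32 API)."""
    log.append("target")
    return x * 2

# Before interception, callers reach the target through this entry point,
# and the trampoline is simply another route to the unmodified target.
trampoline = target
entry = {"fn": target}

def detour(x):
    """User-supplied interception function."""
    log.append("pre")          # preprocessing
    result = trampoline(x)     # call the original without interception
    log.append("post")         # postprocessing
    return result + 1          # the interception may also alter the result

def call(x):
    """What a caller sees: whatever currently sits at the entry point."""
    return entry["fn"](x)

before = call(5)       # not intercepted
entry["fn"] = detour   # "insert" the interception
after = call(5)        # intercepted
```

After the run, `log` records one plain call followed by a call wrapped in the pre- and postprocessing steps, which mirrors the two paths in Figure 1.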

Figure 1. Intercepted and unintercepted function calls.

The Detours library intercepts a target function by rewriting its binary image inside the process. For each target function, Detours actually rewrites two functions: the target function and the matching trampoline function. The trampoline can be allocated either statically or dynamically. A statically allocated trampoline always invokes the target function without interception. Before the interception is inserted, the static trampoline contains a single jump to the target function. After insertion, the trampoline contains the initial instructions of the target function and a jump to the remainder of the target function. Static trampolines are extremely useful to programmers writing interception code. For example, in Coign [7], calling Coign_CoCreateInstance is equivalent to calling the original CoCreateInstance function without interception. Coign's internal functions can call Coign_CoCreateInstance at any time to create a component instance without having to consider whether the original function has been rerouted by an interception.

Figure 2. Trampoline and target functions, before (left) and after (right) insertion of the interception.

Figure 2 shows the state before and after an interception is inserted. To intercept a target function, Detours first allocates memory for the dynamic trampoline (if no static trampoline is provided) and then makes both the target and the trampoline writable. Starting with the first instruction, Detours copies instructions from the target to the trampoline until at least five bytes have been copied (five bytes are enough to hold an unconditional jump instruction). If the target function is shorter than five bytes, Detours aborts and returns an error code. To copy instructions, Detours uses a simple table-driven disassembler. Detours then appends to the trampoline a jump to the first instruction of the target function that was not copied, so that execution continues in the remainder of the target. Next, Detours writes an unconditional jump to the interception function as the first instruction of the target function. Finally, Detours restores the original page permissions on the target and trampoline functions and flushes the CPU instruction cache by calling FlushInstructionCache.
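
The patching steps above can be sketched on a plain byte buffer. The following Python model is a simplification under stated assumptions, not the Detours implementation: process memory is a `bytearray`, the function name `insert_detour` and its parameters are hypothetical, and the list `insn_lengths` stands in for the table-driven disassembler. It also ignores page permissions, cache flushing, and copied instructions that contain relative addresses, which would need adjusting in real code.

```python
import struct

JMP_SIZE = 5  # x86 near jump: opcode 0xE9 plus a 32-bit relative displacement

def jmp_rel32(src, dst):
    """Encode `jmp dst` for an instruction placed at address `src`."""
    # The displacement is relative to the address of the next instruction.
    return b"\xE9" + struct.pack("<i", dst - (src + JMP_SIZE))

def insert_detour(mem, target, trampoline, detour, insn_lengths):
    """Patch `mem` in place and return the number of bytes moved.

    `insn_lengths` lists the lengths of the target's leading instructions;
    it stands in for the disassembler (an assumption of this sketch).
    """
    copied = 0
    for length in insn_lengths:
        if copied >= JMP_SIZE:
            break
        copied += length  # only whole instructions are ever copied
    if copied < JMP_SIZE:
        raise ValueError("target function is shorter than 5 bytes")

    # Trampoline = copied instructions + jump to the rest of the target.
    mem[trampoline:trampoline + copied] = mem[target:target + copied]
    tail = trampoline + copied
    mem[tail:tail + JMP_SIZE] = jmp_rel32(tail, target + copied)

    # The target now begins with a jump to the interception function.
    mem[target:target + JMP_SIZE] = jmp_rel32(target, detour)
    return copied

mem = bytearray(0x300)
# push ebp; mov ebp, esp; sub esp, 8; ret  (1 + 2 + 3 + 1 bytes)
mem[0x100:0x107] = bytes.fromhex("558bec83ec08c3")
copied = insert_detour(mem, target=0x100, trampoline=0x200, detour=0x280,
                       insn_lengths=[1, 2, 3, 1])
```

In this example six bytes are moved rather than five, because the third instruction straddles the five-byte boundary and must be copied whole.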

2.2 Payloads and DLL Import Editing
Although a number of tools exist for editing binary files [10, 12, 13, 17], most systems research does not need such heavy-handed access to binaries. Often it is enough to add an extra DLL or data section to an application or system binary file. In addition to function interception, the Detours library provides fully reversible support for attaching arbitrary data sections, called payloads, to Win32 binary files and for editing DLL import tables.

Figure 3 shows the basic structure of a Win32 Portable Executable (PE) binary file. The PE format is an extension of COFF (the Common Object File Format). A Win32 binary file consists of a DOS-compatible header, a PE header, a text section containing program code, a data section containing initialized data, an import table listing imported DLLs and functions, an export table listing the functions exported by the code, and debugging symbols. Apart from the two headers, every section of the file is optional and may be absent from a given binary.
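
As a small illustration of the two mandatory headers, the sketch below locates the PE header from the DOS header and reads the first two fields of the COFF file header. It is a minimal example, not part of Detours: the function name `read_pe_headers` is hypothetical, and the image is a synthetic, headers-only buffer built so that the code runs on any platform.

```python
import struct

def read_pe_headers(image):
    """Return (machine, number_of_sections) from a PE image's headers."""
    if image[:2] != b"MZ":
        raise ValueError("missing DOS header")
    # The field at offset 0x3C of the DOS header holds the PE header offset.
    (pe_offset,) = struct.unpack_from("<I", image, 0x3C)
    if image[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("missing PE signature")
    # The COFF file header follows the signature; its first two fields are
    # Machine and NumberOfSections (both 16-bit, little-endian).
    machine, nsections = struct.unpack_from("<HH", image, pe_offset + 4)
    return machine, nsections

# Build a tiny synthetic image (headers only) for demonstration.
image = bytearray(0x100)
image[0:2] = b"MZ"
struct.pack_into("<I", image, 0x3C, 0x80)
image[0x80:0x84] = b"PE\0\0"
struct.pack_into("<HH", image, 0x84, 0x014C, 3)  # 0x014C = Intel 386

machine, nsections = read_pe_headers(bytes(image))
```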

Figure 3. Structure of the Win32 PE executable file.

To modify a Win32 binary file, Detours creates a new .detours section between the export table and the debugging symbols. Note that debugging symbols must always come last in a Win32 binary file. The new section contains a Detours header record and a copy of the original PE header. If the import table is being modified, Detours creates a new import table, appends it to the copied PE header, and then modifies the original PE header to point to the new import table.

Finally, Detours writes any user payloads at the end of the .detours section and appends the debugging symbols to finish the file. Detours can reverse its modifications to the binary by restoring the original PE header saved in the .detours section and then removing the .detours section. Figure 4 shows the format of a Win32 binary file modified by Detours.

Creating a new import table serves two purposes. First, it preserves the original import table in case the programmer needs to reverse the modifications. Second, the new import table can contain renamed import DLLs and functions, or entirely new DLLs and functions. For example, Coign [7] uses Detours to insert an initial entry for coignrte.dll into each instrumented application. As the first entry in the application's import table, coignrte.dll is always the first DLL to run in the application's address space.

Figure 4. Format of a binary file modified by detours.

Detours provides functions for editing import tables, adding payloads, enumerating payloads, removing payloads, and rebinding binary files. Detours also provides routines for enumerating the binary files mapped into an address space and for locating payloads within those mapped binaries. Each payload is identified by a globally unique identifier (GUID). Coign uses Detours to attach per-application configuration data to application binaries.
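
The add, enumerate, and remove operations on GUID-tagged payloads can be illustrated with a toy container. The record layout below (a 16-byte GUID, a 32-bit length, then the data) is an assumption made for this sketch and is not the actual .detours section format; the function names and the two GUID values are likewise hypothetical.

```python
import struct
import uuid

def add_payload(section, guid, data):
    """Append one record: 16-byte GUID, 32-bit length, then the data."""
    return section + guid.bytes_le + struct.pack("<I", len(data)) + data

def enum_payloads(section):
    """Yield (guid, data) for every record in the section."""
    pos = 0
    while pos < len(section):
        guid = uuid.UUID(bytes_le=section[pos:pos + 16])
        (size,) = struct.unpack_from("<I", section, pos + 16)
        yield guid, section[pos + 20:pos + 20 + size]
        pos += 20 + size

def find_payload(section, guid):
    """Return the data stored under `guid`, or None if it is absent."""
    for found, data in enum_payloads(section):
        if found == guid:
            return data
    return None

def remove_payload(section, guid):
    """Return a copy of the section without the record tagged `guid`."""
    out = b""
    for found, data in enum_payloads(section):
        if found != guid:
            out = add_payload(out, found, data)
    return out

CONFIG = uuid.UUID("11111111-2222-3333-4444-555555555555")
PROFILE = uuid.UUID("aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee")

section = add_payload(b"", CONFIG, b"mode=profile")
section = add_payload(section, PROFILE, b"\x01\x02\x03")
```

Keying each record by GUID lets independent tools attach data to the same binary without agreeing on names or offsets in advance.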

For cases where interception code must be inserted into an application without modifying its binary files, Detours provides functions to inject a DLL into either a new or an existing process. To inject a DLL, Detours uses the VirtualAllocEx and WriteProcessMemory APIs to write a call to LoadLibrary into the target process, and then uses CreateRemoteThread to start a new thread that executes the written code. While the DLL is being loaded, its DllMain function runs inside the target process.
 
 
3 Using Detours
The code fragment in Figure 5 illustrates how the Detours library is used. User code must include the detours.h header file and link with the detours.lib library.

Figure 5. An example of intercepting a function.

Trampolines can be created either statically or dynamically. To intercept a target function with a static trampoline, the application must create the trampoline with the DETOUR_TRAMPOLINE macro. DETOUR_TRAMPOLINE takes two arguments: the prototype of the static trampoline and the name of the target function.

Note: For interception to work correctly, the target function, the trampoline function, and the interception function must all have exactly the same call signature, including the number of arguments and the calling convention. It is the responsibility of the interception function to pass the correct arguments when it calls the target function through the trampoline. This is intuitive, because the target function is simply a subroutine that the interception function may call.

Using the same calling convention ensures that register values are properly preserved and that the stack is correctly set up and cleaned up when the interception function calls the target function.

Interception of the target function is enabled by calling DetourFunctionWithTrampoline with two arguments: the trampoline and a pointer to the interception function. The target function is not passed as an argument because it is already encoded in the trampoline.

A dynamic trampoline is created by calling DetourFunction with two arguments: a pointer to the target function and a pointer to the interception function. DetourFunction allocates a new trampoline and inserts the appropriate interception code into the target function.

Static trampolines are very easy to use when the target function is available as a link symbol. When the target function is not visible at link time, a dynamic trampoline can be used instead. Often a pointer to the target function can be obtained from another function. When a pointer is not readily available, DetourFindFunction can locate the function, provided that it is either exported from a known DLL or described by debugging symbols for the binary that contains it.

DetourFindFunction accepts two arguments: the name of the binary and the name of the function. It returns a pointer to the function if the function is found, and NULL otherwise. DetourFindFunction first tries to locate the function with the Win32 LoadLibrary and GetProcAddress APIs. If the function is not found in the export table of the DLL, DetourFindFunction uses the ImageHlp library to search the available debugging symbols. (The debugging symbols for Windows itself must be installed separately; see the Windows user diagnostic support information for details.) The function pointer returned by DetourFindFunction can be passed to DetourFunction to create a dynamic trampoline.
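
The two-stage lookup described above can be modeled as follows. This is a sketch of the search order only: the tables `EXPORTS` and `DEBUG_SYMBOLS`, the module name `example.dll`, the function names, and the addresses are all hypothetical data, and `find_function` is not the real DetourFindFunction.

```python
# Hypothetical data standing in for a DLL's export table and its symbol file.
EXPORTS = {"example.dll": {"PublicFn": 0x1000}}
DEBUG_SYMBOLS = {"example.dll": {"PublicFn": 0x1000, "InternalFn": 0x1480}}

def find_function(module, name):
    """Return the function's address, or None if it cannot be located."""
    # Stage 1: the export table (what LoadLibrary/GetProcAddress would search).
    address = EXPORTS.get(module, {}).get(name)
    if address is not None:
        return address
    # Stage 2: debugging symbols, if any are available for the module.
    return DEBUG_SYMBOLS.get(module, {}).get(name)
```

In this model `InternalFn` is reachable only through the second stage, which is the case the symbol search exists to cover.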

Interception of a target function is removed by calling DetourRemoveTrampoline.

Note that because the functions in the Detours library modify code in the application's address space, it is the programmer's responsibility to ensure that no other thread is executing in that address space while an interception is inserted or removed. A simple way to guarantee single-threaded execution is to call the Detours functions from a DllMain routine while the DLL is being loaded.

4 Evaluation
Several alternative techniques exist for intercepting function calls, including:

Call replacement in application source code: calls to the target function are replaced with calls to the interception function by modifying the application's source code. The main drawback of this technique is that it requires source code.

Call replacement in application binary code: calls to the target function are replaced with calls to the interception function by modifying the application's binary file. Although this technique does not require source code, it does require identifying every applicable call site. That in turn requires symbolic information that is not generally available for binary software.

DLL redirection: if the target function resides in a DLL, the import entries in the binary can be modified to point to an interception DLL. Redirection can be achieved either by replacing the name of the original DLL in the import table before the application loads, or by replacing the function addresses in the indirect import jump table after loading [2]. Unfortunately, redirecting through the application's import table fails to intercept calls made inside the DLL itself and calls through function pointers obtained with LoadLibrary and GetProcAddress.

Breakpoint trapping: rather than replacing the DLL, the target function can be intercepted by inserting a debugging breakpoint into it. The debugging exception handler can then invoke the interception function. The main drawback of this technique is that a debugging exception suspends all of the application's threads. In addition, the debugging exception must be caught in a second operating system process. Interception through breakpoint trapping therefore carries a high performance penalty.
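
The limitation of DLL redirection noted in the list above can be made concrete with a small model. The sketch below is illustrative only: the dictionaries `exports` and `iat` stand in for a DLL's export table and the application's import address table, and every function name is hypothetical.

```python
# Illustrative model only; the names below are hypothetical.
def real_sleep(ms):
    return f"slept {ms}"

def detour_sleep(ms):
    return f"intercepted {ms}"

# The DLL's export table, and the application's import address table (IAT),
# which the loader fills in from the exports.
exports = {"Sleep": real_sleep}
iat = {"Sleep": exports["Sleep"]}

def app_static_call(ms):
    """A call compiled against the import table goes through the IAT."""
    return iat["Sleep"](ms)

def get_proc_address(name):
    """Model of a run-time lookup: consults the exports, not the IAT."""
    return exports[name]

def dll_internal_call(ms):
    """A call made inside the DLL itself never touches the app's IAT."""
    return real_sleep(ms)

# DLL redirection: overwrite the IAT slot after load.
iat["Sleep"] = detour_sleep

static_result = app_static_call(10)
dynamic_result = get_proc_address("Sleep")(10)
internal_result = dll_internal_call(10)
```

Only the statically bound call is redirected; the other two paths still reach the original function. Patching the first instructions of the target itself, as Detours does, catches all three.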

Table 1 lists the time taken to intercept an empty function and the CoCreateInstance API. The measurements were taken on a Pentium Pro machine. Times are shown without interception, with call replacement, with DLL redirection, with the Detours library, and with breakpoint trapping. As the table shows, interception with the Detours library adds only minimal overhead (less than 400 nanoseconds in the worst case).

Table 1. Comparison of interception techniques by time taken.

5 Experience
Over the past two years, the Detours library has been used extensively to instrument and extend Win32 applications and the Windows NT operating system.

Detours was originally developed for the Coign Automatic Distributed Partitioning System [7]. Coign converts local desktop applications built from COM components into distributed client/server applications. During profiling, Coign uses Detours to intercept calls to the COM instantiation functions, such as CoCreateInstance. The interception functions call the original library functions through trampolines and wrap the output interface pointers in an additional instrumentation layer (see [8]). This instrumentation layer is used to determine how the application's components should be distributed across the network. During distributed execution, a new Coign runtime intercepts the COM instantiation functions and reroutes the calls to the appropriate machines. In essence, Coign extends the COM library to support flexible remote invocation.

Whereas DCOM supports remote invocation for only a few COM instantiation functions, Coign supports remote invocation for about 50 COM functions through Detours-based extensions. Coign uses the DLL redirection functions of Detours to attach a runtime loader to application binaries, and uses payloads to attach profiling data to application binaries.

Some of our colleagues have also used Detours to instrument the user-mode portion of the DCOM protocol stack, including the marshaling proxies, the DCOM runtime, the RPC runtime, the WinSock runtime, and the marshaling stubs [11]. The resulting analysis was used to drive a re-architecture of DCOM for fast user-mode networks. They could instead have used the source code to build a special profiling version of DCOM, but such a version would have been tied to one particular version of DCOM and would have been shared by every DCOM application on the profiling machine. With binary instrumentation based on Detours, the profiling tool can attach to any Windows NT 4 build of DCOM and affects only the process being profiled.

In another extension exercise, Detours was used to create a thunking layer for COP (the Component-based Operating System Proxy) [14]. COP is a COM-based version of the Win32 API. COP-aware applications access operating system functionality through COM interfaces such as IWin32FileHandle. Because the COP interfaces are distributable through DCOM, a COP application can use operating system resources, including file systems, mice, keyboards, displays, and registries, from any machine on the network. To support existing (legacy) applications, COP uses interception functions to capture all calls to the Win32 APIs. The application's native API calls are converted into calls on COP interfaces. At the lowest level, the COP implementation communicates with the underlying operating system through trampoline functions. No modification of the application's binary is required: at load time, the COP DLL is injected into the application's address space with the Detours injection functions. With the simple interception that Detours provides, producing this large extension of the Win32 API was straightforward.

Finally, to support a software distributed shared memory (SDSM) system, we implemented a first-chance exception filter for Win32 structured exception handling. The Win32 API contains a function, SetUnhandledExceptionFilter, through which an application can specify an exception filter to be invoked when no other exception filter handles the exception. For applications such as SDSM systems, the programmer would instead like to install a first-chance exception filter, which can remove the page faults caused by the SDSM's manipulation of virtual memory (VM) page permissions. Windows NT does not provide such a first-chance exception filter mechanism. A simple interception captures the exception entry point from kernel mode to user mode (KiUserExceptionDispatcher). With only a few lines of code, the interception function calls a user-provided first-chance exception filter and then, if the exception is not handled, forwards it to the default exception mechanism through the trampoline.
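
The dispatch logic of such a first-chance filter can be modeled briefly. This Python sketch shows only the decision structure: the exception records are plain dictionaries, `default_dispatcher` stands in for the original dispatch path reached through the trampoline, and the set `sdsm_pages` and all function names are hypothetical.

```python
# Illustrative model only; all names are hypothetical stand-ins.
handled_by = []

def default_dispatcher(exc):
    """Stands in for the original exception dispatch path."""
    handled_by.append(("default", exc["code"]))

# The trampoline reaches the original dispatcher without interception.
trampoline = default_dispatcher

ACCESS_VIOLATION = 0xC0000005  # STATUS_ACCESS_VIOLATION

def first_chance_filter(exc):
    """User filter: claim only faults on pages the SDSM layer manages."""
    return exc["code"] == ACCESS_VIOLATION and exc["address"] in sdsm_pages

def dispatcher_detour(exc):
    """Interception function installed over the dispatcher entry point."""
    if first_chance_filter(exc):
        handled_by.append(("sdsm", exc["code"]))
        return              # fault resolved by the filter
    trampoline(exc)         # unhandled: fall back to the default path

sdsm_pages = {0x1000, 0x2000}
dispatcher_detour({"code": ACCESS_VIOLATION, "address": 0x1000})
dispatcher_detour({"code": ACCESS_VIOLATION, "address": 0x9000})
dispatcher_detour({"code": 0xC0000094, "address": 0x1000})  # integer divide by zero
```

Only the first exception is consumed by the filter; the other two fall through to the default mechanism, so existing handlers in the application keep working.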

6 Related Work
Detours builds on earlier work in code patching. To instrument a function, an unconditional branch or jump is inserted at the desired point in the target function. The target code overwritten by the jump is moved to a code patch. The code patch consists of the instrumentation code, or a call to it, followed by the instructions that were moved to make room for the unconditional branch and a jump to the first unmodified instruction of the target function. Logically, a code patch can be prepended to a function, inserted at any point within it, or appended to it.

A code patch resumes execution of the target code through some fixed mechanism. Our technique instead hands full control to the interception function, which can call the original target function through the trampoline whenever appropriate. The trampoline gives instrumentation complete freedom, because the original target function can be called at any time as an ordinary subroutine with the same calling convention.

Code patching has existed since the dawn of digital computing [3-5, 9, 15]. Code patches have been used to insert debugging and instrumentation code. In the past, code patching was generally regarded as a more practical way to update software than recompiling the entire application. In addition to debugging and instrumentation, Detours has been used to extend the functionality of existing systems [7, 14].

Although recent systems have extended code patching to parallel applications [1] and operating system kernels [16], Detours is, to our knowledge, the only code patching system that preserves the target function as a callable subroutine. The interception function replaces the target function, but it can invoke the target through the trampoline wherever appropriate. Our unique trampoline design makes it easy to extend the functionality of existing binary code.

Recent research has produced a class of binary rewriting tools including ATOM [13], Etch [12], EEL [10], and Morph [17]. In general, these tools take an application binary and an instrumentation script as input. The instrumentation script passes over the binary, inserting code between instructions, basic blocks, or functions. The output is a new, instrumented binary. Unlike these earlier systems, DyninstAPI [6] can modify applications dynamically.

The biggest advantage of Detours over binary rewriting tools is its size. Detours adds less than 18 KB to an instrumentation package, whereas binary rewriting tools add considerably more. The cost of this small size is that Detours cannot insert code between instructions or basic blocks. Binary rewriting tools have sophisticated features, such as free-register discovery, that allow instrumentation to be inserted between arbitrary instructions. Detours instead relies on calling conventions to preserve register values. Although binary rewriting tools support inserting code before and after instructions, basic blocks, and functions, they do not support calling the unmodified target function as a subroutine.

7 Conclusions
The Detours library provides an important set of tools for the systems researcher. Interception with Detours is fast, flexible, and friendly. Intercepting CoCreateInstance adds less than 3% overhead, an order of magnitude less than breakpoint trapping. The Detours library is also small: the compiled library is under 40 KB, and it adds less than 18 KB to a user's instrumentation code.

Unlike DLL redirection, the Detours library intercepts both statically and dynamically bound function calls. The Detours library is also more flexible than DLL redirection or direct modification of application code: interception of any function can be enabled selectively, for each process, at execution time.

Our unique trampoline design preserves the semantics of the original target function and makes the unmodified target available to the interception function as a subroutine. With interception functions and trampolines, it is easy to produce compelling system extensions without access to source code and without recompiling the underlying binary files. Detours makes possible a new generation of systems research on the Windows NT platform.

References
[1] Aral, Ziya, Ilya Gertner, and Greg Schaffer. Efficient Debugging Primitives for Multiprocessors. Proceedings of the Third International Conference on Architectural Support for Programming Languages and Operating Systems, pp. 87-95. Boston, MA, April 1989.

[2] Balzer, Robert and Neil Goldman. Mediating Connectors. Proceedings of the 19th IEEE International Conference on Distributed Computing Systems Workshop, pp. 73-77. Austin, TX, June 1999.

[3] Digital Equipment Corporation. DDT Reference Manual, 1972.

[4] Evans, Thomas G. and D. Lucille Darley. DEBUG - An Extension to Current Online Debugging Techniques. Communications of the ACM, 8(5), pp. 321-326, May 1965.

[5] Gill, S. The Diagnosis of Mistakes in Programmes on the EDSAC. Proceedings of the Royal Society, Series A, 206, pp. 538-554, May 1951.

[6] Hollingsworth, Jeffrey K. and Bryan Buck. DyninstAPI Programmer's Guide, Release 1.2. Computer Science Department, University of Maryland, College Park, MD, September 1998.

[7] Hunt, Galen C. and Michael L. Scott. The Coign Automatic Distributed Partitioning System. Proceedings of the Third Symposium on Operating System Design and Implementation (OSDI '99), pp. 187-200. New Orleans, LA, February 1999. USENIX.

[8] Hunt, Galen C. and Michael L. Scott. Intercepting and Instrumenting COM Applications. Proceedings of the Fifth Conference on Object-Oriented Technologies and Systems (COOTS '99), pp. 45-56. San Diego, CA, May 1999. USENIX.

[9] Kessler, Peter. Fast Breakpoints: Design and Implementation. Proceedings of the ACM SIGPLAN '90 Conference on Programming Language Design and Implementation, pp. 78-84. White Plains, NY, June 1990.

[10] Larus, James R. and Eric Schnarr. EEL: Machine-Independent Executable Editing. Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation, pp. 291-300. La Jolla, CA, June 1995.

[11] Li, Li, Alessandro Forin, Galen Hunt, and Yi-Min Wang. High-Performance Distributed Objects over a System Area Network. Proceedings of the Third USENIX NT Symposium. Seattle, WA, July 1999.

[12] Romer, Ted, Geoff Voelker, Dennis Lee, Alec Wolman, Wayne Wong, Hank Levy, Brian Bershad, and J. Bradley Chen. Instrumentation and Optimization of Win32/Intel Executables Using Etch. Proceedings of the USENIX Windows NT Workshop 1997, pp. 1-7. Seattle, WA, August 1997. USENIX.

[13] Srivastava, Amitabh and Alan Eustace. ATOM: A System for Building Customized Program Analysis Tools. Proceedings of the SIGPLAN '94 Conference on Programming Language Design and Implementation, pp. 196-205. Orlando, FL, June 1994.

[14] Stets, Robert J., Galen C. Hunt, and Michael L. Scott. Component-based Operating System APIs: A Versioning and Distributed Resource Solution. IEEE Computer, 32(7), July 1999.

[15] Stockham, T. G. and J. B. Dennis. FLIT - Flexowriter Interrogation Tape: A Symbolic Utility Program for the TX-0. Department of Electrical Engineering, MIT, Cambridge, MA, Memo 5001-23, July 1960.

[16] Tamches, Ariel and Barton P. Miller. Fine-Grained Dynamic Instrumentation of Commodity Operating System Kernels. Proceedings of the Third Symposium on Operating Systems Design and Implementation (OSDI '99), pp. 117-130. New Orleans, LA, February 1999. USENIX.

[17] Zhang, Xiaolan, Zheng Wang, Nicolas Gloy, J. Bradley Chen, and Michael D. Smith. System Support for Automatic Profiling and Optimization. Proceedings of the Sixteenth ACM Symposium on Operating System Principles. Saint-Malo, France, October 1997.
