Dynamic Code Modification (SMC) Using C/C++

Author: Orbit
E-mail: inte2000@163.com

Summary: Self-modifying code (SMC) technology encrypts the code or data in an executable file so that others cannot analyze the program statically with reverse-engineering tools, such as common disassemblers. The code and data are decrypted only while the program is running, so it can still execute and access its data normally. Computer viruses often use SMC to modify their executable code in memory, mutating or encrypting it to escape detection and removal by antivirus software or to hinder analysts. Because the technique requires reading and writing machine code in memory directly, it is mostly implemented in assembly language. This discourages many C/C++ programmers who would like to use SMC to protect their own software. To address this, this article presents several methods for locating machine instructions from C/C++, making it possible to implement dynamic code modification in C/C++.

Keywords: SMC; dynamic code modification; software encryption

1. What is SMC technology?

Self-modifying code (SMC) technology encrypts the code or data in an executable file so that others cannot analyze the program statically with reverse-engineering tools, such as common disassemblers. The code and data are decrypted only while the program is running, so it can still execute and access its data normally. Computer viruses often use SMC to modify their executable code in memory, mutating or encrypting it to escape detection and removal by antivirus software or to hinder analysts. Today, many software protection tools (packers, or "shell" programs) also use dynamic code modification to protect their own code against tracing by crackers. The following pseudocode shows a typical application of SMC:
Proc main:
    ............
    If the running conditions are met:
        Call DecryptProc (address of MyProc)   ; decrypt the function's code
        ........
        Call MyProc                             ; call the function
        ........
        Call EncryptProc (address of MyProc)   ; re-encrypt the code so a memory dump does not capture it in plain form
    ......
End main
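The decrypt, call, re-encrypt pattern in the pseudocode maps naturally onto a C++ scope guard. The following is a minimal sketch that operates on a data buffer rather than on code. It assumes a single-byte XOR transform. The names `xor_block` and `ScopedDecrypt` are hypothetical and do not come from the article. Applying the same idea to machine code additionally requires the code to sit in a writable section, as described in section 2.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical single-byte XOR transform. XOR is its own inverse, so the same
// routine can play the role of both DecryptProc and EncryptProc.
inline void xor_block(unsigned char* data, std::size_t size, unsigned char key) {
    for (std::size_t i = 0; i < size; ++i) data[i] ^= key;
}

// Decrypts a protected buffer when constructed and re-encrypts it when destroyed,
// mirroring "DecryptProc / call MyProc / EncryptProc" in the pseudocode.
class ScopedDecrypt {
public:
    ScopedDecrypt(unsigned char* data, std::size_t size, unsigned char key)
        : data_(data), size_(size), key_(key) {
        xor_block(data_, size_, key_);                   // decrypt on entry
    }
    ~ScopedDecrypt() { xor_block(data_, size_, key_); }  // re-encrypt on every exit path
    ScopedDecrypt(const ScopedDecrypt&) = delete;
    ScopedDecrypt& operator=(const ScopedDecrypt&) = delete;

private:
    unsigned char* data_;
    std::size_t size_;
    unsigned char key_;
};
```

Because re-encryption happens in the destructor, the buffer is encrypted again even if the protected call throws. This complements the exception-handling advice given later in the article.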

Using SMC in your own software can significantly improve its security. It protects private data and key code and makes the software considerably harder to crack. However, SMC requires reading and writing machine code in memory directly, along with a good understanding of assembly language and machine code, so it is usually implemented in assembly. Because assembly language is obscure and hard to master, many C/C++ programmers who would like to use SMC to protect their own programs are discouraged.

Is SMC possible only in assembly? In principle, any language that supports pointers and direct memory access can implement it, including high-level languages such as C/C++. This article uses C/C++ features such as direct access to function and variable addresses to implement several methods of dynamically encrypting and decrypting running code and data:

- First, it uses the structure of Windows executable files to encrypt and decrypt an entire code segment at run time.
- Next, it uses C/C++ function names to encrypt and decrypt a whole function.
- Finally, it inserts signature byte sequences into the code and locates code by searching for them. This gives an encryption and decryption method for an arbitrary code fragment.

These methods are described below.

2. Using SMC to encrypt and decrypt an entire code segment

The simplest way to use SMC in a program is to modify (encrypt) an entire data segment or code segment. First we need to clarify the concept of a "segment", which has two meanings here.

The first meaning refers to how a program is laid out in memory. Old 16-bit operating systems used segmented memory management. Code, data, and stack were kept in different segments, each accessed through a dedicated segment register; hence the terms code segment, data segment, and stack segment. 32-bit Windows introduced the flat 32-bit memory model, in which the distinction between segments is essentially meaningless, although the concept survives. The segment registers with the same names now hold "segment selectors". In practice they all cover the same flat address space, except FS, which Windows uses to point to per-thread data.

The second meaning is the section: a data structure in the Windows executable file on disk (a section in the PE file). Windows uses it to locate code and data when loading the file. To really understand segments, you therefore also need to understand the structure of Windows executable files and how Windows loads them into memory.

For its 32-bit Windows systems, Microsoft designed a new executable file format called "Portable Executable" (PE). PE files are used on all 32-bit Windows systems, including Windows 9x, Windows NT, Windows 2000, Windows XP, and Windows Server 2003, and future versions of Windows are expected to keep supporting the format. The PE format organizes the file's data as a linear structure. Figure 2-1 shows the layout of a standard PE file:

The file begins with an MS-DOS header and a block of DOS stub code. This part dates from the era when DOS and Windows coexisted: if the program is run under DOS, the stub is executed as an ordinary DOS program. A typical stub just prints a line such as "This program cannot be run in DOS mode" on the console, although the stub differs between compilers. At one time, programs that ran on both DOS and Windows were popular; they worked by replacing the DOS stub with a real DOS program.

After the DOS stub comes the PE content proper, laid out as follows:

- A 4-byte PE signature, "PE\0\0".
- The PE (file) header and the optional header. The optional header can be thought of as this file's options and parameters. Together, these two headers hold important information such as the number of sections, the timestamp, the preferred load (image) base, and the entry point.
- All the section headers, followed by all the section bodies.
- Possibly, at the end of the file, other miscellaneous information such as relocation data, debugging symbols, and line-number information.

This trailing information is not a required part of a PE file. For example, a normal release build has no debugging symbols or line-number information, so these are omitted from Figure 2-1.
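To make the header chain concrete, here is a minimal sketch that finds the PE signature in a file image held in memory. It follows the DOS header's `e_lfanew` field, which lives at offset 0x3C and gives the file offset of "PE\0\0". The function names are hypothetical. The code reads little-endian values by hand instead of using the winnt.h structures, so that it stays portable.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Reads a little-endian 32-bit value (PE files are little-endian).
static std::uint32_t read_u32(const std::vector<unsigned char>& b, std::size_t off) {
    return std::uint32_t(b[off]) | std::uint32_t(b[off + 1]) << 8 |
           std::uint32_t(b[off + 2]) << 16 | std::uint32_t(b[off + 3]) << 24;
}

// Returns the file offset of the "PE\0\0" signature, or 0 if the buffer does not
// look like a PE image. e_lfanew is stored at offset 0x3C of the DOS header.
std::size_t find_pe_signature(const std::vector<unsigned char>& b) {
    if (b.size() < 0x40 || b[0] != 'M' || b[1] != 'Z') return 0;  // "MZ" DOS signature
    std::size_t e_lfanew = read_u32(b, 0x3C);                       // skips the DOS stub
    if (e_lfanew + 4 > b.size()) return 0;
    if (std::memcmp(&b[e_lfanew], "PE\0\0", 4) != 0) return 0;
    return e_lfanew;
}
```

The file header, optional header, and section table then follow the signature at fixed or header-specified distances. The getsectionpointer() function later in this section relies on exactly that.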

Of all these headers, we care only about the section headers. A section header gives the section's position and length in the file and its relative position once mapped into memory. Modifying code in memory requires exactly this: the read/write address and the length of the region. Here is the definition of the section header in winnt.h:
typedef struct _IMAGE_SECTION_HEADER {
    BYTE  Name[IMAGE_SIZEOF_SHORT_NAME];
    union {
        DWORD PhysicalAddress;
        DWORD VirtualSize;
    } Misc;
    DWORD VirtualAddress;
    DWORD SizeOfRawData;
    DWORD PointerToRawData;
    DWORD PointerToRelocations;
    DWORD PointerToLinenumbers;
    WORD  NumberOfRelocations;
    WORD  NumberOfLinenumbers;
    DWORD Characteristics;
} IMAGE_SECTION_HEADER, *PIMAGE_SECTION_HEADER;

Of the fields in this structure, we care about four: Name, VirtualSize, VirtualAddress, and Characteristics.

- Name is the section name, an 8-byte field such as ".text" or ".data". It is not NUL-terminated when the name uses all 8 bytes. A leading "." is not required; it is only a Microsoft compiler convention that many other compilers do not follow. The section name is essential for modifying code and data directly in memory, because the section header is located by searching for the name string.
- VirtualSize is the actual length of the section. It differs from SizeOfRawData, which is the size after file alignment. PE files are usually aligned on 200h (512) byte boundaries in the file, so SizeOfRawData is a multiple of 200h. The alignment Windows uses in memory is not necessarily 200h, so the section length should be taken from VirtualSize.
- VirtualAddress is the relative virtual address (RVA) of the section in memory. Adding the program's load base address to the RVA gives the section's actual address.
- Characteristics is needed to make the section writable, because Windows does not allow writes to a read-only section.

The Characteristics field is a combination of flags. The common flags and their meanings are shown in the following table:

Flag          Meaning
0x00000020    The section contains code
0x00000040    The section contains initialized data
0x00000080    The section contains uninitialized data
0x02000000    The section can be discarded (the process does not need it after the EXE file is loaded)
0x10000000    The section is shared
0x20000000    The section is executable
0x40000000    The section is readable
0x80000000    The section is writable

Table 2-1 Common section attribute flags

A code section generated by the compiler normally has the 0x00000020, 0x20000000, and 0x40000000 flags. To modify the code section, you must add the 0x80000000 flag; otherwise Windows raises an access-violation exception.
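The flag arithmetic can be checked directly. The following sketch uses the values from Table 2-1. The constants carry shortened names (winnt.h calls them `IMAGE_SCN_CNT_CODE`, `IMAGE_SCN_MEM_EXECUTE`, and so on) so that they do not clash if winnt.h is also included. The helper names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Section characteristic flags (values as in Table 2-1 / winnt.h).
constexpr std::uint32_t SCN_CNT_CODE    = 0x00000020;
constexpr std::uint32_t SCN_MEM_EXECUTE = 0x20000000;
constexpr std::uint32_t SCN_MEM_READ    = 0x40000000;
constexpr std::uint32_t SCN_MEM_WRITE   = 0x80000000;

// Typical compiler-generated code section: code + executable + readable.
constexpr std::uint32_t kDefaultText = SCN_CNT_CODE | SCN_MEM_EXECUTE | SCN_MEM_READ;

// True if a section with these characteristics may be written at run time.
constexpr bool is_writable(std::uint32_t c) { return (c & SCN_MEM_WRITE) != 0; }

// Adds the write flag (the "W" in an "ERW" attribute string).
constexpr std::uint32_t make_writable(std::uint32_t c) { return c | SCN_MEM_WRITE; }

static_assert(kDefaultText == 0x60000020u, "typical .text characteristics");
static_assert(make_writable(kDefaultText) == 0xE0000020u, "code + execute + read + write");
```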

With the PE format, Windows no longer has to scatter the parts of an executable around memory as it used to. The loader simply reads the PE file into memory in order. As a result, the in-memory image has nearly the same structure as the file on disk. The offsets of the sections differ slightly, however, because the file and the memory image use different alignment. This difference is illustrated below:

 

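The relationship between the two layouts can be expressed with a few section-header fields. Here is a minimal sketch, assuming the section is described by its VirtualAddress, VirtualSize, PointerToRawData, and SizeOfRawData. It converts an RVA inside the section to a file offset and rounds a size up to an alignment, the rounding that makes SizeOfRawData a multiple of the file alignment. `SectionInfo` and the function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// The subset of IMAGE_SECTION_HEADER fields needed for address conversion.
struct SectionInfo {
    std::uint32_t virtual_address;      // RVA of the section in memory
    std::uint32_t virtual_size;         // real size of the section's contents
    std::uint32_t pointer_to_raw_data;  // file offset of the section
    std::uint32_t size_of_raw_data;     // size in the file, rounded up to the file alignment
};

// Rounds value up to a multiple of alignment (alignment must be a power of two).
constexpr std::uint32_t align_up(std::uint32_t value, std::uint32_t alignment) {
    return (value + alignment - 1) & ~(alignment - 1);
}

// Converts an RVA inside the section to a file offset. Returns false if the RVA
// falls outside the part of the section that is actually stored in the file.
bool rva_to_file_offset(const SectionInfo& s, std::uint32_t rva, std::uint32_t& offset) {
    if (rva < s.virtual_address || rva - s.virtual_address >= s.size_of_raw_data) return false;
    offset = s.pointer_to_raw_data + (rva - s.virtual_address);
    return true;
}
```

The in-memory address is then simply image base + RVA, while the same byte in the file lives at PointerToRawData + (RVA - VirtualAddress).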
The above briefly introduces the PE format and how PE files are loaded; for more about PE files, see reference [2]. The following simple example shows how to encrypt and decrypt code dynamically by accessing memory directly.

First, note that you cannot encrypt the whole default code section generated by the compiler. The reason is obvious: the program's entry code also lives in the default code section. If the entire section were encrypted, the program would never get a chance to decrypt it and would fail to load and run. Different compilers name the default code section differently. Microsoft compilers usually put all code in a section named ".text", Borland's compilers call it "CODE", and other compilers may use other code generation policies. One thing is always the same: the section containing the program entry point cannot be encrypted as a whole. The strategy used in this article is therefore to put the important code or data to be protected into a separate section, then locate that section in memory and encrypt or decrypt it.

The first step is to tell the compiler to create a new code section and place the specified code in it. Compilers do this in different ways. This example uses Visual C++, where the preprocessor directive #pragma can add a code section to the program. First use the VC wizard to generate a Win32 application skeleton, then add the following code:

#include <windows.h>
#include <string.h>

#pragma code_seg(".scode")

int calcregcode(const char* pszUserName, char* pCodeBuf, int nBufSize)
{
    if (!pszUserName || !pCodeBuf)
        return 0;

    int nLength = (int)strlen(pszUserName);
    if (nLength <= 0 || nLength >= nBufSize)
        return 0;

    if (::IsBadReadPtr(pszUserName, nLength) || ::IsBadWritePtr(pCodeBuf, nBufSize))
        return 0;

    for (int i = 0; i < nLength; i++)
        pCodeBuf[i] = pszUserName[i] + 1; // for demonstration, only a shift transformation is performed

    pCodeBuf[nLength] = 0;

    return nLength;
}

#pragma code_seg()
#pragma comment(linker, "/SECTION:.scode,ERW")

The calcregcode() function generates a valid registration code from a user name. Such a function deserves protection, so it needs to be encrypted. The calcregcode() shown here is deliberately simple, for demonstration only: it forms the registration code by shifting each character of the user name to the next character.

The directives work as follows:

- #pragma code_seg(".scode") tells the compiler to create a code section named ".scode".
- #pragma code_seg() without arguments marks the end of the new section. The compiler places the code between the two directives in ".scode". The name ".scode" can be chosen freely, but its length (excluding the terminating '\0') must not exceed 8 bytes, a limit imposed by the PE file structure.
- #pragma comment(linker, "/SECTION:.scode,ERW") tells the linker to give the ".scode" section the attributes "ERW": executable, readable, and writable. Instead of using #pragma comment, you can also add "/SECTION:.scode,ERW" to the linker options.

Now compile the program and inspect it with a PE viewer. It contains a section named ".scode" with attributes 0xE0000020, the combination of 0x00000020 (code), 0x20000000 (executable), 0x40000000 (readable), and 0x80000000 (writable).
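Other compilers have their own way of doing this. As a sketch under the assumption of GCC or Clang targeting ELF (for example Linux), `__attribute__((section(...)))` plays the role of `#pragma code_seg`. The linker then supplies `__start_<name>`/`__stop_<name>` symbols for sections whose names are valid C identifiers, which gives the section bounds without walking any headers. The section name `scode` (no leading dot, so those symbols exist) and the function name `calc_reg_code` are hypothetical. Unlike the ERW linker option, this does not make the section writable.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// GCC/Clang (ELF) counterpart of #pragma code_seg(".scode"): put this function
// into a section named "scode". noinline keeps a real out-of-line copy there.
__attribute__((section("scode"), noinline))
int calc_reg_code(const char* user_name, char* code_buf, int buf_size) {
    if (!user_name || !code_buf) return 0;
    int length = static_cast<int>(std::strlen(user_name));
    if (length <= 0 || length >= buf_size) return 0;
    for (int i = 0; i < length; ++i)
        code_buf[i] = static_cast<char>(user_name[i] + 1);  // same demo shift as calcregcode()
    code_buf[length] = '\0';
    return length;
}

// Section bounds provided by the GNU-compatible linker for the "scode" section.
extern "C" const char __start_scode[];
extern "C" const char __stop_scode[];
```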

 

With a readable and writable code section in place, the next problem is how to locate and modify it while the program runs. This requires knowing where the PE file ends up in memory after loading. When Windows loads an executable, its virtual memory manager maps it into a separate 4 GB address space. The application can use only part of that space; the rest is reserved for the operating system. Addresses in the application refer to this virtual address space. The whole PE file is mapped into one region of it, whose starting position is called the image base address. This is also a virtual address, distinct from the physical address in the memory hardware. Windows provides an API for obtaining a module's base address, GetModuleHandle(), whose prototype is:

HMODULE GetModuleHandle(LPCTSTR lpModuleName);

The lpModuleName parameter specifies the module name; to get the base address of the current executable, simply pass NULL. The return type HMODULE looks a bit mysterious, but it can be cast to a void pointer, and what it points to is exactly the base address we need.

Once we have the image base, we can walk the section table according to the PE structure, find the section named ".scode", and read its VirtualAddress. VirtualAddress is only an offset relative to the image base, so the real position of the ".scode" section is the image base plus VirtualAddress. The size of the ".scode" section comes from its VirtualSize field. This is the size before alignment, that is, the real size of the code, excluding the zero bytes added for alignment. With the PE structure described above, the lookup routine is not hard to write. Below is a general function that finds the virtual address and size of a given section:

BOOL getsectionpointer(void* pModuleBase, const char* lpszSection, void** ppPos, LPDWORD lpSize)
{
    IMAGE_DOS_HEADER* pDosHead;
    IMAGE_FILE_HEADER* pPEHead;
    IMAGE_SECTION_HEADER* pSection;

    *ppPos = NULL;
    *lpSize = 0;

    if (::IsBadReadPtr(pModuleBase, sizeof(IMAGE_DOS_HEADER)) || ::IsBadReadPtr(lpszSection, 8))
        return FALSE;

    // A name longer than 8 bytes cannot match any entry in the section table
    if (strlen(lpszSection) > IMAGE_SIZEOF_SHORT_NAME)
        return FALSE;

    char szSecName[16];
    memset(szSecName, 0, sizeof(szSecName));
    strncpy(szSecName, lpszSection, IMAGE_SIZEOF_SHORT_NAME);

    unsigned char* pszModuleBase = (unsigned char*)pModuleBase;
    pDosHead = (IMAGE_DOS_HEADER*)pszModuleBase;
    // Use e_lfanew to skip the DOS header and DOS stub and locate the PE signature
    DWORD Signature = *(DWORD*)(pszModuleBase + pDosHead->e_lfanew);
    if (Signature != IMAGE_NT_SIGNATURE) // "PE\0\0"
        return FALSE;

    // Locate the PE (file) header, which follows the 4-byte signature
    pPEHead = (IMAGE_FILE_HEADER*)(pszModuleBase + pDosHead->e_lfanew + sizeof(DWORD));
    int nSizeOfOptionHeader;
    if (pPEHead->SizeOfOptionalHeader == 0)
        nSizeOfOptionHeader = sizeof(IMAGE_OPTIONAL_HEADER);
    else
        nSizeOfOptionHeader = pPEHead->SizeOfOptionalHeader;

    BOOL bFind = FALSE;
    // Skip the file header and optional header to reach the section table
    pSection = (IMAGE_SECTION_HEADER*)((unsigned char*)pPEHead + sizeof(IMAGE_FILE_HEADER) + nSizeOfOptionHeader);
    for (int i = 0; i < pPEHead->NumberOfSections; i++)
    {
        if (!strncmp(szSecName, (const char*)pSection[i].Name, IMAGE_SIZEOF_SHORT_NAME)) // compare the section name
        {
            *ppPos = (void*)(pszModuleBase + pSection[i].VirtualAddress); // image base + RVA = actual virtual address
            *lpSize = pSection[i].Misc.VirtualSize; // actual (unaligned) size
            bFind = TRUE;
            break;
        }
    }

    return bFind;
}
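To experiment with the section-table walk without Windows, the same logic can run over a PE image held in a byte buffer. The following is a minimal portable sketch of what getsectionpointer() does. It reads the fields at their documented offsets: IMAGE_FILE_HEADER is 20 bytes, with NumberOfSections at +2 and SizeOfOptionalHeader at +16. Each IMAGE_SECTION_HEADER is 40 bytes, with VirtualSize at +8 and VirtualAddress at +12. `find_section` is a hypothetical name, and the function returns the RVA rather than an absolute address.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Little-endian readers (PE data is little-endian).
static std::uint16_t rd16(const unsigned char* p) {
    return static_cast<std::uint16_t>(p[0] | (p[1] << 8));
}
static std::uint32_t rd32(const unsigned char* p) {
    return std::uint32_t(p[0]) | std::uint32_t(p[1]) << 8 |
           std::uint32_t(p[2]) << 16 | std::uint32_t(p[3]) << 24;
}

// Walks the section table of a PE image in `img` and returns the RVA and
// VirtualSize of the section called `name` (at most 8 characters).
bool find_section(const std::vector<unsigned char>& img, const char* name,
                  std::uint32_t& rva, std::uint32_t& size) {
    std::size_t name_len = std::strlen(name);
    if (img.size() < 0x40 || name_len > 8) return false;
    char padded[8] = {0};                     // Name is NUL-padded to 8 bytes,
    std::memcpy(padded, name, name_len);      // not necessarily NUL-terminated
    std::size_t nt = rd32(&img[0x3C]);        // e_lfanew
    if (nt + 24 > img.size() || std::memcmp(&img[nt], "PE\0\0", 4) != 0) return false;
    const unsigned char* fh = &img[nt + 4];   // IMAGE_FILE_HEADER
    std::uint16_t nsec = rd16(fh + 2);        // NumberOfSections
    std::size_t table = nt + 4 + 20 + rd16(fh + 16);  // skip SizeOfOptionalHeader
    for (std::uint16_t i = 0; i < nsec; ++i) {
        std::size_t off = table + std::size_t(i) * 40;
        if (off + 40 > img.size()) return false;
        if (std::memcmp(&img[off], padded, 8) == 0) {
            size = rd32(&img[off + 8]);       // Misc.VirtualSize
            rva = rd32(&img[off + 12]);       // VirtualAddress
            return true;
        }
    }
    return false;
}
```

On a loaded module, adding the image base returned by GetModuleHandle(NULL) to `rva` gives the same address that getsectionpointer() computes.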

Although a good deal of preparation has gone into calcregcode(), calling it is no different from calling any other function, except that the ".scode" section must be decrypted first. The method in this article involves many direct memory operations, in particular reading and writing code that the program is about to execute, and these can raise exceptions. For example, if decryption fails, the program will execute unpredictable instructions. If you do not want your program to crash, you had better use exception handling. The following shows how calcregcode() is called:

try
{
    void* pSecAddr = NULL;
    DWORD dwSecSize = 0;
    char szBuff[128];
    HMODULE hImageBase = ::GetModuleHandle(NULL); // base address of the current executable

    BOOL bFind = getsectionpointer((void*)hImageBase, ".scode", &pSecAddr, &dwSecSize);
    if (!bFind || !pSecAddr)
        throw "Special section not found!";

    // Note: the decryption and encryption functions are important too. It is best to call them
    // some distance away from the call to calcregcode() to avoid being detected
    decryptblock(pSecAddr, dwSecSize, 0x5a); // decrypt the code section first

    calcregcode("system", szBuff, 128); // call the registration-code function

    encryptblock(pSecAddr, dwSecSize, 0x5a); // re-encrypt the code section after the call
}
catch (...)
{
    // exception handling; catch (...) catches access violations and other
    // structured exceptions only when the code is compiled with /EHa
}
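The article calls decryptblock() and encryptblock() but does not show their bodies. Here is a minimal sketch, assuming a single-byte XOR with the key passed in (0x5a in the example). The real helpers may use a different cipher. Single-byte XOR only defeats casual static disassembly; it is trivially reversible by anyone who finds the key.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical encryptblock(): XORs every byte of the block with the key.
void encryptblock(void* p, std::size_t size, unsigned char key) {
    unsigned char* bytes = static_cast<unsigned char*>(p);
    for (std::size_t i = 0; i < size; ++i) bytes[i] ^= key;
}

// Hypothetical decryptblock(): XOR is its own inverse, so decryption
// is the same transform as encryption.
void decryptblock(void* p, std::size_t size, unsigned char key) {
    encryptblock(p, size, key);
}
```

On a real ".scode" section, `p` and `size` would be the pSecAddr and dwSecSize values returned by getsectionpointer(). The section must be writable (the ERW attribute) for these writes to succeed.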

At this point all the run-time preparations are complete. One last step remains: the ".scode" section must be encrypted in advance, after the program is built. The code generated by the compiler is not encrypted, so for this method to work you must manually encrypt the ".scode" section in the PE file. The sample code for this article includes a small command-line tool, cryptexe.exe, that encrypts a specified range of a PE file.

What remains is to find the offset of the ".scode" section in the file on disk. This works like locating the section in the memory image: find ".scode" in the section table, then read the section's file offset and size from the corresponding fields, here PointerToRawData and SizeOfRawData. There is an even simpler way: read the offset and size directly in a PE viewer. In the section table shown earlier (Figure 3), the ".scode" section starts at file offset 6000h and is 1000h bytes long, that is, 24576 and 4096 in decimal. The following command performs the initial encryption of the demo program:

cryptexe.exe crktest.exe 24576 4096
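The cryptexe.exe tool ships with the article's sample code, and its algorithm is not described here. The following is a hypothetical equivalent, assuming it applies the same single-byte XOR that decryptblock() undoes at run time. It XORs `length` bytes of a file in place, starting at `offset`; the name `crypt_file_range` is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <vector>

// Hypothetical stand-in for cryptexe.exe: XORs `length` bytes of the file at
// `path`, starting at file offset `offset`, with `key`, and writes them back.
bool crypt_file_range(const char* path, std::uint64_t offset, std::size_t length,
                      unsigned char key) {
    std::fstream f(path, std::ios::in | std::ios::out | std::ios::binary);
    if (!f) return false;
    std::vector<char> buf(length);
    f.seekg(static_cast<std::streamoff>(offset));
    if (!f.read(buf.data(), static_cast<std::streamsize>(length))) return false;
    for (char& c : buf) c = static_cast<char>(static_cast<unsigned char>(c) ^ key);
    f.seekp(static_cast<std::streamoff>(offset));  // seek before switching from reading to writing
    return static_cast<bool>(f.write(buf.data(), static_cast<std::streamsize>(length)));
}
```

Under this assumption, the command above corresponds to `crypt_file_range("crktest.exe", 24576, 4096, 0x5a)`, where 24576 = 0x6000 and 4096 = 0x1000.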

If you forget to perform this initial encryption after building the program, running it produces an error message box: the run-time decryption step turns the unencrypted code into garbage. With that, SMC encryption and decryption of an entire code section is fully implemented.

 

3. Using SMC to encrypt an entire function body

The previous section described one method of dynamic code encryption: encrypting and decrypting an entire code section while the program runs. It can protect code that is critical for resisting cracking, but it has a drawback: it requires an extra code section. That is a bit like putting up a sign that says "no silver buried here"; the extra section is bound to draw crackers' attention. This section describes how to encrypt and decrypt the code of a single function. This method needs no extra code section, so it is more discreet and harder to detect.

Encrypting a single function works on the same principle as encrypting a whole code section. You still need the start position and size of the code block in both the memory image and the PE file, but the code is located differently.

First, consider how to find a function's start position and code size in the program's memory image. In C/C++, a function name evaluates to the function's start address, so the position of the code block in memory can be obtained from the function name. The remaining question is how to determine the size of the function's code block, that is, where its last instruction is. Unfortunately, short of inspecting the assembly code directly, there is no perfect solution. Simply saying "look at the disassembly and find the last RET instruction" would be too irresponsible and contrary to the purpose of this article. A workaround is needed, and it is admittedly not perfect either. The method used here takes the difference between the start address of the next function and the start address of the function itself, and treats that difference as the approximate size of the function's code block.

Although this method appears in many references, it is imperfect in two respects:

- No compiler guarantees that two functions adjacent in the C/C++ source will also be adjacent in the final machine code, so the method carries some risk.
- It places many constraints on the function, which stem from the compiler's code generation policy. Many documents spell these out. For example, it is best to avoid longjmp() and switch...case statements, and exception handling cannot be used at all. In these cases the compiler cannot guarantee that the generated code forms one contiguous block; exception handling is the worst offender.

Despite these imperfections, the method is widely used. For the first problem, many compilers try hard to keep code contiguous except in unusual cases. For the second, you only need to structure the code carefully to avoid those statements and arrange if statements sensibly to keep the function short, which avoids long jumps to distant code blocks. The method may not look very safe, but used properly it is trustworthy. The author has used it in several software protection projects and recommends it because, so far, it has worked reliably.
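The address-difference idea can be sketched in a few lines. In this sketch, `protected_function` stands in for calcregcode() and `protected_function_end` is an empty marker defined right after it; both names are hypothetical. The estimate is only meaningful when the compiler really places the marker after the function, so the sketch returns 0 ("unknown") otherwise. Also note that in MSVC builds with incremental linking enabled (the Debug default), a function name resolves to a jump thunk rather than the function body, so linking with /INCREMENTAL:NO is needed for the difference to mean anything.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Function whose code is to be protected (stand-in for calcregcode()).
int protected_function(int x) { return x * 3 + 1; }

// Empty marker defined immediately after it in the source; its address is
// taken as the (assumed) end of protected_function's machine code.
void protected_function_end() {}

// Estimates protected_function's code size as the distance to the marker.
// Returns 0 if the marker was not placed after the function, which is exactly
// the risk described above, so callers must treat 0 as "size unknown".
std::size_t estimate_protected_size() {
    const auto begin = reinterpret_cast<std::uintptr_t>(&protected_function);
    const auto end = reinterpret_cast<std::uintptr_t>(&protected_function_end);
    return end > begin ? static_cast<std::size_t>(end - begin) : 0;
}
```

Before encrypting with this estimate, it is still worth confirming in the disassembly that the range ends at the function's last instruction.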

Let's look at a concrete example. Create a dialog-based project in VC, copy the calcregcode() function from the previous example into the project, and add an empty function immediately after it. Its return type and name do not matter, for example:

 

 

[Unfinished ...]
