Malware Reverse Analysis Series (1): Identifying Important Code Structures in Assembly Language



This series of articles is related to malware courses, so a natural starting point is the overall structure of PE and ELF files.

Another important concept is that malicious programs ultimately execute as assembly code that an analyst can read. What do common source-level constructs look like once they are compiled to binary? In this article, we will look at code structures at the assembly level.

Note: this article assumes the reader is familiar with assembly language. If your foundation is weak, don't worry; I will explain things in the simplest way I can.

Variable scope

The things declared in code to hold values such as numbers or strings are called variables. The scope of a variable can be global or local. Local variables hold values used inside a single function, while global variables can be used anywhere in the program logic. At the assembly level, global variables appear as fixed memory addresses, while local variables appear as offsets from the ebp register. For those who are new to assembly language, a function typically begins with instructions similar to the following:

push ebp
mov ebp, esp

The first instruction saves the caller's ebp on the stack, and the second copies esp into ebp so that ebp becomes the base address of the new function's stack frame. Because the stack grows toward lower addresses, local variables sit at negative offsets from ebp and can be accessed with an instruction such as mov eax, [ebp-4], which moves the four bytes at ebp-4 into eax. Global variables do not belong to any function's stack frame, so they cannot be referenced through ebp; instead they are referenced by memory address. For example, in mov eax, dword_50CF60 the global variable is located at address 0x50CF60.
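As a rough illustration of the two kinds of variable, here is a minimal C sketch. The names g_counter and add_to_counter are made up for this example. The locations in the comments are what one would typically expect from an unoptimized 32-bit x86 build, not something the compiler guarantees.

```c
/* Global variable: lives at a fixed address in the data section.
   A disassembler would label it with something like dword_50CF60
   (the address here is only borrowed from the text as an example). */
int g_counter = 5;

/* Hypothetical helper: adds a local value to the global counter. */
int add_to_counter(int amount)
{
    /* Local variable: lives in this function's stack frame, typically
       at a negative offset from ebp such as [ebp-4]. */
    int local_copy = amount;

    g_counter += local_copy;   /* global: accessed by absolute address */
    return g_counter;
}
```

Calling add_to_counter(3) on a fresh program would leave g_counter at 8; the local copy disappears when the function returns, while the global keeps its value.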

Conditional statements

if statements are used very often in code. For a basic if statement that compares two variables a and b and prints whether they are equal, the compiled code looks like this:

 

mov [ebp+var_4], 1
mov [ebp+var_8], 2
mov eax, [ebp+var_4]
cmp eax, [ebp+var_8]
jnz short loc_50101B
push offset aequalsb ; "a & b are equal\n"
……
loc_50101B:
push offset anotequalb ; "a & b are not equal"
……

Note that the cmp and jnz (jump if not zero) instructions together correspond to if (a == b) in the source code. The cmp instruction performs a subtraction that only sets the flags and discards the result, which is how the two variables are compared. If they are not equal, jnz jumps to loc_50101B and the string "a & b are not equal" is printed. If they are equal, the jump is not taken, execution falls through, and the string "a & b are equal" is printed. When several if statements are nested, you will see several cmp instructions, each followed by a conditional jump such as jnz or jz, ahead of the code that prints strings or performs other operations.
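The original C source is not shown here, but the listing is consistent with a sketch like the following. The function name compare_values and its return value are assumptions added so the behavior can be checked; only the two strings come from the listing.

```c
#include <stdio.h>

/* Hypothetical reconstruction of the source behind the listing.
   Returns 1 when the "equal" branch runs, 0 otherwise. */
int compare_values(int a, int b)
{
    if (a == b) {                       /* cmp eax, [ebp+var_8] then jnz */
        printf("a & b are equal\n");    /* fall-through path */
        return 1;
    } else {
        printf("a & b are not equal");  /* path reached when jnz is taken */
        return 0;
    }
}
```

Calling compare_values(1, 2) mirrors the listing, where var_4 holds 1 and var_8 holds 2, so the jnz is taken and the "not equal" string is printed.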

Loops

Loops are used to repeat an operation until a certain condition is met. The for loop is used very frequently.

For a for loop, look for four things: initialization, comparison, execution, and increment/decrement.

 

for (int i = 0; i < 10; i++)
{
    printf("Current value of i is %d\n", i);
}

The corresponding assembly code contains four parts:

 

mov [ebp+var_4], 0 ; initialization
jmp short loc_102345
loc_987654:
mov eax, [ebp+var_4] ; increment
add eax, 1
mov [ebp+var_4], eax
loc_102345:
cmp [ebp+var_4], 0Ah ; comparison
jge short loc_23456
mov ecx, [ebp+var_4] ; execution
push ecx
push offset iValue ; "Current Value of i is %d\n"
call printf
add esp, 8
jmp loc_987654

In the assembly code above, the first instruction is the initialization, followed by a jump to the comparison with 10 (0Ah). If i is greater than or equal to 10, execution jumps to loc_23456 and the loop ends. Otherwise the value of i is printed and execution jumps to loc_987654, where i is incremented by 1. The comparison then runs again with the new value (now checking whether 1 >= 10); since it is not, the value is printed and incremented again. The process repeats until i reaches 10.
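To make the four parts easier to spot, the same control flow can be written in C with goto, one label per assembly block. This is only a sketch: the function name count_loop_iterations, the label names, and the body_runs counter are all made up for illustration, and the counter exists only so the flow can be checked.

```c
#include <stdio.h>

/* Hypothetical helper: runs the loop and returns how many times the
   body executed. */
int count_loop_iterations(void)
{
    int i;
    int body_runs = 0;

    i = 0;                 /* 1. initialization: mov [ebp+var_4], 0 */
    goto compare;          /*    jmp to the comparison block */

increment:
    i = i + 1;             /* 4. increment: add eax, 1 */

compare:
    if (i >= 10)           /* 2. comparison: cmp against 0Ah, then jge */
        goto done;

    printf("Current value of i is %d\n", i);   /* 3. execution */
    body_runs++;
    goto increment;        /*    jmp back to the increment block */

done:
    return body_runs;
}
```

The increment block sits before the comparison in memory, exactly as in the listing, and the initial jump skips over it so that the first comparison sees i equal to 0.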

A while loop is just as easy to follow in the code. For example:

 

int i = 0;
while (i < 10)
{
    printf("current value of I is %d\n", i);
    i++;
}

The corresponding assembly code:

 

mov [ebp+var_4], 0
jmp short loc_102345
loc_123456:
mov eax, [ebp+var_4]
add eax, 1
mov [ebp+var_4], eax
loc_102345:
cmp [ebp+var_4], 0Ah
jge short loc_234567
mov ecx, [ebp+var_4]
push ecx
push offset iValue ; "current value of I is %d\n"
call printf
add esp, 8
jmp loc_123456

The assembly code of the while loop is similar to that of the for loop.
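One way to see why the two listings look alike is that a for loop and a while loop with the same initialization, comparison, and increment do the same work. The sketch below uses two made-up helpers, sum_with_for and sum_with_while, that differ only in where those pieces are written.

```c
/* Hypothetical helpers: both sum the values 0 .. limit-1. */
int sum_with_for(int limit)
{
    int total = 0;
    for (int i = 0; i < limit; i++)   /* init, compare, increment in one line */
        total += i;
    return total;
}

int sum_with_while(int limit)
{
    int total = 0;
    int i = 0;             /* initialization moves before the loop */
    while (i < limit) {    /* same comparison */
        total += i;        /* same body */
        i++;               /* increment moves to the end of the body */
    }
    return total;
}
```

Both functions return the same result for any limit, which is why an analyst usually cannot tell from the assembly alone which of the two forms the author wrote.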

Switch statement

A switch statement selects one of several branches based on the value of a variable, and it is common in programs. For example:

 

switch (i)
{
case 1:
    printf("Current Value of I is %d\n", i+1);
    break;
case 2:
    printf("Current Value of I is %d\n", i+1);
    break;
case 3:
    printf("Current Value of I is %d\n", i+1);
    break;
case 4:
    printf("Current Value of I is %d\n", i+1);
    break;
default:
    break;
}

At first glance the assembly code for a switch can look like a series of if statements, with many cmp and conditional jump instructions. When the case values are scattered, such as case 1, case 12, and case 17, the compiler typically does emit what amounts to a chain of if-else comparisons. When the case values are consecutive, as with case 1, case 2, case 3, and case 4 here, the compiler can apply a simple optimization, as follows:

 

mov ecx, [ebp+var_4]
sub ecx, 1
mov [ebp+var_8], ecx
cmp [ebp+var_8], 3
ja loc_12345
mov edx, [ebp+var_8]
jmp loc_987650[edx*4]
loc_234564:
…..
jmp loc_12345
loc_234565:
…..
jmp loc_12345
loc_234566:
…..
jmp loc_12345
loc_234567:
…..
jmp loc_12345
loc_12345:
; stack cleanup code
loc_987650:
dd offset loc_234564 ; jump table
dd offset loc_234565
dd offset loc_234566
dd offset loc_234567

What happened here? The switch variable i is loaded into ecx and 1 is subtracted from it. Why? This is where the jump table comes in. When the compiler optimizes the switch, it builds a jump table that stores the address of the code for each case value. The index into that table starts at 0, while the smallest case value is 1, so ecx is decremented by 1 to turn the case value into a table index. The index is then compared against 3, the index of the largest case; anything above that matches no case and is sent to the default branch. Because ja is an unsigned comparison, values of i below 1 wrap around to very large indexes and also take the default branch. For every other value, the index is multiplied by 4 (the size of one table entry) to locate the address of the matching case in the jump table.
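The jump-table dispatch can be imitated in C with an array of function pointers. This is a simplified sketch, not what the compiler literally produces: the handlers case_1 to case_4, the return values 10 to 40, and the dispatch function are all hypothetical, and each handler returns a distinct number only so the selected entry can be observed.

```c
/* Hypothetical handlers standing in for the four case bodies. */
static int case_1(void) { return 10; }
static int case_2(void) { return 20; }
static int case_3(void) { return 30; }
static int case_4(void) { return 40; }

/* The jump table: entry 0 belongs to case 1, entry 3 to case 4. */
static int (*const jump_table[4])(void) = { case_1, case_2, case_3, case_4 };

/* Returns the selected handler's value, or -1 for the default case. */
int dispatch(int i)
{
    unsigned int index = (unsigned int)i - 1u;   /* sub ecx, 1 */

    if (index > 3u)               /* cmp against 3, then ja (unsigned) */
        return -1;                /* default case */

    return jump_table[index]();   /* indirect jump through table[index] */
}
```

Doing the subtraction in unsigned arithmetic is what lets a single comparison reject both values above 4 and values below 1, which is the same trick the ja instruction relies on.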

This article covered some of the most basic code constructs. In the second part, we will discuss more complex structures such as arrays, structs, and linked lists.
