A contrastive analysis of C language and assembly language

Source: Internet
Author: User

A game usually contains many systems, such as the combat system, UI rendering, the economic system, and the production system, and each system contains many sub-features, such as damage calculation, spell casting, item use, character movement, and player-to-player trading. In code, these features are usually built from conditional logic (such as damage calculation) and loops (such as iterating over an item list or playing game animations).

In reverse engineering, if the corresponding syntax structures can be recognized in the assembly code, converting that code back into C syntax structures during analysis makes it much easier to understand how the program executes.

The following sections describe the most common logical structures:

a) if...else

b) switch...case

c) for, while

Note: The disassembler used in this article is IDA.

I. if...else

Assembly Code:

The if...else structure is relatively fixed. It typically consists of a CMP instruction, a conditional jump (Jcc) instruction, and the block of instructions executed when the condition is met.

Multiple if...else structures can be chained. Chained if...else blocks have clear boundaries, and the disassembler can usually identify each block of code (the dashed lines in the figure).

II. switch...case

1. A simple switch...case

  

Assembly Code:

The listing shows the basic structure of a switch...case: a) the jump expression; b) the branch code; c) the jump table.

a) Jump expression

The loc_401235 code block corresponds to the default branch of the switch...case.

When nGameEvent > 4, execution jumps to the loc_401235 code block, which is the default branch.

When nGameEvent <= 4, execution jumps through the jump expression:

jmp ds:off_40123C[nGameEvent*4]

Here off_40123C is the address of the jump table. Each entry in the table is a 32-bit (4-byte) address: when nGameEvent is 0, execution jumps to the address in the first entry; when nGameEvent is 1, to the address in the second entry; and so on.

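b) Branch code

This is where the processing logic for each branch lives; in the sample code, each branch simply calls the corresponding function.
(PS: This is the result of a compiler optimization that uses JMP instead of CALL. Presumably the call is the last thing each branch does, so jumping straight into the function lets its RET return directly to the caller of the function containing the switch.)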

c) Jump table

The jump table is simply an array of addresses that stores the 32-bit absolute address of each branch. When nGameEvent is 0, the jump expression reads the first element of the array (0x0040121C), which points to

.text:0040121C E9 8F FF FF+    jmp ?DoLogin@@YAXXZ

that is, control is transferred to the DoLogin function.

(PS: At run time, because the image is loaded at a randomized base address, the jump table contents seen in a debugger may differ from those seen in static analysis. This is caused by relocation; for how relocation works, see the relevant documentation, as it is not covered in detail here.)

2. Non-contiguous switch...case

In the example above, the case values are contiguous, so the jump table is very regular. In practice you may also encounter irregular case values, such as:

Assembly Code:

The above code has two features:

a) The minimum case value is not 0
The minimum case value is 3. To avoid wasting jump table space, the compiler subtracts 3 from the index value so that the smallest case value maps to the first entry of the jump table.

b) The case values are not contiguous
For values inside the range that have no case, the compiler fills the corresponding jump table slots with the address of the default branch, which keeps the logic correct (trading space for time).

3. Double jump table

Assembly Code:

The range of case values here is much wider than in the previous example. With the previous approach, the jump table would need (110 - 30 + 1) * 4 = 324 bytes, which takes up a lot of memory.

To save space, the compiler uses two levels of tables: a jump table and an indirect jump table. The jump table is the same kind of table described earlier, while the indirect jump table stores not branch addresses but index values, each pointing to an entry in the jump table.

Jump table:


Indirect Jump table:

On entering the switch...case, the code first reads an index number from the indirect jump table, then uses that index to look up the jump table and obtain the actual branch address.

With the double jump table, the space actually used is 5 * 4 + (110 - 30 + 1) = 101 bytes: five 4-byte entries in the jump table plus one 1-byte index for each of the 81 values in the range. That is a large saving compared with 324 bytes.

4. switch...case degeneration

When the case values span such a wide range that even a jump table or a double jump table would consume too much space, the compiler degrades the switch...case into if...else, for example:

Assembly Code:

There is no jump table structure; only CMP/Jcc instructions remain, because the compiler has converted the switch...case into equivalent if...else logic. Even so, the compiler has done its best to optimize the branches during the conversion by arranging the comparisons as a binary search.

5. Nested switch...case

Assembly Code:

As the listing shows, nested switch...case structures stay fairly independent in the assembly code: the outer and the inner switch each have their own jump tables.

Outer jump table:

Inner jump table (double jump table):

The addresses stored in the jump tables also make it easy to tell the outer jump branches apart from the inner ones.

III. Loop statements

a) for loop

Assembly Code:

Here nop dword ptr [eax+00h] is an alignment instruction (multi-byte NOP padding) with no effect on the program's logic. The assembly implementation of the loop is:

b) while loop

Assembly Code:

Here nop dword ptr [eax+eax+00h] is likewise alignment padding with no effect on the program's logic. The assembly implementation of the loop is:

As the listings above show, the assembly for the for and while structures is almost identical; only a few of the registers used differ. In practice, therefore, a loop found while reversing can be mapped to either a for or a while structure. The listings also reveal an obvious feature of loops: a backward jump (a jump toward a lower address). Most backward jumps you encounter are loops; only a few, such as those produced by the compiler's code-layout optimizations, are exceptions.

IV. Summary

The assembly code that corresponds to a syntax structure depends heavily on the compiler. The same source code yields differently structured assembly under different compilers, and even a single compiler produces different structures under different compilation options. Reverse engineering therefore requires familiarity with the characteristics of the compiler involved.

* When reproducing this article, please credit the Game Security Laboratory (GSLAB.QQ.COM).
