Reverse Engineering (I): Basics of assembly and Reverse Engineering

Last Update: 2015-12-20 Source: Internet

Author: User


This series of articles explains the fundamentals of reverse engineering in an easy-to-understand way.

Assembly is the basis of reverse engineering. This article does not go in depth, but it covers all the basics you need when you first start learning assembly! Assembly language is where every program starts and ends: whatever language a program is written in, it ultimately runs as machine instructions, which assembly represents directly. High-level languages use relatively readable syntax, whereas assembly expresses programs with short mnemonics (abbreviations) and numbers.

I. Unit, bit, and byte

· BIT: the smallest unit of computer data; a bit is either 0 or 1. For example, in binary: 00000001 = 1; 00000010 = 2; 00000011 = 3.
· BYTE: one byte contains eight bits, so the maximum value of a byte is 255 (range 0-255). For readability we usually write bytes in hexadecimal (FFh = 255).
· WORD: a word consists of 16 bits (2 bytes). The maximum value of a word is 0FFFFh (or 65535d), where h denotes hexadecimal and d denotes decimal.
· DOUBLE WORD (DWORD): a double word contains two words, 32 bits in total. Its maximum value is 0FFFFFFFFh (or 4294967295d).
· KILOBYTE: a kilobyte is not 1000 bytes but 1024 (32*32) bytes.
· MEGABYTE: a megabyte is not 1,000,000 bytes but 1024*1024 = 1,048,576 bytes.
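To see where these numbers come from, here is a minimal Python sketch. The helper name `max_unsigned` is our own, chosen for illustration. It computes 2^bits - 1, the largest unsigned value a unit of that size can hold.

```python
def max_unsigned(bits: int) -> int:
    """Largest unsigned value that fits in the given number of bits."""
    return (1 << bits) - 1

BYTE, WORD, DWORD = 8, 16, 32  # sizes in bits

for name, bits in (("BYTE", BYTE), ("WORD", WORD), ("DWORD", DWORD)):
    m = max_unsigned(bits)
    # Show each maximum in hexadecimal (h) and decimal (d)
    print(f"{name:5} max = {m:X}h = {m}d")

KILOBYTE = 1024             # 2**10, not 1000
MEGABYTE = 1024 * 1024      # 2**20 = 1,048,576
print(KILOBYTE == 32 * 32, MEGABYTE)
```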

II. Registers

Registers are a "special place" where the CPU stores data. You can think of a register as a small box that holds one value at a time, such as a number or a memory address.

On today's Windows + Intel (x86) CPUs there are usually nine 32-bit registers, not counting the flags register. They are:

EAX: accumulator
EBX: base register
ECX: counter register
EDX: data register
ESI: source index register
EDI: destination index register
EBP: extended base pointer register
ESP: stack pointer register
EIP: instruction pointer register

Generally, these registers are 32 bits (four bytes) wide, so they can hold unsigned values from 0 to FFFFFFFF. Originally most register names hinted at their purpose (for example, ECX = count), but today you can use almost any register for counting; only certain instructions (such as LOOP and the REP prefixes) implicitly use ECX as their counter. I will explain EAX, EBX, ECX, EDX, ESI, and EDI in detail as they come up, so let's start with EBP, ESP, and EIP.

EBP: mostly used to access data on the stack (it usually marks the base of the current function's stack frame). At first there is nothing special to pay attention to.
ESP: points to the top of the stack. The stack is where data that will be used soon is stored; look up the PUSH/POP instructions to learn more.
EIP: points to the next instruction to be executed.

Note also that some registers are only 16 or even 8 bits wide, and the 8-bit ones cannot be used directly as memory addresses.

Generally, a register can be viewed as follows:

EAX (32 bits): [ upper 16 bits, no separate name | AX (16 bits): [ AH (8 bits) | AL (8 bits) ] ]

EAX is the name of the whole 32-bit register. Its low 16 bits are called AX, and AX is in turn divided into two independent 8-bit registers: AH (the high 8 bits) and AL (the low 8 bits).
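Python has no registers, but the relationship between EAX, AX, AH, and AL can be modeled with bit masks. This is a simplified sketch with hypothetical helper names (`split_eax`, `write_ah`). It shows why changing AH also changes AX and EAX.

```python
def split_eax(eax: int) -> dict:
    """Return the 16- and 8-bit views of a 32-bit EAX value."""
    eax &= 0xFFFFFFFF          # keep it a 32-bit value
    ax = eax & 0xFFFF          # low 16 bits
    ah = (ax >> 8) & 0xFF      # high byte of AX (bits 8-15)
    al = ax & 0xFF             # low byte of AX (bits 0-7)
    return {"EAX": eax, "AX": ax, "AH": ah, "AL": al}

def write_ah(eax: int, value: int) -> int:
    """Model 'mov ah, value': only bits 8-15 of EAX change."""
    return (eax & ~0xFF00 & 0xFFFFFFFF) | ((value & 0xFF) << 8)

regs = split_eax(0x12345678)
print({name: hex(v) for name, v in regs.items()})
print(hex(write_ah(0x12345678, 0xAB)))   # AX and EAX change too
```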

Note: even if they seem unimportant now, you should at least know the following registers.

They are grouped by size:

i. Single-byte (8-bit) registers: as the name suggests, each of these registers is one byte (8 bits):

AL and AH, BL and BH, CL and CH, DL and DH

ii. Single-word (16-bit) registers: these registers are one word (= 2 bytes = 16 bits) wide. Each of the general registers AX, BX, CX, and DX contains two single-byte registers. We usually group these registers by function:

1. General registers:

AX (single word = 16 bits) = AH + AL. The '+' does not mean adding them together: AH and AL are independent registers that together form AX. Therefore, if you change AH or AL (or both), AX changes as well.
AX -> 'accumulator': used for arithmetic operations
BX -> 'base' (base register): used to hold a base address for memory access
CX -> 'counter' (counter register): used for counting, e.g. loop and repeat counts
DX -> 'data' (data register): used to store data in most cases
DI -> 'destination index' (destination index register): e.g. the destination when copying a string
SI -> 'source index' (source index register): e.g. the source when copying a string

2. Pointer registers:

BP -> 'base pointer' (base pointer register): points to the base of the current stack area (stack frame)
SP -> 'stack pointer' (stack pointer register): points to the top of the stack

3. Segment registers:

CS -> 'code segment' (code segment register): holds the base address of the program's code segment (described later)
DS -> 'data segment' (data segment register): holds the base address of the data segment (described later)
ES -> 'extra segment' (extra segment register): holds the base address of an additional data segment used by the program
SS -> 'stack segment' (stack segment register): holds the base address of the stack segment (described later)

4. Instruction Pointer register:

IP -> 'instruction pointer' (instruction pointer register): points to the next instruction to execute ;)

iii. Double-word (32-bit) registers:

2 words = 4 bytes = 32 bits: EAX, EBX, ECX, EDX, EDI, ...

Adding 'E' (for 'extended') in front of a 16-bit register name gives the corresponding 32-bit register. For example, AX is 16 bits and EAX is 32 bits.

III. Flags register

The flags register records the state of the CPU. On a 32-bit CPU it is a single 32-bit register (EFLAGS) whose individual bits act as flags, but don't worry, we only care about three of them: ZF, OF, and CF. In reverse engineering, the flags tell you whether the program will take a conditional jump at a given step. Each flag is a single bit that can only be 0 or 1, and the flags determine whether a conditional instruction executes.

Z-Flag (Zero Flag):

ZF is the flag used most often when cracking (about 90% of the time). It can be 0 or 1: if the result of the previous operation is 0, ZF is set to 1; otherwise it is 0. (You may ask why the CMP instruction affects ZF. CMP compares two values by subtracting one from the other without storing the result. When the values are equal, the difference is 0, so ZF becomes 1; when they differ, ZF becomes 0.)

O-Flag (Overflow Flag):

OF is involved in about 4% of cases in reverse engineering. It is set to 1 when the result of a signed arithmetic operation does not fit in the destination, that is, when it exceeds the range of signed numbers the register can represent. This shows up as an unexpected change of the most significant (sign) bit. For example, if EAX is 7FFFFFFF and you add 1, OF is set to 1: the result 80000000 has its most significant bit set, so the largest positive signed value has turned into a negative one (you can use your computer's built-in calculator to convert the hexadecimal values to binary).

C-Flag (Carry Flag):

CF is involved in about 1% of cases. It is set to 1 when an unsigned operation overflows, that is, when there is a carry out of the most significant bit. For example, if a register holds FFFFFFFF and you add 1, the result wraps around to 0 and CF is set. You can try this with your computer's calculator.
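The three flags can be modeled for a 32-bit ADD with a few bit operations. This is an illustrative sketch, not an emulator. `add32` is a hypothetical helper, and values are kept 32-bit by masking with 0xFFFFFFFF. The two examples from the text set different flags.

```python
MASK32 = 0xFFFFFFFF
SIGN32 = 0x80000000

def add32(a: int, b: int) -> tuple:
    """Model 32-bit ADD; returns (result, ZF, CF, OF)."""
    a &= MASK32
    b &= MASK32
    full = a + b
    result = full & MASK32
    zf = int(result == 0)                  # result is zero
    cf = int(full > MASK32)                # unsigned carry out of bit 31
    # Signed overflow: both inputs have the same sign, the result's sign differs
    same_sign_in = (a & SIGN32) == (b & SIGN32)
    of = int(same_sign_in and (result & SIGN32) != (a & SIGN32))
    return result, zf, cf, of

print(add32(0x7FFFFFFF, 1))   # the OF example
print(add32(0xFFFFFFFF, 1))   # the CF example
```

The second case shows that OF and CF are independent. FFFFFFFF + 1 is an unsigned overflow (CF = 1). Read as signed, however, it is just -1 + 1 = 0, so OF stays 0.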

IV. Segment and offset

Each segment in memory holds instructions (CS), data (DS), the stack (SS), or other data (ES). A location inside a segment is identified by its offset. In a 32-bit application, offsets range from 00000000 to FFFFFFFF. The standard notation for a segment and an offset is:

segment:offset, and together they identify a specific address in memory.

You can picture it like this:

A segment is a page of a book; the offset is a line on that page.
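As a rough illustration of how a segment and an offset combine into one address, here is a Python sketch. It covers two cases the text does not spell out. In 16-bit real mode, the segment value is multiplied by 16 before the offset is added. In 32-bit programs, the segment has a base address (typically 0 in Windows' flat memory model) to which the offset is added. The function names are hypothetical.

```python
def real_mode_address(segment: int, offset: int) -> int:
    """16-bit real mode: physical address = segment * 16 + offset.
    The result wraps at 1 MB, as on the original 8086."""
    return ((segment << 4) + offset) & 0xFFFFF

def flat_address(segment_base: int, offset: int) -> int:
    """32-bit mode: linear address = segment base + offset."""
    return (segment_base + offset) & 0xFFFFFFFF

print(hex(real_mode_address(0x1234, 0x0010)))   # 1234:0010
print(hex(flat_address(0, 0x00404000)))         # flat model, base 0
```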

V. Stack

The stack is an area of memory for storing things that will be used later. Picture a pile of books in a box: the last book put in is always the first one taken out. Or think of the stack as a paper box: the box is the stack, and each sheet of paper represents a memory address. Remember this rule: the last sheet put in is the first one taken out (last in, first out). The PUSH instruction puts data onto the stack; the POP instruction takes the most recently pushed data off the stack and stores it in a specific register.

VI. Instructions (in alphabetical order)

Note that values are usually written in hexadecimal.

Most instructions have two operands (for example, add eax, ebx), some have one (for example, not eax), and some have three (for example, imul eax, edx, 64). "dword ptr [XXX]" refers to the dword (4 bytes) stored in memory at offset XXX. Note: multi-byte values are stored in memory in reverse byte order (Windows + Intel uses little-endian order). word ptr [XXX] (two bytes) follows the same rule, while byte ptr [XXX] refers to a single byte, so byte order does not matter there.
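Python's standard `struct` module can show the little-endian layout described above. The format `'<I'` packs an unsigned 32-bit value in little-endian order, which matches how x86 stores a dword in memory. `'>I'` (big-endian) is shown only for comparison, and the value 0x12345678 is an arbitrary example.

```python
import struct

value = 0x12345678
little = struct.pack("<I", value)   # x86 layout of a dword in memory
big = struct.pack(">I", value)      # big-endian, for comparison only
print(little.hex(" "), "|", big.hex(" "))

# Reading the bytes back as a dword reverses the order again
restored = struct.unpack("<I", little)[0]
print(hex(restored))
```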

Most two-operand instructions take one of the following forms (ADD is used as the example):

add eax, ebx                          ; register, register
add eax, 123                          ; register, value
add eax, dword ptr [404000]           ; register, dword pointer [value]
add eax, dword ptr [eax]              ; register, dword pointer [register]
add eax, dword ptr [eax + 00404000]   ; register, dword pointer [register + value]
add dword ptr [404000], eax           ; dword pointer [value], register
add dword ptr [404000], 123           ; dword pointer [value], value
add dword ptr [eax], eax              ; dword pointer [register], register
add dword ptr [eax], 123              ; dword pointer [register], value
add dword ptr [eax + 404000], eax     ; dword pointer [register + value], register
add dword ptr [eax + 404000], 123     ; dword pointer [register + value], value

ADD (ADD)

Syntax: ADD destination, source

ADD adds the source value to a register or memory location (the destination).

add eax, 123 ; EAX = EAX + 123

ADD affects ZF, OF, and CF.

AND (logical AND)

Syntax: AND destination, source

AND performs a bitwise logical AND on two numbers.

AND clears OF and CF, and sets ZF according to the result (ZF = 1 only if the result is 0).

To better understand AND, take these two binary numbers:

1001010110
0101001101

If you perform the AND operation on them, the result is 0001000100.

A result bit is 1 (true) only if both input bits are 1; otherwise it is 0 (false). You can use a calculator to verify this.
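Here is the same calculation in Python, using the two example numbers above. Python's `&` operator is a bitwise AND. The flag values follow the rule in the text: OF and CF are cleared, and ZF is set only if the result is 0.

```python
a = 0b1001010110
b = 0b0101001101
result = a & b                        # bitwise AND, like 'and' on registers
print(format(result, "010b"))         # 10-digit binary string

# Flags after AND: OF and CF cleared, ZF = 1 only for a zero result
zf, cf, of = int(result == 0), 0, 0
print(zf, cf, of)
```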

CALL

Syntax: CALL something

CALL pushes the return address (the address of the instruction following the CALL, which is the current value of EIP) onto the stack and then jumps to the called subroutine.

CALL can be used as follows:

call 404000                 ; most common: call an address
call eax                    ; call the address held in a register; if EAX = 404000, same as the first case
call dword ptr [eax]        ; call the address stored in memory at [EAX]
call dword ptr [eax + 5]    ; call the address stored in memory at [EAX + 5]

CDQ

Syntax: CDQ

CDQ often confuses people the first time they see it. It usually appears right before a signed division (IDIV) and sets every bit of EDX to the highest (sign) bit of EAX, which extends EAX into the 64-bit value EDX:EAX.

For example, if EAX >= 80000000h, its highest binary bit is 1, so all 32 bits of EDX are set to 1, i.e. EDX = FFFFFFFF.

If EAX < 80000000h, its highest binary bit is 0, and EDX becomes 00000000.

EDX:EAX then forms the new 64-bit number; for EAX = 80000000 this is FFFFFFFF 80000000.
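Here is a small Python model of CDQ, with 32-bit values held in ordinary integers. `cdq` and `edx_eax_as_signed` are hypothetical helpers. The second helper checks that EDX:EAX, read as a signed 64-bit number, has the same value as EAX read as a signed 32-bit number, which is the whole point of CDQ.

```python
def cdq(eax: int) -> tuple:
    """Model CDQ: fill EDX with copies of EAX's sign bit (bit 31).
    Returns (EDX, EAX)."""
    eax &= 0xFFFFFFFF
    edx = 0xFFFFFFFF if eax & 0x80000000 else 0
    return edx, eax

def edx_eax_as_signed(edx: int, eax: int) -> int:
    """Interpret EDX:EAX as one signed 64-bit number."""
    value = (edx << 32) | eax
    return value - (1 << 64) if value & (1 << 63) else value

edx, eax = cdq(0x80000000)
print(f"{edx:08X} {eax:08X}")             # EDX:EAX
print(edx_eax_as_signed(edx, eax))        # same signed value as EAX alone
```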

CMP (comparison)

Syntax: CMP destination, source

CMP compares two values by subtracting the source from the destination without storing the result, and sets CF, OF, and ZF accordingly:

cmp eax, ebx          ; compare EAX and EBX; if they are equal, ZF is set to 1
cmp eax, [404000]     ; compare EAX with the value at offset 404000
cmp [404000], eax     ; compare the value at offset 404000 with EAX
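Since CMP is a subtraction whose result is thrown away, its flags can be modeled directly. This Python sketch uses a hypothetical helper `cmp32` and keeps values 32-bit by masking. It also computes SF (the sign flag), which the signed jumps in the JUMPS section depend on.

```python
MASK32 = 0xFFFFFFFF

def cmp32(a: int, b: int) -> dict:
    """Model 'cmp a, b': compute a - b and keep only the flags."""
    a &= MASK32
    b &= MASK32
    diff = (a - b) & MASK32
    sa, sb, sr = a >> 31, b >> 31, diff >> 31   # sign bits
    return {
        "ZF": int(diff == 0),                   # values are equal
        "CF": int(a < b),                       # unsigned borrow
        "SF": sr,                               # sign of the difference
        "OF": int(sa != sb and sr != sa),       # signed overflow
    }

print(cmp32(5, 5))   # equal: ZF = 1
print(cmp32(1, 2))   # 1 < 2: CF = 1, SF = 1
```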

DEC (decrement)

Syntax: DEC something

DEC subtracts 1 from its operand, like the -- operator in C.

DEC can be used in the following ways:

dec eax                        ; EAX = EAX - 1
dec dword ptr [eax]            ; subtract 1 from the dword at the address in EAX
dec dword ptr [401000]         ; subtract 1 from the dword at offset 401000
dec dword ptr [eax + 401000]   ; subtract 1 from the dword at offset EAX + 401000

DEC affects ZF and OF, but leaves CF unchanged.

DIV (unsigned division)

Syntax: DIV Divisor

DIV performs unsigned division. With a 32-bit divisor, the dividend is the 64-bit value EDX:EAX; the quotient is stored in EAX and the remainder in EDX.

Example:

xor edx, edx   ; EDX = 0 (high half of the dividend EDX:EAX)
mov eax, 64    ; EAX = 64h = 100
mov ecx, 9     ; ECX = 9
div ecx        ; divide EDX:EAX by ECX

After the division, EAX = 100 / 9 = 0Bh (decimal 11) and EDX = 100 MOD 9 = 1.

After DIV, the flags CF, OF, and ZF are undefined, so do not rely on them.
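Here is a Python model of the unsigned 32-bit DIV above, sketched with a hypothetical helper `div32`. It takes EDX and EAX separately so the 64-bit dividend EDX:EAX is explicit. It raises an exception in the two cases where the real CPU raises a divide error: division by zero, and a quotient that does not fit in EAX.

```python
def div32(edx: int, eax: int, divisor: int) -> tuple:
    """Model unsigned 'div r/m32': EDX:EAX / divisor.
    Returns (new EAX, new EDX) = (quotient, remainder)."""
    if divisor == 0:
        raise ZeroDivisionError("divide error: division by zero")
    dividend = (edx << 32) | eax              # 64-bit dividend EDX:EAX
    quotient, remainder = divmod(dividend, divisor)
    if quotient > 0xFFFFFFFF:
        raise OverflowError("divide error: quotient does not fit in EAX")
    return quotient, remainder

# xor edx, edx / mov eax, 64h / mov ecx, 9 / div ecx
eax, edx = div32(0, 0x64, 9)
print(hex(eax), edx)
```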

IDIV (signed division)

Syntax: IDIV Divisor

IDIV works the same way as DIV, but performs signed division.

After IDIV, the flags CF, OF, and ZF are undefined.
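IDIV rounds differently from Python, which matters when you reimplement decompiled code. x86 IDIV truncates the quotient toward zero, and the remainder takes the sign of the dividend. Python's own `//` and `divmod` round toward negative infinity instead, so a faithful model needs an adjustment. Here is a sketch using a hypothetical `idiv32`, ignoring 32-bit range checks:

```python
def idiv32(dividend: int, divisor: int) -> tuple:
    """Model signed IDIV: the quotient truncates toward zero and
    the remainder has the sign of the dividend."""
    q = abs(dividend) // abs(divisor)
    if (dividend < 0) != (divisor < 0):
        q = -q
    r = dividend - q * divisor
    return q, r

print(idiv32(-7, 2))    # x86 IDIV behaviour
print(divmod(-7, 2))    # Python floors instead
```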

IMUL (multiplication)

Syntax: IMUL value

IMUL destination register, value, value

IMUL destination register, value

With one operand (IMUL value), IMUL multiplies EAX by the value and stores the 64-bit product in EDX:EAX. With three operands (IMUL destination register, value, value), it multiplies the two values and stores the product in the destination register. With two operands (IMUL destination register, value), it multiplies the destination register by the value.

If the product is too large to fit in the destination, OF and CF are set. ZF is undefined after IMUL.
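Here is a Python sketch of the two- and three-operand IMUL forms, for example `imul eax, edx, 64` with the constant read as hex. `imul32` and `to_signed32` are hypothetical helpers. The product is computed on signed values and truncated to 32 bits. OF and CF are set when the truncated result no longer equals the true product.

```python
def to_signed32(x: int) -> int:
    """Read a 32-bit pattern as a signed (two's complement) number."""
    x &= 0xFFFFFFFF
    return x - (1 << 32) if x & 0x80000000 else x

def imul32(a: int, b: int) -> tuple:
    """Model two-/three-operand IMUL on 32-bit operands.
    Returns (low 32 bits of the product, OF, CF)."""
    product = to_signed32(a) * to_signed32(b)    # exact signed product
    result = product & 0xFFFFFFFF                # what fits in the register
    overflow = int(to_signed32(result) != product)
    return result, overflow, overflow            # OF and CF are set together

print(imul32(3, 0x64))            # imul eax, edx, 64h with EDX = 3
print(imul32(0x10000, 0x10000))   # 2**32 does not fit in 32 bits
```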

INC (increment)

Syntax: INC something

INC is the opposite of DEC: it adds 1 to the value.

INC affects ZF and OF, but leaves CF unchanged.

INT

Syntax: INT number

The INT operand must be an integer (for example, int 21h). Similar to CALL, INT transfers control to a routine, here an interrupt handler that controls hardware or provides system services. Different numbers correspond to different functions.

Refer to the hardware specification for details.

JUMPS

These are the most important jump instructions and the conditions that trigger them (important ones are marked with *, the most important with **):

Instruction   Meaning                                   Condition
JA *          jump if above (unsigned >)                CF = 0 and ZF = 0
JAE           jump if above or equal (unsigned >=)      CF = 0
JB *          jump if below (unsigned <)                CF = 1
JBE           jump if below or equal (unsigned <=)      CF = 1 or ZF = 1
JC            jump if carry                             CF = 1
JCXZ          jump if CX is 0                           CX = 0
JE **         jump if equal                             ZF = 1
JECXZ         jump if ECX is 0                          ECX = 0
JG *          jump if greater (signed >)                ZF = 0 and SF = OF (SF = Sign Flag)
JGE *         jump if greater or equal (signed >=)      SF = OF
JL *          jump if less (signed <)                   SF != OF (!= means "not equal")
JLE *         jump if less or equal (signed <=)         ZF = 1 or SF != OF
JMP **        unconditional jump                        always
JNA           jump if not above (unsigned)              CF = 1 or ZF = 1
JNAE          jump if not above or equal (unsigned)     CF = 1
JNB           jump if not below (unsigned)              CF = 0
JNBE          jump if not below or equal (unsigned)     CF = 0 and ZF = 0
JNC           jump if not carry                         CF = 0
JNE **        jump if not equal                         ZF = 0
JNG           jump if not greater (signed)              ZF = 1 or SF != OF
JNGE          jump if not greater or equal (signed)     SF != OF
JNL           jump if not less (signed)                 SF = OF
JNLE          jump if not less or equal (signed)        ZF = 0 and SF = OF
JNO           jump if not overflow                      OF = 0
JNP           jump if not parity                        PF = 0
JNS           jump if not sign                          SF = 0
JNZ           jump if not zero                          ZF = 0
JO            jump if overflow                          OF = 1
JP            jump if parity                            PF = 1
JPE           jump if parity even                       PF = 1
JPO           jump if parity odd                        PF = 0
JS            jump if sign                              SF = 1
JZ            jump if zero                              ZF = 1
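To check the table against real values, this Python sketch encodes a few of the conditions as functions of the flags. It evaluates them after `cmp eax, ebx` with EAX = FFFFFFFF and EBX = 1. The names `JUMPS` and `flags_after_cmp` are hypothetical. Both the unsigned jump JA and the signed jump JL are taken: FFFFFFFF is above 1 as an unsigned number, but less than 1 as the signed value -1.

```python
# A few conditional jumps, written as conditions on the flags
JUMPS = {
    "JE":  lambda f: f["ZF"] == 1,
    "JNE": lambda f: f["ZF"] == 0,
    "JA":  lambda f: f["CF"] == 0 and f["ZF"] == 0,         # unsigned >
    "JB":  lambda f: f["CF"] == 1,                          # unsigned <
    "JG":  lambda f: f["ZF"] == 0 and f["SF"] == f["OF"],   # signed >
    "JL":  lambda f: f["SF"] != f["OF"],                    # signed <
}

def flags_after_cmp(a: int, b: int) -> dict:
    """Flags produced by 'cmp a, b' on 32-bit values."""
    a &= 0xFFFFFFFF
    b &= 0xFFFFFFFF
    diff = (a - b) & 0xFFFFFFFF
    sa, sb, sr = a >> 31, b >> 31, diff >> 31
    return {"ZF": int(diff == 0), "CF": int(a < b), "SF": sr,
            "OF": int(sa != sb and sr != sa)}

# cmp eax, ebx with EAX = FFFFFFFF (-1 if signed) and EBX = 1
f = flags_after_cmp(0xFFFFFFFF, 1)
taken = {name: cond(f) for name, cond in JUMPS.items()}
print(taken)
```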

LEA (load effective address)

Syntax: LEA destination, source

LEA looks similar to MOV, but instead of reading memory it stores the computed address itself in the destination. It has few other uses, but it is widely used for fast multiplication:

lea eax, dword ptr [4 * ecx + ebx]

This sets EAX to 4 * ECX + EBX without reading memory.
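LEA only does address arithmetic, which is why compilers use it for quick multiplications. Here is a Python model of the calculation using a hypothetical `lea_scaled`. x86 addressing only allows scale factors 1, 2, 4, and 8. The register values are arbitrary examples.

```python
def lea_scaled(base: int, index: int, scale: int, disp: int = 0) -> int:
    """Model 'lea dst, [index*scale + base + disp]': arithmetic only, no memory access."""
    if scale not in (1, 2, 4, 8):     # the scale factors x86 addressing allows
        raise ValueError("scale must be 1, 2, 4, or 8")
    return (index * scale + base + disp) & 0xFFFFFFFF

ecx, ebx = 3, 0x1000
print(hex(lea_scaled(ebx, ecx, 4)))   # lea eax, [4*ecx + ebx]

# Compiler trick: lea eax, [eax + eax*4] computes EAX * 5
eax = 7
print(lea_scaled(eax, eax, 4))
```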

MOV (move)

Syntax: MOV destination, source

This is a simple instruction: MOV copies the source value into the destination, and the source remains unchanged.

Here are some MOV variants:

MOVS/MOVSB/MOVSW/MOVSD: these string variants copy the data pointed to by ESI to the location pointed to by EDI (their operands are implicit).

MOVSX: copies a byte or word into a larger (word or doubleword) destination and sign-extends it, so the signed value is preserved.

MOVZX: copies a byte or word into a larger (word or doubleword) destination and fills the remaining upper bits with 0 (zero extension).
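The difference between MOVSX and MOVZX shows up as soon as the top bit of the source is set. Here is a Python sketch for the byte-to-dword case, using hypothetical helpers `movzx8` and `movsx8`:

```python
def movzx8(value: int) -> int:
    """Model 'movzx eax, bl': copy the byte, fill bits 8-31 with 0."""
    return value & 0xFF

def movsx8(value: int) -> int:
    """Model 'movsx eax, bl': copy the byte, fill bits 8-31 with its sign bit (bit 7)."""
    value &= 0xFF
    return (value | 0xFFFFFF00) if value & 0x80 else value

for bl in (0x70, 0xF0):
    print(f"BL={bl:02X}h  movzx -> {movzx8(bl):08X}h  movsx -> {movsx8(bl):08X}h")
```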

MUL (multiplication)

Syntax: MUL Value

This instruction works like the one-operand form of IMUL, but MUL multiplies unsigned numbers.

NOP (no operation)

Syntax: NOP

This command does not do anything.

That is exactly why it is widely used in reverse engineering: overwriting instructions with NOPs is a simple way to make the program skip them.

OR (logical OR)

Syntax: OR destination, source

OR performs a bitwise logical OR on two values.

This instruction clears OF and CF, and sets ZF according to the result.

To better understand OR, consider the following binary numbers:

1001010110
0101001101

If you perform a logical OR on them, the result is 1101011111.

A result bit is 0 only when both input bits are 0; otherwise it is 1. You can check it with a calculator, but it is best to work it out yourself so that you understand why.

POP

Syntax: POP destination

POP copies the value at the top of the stack (a dword in 32-bit code) into the destination. After each POP, ESP (the stack pointer register) is increased by the size of that value (4 bytes for a dword) so that it points to the new top of the stack.

PUSH

Syntax: PUSH Value

PUSH is the opposite of POP: it first decreases ESP and then stores the value at the new top of the stack.
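PUSH and POP can be modeled with a dictionary standing in for memory. This is a toy sketch: the class `Stack32` and its default starting ESP value are hypothetical, and only dword-sized pushes are modeled. It shows ESP moving down by 4 on PUSH and up by 4 on POP, in last-in, first-out order.

```python
class Stack32:
    """Toy model of the x86 stack: grows downward, 4 bytes per PUSH/POP."""

    def __init__(self, esp: int = 0x0012FF80):    # hypothetical starting ESP
        self.esp = esp
        self.memory = {}                           # address -> dword

    def push(self, value: int) -> None:
        self.esp -= 4                              # ESP decreases first...
        self.memory[self.esp] = value & 0xFFFFFFFF # ...then the value is stored

    def pop(self) -> int:
        value = self.memory[self.esp]              # read the value at the top
        self.esp += 4                              # ESP moves up to the new top
        return value

s = Stack32()
s.push(1)
s.push(2)
print(s.pop(), s.pop(), hex(s.esp))   # last in, first out; ESP restored
```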

REP/REPE/REPZ/REPNE/REPNZ

Syntax: REP/REPE/REPZ/REPNE/REPNZ ins

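Repeats the following string instruction ins, decrementing ECX (CX in 16-bit code) after each iteration, until it reaches 0. REPE/REPZ also stop early when ZF = 0, and REPNE/REPNZ when ZF = 1. ins must be a string instruction such as CMPS, INS, LODS, MOVS, OUTS, SCAS, or STOS.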

RET (return)

Syntax: RET

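RET number

RET returns from a subroutine: it pops the return address that CALL pushed and continues at the instruction after that CALL.

RET number also removes that many bytes (for example, arguments) from the stack after popping the return address.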

SUB (subtract)

Syntax: SUB destination, source

SUB is the opposite of ADD: it subtracts the source from the destination and stores the result in the destination.

SUB affects ZF, OF, and CF.

TEST

Syntax: TEST operand, operand

In 99% of cases this instruction appears as "test eax, eax". It performs the same operation as AND but does not store the result; it only updates the flags. If EAX = 0, ZF is set; if EAX is not 0, ZF is cleared.

XOR

Syntax: XOR destination, source

XOR performs a bitwise exclusive OR on two numbers.

This instruction clears OF and CF, and sets ZF according to the result.

For better understanding, consider the following binary numbers:

1001010110
0101001101

XORing them gives 1100011011.

A result bit is 0 when the two input bits are equal and 1 when they differ. You can check this with a calculator.

In many cases we use "xor eax, eax". This sets EAX to 0, because any value XORed with itself is always 0. It is a good idea to try it yourself to understand it better.
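Both uses of XOR in this section can be checked in Python, whose `^` operator is a bitwise exclusive OR. The starting value `0xDEADBEEF` is arbitrary.

```python
a = 0b1001010110
b = 0b0101001101
print(format(a ^ b, "010b"))   # XOR of the two example numbers

eax = 0xDEADBEEF               # any starting value
eax ^= eax                     # 'xor eax, eax'
print(eax)                     # a value XORed with itself is always 0
```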

VII. Logical operators

The following are common logical operators:

That covers the assembly basics needed for reverse engineering. See you in the next article!

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only.
