Reverse Engineering (I): Basics of assembly and Reverse Engineering
This series of articles explains the essentials of reverse engineering in an easy-to-understand way.
Assembly is the basis of reverse engineering. This article does not go deep, but it covers the basics you need when you first start learning assembly. Assembly language is the starting point and the end point of every program: whatever high-level language a program is written in, it ends up as machine instructions that can be read as assembly. High-level languages have a relatively readable syntax; in assembly, we express programs with abbreviations (mnemonics) and numbers.
I. Unit, bit, and byte
· BIT: the smallest unit of computer data, which can be 0 or 1. Several bits together form a binary number, for example: 00000001 = 1; 00000010 = 2; 00000011 = 3.
· BYTE: one byte contains eight bits, so the maximum value of a byte is 255 (range 0-255). To make binary numbers easier to read, we usually write them in hexadecimal.
· WORD: a word consists of two bytes, 16 bits in total. The maximum value of a word is 0FFFFh (or 65535d), where h stands for hexadecimal and d for decimal.
· DOUBLE WORD (DWORD): a double word contains two words, 32 bits in total. The maximum value is 0FFFFFFFFh (or 4294967295d).
· KILOBYTE: a kilobyte is not 1000 bytes but 1024 (32*32) bytes.
· MEGABYTE: a megabyte is not one million bytes but 1024*1024 = 1,048,576 bytes.
II. Registers
Registers are "special places" where the processor stores data. You can think of a register as a small box that can hold many kinds of things, such as a name, a number, or part of a sentence.
Today's Windows + Intel machines usually have nine 32-bit registers (not counting the flags register). They are:
EAX: accumulator
EBX: base register
ECX: counter
EDX: data register
ESI: source address register
EDI: destination address register
EBP: extended base pointer register
ESP: stack pointer register
EIP: instruction pointer register
Generally, a register is 32 bits (four bytes) wide and can hold values from 0 to FFFFFFFF (unsigned). Originally most register names hinted at their function, such as ECX = count, but nowadays you can use almost any register to count (only a few instructions, such as the REP prefixes described later, still insist on their own counter register). I will explain EAX, EBX, ECX, EDX, ESI, and EDI in detail when they come up. So let's talk about EBP, ESP, and EIP first.
EBP: EBP is used mostly together with the stack. There is nothing you need to pay special attention to at the beginning.
ESP: ESP points to the top of the stack. The stack is a place to store data that will be needed again soon; see the PUSH and POP instructions for more about the stack.
EIP: EIP points to the next instruction to be executed.
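Another thing worth noting: although these registers are 32 bits wide, parts of some of them can be addressed as 16-bit or even 8-bit registers, while the remaining parts cannot be addressed directly.
These include:
32-bit   16-bit   8-bit
EAX      AX       AH / AL
EBX      BX       BH / BL
ECX      CX       CH / CL
EDX      DX       DH / DL
ESI      SI       (none)
EDI      DI       (none)
EBP      BP       (none)
ESP      SP       (none)
EIP      IP       (none)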
Generally, a register is laid out as follows: EAX is the name of the whole 32-bit register. The low 16 bits of EAX are called AX, and AX is in turn divided into two 8-bit registers: AH holds its high 8 bits and AL holds its low 8 bits.
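The relationship between EAX, AX, AH, and AL can be sketched with plain bit masks. This is only a Python model of the layout described above, not real register access; the function names `split_register` and `set_al` are made up for illustration.

```python
def split_register(eax: int) -> dict:
    """Return the sub-register views of a 32-bit value."""
    eax &= 0xFFFFFFFF        # keep only 32 bits
    ax = eax & 0xFFFF        # AX: low 16 bits of EAX
    ah = (ax >> 8) & 0xFF    # AH: high 8 bits of AX
    al = ax & 0xFF           # AL: low 8 bits of AX
    return {"EAX": eax, "AX": ax, "AH": ah, "AL": al}


def set_al(eax: int, value: int) -> int:
    """Write AL and return the resulting EAX (upper 24 bits untouched)."""
    return (eax & 0xFFFFFF00) | (value & 0xFF)


parts = split_register(0x12345678)
for name, value in parts.items():
    print(f"{name} = {value:X}")

# Changing AL also changes AX and EAX, because AL is part of both.
print(f"EAX after AL = FF: {set_al(0x12345678, 0xFF):X}")
```

With EAX = 12345678, the sketch reports AX = 5678, AH = 56, and AL = 78, and writing FF into AL turns EAX into 123456FF.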
Note: even if they are not all important right now, you should at least be able to recognize the following registers.
They are grouped here by size:
i. Single-byte (8-bit) registers: as the name suggests, these registers are all one byte (8 bits) wide:
AL and AH
BL and BH
CL and CH
DL and DH
ii. Single-word (16-bit) registers: these registers are one word (= 2 bytes = 16 bits) wide. A single-word register contains two single-byte registers. We usually group them by function.
1. General registers:
AX (single word = 16 bits) = AH + AL -> the '+' does not mean adding them together. AH and AL can be used independently, but both are parts of AX, so if you change AH or AL (or both), AX changes as well. -> 'accumulator': used for mathematical operations
BX -> 'base' (base register): used together with the stack (described later)
CX -> 'counter' (counter register)
DX -> 'data' (data register): used to store data in most cases
DI -> 'destination index' (destination address register): for example, a string is copied to DI
SI -> 'source index' (source address register): for example, a string is copied from SI
2. Index registers (pointer registers):
BP -> 'base pointer' (base pointer register): points to the base address of the stack region
SP -> 'stack pointer' (stack pointer register): points to the top address of the stack region
3. Segment registers:
CS -> 'code segment' (code segment register): holds the base address of the application's code segment (described later)
DS -> 'data segment' (data segment register): holds the base address of the data segment (described later)
ES -> 'extra segment' (extra segment register): holds the base address of an additional data segment used by the program
SS -> 'stack segment' (stack segment register): holds the base address of the stack segment (described later)
4. Instruction pointer register:
IP -> 'instruction pointer' (instruction pointer register): points to the next instruction ;)
iii. Double-word (32-bit) registers:
2 words = 4 bytes = 32 bits: EAX, EBX, ECX, EDX, EDI ......
An 'E' in front of the name of a 16-bit register means the 32-bit register. For example, AX = 16 bits and EAX = 32 bits.
III. Flags register
Each flag represents a certain state. The 32-bit flags register has room for 32 different flags, but don't worry: we only care about three of them, ZF, OF, and CF. In reverse engineering you need to know the flags to tell whether the program will jump at a given step. A flag is just a marker that can only be 0 or 1, and conditional instructions check it to decide whether to execute.
The Z-Flag (zero flag):
ZF is the flag used most often in cracking (roughly 90% of the time). It can be 0 or 1: if the result of the previous operation is 0, ZF is set to 1; otherwise it is cleared to 0. (You may wonder how a comparison such as CMP can set ZF, since it only reports equal or not equal. How can the result of a comparison be 0? This is covered later.)
The O-Flag (overflow flag):
OF accounts for about 4% of the flags you meet in reverse engineering. It is set to 1 when the previous operation overflows, that is, when the result of a signed arithmetic operation falls outside the range a signed number can represent. This shows up as an unexpected change of the most significant bit (the sign bit) of the register. For example, if EAX holds 7FFFFFFF and you add 1 to it, OF is set to 1, because the most significant bit of EAX has changed from 0 to 1 (you can use your computer's built-in calculator to convert the hexadecimal value to binary and check).
The C-Flag (carry flag):
CF accounts for approximately 1%. It is set to 1 when an unsigned operation overflows, that is, when there is a carry out of the most significant bit. For example, if a register holds FFFFFFFF and you add 1, the result no longer fits in 32 bits and CF is set. You can try it with the calculator that comes with your computer.
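The two examples above (7FFFFFFF + 1 and FFFFFFFF + 1) can be checked with a small Python model of a 32-bit addition. This is a sketch of how ZF, OF, and CF are derived, assuming the usual two's-complement rules; `add32` is a hypothetical helper, not an existing API.

```python
MASK32 = 0xFFFFFFFF
SIGN32 = 0x80000000


def add32(a: int, b: int):
    """Model a 32-bit ADD and return (result, flags)."""
    a &= MASK32
    b &= MASK32
    full = a + b                 # exact sum, may need 33 bits
    result = full & MASK32       # what actually fits in the register
    flags = {
        "ZF": int(result == 0),  # result is zero
        "CF": int(full > MASK32),  # unsigned carry out of bit 31
        # Signed overflow: both operands have the same sign bit,
        # but the sign bit of the result differs from it.
        "OF": int(((a ^ result) & (b ^ result) & SIGN32) != 0),
    }
    return result, flags


for a, b in [(0x7FFFFFFF, 1), (0xFFFFFFFF, 1), (2, 3)]:
    result, flags = add32(a, b)
    print(f"{a:08X} + {b:08X} = {result:08X}  {flags}")
```

Adding 1 to 7FFFFFFF sets only OF, while adding 1 to FFFFFFFF wraps around to 0 and sets both CF and ZF but not OF.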
IV. Segment and offset
A segment is a region of memory that stores instructions (CS), data (DS), the stack (SS), or something else (ES). Every location inside a segment has an offset. In a 32-bit application the offset ranges from 00000000 to FFFFFFFF. The standard notation for a segment and an offset is:
SEGMENT:OFFSET = together they identify a specific address in memory.
You can picture it like this:
A segment is a page of a book; an offset is a line on that page.
V. Stack
The stack is a part of memory where you can store things that will be needed again later. Think of it as a pile of books in a box: the last book put in is always the first one taken out. You can also picture the stack as a box of paper sheets: the box is the stack, and each sheet represents a memory address. Remember this rule: the sheet that was put in last is taken out first. The PUSH instruction puts data onto the stack. The POP instruction takes the most recently pushed data off the stack and stores it in a specific register.
VI. Instructions (in alphabetical order)
Note that all values are usually written in hexadecimal.
Most instructions take two operands (for example, add eax, ebx), some take one (for example, not eax), and some take three (for example, imul eax, edx, 64). "dword ptr [XXX]" refers to the double word stored at offset [XXX] in memory. Note: bytes are stored in memory in reverse order (Windows + Intel machines use the "little-endian" format); word ptr [XXX] (two bytes) and byte ptr [XXX] (one byte) follow the same convention.
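To see what "stored in reverse order" means, here is a short Python sketch that uses the standard `struct` module to lay out a double word and a word the way a little-endian machine keeps them in memory. The values 12345678 and 1234 are arbitrary examples.

```python
import struct


def as_hex(data: bytes) -> str:
    """Format bytes in memory order, lowest address first."""
    return " ".join(f"{b:02X}" for b in data)


value = 0x12345678
dword_bytes = struct.pack("<I", value)   # "<" = little-endian, "I" = 32 bits
word_bytes = struct.pack("<H", 0x1234)   # "H" = 16 bits

print(as_hex(dword_bytes))  # 78 56 34 12
print(as_hex(word_bytes))   # 34 12

# Reading the bytes back with the same byte order restores the value.
restored = struct.unpack("<I", dword_bytes)[0]
print(f"{restored:X}")      # 12345678
```

The least significant byte (78) sits at the lowest address, which is why values look reversed when you read a hex dump of memory.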
Most instructions with two operands can take the following forms (shown here for ADD):
add eax, ebx ; register, register
add eax, 123 ; register, value
add eax, dword ptr [404000] ; register, dword pointer [value]
add eax, dword ptr [eax] ; register, dword pointer [register]
add eax, dword ptr [eax+00404000] ; register, dword pointer [register + value]
add dword ptr [404000], eax ; dword pointer [value], register
add dword ptr [404000], 123 ; dword pointer [value], value
add dword ptr [eax], eax ; dword pointer [register], register
add dword ptr [eax], 123 ; dword pointer [register], value
add dword ptr [eax+404000], eax ; dword pointer [register + value], register
add dword ptr [eax+404000], 123 ; dword pointer [register + value], value
ADD (addition)
Syntax: ADD destination, source
The ADD instruction adds a value to a register or to a memory location and stores the sum in the destination.
add eax, 123 ; eax = eax + 123
ADD affects ZF, OF, and CF.
AND (logical AND)
Syntax: AND destination, source
The AND instruction performs a bitwise logical AND on two values and stores the result in the destination.
AND clears OF and CF and sets ZF according to the result.
To better understand AND, take two binary numbers:
1001010110
0101001101
If you AND them, the result is 0001000100.
That is, a result bit is 1 (true) only when both corresponding input bits are 1; otherwise it is 0 (false). You can verify this with a calculator.
CALL (call)
Syntax: CALL something
The CALL instruction pushes the address of the instruction that follows it (the return address) onto the stack and then jumps to the subroutine named after CALL.
CALL can be used as follows:
call 404000 ; most common: call an address
call eax ; call a register: if eax holds 404000, this is the same as the first case
call dword ptr [eax] ; call the address stored at offset [eax]
call dword ptr [eax+5] ; call the address stored at offset [eax+5]
CDQ
Syntax: CDQ
CDQ is often hard to understand the first time you see it. It usually appears before a division, and all it does is copy the highest bit (the sign bit) of EAX into every bit of EDX.
For example, when EAX >= 80000000h, its highest bit is 1, so all 32 bits of EDX are set to 1, that is, EDX = FFFFFFFF.
If EAX < 80000000h, its highest bit is 0, and EDX becomes 00000000.
EDX:EAX then form a single 64-bit number, for example FFFFFFFF 80000000.
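The effect of CDQ can be modeled in a few lines of Python. This is a sketch of the behavior described above, with a hypothetical helper name `cdq`; it only shows how EDX is derived from the sign bit of EAX.

```python
def cdq(eax: int):
    """Model CDQ: fill EDX with copies of the sign bit of EAX."""
    eax &= 0xFFFFFFFF
    edx = 0xFFFFFFFF if eax & 0x80000000 else 0x00000000
    return edx, eax


for start in (0x80000000, 0x7FFFFFFF):
    edx, eax = cdq(start)
    combined = (edx << 32) | eax      # the 64-bit value EDX:EAX
    print(f"EAX={eax:08X} -> EDX={edx:08X}, EDX:EAX={combined:016X}")
```

For EAX = 80000000 the combined value is FFFFFFFF80000000, the same negative number widened to 64 bits; for EAX = 7FFFFFFF the upper half stays 00000000.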
CMP (compare)
Syntax: CMP destination, source
The CMP instruction compares two values and sets CF, OF, and ZF accordingly. Internally it subtracts the source from the destination and discards the result, which is why two equal values give a result of 0 and set ZF:
cmp eax, ebx ; compare eax with ebx; if they are equal, ZF is set to 1
cmp eax, [404000] ; compare eax with the dword at offset [404000]
cmp [404000], eax ; compare the dword at offset [404000] with eax
DEC (decrement)
Syntax: DEC something
DEC subtracts 1 from a value (value = value - 1).
DEC can be used in the following ways:
dec eax ; decrease eax by 1
dec [eax] ; decrease the value at offset [eax] by 1
dec [401000] ; decrease the value at offset 401000 by 1
dec [eax+401000] ; decrease the value at offset eax+401000 by 1
DEC can set ZF and OF.
DIV (division)
Syntax: DIV divisor
The DIV instruction divides EAX by the divisor (unsigned division). The dividend is always EAX (strictly speaking, the 64-bit pair EDX:EAX when the divisor is 32 bits wide), the quotient is stored in EAX, and the remainder is stored in EDX.
Example:
mov eax, 64 ; EAX = 64h = 100
mov ecx, 9 ; ECX = 9
div ecx ; divide EAX by ECX
After the division, EAX = 100/9 = 0B (decimal: 11) and EDX = 100 MOD 9 = 1.
DIV can change CF, OF, and ZF, but their values after a division are undefined, so do not rely on them.
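The DIV example can be replayed in Python. The sketch below models a 32-bit DIV on the register pair EDX:EAX and assumes EDX is 0 before the division, which the example above does not state explicitly; `div32` is a hypothetical helper.

```python
def div32(edx: int, eax: int, divisor: int):
    """Model a 32-bit unsigned DIV. Returns (new EAX, new EDX)."""
    if divisor == 0:
        raise ZeroDivisionError("the CPU raises a divide error here")
    dividend = (edx << 32) | eax            # 64-bit dividend EDX:EAX
    quotient, remainder = divmod(dividend, divisor)
    if quotient > 0xFFFFFFFF:
        raise OverflowError("quotient does not fit in EAX")
    return quotient, remainder


eax, edx = div32(edx=0, eax=0x64, divisor=9)
print(f"EAX = {eax:X} (quotient), EDX = {edx:X} (remainder)")
```

This reproduces the values from the text: the quotient 0B (11) lands in EAX and the remainder 1 lands in EDX.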
IDIV (integer division)
Syntax: IDIV divisor
IDIV works the same way as DIV, but it performs a signed division.
IDIV can change CF, OF, and ZF, but their values after the division are undefined.
IMUL (signed multiplication)
Syntax: IMUL value
IMUL destination register, value, value
IMUL destination register, value
The IMUL instruction multiplies EAX by a value (IMUL value), multiplies two values and puts the product in the destination register (IMUL destination register, value, value), or multiplies the destination register by a value (IMUL destination register, value).
If the product is too large to fit in the destination register, OF and CF are set. ZF may be changed as well.
INC (increment)
Syntax: INC something
INC is the opposite of DEC: it adds 1 to the value.
INC can set ZF and OF.
INT (interrupt)
Syntax: INT value
The INT operand must be an integer (for example, int 21h). Much like CALL, the INT instruction transfers control to another routine, in this case an interrupt handler that, among other things, gives access to the hardware. Different values select different functions.
Refer to the hardware specification for details.
JUMPS
These are the most important jump instructions and the conditions that trigger them (important ones are marked with *, the most important with **):
Instruction - jumps if - condition
JA* - above (unsigned) - CF = 0 and ZF = 0
JAE - above or equal (unsigned) - CF = 0
JB* - below (unsigned) - CF = 1
JBE - below or equal (unsigned) - CF = 1 or ZF = 1
JC - CF is set - CF = 1
JCXZ - CX is 0 - CX = 0
JE** - equal - ZF = 1
JECXZ - ECX is 0 - ECX = 0
JG* - greater (signed) - ZF = 0 and SF = OF (SF = sign flag)
JGE* - greater or equal (signed) - SF = OF
JL* - less (signed) - SF != OF (!= means "is not equal to")
JLE* - less or equal (signed) - ZF = 1 or SF != OF
JMP** - always (unconditional jump) - none
JNA - not above (unsigned) - CF = 1 or ZF = 1
JNAE - not above or equal (unsigned) - CF = 1
JNB - not below (unsigned) - CF = 0
JNBE - not below or equal (unsigned) - CF = 0 and ZF = 0
JNC - CF is clear - CF = 0
JNE** - not equal - ZF = 0
JNG - not greater (signed) - ZF = 1 or SF != OF
JNGE - not greater or equal (signed) - SF != OF
JNL - not less (signed) - SF = OF
JNLE - not less or equal (signed) - ZF = 0 and SF = OF
JNO - OF is clear - OF = 0
JNP - PF is clear - PF = 0
JNS - SF is clear - SF = 0
JNZ - not zero - ZF = 0
JO - OF is set - OF = 1
JP - PF is set - PF = 1
JPE - parity is even - PF = 1
JPO - parity is odd - PF = 0
JS - SF is set - SF = 1
JZ - zero - ZF = 1
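How a comparison followed by a conditional jump plays out can be sketched in Python. The model below computes the flags a 32-bit CMP would produce and then evaluates a handful of the jump conditions listed above. The helper names `cmp32` and `taken` are invented for this sketch, and only six of the jumps are covered.

```python
MASK32 = 0xFFFFFFFF
SIGN32 = 0x80000000


def cmp32(a: int, b: int) -> dict:
    """Model CMP a, b: subtract, keep only the flags."""
    a &= MASK32
    b &= MASK32
    result = (a - b) & MASK32
    return {
        "ZF": int(result == 0),
        "CF": int(a < b),                    # unsigned borrow
        "SF": int(bool(result & SIGN32)),    # sign bit of the result
        # Signed overflow of a subtraction: operands have different
        # signs and the result's sign differs from the first operand.
        "OF": int(bool((a ^ b) & (a ^ result) & SIGN32)),
    }


CONDITIONS = {
    "JE": lambda f: f["ZF"] == 1,
    "JNE": lambda f: f["ZF"] == 0,
    "JA": lambda f: f["CF"] == 0 and f["ZF"] == 0,
    "JB": lambda f: f["CF"] == 1,
    "JG": lambda f: f["ZF"] == 0 and f["SF"] == f["OF"],
    "JL": lambda f: f["SF"] != f["OF"],
}


def taken(mnemonic: str, a: int, b: int) -> bool:
    """Would the jump be taken right after CMP a, b?"""
    return CONDITIONS[mnemonic](cmp32(a, b))


# FFFFFFFF is a huge unsigned number but -1 when read as signed.
for jump in ("JA", "JB", "JG", "JL"):
    print(jump, taken(jump, 0xFFFFFFFF, 1))
```

The same comparison makes JA jump (unsigned: FFFFFFFF is above 1) and JL jump (signed: -1 is less than 1), which is why the signed and unsigned jump families must not be mixed up.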
LEA (load effective address)
Syntax: LEA destination, source
LEA is similar to MOV, but instead of reading memory it stores the computed address itself. It is seldom used for its original purpose and much more often for fast multiplication:
lea eax, dword ptr [4*ecx+ebx]
This assigns 4*ecx+ebx to eax.
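The arithmetic LEA performs can be written out in Python. This sketch assumes the usual x86 address form base + index*scale + displacement, where the scale may only be 1, 2, 4, or 8; `lea` is a hypothetical helper and no memory is read.

```python
def lea(base: int, index: int, scale: int = 1, displacement: int = 0) -> int:
    """Compute base + index*scale + displacement, truncated to 32 bits."""
    if scale not in (1, 2, 4, 8):
        raise ValueError("x86 addressing only allows scale 1, 2, 4, or 8")
    return (base + index * scale + displacement) & 0xFFFFFFFF


ecx, ebx = 5, 3
eax = lea(base=ebx, index=ecx, scale=4)   # lea eax, [4*ecx+ebx]
print(eax)                                 # 23

# A common trick: multiply by 5 with one instruction, lea eax, [ecx+4*ecx].
print(lea(base=ecx, index=ecx, scale=4))   # 25
```

Because the calculation happens in the address unit, a multiplication by a small constant plus an addition fits into a single instruction.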
MOV (move)
Syntax: MOV destination, source
This is a simple instruction. MOV copies the source value into the destination; the source stays unchanged.
MOV has several variants:
MOVS/MOVSB/MOVSW/MOVSD edi, esi: these variants copy the data that ESI points to into the location that EDI points to.
MOVSX: MOVSX extends a byte or a word to a word or a double word and preserves the sign of the original value.
MOVZX: MOVZX extends a byte or a word to a word or a double word and fills the remaining bits with 0 (in other words, it copies the source into the destination and zeroes all the other bits).
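The difference between MOVSX and MOVZX is easiest to see on a value whose top bit is set. The Python sketch below extends an 8-bit or 16-bit source to 32 bits in both ways; the function names simply mirror the mnemonics and the `bits` parameter is an illustrative stand-in for the operand size.

```python
def movzx(value: int, bits: int) -> int:
    """Zero-extend: keep the low `bits` bits, upper bits become 0."""
    return value & ((1 << bits) - 1)


def movsx(value: int, bits: int) -> int:
    """Sign-extend to 32 bits: copy the source's top bit upward."""
    low_mask = (1 << bits) - 1
    value &= low_mask
    if value & (1 << (bits - 1)):            # source is negative
        value |= 0xFFFFFFFF & ~low_mask      # fill the upper bits with 1
    return value


print(f"{movzx(0xFF, 8):08X}")     # 000000FF
print(f"{movsx(0xFF, 8):08X}")     # FFFFFFFF
print(f"{movsx(0x7F, 8):08X}")     # 0000007F
print(f"{movsx(0x8000, 16):08X}")  # FFFF8000
```

The byte FF becomes 255 with MOVZX but stays -1 (FFFFFFFF) with MOVSX, so the choice tells you whether the program treats the value as unsigned or signed.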
MUL (unsigned multiplication)
Syntax: MUL value
This instruction works like IMUL, except that MUL multiplies unsigned numbers.
NOP (no operation)
Syntax: NOP
This instruction does nothing.
That is precisely why it is used so widely in reverse engineering: overwriting an unwanted instruction with NOPs disables it.
OR (logical OR)
Syntax: OR destination, source
The OR instruction performs a bitwise logical OR on two values.
It clears OF and CF and sets ZF according to the result.
To better understand OR, take the following two binary numbers:
1001010110
0101001101
If you OR them, the result is 1101011111.
A result bit is 0 only when both input bits are 0; otherwise it is 1. You can check this with a calculator, but it is best to work it out by hand so that you understand why.
POP
Syntax: POP destination
The POP instruction copies the value at the top of the stack into the destination. After each POP, ESP (the stack pointer register) is increased so that it points to the new top of the stack.
PUSH
Syntax: PUSH value
PUSH is the opposite of POP. It decreases ESP so that it points to the new top of the stack and stores the value there.
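The interplay of PUSH, POP, and ESP can be modeled with a small Python class. This is a simplified sketch: the starting address 1000h, the 64-byte size, and the class name `Stack` are arbitrary assumptions, every value is a 4-byte double word, and there are no bounds checks.

```python
import struct


class Stack:
    """Toy model of a downward-growing stack of 32-bit values."""

    def __init__(self, size: int = 64, top: int = 0x1000):
        self.base = top - size          # lowest usable address
        self.memory = bytearray(size)
        self.esp = top                  # empty stack: ESP at the very top

    def push(self, value: int) -> None:
        self.esp -= 4                   # ESP moves down first
        offset = self.esp - self.base
        self.memory[offset:offset + 4] = struct.pack("<I", value & 0xFFFFFFFF)

    def pop(self) -> int:
        offset = self.esp - self.base
        (value,) = struct.unpack("<I", self.memory[offset:offset + 4])
        self.esp += 4                   # ESP moves back up afterwards
        return value


stack = Stack()
stack.push(0x11111111)
stack.push(0x22222222)
print(f"ESP after two pushes: {stack.esp:X}")   # FF8
print(f"{stack.pop():X}")                       # 22222222 (last in, first out)
print(f"{stack.pop():X}")                       # 11111111
print(f"ESP after two pops: {stack.esp:X}")     # 1000
```

Note that the stack grows toward lower addresses: each PUSH lowers ESP by 4 and each POP raises it by 4.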
REP/REPE/REPZ/REPNE/REPNZ
Syntax: REP/REPE/REPZ/REPNE/REPNZ ins
Repeats the instruction that follows the prefix until CX = 0 (ECX in 32-bit code). ins must be a string instruction, such as CMPS, INS, LODS, MOVS, OUTS, SCAS, or STOS.
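As an illustration, the combination REP MOVSB (copy one byte from [ESI] to [EDI], repeated ECX times) can be modeled in Python. The sketch assumes the direction flag is clear, so both pointers move upward; that flag is not covered in this article, and `rep_movsb` is a hypothetical helper working on a plain bytearray.

```python
def rep_movsb(memory: bytearray, esi: int, edi: int, ecx: int):
    """Model REP MOVSB. Returns the final (ESI, EDI, ECX)."""
    while ecx != 0:
        memory[edi] = memory[esi]   # MOVSB: copy one byte
        esi += 1                    # assumes direction flag = 0
        edi += 1
        ecx -= 1                    # REP: count down to zero
    return esi, edi, ecx


memory = bytearray(b"HELLO" + bytes(5))   # 5 bytes of text, 5 empty bytes
registers = rep_movsb(memory, esi=0, edi=5, ecx=5)
print(bytes(memory))    # b'HELLOHELLO'
print(registers)        # (5, 10, 0)
```

When you meet REP in a disassembly, look at the value loaded into ECX just before it: that is the number of elements being copied, compared, or filled.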
RET (return)
Syntax: RET
RET digit
The RET instruction returns from a subroutine to the instruction that follows the CALL.
RET digit additionally removes digit bytes from the stack when it returns.
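How CALL and RET cooperate through the stack can be sketched with a toy CPU model in Python. Everything here is illustrative: the class `Cpu`, the addresses 401000, 401005, and 404000, and the pushed argument are made-up values, and the stack is a simple dictionary rather than real memory.

```python
class Cpu:
    """Toy model that tracks only EIP, ESP, and a stack of dwords."""

    def __init__(self):
        self.eip = 0
        self.esp = 0x1000
        self.stack = {}                 # address -> 32-bit value

    def push(self, value: int) -> None:
        self.esp -= 4
        self.stack[self.esp] = value

    def pop(self) -> int:
        value = self.stack.pop(self.esp)
        self.esp += 4
        return value

    def call(self, target: int, return_address: int) -> None:
        self.push(return_address)       # remember where to come back to
        self.eip = target               # continue in the subroutine

    def ret(self, extra_bytes: int = 0) -> None:
        self.eip = self.pop()           # jump back to the caller
        self.esp += extra_bytes         # RET digit: drop the arguments


cpu = Cpu()
cpu.push(0x2A)                                   # one 4-byte argument
cpu.call(0x404000, return_address=0x401005)      # hypothetical addresses
print(f"in subroutine: EIP={cpu.eip:X} ESP={cpu.esp:X}")   # 404000, FF8
cpu.ret(4)                                       # like "ret 4"
print(f"after return:  EIP={cpu.eip:X} ESP={cpu.esp:X}")   # 401005, 1000
```

After `ret 4`, ESP is back where it was before the argument was pushed, which is how a subroutine can clean up its own arguments.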
SUB (subtraction)
Syntax: SUB destination, source
SUB is the opposite of ADD. It subtracts the source from the destination and stores the result in the destination.
SUB can set ZF, OF, and CF.
TEST
Syntax: TEST operand, operand
In 99% of cases this instruction appears as "test eax, eax". It performs the same operation as AND but does not store the result. If EAX = 0, ZF is set; if EAX is not 0, ZF is cleared.
XOR (exclusive OR)
Syntax: XOR destination, source
The XOR instruction performs a bitwise exclusive OR on two values.
It clears OF and CF and sets ZF according to the result.
For a better understanding, take the following two binary numbers:
1001010110
0101001101
If you XOR them, the result is 1100011011.
A result bit is 0 when the two input bits are equal; otherwise it is 1. You can use a calculator to check.
You will very often see "xor eax, eax". It sets EAX to 0, because a value XORed with itself always gives 0. It is best to try it yourself to understand it better.
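VII. Logical operators
The following table summarizes the common logical operators:
Operand 1   Operand 2   AND   OR   XOR
0           0           0     0    0
0           1           0     1    1
1           0           0     1    1
1           1           1     1    0
NOT inverts its single operand: NOT 0 = 1 and NOT 1 = 0.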
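Instead of a calculator, a few lines of Python can check the worked examples from the AND, OR, and XOR sections. The sketch uses the same two 10-bit numbers, 1001010110 and 0101001101; the helper `bits` only formats the result with leading zeros.

```python
a = 0b1001010110
b = 0b0101001101


def bits(value: int, width: int = 10) -> str:
    """Format a number as a fixed-width binary string."""
    return format(value, f"0{width}b")


print("AND", bits(a & b))          # 0001000100
print("OR ", bits(a | b))          # 1101011111
print("XOR", bits(a ^ b))          # 1100011011
print("NOT", bits(~a & 0x3FF))     # 0110101001 (masked to 10 bits)

# xor eax, eax: any value XORed with itself is 0.
eax = 0x12345678
print(eax ^ eax)                   # 0
```

The mask 0x3FF in the NOT line is needed only because Python integers have no fixed width; a real register truncates automatically.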
That covers the assembly basics needed for reverse engineering. See you in the next article!