CSAPP, the third edition of "Computer Systems: A Programmer's Perspective", is a good book, but reading it requires a fairly solid foundation. Moreover, some of its explanations are not straightforward.
For example, page 463 mentions why the assembler sets the initial value of the reference in the call instruction to -4 (on 32-bit systems), but the explanation that follows is vague. Even after expanding the calculation from the book's code:
*refptr = (unsigned) (ADDR(r.symbol) + *refptr - refaddr)
        = (unsigned) (0x80483c8 + (-4) - 0x80483bb)
        = (unsigned) (0x9)
I was still confused.
In fact, working the calculation backwards makes it easy to understand. Here is the memory image after relocation is complete:
0x80483b4   .text
...
0x80483ba   call 80483c8 <swap>    ; E8 XX XX XX XX
                                        ▲
0x80483bb   ----------------------------┘   (operand, 4 bytes)
0x80483bf   (next instruction)
...
0x80483c8   swap()
...
The question is as follows:
Given: the .text section is relocated to 0x80483b4, the call instruction is at 0x80483ba, and the swap() function is relocated to 0x80483c8.
Question: after relocation, what should the operand of the call instruction (the so-called "relocated reference" in the book) be? (That is, what is the offset of the swap function relative to the PC when the call instruction executes?)
First, the answer: when the call instruction executes, the swap function is at offset 0x9 relative to the PC, so the complete machine code of the call instruction is E8 09 00 00 00.
Why? Because when the call instruction executes, the PC already holds the address of the next instruction, which is 0x80483bf. To perform the call, the operand of the call is added to the PC, and the result becomes the new PC value. That is, 0x80483bf + 0x9 = 0x80483c8, exactly the address of the swap() function.
Next, what is the initial value of -4 in the call instruction's reference for? I rearranged the preceding formula:
*refptr = (unsigned) (ADDR(r.symbol) + *refptr - refaddr)
        = (unsigned) (ADDR(r.symbol) + (*refptr - refaddr))
        = (unsigned) (0x80483c8 + (-4 - 0x80483bb))
        = (unsigned) (0x80483c8 - 0x80483bf)
        = (unsigned) (0x9)
As you can see, the -4 is there to turn the address of the call instruction's operand into the address of the next instruction. On a 32-bit machine the call instruction is 5 bytes long: the opcode takes 1 byte and the operand takes exactly 4 bytes. The call instruction is located at 0x80483ba, so its operand is at 0x80483bb, and 4 bytes past that is exactly the address of the next instruction, 0x80483bf.
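So why use -4 instead of +4? The answer is simple: it follows from the calculation formula used by the linker. Since refaddr is subtracted in that formula, the 4 bytes must be subtracted along with it: *refptr - refaddr = -4 - refaddr = -(refaddr + 4), which is the negated address of the next instruction.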
CSAPP reading notes, part one: why the assembler sets the initial value of the reference in the call instruction to -4