I originally wrote the program in NASM for 32-bit Windows and Linux hosts, but later the requirements grew and it had to run on 64-bit Windows and Linux as well. Windows has the WOW64 (Windows-on-Windows 64-bit) mechanism, so 32-bit programs run on 64-bit Windows without any porting at all. Linux has no "LOL" mechanism (Linux on Linux, not "laugh out loud", hehe ~), but you can install the ia32-libs package (IA-32 stands for Intel Architecture, 32-bit) to get much the same effect. Still, producing ELF64 and Win64 object files interested me, so I decided to port the program!
The first step is to understand the CPU registers. Basically every 32-bit register has been widened: EAX becomes RAX, EBX becomes RBX, and so on. Twice the width is naturally nicer to work with: one instruction handles 8 bytes, so a single step can do what used to take several. Eight new registers have also been added: R8, R9, R10, R11, R12, R13, R14, and R15. With this many registers you need far fewer memory locations for intermediate values, which helps efficiency. Of these, R12 to R15 are callee-saved, i.e. a function may use them freely as long as it restores them. Previously, usually only ESI, EDI, and EBX served this purpose; now there are R12 to R15 plus RBX, five in all (RBP is preserved too, as before). Why not RSI and RDI? Because on 64-bit Linux these two registers are used to pass arguments, so they are generally not preserved across calls (the Windows x64 ABI, by contrast, still treats them as callee-saved). RSI and RDI remain important, though: LODSB, STOSB, and similar string instructions still take their source and destination addresses from RSI and RDI. I think this was a poor decision: why not pass arguments in the new registers instead of taking my beloved RSI and RDI? But I don't design CPUs, so complaining gets me nowhere! Complaints aside, to make porting easier it is best to avoid LODSB and similar instructions and to access memory directly with base-plus-offset addressing.
Next come function calls. The System V AMD64 ABI (used by Linux and other Unix-like systems) passes the first six integer or pointer arguments in RDI, RSI, RDX, RCX, R8, and R9, in that order. With fewer than six arguments, use as many of these registers as you need, in that order. With more than six, the first six go into those registers and the rest are pushed onto the stack from right to left (last argument first). Then invoke the function with the call instruction. When calling a variadic function such as printf, AL must also hold the number of vector (XMM) registers used for arguments, so setting RAX to 0 before the call is the usual step when no floating-point arguments are passed. If there were more than six arguments, the caller must clean up the stack after the function returns: add 8 bytes back to RSP for every argument you pushed, to rebalance the stack. Note that the Windows x64 ABI rules are different!
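The register-assignment rule above can be modeled in a few lines of C. This is only a sketch: it covers integer and pointer arguments (floating-point arguments go in XMM0 to XMM7 instead), it ignores the ABI's requirement that RSP be 16-byte aligned at the call (which can add 8 bytes of padding that the caller must also remove), and the helper names are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Integer/pointer argument registers, in order (System V AMD64 ABI). */
static const char *const sysv_int_arg_regs[6] = {
    "rdi", "rsi", "rdx", "rcx", "r8", "r9"
};

/* Register for 0-based integer argument i, or NULL if it goes on the stack. */
static const char *sysv_arg_register(int i)
{
    assert(i >= 0);
    return i < 6 ? sysv_int_arg_regs[i] : NULL;
}

/* Offset from RSP, at the moment of the CALL instruction, of stack
   argument i (i >= 6). Arguments are pushed right to left, so the
   7th argument ends up at the lowest address, [rsp + 0]. */
static int sysv_stack_arg_offset(int i)
{
    assert(i >= 6);
    return (i - 6) * 8;
}

/* Bytes the caller adds back to RSP after the call returns
   (hypothetical helper; alignment padding is not included). */
static int sysv_cleanup_bytes(int nargs)
{
    assert(nargs >= 0);
    return nargs > 6 ? (nargs - 6) * 8 : 0;
}
```

For a call with eight integer arguments, the first six land in RDI through R9, the 7th and 8th sit at [rsp] and [rsp + 8] when call executes, and the caller issues add rsp, 16 afterwards.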
Also, in 64-bit mode the CPU cannot push a 32-bit register onto the stack, so sorry, your push eax is no longer available; use push rax / pop rax instead. However, manipulating the stack pointer directly (sub esp on 32-bit, sub rsp on 64-bit) works on both kinds of CPU without problems. When you need to put several values on the stack in a row (for example, arguments for a function call), subtracting the total size from ESP/RSP once and then storing each value with base-plus-offset addressing is often more efficient than pushing them one at a time! That is exactly how GCC sets up calls, so beating GCC with hand-written assembly is actually hard: if you are not careful, the C program compiled by GCC will run faster than your assembly. My production projects are generally written in C, but NASM gives me a deeper understanding. Go figure!!
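To see why one sub rsp followed by base-plus-offset stores is equivalent to a series of pushes, here is a toy C model of a downward-growing stack of 8-byte slots. The toy_stack type and its helpers are hypothetical, purely for illustration; the point is that both paths leave identical memory and the same RSP.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define STACK_SLOTS 16

/* Toy downward-growing stack of 8-byte slots.
   'rsp' is a slot index: lower index = lower address. */
typedef struct {
    uint64_t slot[STACK_SLOTS];
    int rsp;
} toy_stack;

static void toy_init(toy_stack *s)
{
    memset(s->slot, 0, sizeof s->slot);
    s->rsp = STACK_SLOTS;              /* empty: rsp just past the top slot */
}

/* push r64: rsp -= 8, then store the value at [rsp] */
static void toy_push(toy_stack *s, uint64_t v)
{
    assert(s->rsp > 0);
    s->slot[--s->rsp] = v;
}

/* sub rsp, n*8 once, then mov [rsp + i*8], args[i] for each i */
static void toy_sub_and_store(toy_stack *s, const uint64_t *args, int n)
{
    assert(n >= 0 && s->rsp >= n);
    s->rsp -= n;
    for (int i = 0; i < n; i++)
        s->slot[s->rsp + i] = args[i];
}

/* The push-based equivalent: push right to left so args[0] ends at [rsp]. */
static void toy_push_all(toy_stack *s, const uint64_t *args, int n)
{
    for (int i = n - 1; i >= 0; i--)
        toy_push(s, args[i]);
}
```

Both versions put the first argument at [rsp]; the single subtraction simply avoids updating RSP once per value.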
For functions you implement yourself, you can still use the old C calling style, with arguments on the stack accessed through RBP, as follows:
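function:
    %define param1 rbp+16   ; first stack argument ([rbp] = saved rbp, [rbp+8] = return address)
    %define param2 rbp+24
    %define param3 rbp+32
    enter 16, 0             ; push rbp / mov rbp, rsp / sub rsp, 16
    %define local1 rbp-8
    %define local2 rbp-16
    ...
    leave                   ; mov rsp, rbp / pop rbp
    ret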
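Finally, the problem that bothered me most while porting was the return value of C functions. A C function declared as returning int returns only 32 bits, in EAX, the low half of RAX; it does not come back as a full 64-bit value in RAX (and not in EDX:EAX either, which is the 32-bit convention for 64-bit values). Most functions cause no trouble, but the problem shows up with return -1: EAX is -1, yet RAX is not -1, because in practice its high 32 bits are all 0 and only its low 32 bits are all 1 (the ABI leaves the high half undefined). So compare EAX rather than RAX, or sign-extend the value first.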
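The -1 pitfall can be reproduced in plain C by modeling what the registers hold. The helper names are hypothetical: rax_after_mov_eax mirrors the fact that writing a 32-bit register zero-extends into the full 64-bit register, and sign_extend_eax mirrors movsxd rax, eax (or cdqe). The sketch assumes the usual two's-complement conversions of GCC and Clang.

```c
#include <assert.h>
#include <stdint.h>

/* RAX after the callee executes "mov eax, value": the write to EAX
   zero-extends, so the upper 32 bits become 0. */
static uint64_t rax_after_mov_eax(int32_t value)
{
    return (uint64_t)(uint32_t)value;
}

/* Models "cmp rax, -1": compares all 64 bits (the buggy check). */
static int rax_is_minus1(uint64_t rax)
{
    return rax == (uint64_t)-1;
}

/* Models "cmp eax, -1": compares only the low 32 bits (the fix). */
static int eax_is_minus1(uint64_t rax)
{
    return (uint32_t)rax == 0xFFFFFFFFu;
}

/* Models "movsxd rax, eax" / "cdqe": sign-extend the low 32 bits. */
static int64_t sign_extend_eax(uint64_t rax)
{
    return (int64_t)(int32_t)(uint32_t)rax;
}
```

With rax_after_mov_eax(-1), the 64-bit check fails while the 32-bit check and the sign-extended value both see -1.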
I'm short on time now, so I'll discuss this in detail in a future article.
Before I wrap up, here is an excerpt from a reference on interfacing C with assembly. Note that it covers 16-bit and 32-bit code only.
==========================================
Interfacing HLL code with ASM
C Calling Convention–standard stack frame
Arguments passed to a C function are pushed onto the stack, right to left, before the function is called. The first thing the called function does is push the (E)BP register and then copy (E)SP into it. This creates a data structure called the standard C stack frame.
| | 32-bit code | 16-bit code, TINY, SMALL, or COMPACT memory models | 16-bit code, MEDIUM, LARGE, or HUGE memory models |
|---|---|---|---|
| Create standard stack frame, allocate 16 bytes for local variables, save registers | push ebp / mov ebp, esp / sub esp, 16 / push edi / push esi / ... | push bp / mov bp, sp / sub sp, 16 / push di / push si / ... | push bp / mov bp, sp / sub sp, 16 / push di / push si / ... |
| Restore registers, destroy stack frame, and return | ... / pop esi / pop edi / mov esp, ebp / pop ebp / ret | ... / pop si / pop di / mov sp, bp / pop bp / ret | ... / pop si / pop di / mov sp, bp / pop bp / retf |
| Size of 'slots' in stack frame, i.e. stack width | 32 bits | 16 bits | 16 bits |
| Location of stack frame 'slots' | [ebp + 8] [ebp + 12] [ebp + 16] ... | [bp + 4] [bp + 6] [bp + 8] ... | [bp + 6] [bp + 8] [bp + 10] ... |
If an argument passed to a function is wider than the stack, it occupies more than one 'slot' in the stack frame. A 64-bit value passed to a function (long long or double) occupies 2 stack slots in 32-bit code or 4 stack slots in 16-bit code.
Function arguments are accessed with positive offsets from the BP or EBP register. Local variables are accessed with negative offsets. The previous value of BP or EBP is stored at [BP + 0] or [EBP + 0]. The return address (IP or EIP) is stored at [BP + 2] or [EBP + 4].
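The offsets in the table follow one pattern: the saved (E)BP and the return address sit between BP and the first argument. Below is a sketch of that arithmetic in C, with hypothetical helper names, assuming every earlier argument fits in one slot. With an 8-byte slot and a near call it also gives the [rbp+16] used for param1 in the 64-bit frame earlier in this article.

```c
#include <assert.h>

/* Offset from (E)BP/RBP of stack argument k (k = 0 for the first) in a
   standard frame built with push bp / mov bp, sp.
   width     : stack slot size in bytes (2, 4, or 8)
   ret_slots : slots used by the return address (1 for a near call,
               2 for a 16-bit far call, which pushes CS:IP)
   Assumes every preceding argument occupies exactly one slot. */
static int frame_arg_offset(int k, int width, int ret_slots)
{
    assert(k >= 0 && width > 0 && ret_slots > 0);
    /* skip the saved BP (1 slot) and the return address */
    return (1 + ret_slots) * width + k * width;
}

/* Number of stack slots an argument of 'size' bytes occupies. */
static int frame_slots(int size, int width)
{
    assert(size > 0 && width > 0);
    return (size + width - 1) / width;   /* round up */
}
```

For example, frame_slots(8, 4) is 2, matching the two slots a double takes in 32-bit code.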
C Calling Convention–return values
A C function usually stores its return value in one or more registers.
| Return value size | 32-bit code | 16-bit code, all memory models |
|---|---|---|
| 8-bit | AL | AL |
| 16-bit | AX | AX |
| 32-bit | EAX | DX:AX |
| 64-bit | EDX:EAX | Space for the return value is allocated on the stack of the calling function, and a 'hidden' pointer to this space is passed to the called function |
| 128-bit | Hidden pointer | Hidden pointer |
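EDX:EAX simply means a 64-bit value split across two 32-bit registers, high half in EDX and low half in EAX (DX:AX is the same idea in 16-bit code). Here is a small C sketch with hypothetical names. Note that this split belongs to 32-bit code; x86-64 returns a 64-bit integer whole in RAX.

```c
#include <assert.h>
#include <stdint.h>

/* The two halves of a 64-bit return value in 32-bit code. */
typedef struct {
    uint32_t edx;   /* high 32 bits */
    uint32_t eax;   /* low 32 bits  */
} edx_eax;

static edx_eax split_edx_eax(uint64_t v)
{
    edx_eax r = { (uint32_t)(v >> 32), (uint32_t)v };
    return r;
}

static uint64_t join_edx_eax(edx_eax r)
{
    return ((uint64_t)r.edx << 32) | r.eax;
}
```

A caller in 32-bit code rebuilds the value exactly as join_edx_eax does, by treating EDX as the upper half.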
C Calling Convention–saving registers
GCC expects functions to preserve the callee-save registers:
EBX, EDI, ESI, EBP, DS, ES, SS
You need not save these registers:
EAX, ECX, EDX, FS, GS, EFLAGS, floating point registers
In some OSes, FS or GS is used as a pointer to thread local storage (TLS), and must be saved if you modify it.
C Calling Convention–leading underscores
Some C compilers (those for DOS and Windows, and those with COFF output) prepend an underscore to the names of C functions and global variables. If a C global variable, e.g. conv_mem_size, is accessed by ASM code, it should be declared with a leading underscore in the ASM code:
EXTERN _conv_mem_size; NASM syntax
mov [_conv_mem_size],ax
Linux ELF does not use underscores. Watcom C uses trailing underscores for function names, and leading underscores for global variables.
If your GCC supports it, leading underscores can be turned off with the compiler option -fno-leading-underscore.
Pascal calling convention
Function arguments are pushed onto the stack from left to right before the function is called. C-style variable-length argument lists are not possible in Pascal. (Look in the file stdarg.h and think about it.)
In C, the calling function must "clean up the stack" (remove function arguments from the stack after the called function returns). In Pascal, the called function must do this before returning.
Pascal identifiers are case-insensitive. MyKewlProc() would be stored in the object code file as MYKEWLPROC.
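The stdarg.h hint can be made concrete with a variadic function. In this sketch (sum_ints is a hypothetical example), the callee learns how many values follow only from its fixed first argument, so it has no built-in way to know how many bytes to remove; the caller does know, which is why the C convention makes the caller clean up. Under the 32-bit stack-based convention described here, pushing right to left is what keeps that first argument at a fixed offset, [ebp + 8].

```c
#include <assert.h>
#include <stdarg.h>

/* Sums 'count' int arguments that follow the fixed first parameter.
   The callee relies entirely on 'count' to know how many to read. */
static long sum_ints(int count, ...)
{
    va_list ap;
    long total = 0;

    va_start(ap, count);           /* start after the last fixed parameter */
    for (int i = 0; i < count; i++)
        total += va_arg(ap, int);  /* fetch the next argument as int */
    va_end(ap);
    return total;
}
```

A call such as sum_ints(3, 1, 2, 3) returns 6, and only the caller's code knows that four arguments were passed.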
Other calling conventions
The __stdcall calling convention, used by Windows, is a hybrid of the C and Pascal calling conventions. Like C, function arguments are pushed right-to-left. Like Pascal, the called function must clean up the stack. Exception: the caller must clean up the stack for functions that accept a variable number of arguments, e.g. printf(const char *format, ...);
Watcom C uses a register-based calling convention. See sections 7.4, 7.5, 10.4, and 10.5 in cuserguide.pdf in the Watcom documentation. Individual functions can be declared to use the normal, stack-based calling convention.
GCC can be made to use a register calling convention by compiling with gcc -mregparm=nnn ...
See the GCC documentation for details.