A JIT (just-in-time) compiler is any program that runs machine code generated while the program is running. The difference between JIT code and other code (for example, `fmt.Println`) is that JIT code is generated at run time.

Programs written in Golang are statically typed and compiled ahead of time. It may seem impossible to generate arbitrary code, let alone execute it. However, it is possible to emit instructions into a running Go process. This is done using Type Magic: the ability to convert any type to any other type.

As a note, if you are interested in hearing more about Type Magic, please comment below and I will write about it next.

## Machine Code for the x64 JIT Compiler

Machine code is a sequence of bytes that has a special meaning to the processor. The machine used to write this blog and test the code uses an x64 processor, so the [x64 instruction set](https://software.intel.com/en-us/articles/introduction-to-x64-assembly) is used. The following code must be run on an x64 processor, and the system call numbers it uses are the Linux ones.

## Generating x64 Code to Print "Hello World"

In order to print "Hello World", a system call must be made to ask the operating system to print the data. The system call that prints data is [write(int fd, const void *buf, size_t count)](http://man7.org/linux/man-pages/man2/write.2.html).

The first argument of this system call is the location to write to, expressed as a file descriptor. Printing output to the console is accomplished by writing to the standard file descriptor called stdout. The file descriptor number of stdout is 1.

The second argument is the location of the data that must be written. More information on this is provided in the next section.

The third argument is the count, that is, the number of bytes to write. In the case of "Hello World!", the number of bytes to write is 12. In order to make a system call, the three operands need to be stored in specific registers.
Here is a table that shows the registers that hold the operands.

| Syscall # | Param 1 | Param 2 | Param 3 | Param 4 | Param 5 | Param 6 |
|:-----:|:-----:|:-----:|:-----:|:-----:|:-----:|:-----:|
| rax | rdi | rsi | rdx | r10 | r8 | r9 |

Putting all this together, here is the sequence of bytes representing the instructions that initialize some of the registers.

```
0:  48 c7 c0 01 00 00 00    mov    rax,0x1
7:  48 c7 c7 01 00 00 00    mov    rdi,0x1
e:  48 c7 c2 0c 00 00 00    mov    rdx,0xc
```

- The first instruction sets rax to 1, which denotes the write system call.
- The second instruction sets rdi to 1, which denotes the stdout file descriptor.
- The third instruction sets rdx to 12, which denotes the number of bytes to print.
- The location of the data is still missing, and so is the actual call to write.

In order to specify the location of the data containing "Hello World!", the data needs to have a location first. That is, it needs to be stored somewhere in memory.

The byte sequence representing "Hello World!" is `48 65 6c 6c 6f 20 57 6f 72 6c 64 21`. This should be stored in a location that the processor will not attempt to execute. Otherwise, the program may crash with a segmentation fault.

In this case, the data can be stored at the end of the executable instructions, that is, after the return instruction. Storing data after the return instruction is safe because the processor "jumps" to a different address when it encounters a return and no longer executes sequentially.

Since the address just past the return instruction is not known until the return instruction has been laid out, a temporary placeholder can be used and replaced with the correct address once the address of the data is known. This is the exact procedure that linkers follow. The linking process simply fills in these addresses to point to the correct data or function.

```
15: 48 8d 35 00 00 00 00    lea    rsi,[rip+0x0]        # 0x1c
1c: 0f 05                   syscall
1e: c3                      ret
```

In the code above, the lea instruction that loads the address of "Hello World!" carries a placeholder displacement of 0, so it points to a location that is 0 bytes away from rip. While the lea executes, rip already holds the address of the following instruction, which is 0x1c here.
The displacement is 0 for now because the data has not been stored yet, so its address is not known.

The system call itself is represented by the byte sequence `0f 05`.

The data can now be stored, since the return instruction has been laid out.

```
1f: 48 65 6c 6c 6f 20 57 6f 72 6c 64 21   // Hello World!
```

With the entire program laid out, the lea instruction can now be updated to point to the data. The following is the updated code:

```
0:  48 c7 c0 01 00 00 00    mov    rax,0x1
7:  48 c7 c7 01 00 00 00    mov    rdi,0x1
e:  48 c7 c2 0c 00 00 00    mov    rdx,0xc
15: 48 8d 35 03 00 00 00    lea    rsi,[rip+0x3]        # 0x1f
1c: 0f 05                   syscall
1e: c3                      ret
1f: 48 65 6c 6c 6f 20 57 6f 72 6c 64 21   // Hello World!
```

The code above can be represented as a slice of any primitive type in Golang. An array/slice of type `uint16` is a good choice because each entry holds a pair of bytes while still being readable. Each entry is written with the earlier byte in its high half, which is the order the copy loop further down relies on. Here is the `[]uint16` data structure that holds the above program:

```go
printFunction := []uint16{
	0x48c7, 0xc001, 0x0, // mov rax, 0x1
	0x48, 0xc7c7, 0x100, 0x0, // mov rdi, 0x1
	0x48c7, 0xc20c, 0x0, // mov rdx, 0xc
	0x48, 0x8d35, 0x400, 0x0, // lea rsi, [rip+0x4]
	0xf05,                  // syscall
	0xc3cc,                 // ret, followed by the filler byte cc
	0x4865, 0x6c6c, 0x6f20, // Hello_(whitespace)
	0x576f, 0x726c, 0x6421, 0xa, // World!
}
```

The bytes above deviate slightly from the bytes listed earlier. This is because it is cleaner (easier to read and debug) to represent the data "Hello World!" when it is aligned with the start of a slice entry.

Therefore, I used the filler byte `cc` to push the start of the data section to the next entry in the slice. This byte encodes the one-byte `int3` instruction rather than a no-op, but it sits after `ret` and is never executed. I also updated the lea to point to a location 4 bytes away to reflect this change.

Note: You can find the system call numbers for various system calls at this [link](https://filippo.io/linux-syscall-table/).

## Converting the Slice to a Function

The instructions in the `[]uint16` data structure must be converted to a function so that it can be called. The following code demonstrates this conversion.

```go
type printFunc func()
unsafePrintFunc := (uintptr)(unsafe.Pointer(&printFunction))
printer := *(*printFunc)(unsafe.Pointer(&unsafePrintFunc))
printer()
```

A Golang function value is just a pointer to a C function pointer (notice the two levels of pointers). The conversion from slice to function first extracts a pointer to the data structure that holds the executable code. This is stored in `unsafePrintFunc`. The pointer to `unsafePrintFunc` can then be cast to the desired function type.

This approach works only for functions that have no arguments or return values. A stack frame needs to be created in order to call functions with arguments or return values. Function definitions should always start with instructions that dynamically allocate the stack frame to support variadic functions. For more information about the different function types, see [here](https://docs.google.com/document/d/1bMwCey-gmqZVTpRax-ESeVuZGmjwbocYs1iHplK-cjo/pub).

If you would like me to write about generating more complex functions in Golang, please comment below.

## Making the Function Executable

The function above will not actually run. This is because Golang stores all data structures in the data section of the binary. The data in this section has the [No-execute](https://en.wikipedia.org/wiki/NX_bit) flag set, which prevents it from being executed.

The data in the `printFunction` slice needs to be stored in a piece of memory that is executable. This can be done either by removing the No-execute flag on the `printFunction` slice or by copying it to a memory location that is executable.

In the following code, the data has been copied into a newly allocated piece of executable memory (using mmap). This approach is preferable because the No-execute flag can only be changed for entire pages, so removing it in place could unintentionally make other parts of the data section executable.

```go
executablePrintFunc, err := syscall.Mmap(
	-1,
	0,
	128,
	syscall.PROT_READ|syscall.PROT_WRITE|syscall.PROT_EXEC,
	syscall.MAP_PRIVATE|syscall.MAP_ANONYMOUS)
if err != nil {
	fmt.Printf("mmap err: %v", err)
}

j := 0
for i := range printFunction {
	executablePrintFunc[j] = byte(printFunction[i] >> 8)
	executablePrintFunc[j+1] = byte(printFunction[i])
	j = j + 2
}
```

The flag `syscall.PROT_EXEC` ensures that the newly allocated memory is executable. Converting this data structure to a function will make it run smoothly. The following is the complete code, which you can try to run on your x64 machine.

```go
package main

import (
	"fmt"
	"syscall"
	"unsafe"
)

type printFunc func()

func main() {
	printFunction := []uint16{
		0x48c7, 0xc001, 0x0, // mov rax, 0x1
		0x48, 0xc7c7, 0x100, 0x0, // mov rdi, 0x1
		0x48c7, 0xc20c, 0x0, // mov rdx, 0xc
		0x48, 0x8d35, 0x400, 0x0, // lea rsi, [rip+0x4]
		0xf05,                  // syscall
		0xc3cc,                 // ret, followed by the filler byte cc
		0x4865, 0x6c6c, 0x6f20, // Hello_(whitespace)
		0x576f, 0x726c, 0x6421, 0xa, // World!
	}

	// Allocate 128 bytes of anonymous memory that is readable, writable and executable.
	executablePrintFunc, err := syscall.Mmap(
		-1,
		0,
		128,
		syscall.PROT_READ|syscall.PROT_WRITE|syscall.PROT_EXEC,
		syscall.MAP_PRIVATE|syscall.MAP_ANONYMOUS)
	if err != nil {
		fmt.Printf("mmap err: %v", err)
		return // without executable memory there is nothing to call
	}

	// Copy the program into the executable memory, high byte of each entry first.
	j := 0
	for i := range printFunction {
		executablePrintFunc[j] = byte(printFunction[i] >> 8)
		executablePrintFunc[j+1] = byte(printFunction[i])
		j = j + 2
	}

	// Reinterpret the address of the slice header as a function value and call it.
	unsafePrintFunc := (uintptr)(unsafe.Pointer(&executablePrintFunc))
	printer := *(*printFunc)(unsafe.Pointer(&unsafePrintFunc))
	printer()
}
```

## Conclusion

Try out the source code above. Stay tuned for deeper explorations into Golang!
via:https://medium.com/kokster/writing-a-jit-compiler-in-golang-964b61295f
Author: Sidhartha Mani, Translator: jiangwei161002010, Proofreader: polaris1119
This article was originally translated by GCTT and published by the Go Language Chinese Network.
Translations are published only for the purpose of learning and exchange and are licensed under CC-BY-NC-SA. If our work infringes on your interests, please contact us promptly.
Reprints are welcome under the CC-BY-NC-SA license; please credit and keep the links to the original and the translation, along with the author and translator information.