A talk at GopherCon 2017 explains how to implement a simple strace in Go, and this article is based on that presentation.
What is a system call
First, the definition from Wikipedia:
In computing, a system call is the programmatic way in which a computer program requests a service from the kernel of the operating system it is executed on. This may include hardware-related services (for example, accessing a hard disk drive), creation and execution of new processes, and communication with integral kernel services such as process scheduling. System calls provide an essential interface between a process and the operating system.
In other words, a system call is how a program requests a service from the operating system kernel. Such services typically include hardware access (for example, reading a hard disk), creating new processes, and so on. System calls are the interface between a process and the operating system.
Syscall everywhere
As long as you write programs that run on an operating system, you cannot avoid system calls. Take the most common example, fmt.Println("hello world"): underneath it is the write system call. Let's look at the source.
func Fprintln(w io.Writer, a ...interface{}) (n int, err error) {
	p := newPrinter()
	p.doPrintln(a)
	// the writer here is stdout
	n, err = w.Write(p.buf)
	p.free()
	return
}

Stdout = NewFile(uintptr(syscall.Stdout), "/dev/stdout")

func (f *File) write(b []byte) (n int, err error) {
	if len(b) == 0 {
		return 0, nil
	}
	// the actual write method simply calls syscall.Write()
	return fixCount(syscall.Write(f.fd, b))
}
Zero-copy
As another example, we often hear about zero-copy. Let's see what problem zero-copy is meant to solve.
read(file, tmp_buf, len);
write(socket, tmp_buf, len);
The traditional read/write path above goes through four steps:
- Step one: read() triggers a context switch from user mode to kernel mode, and the DMA (direct memory access) engine reads the content from disk into a buffer in the kernel address space.
- Step two: the data is copied from the kernel buffer into the user buffer, read() returns, and the context switches back to user mode.
- Step three: write() triggers a context switch into kernel mode and copies the user buffer into a kernel buffer.
- Step four: write() returns, causing the fourth context switch. The DMA engine transfers the data from the kernel buffer to the protocol engine, where it is usually queued for transmission.
As we can see, the data is copied back and forth between user space and kernel space, which is unnecessary.
The solutions are mmap and sendfile. For details, you can refer to this article.
By now we should have a basic understanding of system calls.
Strace
strace is a tool for viewing the system calls a process makes. It is typically used as follows:
strace <bin>
strace -p <pid>
// count the number of system calls
strace -c <bin>
// for example
strace -c echo hello
hello
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
  0.00    0.000000           0         1           read
  0.00    0.000000           0         1           write
  0.00    0.000000           0         3           open
  0.00    0.000000           0         5           close
  0.00    0.000000           0         4           fstat
  0.00    0.000000           0         7           mmap
  0.00    0.000000           0         4           mprotect
  0.00    0.000000           0         1           munmap
  0.00    0.000000           0         3           brk
  0.00    0.000000           0         3         3 access
  0.00    0.000000           0         1           execve
  0.00    0.000000           0         1           arch_prctl
------ ----------- ----------- --------- --------- ----------------
100.00    0.000000                               3 total
strace is implemented on top of the ptrace system call, so let's look at what ptrace is.
Ptrace
The man page is described as follows:
The ptrace() system call provides a means by which one process (the "tracer") may observe and control the execution of another process (the "tracee"), and examine and change the tracee's memory and registers. It is primarily used to implement breakpoint debugging and system call tracing.
In short, it provides three main capabilities:
- Tracing system calls
- Reading and writing memory and registers
- Passing signals to the traced program
Interface
int ptrace(int request, pid_t pid, caddr_t addr, int data);

request values include:
PTRACE_ATTACH
PTRACE_SYSCALL
PTRACE_PEEKTEXT, PTRACE_PEEKDATA
and so on
The tracer uses the PTRACE_ATTACH request to attach to the PID it wants to trace, and then immediately issues PTRACE_SYSCALL.
The tracee runs until it reaches a system call, at which point the kernel stops it. The tracer is notified of the stop (from its point of view, the tracee appears to have been stopped by SIGTRAP) and can then print the tracee's memory and register contents.
Next, the tracer issues PTRACE_SYSCALL again, and the tracee continues until it exits the current system call.
Note that the tracer is notified both when the tracee enters a system call and when it exits one.
Mystrace
With that background, the presenter implemented a Go version of strace. It must be compiled for Linux on AMD64.
GitHub
strace.go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"runtime"
	"syscall"
)

func main() {
	var err error
	var regs syscall.PtraceRegs
	var ss syscallCounter
	ss = ss.init()

	// ptrace requests must come from the thread that started the tracee,
	// so pin this goroutine to its OS thread (as the syscall.SysProcAttr
	// documentation for Ptrace advises).
	runtime.LockOSThread()
	defer runtime.UnlockOSThread()

	fmt.Println("Run:", os.Args[1:])

	cmd := exec.Command(os.Args[1], os.Args[2:]...)
	cmd.Stderr = os.Stderr
	cmd.Stdout = os.Stdout
	cmd.Stdin = os.Stdin
	cmd.SysProcAttr = &syscall.SysProcAttr{
		Ptrace: true,
	}

	cmd.Start()
	err = cmd.Wait()
	if err != nil {
		fmt.Printf("Wait err %v \n", err)
	}

	pid := cmd.Process.Pid
	exit := true

	for {
		// Remember that PTRACE_SYSCALL stops the tracee both on entering and
		// on exiting a syscall, so a flag is used to print the contents of
		// rax only once per call.
		if exit {
			err = syscall.PtraceGetRegs(pid, &regs)
			if err != nil {
				break
			}
			// fmt.Printf("%#v \n", regs)
			name := ss.getName(regs.Orig_rax)
			fmt.Printf("name: %s, id: %d \n", name, regs.Orig_rax)
			ss.inc(regs.Orig_rax)
		}

		// The PTRACE_SYSCALL request mentioned above.
		err = syscall.PtraceSyscall(pid, 0)
		if err != nil {
			panic(err)
		}

		// Wait for the tracee to reach its next stop. Without this wait,
		// the same syscall name would be printed many times over.
		_, err = syscall.Wait4(pid, nil, 0, nil)
		if err != nil {
			panic(err)
		}
		exit = !exit
	}

	ss.print()
}
The counter that collects the statistics, syscallcounter.go:
package main

import (
	"fmt"
	"os"
	"text/tabwriter"

	"github.com/seccomp/libseccomp-golang"
)

// syscallCounter holds one call count per syscall ID.
type syscallCounter []int

const maxSyscalls = 303

func (s syscallCounter) init() syscallCounter {
	s = make(syscallCounter, maxSyscalls)
	return s
}

func (s syscallCounter) inc(syscallID uint64) error {
	// Valid indexes are 0..maxSyscalls-1, so reject maxSyscalls itself too.
	if syscallID >= maxSyscalls {
		return fmt.Errorf("invalid syscall ID (%x)", syscallID)
	}
	s[syscallID]++
	return nil
}

// print writes a right-aligned "count|name" table of the syscalls seen.
func (s syscallCounter) print() {
	w := tabwriter.NewWriter(os.Stdout, 0, 0, 8, ' ', tabwriter.AlignRight|tabwriter.Debug)
	for k, v := range s {
		if v > 0 {
			name, _ := seccomp.ScmpSyscall(k).GetName()
			fmt.Fprintf(w, "%d\t%s\n", v, name)
		}
	}
	w.Flush()
}

// getName maps a syscall ID to its name via libseccomp.
func (s syscallCounter) getName(syscallID uint64) string {
	name, _ := seccomp.ScmpSyscall(syscallID).GetName()
	return name
}
Final result:
Run: [echo hello]
Wait err stop signal: trace/breakpoint trap
name: execve, id: 59
name: brk, id: 12
name: access, id: 21
name: mmap, id: 9
name: access, id: 21
name: open, id: 2
name: fstat, id: 5
name: mmap, id: 9
name: close, id: 3
name: access, id: 21
name: open, id: 2
name: read, id: 0
name: fstat, id: 5
name: mmap, id: 9
name: mprotect, id: 10
name: mmap, id: 9
name: mmap, id: 9
name: close, id: 3
name: mmap, id: 9
name: arch_prctl, id: 158
name: mprotect, id: 10
name: mprotect, id: 10
name: mprotect, id: 10
name: munmap, id: 11
name: brk, id: 12
name: brk, id: 12
name: open, id: 2
name: fstat, id: 5
name: mmap, id: 9
name: close, id: 3
name: fstat, id: 5
hello
name: write, id: 1
name: close, id: 3
name: close, id: 3
        1|read
        1|write
        3|open
        5|close
        4|fstat
        7|mmap
        4|mprotect
        1|munmap
        3|brk
        3|access
        1|execve
        1|arch_prctl
Comparing the two, the per-syscall counts match the output of strace -c.
Presenter GitHub
YouTube video