This article was created some time ago; the information in it may have evolved or changed.
This article analyzes Golang socket file descriptors and the principle behind goroutine blocking and scheduling. Most of the code is Go, with a small amount of assembly. A complete understanding requires knowledge of the Go language and experience writing network programs in Go. More importantly, you should understand how goroutine scheduling works beforehand.
1. Connection object for TCP:
Connection object:
net.go defines an interface named Conn, which provides read, write, and other operations on a connection:
type Conn interface {
	Read(b []byte) (n int, err error)
	Write(b []byte) (n int, err error)
	Close() error
	LocalAddr() Addr
	RemoteAddr() Addr
	SetDeadline(t time.Time) error
	SetReadDeadline(t time.Time) error
	SetWriteDeadline(t time.Time) error
}
This interface is an abstraction over the conn struct below. The conn struct implements read, write, and other operations on the connection:
type conn struct {
	fd *netFD
}
To read data from a connection:
// Read implements the Conn Read method.
func (c *conn) Read(b []byte) (int, error) {
	if !c.ok() {
		return 0, syscall.EINVAL
	}
	return c.fd.Read(b)
}
To write data to a connection:
// Write implements the Conn Write method.
func (c *conn) Write(b []byte) (int, error) {
	if !c.ok() {
		return 0, syscall.EINVAL
	}
	return c.fd.Write(b)
}
To close the connection:
// Close closes the connection.
func (c *conn) Close() error {
	if !c.ok() {
		return syscall.EINVAL
	}
	return c.fd.Close()
}
Setting read and write deadlines:
// SetDeadline implements the Conn SetDeadline method.
func (c *conn) SetDeadline(t time.Time) error {
	if !c.ok() {
		return syscall.EINVAL
	}
	return c.fd.setDeadline(t)
}

// SetReadDeadline implements the Conn SetReadDeadline method.
func (c *conn) SetReadDeadline(t time.Time) error {
	if !c.ok() {
		return syscall.EINVAL
	}
	return c.fd.setReadDeadline(t)
}

// SetWriteDeadline implements the Conn SetWriteDeadline method.
func (c *conn) SetWriteDeadline(t time.Time) error {
	if !c.ok() {
		return syscall.EINVAL
	}
	return c.fd.setWriteDeadline(t)
}
As you can see, every operation on the connection is delegated to its *netFD. Next, we trace the c.fd.Read() function.
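To see this delegation from the user side, the sketch below opens a loopback TCP connection and calls Write, SetReadDeadline, Read and Close on net.Conn values. On Unix, each of these calls ends up in the corresponding *netFD method traced in this article. This is an illustration, not code from the net package. The helper name echoOnce is hypothetical, and the sketch assumes loopback TCP (127.0.0.1) is available.

```go
package main

import (
	"fmt"
	"io"
	"net"
	"time"
)

// echoOnce sends msg over a loopback TCP connection to a one-shot echo
// server and returns what the server echoed back.
func echoOnce(msg string) (string, error) {
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return "", err
	}
	defer ln.Close()

	// One-shot echo server: accept a single connection and copy its bytes back.
	go func() {
		c, err := ln.Accept()
		if err != nil {
			return
		}
		defer c.Close()
		buf := make([]byte, len(msg))
		if _, err := io.ReadFull(c, buf); err != nil {
			return
		}
		_, _ = c.Write(buf)
	}()

	c, err := net.Dial("tcp", ln.Addr().String())
	if err != nil {
		return "", err
	}
	defer c.Close() // conn.Close -> netFD.Close

	// Guard against hanging forever (conn.SetReadDeadline -> netFD.setReadDeadline).
	if err := c.SetReadDeadline(time.Now().Add(5 * time.Second)); err != nil {
		return "", err
	}
	if _, err := c.Write([]byte(msg)); err != nil { // conn.Write -> netFD.Write
		return "", err
	}
	reply := make([]byte, len(msg))
	if _, err := io.ReadFull(c, reply); err != nil { // conn.Read -> netFD.Read
		return "", err
	}
	return string(reply), nil
}

func main() {
	reply, err := echoOnce("ping")
	fmt.Println(reply, err)
}
```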
2. File descriptors
net/fd_unix.go:
The file descriptor of a network connection:
// Network file descriptor.
type netFD struct {
	// locking/lifetime of sysfd + serialize access to Read and Write methods
	fdmu fdMutex

	// immutable until Close
	sysfd       int
	family      int
	sotype      int
	isConnected bool
	net         string
	laddr       Addr
	raddr       Addr

	// wait server
	pd pollDesc
}
Reading data through the file descriptor:
func (fd *netFD) Read(p []byte) (n int, err error) {
	if err := fd.readLock(); err != nil {
		return 0, err
	}
	defer fd.readUnlock()
	if err := fd.pd.PrepareRead(); err != nil {
		return 0, &OpError{"read", fd.net, fd.raddr, err}
	}
	// Issue the system call, reading from fd.sysfd in a loop
	for {
		// Read data with the read system call
		n, err = syscall.Read(int(fd.sysfd), p)
		// If an error occurred it must be handled:
		// only EAGAIN is handled here, every other error is returned to the caller
		if err != nil {
			n = 0
			// For the non-blocking file descriptor of a network connection,
			// EAGAIN means the socket buffer is empty and no data was read,
			// so call fd.pd.WaitRead and retry once it returns without error
			if err == syscall.EAGAIN {
				if err = fd.pd.WaitRead(); err == nil {
					continue
				}
			}
		}
		err = chkReadErr(n, err, fd)
		break
	}
	if err != nil && err != io.EOF {
		err = &OpError{"read", fd.net, fd.raddr, err}
	}
	return
}
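The loop above reduces to a simple pattern. It tries a non-blocking read. On EAGAIN, it waits until the descriptor becomes readable, then retries. The sketch below reproduces that pattern with raw syscalls on a Unix pipe. The wait callback is a hypothetical stand-in for fd.pd.WaitRead. Instead of parking the goroutine on the network poller, it simply makes the pipe readable by writing to it. The sketch assumes a Unix-like system where syscall.Pipe and syscall.SetNonblock are available. The names readRetry and demo are illustrative.

```go
package main

import (
	"fmt"
	"syscall"
)

// readRetry mirrors the shape of netFD.Read: issue a non-blocking read and,
// on EAGAIN, call wait (the role fd.pd.WaitRead plays) before retrying.
// Any other error is returned to the caller.
func readRetry(fd int, p []byte, wait func() error) (int, error) {
	for {
		n, err := syscall.Read(fd, p)
		if err == syscall.EAGAIN {
			if werr := wait(); werr != nil {
				return 0, werr
			}
			continue
		}
		if err != nil {
			return 0, err
		}
		return n, nil
	}
}

// demo reads from an initially empty non-blocking pipe and reports the data
// read and how many times the read had to wait.
func demo() (string, int, error) {
	fds := make([]int, 2)
	if err := syscall.Pipe(fds); err != nil {
		return "", 0, err
	}
	defer syscall.Close(fds[0])
	defer syscall.Close(fds[1])
	if err := syscall.SetNonblock(fds[0], true); err != nil {
		return "", 0, err
	}

	waits := 0
	// Hypothetical stand-in for WaitRead: make the pipe readable and return.
	wait := func() error {
		waits++
		_, err := syscall.Write(fds[1], []byte("hello"))
		return err
	}

	buf := make([]byte, 16)
	n, err := readRetry(fds[0], buf, wait)
	if err != nil {
		return "", waits, err
	}
	return string(buf[:n]), waits, nil
}

func main() {
	s, waits, err := demo()
	fmt.Println(s, waits, err) // hello 1 <nil>
}
```

The first read on the empty pipe fails with EAGAIN, so the wait callback runs exactly once before the retry succeeds.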
3. Network poller
The network poller is the polling mechanism Golang builds around each socket file descriptor. "Polling" here is not a general poll. After scheduling goroutines or completing a GC, or within a specified time, the Golang runtime calls epoll_wait to obtain all socket file descriptors that have IO events. Before the runtime can poll, the socket file descriptor must be registered with epoll together with its poll descriptor, which records information about the current goroutine. The current goroutine is then suspended. When IO becomes ready, execution of that goroutine is resumed using the file descriptor returned by epoll and the goroutine information attached to it.
// Integrated network poller (platform-independent part).
// A particular implementation (epoll/kqueue) must define the following functions:
// func netpollinit()			// to initialize the poller
// func netpollopen(fd uintptr, pd *pollDesc) int32	// to arm edge-triggered notifications
// and associate fd with pd.
// An implementation must call the following function to denote that the pd is ready.
// func netpollready(gpp **g, pd *pollDesc, mode int32)

// pollDesc contains 2 binary semaphores, rg and wg, to park reader and writer
// goroutines respectively. The semaphore can be in the following states:
// pdReady - io readiness notification is pending;
//           a goroutine consumes the notification by changing the state to nil.
// pdWait - a goroutine prepares to park on the semaphore, but not yet parked;
//          the goroutine commits to park by changing the state to G pointer,
//          or, alternatively, concurrent io notification changes the state to READY,
//          or, alternatively, concurrent timeout/close changes the state to nil.
// G pointer - the goroutine is blocked on the semaphore;
//             io notification or timeout/close changes the state to READY or nil respectively
//             and unparks the goroutine.
// nil - nothing of the above.
const (
	pdReady uintptr = 1
	pdWait  uintptr = 2
)
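The state rules in this comment can be written as a small transition function. The sketch below models only the IO-notification side. A nil or pdWait state becomes pdReady. If a goroutine is already parked (the "G pointer" state), it becomes pdReady and the goroutine must also be woken. Any value greater than pdWait stands for a G pointer here, which mirrors the `old > pdWait` check in netpollblock. The function name onIOReady and the constant pdNil are hypothetical. This is a model of the documented rules, not the runtime's own unblock code.

```go
package main

import "fmt"

const (
	pdNil   uintptr = 0 // hypothetical name for the "nil" state
	pdReady uintptr = 1
	pdWait  uintptr = 2
)

// onIOReady returns the new semaphore state after an IO readiness
// notification, and whether a parked goroutine has to be woken up.
// Any value above pdWait stands for a parked goroutine's G pointer.
func onIOReady(old uintptr) (next uintptr, wake bool) {
	switch {
	case old == pdReady:
		// A notification is already pending; nothing changes.
		return pdReady, false
	case old == pdNil || old == pdWait:
		// Nobody is parked yet: leave a pending notification.
		return pdReady, false
	default:
		// A goroutine is parked on the semaphore: mark ready and wake it.
		return pdReady, true
	}
}

func main() {
	for _, s := range []uintptr{pdNil, pdReady, pdWait, 0xc000001234} {
		next, wake := onIOReady(s)
		fmt.Println(s, "->", next, "wake:", wake)
	}
}
```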
The data structure of the network poller descriptor is as follows:
// Network poller descriptor.
type pollDesc struct {
	link *pollDesc // in pollcache, protected by pollcache.lock

	// The lock protects pollOpen, pollSetDeadline, pollUnblock and deadlineimpl operations.
	// This fully covers seq, rt and wt variables. fd is constant throughout the PollDesc lifetime.
	// pollReset, pollWait, pollWaitCanceled and runtime·netpollready (IO readiness notification)
	// proceed w/o taking the lock. So closing, rg, rd, wg and wd are manipulated
	// in a lock-free way by all operations.
	// NOTE(dvyukov): the following code uses uintptr to store *g (rg/wg),
	// that will blow up when GC starts moving objects.
	lock    mutex // protects the following fields
	fd      uintptr
	closing bool
	seq     uintptr        // protects from stale timers and ready notifications
	rg      uintptr        // pdReady, pdWait, G waiting for read or nil
	rt      timer          // read deadline timer (set if rt.f != nil)
	rd      int64          // read deadline
	wg      uintptr        // pdReady, pdWait, G waiting for write or nil
	wt      timer          // write deadline timer
	wd      int64          // write deadline
	user    unsafe.Pointer // user settable cookie
}
Blocking the current goroutine on the fd:
pd.WaitRead():
func (pd *pollDesc) WaitRead() error {
	return pd.Wait('r')
}

func (pd *pollDesc) Wait(mode int) error {
	res := runtime_pollWait(pd.runtimeCtx, mode)
	return convertErr(res)
}
res is the result returned by runtime_pollWait. The convertErr function wraps it into an error before it is returned:
func convertErr(res int) error {
	switch res {
	case 0:
		return nil
	case 1:
		return errClosing
	case 2:
		return errTimeout
	}
	println("unreachable: ", res)
	panic("unreachable")
}
- A return value of 0 means IO is ready, so nil is returned.
- A return value of 1 means the connection has been closed, so errClosing is returned.
- A return value of 2 means the IO operation has timed out, so errTimeout is returned.
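From user code, the errTimeout case surfaces as a timeout error from Read. In the sketch below, a read deadline is set on a connection whose peer never writes. The check is that Read fails with a timeout rather than blocking forever. The sketch assumes loopback TCP is available and a modern Go release: os.ErrDeadlineExceeded was added in Go 1.15, well after the runtime version quoted in this article, so the exact error value differs from errTimeout shown above. The helper name readTimesOut is hypothetical.

```go
package main

import (
	"errors"
	"fmt"
	"net"
	"os"
	"time"
)

// readTimesOut dials a loopback listener that never writes, sets a short read
// deadline, and reports whether Read failed with a timeout error.
func readTimesOut() (bool, error) {
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return false, err
	}
	defer ln.Close()

	// The kernel completes the handshake even though nobody calls Accept.
	c, err := net.Dial("tcp", ln.Addr().String())
	if err != nil {
		return false, err
	}
	defer c.Close()

	if err := c.SetReadDeadline(time.Now().Add(50 * time.Millisecond)); err != nil {
		return false, err
	}
	buf := make([]byte, 1)
	_, err = c.Read(buf) // blocks in the poller until the deadline fires

	var ne net.Error
	isTimeout := errors.As(err, &ne) && ne.Timeout()
	return isTimeout && errors.Is(err, os.ErrDeadlineExceeded), nil
}

func main() {
	ok, err := readTimesOut()
	fmt.Println(ok, err)
}
```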
runtime_pollWait is defined in runtime/thunk.s:
TEXT net·runtime_pollWait(SB),NOSPLIT,$0-0
	JMP	runtime·netpollWait(SB)
This is a thin wrapper with no frame of its own. It jumps directly to the function netpollWait in runtime/netpoll.go:
func netpollWait(pd *pollDesc, mode int) int {
	// Check whether pd is in an error state
	err := netpollcheckerr(pd, int32(mode))
	if err != 0 {
		return err
	}
	// As for now only Solaris uses level-triggered IO.
	if GOOS == "solaris" {
		onM(func() {
			netpollarm(pd, mode)
		})
	}
	// Loop until netpollblock reports that pd has been set to pdReady,
	// i.e. until IO is ready
	for !netpollblock(pd, int32(mode), false) {
		err = netpollcheckerr(pd, int32(mode))
		if err != 0 {
			return err
		}
		// Can happen if timeout has fired and unblocked us,
		// but before we had a chance to run, timeout has been reset.
		// Pretend it has not happened and retry.
	}
	return 0
}
The netpollcheckerr function checks whether pd is in an error state:
// Check pd for error conditions
func netpollcheckerr(pd *pollDesc, mode int32) int {
	// Has the descriptor been closed?
	if pd.closing {
		return 1 // errClosing
	}
	// For reads or writes, a negative deadline means the deadline has already passed
	if (mode == 'r' && pd.rd < 0) || (mode == 'w' && pd.wd < 0) {
		return 2 // errTimeout
	}
	// No error
	return 0
}
netpollblock():
// returns true if IO is ready, or false if timedout or closed
// waitio - wait only for completed IO, ignore errors
// This function is called in a loop by netpollWait.
func netpollblock(pd *pollDesc, mode int32, waitio bool) bool {
	// Use pd's rg by default, or pd's wg if mode is 'w'
	gpp := &pd.rg
	if mode == 'w' {
		gpp = &pd.wg
	}

	// set the gpp semaphore to WAIT
	// casuintptr is a single compare-and-swap, so it is retried in a loop
	for {
		// If IO is already ready (rg/wg is pdReady),
		// consume the notification by resetting it to 0 and return true
		old := *gpp
		if old == pdReady {
			*gpp = 0
			return true
		}
		// gpp is reset to 0 at the end of every netpollblock call,
		// so any other nonzero value means a second goroutine is waiting
		if old != 0 {
			gothrow("netpollblock: double wait")
		}
		// CAS gpp from 0 to pdWait
		if casuintptr(gpp, 0, pdWait) {
			break
		}
	}

	// need to recheck error states after setting gpp to WAIT
	// this is necessary because runtime_pollUnblock/runtime_pollSetDeadline/deadlineimpl
	// do the opposite: store to closing/rd/wd, membarrier, load of rg/wg
	//
	// Park the current goroutine if waitio is true (wait for IO regardless of
	// errors) or if netpollcheckerr reports neither a timeout nor a close.
	if waitio || netpollcheckerr(pd, mode) == 0 {
		// The unlock function netpollblockcommit changes gpp from pdWait to
		// the G pointer; if that fails, another event has already happened,
		// so G keeps running instead of being parked
		f := netpollblockcommit
		// Try to park the current G
		gopark(**(**unsafe.Pointer)(unsafe.Pointer(&f)), unsafe.Pointer(gpp), "IO wait")
	}
	// be careful to not lose concurrent READY notification
	old := xchguintptr(gpp, 0)
	if old > pdWait {
		gothrow("netpollblock: corrupted state")
	}
	return old == pdReady
}
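The handshake in netpollblock can be modeled in user space. The sketch below is a simplified model under stated assumptions, not runtime code. It requires Go 1.19+ for atomic.Uintptr. The constant parked is a hypothetical stand-in for the stored G pointer, and the wake channel stands in for gopark/goready. block follows the three steps above: consume a pending pdReady, announce pdWait, then commit to parking with a CAS from pdWait (as netpollblockcommit does). ready models an IO notification. In every interleaving, the final swap guarantees that a notification is not lost.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

const (
	pdReady uintptr = 1
	pdWait  uintptr = 2
	parked  uintptr = 3 // hypothetical stand-in for a stored G pointer
)

// sema models one of pollDesc's binary semaphores (rg or wg).
type sema struct {
	state atomic.Uintptr
	wake  chan struct{} // stands in for gopark/goready
}

func newSema() *sema { return &sema{wake: make(chan struct{}, 1)} }

// block follows the shape of netpollblock: consume a pending notification,
// otherwise announce pdWait, commit to parking, and sleep until woken.
func (s *sema) block() bool {
	for {
		old := s.state.Load()
		if old == pdReady {
			s.state.Store(0) // consume the notification
			return true
		}
		if old != 0 {
			panic("block: double wait")
		}
		if s.state.CompareAndSwap(0, pdWait) {
			break
		}
	}
	// Like netpollblockcommit: only park if nobody changed pdWait meanwhile.
	if s.state.CompareAndSwap(pdWait, parked) {
		<-s.wake
	}
	// Like the final xchg: read and clear the state so no notification is lost.
	return s.state.Swap(0) == pdReady
}

// ready models an IO readiness notification.
func (s *sema) ready() {
	for {
		old := s.state.Load()
		if old == pdReady {
			return
		}
		if s.state.CompareAndSwap(old, pdReady) {
			if old == parked {
				s.wake <- struct{}{} // unpark the waiter
			}
			return
		}
	}
}

func main() {
	s := newSema()
	s.ready()              // notification arrives first
	fmt.Println(s.block()) // true, without parking

	s2 := newSema()
	go s2.ready()           // notification races with block
	fmt.Println(s2.block()) // true either way
}
```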
runtime/proc.go: gopark():
// Puts the current goroutine into a waiting state and calls unlockf.
// If unlockf returns false, the goroutine is resumed.
func gopark(unlockf unsafe.Pointer, lock unsafe.Pointer, reason string) {
	// Get the current M
	mp := acquirem()
	// Get the current G
	gp := mp.curg
	// Read G's status
	status := readgstatus(gp)
	// Throw if it is neither _Grunning nor _Gscanrunning
	if status != _Grunning && status != _Gscanrunning {
		gothrow("gopark: bad g status")
	}
	// Record the lock and the unlock function
	mp.waitlock = lock
	mp.waitunlockf = unlockf
	gp.waitreason = reason
	releasem(mp)
	// can't do anything that might move the G between Ms here.
	// Call park_m on the m->g0 stack rather than on the current g's stack
	mcall(park_m)
}
mcall is a piece of assembly that calls park_m on the m->g0 stack rather than on the current goroutine's stack. It works in two parts. The first part saves the current G's PC and SP into the G's gobuf (g->sched). The second part switches to g0 and calls park_m:
// func mcall(fn func(*g))
// Switch to m->g0's stack, call fn(g).
// Fn must never return. It should gogo(&g->sched)
// to keep running g.
TEXT runtime·mcall(SB), NOSPLIT, $0-8
	// Keep the function to call in DI
	MOVQ	fn+0(FP), DI

	// Load the TLS base into CX
	get_tls(CX)
	// Current G into AX
	MOVQ	g(CX), AX	// save state in g->sched
	// Caller's PC into BX, then into g->sched.pc
	MOVQ	0(SP), BX	// caller's PC
	MOVQ	BX, (g_sched+gobuf_pc)(AX)
	// Address of the first argument (the caller's SP) into BX, then into g->sched.sp
	LEAQ	fn+0(FP), BX	// caller's SP
	MOVQ	BX, (g_sched+gobuf_sp)(AX)
	// Save the G itself into g->sched.g
	MOVQ	AX, (g_sched+gobuf_g)(AX)

	// switch to m->g0 & its stack, call fn
	MOVQ	g(CX), BX	// BX = g
	MOVQ	g_m(BX), BX	// BX = g->m
	MOVQ	m_g0(BX), SI	// SI = m->g0
	CMPQ	SI, AX	// if g == m->g0 call badmcall
	JNE	3(PC)
	MOVQ	$runtime·badmcall(SB), AX
	JMP	AX
	MOVQ	SI, g(CX)	// g = m->g0
	// Switch to the g0 stack
	MOVQ	(g_sched+gobuf_sp)(SI), SP	// sp = m->g0->sched.sp
	// Push AX (the old g) as fn's argument
	PUSHQ	AX
	// DX = closure context, DI = fn's code pointer
	MOVQ	DI, DX
	MOVQ	0(DI), DI
	// Call fn(g)
	CALL	DI
	// fn must not return; if it does, crash in badmcall2
	POPQ	AX
	MOVQ	$runtime·badmcall2(SB), AX
	JMP	AX
	RET
The park_m function works in three parts. First, it marks the current G as waiting and detaches it from the current M. Second, it calls the unlock function, which here is netpollblockcommit from netpoll.go. Third, unless the unlock function returned false, it calls schedule() so that the M runs other goroutines:
// runtime·park continuation on g0.
void
runtime·park_m(G *gp)
{
	bool ok;

	// Set the current g to the Gwaiting state
	runtime·casgstatus(gp, Grunning, Gwaiting);
	// Detach the current g from the m
	dropg();

	if(g->m->waitunlockf) {
		ok = g->m->waitunlockf(gp, g->m->waitlock);
		g->m->waitunlockf = nil;
		g->m->waitlock = nil;
		// 0 means false, nonzero means true.
		// false means the value at waitlock (gpp) was no longer pdWait:
		// an IO notification or timeout/close already changed it, so the G
		// has left the WAIT state; mark it Grunnable and run it right away
		if(!ok) {
			runtime·casgstatus(gp, Gwaiting, Grunnable);
			execute(gp);  // Schedule it back, never returns.
		}
	}

	// Let the current m run other g's,
	// instead of executing gp as above
	schedule();
}
The netpollblockcommit function tries to change *gpp from pdWait to the G pointer. It returns true if the compare-and-swap succeeds and false otherwise:
func netpollblockcommit(gp *g, gpp unsafe.Pointer) bool {
	return casuintptr((*uintptr)(gpp), pdWait, uintptr(unsafe.Pointer(gp)))
}
This completes the path by which a goroutine waits for IO on a socket file descriptor. Early on, netpollblock checks whether IO is already ready. If it is not, netpollblock prepares to park the current goroutine. At the moment of parking, netpollblockcommit checks again. If IO is still not ready, the goroutine stays parked and schedule() lets the M run other G's. If IO became ready (or a timeout or close occurred) before the goroutine was actually parked, the commit fails. park_m then resumes the same G immediately with execute(gp) instead of scheduling other work. Otherwise, the goroutine remains in the waiting state until the runtime wakes it.
Next, the goroutine waits for the runtime to wake it up. The runtime functions findrunnable, starttheworld and sysmon call netpoll in netpoll_epoll.go. netpoll finds the socket file descriptors whose IO is ready and looks up the goroutine information stored in their poll descriptors. Using this information, the goroutines that were waiting on those descriptors are changed to the Grunnable state. After netpoll returns the list of ready goroutines, these functions add them to the run queue, where they wait to be scheduled and run.
Inside netpoll in netpoll_epoll.go, epoll_wait returns n epollevent structures, one for each file descriptor on which an event occurred. For each event, the event's data field (ev.data) is converted to a *pollDesc. Then netpollready in netpoll.go takes the G stored in that *pollDesc and appends it to the G list passed in by the caller of netpoll:
// Convert ev.data to a *pollDesc
pd := *(**pollDesc)(unsafe.Pointer(&ev.data))
// netpollready takes the G stored in pd and appends it to the list
netpollready((**g)(noescape(unsafe.Pointer(&gp))), pd, mode)
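The step from epoll events back to waiting goroutines can be tried out with the raw Linux epoll system calls. The sketch below is Linux-only and deliberately differs from the runtime in two ways. First, it uses level-triggered EPOLLIN instead of edge-triggered notifications. Second, instead of storing a pointer in the event's data, it stores the file descriptor in EpollEvent.Fd and looks the descriptor up in a map. The pollDesc type here, whose rg field holds a goroutine name, is a hypothetical simplification, as are the names netpoll and demo.

```go
package main

import (
	"fmt"
	"syscall"
)

// pollDesc is a hypothetical, simplified descriptor: rg names the goroutine
// waiting to read.
type pollDesc struct {
	fd int
	rg string
}

// netpoll polls without blocking (timeout 0) and returns the goroutines
// whose descriptors became readable.
func netpoll(epfd int, table map[int32]*pollDesc) ([]string, error) {
	events := make([]syscall.EpollEvent, 8)
	var n int
	var err error
	for {
		n, err = syscall.EpollWait(epfd, events, 0)
		if err != syscall.EINTR {
			break
		}
	}
	if err != nil {
		return nil, err
	}
	var ready []string
	for i := 0; i < n; i++ {
		// Map the event back to its descriptor, then take the waiting G.
		if pd, ok := table[events[i].Fd]; ok && pd.rg != "" {
			ready = append(ready, pd.rg)
		}
	}
	return ready, nil
}

// demo registers a pipe, polls before and after writing, and returns both results.
func demo() (before, after []string, err error) {
	p := make([]int, 2)
	if err = syscall.Pipe(p); err != nil {
		return
	}
	defer syscall.Close(p[0])
	defer syscall.Close(p[1])

	epfd, err := syscall.EpollCreate1(0)
	if err != nil {
		return
	}
	defer syscall.Close(epfd)

	pd := &pollDesc{fd: p[0], rg: "reader-1"}
	table := map[int32]*pollDesc{int32(p[0]): pd}
	ev := syscall.EpollEvent{Events: syscall.EPOLLIN, Fd: int32(p[0])}
	if err = syscall.EpollCtl(epfd, syscall.EPOLL_CTL_ADD, p[0], &ev); err != nil {
		return
	}

	if before, err = netpoll(epfd, table); err != nil {
		return
	}
	if _, err = syscall.Write(p[1], []byte("x")); err != nil {
		return
	}
	after, err = netpoll(epfd, table)
	return
}

func main() {
	before, after, err := demo()
	fmt.Println(before, after, err) // [] [reader-1] <nil>
}
```

Before the write, no descriptor is readable and the list is empty. After the write, the goroutine recorded in the descriptor's pollDesc is returned.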
So when the runtime executes findrunnable, starttheworld or sysmon, netpoll returns a list of n goroutines whose awaited network events have occurred. The runtime puts these goroutines into the run queue, then schedules and runs them.
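This last step, turning the poller's result back into runnable work, can be modeled as a toy scheduler. The sketch below is entirely hypothetical: the task type, the waiting map keyed by fd, and the single FIFO run queue are simplifications, not runtime data structures. Parking a task keeps it out of the run queue. netpoll hands back the tasks whose descriptors are ready. Those tasks are then appended to the run queue and run in order.

```go
package main

import "fmt"

// task stands in for a goroutine.
type task struct {
	name string
}

// sched is a toy single-queue scheduler.
type sched struct {
	runq    []*task
	waiting map[int]*task // fd -> task parked waiting for IO on that fd
}

func newSched() *sched { return &sched{waiting: make(map[int]*task)} }

// park records that t is blocked on fd; it is not in the run queue.
func (s *sched) park(fd int, t *task) { s.waiting[fd] = t }

// netpoll returns the tasks parked on the given ready fds and forgets them.
func (s *sched) netpoll(readyFDs []int) []*task {
	var list []*task
	for _, fd := range readyFDs {
		if t, ok := s.waiting[fd]; ok {
			delete(s.waiting, fd)
			list = append(list, t)
		}
	}
	return list
}

// inject makes the ready tasks runnable by appending them to the run queue.
func (s *sched) inject(list []*task) { s.runq = append(s.runq, list...) }

// runAll pops tasks in FIFO order and returns their names in run order.
func (s *sched) runAll() []string {
	var order []string
	for len(s.runq) > 0 {
		t := s.runq[0]
		s.runq = s.runq[1:]
		order = append(order, t.name)
	}
	return order
}

func main() {
	s := newSched()
	s.park(5, &task{name: "A"})
	s.park(7, &task{name: "B"})
	s.park(9, &task{name: "C"})
	// Pretend epoll reported fds 7 and 5 as ready.
	s.inject(s.netpoll([]int{7, 5}))
	fmt.Println(s.runAll(), len(s.waiting)) // [B A] 1
}
```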