I. Introduction to Golang
1.1 Overview
Go is a new-generation programming language developed at Google; it is expressive, concise, clear, and efficient. Its concurrency mechanisms make it easy to write programs that exploit multicore machines and serve the Web, while its novel type system enables building flexible, modular programs. Go compiles to machine code very quickly, has convenient garbage collection, and offers powerful runtime reflection. Its most widely known feature is language-level support for multicore programming: the single keyword go launches a concurrently executing unit, like this:
Go's unit of concurrency is not the traditional thread: a thread switch carries a large context and consumes a lot of CPU time, while Go uses much lighter goroutines, greatly raising the achievable degree of concurrency and earning Go a reputation as a highly concurrent language. Docker, the flagship of the latest wave of container technology, is written in Go. The GC is interleaved with goroutine execution, but this article does not discuss GC-related content and focuses on the goroutine scheduling problem. The Go version covered here is Go 1.7, the latest as of June 29, 2016.
1.2 Comparison with other concurrency models
Interpreted languages such as Python typically use a multi-process concurrency model. A process has the largest context, so switching is expensive, and because processes can communicate only over sockets or specially arranged shared memory, programming is considerably more troublesome and inconvenient.
Languages such as C++ usually adopt a multi-threaded concurrency model. Compared with processes, a thread's context is much smaller, and threads within a process naturally share memory, so programming is much easier. However, starting, destroying, and switching threads still consumes a lot of CPU time.
Thread pool technology goes a step further: threads are created in advance and kept alive for a while, avoiding the cost of frequently creating and destroying them. But this technique has its own problems; for example, if a pooled thread blocks on IO, it keeps occupying its slot, and tasks waiting in the queue behind it cannot get a thread to execute on.
Go's concurrency machinery is more sophisticated. Instead of threads, Go schedules goroutines, data structures much lighter than threads, each with its own stack, so switching is faster. To actually execute concurrently, Go's scheduler dispatches goroutines onto threads, creating and releasing threads as needed. When a running goroutine blocks (a common scenario is waiting for IO), it is detached from the thread it occupied, and other runnable goroutines are placed on that thread to execute. Through this more elaborate scheduling, the whole system achieves an extremely high degree of concurrency without consuming excessive CPU resources.
1.3 Characteristics of Goroutine
Goroutines were introduced to make it easy to write highly concurrent programs. When a goroutine performs a blocking operation (such as a system call), the other goroutines on the current thread are handed over to another thread to continue execution, so the whole program is not blocked.
Because Golang has garbage collection (GC), all goroutines must be stopped while the GC executes; implementing its own scheduler lets the runtime do this conveniently. Building concurrent programs out of goroutines combines the performance advantages of asynchronous IO with the programming convenience of multi-threading and multi-processing.
Introducing goroutines also brings considerable complexity. A goroutine contains not only the code to execute but also its own stack and the PC and SP pointers used while executing that code.
Since each goroutine has its own stack, the stack is created at the same time as the goroutine, and the stack space keeps growing as the goroutine executes. Conventionally a stack must grow contiguously, and because all threads in a process share one virtual address space, each thread's stack needs a different start address; this forces the size of each thread's stack to be estimated before allocation. When the number of threads is very large, it is easy to overflow a stack.
Split stacks technology was devised to solve this: when a stack is created, only a small piece of memory is allocated, and if a function call finds the stack space running low, a new stack segment is allocated elsewhere. The new segment need not be contiguous with the old one; the call's arguments are copied into the new segment, and the function executes there.
Golang's stack management is similar, but for higher efficiency it uses contiguous stacks: a fixed-size stack is allocated at first, and when the stack space runs out, a larger stack is allocated and the entire old stack is copied into it. This avoids the frequent memory allocation and release that the split stacks approach can cause.
Goroutine execution can be preempted: if a goroutine occupies the CPU for a long time without yielding, it is preempted by the runtime, giving CPU time to the other goroutines.
II. Concrete Implementation
2.1 Concepts
M: a worker thread in Go; the unit that actually executes code;
P: a context for scheduling goroutines; a goroutine depends on a P to be scheduled, and P is the true unit of parallelism;
G: a goroutine, a piece of Go code (presented as a function); the smallest unit of concurrency.
A P must be bound to an M to run, and an M must be bound to a P to run. In general there are at most GOMAXPROCS (usually equal to the number of CPUs) P's, while there may be many M's; only the P's bound to M's are actually running, which is why P is the true unit of parallelism.
Each P has its own runnable-G queue from which it can take a G to run, and there is also a global runnable-G queue; a G is attached to an M through a P. The reason a global runnable-G queue is not used alone is that distributed queues help shrink the critical section: if multiple threads request runnable G's at the same time and only the global resource exists, the global lock leaves many threads waiting.
But if an executing G blocks, the typical example being a wait on IO, then it and its M wait there, while the context P is handed off to another available M, so the blocking does not reduce the program's degree of parallelism.
2.2 Architecture Diagram
2.3 Key Functions
The goroutine scheduler's code is in src/runtime/proc.go; some of the more critical functions are analyzed below.
1. The schedule function
The schedule function executes whenever the runtime needs to dispatch. It finds a runnable G for the current P and executes it, searching in the following order:
1) Call the runqget function to get a runnable G from P's own runnable-G queue;
2) If 1) fails, call the findrunnable function to find a runnable G;
3) Execute the G that was found, resuming it from where it last left off.
2. The findrunnable function
The findrunnable function is responsible for finding a runnable G for a P; its search order is as follows:
1) Call the runqget function to get a runnable G from P's own runnable-G queue;
2) If 1) fails, call the globrunqget function to get a runnable G from the global runnable-G queue;
3) If 2) fails, call the netpoll function (non-blocking) to take a G whose asynchronous callback has completed;
4) If 3) fails, try to steal half of another P's runnable G's;
5) If 4) fails, call the globrunqget function again to get a runnable G from the global runnable-G queue;
6) If 5) fails, call the netpoll function (blocking) to take a G whose asynchronous callback has completed;
7) If 6) still yields no G, call the stopm function to stop this M.
3. The newproc function
The newproc function is responsible for creating a new runnable G and placing it in the current P's runnable-G queue. A statement like "go func () {...}" is actually translated by the compiler into a call to this function; the core code is in the newproc1 function, which executes in the following order:
1) Obtain the current G's P, then try to take a G from the free-G list;
2) If 1) succeeded, configure that G with the given function and arguments; otherwise create a new G;
3) Add the G to P's runnable-G queue.
4. The goexit0 function
The goexit0 function is called when a G exits. It puts the G on the free-G list for later reuse and then calls the schedule function to dispatch again.
5. The handoffp function
The handoffp function hands a P away from an M that is entering a system call or is blocked; if P's runnable-G queue is not empty, it calls the startm function to start a new M to take the P over.
6. The startm function
The startm function schedules an existing M, or creates one if necessary, to run the specified P.
7. The entersyscall_handoff function
The entersyscall_handoff function is used to hand off the P when a goroutine enters a system call that may block.
8. The sysmon function
The sysmon function is started when the Go runtime boots. It is responsible for monitoring the state of all goroutines, deciding whether a GC or a netpoll is needed, and so on. Inside sysmon, the retake function is called to perform preemptive scheduling.
9. The retake function
The retake function is the key to preemptive scheduling; its steps are as follows:
1) Traverse all P's; if a P is blocked in a system call, call handoffp to hand it to another M;
2) If a P is in the running state and its current G has been scheduled for longer than a certain threshold, call the preemptone function, which causes the stack-space check to fail at that G's next function call. This triggers morestack() (assembly code, located in asm_xxx.s) and then a chain of calls: morestack() -> newstack() -> gopreempt_m() -> goschedImpl() -> schedule(). In goschedImpl(), dropg() unbinds the G from its M, globrunqput() adds the G to the global runnable queue, and finally schedule() picks a new runnable G for the current P.
III. Summary
Because the Go language has its own runtime, implementing goroutines there is relatively simple. The author tried to implement similar functionality in C++11, but saving and restoring context for preemptive scheduling, and handing work to another thread after a G blocks, are difficult to implement when calls do not already pass through a runtime; by this reasoning, languages like C# or VB should find it easier. The author's C++11 goroutine implementation therefore supports neither preemptive scheduling nor handoff after blocking, so it is compared only against using std::thread directly for multi-threading, with a compute-intensive work function. The comparison chart is shown below (project address: https://github.com/INSZVA/CPPGO):
As the chart shows, the author's library starts up faster (goroutines start faster than threads), the OS even frees up a thread at peak load, and total time is shorter than with the multi-threaded model. Compared with most concurrency designs, Go's distinctive advantage is the P context. With only a G-to-M mapping, an M whose G blocks on IO does no useful work and resources sit idle; and without P, all G's would live in one global list, making the critical section too large and badly hurting multicore scheduling. With these features, goroutines are suitable both for dense multicore computation and for high-concurrency IO applications. For IO applications the code reads like the programmer-friendly synchronous blocking style, while thanks to runtime scheduling the underlying execution is actually synchronous and non-blocking (i.e., IO multiplexing); it may not reach the asynchronous non-blocking concurrency of Node.js, but it comes close. And compared with Node.js, Go makes better use of multiple cores, and because it is statically compiled, errors can be found earlier. The language is still developing rapidly, and it is open source; it deserves continued attention.
IV. References
Golang code repository: https://github.com/golang/go
"Scalable Go Scheduler Design Doc": https://docs.google.com/document/d/1TTj4T2JO42uD5ID9e89oa0sLKhJYD0Y_kqxDv3I3XMw/edit
"Go Preemptive Scheduler Design Doc": https://docs.google.com/document/d/1ETuA2IOmnaQ4j81AtTGT40Y4_Jr6_IDASEKg0t0dBR8/edit