It has been a long time since I last updated this blog, mainly because I have spent the recent period reading and digging through source code. Most of what I wrote down are reading notes by nature, so I never polished them into posts; I will publish them one after another later.
Intro

Recently I finally decided to set aside a long stretch of time to work through the MIT 6.824 labs. The most valuable part of this course is the series of labs it designed: with a fairly controllable amount of coding, you come away with a reasonably deep understanding of the common problems distributed programs face and how to solve them, for example fault tolerance, concurrency, and the underlying network implementation. The target language of the course is Golang, and the reason is obvious: Go's conciseness makes it well suited to replace C++ and similar languages as the implementation language for the labs.
One of the most important issues I ran into during implementation was how to strike a balance between CPU-intensive tasks, I/O-intensive tasks, and making full use of multiple cores to boost program performance. The easiest solution that comes to mind is, of course, multithreading. A distributed program, however, tends to have a large number of network I/O and computation tasks, and we inevitably want to describe this asynchrony in a synchronous style. The straightforward way to do that is to push each computation or I/O operation onto a new thread and leave the scheduling to the operating system. (We could use asynchronous I/O instead, but the large number of callbacks it requires makes the program hard to organize logically, so I did not consider using asynchronous I/O directly.) The problem with this approach is that it creates a great many threads, so the cost of thread context switching can no longer be ignored. On the JVM, a large number of threads can quickly exhaust memory, since each thread needs its own stack, and the extra pressure also increases GC time and instability. So I recently looked at several common concurrency programming models and their typical implementations.
Common concurrency programming model classification

A concurrency programming model, as the name implies, is a programming paradigm for dealing with high concurrency: it exploits multi-core hardware and reduces CPU waiting in order to achieve higher throughput. The common concurrency programming models I have seen so far can be broadly divided into two categories:
- Active objects based on message (event) passing
- Coroutines based on the CSP model
Sure enough, the experts have already worked on these; academia is formidable!
First, when an M finds that the runnable goroutines in its P have all been executed, it tries to steal goroutines from other Ps and run them; this strategy is a work-stealing mechanism. If the other Ps have no runnable goroutines either, it looks for runnable goroutines in the global run queue, and if it still finds nothing, the M gives up the CPU.
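To make the idea concrete, here is a toy sketch of work stealing written purely for illustration; it is not the runtime's algorithm, just the general shape of it: two workers each own a queue, all tasks start on the first worker's queue, and the idle worker steals from its peer.

```go
package main

import (
	"fmt"
	"sync"
)

type task func()

// worker drains its own queue first, then tries to steal from its peers,
// and gives up only when every queue looks empty.
func worker(id int, queues []chan task, wg *sync.WaitGroup) {
	defer wg.Done()
	for {
		// 1. run local work first
		select {
		case t := <-queues[id]:
			t()
			continue
		default:
		}
		// 2. local queue is empty: try to steal from another worker's queue
		stole := false
		for i := range queues {
			if i == id {
				continue
			}
			select {
			case t := <-queues[i]:
				fmt.Printf("worker %d stole a task from worker %d\n", id, i)
				t()
				stole = true
			default:
			}
		}
		if !stole {
			return // nothing anywhere; a real scheduler would park the thread instead
		}
	}
}

func main() {
	queues := []chan task{make(chan task, 8), make(chan task, 8)}
	// put all the work on worker 0's queue so that worker 1 has to steal
	for i := 0; i < 6; i++ {
		i := i
		queues[0] <- func() { fmt.Println("task", i, "finished") }
	}
	var wg sync.WaitGroup
	for id := range queues {
		wg.Add(1)
		go worker(id, queues, &wg)
	}
	wg.Wait()
}
```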
The second problem: blocking I/O, such as reading a local file, goes into the kernel through a system call and inevitably blocks the calling thread, so the runtime wraps all potentially blocking system calls behind goroutine-friendly interfaces. In practice, before each such system call an OS thread is taken from a thread pool to carry out the call, while the goroutine changes its state to Gwaiting and hands control back to the scheduler, which keeps scheduling other goroutines; when the call completes, its result is handed back to the goroutine. As a result, goroutines are not pure coroutines, because system calls still end up blocking OS threads. See the discussion on StackOverflow: Links
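As a rough illustration of the observable effect (this is my own sketch, not runtime code, and it assumes a Unix-like system because of the syscall package): even with GOMAXPROCS(1), a goroutine blocked in a blocking read(2) does not stop other goroutines from being scheduled, and its result comes back over a channel.

```go
package main

import (
	"fmt"
	"runtime"
	"syscall"
	"time"
)

func main() {
	runtime.GOMAXPROCS(1) // a single P; if the scheduler itself were blocked, nothing else could run

	var fds [2]int
	if err := syscall.Pipe(fds[:]); err != nil { // assumes a Unix-like system
		panic(err)
	}

	result := make(chan int)
	go func() {
		buf := make([]byte, 8)
		n, _ := syscall.Read(fds[0], buf) // blocking read(2); only this goroutine and its OS thread wait
		result <- n                       // the syscall's result is handed back over a channel
	}()

	// The main goroutine keeps being scheduled while the reader is blocked in the kernel.
	for i := 0; i < 3; i++ {
		fmt.Println("scheduler still running, tick", i)
		time.Sleep(10 * time.Millisecond)
	}

	syscall.Write(fds[1], []byte("hello")) // unblock the reader
	fmt.Println("read returned", <-result, "bytes")
}
```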
The third problem: Go supports simple preemptive scheduling. There is a sysmon thread in the Go runtime that is responsible for monitoring the various states of the runtime. One of sysmon's duties is to detect goroutines that have been occupying the CPU for a long time and, if it finds one, to preempt it.
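A small experiment along these lines, assuming a modern Go toolchain (Go 1.14 or newer uses signal-based asynchronous preemption; the older, cooperative sysmon-driven preemption described in this post could not interrupt a loop with no function calls): a CPU-hogging goroutine does not starve the main goroutine even with a single P.

```go
package main

import (
	"fmt"
	"runtime"
	"time"
)

func main() {
	runtime.GOMAXPROCS(1) // one P: a goroutine that could never be preempted would starve main

	go func() {
		x := 0
		for { // CPU-hogging loop with no blocking calls and no function calls
			x++
		}
	}()

	// If preemption works, the main goroutine still gets scheduled.
	for i := 0; i < 3; i++ {
		time.Sleep(50 * time.Millisecond)
		fmt.Println("main goroutine still scheduled, tick", i)
	}
}
```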
Reasons why Goroutine cannot be implemented on the JDK
At this point we have a general understanding of how goroutines work. The most important design decision is that all language-level APIs are restricted to the goroutine layer, which hides the interaction between the executing code and the underlying threads. So inside a goroutine we can essentially ignore the existence of threads and treat goroutines as very cheap threads that can be created in huge numbers.
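For example, creating a hundred thousand goroutines is perfectly ordinary, something that would be hopeless if each one were an OS thread (the count and the workload here are arbitrary, just for illustration):

```go
package main

import (
	"fmt"
	"sync"
)

func main() {
	const n = 100000 // far more than one would dare to create as OS threads
	var wg sync.WaitGroup
	results := make(chan int, n)

	for i := 0; i < n; i++ {
		wg.Add(1)
		go func(i int) { // each goroutine starts with only a few KB of stack
			defer wg.Done()
			results <- i * i
		}(i)
	}
	wg.Wait()
	close(results)

	sum := 0
	for v := range results {
		sum += v
	}
	fmt.Println("ran", n, "goroutines, sum of squares:", sum)
}
```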
In Java, however, or in JVM languages that need to interoperate with the JDK (such as Scala or Clojure), it is inherently impossible to implement goroutines completely (Clojure, for example, has asynchronous support, but it still does not integrate well with the JDK's blocking APIs). Suppose we implement a scheduler, lightweight coroutines, and the related primitives (resume, pause, and so on) in Java on top of threads; we can then only schedule cooperatively through our own wrapped APIs. If a blocking JDK API is called directly inside such a coroutine, the OS thread that is supposed to run the coroutine scheduler gets stuck and can no longer run the scheduler at all.
In summary, without changing the JDK's native implementation, we cannot fully achieve goroutine-like coroutines.
Todo
So, to make the JDK support coroutines, the only option is to change the JDK itself. Yes, I am that persistent! = =
First, some related work:
- JCSP
- CSP for Java Part 1
- CSPs for Java Part 2
As shown in the figure, there are two Ms, that is, two OS threads, each bound to a P, and each P is responsible for scheduling multiple Gs. Together they form the basic structure of the goroutine runtime.
Below we look at the concrete definitions of G, M, and P in the runtime source.
```c
struct G
{
	uintptr	stackguard0;	// for stack overflow checks; can be set to StackPreempt to trigger preemptive scheduling
	uintptr	stackbase;	// top of the stack
	Gobuf	sched;		// execution context; suspending and resuming this G both rely on it
	uintptr	stackguard;	// same as stackguard0, but never set to StackPreempt
	uintptr	stack0;		// bottom of the stack
	uintptr	stacksize;	// size of the stack
	int16	status;		// one of the six states of a G
	int64	goid;		// ID of this G
	int8*	waitreason;	// meaningful when status == Gwaiting; the reason for waiting, e.g. a call to time.Sleep
	G*	schedlink;	// next G in the list
	uintptr	gopc;		// PC of the go statement that created this goroutine; the function and line can be recovered from it
};

struct P
{
	Lock;			// Plan 9 C extension syntax, equivalent to Lock lock;
	int32	id;		// ID of this P
	uint32	status;		// one of the four states of a P
	P*	link;		// next P in the list
	M*	m;		// the M currently bound to this P; nil when the P is in the Pidle state
	MCache*	mcache;		// memory pool for allocation

	// queue of Gs in the Grunnable state
	uint32	runqhead;
	uint32	runqtail;
	G*	runq[256];

	// list of Gs in the Gdead state (linked through G's schedlink);
	// gfreecnt is the number of nodes in this list
	G*	gfree;
	int32	gfreecnt;
};

struct M
{
	G*	g0;		// the G this M executes by default (the scheduling G)
	void	(*mstartfn)(void);	// function executed when the OS thread starts
	G*	curg;		// the G currently running on this M
	P*	p;		// the P currently attached; may be nil if no G is being executed
	P*	nextp;		// the P about to be attached
	int32	id;		// ID of this M
	M*	alllink;	// linked onto allm so that it is not garbage collected
	M*	schedlink;	// next M in the list
};
```
The three most important states of a G are Grunnable, Grunning, and Gwaiting, and the typical transitions move between them: Grunnable to Grunning, Grunning to Gwaiting, and back to Grunnable. The goroutine's stack context is saved and restored when the state changes. Now let's look at the definition of Gobuf inside G.
```c
struct Gobuf
{
	uintptr	sp;	// stack pointer
	uintptr	pc;	// program counter
	G*	g;	// the G this context belongs to
};
```
When the stack context needs to be saved, the most important thing is to save the contents of the Gobuf structure. The runtime does this through two functions, void gosave(Gobuf*) and void gogo(Gobuf*), which save and restore the stack context respectively. Both are implemented in assembly, so a goroutine context switch is very fast.
Next, let's take a concrete look at the goroutine scheduler's strategy in several main scenarios.
The goroutine scheduler hands execution over to a specific M, that is, an OS thread. Each M runs a function, void schedule(void), whose job is to select an appropriate goroutine from the run queues and execute that goroutine's function.
The schedule function looks like this:
```c
// One round of scheduling: find a runnable G and execute it.
// Never returns.
static void
schedule(void)
{
	G *gp;
	uint32 tick;

top:
	gp = nil;

	// Check the global runnable queue once in a while to ensure fairness.
	// Otherwise two goroutines that keep respawning each other could
	// occupy the local run queue forever.
	tick = m->p->schedtick;
	// This is an optimization trick: it is equivalent to tick%61 == 0.
	if(tick - (((uint64)tick*0x4325c53fu)>>36)*61 == 0 && runtime·sched.runqsize > 0) {
		runtime·lock(&runtime·sched);
		gp = globrunqget(m->p, 1);   // take a runnable G from the global queue
		runtime·unlock(&runtime·sched);
		if(gp)
			resetspinning();
	}
	if(gp == nil) {
		gp = runqget(m->p);          // otherwise look in P's local run queue
		if(gp && m->spinning)
			runtime·throw("schedule: spinning with local work");
	}
	if(gp == nil) {
		gp = findrunnable();         // blocks until a runnable G is found
		resetspinning();
	}

	// If the G is locked to a specific M, hand the P to that M
	// and block waiting for a new P.
	if(gp->lockedm) {
		startlockedm(gp);
		goto top;
	}

	execute(gp);                         // run the G
}
```
At this point a few questions arise:
- What does an M do when it finds that the goroutine list assigned to it has all been executed?
- When a goroutine gets stuck in a system call, does the M block along with it?
- What happens when a goroutine occupies the CPU for a long time?
First, assume we could build a coroutine mechanism on top of the existing thread model in the JDK, with methods to create Coroutine objects, assign them tasks, and run them. The JDK, however, already contains many blocking operations, and calling any of them directly blocks the underlying thread, so the coroutines on that thread lose the ability to be rescheduled. You could design this in many other ways, but the essential problem is always the same: no matter how you design it, you can never make the coroutine state and the thread state consistent, unless you do what Go does and wrap every blocking operation up to the coroutine level. So the moment we use a blocking, thread-bound JDK API, this concurrency model essentially falls apart.
So let's analyze the implementation of goroutines to explain why Java cannot implement goroutines the same way.
Goroutine principle
Between the operating system's threads and a programming language's user-level threads there are essentially three correspondence models: 1:1, 1:N, and M:N.
On mainstream JVM implementations, a Java Thread maps 1:1 to an OS thread. Some runtimes implement a 1:N threading model instead, but that is not flexible enough. The Go runtime implements the M:N model for goroutines, so it can adjust the mapping between goroutines and OS threads depending on the kind of operation (blocking or non-blocking with respect to the operating system), which is much more flexible.
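A quick way to observe the M:N mapping (my own sketch; the exact counts vary by Go version and machine) is to spawn many goroutines and compare the goroutine count with the number of OS threads the runtime has created:

```go
package main

import (
	"fmt"
	"runtime"
	"runtime/pprof"
	"sync"
)

func main() {
	runtime.GOMAXPROCS(2) // at most two Ps run Go code simultaneously

	var wg sync.WaitGroup
	release := make(chan struct{})
	for i := 0; i < 10000; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			<-release // park here so that all goroutines are alive at the same time
		}()
	}

	// Ten thousand goroutines, but only a handful of OS threads behind them.
	fmt.Println("goroutines alive:", runtime.NumGoroutine())
	fmt.Println("OS threads created:", pprof.Lookup("threadcreate").Count())

	close(release)
	wg.Wait()
}
```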
The goroutine implementation has three central data structures: G, M, and P.
- G: represents a goroutine
- M: represents an OS thread
- P: bound to an M, it represents the scheduling context on that OS thread
For message (event) based active objects, the most typical representative is Akka's actors. The actor concurrency model abstracts a sequence of computations into an actor object, and actors communicate with each other through asynchronous message passing; in this way the sequential blocks of a computation are distributed across actors, and the code inside an actor should be as non-blocking as possible. In Akka, how an actor's messages are handled is decided by its dispatcher: the default dispatcher is backed by a ForkJoinExecutor and is only suitable for non-blocking, non-CPU-intensive messages, while other dispatchers, backed by a cached thread pool, can be used for blocking or CPU-intensive messages. Combining these dispatchers, we can build a complete concurrency model on the JVM.
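Here is a rough sketch of the active-object idea, expressed in Go for consistency with the rest of this post rather than in Akka itself: one goroutine owns the state and processes messages from its mailbox sequentially, so callers never touch the state directly.

```go
package main

import "fmt"

// deposit is a message sent to the account "actor"; the reply comes back on done.
type deposit struct {
	amount int
	done   chan int
}

// account starts a goroutine that owns the balance and handles one message at a time.
func account() chan<- deposit {
	inbox := make(chan deposit)
	go func() {
		balance := 0
		for msg := range inbox { // messages are processed sequentially
			balance += msg.amount
			msg.done <- balance
		}
	}()
	return inbox
}

func main() {
	acc := account()
	for i := 1; i <= 3; i++ {
		done := make(chan int)
		acc <- deposit{amount: 100 * i, done: done}
		fmt.Println("balance:", <-done)
	}
}
```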
For the coroutine-based approach, the main representative is the goroutine. The Go runtime implements an M:N model between goroutines and OS threads, so a goroutine is in effect a lighter-weight construct built on top of threads; we can create goroutines in Go without worrying about the overhead of expensive context switches. Goroutines communicate with each other through channels. Because Go wraps all system calls in its standard library, a call to one of these syscalls marks the goroutine as blocked and saves its context, handing execution back to the scheduler. So in Go, in most cases, we can use blocking operations inside goroutines without worrying about hurting concurrency.
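A small example of this style (the URLs are placeholders, just for illustration): each fetch blocks inside its own goroutine, and the results flow back over a channel.

```go
package main

import (
	"fmt"
	"net/http"
	"time"
)

// fetch runs in its own goroutine; the blocking HTTP call only parks this goroutine.
func fetch(url string, results chan<- string) {
	start := time.Now()
	resp, err := http.Get(url)
	if err != nil {
		results <- fmt.Sprintf("%s: %v", url, err)
		return
	}
	resp.Body.Close()
	results <- fmt.Sprintf("%s: %s in %v", url, resp.Status, time.Since(start))
}

func main() {
	urls := []string{ // placeholder URLs
		"https://example.com",
		"https://example.org",
	}
	results := make(chan string, len(urls))
	for _, u := range urls {
		go fetch(u, results) // blocking code, written in a plain synchronous style
	}
	for range urls {
		fmt.Println(<-results)
	}
}
```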
This goroutine concurrency model has a very obvious advantage: we can express asynchrony in the blocking style of programming everyone loves, as long as we use the go keyword sensibly. Compared with Akka's actors, goroutine-based programs are more readable and make it much easier to locate errors.
Can Java do goroutine like this?
Since goroutines are so useful, can we implement a goroutine-like concurrency library on top of the JDK? (I raised a related question about this; see here.) Unfortunately, as long as we stay on top of the JDK, it is not possible. Let's look at the essence of the problem.
Below I define the goroutine concurrency model by the following points:
- Lightweight coroutines built on top of threads
- Communication between coroutines happens through channels
- Only the coroutine API is exposed; the thread-level interface is hidden
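As a closing illustration of the last point (a minimal sketch of my own): the program below contains no thread API at all; concurrency, communication, and even timeouts are expressed only with go, channels, and select.

```go
package main

import (
	"fmt"
	"time"
)

// worker pretends to do some work and reports back on a channel.
func worker(out chan<- string) {
	time.Sleep(50 * time.Millisecond)
	out <- "done"
}

func main() {
	out := make(chan string)
	go worker(out) // no Thread object anywhere, only a goroutine

	select {
	case msg := <-out:
		fmt.Println("worker:", msg)
	case <-time.After(time.Second): // a timeout without touching any thread primitive
		fmt.Println("worker timed out")
	}
}
```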