Goroutine and Scheduler
November 2013, by Skoo
We all know that Go supports concurrency natively at the language level, and that the smallest logical unit of concurrency is the goroutine. A goroutine is a user-space thread provided by Go, and these user-space threads run on top of kernel-level threads. When we create many goroutines and they all run on the same kernel thread, a scheduler is needed to manage them, making sure every goroutine gets CPU time and that the CPU is shared as fairly as possible.
The design and implementation of this scheduler are worth studying in depth. The scheduler is built around four important structures: M, G, P, and Sched. The first three are defined in runtime.h; Sched is defined in proc.c.
- The Sched structure is the scheduler itself. It maintains the queues that store M's and G's, along with some of the scheduler's state information.
- M represents a kernel-level thread: one M is one thread, and goroutines run on top of an M. M is a large structure that maintains a lot of information, such as the small-object memory cache (mcache), the currently executing goroutine, a random number generator, and so on.
- P stands for Processor. Its main purpose is to execute goroutines, so it also maintains a goroutine queue holding all the goroutines it needs to run. The role of P can be confusing, and at first it is easy to mix it up with M; their relationship is discussed in detail below.
- G is the core structure that implements a goroutine. It holds the goroutine's stack, its program counter, the M it is running on, and other information.
Understanding the relationship among M, P, and G is essential to understanding the whole scheduler. I found a diagram online that illustrates it:
A gopher pushes a cart loaded with a pile of bricks waiting to be processed. M is the gopher in the picture, P is the cart, and G is a brick in the cart. A picture is worth a thousand words. With the relationship among the three clear, we can focus on how the gopher actually moves the bricks.
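To make the gopher, cart, and brick picture concrete, here is a minimal Go sketch of how the four structures might relate. Every type and field name below is a hypothetical simplification chosen for illustration; the real definitions live in the runtime's C sources and carry far more state.

```go
package main

import "fmt"

// G is a hypothetical stand-in for a goroutine (a brick).
type G struct {
	ID int
}

// P is a hypothetical stand-in for a processor (a cart) with a local run queue.
type P struct {
	ID   int
	RunQ []*G
}

// M is a hypothetical stand-in for a kernel thread (a gopher).
type M struct {
	ID   int
	P    *P // cart currently attached, nil if none
	Curg *G // brick currently being processed, nil if none
}

// Sched is a hypothetical stand-in for the global scheduler state.
type Sched struct {
	GlobalRunQ []*G
	IdlePs     []*P
	IdleMs     []*M
}

// describe reports what a gopher is currently pushing.
func describe(m *M) string {
	if m.P == nil {
		return fmt.Sprintf("M%d has no cart", m.ID)
	}
	return fmt.Sprintf("M%d pushes P%d carrying %d bricks", m.ID, m.P.ID, len(m.P.RunQ))
}

func main() {
	p := &P{ID: 0, RunQ: []*G{{ID: 1}, {ID: 2}, {ID: 3}}}
	m := &M{ID: 0, P: p}
	fmt.Println(describe(m))
	fmt.Println(describe(&M{ID: 1}))
}
```

The point of the sketch is only the ownership shape: an M holds at most one P, a P holds many G's, and Sched keeps whatever is not currently attached to anyone.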
Startup Process
When we study the internals of most programs, we start by figuring out how startup and initialization work, because understanding that process is critical to any further analysis. The assembly routine _rt0_amd64 in asm_amd64.s is the entire startup process, and its core is as follows:
    CALL    runtime·args(SB)
    CALL    runtime·osinit(SB)
    CALL    runtime·hashinit(SB)
    CALL    runtime·schedinit(SB)

    // create a new goroutine to start program
    PUSHQ   $runtime·main·f(SB)     // entry
    PUSHQ   $0                      // arg size
    CALL    runtime·newproc(SB)
    POPQ    AX
    POPQ    AX

    // start this M
    CALL    runtime·mstart(SB)
After the startup code finishes scheduler initialization in runtime·schedinit, it calls runtime·newproc to create the first goroutine, which will execute runtime·main. This first goroutine is the so-called main goroutine. The simplest Go program, "Hello, world", runs entirely in this goroutine, and every Go program begins execution in it. The final call, runtime·mstart, is what actually starts executing the main goroutine created in the previous step.
During startup, the scheduler initialization function runtime·schedinit mainly creates a batch of carts (P) according to the GOMAXPROCS value set by the user. No matter how large GOMAXPROCS is set, at most 256 carts (P) can be created. These carts (P) are idle right after creation, that is, not yet in use, so they are stored in the linked list kept in the pidle field of the scheduler structure (Sched) for future use.
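From user code, the number of carts can be inspected and changed with runtime.GOMAXPROCS. The short sketch below uses only the documented behavior that passing 0 queries the value without changing it, and that the call returns the previous setting. The helper name setProcs is an assumption for illustration, and the 256 cap belongs to the runtime version examined in this article, so it may not hold for other releases.

```go
package main

import (
	"fmt"
	"runtime"
)

// setProcs (a hypothetical helper) sets GOMAXPROCS to n and returns
// the previous value and the value now in effect.
func setProcs(n int) (prev, now int) {
	prev = runtime.GOMAXPROCS(n)
	now = runtime.GOMAXPROCS(0) // an argument of 0 only queries the setting
	return prev, now
}

func main() {
	fmt.Println("CPUs:", runtime.NumCPU())
	prev, now := setProcs(2)
	fmt.Println("GOMAXPROCS was", prev, "and is now", now)
}
```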
Looking at the runtime·main function, you can see that the first thing the main goroutine does after it starts is create a new kernel thread (a gopher, M). This is a special thread, however, dedicated to one job for the whole life of the program: system monitoring (sysmon). The next step is to enter the Go program's main function and begin executing the program.
The Go program is now up and running. A Go program that does real work will certainly create many goroutines, so once it is running it keeps adding goroutines to the scheduler, and the scheduler is responsible for keeping them all executing properly.
Create Goroutine (G)
Go programs often contain code like this:
go do_something()
The go keyword creates a goroutine, and the function that follows it is the code the goroutine will execute. The go keyword corresponds to the scheduler interface runtime·newproc. runtime·newproc is very simple: it makes a brick (G) and then puts that brick (G) into the cart (P) of the current gopher (M).
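As a small runnable illustration of the go statement, the sketch below starts a number of goroutines and waits for all of them with sync.WaitGroup. The function name runAll and the counter are assumptions made for the example; nothing here shows the runtime internals, only the user-facing side of creating bricks.

```go
package main

import (
	"fmt"
	"sync"
)

// runAll (a hypothetical helper) starts n goroutines, waits for all of
// them to finish, and returns how many actually ran.
func runAll(n int) int {
	var wg sync.WaitGroup
	var mu sync.Mutex
	done := 0
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() { // each go statement creates one new goroutine
			defer wg.Done()
			mu.Lock()
			done++
			mu.Unlock()
		}()
	}
	wg.Wait()
	return done
}

func main() {
	fmt.Println("goroutines finished:", runAll(100))
}
```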
Each new goroutine needs a stack of its own. The sched field of the G structure maintains the stack address, the program counter, and related information. This is the most basic scheduling information: when the goroutine gives up the CPU this information must be saved, and the next time it regains the CPU this information must be loaded back into the corresponding CPU registers.
Assume that a large number of goroutines have now been created; it is up to the scheduler to manage them.
Create Kernel Thread (M)
Go has no language-level keyword that lets you create a kernel thread. You can only create goroutines; kernel threads can only be created by the runtime as the situation demands. When does the runtime create a thread? In terms of the gopher-and-bricks picture: there are too many bricks (G) and too few gophers (M), the gophers are overwhelmed, and idle carts (P) happen to be sitting unused. In that case more gophers (M) are borrowed from elsewhere until the carts (P) are used up. This process of borrowing a gopher (M) from elsewhere when there are not enough is the creation of a kernel thread (M). The interface function for creating an M is:
void newm(void (*fn)(void), P *p)
The core behavior of newm is to invoke the clone system call (on Linux) to create a kernel thread. The parameter p is an idle cart (P).
Every kernel thread created this way begins executing at the runtime·mstart function, and each one is given its own cart with which to move bricks.
Scheduling Core
The newm interface merely assigns an idle P to the newly created M, which is like telling the borrowed gopher (M): "From now on you will use cart number 1 to move bricks. Remember, it is cart number 1; go fetch it from the parking lot later." The step in which the gopher (M) actually goes and gets the cart (P) is acquirep. Before entering schedule, runtime·mstart attaches a P to the current M. The relevant code in runtime·mstart is:
    } else if(m != &runtime·m0) {
        acquirep(m->nextp);
        m->nextp = nil;
    }
    schedule();
The body of the if branch attaches a P to the current M. nextp is the idle cart (P) that newm assigned, but only at this point does the M actually take possession of it. Without a P, an M cannot execute goroutines, just as a gopher without a cart cannot move bricks. The counterpart of acquirep is releasep, which detaches the P from the M: when the work is done and the gopher needs to rest, the cart goes back to the parking lot and the gopher goes to sleep.
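The hand-over of a cart can be modeled in a few lines of Go. This is a toy sketch under stated assumptions: the types M and P and the field names are hypothetical, newm here only records the reservation (the real function also creates a kernel thread), and mstart imitates just the branch from the runtime·mstart fragment quoted above.

```go
package main

import "fmt"

// P is a hypothetical cart; M is a hypothetical gopher (kernel thread).
type P struct{ ID int }

type M struct {
	ID    int
	P     *P // cart currently in hand
	NextP *P // cart reserved by newm but not yet picked up
}

// newm models only the bookkeeping: reserve an idle cart for a new M.
func newm(id int, p *P) *M {
	return &M{ID: id, NextP: p}
}

// acquirep attaches cart p to gopher m.
func acquirep(m *M, p *P) {
	if m.P != nil {
		panic("acquirep: M already holds a P")
	}
	m.P = p
}

// releasep detaches the cart from m and returns it.
func releasep(m *M) *P {
	p := m.P
	m.P = nil
	return p
}

// mstart picks up the reserved cart, after which m could start scheduling.
func mstart(m *M) {
	if m.NextP != nil {
		acquirep(m, m.NextP)
		m.NextP = nil
	}
}

func main() {
	m := newm(7, &P{ID: 1})
	fmt.Println("before mstart, has cart:", m.P != nil)
	mstart(m)
	fmt.Println("after mstart, cart:", m.P.ID)
	p := releasep(m)
	fmt.Println("released cart:", p.ID, "still has cart:", m.P != nil)
}
```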
Having got its cart (P), the gopher (M) enters the workshop and starts working, which is the call to schedule above. A simplified version of schedule looks like this:
    static void
    schedule(void)
    {
        G *gp;

        gp = runqget(m->p);
        if(gp == nil)
            gp = findrunnable();

        if (m->p->runqhead != m->p->runqtail &&
            runtime·atomicload(&runtime·sched.nmspinning) == 0 &&
            runtime·atomicload(&runtime·sched.npidle) > 0)  // TODO: fast atomic
            wakep();

        execute(gp);
    }
I have simplified schedule heavily, mainly because I do not like pasting large chunks of code, so only the skeleton remains. The logic has four steps:
- runqget: the gopher (M) tries to take a brick (G) from its own cart (P). This may fail, of course: the cart may be empty, with no bricks in it.
- findrunnable: if the gopher's own cart has no bricks, it cannot just sit idle, so it goes to the factory warehouse to fetch a brick. The warehouse may have no bricks either. Even then the gopher does not slack off; it quietly slips out, picks a fellow gopher (M) at random, and tries to steal half of the bricks from that gopher's cart into its own. If repeated attempts to steal fail, there really are no bricks to move, so the gopher returns its cart to the parking lot and goes to sleep. Once the gopher sleeps, the steps below of course do not run; a sleeping gopher is a sleeping thread.
- wakep: at this step the poor gopher finds that its cart still holds plenty of bricks, more than it can handle alone, and sees that idle carts remain in the parking lot. It runs straight to the dormitory, and sure enough a fellow gopher is still asleep there. It gives the sleeper a kick: "Still sleeping? I am nearly worked to death. Get up and share the load!" The fellow wakes up, takes a cart, and goes to work. Sometimes the poor gopher reaches the dormitory and finds nobody asleep. Disappointed, it has to tell the factory owner: "There are idle carts in the parking lot and I cannot keep up. Please borrow a gopher from another factory to help." So the factory owner brings in a new gopher to work.
- execute: the gopher takes the brick and puts it into the kiln, happily firing it.
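The first two steps can be tried out with a toy model in Go. This is a sketch under stated assumptions, not the runtime's code: the types G, P, and Sched are hypothetical, wakep and thread sleeping are left out, and the victim for stealing is chosen by scanning in order rather than at random so that the run is reproducible.

```go
package main

import "fmt"

// G, P and Sched are hypothetical toy types, not the runtime's definitions.
type G struct{ ID int }

type P struct {
	ID   int
	RunQ []*G // local run queue (the cart)
}

type Sched struct {
	GlobalQ []*G // global run queue (the factory warehouse)
	Ps      []*P // every cart in the factory
}

// runqget takes one G from the local queue, or returns nil if it is empty.
func runqget(p *P) *G {
	if len(p.RunQ) == 0 {
		return nil
	}
	g := p.RunQ[0]
	p.RunQ = p.RunQ[1:]
	return g
}

// findrunnable tries the warehouse first, then steals half of another
// cart's bricks (rounded up) and returns the first stolen one.
func findrunnable(s *Sched, self *P) *G {
	if len(s.GlobalQ) > 0 {
		g := s.GlobalQ[0]
		s.GlobalQ = s.GlobalQ[1:]
		return g
	}
	for _, victim := range s.Ps {
		if victim == self || len(victim.RunQ) == 0 {
			continue
		}
		n := (len(victim.RunQ) + 1) / 2
		stolen := victim.RunQ[:n]
		victim.RunQ = victim.RunQ[n:]
		self.RunQ = append(self.RunQ, stolen...)
		return runqget(self)
	}
	return nil // nothing to run: the gopher would go to sleep here
}

// schedule performs one round for cart p and returns the G to execute.
func schedule(s *Sched, p *P) *G {
	g := runqget(p)
	if g == nil {
		g = findrunnable(s, p)
	}
	return g
}

func main() {
	busy := &P{ID: 0, RunQ: []*G{{ID: 1}, {ID: 2}, {ID: 3}, {ID: 4}}}
	idle := &P{ID: 1}
	s := &Sched{GlobalQ: []*G{{ID: 9}}, Ps: []*P{busy, idle}}

	fmt.Println("first pick:", schedule(s, idle).ID)  // taken from the warehouse
	fmt.Println("second pick:", schedule(s, idle).ID) // stolen from the busy cart
	fmt.Println("bricks left:", len(busy.RunQ), len(idle.RunQ))
}
```

In this run the idle cart first drains the warehouse, then steals two of the four bricks from the busy cart and immediately takes one of them, leaving two bricks in the busy cart and one in its own.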
At this point the whole factory seems to be running smoothly, apparently flawless. But one question remains unresolved. Suppose the gopher's cart holds many bricks and it puts one brick into the kiln: when does it take that brick out and put in the second? Must the first brick be fully fired before it is taken out? If so, the bricks behind it would wait forever. This is the real problem of goroutine scheduling: context switching.
Scheduling Points
Reading the channel implementation, we can see that a channel read or write that cannot proceed immediately calls the runtime·park function. After a goroutine calls park, it is set to the waiting state and gives up the CPU. A parked goroutine is in the waiting state and is not in any cart (P); unless runtime·ready is called on it, it will never run again. Besides channel operations, timers, the network poller, and so on may also park a goroutine.
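From the user's side, the effect of parking on a channel can be observed with an unbuffered channel: the receiver simply waits until a sender shows up. The sketch below is an illustration only; the function name waitForValue and the short sleep are assumptions, and the result does not depend on the sleep, which merely makes it likely that the receiver is already blocked when the send happens.

```go
package main

import (
	"fmt"
	"time"
)

// waitForValue (a hypothetical helper) starts a receiver that blocks on an
// unbuffered channel, sends v to it, and returns what the receiver computed.
func waitForValue(v int) int {
	ch := make(chan int) // unbuffered: a receive blocks until a send arrives
	result := make(chan int)

	go func() {
		got := <-ch // the receiver waits here until a value is sent
		result <- got * 2
	}()

	time.Sleep(10 * time.Millisecond) // give the receiver time to block
	ch <- v                           // the send makes the receiver runnable again
	return <-result
}

func main() {
	fmt.Println(waitForValue(21))
}
```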
Besides park, calling runtime·gosched also makes the current goroutine give up the CPU. Unlike park, however, gosched sets the goroutine to the runnable state and puts it into the scheduler's global run queue (the factory warehouse mentioned above; now you can see why the warehouse has bricks (G) in it).
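User code reaches this scheduling point through runtime.Gosched. In the sketch below several goroutines take turns incrementing a shared counter and yield after every step. The helper name countWithYield is an assumption, and because the interleaving order is up to the scheduler, only the final total is reported.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

// countWithYield (a hypothetical helper) runs `workers` goroutines that each
// add to a shared counter `steps` times, yielding the CPU after every step.
func countWithYield(workers, steps int) int {
	var wg sync.WaitGroup
	var mu sync.Mutex
	total := 0
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := 0; i < steps; i++ {
				mu.Lock()
				total++
				mu.Unlock()
				runtime.Gosched() // stay runnable, but let other goroutines go first
			}
		}()
	}
	wg.Wait()
	return total
}

func main() {
	fmt.Println("total:", countWithYield(3, 5))
}
```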
Next come system calls: some system calls also trigger rescheduling. Go wraps system calls entirely by itself, so the wrappers can do extra work: entersyscall is executed on entering a system call and exitsyscall on leaving it. Only system calls wrapped with entersyscall can trigger rescheduling; entersyscall changes the state of the cart (P) to syscall. Remember the sysmon thread mentioned at the beginning? This system monitoring thread scans all the carts (P). When it finds a cart (P) in the syscall state, it knows that a goroutine on that cart is making a system call, so it creates a new gopher (M) to take over the cart and start working, so that the other bricks (G) in the cart do not have to wait for the system call. When the gopher whose cart was taken returns from the system call, it finds its cart gone and cannot continue working, so it can only put the goroutine that made the system call back into the factory warehouse and go to sleep itself.
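The cart take-over can be sketched as a toy model. Everything below is hypothetical: the status constants, the Owner field, and the retake helper are invented for illustration, and this version takes a cart as soon as it sees the syscall state, whereas a real monitor would be expected to apply timing conditions first.

```go
package main

import "fmt"

type pstatus int

const (
	pRunning pstatus = iota
	pSyscall
)

// P is a hypothetical cart: a status, an owner (M id), and queued bricks.
type P struct {
	ID     int
	Status pstatus
	Owner  int   // id of the M currently holding this cart
	RunQ   []int // ids of goroutines waiting in the cart
}

// entersyscall marks the cart as being in a system call.
func entersyscall(p *P) { p.Status = pSyscall }

// retake models one monitoring scan: every cart found in the syscall state
// is handed to a fresh M so its queued bricks keep moving. It returns the
// ids of the carts that changed hands; nextM is the id for the first new M.
func retake(ps []*P, nextM int) []int {
	var taken []int
	for _, p := range ps {
		if p.Status == pSyscall {
			p.Owner = nextM
			nextM++
			p.Status = pRunning
			taken = append(taken, p.ID)
		}
	}
	return taken
}

func main() {
	ps := []*P{
		{ID: 0, Owner: 0, RunQ: []int{10, 11}},
		{ID: 1, Owner: 1, RunQ: []int{12}},
	}
	entersyscall(ps[0]) // the goroutine running on cart 0 blocks in a system call
	fmt.Println("carts taken over:", retake(ps, 2))
	fmt.Println("cart 0 now belongs to M", ps[0].Owner)
}
```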
From these scheduling points it is clear that the scheduler is still fairly coarse: the scheduling granularity is rather large, and fairness is not handled particularly well. In short, this scheduler is still relatively simple.
Saving and Restoring Context
As goroutines are swapped on and off the CPU and context switches happen constantly, two things must be guaranteed: saving the context and restoring the context. Saving the context means that when a goroutine gives up the CPU, the values of the relevant registers are saved to memory. Restoring the context means that when the goroutine regains the CPU, the saved register values are loaded from memory back into the corresponding registers.
When a goroutine voluntarily gives up the CPU (park/gosched), the runtime·mcall function is called. This function is also implemented in assembly; it mainly saves the goroutine's stack address and program counter into the sched field of the G structure, which completes saving the context. The function that restores the context is runtime·gogocall, which is called mainly from execute: the corresponding registers must be reloaded before the goroutine runs.
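The save and restore pair can be imitated with plain data copies. The sketch below is a toy under stated assumptions: the types gobuf, cpu, and G are hypothetical miniatures, the "registers" are just two integer fields, and save and restore only play the roles the article attributes to mcall and gogocall; the real functions are written in assembly and manipulate actual CPU registers.

```go
package main

import "fmt"

// gobuf is a hypothetical miniature of the saved context.
type gobuf struct {
	sp, pc uintptr // stack pointer and program counter
}

// cpu is a hypothetical register file.
type cpu struct {
	sp, pc uintptr
}

// G is a toy goroutine that keeps its saved context in sched.
type G struct {
	id    int
	sched gobuf
}

// save copies the registers into g.sched when g gives up the CPU.
func save(c *cpu, g *G) {
	g.sched = gobuf{sp: c.sp, pc: c.pc}
}

// restore loads g.sched back into the registers before g runs again.
func restore(c *cpu, g *G) {
	c.sp, c.pc = g.sched.sp, g.sched.pc
}

func main() {
	c := &cpu{sp: 0x1000, pc: 0x40}
	g1 := &G{id: 1}
	g2 := &G{id: 2, sched: gobuf{sp: 0x2000, pc: 0x80}}

	save(c, g1)    // g1 gives up the CPU
	restore(c, g2) // g2 gets the CPU
	fmt.Printf("running g2: sp=%#x pc=%#x\n", c.sp, c.pc)
	restore(c, g1) // later, g1 resumes where it left off
	fmt.Printf("running g1: sp=%#x pc=%#x\n", c.sp, c.pc)
}
```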