Original link: Go's work-stealing scheduler
The Go scheduler's job is to distribute runnable goroutines over multiple OS threads that run on one or more processors. In multithreaded computing, there are two scheduling paradigms: work sharing and work stealing.
- work sharing: When a processor generates new threads, it attempts to migrate some of them to other processors, in the hope that idle or underutilized processors will pick them up.
- work stealing: An underutilized processor actively looks at other processors' threads and "steals" some of them.
Threads migrate less often under work stealing than under work sharing. When all processors have work to run, no threads are migrated. Migration is considered only once a processor becomes idle.
Go has had a work-stealing scheduler since version 1.1, contributed by Dmitry Vyukov. This article explains in depth what a work-stealing scheduler is and how Go implements one.
Scheduling basics
Go has an M:N scheduler that can make use of multiple processors. At any time, M goroutines need to be scheduled on N OS threads that run on at most GOMAXPROCS processors. The Go scheduler uses the following terms for goroutines, threads, and processors:
- G: goroutine
- M: OS thread (machine)
- P: processor
Each P has its own local goroutine queue, and there is also a global goroutine queue. Each M should be assigned to a P. A P may have no M if its thread is blocked or in a system call. At any time, there are at most GOMAXPROCS Ps, and only one M can run per P. The scheduler can create more Ms if required.
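As a small illustration of these terms, the sketch below queries the values the `runtime` package exposes for them. The helper name `schedulerInfo` is made up for this example; the runtime does not expose the number of Ms directly, so only the P limit, the CPU count, and the goroutine count are shown.

```go
package main

import (
	"fmt"
	"runtime"
)

// schedulerInfo is a hypothetical helper that returns the current limit on
// Ps, the number of logical CPUs, and the number of existing goroutines.
func schedulerInfo() (procs, cpus, goroutines int) {
	// GOMAXPROCS(0) only queries the current setting; it does not change it.
	return runtime.GOMAXPROCS(0), runtime.NumCPU(), runtime.NumGoroutine()
}

func main() {
	procs, cpus, gs := schedulerInfo()
	fmt.Printf("GOMAXPROCS (max Ps): %d\n", procs)
	fmt.Printf("logical CPUs:        %d\n", cpus)
	fmt.Printf("goroutines (Gs):     %d\n", gs)
}
```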
Each round of scheduling simply finds a runnable goroutine and executes it. In each round, the search happens in the following order:
runtime.schedule() {
    // only 1/61 of the time, check the global runnable queue for a G.
    // if not found, check the local queue.
    // if not found,
    //     try to steal from other Ps.
    //     if not, check the global runnable queue.
    //     if not found, poll network.
}
Once a runnable goroutine is found, it is executed until it blocks.
Note: It may look as if the global queue has priority over the local queue. In fact, the global queue is checked only once in a while, to prevent an M from scheduling exclusively from its local queue until no locally queued goroutines are left.
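The lookup order can be sketched as a toy, single-threaded model. This is not the runtime's code: the `sched` type, the `pop` helper, and the string goroutine names are made up for illustration, stealing takes a single goroutine here for brevity, and network polling is left out.

```go
package main

import "fmt"

// sched is a hypothetical model of one P's view of the run queues.
type sched struct {
	tick   int        // number of scheduling rounds so far
	local  []string   // this P's local run queue
	global []string   // the global run queue
	others [][]string // local run queues of the other Ps
}

// pop removes and returns the first element of q, if any.
func pop(q *[]string) (string, bool) {
	if len(*q) == 0 {
		return "", false
	}
	g := (*q)[0]
	*q = (*q)[1:]
	return g, true
}

// next returns the next runnable goroutine, following the order listed above.
func (s *sched) next() (string, bool) {
	s.tick++
	// Once every 61 rounds, look at the global queue first for fairness.
	if s.tick%61 == 0 {
		if g, ok := pop(&s.global); ok {
			return g, true
		}
	}
	if g, ok := pop(&s.local); ok {
		return g, true
	}
	// Local queue is empty: try the other Ps (simplified to one goroutine).
	for i := range s.others {
		if g, ok := pop(&s.others[i]); ok {
			return g, true
		}
	}
	if g, ok := pop(&s.global); ok {
		return g, true
	}
	// A real scheduler would poll the network here.
	return "", false
}

func main() {
	s := &sched{
		local:  []string{"L1"},
		global: []string{"G1"},
		others: [][]string{{"S1"}},
	}
	// Prints L1, S1, G1 in that order.
	for g, ok := s.next(); ok; g, ok = s.next() {
		fmt.Println(g)
	}
}
```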
Stealing
When a new goroutine is created or an existing goroutine becomes runnable, it is pushed onto the current P's queue of runnable goroutines. When a P finishes executing a goroutine, it tries to pop the next one from its own local queue. If that queue is empty, the P picks another P at random and tries to steal half of the runnable goroutines from that P's local queue.
In the article's example, P2 cannot find any runnable goroutines. It therefore picks another processor, P1, at random and steals three goroutines into its own local queue. P2 can then run these goroutines, and the work is distributed more evenly across the processors.
Spinning threads
The scheduler always wants to hand as many runnable goroutines as possible to Ms in order to use the processors, but it also needs to park excess work to conserve CPU resources. At the same time, the scheduler must be able to scale to high-throughput and CPU-intensive programs.
Constant preemption is expensive, which is a serious problem for high-throughput programs where performance is critical. OS threads should not frequently hand runnable goroutines off to one another, because this increases latency. In addition, in the presence of system calls, OS threads need to be blocked and unblocked constantly, which is costly and adds a lot of overhead.
To minimize hand-offs, the scheduler implements "spinning threads". Spinning threads consume a little extra CPU, but they minimize the preemption of OS threads. A thread is spinning if:
- An M with a P assigned is looking for a runnable goroutine.
- An M without a P assigned is looking for an available P.
- The scheduler also unparks an additional thread and spins it when it readies a goroutine, if there is an idle P and there are no other spinning threads.
At any time, at most GOMAXPROCS threads are spinning. When a spinning thread finds work, it takes itself out of the spinning state.
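The wake-up rule in the last bullet and the cap on spinning threads can be written as two small predicates. This is only a toy model with made-up names (`shouldWakeSpinner`, `canStartSpinning`); the real runtime tracks these counts atomically and may apply stricter limits.

```go
package main

import "fmt"

// shouldWakeSpinner reports whether readying a goroutine should unpark an
// extra thread and spin it: there is an idle P and nobody is spinning yet.
func shouldWakeSpinner(idlePs, spinningMs int) bool {
	return idlePs > 0 && spinningMs == 0
}

// canStartSpinning reports whether one more thread may enter the spinning
// state without exceeding the GOMAXPROCS cap described above.
func canStartSpinning(spinningMs, gomaxprocs int) bool {
	return spinningMs < gomaxprocs
}

func main() {
	fmt.Println(shouldWakeSpinner(1, 0)) // idle P, no spinner: true
	fmt.Println(shouldWakeSpinner(1, 1)) // a spinner already exists: false
	fmt.Println(shouldWakeSpinner(0, 0)) // no idle P: false
	fmt.Println(canStartSpinning(4, 4))  // cap reached: false
}
```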
Idle threads that hold a P do not block while there are idle Ms without a P. When a new goroutine is created or an M is about to block, the scheduler ensures that there is at least one spinning M. This guarantees that no runnable goroutine sits waiting when it could be running, and it avoids excessive blocking and unblocking of Ms.
The Go scheduler does a lot to avoid excessive preemption of OS threads: it schedules work onto the right, underutilized processors by stealing, and it implements "spinning" threads to avoid frequent blocked/unblocked transitions.
Scheduling events can be traced with the execution tracer. If you suspect that your processor utilization is poor, you can use it to investigate what is going on.