Go Work-stealing Scheduler


This article is a translation of Rakyll's post on the Go scheduler; copyright belongs to the original author.

The Go scheduler's job is to distribute runnable goroutines over multiple worker OS threads that run on one or more processors. In multithreaded computation, two paradigms have emerged for scheduling: work sharing and work stealing.

    • Work sharing: when a processor generates new threads, it attempts to migrate some of them to other processors, in the hope that idle or underutilized processors will pick them up.
    • Work stealing: an underutilized processor actively looks for other processors' threads and steals some of them.

In work stealing, threads migrate less frequently than in work sharing. When all processors have work to run, no threads are migrated. Migration is considered only once there is an idle processor.

Go has had a work-stealing scheduler since Go 1.1, contributed by Dmitry Vyukov. This article explains in depth what work-stealing schedulers are and how Go implements one.

Scheduling Basics

Go has an M:N scheduler that can make use of multiple processors. At any time, M goroutines need to be scheduled on N OS threads, which run on at most GOMAXPROCS processors. The Go scheduler uses the following terminology for goroutines, threads, and processors:

    • G: goroutine
    • M: OS thread (machine)
    • P: processor (translator's note: this is not a CPU. Think of it as a Go scheduling context. In the rest of this article, "processor" means P unless stated otherwise.)
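Some of these quantities can be observed from an ordinary program through the public runtime package. In the minimal sketch below, runtime.GOMAXPROCS(0) queries the number of Ps without changing it, runtime.NumCPU reports the logical CPUs, and runtime.NumGoroutine reports the live Gs. The helpers currentProcs and setProcs are hypothetical wrappers added for illustration. The sketch does not try to count Ms.

```go
package main

import (
	"fmt"
	"runtime"
)

// currentProcs reports the number of Ps. Passing 0 to runtime.GOMAXPROCS
// queries the current setting without changing it.
func currentProcs() int {
	return runtime.GOMAXPROCS(0)
}

// setProcs changes the number of Ps and returns the previous setting.
func setProcs(n int) int {
	return runtime.GOMAXPROCS(n)
}

func main() {
	fmt.Println("logical CPUs:", runtime.NumCPU())
	fmt.Println("Ps (GOMAXPROCS):", currentProcs())
	fmt.Println("live Gs:", runtime.NumGoroutine())

	prev := setProcs(2) // from now on, Go code runs on at most 2 Ps
	fmt.Println("Ps after setProcs(2):", currentProcs())
	setProcs(prev) // restore the original setting
}
```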

There is a local goroutine queue for each P, plus a global goroutine queue. Each M should be assigned to a P, although a P may temporarily have no M while the M that held it is blocked or in a system call. At any time there are at most GOMAXPROCS Ps, and only one M can run on each P. The scheduler can create more Ms if required.
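These relationships can be modeled with a few plain types. This is a hypothetical, simplified model, not the runtime's actual definitions; the real g, m and p structs in runtime/runtime2.go carry far more state. Here, each P owns a local queue, and a shared Sched struct owns the global queue and exactly GOMAXPROCS Ps. The acquire method refuses to attach a second M to a P that already has one. The release method models an M giving up its P before blocking.

```go
package main

import "fmt"

// G stands in for a goroutine.
type G struct{ id int }

// P is a scheduling context with its own local run queue.
type P struct {
	id   int
	runq []*G // local run queue (the runtime uses a fixed-size ring buffer)
	m    *M   // M currently attached to this P, or nil
}

// M stands in for an OS thread; it needs a P to run Go code.
type M struct {
	id int
	p  *P // P held by this M, or nil
}

// Sched owns the global run queue and the GOMAXPROCS Ps.
type Sched struct {
	globalq []*G
	allp    []*P
}

func newSched(gomaxprocs int) *Sched {
	s := &Sched{}
	for i := 0; i < gomaxprocs; i++ {
		s.allp = append(s.allp, &P{id: i})
	}
	return s
}

// acquire attaches m to p; only one M may run on a P at a time.
func (s *Sched) acquire(m *M, p *P) error {
	if p.m != nil {
		return fmt.Errorf("P%d is already held by M%d", p.id, p.m.id)
	}
	if m.p != nil {
		return fmt.Errorf("M%d already holds P%d", m.id, m.p.id)
	}
	p.m, m.p = m, p
	return nil
}

// release detaches m from its P (e.g. before m blocks in a system call),
// leaving the P free for another M.
func (s *Sched) release(m *M) *P {
	p := m.p
	if p != nil {
		p.m, m.p = nil, nil
	}
	return p
}

func main() {
	s := newSched(2)
	m0, m1 := &M{id: 0}, &M{id: 1}
	fmt.Println(s.acquire(m0, s.allp[0]))
	fmt.Println(s.acquire(m1, s.allp[0])) // refused: one M per P
	p := s.release(m0)                    // m0 enters a system call
	fmt.Println(s.acquire(m1, p))         // m1 takes over P0
}
```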

Each round of scheduling simply finds a runnable goroutine and executes it. In each round, the search happens in the following order:

```go
// Outline of runtime.schedule(); the steps are described in comments only.
func schedule() {
	// Only 1/61 of the time, check the global runnable queue for a G.
	// If not found, check the local queue.
	// If still not found:
	//     try to steal from other Ps;
	//     if that fails, check the global runnable queue;
	//     if still not found, poll the network.
}
```

Once a runnable G is found, it is executed until it blocks.

Note that it may look as if the local queue is always preferred over the global queue. However, checking the global queue once in a while is crucial. Without that check, an M would schedule only from its local queue until no locally queued goroutines were left, and goroutines in the global queue could starve.
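The search order can be turned into a small runnable toy. Everything here is a simplified assumption. The tick field stands in for the per-P scheduling counter that the runtime consults. The netpoll field is a hypothetical hook for the network poller. For brevity, stealing takes a single G from the first non-empty P, whereas the runtime visits Ps in random order and takes about half of a queue.

```go
package main

import "fmt"

// G is a simplified goroutine: just an ID.
type G struct{ id int }

// P is a simplified scheduling context with a local run queue.
type P struct {
	id   int
	runq []*G
}

// Sched holds the global run queue and all Ps.
type Sched struct {
	tick    int       // scheduling rounds so far (the runtime keeps a per-P counter)
	globalq []*G      // global run queue
	allp    []*P      // all Ps
	netpoll func() *G // hypothetical stand-in for the network poller
}

// pop removes and returns the head of q, or nil if q is empty.
func pop(q *[]*G) *G {
	if len(*q) == 0 {
		return nil
	}
	g := (*q)[0]
	*q = (*q)[1:]
	return g
}

// findRunnable searches for a G for p in the order listed above and
// reports where the G came from.
func (s *Sched) findRunnable(p *P) (*G, string) {
	s.tick++
	// Only 1/61 of the time, check the global queue first (fairness).
	if s.tick%61 == 0 {
		if g := pop(&s.globalq); g != nil {
			return g, "global (fairness)"
		}
	}
	// Then the local queue.
	if g := pop(&p.runq); g != nil {
		return g, "local"
	}
	// Then try to steal from other Ps (simplified: a single G).
	for _, victim := range s.allp {
		if victim != p {
			if g := pop(&victim.runq); g != nil {
				return g, "stolen"
			}
		}
	}
	// Then the global queue again.
	if g := pop(&s.globalq); g != nil {
		return g, "global"
	}
	// Finally, poll the network.
	if s.netpoll != nil {
		if g := s.netpoll(); g != nil {
			return g, "netpoll"
		}
	}
	return nil, "none"
}

func main() {
	p0, p1 := &P{id: 0}, &P{id: 1}
	s := &Sched{allp: []*P{p0, p1}, globalq: []*G{{id: 1000}}}
	for i := 0; i < 100; i++ {
		p0.runq = append(p0.runq, &G{id: i})
	}
	for round := 1; round <= 61; round++ {
		g, src := s.findRunnable(p0)
		if round >= 60 {
			fmt.Printf("round %d: G%d from %s\n", round, g.id, src)
		}
	}
}
```

The usage in main shows why the 1/61 check matters. P0 has 100 goroutines queued locally. Without the check, the global G1000 would wait until all of them had run. With the check, round 60 still takes G59 from the local queue, and round 61 takes G1000 from the global queue.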

Stealing

When a new G is created or an existing G becomes runnable, it is pushed onto the current P's list of runnable goroutines. When a P finishes executing a G, it tries to pop a G from its own list of runnable goroutines. If its list is empty, the P chooses another P at random and tries to steal half of the runnable goroutines from that P's queue.

For example, suppose P2 cannot find any runnable goroutine. It randomly picks P1 and steals three of its goroutines into its own local queue. P2 can then run these goroutines, and the scheduler's work is distributed more fairly among the processors.

Spinning Threads

The scheduler always wants to hand as many runnable goroutines as possible to Ms to make use of the processors. At the same time, it needs to park excess work to conserve CPU and power. Conversely, the scheduler must also scale to high-throughput and CPU-intensive programs. Constant preemption is both expensive and a problem for high-throughput programs when performance is critical. OS threads should not frequently hand off runnable goroutines to each other, because that increases latency. In addition, in the presence of system calls, OS threads would need to be blocked and unblocked constantly, which is costly and adds a lot of overhead.

To minimize these hand-offs, the Go scheduler implements "spinning threads". Spinning threads consume a little extra CPU, but they minimize the preemption of OS threads. A thread is spinning if:

    • an M with an assigned P is looking for a runnable goroutine;
    • an M without an assigned P is looking for an available P;
    • the scheduler also unparks an additional thread and lets it spin when it is readying a goroutine, if there is an idle P and there are no other spinning threads.

There are at most GOMAXPROCS spinning Ms at any time. When a spinning thread finds work, it takes itself out of the spinning state.

Idle threads that hold a P do not block if there are idle Ms without a P. When new goroutines are created or an M is about to block, the scheduler ensures that there is at least one spinning M. This ensures that no runnable goroutine is left waiting when it could be running. It also avoids excessive blocking and unblocking of Ms.
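The rule of keeping at least one spinning M, but no more than needed, can be sketched with two atomic counters. This is a loose, hypothetical model. The names wakep, npidle and nmspinning echo identifiers in the runtime's scheduler, but the real functions also start or unpark OS threads, and the exact conditions differ between Go versions. In this model, wakep starts a spinning M only if no M is already spinning and an idle P can be claimed. When a spinner finds work, resetSpinning starts a replacement spinner if it was the last one and idle Ps remain. The sketch needs Go 1.19 or later for atomic.Int32.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// sched is a toy model of the bookkeeping for spinning Ms.
type sched struct {
	npidle     atomic.Int32 // Ps with no M attached
	nmspinning atomic.Int32 // Ms currently spinning (looking for work)
}

// wakep is called when a goroutine becomes runnable. It starts a spinning M
// only if none is spinning yet and an idle P can be claimed for it.
func (s *sched) wakep() bool {
	if s.nmspinning.Load() != 0 || !s.nmspinning.CompareAndSwap(0, 1) {
		return false // an M is already spinning and will find the work
	}
	for {
		n := s.npidle.Load()
		if n == 0 {
			s.nmspinning.Add(-1) // no idle P: undo the spinning claim
			return false
		}
		if s.npidle.CompareAndSwap(n, n-1) {
			return true // the new spinning M now holds one formerly idle P
		}
	}
}

// resetSpinning is called when a spinning M finds work. If it was the last
// spinner and idle Ps remain, it starts a replacement spinner.
func (s *sched) resetSpinning() {
	if s.nmspinning.Add(-1) == 0 && s.npidle.Load() > 0 {
		s.wakep()
	}
}

func main() {
	s := &sched{}
	s.npidle.Store(3) // one P busy, three idle
	fmt.Println("started spinner:", s.wakep())
	fmt.Println("started another:", s.wakep()) // refused: one is already spinning
	s.resetSpinning()                           // spinner found work; a replacement starts
	fmt.Println("spinning Ms:", s.nmspinning.Load(), "idle Ps:", s.npidle.Load())
}
```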

Conclusion

The Go scheduler does a lot to avoid excessive OS thread preemption. It schedules goroutines onto the right, underutilized processors by stealing, and it implements "spinning" threads to avoid frequent transitions between the blocked and unblocked states.

Scheduling events can be traced with the execution tracer. If you suspect that your processors are poorly utilized, you can use it to investigate what is going on.
