Scheduling of Golang Goroutines



1. What is a coroutine?

A coroutine is a lightweight, user-space thread.

2. Differences between processes, threads, and coroutines:

* A process has its own independent heap and stack; processes share neither heap nor stack with one another, and are scheduled by the operating system.
* A thread has its own independent stack but shares the heap with other threads in the same process; threads (standard threads, that is) are also scheduled by the operating system.
* A coroutine shares the heap but not the stack, and is scheduled explicitly by the programmer in code.
* Because the programmer controls scheduling, coroutines avoid meaningless context switches, which can improve performance.

A goroutine needs very little stack memory to start (on the order of 4-5 KB), whereas a thread stack defaults to about 1 MB. A goroutine is essentially a piece of code with a function entry point and a stack allocated for it on the heap. Goroutines are therefore very cheap: we can easily create tens of thousands of them, and they are not scheduled by the operating system.

runtime.GOMAXPROCS(runtime.NumCPU()) // For Go >= 1.5, GOMAXPROCS defaults to the number of CPUs the OS reports when the program starts.

Note: The operating-system threads used by a Go program include threads serving cgo calls and threads blocked in OS system calls, so the number of OS threads in use may be greater than GOMAXPROCS.

3. Scheduling

To understand how goroutines are scheduled, you first need to understand three key concepts in Go: G (goroutine), M (machine), and P (processor).

G (goroutine): G is the first letter of "goroutine". A goroutine can be thought of as a lightweight thread managed by the runtime, and is created with the go keyword.
For example, func main() { go other() } involves two goroutines: one is main, the other is other; note that main itself is also a goroutine. Creating, sleeping, resuming, and stopping goroutines are all managed by the Go runtime. A goroutine goes to sleep when it performs an asynchronous operation and resumes when the operation completes, without occupying a system thread in the meantime. A goroutine that is newly created or resumed is placed on a run queue, where it waits for an M to pick it up and run it.

M (machine): M is the first letter of "machine"; in the current implementation it corresponds to a system thread. An M can run two kinds of code:

* Go code, i.e. goroutines; an M needs a P to run Go code.
* Native code, for example a blocking syscall; an M does not need a P to run native code.

An M takes a G from the run queue and runs it; when that G finishes or goes to sleep, the M takes the next G from the queue, and so on. Sometimes a G must call native code that cannot avoid blocking; in that case the M releases the P it holds and enters a blocked state, while another M acquires the P and continues running the Gs in the queue. Go must ensure there are enough Ms to run Gs so that CPUs do not sit idle, while also keeping the number of Ms from growing too large.

P (processor): P is the first letter of "processor" and represents the resources an M needs in order to run a G. Although the number of Ps defaults to the number of CPU cores, it can be changed via the GOMAXPROCS environment variable, and Ps are not bound to specific CPU cores at run time.
P can also be understood as the mechanism that controls the parallelism of Go code:

* If the number of Ps is 1, at most one thread (M) can execute Go code at a time.
* If the number of Ps is 2, at most two threads (M) can execute Go code at a time.

The number of threads executing native code is not limited by P. Because only one thread (M) at a time can own the data in a P, that data is lock-free, and reading and writing it is very efficient.

Note: Each P maintains its own local run queue, and in addition to the per-P local run queues there is one global run queue.

Why build a user-space scheduler at all? The POSIX threading API is a very large logical extension of the existing UNIX process model, and threads get much of the same control as processes. For example, a thread has its own signal mask, can be given CPU affinity (i.e. pinned to a particular CPU), can be placed into cgroups, and can have its resource usage queried. All of this control adds overhead for features that a Go program using goroutines does not need at all, and the overhead grows sharply when a program has 100,000 threads.

Another problem is that the operating system cannot make particularly good decisions for Go's execution model. For example, when running a garbage collection, Go's garbage collector requires that all threads be stopped and that memory be in a consistent state. This means waiting for all running threads to reach a point at which the runtime knows in advance that memory is consistent. When many OS-scheduled threads are scattered at random points, you end up having to wait for most of them to reach a consistent state.
The Go scheduler can make this decision itself by scheduling only at points where memory is known to be consistent. This means that when the program stops for garbage collection, it only has to wait for the threads actively running on CPU cores.

There are currently three common threading models:

* N:1 — multiple user-space threads run on one OS thread. This model switches contexts quickly, but cannot take advantage of multi-core systems.
* 1:1 — each thread of the program maps to one OS thread. This model can use all the cores on the machine, but context switches are slow because they have to trap through the OS.
* M:N — Go tries to get the best of both worlds: it schedules any number of goroutines onto any number of OS threads, giving fast context switches while using all the cores in the system. The main disadvantage of this model is the added complexity of the scheduler.

Scheduling principles:

* The number of Ps is determined at initialization by GOMAXPROCS; a go statement adds a G.
* If the number of Gs exceeds what the existing Ms can handle and there is a free P, the runtime automatically creates a new M.
* An M that holds a P takes Gs in this order: local run queue > global run queue > other Ps' run queues. If no runnable G is found in any queue, the M returns its P and goes to sleep.
* A G undergoes a context switch on blocking events such as: a system call; reading or writing a channel; or calling runtime.Gosched() to yield voluntarily, which puts the G on the global queue.
* When a G blocks, its M (say M0) gives up its P, and another M (M1) takes over the P and its task queue. When M0 finishes the blocking call, it puts its G (G0) on the global queue and goes to sleep, since without a P it cannot run Go code.
Example: G0 makes a system call. Because a thread cannot execute Go code while it is blocked in a system call, the P attached to that thread must be handed over so that scheduling can continue: M0 gives up its P so that another M, M1, can run it. The scheduler ensures there are enough threads to run all Ps. M1 may have been created specifically to handle G0's system call, or it may come from a thread pool. The blocked thread M0 stays attached to G0, which caused the system call, because technically G0 is still executing, even though it is blocked in the OS.

When the system call returns, the thread must try to acquire a P in order to run the returned G0; the normal mode of operation is to "steal" a P from one of the other threads. If stealing fails, the thread puts G0 on the global run queue, then puts itself in the thread pool or goes to sleep. The global run queue is where each P looks for new Gs after exhausting its own local run queue; each P also checks the global run queue periodically, since otherwise the Gs there could starve.

Note: A Go program runs on multiple threads even when GOMAXPROCS equals 1, because of this system-call handling.

"Stealing": the amount of work in each P's local run queue can become unbalanced. Without rebalancing, one P could finish all the Gs in its local run queue while many Gs still remain to be executed elsewhere in the system.
So, to keep running Go code, a P first tries to get Gs from the global run queue; if the global run queue is also empty, the P has to get Gs from other Ps' run queues. When a P finishes its own work, it tries to "steal" half of the Gs in another P's run queue. This ensures that every P always has work to do, which in turn keeps every thread (M) at maximum load.

Note: Goroutine scheduling is preemptive: a goroutine that has run for about 10 ms will be swapped out for the next one. This is similar to the time-sliced CPU scheduling of mainstream operating systems (Windows: ~20 ms; Linux: 5 ms-800 ms).