Why does the Go language allow millions of goroutines, while Java allows only thousands of threads?

Original link: [https://rcoh.me/posts/why-you-can-have-a-million-go-routines-but-only-1000-java-threads/](https://rcoh.me/posts/why-you-can-have-a-million-go-routines-but-only-1000-java-threads/)

Many programmers with experience in a JVM language have probably run into an error like this:

    [error] (run-main-0) java.lang.OutOfMemoryError: unable to create native thread
    [error] java.lang.OutOfMemoryError: unable to create native thread
    [error]     at java.base/java.lang.Thread.start0(Native Method)
    [error]     at java.base/java.lang.Thread.start(Thread.java:813)
    ...
    [error]     at java.base/java.lang.Thread.run(Thread.java:844)

Exceeding the thread limit causes an out-of-memory error. On the author's laptop running Linux, this typically happens after roughly 11,500 threads have been created. If you try something similar in Go, creating goroutine after goroutine and letting each of them sleep forever, you get a completely different result: on the author's laptop, Go reached roughly 70 million goroutines before the author grew impatient and stopped waiting. So why can we create so many more goroutines than threads? Answering this question requires a pleasant excursion down to the operating-system level. This is not just an academic issue: it also shapes how software is designed in the real world. In practice, the author has hit the JVM's thread limit many times, either because buggy code leaked threads, or because an engineer simply did not know that the JVM has a thread limit.

## **So what exactly is a thread?**

The word "thread" can mean many different things. In this article, I use it to mean a logical thread: a sequence of instructions (operations) that can be executed in a linear order, a logically executable path. Each core of a CPU can truly execute only one logical thread at a time<sup>[1]</sup>. This leads to an inherent conclusion: if you have more threads than CPU cores, some threads must be paused so that the others can do their work, and a paused thread is resumed later. To pause and resume a thread, you need to record at least two things:

1. The position of the currently executing instruction. In other words: which line of code the thread was executing when it was paused.
2. A stack. You can think of the stack as holding the thread's current state. The stack contains local variables, which in Java are pointers into heap memory (in C/C++, non-pointer values are stored on the stack as well). All threads in a process share the same heap<sup>[2]</sup>.

With these two things recorded, the CPU has enough information when scheduling to pause one thread, schedule another thread to run, and later resume the paused thread exactly where it left off. These operations are usually completely transparent to the thread: from the thread's point of view, it has been running continuously. The only way a thread can observe that it was descheduled is by measuring the timing of subsequent operations<sup>[3]</sup>.

Let's return to the original question: why can we create so many goroutines?
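To make the experiment from the introduction concrete, here is a minimal sketch of its Go half, assuming nothing beyond the standard library (this is illustrative code, not the author's original): each goroutine sleeps effectively forever while the main loop keeps spawning more and printing a running count.

```go
package main

import (
	"fmt"
	"math"
	"time"
)

func main() {
	// Keep spawning goroutines that block "forever" and report progress.
	for count := 1; ; count++ {
		go func() {
			time.Sleep(math.MaxInt64) // roughly 292 years, effectively forever
		}()
		if count%100000 == 0 {
			fmt.Printf("spawned %d goroutines\n", count)
		}
	}
}
```

Left running, the count climbs into the millions, while an equivalent Java program that starts one OS thread per iteration dies with the OutOfMemoryError shown above after only a few thousand threads.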
### **The JVM uses operating-system threads**

Although the specification does not require it, as far as I know every modern, general-purpose JVM on the market implements its threads as operating-system threads. Below, I use the term "user-space threads" for threads that are scheduled by the language runtime rather than by the operating-system kernel. Implementing threads at the operating-system level has two main limitations: first, there is a cap on the total number of threads, and second, with a 1:1 mapping between language-level threads and OS threads, there is no path to massive concurrency.

### **Fixed stack sizes: with OS-level threads, each thread needs a lot of static memory**

The second problem with using OS-level threads is that each thread requires a fixed amount of stack memory. The size is configurable, but on a 64-bit JVM a thread uses 1 MB of stack by default. You can make the default stack smaller, but then you are trading memory usage against an increased risk of stack overflow: the more recursion in your code, the more likely a stack overflow becomes. With the default 1 MB stack, creating 1,000 threads uses 1 GB of RAM. RAM is cheap these days, but if you wanted a million threads you would need on the order of a terabyte of memory.

### **Go: dynamically sized stacks**

To avoid wasting memory on large, mostly unused stacks, Go uses a very clever trick: Go stacks are dynamically sized, growing and shrinking with the amount of data stored on them. This is not a trivial thing to implement, and the design went through several iterations<sup>[4]</sup>. Many other articles describe the internals of Go stacks in detail, so this article will not discuss them here. The result is that a new goroutine actually consumes only about 4 KB of stack space. At 4 KB per stack, a gigabyte of RAM can hold roughly 250,000 goroutines, which is a big improvement over Java's 1 MB per stack.

### **Context switching in the JVM is slow**

**Using operating-system threads caps you well short of millions of threads, and the primary cost is the latency of context switching.** Because JVM threads are operating-system threads, they are scheduled by the OS kernel. The operating system keeps a list of all running processes and threads and tries to give each of them a "fair" slice of CPU time<sup>[5]</sup>. When the kernel switches from one thread to another, it actually has a lot of work to do: the newly running thread or process must start with a view of the world in which the fact that other threads are running on the same CPU is abstracted away. This article will not go into the details; if you are interested, see [here](https://en.wikipedia.org/wiki/Context_switch). The key point is that a context switch takes roughly 1-100 µs. That may not sound like much, but in practice, at a realistic average of 10 µs per switch, if you want every thread to be scheduled at least once per second you can have at most about 100,000 threads per core, and at that point the threads have essentially no time left to do any meaningful work of their own.
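One rough way to see the small initial goroutine stack described above for yourself is to compare the Go runtime's memory statistics before and after parking a large number of goroutines. This is an illustrative sketch (not from the original article); the figure it prints is approximate and varies by Go version and platform.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

func main() {
	const n = 100000 // number of goroutines to park

	runtime.GC()
	var before runtime.MemStats
	runtime.ReadMemStats(&before)

	stop := make(chan struct{})
	var ready sync.WaitGroup
	ready.Add(n)
	for i := 0; i < n; i++ {
		go func() {
			ready.Done()
			<-stop // block on an empty channel, holding only a tiny stack
		}()
	}
	ready.Wait()

	var after runtime.MemStats
	runtime.ReadMemStats(&after)
	fmt.Printf("approx. memory per parked goroutine: %d bytes\n",
		(after.Sys-before.Sys)/n)

	close(stop) // let the goroutines exit
}
```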
### **Go does it differently: running multiple goroutines on one OS thread**

Go has its own scheduler, which allows many goroutines to run on the same OS thread. Since Go can perform the equivalent of a context switch entirely in user code, it avoids the cost of switching from user mode into the ring-0 kernel and back again for every switch. But that is only the visible part; to support a million goroutines, Go actually does something more sophisticated.

Even if the JVM moved its threads into user space, it still could not support millions of threads. Imagine that in your new system a switch between threads takes only 100 nanoseconds. Even if all you did was context switch, with a million threads each one could be scheduled only about ten times per second, and, more importantly, you would have to keep your CPU fully busy doing nothing but that.

Supporting true high concurrency requires a different optimization: only schedule a thread when you know it can do useful work! If you are running that many threads, only a handful of them are doing useful work at any given moment. Go introduces the channel mechanism to assist its scheduler: if a goroutine is waiting on an empty channel, the scheduler can see this and simply stops running that goroutine (see the sketch at the end of this section). Go then goes one step further: it multiplexes the mostly idle goroutines onto a small set of its own operating-system threads. This way the active goroutines are scheduled across a much smaller number of threads, while the millions of mostly sleeping goroutines are kept out of the way. This mechanism also helps keep latency low.

Unless Java adds language features that give a scheduler this kind of visibility, it cannot support such intelligent scheduling. However, you can build a runtime scheduler yourself in "user space" that decides when a thread has work to do. This is exactly the basic idea behind Akka, the actor framework that supports millions of actors<sup>[6]</sup>.
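Here is a small sketch of the scheduling behaviour described above (illustrative code, not from the original article): a goroutine blocked on an empty channel is simply not run by the scheduler, and it costs no CPU time until a value is sent on the channel.

```go
package main

import (
	"fmt"
	"time"
)

func main() {
	work := make(chan int)

	// This goroutine is parked by the scheduler while the channel is empty;
	// it is not spinning and consumes no CPU time.
	go func() {
		for v := range work {
			fmt.Println("processed", v)
		}
	}()

	time.Sleep(time.Second) // the worker stays parked this whole time
	work <- 42              // now the scheduler wakes the worker to run it
	time.Sleep(100 * time.Millisecond)
}
```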
## **Concluding thoughts**

The migration from OS-level thread models to lightweight, user-space thread models will keep happening more and more<sup>[7]</sup>. From a usage perspective, being able to use advanced concurrency features is the only real requirement, and this need not add excess complexity. If Go had used operating-system threads instead of its current scheduler and growable stacks, it could have removed thousands of lines of code from its runtime package. But for most use cases this is the better model: the complexity is abstracted away by the authors of the language and its libraries, so that software engineers can write highly concurrent programs.

---

1. Hyper-threading can effectively double the capacity of a CPU core, and instruction pipelining can also increase the CPU's ability to execute in parallel; however, so far it remains O(numCores).
2. This claim does not hold in certain special circumstances; if you know of such a scenario, please let the author know.
3. This is actually an attack vector. JavaScript can detect the small timing differences caused by keyboard interrupts. Malicious websites can use this to listen, not to your keystrokes themselves, but to their timing: [https://mlq.me/download/keystroke_js.pdf](https://mlq.me/download/keystroke_js.pdf)
4. Go initially used a "segmented stack" model, in which the stack was split across separate regions of memory (translator's note: stack space in other languages is generally contiguous), with some very clever bookkeeping to keep track of it. Later versions, to improve performance in some special scenarios, use a contiguous stack instead of the segmented model: much like resizing a hash table, a new, larger stack is allocated, and with some tricky pointer manipulation everything is carefully copied over to the new, larger stack.
5. Threads can mark their priority by calling nice (see man nice for more information) to influence how they are scheduled.
6. For achieving large-scale concurrency, actors are to Scala/Java what goroutines are to Go. Like goroutines, the actor scheduler can see which actors have messages in their mailboxes and only runs the actors that are ready to do useful work. You can actually have even more actors than goroutines, because actors do not need a stack. However, this means that if an actor does not process a message quickly, the scheduler will be blocked (because the actor has no stack of its own, it cannot pause in the middle of handling a message). A blocked scheduler means no messages are handled, and things grind to a halt quickly. It is a trade-off.
7. In the Apache web server, each request requires an OS-level thread, so an Apache web server can handle only thousands of concurrent connections. Nginx chooses a different model, in which one operating-system thread handles hundreds or even thousands of concurrent connections, allowing a much higher level of concurrency. Erlang uses a similar model, allowing millions of actors to run concurrently. Gevent brings greenlets (user-space threads) to Python to achieve a higher degree of concurrency than would otherwise be supported (Python threads are OS threads).

via:https://rcoh.me/posts/why-you-can-have-a-million-go-routines-but-only-1000-java-threads/

Author: Russell Cohen | Translator: skyismine2010 | Proofreader: polaris1119

This article was compiled by GCTT and proudly published by the Go Language Chinese Network.

This article was originally translated by GCTT for the Go Language Chinese Network. Want to join the ranks of translators and make your own contribution to open source? You are welcome to join GCTT!
Translations are published for learning and communication purposes only, under the terms of the CC-BY-NC-SA license. If our work infringes on your rights, please contact us promptly.
Reposting is welcome under the CC-BY-NC-SA license; please keep the original/translation links and the author/translator information in the text.
This article represents only the author's knowledge and views; if you see things differently, feel free to leave a comment below.
