Why can Go have millions of goroutines, but Java only thousands of threads?

By Russell Cohen

This article compares the underlying implementations of Java threads and Go goroutines to explain why Java can only create thousands of threads while Go can have millions of goroutines, focusing on context switching and stack size.

Many experienced engineers have seen this error while working with a JVM-based language:

[error] (run-main-0) java.lang.OutOfMemoryError: unable to create native thread:
[error] java.lang.OutOfMemoryError: unable to create native thread:
[error]     at java.base/java.lang.Thread.start0(Native Method)
[error]     at java.base/java.lang.Thread.start(Thread.java:813)
...
[error]     at java.base/java.lang.Thread.run(Thread.java:844)

This is an OutOfMemoryError caused by creating too many threads. Running Linux on my laptop, I hit it after creating only 11,500 threads.

If you try the same thing in Go, starting goroutines that sleep forever, you see a very different result. On my laptop, I was able to create 70 million goroutines before I got bored. So why can goroutines so vastly outnumber threads? To answer that, we need to take a round trip down through the operating system. This is not just an academic question; it has real consequences for how you design software. I have hit JVM thread limits in production several times, sometimes because bad code leaked threads, and sometimes because an engineer was simply unaware of the JVM's threading limitations.

What exactly is a thread?

The term "thread" can describe many different things. In this article, I use it to mean a logical thread: a series of operations executed in a linear order, a logical path of execution. Each CPU core can only truly execute one logical thread at a time [1].
This poses an inherent problem: if there are more threads than cores, some threads must be paused so that others can run, then resumed when it is their turn to execute again. To support pausing and resuming, a thread needs at least two things:
    1. Some kind of instruction pointer. That is: what line of code was I executing when I was paused?
    2. A stack. That is: what is my current state? The stack contains local variables and pointers to heap-allocated variables. All threads in the same process share the same heap [2].
Given those two things, the system has enough information to suspend a thread, let other threads run, and resume the original thread when it is next scheduled onto a CPU. This is usually completely transparent to the thread: from the thread's point of view, it runs continuously. The only way a thread can detect that it was rescheduled is to measure the timing between successive operations [3].

Back to our original question: why can we have so many goroutines?

The JVM uses operating system threads

Although the specification does not require it, every modern, general-purpose JVM I know of delegates its threads to the platform's operating system threads. In the rest of this article, I will use "user-space threads" to mean threads scheduled by the language runtime rather than by the kernel/OS. OS threads have two properties that severely limit how many of them can exist; any solution that maps language threads 1:1 onto OS threads cannot support massive concurrency.

In the JVM: fixed-size stacks

Using OS threads imposes a fixed, large memory cost on each thread

The first problem with OS threads is that each one has a fixed-size stack. Although the size is configurable, in a 64-bit environment the JVM allocates a 1MB stack per thread by default. You can make the default stack smaller, but then you trade memory usage against an increased risk of stack overflow: the more recursion in your code, the more likely an overflow becomes. If you keep the default, 1,000 threads use 1GB of RAM. RAM is much cheaper than it used to be, but almost nobody provisions terabytes of RAM to run millions of threads.
Go behaves differently: dynamically sized stacks

Go uses a clever trick to avoid running out of memory on large, mostly-unused stacks: Go stacks are allocated dynamically, growing and shrinking with the amount of data stored in them. This is not trivial to implement, and the design has gone through several iterations [4]. I will not go into the internal details here (there are plenty of blog posts and other materials about them), but the upshot is that a new goroutine starts with a stack of only about 4KB. At 4KB per stack, 1GB of RAM fits roughly 250,000 goroutines, a huge improvement over Java's 1MB per thread.

In the JVM: context-switch latency

Using OS threads caps you at tens of thousands of threads from context-switch overhead alone

Because the JVM uses OS threads, it relies on the OS kernel to schedule them. The OS keeps a list of all running processes and threads and tries to give each of them a "fair" share of CPU time [5]. When the kernel switches from one thread to another, it has a lot of work to do: the newly running thread or process must start from the illusion that it alone is running on that CPU. I will not go into the details here, but there is plenty of material to read if you are interested. What matters is that a context switch costs on the order of 1 to 100 microseconds. Taking 10 microseconds per switch as realistic, if you want to schedule every thread at least once per second, you can only run about 100,000 threads per core, and that leaves the threads no time to do any useful work.

Go behaves differently: running many goroutines on one OS thread

Go implements its own scheduler, which runs numerous goroutines on the same OS thread.
Even if Go performed the same kind of context switch as the kernel, it avoids trapping into ring 0 to run kernel code and switching back out again, which saves a great deal of time. But that is only the on-paper analysis; supporting millions of goroutines requires Go to do something more sophisticated.

Even if the JVM moved its threads into user space, it could not support millions of them. Suppose a newly designed system could switch between threads in only 100 nanoseconds. Even if all you did was context switch, you could only run about one million threads while scheduling each of them 10 times per second, and more importantly, you would be saturating the CPU to do it. Supporting true massive concurrency requires a further optimization: only schedule a thread when you know it can do useful work. When you run that many threads, only a handful have useful work to do at any moment. Go arranges this by integrating channels with the scheduler. If a goroutine is waiting on an empty channel, the scheduler sees this and does not run it. Go goes one step further and parks mostly-idle goroutines off of its OS threads. This way, the active goroutines, of which far fewer are expected, are scheduled onto the same thread, while the millions of mostly-dormant goroutines are handled separately. This helps keep latency low.

Unless Java adds language features that let the scheduler observe blocking, supporting intelligent scheduling is impossible. You can, however, build a runtime scheduler in user space that is aware of when a thread can do work. This is the basis of frameworks like Akka, which can support millions of actors [6].

Conclusions

The migration from the OS threading model to lightweight, user-space threading models keeps happening [7] and will likely continue. For highly concurrent use cases, it is the only option. However, it is quite complex.
If Go had chosen OS threads instead of its own scheduler and growable stacks, it could have shaved thousands of lines of code from its runtime. For many use cases, that really is the better model: the complexity can be absorbed by language and library authors, so that software engineers can write massively concurrent programs.

Supplemental Materials
  1. Hyper-threading can double the effective number of cores, and instruction pipelining can further increase CPU parallelism; but for now, it remains O(number of cores).
  2. There are some unusual scenarios where this claim does not hold, and I'm sure someone will remind me of them.
  3. This is actually the basis of a real attack. JavaScript can detect the tiny timing differences caused by keyboard interrupts, and a malicious site can use those timings to eavesdrop on keystroke timing rather than the keystrokes themselves. See: https://mlq.me/download/keystroke_js.pdf.
  4. Go first used a segmented stack model, in which the stack expands into a separate area of memory, tracked with some very clever bookkeeping. A later implementation improved performance in specific scenarios by using a contiguous stack instead: much like resizing a hashtable, a new, larger stack is allocated and, with some very tricky pointer manipulation, all of the contents are carefully copied into the new, larger stack.
  5. Threads can flag their priority by calling nice (see man nice), giving them more control over how often they are scheduled.
  6. Actors serve the same purpose for Scala/Java as goroutines do, by enabling massive concurrency. Like goroutines, the actor scheduler can see which actors have messages in their inboxes and run only the actors that have real work to do. We can have even more actors than goroutines, because actors do not need their own stacks. However, this also means that if an actor cannot process a message quickly, the scheduler blocks (because the actor has no stack of its own, it cannot be paused in the middle of processing a message). A blocked scheduler means no messages get processed, and the system quickly runs into trouble. It is a trade-off.
  7. In Apache, each request is handled by one OS thread, which limits Apache to handling only thousands of concurrent connections effectively. nginx chose a different model, in which one OS thread serves hundreds or even thousands of concurrent connections, allowing a much higher degree of concurrency. Erlang uses a similar model, which lets millions of actors execute concurrently. Gevent brings greenlets (user-space threads) to Python, enabling a higher degree of concurrency than was previously possible (Python threads are OS threads).
Original link: https://rcoh.me/posts/why-you-can-have-a-million-go-routines-but-only-1000-java-threads
