Idle chat about the myths of high concurrency: watch a JD.com architect knock them off the altar

Source: Internet
Author: User
Tags: epoll


High concurrency has been a buzzword for the past few years, especially in the Internet world: if you don't open a conversation with a high-concurrency war story, you're embarrassed to show your face. But is high concurrency really that formidable? At every turn it's tens of thousands of concurrent connections and billions of requests, and it sounds genuinely scary. Yet think about it: doesn't all of that enormous concurrency and traffic pass through an unassuming router?

>>>>

0x00 It all starts at the NIC

High-concurrency traffic rides through the low-key router into our system, and the first hurdle it meets is the network card. How does a NIC withstand high concurrency? The question simply doesn't arise: to a NIC everything looks the same, it's all electrical signals. A NIC fundamentally cannot tell whether you are tens of thousands of concurrent connections or one raging torrent of data, which is why a NIC's prowess is measured in bandwidth; nobody ever quotes a concurrency figure for it.

The NIC lives at the physical and link layers. Once the data reaches the network layer (the IP layer) it carries an IP address, and at that point the system can tell that you are tens of thousands of concurrent sources. So the network layer can proudly claim to have handled high concurrency and go out bragging. And who works at the network layer? The protagonist is the router, which operates mainly at this layer.

>>>>

0x01 Blurred together

As non-specialists, we usually lump the network layer (IP layer) and the transport layer (TCP layer) together. The operating system provides both; to us they are transparent, low-key, and reliable, so much so that we ignore them entirely.

The bragging starts at the application layer, and everything at the application layer begins with the socket. Traffic that comes up through the transport layer eventually fans out into thousands upon thousands of sockets, and what the braggarts have really solved is how to handle all those sockets quickly. What, then, is the difference between processing IP-layer data and processing sockets?

>>>>

0x02 No connection, no waiting

The most important difference: the IP layer is connectionless, while the socket is connection-oriented. The IP layer has no concept of a connection; a packet arrives, gets processed, and that is the end of it. A socket cannot be so decisive: it is connection-oriented and contextual. If you read the sentence "I love you" and get excited for half a day without looking at what came before and after, you are just getting excited blindly.

Reading backwards and forwards to understand the context means occupying memory for longer and waiting for longer, and different connections must be isolated from one another, which means allocating separate threads (or coroutines). Doing all of this well turns out to be rather difficult after all.

>>>>

0x03 Thank the operating system

The operating system is a wonderful thing. On Linux, all IO is abstracted as files, and network IO is no exception: it is abstracted as the socket. But the socket is more than an IO abstraction; the OS also abstracts how sockets are processed, most famously through select and epoll. The well-known nginx, Netty, and Redis are all built on epoll, and those three are practically mandatory weapons in the realm of tens of thousands of concurrent connections.

Many years ago Linux offered only select, which can handle only a small number of concurrent connections; epoll was designed specifically for high concurrency. So thank the operating system. That said, the OS does not solve every high-concurrency problem; it merely moves data quickly from the NIC into our application. What to do with the data after that is the long, hard part.

One of the operating system's missions is to squeeze the maximum capability out of the hardware. For high concurrency this is the most direct and effective remedy; distributed computing comes only second. The nginx, Netty, and Redis mentioned above are all examples of maximizing hardware capability. So how, exactly, do we maximize hardware capability?

>>>>

0x04 The core contradiction

To maximize hardware capability, first find the core contradiction. In my view this contradiction has barely changed since the birth of the computer: it is the contradiction between the CPU and IO.

The CPU has advanced brutally fast along Moore's law, while IO devices (disks, NICs) have been lackluster. The tortoise-paced IO devices become the performance bottleneck, which inevitably drags CPU utilization down, so raising CPU utilization is almost a synonym for maximizing hardware capability.

>>>>

0x05 Interrupts and caches

The CPU and IO devices cooperate essentially through interrupts. Take a disk read: the CPU merely issues a "read disk into memory" instruction to the disk driver and returns immediately, free to do other work. Reading from disk into memory is itself time-consuming; when the disk controller finishes executing the instruction, it raises an interrupt request to the CPU to announce that the task is complete. The CPU handles the interrupt, and from then on it can operate directly on the data now sitting in memory.

The interrupt mechanism lets the CPU deal with IO at minimal cost. How, then, do we raise the utilization of the IO devices themselves? The answer is caching.

The operating system maintains caches for IO device data, both read caches and write caches. Read caching is easy to understand; at the application layer, too, we use caches heavily to avoid generating read IO wherever possible.

Write caching is used less often at the application layer, but the operating system's write cache exists precisely to improve the efficiency of IO writes: the OS merges and schedules cached writes before issuing them, for example using an elevator scheduling algorithm for disk writes.

>>>>

0x06 Using the network card efficiently

The first problem high concurrency must solve is how to use the network card efficiently. Like the disk, the NIC has an internal cache: when the NIC receives network data, the data first lands in the NIC cache, then gets written into operating-system kernel space (memory); our application reads the data from memory and processes it.

Beyond the NIC cache, the TCP/IP stack also has send buffers and receive buffers, as well as the SYN backlog queue and the accept backlog queue.

If these caches and queues are misconfigured, all sorts of problems follow. For example, during the TCP connection phase, if concurrency is very high and the backlog configured for the listening socket (in nginx, say) is too small, large numbers of connection requests will fail.
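
As a rough illustration of that knob (sketched in Java rather than nginx, with a made-up backlog value), the accept backlog is set per listening socket at bind time:

```java
import java.net.InetSocketAddress;
import java.nio.channels.ServerSocketChannel;

public class BacklogDemo {
    public static void main(String[] args) throws Exception {
        ServerSocketChannel server = ServerSocketChannel.open();
        // The second argument is the accept backlog: the queue of completed
        // connections waiting for accept(). 4096 is an arbitrary example;
        // the kernel may silently cap it (see net.core.somaxconn on Linux).
        server.bind(new InetSocketAddress(8080), 4096);
        System.out.println("listening with an enlarged accept backlog");
    }
}
```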

If the NIC cache is too small, then once it fills up the NIC simply drops newly arriving data, causing packet loss. And if our application is slow at reading network IO, it only accelerates the pile-up in the NIC cache. So how do we read network data efficiently? On Linux, epoll is now in wide use.

The operating system abstracts IO devices as files, and the network is abstracted as the socket. The socket itself is a file, so the read/write calls can be used to receive and send network data. In a high-concurrency scenario, how do we use sockets to read and send network data efficiently?

To use IO efficiently you must understand the IO models at the operating-system level. The classic "UNIX Network Programming" summarizes five IO models: blocking IO, non-blocking IO, IO multiplexing, signal-driven IO, and asynchronous IO.

>>>>

0x07 Blocking IO

Take blocking IO first. When we call read on a socket and the socket's read buffer happens to be empty (the peer has sent no data), the operating system suspends the thread that called read; only when data arrives in the socket's read buffer does the operating system wake the thread up.

Naturally, read returns the data as the thread wakes. As I understand it, "blocking" simply refers to whether the operating system suspends the thread.
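
A minimal sketch of blocking IO using Java's classic socket API (the peer host and the request are placeholders):

```java
import java.io.InputStream;
import java.io.OutputStream;
import java.net.Socket;

public class BlockingReadDemo {
    public static void main(String[] args) throws Exception {
        try (Socket socket = new Socket("example.com", 80)) { // placeholder peer
            OutputStream out = socket.getOutputStream();
            out.write("GET / HTTP/1.0\r\nHost: example.com\r\n\r\n".getBytes());
            out.flush();
            InputStream in = socket.getInputStream();
            byte[] buf = new byte[4096];
            // If the receive buffer is empty, this call suspends the calling
            // thread; the OS wakes it once data arrives on the socket.
            int n = in.read(buf);
            System.out.println("read " + n + " bytes");
        }
    }
}
```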

>>>>

0x08 Non-blocking IO

With non-blocking IO, if the socket's read buffer is empty, the operating system does not suspend the thread that called read; instead it immediately returns an EAGAIN error code. The caller can then poll, calling read over and over until data finally shows up in the socket's read buffer. The drawback of this approach is that it burns a great deal of CPU.
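
Java hides the EAGAIN detail, but the same polling idea can be sketched with a non-blocking channel, where read() returning 0 plays the role of "no data yet" (the peer and request are placeholders):

```java
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.SocketChannel;

public class NonBlockingPollDemo {
    public static void main(String[] args) throws Exception {
        SocketChannel ch = SocketChannel.open(new InetSocketAddress("example.com", 80));
        ch.configureBlocking(false); // from here on, read() never suspends the thread
        ch.write(ByteBuffer.wrap(
                "GET / HTTP/1.0\r\nHost: example.com\r\n\r\n".getBytes()));
        ByteBuffer buf = ByteBuffer.allocate(4096);
        int n;
        // Busy-poll until the kernel hands us bytes. This spinning is the
        // CPU-hungry part the text warns about.
        while ((n = ch.read(buf)) == 0) {
            // receive buffer still empty; try again immediately
        }
        System.out.println(n < 0 ? "connection closed" : "read " + n + " bytes");
    }
}
```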

>>>>

0x09 IO multiplexing

Back to blocking IO: because the operating system suspends the calling thread, handling multiple sockets at once means creating one thread per socket. Threads consume memory and add to the operating system's thread-switching load, so this pattern does not suit high-concurrency scenarios. Is there a way to bring the thread count down?

Non-blocking IO looks like a solution: poll many sockets inside a single thread and the thread-count problem seems solved. In practice the scheme fails, because every call to read is a system call, implemented via a soft interrupt that switches between user mode and kernel mode, and therefore very slow when repeated across thousands of sockets.

But the idea is right. Is there a way to avoid the flood of system calls? There is: IO multiplexing.

On Linux, the select and epoll system APIs both support IO multiplexing. Through them, a single system call can monitor many sockets at once: as soon as any monitored socket's read buffer holds data, the call returns immediately and you can go read that readable socket. If every monitored socket's read buffer is empty, the call blocks, that is, the thread that invoked select/epoll is suspended.

So select/epoll is still essentially blocking IO; the difference is that they can monitor many sockets at the same time.
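
A minimal multiplexing sketch with Java NIO's Selector, which is backed by epoll on Linux (the port is arbitrary):

```java
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.util.Iterator;

public class SelectorDemo {
    public static void main(String[] args) throws Exception {
        Selector selector = Selector.open();
        ServerSocketChannel server = ServerSocketChannel.open();
        server.bind(new InetSocketAddress(8080));
        server.configureBlocking(false);
        server.register(selector, SelectionKey.OP_ACCEPT);

        while (true) {
            selector.select(); // one call watches every registered socket; blocks if none is ready
            Iterator<SelectionKey> it = selector.selectedKeys().iterator();
            while (it.hasNext()) {
                SelectionKey key = it.next();
                it.remove();
                if (key.isAcceptable()) { // new connection
                    SocketChannel client = server.accept();
                    client.configureBlocking(false);
                    client.register(selector, SelectionKey.OP_READ);
                } else if (key.isReadable()) { // a socket whose read buffer has data
                    SocketChannel client = (SocketChannel) key.channel();
                    ByteBuffer buf = ByteBuffer.allocate(4096);
                    if (client.read(buf) < 0) client.close(); // peer closed
                    // otherwise: hand buf to application logic
                }
            }
        }
    }
}
```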

>>>>

0x0A The difference between select and epoll

Why does one IO-multiplexing model need two system APIs? My reading is that select is defined in the POSIX standard but does not perform well enough, so the various operating systems introduced faster APIs of their own, such as epoll on Linux and IOCP on Windows.

As for why select is slow, two reasons are widely accepted. First, when select returns, the caller must sweep through all the monitored sockets rather than just the ones whose state changed. Second, every call to select has to copy the file-descriptor bitmaps between user space and kernel space (three copy_from_user calls, one each for the read, write, and exception bitmaps). epoll avoids both of these costs.

>>>>

0x0B The Reactor multithreaded model

On Linux, the most reliable and stable IO model is multiplexing. How can our applications make good use of multiplexed IO? Years of practice have been distilled into the Reactor pattern, which is now extremely widespread; the famous Netty and Tomcat's NIO connector are both based on this model.

The core of Reactor is an event dispatcher plus event handlers. The event dispatcher is the bridge connecting multiplexed IO to the code that processes network data; at its heart it listens for socket events (via select/epoll_wait) and then dispatches each event to the appropriate event handler. Both the dispatcher and the handlers can be backed by thread pools.

It is worth noting that socket events fall into two broad categories: connection requests and read/write requests. Successfully handling a connection request creates a new socket, and subsequent read/write requests arrive on that newly created socket.

So in network-processing scenarios the Reactor implementation gets a little convoluted, but the principle does not change. For a concrete implementation, see Doug Lea's "Scalable IO in Java" (http://gee.cs.oswego.edu/dl/cpjslides/nio.pdf).

[Figure: Reactor schematic diagram]
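
Building on the selector loop shown earlier, here is a stripped-down single-reactor sketch in the spirit of Doug Lea's slides; the four-thread handler pool and the process method are my own placeholders, not something from the original text:

```java
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.util.Iterator;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Single-reactor sketch: one thread dispatches events, a pool runs handlers.
public class MiniReactor implements Runnable {
    private final Selector selector;
    private final ExecutorService handlerPool = Executors.newFixedThreadPool(4);

    MiniReactor(int port) throws Exception {
        selector = Selector.open();
        ServerSocketChannel server = ServerSocketChannel.open();
        server.bind(new InetSocketAddress(port));
        server.configureBlocking(false);
        server.register(selector, SelectionKey.OP_ACCEPT); // connection events
    }

    public void run() {
        try {
            while (!Thread.interrupted()) {
                selector.select(); // the multiplexing call
                Iterator<SelectionKey> it = selector.selectedKeys().iterator();
                while (it.hasNext()) {
                    dispatch(it.next());
                    it.remove();
                }
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    private void dispatch(SelectionKey key) throws Exception {
        if (key.isAcceptable()) { // connection request: creates a new socket
            SocketChannel client = ((ServerSocketChannel) key.channel()).accept();
            client.configureBlocking(false);
            client.register(selector, SelectionKey.OP_READ);
        } else if (key.isReadable()) { // read request on the newly created socket
            SocketChannel client = (SocketChannel) key.channel();
            ByteBuffer buf = ByteBuffer.allocate(4096);
            int n = client.read(buf); // read on the dispatcher thread
            if (n < 0) { client.close(); return; }
            buf.flip();
            handlerPool.submit(() -> process(buf)); // business logic runs in the pool
        }
    }

    private void process(ByteBuffer request) { /* application logic here */ }

    public static void main(String[] args) throws Exception {
        new Thread(new MiniReactor(8080)).start();
    }
}
```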

>>>>

0x0C The nginx multi-process model

nginx defaults to a multi-process model. It is split into a master process and worker processes; only the worker processes actually listen for and handle network requests. All workers listen on the default port 80, but each request is handled by exactly one worker process.

The trick is that before accepting connections, each worker must compete for an accept lock, and the process holding the lock gets the right to handle the current batch of network requests. Each worker process has only one main thread, and the beauty of single-threading is that requests are processed without any locking; lock-free handling of concurrent requests is about as good as it gets in high-concurrency scenarios. (See http://www.dre.vanderbilt.edu/~schmidt/PDF/reactor-siemens.pdf)

>>>>

0x0D Breaking through the bucket theory

After running the gauntlet of the network card, the operating system, and the middleware (Tomcat, Netty, and so on), the data finally reaches us application developers. How do we handle these highly concurrent requests?

Let us again approach the problem from the angle of raising single-machine processing capability. In real scenarios the focus lands on raising CPU utilization (after all, the CPU is the component that has developed fastest). But the wooden-bucket theory says the shortest stave determines the water level, so why not raise the utilization of the short stave, IO, instead of the CPU's?

The answer: in practice, raising CPU utilization tends to raise IO utilization along with it. Of course, once IO utilization approaches its limit, raising CPU utilization further is pointless. So let's look first at how to raise CPU utilization, and afterwards at how to raise IO utilization.

>>>>

0x0E Parallelism and concurrency

The main way to raise CPU utilization is to exploit multicore CPUs for parallel computation. Parallelism is not the same as concurrency. On a single-core CPU we can listen to an MP3 while writing code: that is concurrency, but not parallelism, because from the single core's point of view, listening to the MP3 and coding never truly happen at the same instant.

Parallel computation became possible only in the multicore era. Fully general parallel computing is advanced stuff; the models applied in industry come mainly in two flavors: the shared-memory model and the message-passing model.

>>>>

0x0F Multithreaded design patterns

For the shared-memory model, the basic principles come from a paper the master Dijkstra wrote half a century ago (1965), "Cooperating Sequential Processes", which introduced the famous concept of the semaphore. The wait/notify mechanism Java uses for thread synchronization is also an implementation of the semaphore idea.

If the master's original work goes over your head, don't feel ashamed; after all, few people can follow the master all the way. In Japan, an author named Hiroshi Yuki distilled his multithreaded programming experience into a book called "Java Multithreaded Design Patterns", which is far more down-to-earth (that is, actually readable). Let's walk through it briefly.

1. Single Threaded Execution

This pattern turns multithreading back into single-threading. When multiple threads access a shared variable at the same time, all sorts of inexplicable problems appear; this pattern simply forces those accesses through one thread at a time, restoring safety at an obvious cost in performance. The simplest implementation is to use synchronized to protect the code blocks (or methods) that touch shared state. The concurrency literature has the concept of a critical section, and I feel this pattern is the same thing.
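
A minimal sketch (a toy example of my own) of protecting a critical section with synchronized:

```java
public class Counter {
    private long count = 0;

    // Only one thread at a time can execute this method on a given
    // Counter instance, so the read-modify-write below stays atomic.
    public synchronized void increment() {
        count++;
    }

    public synchronized long get() {
        return count;
    }
}
```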

2. Immutable Pattern

If a shared variable is never changed after creation, then any number of threads can access it without trouble; it is safe forever. The pattern is simple but effective and solves a great many problems.
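
A sketch of an immutable value class: final fields, no setters, and "mutation" that returns a fresh object (the Point class is my own example):

```java
public final class Point {
    private final int x;
    private final int y;

    public Point(int x, int y) {
        this.x = x;
        this.y = y;
    }

    public int x() { return x; }
    public int y() { return y; }

    // Instead of changing this object, return a new one; instances can
    // therefore be shared freely across threads without any locking.
    public Point translate(int dx, int dy) {
        return new Point(x + dx, y + dy);
    }
}
```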

3. Guarded Suspension Pattern

This pattern is really a wait-notify model: when a thread's execution condition is not met, the thread suspends itself (wait); when the condition becomes true, the waiting threads are woken up (notify). With synchronized plus wait/notifyAll, the Java language lets you implement a wait-notify model very quickly. Yuki calls this pattern the multithreaded version of if, which I find very apt.
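
A classic Guarded Suspension sketch with synchronized and wait/notifyAll (the request queue is my own toy example):

```java
import java.util.LinkedList;
import java.util.Queue;

public class RequestQueue {
    private final Queue<String> queue = new LinkedList<>();

    public synchronized String take() throws InterruptedException {
        // The guard: while the condition is unmet, suspend the thread.
        while (queue.isEmpty()) {
            wait();
        }
        return queue.remove();
    }

    public synchronized void put(String request) {
        queue.add(request);
        notifyAll(); // the condition may now hold: wake the waiters
    }
}
```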

4. Balking

This pattern resembles the previous one; the difference is that when the execution condition is not met, the thread exits immediately instead of suspending as before. The most common use is the multithreaded version of the Singleton pattern: if the object has not yet been created, create it; if it already exists (the condition for creating it is not satisfied), exit without creating another.
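
A balking sketch: if there is nothing to do, return immediately rather than wait (the AutoSaver is my own toy example):

```java
public class AutoSaver {
    private boolean changed = false;

    public synchronized void change() {
        changed = true;
    }

    public synchronized void save() {
        if (!changed) {
            return; // balk: the condition is not met, so exit at once
        }
        changed = false;
        doSave();
    }

    private void doSave() { /* write the current state to disk */ }
}
```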

5. Producer-Consumer

The producer-consumer pattern, famous the world over. The case I meet most often is one thread doing IO (querying a database, say) while one or more other threads process the IO results, so that IO and CPU are both kept busy. If both the producers and the consumers are CPU-intensive, though, setting up producer-consumer is just making trouble for yourself.
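
A minimal producer-consumer sketch on a JDK BlockingQueue (the "database rows" are simulated):

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class ProducerConsumerDemo {
    public static void main(String[] args) {
        BlockingQueue<String> queue = new ArrayBlockingQueue<>(100);

        // Producer: an IO-bound thread (imagine it querying a database).
        new Thread(() -> {
            try {
                for (int i = 0; i < 10; i++) {
                    queue.put("row-" + i); // blocks while the queue is full
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }).start();

        // Consumer: a CPU-bound thread processing the fetched rows.
        new Thread(() -> {
            try {
                while (true) {
                    String row = queue.take(); // blocks while the queue is empty
                    System.out.println("processing " + row);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }).start();
    }
}
```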

6. Read-Write Lock

The read-write lock solves performance problems in read-mostly scenarios: it allows reads to proceed in parallel, while a write is permitted to only one thread at a time. If writes are very rare and read concurrency is very high, consider copy-on-write instead; personally I feel copy-on-write deserves to be split out as a pattern of its own.
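
A sketch using the JDK's ReentrantReadWriteLock to guard a shared map (the cache class is my own example):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class ReadMostlyCache {
    private final Map<String, String> map = new HashMap<>();
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

    public String get(String key) {
        lock.readLock().lock(); // many readers may hold this lock at once
        try {
            return map.get(key);
        } finally {
            lock.readLock().unlock();
        }
    }

    public void put(String key, String value) {
        lock.writeLock().lock(); // exclusive: blocks all readers and writers
        try {
            map.put(key, value);
        } finally {
            lock.writeLock().unlock();
        }
    }
}
```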

7. Thread-Per-Message

This is what we usually call "one thread per request".

8. Worker Thread

An upgraded version of thread-per-request: a thread pool removes the performance cost of constantly creating and destroying threads. BIO-era Tomcat used this model.
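
A worker-thread sketch with a JDK thread pool (the pool size and the task are arbitrary choices of mine):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class WorkerThreadDemo {
    public static void main(String[] args) {
        // A fixed pool of worker threads, reused across requests, instead
        // of one freshly created thread per request.
        ExecutorService workers = Executors.newFixedThreadPool(8);
        for (int i = 0; i < 100; i++) {
            final int requestId = i;
            workers.submit(() -> handle(requestId));
        }
        workers.shutdown();
    }

    static void handle(int requestId) {
        System.out.println(Thread.currentThread().getName()
                + " handling request " + requestId);
    }
}
```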

9. Future

When calling a time-consuming synchronous method feels annoying and you want to do something else in the meantime, consider this pattern. It is essentially a synchronous-to-asynchronous converter, and turning synchronous into asynchronous ultimately means starting another thread, so this pattern is a close relative of thread-per-message.
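
A Future sketch with the JDK's ExecutorService (the slow call is simulated with a sleep):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class FutureDemo {
    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newSingleThreadExecutor();

        // Kick off the slow call asynchronously and keep a Future handle.
        Future<String> future = pool.submit(() -> {
            Thread.sleep(1000); // stand-in for a slow synchronous method
            return "result";
        });

        doSomethingElse(); // useful work while the slow call runs

        // Rendezvous: block only now, when the value is actually needed.
        System.out.println(future.get());
        pool.shutdown();
    }

    static void doSomethingElse() {
        System.out.println("doing other work in the meantime");
    }
}
```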

10. Two-Phase Termination

This pattern answers the need to terminate a thread gracefully.
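
A sketch of two-phase termination via interruption: phase one requests the stop, phase two lets the thread clean up and exit on its own terms:

```java
public class GracefulWorker extends Thread {
    @Override
    public void run() {
        try {
            while (!isInterrupted()) {
                doWork(); // the normal processing loop
            }
        } finally {
            cleanUp(); // termination phase: always runs before the thread exits
        }
    }

    void doWork() { /* one unit of work */ }

    void cleanUp() {
        System.out.println("released resources, exiting cleanly");
    }

    public static void main(String[] args) throws Exception {
        GracefulWorker w = new GracefulWorker();
        w.start();
        Thread.sleep(100);
        w.interrupt(); // phase one: politely request termination
        w.join();      // wait for phase two to finish
    }
}
```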

11. Thread-Specific Storage

Thread-local storage: a weapon for avoiding the cost of locking and unlocking altogether. C#'s concurrency-friendly container ConcurrentBag uses this pattern, and HikariCP, the fastest database connection pool on the planet, borrowed ConcurrentBag's implementation and built a Java version; interested readers can look it up.
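
A thread-local sketch with Java's ThreadLocal: each thread gets its own instance, so nothing is shared and nothing needs a lock (SimpleDateFormat is the classic example because it is not thread-safe):

```java
import java.text.SimpleDateFormat;
import java.util.Date;

public class ThreadLocalDemo {
    // Each thread lazily receives its own SimpleDateFormat, so the
    // non-thread-safe formatter is never shared and never locked.
    private static final ThreadLocal<SimpleDateFormat> FORMAT =
            ThreadLocal.withInitial(() -> new SimpleDateFormat("yyyy-MM-dd"));

    public static void main(String[] args) {
        Runnable task = () -> System.out.println(
                Thread.currentThread().getName() + ": "
                        + FORMAT.get().format(new Date()));
        new Thread(task).start();
        new Thread(task).start();
    }
}
```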

12. Active Object (not covered here)

This pattern is the equivalent of the last palm of the Eighteen Dragon-Subduing Palms: it combines the patterns that came before it and is a bit complex. Personally I feel it is worth more as a reference than as something to implement.

Several related books by Chinese authors have appeared recently, but on the whole Hiroshi Yuki's book still withstands scrutiny best. When solving concurrency with the shared-memory model, the crux is using locks well, yet using locks well remains hard, so later on people came up with the message-passing model.

>>>>

0x10 The message-passing model

The shared-memory model is still very hard to get right, and you cannot theoretically prove a program correct; we keep accidentally writing deadlocking programs. Whenever a problem keeps recurring, some master always steps forward, and so the message-passing model was born (back in the 1970s). Message passing has two important branches: the Actor model and the CSP model.

>>>>

0x11 The Actor model

The Actor model rose to fame with Erlang, and later Akka appeared. In the Actor model there is no operating-system notion of a process or a thread; there are only actors, and you can think of an actor as a more versatile, easier-to-use thread.

Inside an actor, processing is linear (single-threaded), and actors interact only through messages; that is, actors are not allowed to share data. No sharing means no locks, which avoids every side effect that locks bring, as the sketch below illustrates.
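
To make that concrete, here is a homemade micro-actor in Java: a mailbox drained by a single thread, so the state inside the actor never needs a lock (this is purely my illustration, not how Erlang or Akka is implemented):

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// A toy actor: all state is touched by exactly one thread (the one
// draining the mailbox), so no locks are needed inside the actor.
public class CounterActor {
    private final BlockingQueue<String> mailbox = new LinkedBlockingQueue<>();
    private long count = 0; // private state, never shared

    public CounterActor() {
        Thread loop = new Thread(() -> {
            try {
                while (true) {
                    receive(mailbox.take()); // process messages one at a time
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        loop.setDaemon(true);
        loop.start();
    }

    public void tell(String message) { // the only way in: send a message
        mailbox.offer(message);
    }

    private void receive(String message) {
        if (message.equals("inc")) count++;
        else if (message.equals("print")) System.out.println(count);
    }

    public static void main(String[] args) throws Exception {
        CounterActor actor = new CounterActor();
        for (int i = 0; i < 1000; i++) actor.tell("inc");
        actor.tell("print");
        Thread.sleep(200); // give the mailbox thread time to drain
    }
}
```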

Creating an actor is no different from new-ing an object: fast and small, nothing like the slow, expensive creation of a thread. Nor does scheduling an actor force the operating-system context switch (chiefly the saving and restoring of registers) that thread scheduling does, so actor scheduling costs very little.

The Actor model also has a somewhat controversial advantage: it is closer to the real world. The real world, too, is distributed, asynchronous, and message-based; in particular, the actor approach to failures, self-healing, and supervision matches real-world logic closely.

But this advantage demands a change in programming habits. Most of our current habits of thought differ a great deal from the real world (more on that another time), and generally speaking, anything that asks us to change our habits of thought meets resistance beyond our imagination.

>>>>

0x12 The CSP model

Golang supports the CSP model at the language level. One perceptible difference between CSP and Actor: in the CSP model, the producer (message sender) and the consumer (message receiver) are completely decoupled, and the producer need not even know the consumer exists; in the Actor model the producer must know the consumer, otherwise it has nobody to send messages to.

The CSP model resembles the producer-consumer pattern we met in the multithreading section. The core difference, to my mind, is that CSP comes with something like green threads, called goroutines in Golang; a goroutine is an extremely lightweight scheduling unit that can be created quickly and consumes few resources.

Actor demands, to some degree, that we change the way we think, whereas the leap to CSP feels smaller and is more readily accepted by today's developers. People say Golang is an engineering-minded language, and its choice of CSP over Actor is one embodiment of that.

>>>>

0x13 A diverse world

Beyond the message-passing models there are also event-driven models and functional models. The event-driven model is similar to the observer pattern: in message passing, the message producer must know the consumer in order to send a message, whereas in the event-driven model, the event consumer must know the message producer in order to register its event-handling logic.

Akka's consumers can sit across a network, and in concrete event-driven implementations such as Vert.x, consumers can likewise subscribe to events across a network; seen from that angle, the two camps complement each other.

