How Java and Netty Achieve High Performance and High Concurrency

Source: Internet
Author: User
http://blog.csdn.net/nicajonh/article/details/54985352
1. Background

1.1. Astonishing Performance Data

A friend recently told me via DM that, by using Netty 4 plus Thrift's compact binary codec, his team achieved 100,000 TPS (with 1 KB complex POJO payloads) for cross-node remote service invocation. Compared with a traditional communication framework based on Java serialization and BIO (synchronous blocking I/O), performance improved by more than a factor of eight.

In fact, this figure does not surprise me. Based on my five years of NIO programming experience, with an appropriate NIO framework, a high-performance compact binary codec, and a carefully designed Reactor threading model, these performance numbers are entirely achievable.

Let's take a look at how Netty supports 100,000-TPS cross-node remote service invocation. Before the formal discussion, here is a brief introduction to Netty.

1.2. Netty Basics

Netty is a high-performance, asynchronous, event-driven NIO framework that supports TCP, UDP, and file transfer. As an asynchronous NIO framework, all of Netty's I/O operations are asynchronous and non-blocking: through the Future/Listener mechanism, the user can either actively retrieve an I/O operation's result or be notified of it conveniently via callbacks.
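As a hedged sketch of the two styles (assuming Netty 4.x on the classpath; the host and port are placeholders, so the connect is expected to fail, which the listener then reports):

```java
import io.netty.bootstrap.Bootstrap;
import io.netty.channel.ChannelFuture;
import io.netty.channel.ChannelFutureListener;
import io.netty.channel.ChannelInboundHandlerAdapter;
import io.netty.channel.EventLoopGroup;
import io.netty.channel.nio.NioEventLoopGroup;
import io.netty.channel.socket.nio.NioSocketChannel;

public class FutureListenerDemo {
    static volatile boolean notified = false;

    public static void main(String[] args) throws InterruptedException {
        EventLoopGroup group = new NioEventLoopGroup(1);
        try {
            Bootstrap b = new Bootstrap()
                    .group(group)
                    .channel(NioSocketChannel.class)
                    .handler(new ChannelInboundHandlerAdapter());
            // connect() returns a ChannelFuture immediately; no thread blocks here
            ChannelFuture f = b.connect("127.0.0.1", 80); // placeholder address
            // Passive style: the listener is invoked on the IO thread when done
            f.addListener((ChannelFutureListener) future -> {
                notified = true;
                System.out.println(future.isSuccess()
                        ? "connected" : "connect failed: " + future.cause());
            });
            // Active style: block until the operation completes (success or failure)
            f.awaitUninterruptibly();
            Thread.sleep(200); // give the listener a moment to run
        } finally {
            group.shutdownGracefully();
        }
    }
}
```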

As the most popular NIO framework, Netty is widely used in Internet services, big-data distributed computing, gaming, and telecommunications, and a number of well-known open-source components are also built on the Netty NIO framework.

2. Netty's Road to High Performance

2.1. Performance Model Analysis of RPC Calls

2.1.1. The Three Sins of Poor Performance in Traditional RPC Invocation

Network transport problem: traditional RPC frameworks and RMI-based remote service (procedure) calls use synchronous blocking I/O. When client-side concurrency rises or network latency grows, synchronous blocking I/O causes the I/O threads to block frequently while waiting; because the threads cannot do useful work, I/O processing capacity naturally declines.

Below, we look at the drawbacks of BIO through the BIO communication model diagram:

Figure 2-1 BIO communication model diagram

In the BIO communication model, the server usually has a dedicated acceptor thread listening for client connections. On receiving a connection, it creates a new thread to process that connection's request; when processing completes, the reply is returned to the client and the thread is destroyed. This is the typical request-reply model. The biggest problem with this architecture is its lack of elastic scalability: as concurrent traffic grows, the number of server threads grows linearly with the number of concurrent connections. Because threads are a scarce resource of the Java virtual machine, once the thread count balloons, system performance drops sharply, and as concurrency continues to rise, handle exhaustion, thread-stack overflows, and similar failures can occur and eventually bring the server down.
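The thread-per-connection model just described can be sketched in a few lines of plain JDK code (the class name and echo behavior are illustrative):

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.PrintWriter;
import java.net.ServerSocket;
import java.net.Socket;

// One acceptor thread; a brand-new handler thread for every accepted connection
public class BioEchoServer {
    public static ServerSocket start() throws IOException {
        ServerSocket server = new ServerSocket(0); // OS-assigned free port
        Thread acceptor = new Thread(() -> {
            try {
                while (true) {
                    Socket client = server.accept();          // acceptor blocks here
                    new Thread(() -> handle(client)).start(); // one thread per connection
                }
            } catch (IOException closed) {
                // server socket closed: acceptor exits
            }
        });
        acceptor.setDaemon(true);
        acceptor.start();
        return server;
    }

    private static void handle(Socket client) {
        try (Socket s = client;
             BufferedReader in = new BufferedReader(new InputStreamReader(s.getInputStream()));
             PrintWriter out = new PrintWriter(s.getOutputStream(), true)) {
            String line;
            while ((line = in.readLine()) != null) {
                out.println(line); // request-reply; the thread dies with the connection
            }
        } catch (IOException ignored) {
        }
    }
}
```

With N concurrent clients this creates N handler threads, which is exactly the linear thread growth the text warns about.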

Serialization problem: Java serialization has several typical drawbacks:

1) The Java serialization mechanism is a Java-internal object codec technology and cannot be used across languages. For example, when integrating heterogeneous systems, the byte stream produced by Java serialization would need to be deserialized into equivalent objects in another language, which is currently very difficult to support.

2) Compared with other open-source serialization frameworks, Java serialization produces an oversized byte stream; whether transmitted over the network or persisted to disk, it incurs extra resource consumption.

3) Serialization performance is poor (high CPU usage).
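The stream-size overhead described in point 2 is easy to observe with a small JDK-only measurement (the class and field names here are made up for illustration):

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class SerializationSizeDemo {
    // Illustrative payload: roughly 5 bytes of string data plus a 4-byte int
    static class User implements Serializable {
        private static final long serialVersionUID = 1L;
        String name = "netty";
        int id = 1;
    }

    public static int serializedSize(Object obj) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
            oos.writeObject(obj);
        }
        return bos.size();
    }

    public static void main(String[] args) throws IOException {
        // The stream carries a magic header plus the full class descriptor
        // (class name, field names, types), dwarfing the actual field data
        System.out.println("serialized size: "
                + serializedSize(new User()) + " bytes");
    }
}
```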

Threading model problem: with synchronous blocking I/O, each TCP connection occupies one thread. Because threads are a scarce JVM resource, when blocked I/O reads and writes prevent threads from being released promptly, system performance drops sharply and the virtual machine may even fail to create new threads.

2.1.2. Three Themes of High Performance

1) Transport: over what kind of channel data is sent to the other side: BIO, NIO, or AIO. The I/O model largely determines the framework's performance.

2) Protocol: what communication protocol is used, HTTP or an internal private protocol. Different protocol choices yield different performance models; an internal private protocol can usually be designed to outperform a public one.

3) Threading: how the datagram is read; in which thread the codec runs after the read; and how decoded messages are dispatched. Differences in the Reactor threading model have a very large impact on performance.

Figure 2-2 The three elements of RPC invocation performance

2.2. Netty's Road to High Performance

2.2.1. Asynchronous Non-blocking Communication

In I/O programming, multithreading or I/O multiplexing can be used to handle multiple client requests simultaneously. I/O multiplexing lets the system handle multiple client requests in a single thread by registering multiple I/O streams with the same blocking select call. Compared with the traditional multithread/multiprocess model, the biggest advantage of I/O multiplexing is low system overhead: no additional processes or threads need to be created, and none need to be maintained while running, which reduces maintenance work and conserves system resources.
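A minimal sketch of this single-threaded multiplexing loop, using only the JDK's java.nio API (class and method names are illustrative, and error handling is pared down):

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.util.Iterator;

// One thread services all connections through a single Selector
public class NioSelectorSketch {
    public static Selector bind(int port) throws IOException {
        Selector selector = Selector.open();
        ServerSocketChannel server = ServerSocketChannel.open();
        server.configureBlocking(false);
        server.bind(new InetSocketAddress(port));
        server.register(selector, SelectionKey.OP_ACCEPT);
        return selector;
    }

    // One pass of the event loop: wait briefly, then drain ready events
    public static void runOnce(Selector selector) throws IOException {
        selector.select(100);
        Iterator<SelectionKey> it = selector.selectedKeys().iterator();
        while (it.hasNext()) {
            SelectionKey key = it.next();
            it.remove();
            if (key.isAcceptable()) {
                ServerSocketChannel server = (ServerSocketChannel) key.channel();
                SocketChannel client = server.accept();
                client.configureBlocking(false); // non-blocking mode is mandatory with a Selector
                client.register(selector, SelectionKey.OP_READ);
            } else if (key.isReadable()) {
                SocketChannel client = (SocketChannel) key.channel();
                ByteBuffer buf = ByteBuffer.allocate(1024);
                if (client.read(buf) < 0) {
                    client.close();              // peer closed the connection
                } else {
                    buf.flip();
                    client.write(buf);           // echo back without blocking
                }
            }
        }
    }
}
```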

JDK 1.4 introduced support for non-blocking I/O (NIO), and from JDK 5 update 10 onward the implementation uses epoll instead of traditional select/poll, which greatly improves NIO communication performance.

The JDK NIO communication model is shown below:

Figure 2-3 The NIO multiplexing model diagram

Corresponding to the Socket and ServerSocket classes, NIO provides two socket channel implementations, SocketChannel and ServerSocketChannel. Both channel types support blocking and non-blocking modes. Blocking mode is very simple to use, but its performance and reliability are poor; non-blocking mode is the exact opposite. Developers can choose the mode that fits their needs: in general, low-load, low-concurrency applications can use synchronous blocking I/O to reduce programming complexity, while high-load, high-concurrency network applications should be developed with NIO's non-blocking mode.

Netty's architecture is designed and implemented according to the Reactor pattern. Its server-side communication sequence diagram is as follows:

Figure 2-3 NIO server-side communication sequence diagram

The client communication sequence diagram is as follows:

Figure 2-4 NIO client communication sequence diagram

Netty's I/O thread, NioEventLoop, aggregates the multiplexer (Selector) and can therefore concurrently serve hundreds of client Channels. Because reads and writes are non-blocking, the I/O thread is fully utilized and the thread suspension caused by frequent I/O blocking is avoided. In addition, since Netty uses an asynchronous communication model, one I/O thread can concurrently handle N client connections and their read/write operations, which fundamentally solves the one-connection-one-thread model of traditional synchronous blocking I/O; architecture performance, elastic scalability, and reliability are all greatly improved.

2.2.2. Zero-Copy

Many users have heard that Netty has a "zero-copy" capability but are unclear where exactly it shows up. This section explains Netty's "zero-copy" features in detail.

Netty's "zero-copy" is mainly reflected in the following three aspects:

1) Netty receives and sends ByteBuffers using direct buffers, reading and writing the socket with off-heap direct memory, so no second copy of the byte buffer is needed. If traditional heap buffers were used to read and write the socket, the JVM would copy the heap buffer into direct memory before writing to the socket; compared with off-heap direct memory, the message incurs one extra buffer copy on the send path.

2) Netty provides a composite buffer object that can aggregate multiple ByteBuffer objects. The user can operate on the composite buffer as conveniently as on a single buffer, avoiding the traditional approach of merging several small buffers into one large buffer through memory copies.

3) Netty's file transfer uses the transferTo method, which can send data from a file channel directly to the target Channel, avoiding the memory copies caused by the traditional write-in-a-loop approach.
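The first of these aspects rests on the heap-versus-direct buffer distinction, which is visible even at the plain JDK level (Netty's direct ByteBufs build on the same mechanism):

```java
import java.nio.ByteBuffer;

public class DirectBufferDemo {
    public static void main(String[] args) {
        // Heap buffer: backed by a byte[] inside the Java heap; the JVM must
        // copy its contents to native memory before handing it to the socket
        ByteBuffer heap = ByteBuffer.allocate(1024);
        // Direct buffer: allocated outside the heap; the OS can read and
        // write it in place, so the extra copy disappears
        ByteBuffer direct = ByteBuffer.allocateDirect(1024);
        System.out.println("heap.isDirect()   = " + heap.isDirect());
        System.out.println("direct.isDirect() = " + direct.isDirect());
    }
}
```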

Below, we examine these three kinds of "zero-copy" in turn, starting with the creation of Netty's receive buffer:

Figure 2-5 "Zero-copy" in asynchronous message reads

Each time a message is read in the loop, a ByteBuf object is obtained through the ioBuffer method of ByteBufAllocator. Let's continue with its interface definition:

Figure 2-6 ByteBufAllocator allocating off-heap memory through ioBuffer

When performing socket I/O reads and writes, Netty's ByteBuf allocator creates non-heap (direct) memory directly, so that no copy from heap memory into direct memory is needed; by avoiding this second buffer copy, read/write performance is improved through "zero-copy".

Next we look at the second "zero-copy" implementation, CompositeByteBuf, which presents multiple ByteBufs as a single ByteBuf behind one unified, encapsulated ByteBuf interface. Its class definition is as follows:

Figure 2-7 CompositeByteBuf class inheritance relationship

From the inheritance relationship we can see that CompositeByteBuf is in fact a ByteBuf wrapper: it combines multiple ByteBufs into one collection and then exposes the unified ByteBuf interface. It is defined as follows:

Figure 2-8 CompositeByteBuf class definition

Adding a ByteBuf requires no memory copy; the relevant code is as follows:

Figure 2-9 "Zero-copy" when adding a ByteBuf
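From the user's side, this looks roughly as follows (a minimal sketch, assuming Netty 4.x on the classpath; the buffer contents are illustrative):

```java
import io.netty.buffer.ByteBuf;
import io.netty.buffer.CompositeByteBuf;
import io.netty.buffer.Unpooled;
import io.netty.util.CharsetUtil;

public class CompositeDemo {
    // Stitch a header and a body together without copying either one
    public static int composedSize() {
        ByteBuf header = Unpooled.copiedBuffer("HDR", CharsetUtil.US_ASCII);
        ByteBuf body = Unpooled.copiedBuffer("BODY", CharsetUtil.US_ASCII);
        CompositeByteBuf message = Unpooled.compositeBuffer();
        // addComponents(true, ...) advances the writer index; the component
        // buffers are referenced in place, not copied into a new buffer
        message.addComponents(true, header, body);
        int size = message.readableBytes();
        message.release(); // releases the components as well
        return size;
    }

    public static void main(String[] args) {
        System.out.println("composed readable bytes: " + composedSize());
    }
}
```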

Finally, let's look at the "zero-copy" of file transfer:

Figure 2-10 File transfer "zero-copy"

Netty's file transfer class, DefaultFileRegion, sends files to the target Channel through the transferTo method. Here we focus on the transferTo method of FileChannel; its API doc describes it as follows:

Figure 2-11 File transfer "zero-copy"
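The same call is available directly on the JDK's FileChannel; a minimal sketch (the class name is illustrative):

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class TransferToDemo {
    // Copy a file channel-to-channel; on most operating systems this maps to
    // sendfile(), so the bytes never pass through a user-space buffer
    public static long copy(Path src, Path dst) throws IOException {
        try (FileChannel in = FileChannel.open(src, StandardOpenOption.READ);
             FileChannel out = FileChannel.open(dst, StandardOpenOption.CREATE,
                                                StandardOpenOption.WRITE)) {
            long position = 0, size = in.size();
            while (position < size) {
                // transferTo may move fewer bytes than requested, so loop
                position += in.transferTo(position, size - position, out);
            }
            return position;
        }
    }
}
```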

On many operating systems, transferTo sends the contents of the file buffer directly to the target channel without an intermediate copy; this is the more efficient "zero-copy" transmission path for file transfer.

2.2.3. Memory Pool

As JVM implementations and JIT just-in-time compilation have matured, object allocation and collection have become very lightweight operations. For buffers, however, the situation is slightly different, especially for the allocation and reclamation of off-heap direct memory, which is time-consuming. To reuse buffers as much as possible, Netty provides a buffer reuse mechanism based on a memory pool. Let's look at the implementation of Netty's pooled ByteBuf:

Figure 2-12 Memory Pool Bytebuf


Netty provides a variety of memory management policies; differentiated tuning is possible by configuring the relevant parameters in the bootstrap helper class.
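A minimal sketch of using the pooled allocator (assuming Netty 4.x on the classpath; in a real server one would typically also configure `ChannelOption.ALLOCATOR` with `PooledByteBufAllocator.DEFAULT` on the bootstrap):

```java
import io.netty.buffer.ByteBuf;
import io.netty.buffer.PooledByteBufAllocator;

public class PooledAllocatorDemo {
    public static void main(String[] args) {
        // The shared default pooled allocator; buffers are carved out of
        // pre-allocated arenas rather than freshly allocated each time
        PooledByteBufAllocator alloc = PooledByteBufAllocator.DEFAULT;
        ByteBuf buf = alloc.directBuffer(1024);
        try {
            buf.writeBytes("pooled".getBytes());
            System.out.println("direct = " + buf.isDirect()
                    + ", readable = " + buf.readableBytes());
        } finally {
            buf.release(); // returns the memory to the pool for reuse
        }
    }
}
```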
