1. Background
1.1. The cost of downtime
1.1.1. Telecom industry
In a survey of 74 operators in 46 countries, KPMG International found that the global communications industry loses about $40 billion a year, or 1%-3% of total revenue. Many factors contribute to this loss, but the single biggest cause is billing errors.
1.1.2. Internet industry
On August 16, 2013, from 3:50 P.M. to 3:55 P.M. Pacific Time (August 17, 6:50 to 6:55 Beijing time), Google suffered an outage. According to the post-mortem, Google lost $545,000 in just 5 minutes, that is, about $108,000 for every minute the service was down.
Also in 2013, starting at 2:45 P.M. EST on August 19, users were the first to notice that the Amazon site was down; it returned to normal after roughly 20 minutes. The outage cost Amazon nearly $67,000 per minute, and during it consumers could not shop through the Amazon.com, Amazon Mobile, or amazon.ca sites.
1.2. Software Reliability
Software reliability refers to the probability that software runs without error for a specified time under specified conditions. It comprises three elements:
1) Specified time: software reliability manifests only in the operational phase, so run time is used as the time measure. Run time includes the cumulative time the system is actively working as well as the time it is up but idle. Because the runtime environment and the program paths taken are random, software failure is a random event, and run time is therefore a random variable.
2) Specified environmental conditions: the environmental conditions are the software's operating environment. They cover the supporting elements the system needs to run, such as supporting hardware, the operating system, other supporting software, the format and range of input data, and operating procedures. Software reliability differs under different environmental conditions. In practice, the specified environmental conditions mainly describe the run-time computer configuration and input data requirements, assuming all other factors are ideal. Clearly defined environmental conditions also make it possible to determine whether responsibility for a software failure lies with the user or with the provider.
3) Specified functions: software reliability is also related to the specified tasks and functions. Different tasks exercise different parts of the program and call sub-modules differently (that is, they select different program paths), so reliability may differ. Therefore, to measure the reliability of a software system accurately, its tasks and functions must first be clarified.
1.3. Reliability of Netty
First, let us analyze reliability from the perspective of how Netty is mainly used. There are currently three mainstream usages of Netty:
1) as the basic communication component of RPC frameworks, providing the ability to invoke remote services across nodes;
2) as the NIO communication framework for data exchange across nodes;
3) as the basic communication component of other application protocol stacks, such as the HTTP protocol and other application-layer protocol stacks developed on top of Netty.
Taking Alibaba's distributed service framework Dubbo as an example, Netty is the core of Dubbo's RPC framework. A diagram of its service invocation is shown below:
Figure 1-1 Dubbo's node role description diagram
The service provider and the service consumer make RPC calls through the Dubbo protocol, and the sending and receiving of messages is handled by Netty by default.
From an analysis of Netty's mainstream application scenarios, the reliability problems Netty faces fall broadly into three categories:
1) traditional network I/O failures, such as network flash disconnections, firewalls hanging connections, network timeouts, and so on;
2) NIO-specific faults, such as bugs in the NIO class library, abnormal handling of half-packet reads and writes, reactor thread runaway, and so on;
3) codec-related exceptions.
In most business scenarios, once Netty fails for any of these reasons, the business is often paralyzed. From the business perspective, therefore, the reliability requirements on the Netty framework are very high. As one of the most popular NIO frameworks in the industry today, Netty has been widely applied in different industries and fields, and its high reliability has been proven by hundreds of production systems.
How does Netty support the high reliability of a system? Let us examine it from several different dimensions.
2. Netty's road to high reliability
2.1. Network communication failures
2.1.1. Client connection timeout
In the traditional synchronous blocking programming model, when a client socket initiates a network connection it usually needs to specify a connection timeout, for two main purposes:
1) in the synchronous blocking I/O model, the connect operation blocks; if no timeout is set, the client I/O thread may be blocked for a long time, reducing the number of I/O threads available to the system;
2) business-layer needs: most systems limit the execution time of a business process, for example requiring the response time of a web interaction to be under 3 s. Setting a connection timeout on the client helps implement the business-layer timeout.
The JDK native socket connection interface is defined as follows:
Figure 2-1 JDK Socket Connection Timeout interface
For the NIO SocketChannel in non-blocking mode, connect() returns the connection result directly; if the connection has not yet succeeded and no I/O exception has occurred, the SocketChannel must be registered on a Selector to listen for the connection result. The timeout of an asynchronous connection therefore cannot be set directly at the API level; it has to be monitored actively by a timer.
Let us first look at the SocketChannel connect interface definition in the JDK NIO class library:
Figure 2-2 JDK NIO class library SocketChannel connect interface
As the interface definition shows, the NIO class library does not provide a ready-made connection timeout interface for users; to support connection timeouts in NIO programming, the NIO framework or the user usually has to implement the wrapping.
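As an illustration, here is a minimal, hand-rolled sketch of a connect timeout on top of the raw JDK NIO API. The host, port, and 3-second limit are placeholders, and this is not Netty's implementation:

```java
// Sketch only: a manual connect timeout on top of the raw JDK NIO API,
// illustrating why a framework-level timer is needed. Host, port and the
// 3-second limit are illustrative placeholders.
import java.net.InetSocketAddress;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.SocketChannel;

public class NioConnectTimeoutDemo {
    public static void main(String[] args) throws Exception {
        Selector selector = Selector.open();
        SocketChannel channel = SocketChannel.open();
        channel.configureBlocking(false);

        long timeoutMillis = 3000;
        long deadline = System.currentTimeMillis() + timeoutMillis;

        // In non-blocking mode connect() usually returns false immediately;
        // the result must be polled via OP_CONNECT and finishConnect().
        boolean connected = channel.connect(new InetSocketAddress("127.0.0.1", 8080));
        if (!connected) {
            channel.register(selector, SelectionKey.OP_CONNECT);
            while (!connected) {
                long remaining = deadline - System.currentTimeMillis();
                if (remaining <= 0) {
                    channel.close();   // timeout: release the handle and give up
                    throw new java.net.ConnectException("connect timed out");
                }
                if (selector.select(remaining) > 0) {
                    selector.selectedKeys().clear();
                    connected = channel.finishConnect();   // completes or throws IOException
                }
            }
        }
        System.out.println("connected: " + channel.getRemoteAddress());
        channel.close();
        selector.close();
    }
}
```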
Now let us look at how Netty supports connection timeouts. First, when creating an NIO client, you can configure the connection timeout parameter:
Figure 2-3 Netty client creation supports setting the connection timeout parameter
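As an illustration of this configuration, the following sketch sets ChannelOption.CONNECT_TIMEOUT_MILLIS on a Netty 4.x-style client bootstrap. The host, port, 3-second value and logging handler are placeholders, not values taken from the original article:

```java
// Sketch only: configuring Netty's client-side connect timeout via
// ChannelOption.CONNECT_TIMEOUT_MILLIS; host/port and the 3 s value are placeholders.
import io.netty.bootstrap.Bootstrap;
import io.netty.channel.ChannelFuture;
import io.netty.channel.ChannelInitializer;
import io.netty.channel.ChannelOption;
import io.netty.channel.EventLoopGroup;
import io.netty.channel.nio.NioEventLoopGroup;
import io.netty.channel.socket.SocketChannel;
import io.netty.channel.socket.nio.NioSocketChannel;
import io.netty.handler.logging.LoggingHandler;

public class ConnectTimeoutClient {
    public static void main(String[] args) throws Exception {
        EventLoopGroup group = new NioEventLoopGroup();
        try {
            Bootstrap b = new Bootstrap();
            b.group(group)
             .channel(NioSocketChannel.class)
             .option(ChannelOption.CONNECT_TIMEOUT_MILLIS, 3000)   // fail the connect after 3 s
             .handler(new ChannelInitializer<SocketChannel>() {
                 @Override
                 protected void initChannel(SocketChannel ch) {
                     ch.pipeline().addLast(new LoggingHandler());
                 }
             });
            // sync() surfaces a timeout as a connect failure if the limit expires first
            ChannelFuture f = b.connect("127.0.0.1", 8080).sync();
            f.channel().closeFuture().sync();
        } finally {
            group.shutdownGracefully();
        }
    }
}
```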
After the connection timeout is set, when Netty initiates the connection it creates a ScheduledFuture and mounts it on the reactor thread to monitor the connection timeout, as follows:
Figure 2-4 Creating a time-out monitoring timer task based on connection timeout
After the connection timeout timer task is created, the NioEventLoop is responsible for executing it. If the connection times out but the server still has not returned a TCP handshake response, the connection is closed, as shown in the code.
If the connect operation completes within the timeout period, the connection timeout task is canceled; the relevant code is as follows:
Figure 2-5 Canceling the connection timeout timer task
The Netty client connection timeout parameter is configured together with other commonly used TCP parameters and is very convenient to use; upper-layer users do not need to care about the underlying timeout mechanism. This both satisfies users' individual requirements and achieves layered isolation of faults.
2.1.2. The peer forcibly closes the connection
During normal communication between the client and the server, a network flash disconnection, a sudden termination of the peer process, or any other non-graceful close of the link puts the TCP link into an abnormal state. Because TCP is full-duplex, both sides of the communication need to close and release their socket handles properly, otherwise handle leaks occur.
In actual NIO programming, we often run into functional and reliability problems because handles are not closed in time. The causes can be summarized as follows:
1) I/O read and write operations are not confined to the reactor thread; custom behavior in the user's upper layer, such as a business-defined heartbeat mechanism, may scatter I/O operations elsewhere. Such custom behavior makes unified exception handling more difficult: the more dispersed the I/O operations, the higher the probability of failure;
2) some exception branches are simply not considered; when the external environment drives the program into those branches, failures occur.
Below we use fault simulation to see how Netty handles a forced shutdown of the link by the peer. First, start the Netty server and the client; after the TCP link is established, both sides keep the link open. Checking the link status gives the following result:
Figure 2-6 Netty Server and client TCP link status OK
Forcibly kill the client process to simulate a client crash; the server console prints the following exception:
Figure 2-7 simulating a TCP link failure
From the stack information we can tell that the server has detected the client's forced close of the connection. Next, let us check whether the server has released the connection handle by executing the netstat command again; the result is as follows:
Figure 2-8 Viewing the fault link status
As the result shows, the server has closed the TCP connection to the client and the handle resource has been released normally. We can conclude that the fault was handled automatically by the Netty framework at the bottom layer.
Let us now look at how Netty detects the link shutdown exception and handles it correctly. Look at the writeBytes method of AbstractByteBuf, which is responsible for writing the buffer data of the specified Channel into the ByteBuf; the detailed code is as follows:
Figure 2-9 The writeBytes method of AbstractByteBuf
An IOException occurs while calling SocketChannel's read method; the code is as follows:
Figure 2-10 IO exception reading buffer data
To ensure that I/O exceptions are handled uniformly, the exception is thrown upward and handled by AbstractNioByteChannel's unified exception handling; the code is as follows:
Figure 2-11 Link Exception exit exception handling
To unify the exception policy, ease maintenance, and prevent handle leaks caused by improper handling, the closing of handles is done by uniformly calling AbstractChannel's close method; the code is as follows:
Figure 2-12 Unified socket handle Shutdown interface
2.1.3. Normal connection shutdown
For short-connection protocols such as HTTP, after the data exchange is complete the connection is usually closed by the server as agreed by both parties; once the client receives the TCP connection close request, it closes its own socket connection and the two sides formally disconnect.
In actual NIO programming there is a common misconception: that as long as the peer closes the connection an I/O exception will occur, so it is enough to close the connection after catching the I/O exception. In fact, a legitimate close of the connection does not produce an I/O exception; it is a normal scenario, and if this scenario is omitted from judgment and handling, the connection handle leaks.
Let us simulate this case and see how Netty handles it. The test scenario is designed as follows: modify the Netty client so that after the duplex link is established successfully it waits 120 s and then closes the link normally, and observe whether the server can perceive this and release the handle resources.
First start the Netty client and the server; the TCP link between the two sides is normal:
Figure 2-13 TCP connection Status OK
After 120 s the client closes the connection and the process exits. To observe the entire handling process, we set a breakpoint on the reactor thread on the server side and pause there; at this point the link state is as follows:
Figure 2-14 TCP connection handle waiting to be released
As we can see, the server has not yet closed the socket connection and the link is in the CLOSE_WAIT state. Releasing the breakpoint and letting the server continue gives the following result:
Figure 2-15 TCP connection handle normal release
Next, let us look at how the server determines that the client has closed the connection. When the connection is legitimately closed by the peer, the closed SocketChannel is in the ready state and its read operation returns -1, indicating that the connection has been closed; the code is as follows:
Figure 2-16 The number of bytes read needs to be checked
If the SocketChannel is set to non-blocking, its read operation may return three kinds of values (a minimal handling sketch follows the list):
1) greater than 0, indicating the number of bytes read;
2) equal to 0: no message was read; TCP may be in the keep-alive state and have received a TCP handshake message;
3) -1: the connection has been legitimately closed by the peer.
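The sketch below shows how these three return values might be handled on a plain JDK NIO channel; the class and method names are illustrative, not Netty's:

```java
// Sketch only: handling the three possible return values of a non-blocking
// SocketChannel.read(), including -1 for a graceful close by the peer.
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.SocketChannel;

public final class ReadHelper {
    /** Returns false when the peer has closed the connection gracefully. */
    public static boolean readOnce(SocketChannel channel, ByteBuffer buffer) throws IOException {
        int readBytes = channel.read(buffer);
        if (readBytes > 0) {
            // data available: hand it over to the decoder
            return true;
        } else if (readBytes == 0) {
            // nothing to read this round; the channel is still open
            return true;
        } else {
            // readBytes == -1: the peer closed the connection legitimately,
            // so release the handle instead of waiting for an IOException
            channel.close();
            return false;
        }
    }
}
```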
By debugging, we can confirm that the return value from the NIO class library is indeed -1:
Figure 2-17 Link gracefully closed with a return value of -1
Once the connection is found closed, Netty sets the close operation flag to true and closes the handle; the code is as follows:
Figure 2-18 Connection graceful shutdown, freeing resources
2.1.4. Fault customization
In most scenarios, when the underlying network fails, the underlying NIO framework should be responsible for releasing resources and handling the exception, and the upper-layer business application need not be concerned with the details. In some special scenarios, however, users may need to perceive these exceptions and apply customized handling, for example:
1) the client's reconnection mechanism;
2) caching and resending of messages;
3) recording detailed fault information in the interface log;
4) operations-related functions, such as alarms and triggering email/SMS notifications.
Netty's strategy is: when an I/O exception occurs, the underlying resources are released by Netty itself, the exception stack information is reported to the upper-layer user as an event, and the user applies the customized handling. This mechanism both guarantees the safety of exception handling and gives the upper layer flexible customization capability.
The specific interface definition and the default implementation are as follows:
Figure 2-19 Fault-specific interface
The user can override this interface to customize the exception handling, for example to initiate a reconnect.
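As an illustration, here is a hedged sketch of a user handler that overrides exceptionCaught() to close the channel and schedule a reconnect; the reconnect Runnable and the 5-second delay are application-level assumptions, not Netty defaults:

```java
// Sketch only: a user handler that overrides exceptionCaught() to log the fault,
// close the channel and schedule a reconnect. The reconnect callback is an
// application-supplied hook (assumed), not part of Netty.
import io.netty.channel.ChannelHandlerContext;
import io.netty.channel.ChannelInboundHandlerAdapter;
import java.util.concurrent.TimeUnit;

public class ReconnectOnErrorHandler extends ChannelInboundHandlerAdapter {
    private final Runnable reconnect;   // application-supplied reconnect logic (assumed)

    public ReconnectOnErrorHandler(Runnable reconnect) {
        this.reconnect = reconnect;
    }

    @Override
    public void exceptionCaught(ChannelHandlerContext ctx, Throwable cause) {
        // Netty has already released the underlying resources; here we only
        // record the fault and trigger the business-level recovery policy.
        System.err.println("link failed: " + cause.getMessage());
        ctx.close();
        ctx.channel().eventLoop().schedule(reconnect, 5, TimeUnit.SECONDS);
    }
}
```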
2.2. Link Validity detection
When the network becomes one-way, the connection is hung by a firewall, a long GC pause occurs, or an unexpected exception occurs on the communication thread, the link becomes unavailable and this is not easily discovered in time. In particular, if the anomaly happens during the early-morning business trough, the unavailable link can cause an instantaneous burst of business failures or timeouts when peak business hours arrive, posing a significant threat to system reliability.
Technically, solving the link reliability problem requires checking the validity of the link periodically. Currently, the most popular and common practice is heartbeat detection.
The heartbeat detection mechanism is divided into three levels:
1) heartbeat detection at the TCP layer, that is, TCP's keep-alive mechanism, whose scope is the whole TCP protocol stack;
2) heartbeat detection at the protocol layer, which mainly exists in long-connection protocols such as the SMPP protocol;
3) heartbeat detection at the application layer, mainly implemented by business products sending heartbeat messages to each other in an agreed manner.
The purpose of heartbeat detection is to confirm that the current link is available, that the peer is alive, and that it can send and receive messages normally.
As a highly reliable NIO framework, Netty also provides a heartbeat detection mechanism. Let us first get familiar with the principle of heartbeat detection.
Figure 2-20 Heartbeat detection mechanism
Heartbeat detection mechanisms differ between protocols, but broadly fall into two categories:
1) Ping-Pong heartbeat: one party periodically sends a Ping message and the peer returns a Pong reply immediately upon receiving it; this is a request-response heartbeat;
2) Ping-Ping heartbeat: no distinction is made between heartbeat request and reply; both parties send heartbeat Ping messages on an agreed schedule; this is a bidirectional heartbeat.
The heartbeat detection strategy is as follows:
1) if no Pong reply or Ping request is received from the peer for N consecutive heartbeat periods, the link is considered logically failed; this is called a heartbeat timeout;
2) if an I/O exception occurs while reading or sending a heartbeat message, the link has failed; this is called a heartbeat failure.
Whether it is a heartbeat timeout or a heartbeat failure, the link must be closed and the client must initiate a reconnect to ensure that the link can return to normal.
Netty's heartbeat detection is actually implemented on top of its link idle detection mechanism; the code is as follows:
Figure 2-21 The code package path for heartbeat detection
The idle detection mechanism provided by Netty is divided into three types:
1) read idle: the link has not read any message for duration T;
2) write idle: the link has not sent any message for duration T;
3) read/write idle: the link has neither received nor sent any message for duration T.
Netty's default behavior on read/write idle is to raise a timeout exception and close the connection, but we can customize the timeout handling to support different user scenarios.
The timeout interface of WriteTimeoutHandler is as follows:
Figure 2-22 Write timeout
The timeout interface of ReadTimeoutHandler is as follows:
Figure 2-23 Read timeout
The read/write idle interface is as follows:
Figure 2-24 Read and write idle
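Building on these idle events, a protocol-layer heartbeat can be sketched roughly as follows. The "PING" payload, the 3-miss threshold and the idle times in the wiring comment are assumptions for illustration, not Netty defaults:

```java
// Sketch only: protocol-level heartbeat built on Netty's IdleStateHandler.
// On write idle we send a heartbeat ping; after too many read-idle periods
// without any inbound message the link is treated as failed and closed.
import io.netty.buffer.Unpooled;
import io.netty.channel.ChannelHandlerContext;
import io.netty.channel.ChannelInboundHandlerAdapter;
import io.netty.handler.timeout.IdleState;
import io.netty.handler.timeout.IdleStateEvent;
import io.netty.util.CharsetUtil;

public class HeartbeatHandler extends ChannelInboundHandlerAdapter {
    private static final int MAX_MISSED = 3;   // assumed heartbeat-timeout threshold
    private int missed;

    @Override
    public void userEventTriggered(ChannelHandlerContext ctx, Object evt) throws Exception {
        if (evt instanceof IdleStateEvent) {
            IdleState state = ((IdleStateEvent) evt).state();
            if (state == IdleState.WRITER_IDLE) {
                // nothing sent for a while: emit a heartbeat ping (assumed "PING" payload)
                ctx.writeAndFlush(Unpooled.copiedBuffer("PING", CharsetUtil.UTF_8));
            } else if (state == IdleState.READER_IDLE && ++missed >= MAX_MISSED) {
                // heartbeat timeout: close the link so the client can reconnect
                ctx.close();
            }
        } else {
            super.userEventTriggered(ctx, evt);
        }
    }

    @Override
    public void channelRead(ChannelHandlerContext ctx, Object msg) throws Exception {
        missed = 0;   // any inbound message proves the peer is alive
        super.channelRead(ctx, msg);
    }
}

// Typical pipeline wiring (idle times in seconds are assumptions):
//   ch.pipeline().addLast(new IdleStateHandler(15, 5, 0));
//   ch.pipeline().addLast(new HeartbeatHandler());
```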
With the link idle detection mechanism provided by Netty, heartbeat detection at the protocol layer can be implemented very flexibly. In the private protocol stack design and development chapters of the Netty Authoritative Guide, I use the custom task interface provided by Netty to implement another heartbeat detection mechanism; interested readers can refer to it.
2.3. Protection of reactor threads
The reactor thread is the core of I/O operations and the engine of the NIO framework; if it fails, the multiplexer and the many links mounted on it cannot function properly. Its reliability requirements are therefore very high.
I have encountered failures where improper handling caused the reactor thread to run away and a large number of business requests to fail. Let us look at how Netty effectively improves the reliability of the reactor thread.
2.3.1. Careful exception handling
Although the reactor thread primarily handles I/O operations and the exceptions it encounters are usually I/O exceptions, non-I/O exceptions do occur in some special scenarios; if only IOException is caught, the reactor thread can run away. To prevent this, it is important to catch Throwable in the loop rather than only IOException or Exception.
The relevant code for Netty is as follows:
Figure 2-25 Reactor Thread exception protection
After catching Throwable, even if an unexpected, unknown exception occurs, the thread does not run away; it sleeps for 1 s to prevent an endless loop of repeated exceptions, and then resumes execution. The core ideas of this approach are (a simplified sketch of the loop guard follows the list):
1) an exception in one message should not render the entire link unavailable;
2) an unavailable link should not render other links unavailable;
3) an unavailable process should not render other cluster nodes unavailable.
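The following is a simplified illustration of that loop guard, not Netty's actual NioEventLoop code; the two private methods stand in for selector processing and task draining:

```java
// Sketch only: a reactor-style loop that catches Throwable instead of only
// IOException, backs off for 1 s and continues, so one unexpected error cannot
// kill the whole I/O thread.
public class GuardedReactorLoop implements Runnable {

    @Override
    public void run() {
        for (;;) {
            try {
                processSelectedKeys();   // I/O processing for all ready channels
                runAllTasks();           // user / system tasks queued on this loop
            } catch (Throwable t) {
                // Catch everything: an unknown runtime error must not terminate the loop.
                System.err.println("Unexpected exception in the I/O loop: " + t);
                try {
                    Thread.sleep(1000);  // back off 1 s to avoid a tight error loop
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
            }
        }
    }

    // Placeholders standing in for the real selector processing and task draining.
    private void processSelectedKeys() { }
    private void runAllTasks() { }
}
```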
2.3.2. Infinite loop protection
Normally an infinite loop is detectable and preventable, but it cannot be completely avoided. Since reactor threads typically deal with I/O-related operations, we focus here on I/O-level infinite loops.
The most famous example in the JDK NIO class library is the epoll bug, which causes the Selector to spin on empty polls and drives the I/O thread to 100% CPU, seriously affecting the safety and reliability of the system.
Sun claimed to have fixed the bug in the JDK 1.6 update 18 release, but according to industry testing and community feedback, the bug still existed in early JDK 1.7 versions and was never completely fixed. The host resource usage when the bug occurs looks as follows:
Figure 2-26 Epoll Bug CPU empty Polling
Since the problem was not resolved at the JDK level, it can only be avoided at the NIO framework level. Let us see how Netty works around it.
Netty's solution strategy:
1) detect, based on the characteristics of the bug, whether it has occurred;
2) transfer the Channels registered on the problem Selector to a new Selector;
3) close the old, faulty Selector and replace it with the new one.
Let us look at the code in detail. First, detect whether the bug has occurred:
Figure 2-27 Epoll Bug Detection
Once the bug has been detected, rebuild the Selector; the code is as follows:
Figure 2-28 Rebuilding Selector
After the rebuild is complete, replace the old Selector; the code is as follows:
Figure 2-29 replacing selector
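Because the figures above are not reproduced here, the following sketch shows only the general shape of such a workaround: a counter of consecutive empty wake-ups and a rebuild step that migrates registrations to a fresh Selector. The threshold and helper names are assumptions, not Netty's actual implementation:

```java
// Sketch only: detect a suspicious run of empty select() wake-ups and migrate
// all channel registrations to a freshly opened Selector.
import java.nio.channels.SelectableChannel;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;

public class SelectorRebuilder {
    private static final int EMPTY_SELECT_THRESHOLD = 512;   // assumed trigger threshold
    private Selector selector;
    private int emptySelects;

    public SelectorRebuilder(Selector selector) {
        this.selector = selector;
    }

    void selectLoopStep(long timeoutMillis) throws Exception {
        int readyKeys = selector.select(timeoutMillis);
        if (readyKeys == 0) {
            // select() returned early with nothing ready: possible epoll bug
            if (++emptySelects >= EMPTY_SELECT_THRESHOLD) {
                selector = rebuild(selector);
                emptySelects = 0;
            }
        } else {
            emptySelects = 0;
            // ... process selector.selectedKeys() ...
        }
    }

    private static Selector rebuild(Selector oldSelector) throws Exception {
        Selector newSelector = Selector.open();
        // Move every valid channel registration onto the new selector.
        for (SelectionKey key : oldSelector.keys()) {
            if (!key.isValid()) {
                continue;
            }
            SelectableChannel ch = key.channel();
            int ops = key.interestOps();
            Object attachment = key.attachment();
            key.cancel();
            ch.register(newSelector, ops, attachment);
        }
        oldSelector.close();   // drop the broken selector
        return newSelector;
    }
}
```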
Operation in mass-production systems shows that Netty's workaround strategy solves the problem of I/O threads spinning at 100% CPU caused by the epoll bug.
2.4. Graceful Exit
Graceful shutdown in Java is usually implemented by registering a JDK shutdown hook: when the system receives the exit signal, it first marks itself as exiting and stops accepting new messages, then finishes processing the backlog of messages, then calls the resource-recycling interfaces to destroy resources, and finally the threads exit.
A graceful exit usually has a time limit, for example 30 s; if the shutdown still has not completed within that time, the monitoring script forcibly terminates the process with kill -9.
Netty's graceful exit feature has kept improving as the framework is optimized and evolves. Let us look at the graceful exit of Netty 5.
First look at the reactor threads and thread groups, which provide a graceful exit interface. The EventExecutorGroup interface is defined as follows:
Figure 2-30 EventExecutorGroup graceful exit
NioEventLoop's resource release interface implementation:
Figure 2-31 NioEventLoop resource release
ChannelPipeline's shutdown interface:
Figure 2-32 ChannelPipeline close interface
At present, the main interfaces and class libraries provided by Netty offer resource destruction and graceful exit interfaces; users' custom implementation classes can inherit these interfaces to release user resources and exit gracefully.
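As an illustration, a JDK shutdown hook can drive Netty's graceful exit through shutdownGracefully() on the event loop groups; the 30-second wait mirrors the operational limit mentioned above, and the group sizes are placeholders:

```java
// Sketch only: a JDK shutdown hook that triggers Netty's graceful exit via
// EventLoopGroup.shutdownGracefully() and waits up to 30 s for termination.
import io.netty.channel.EventLoopGroup;
import io.netty.channel.nio.NioEventLoopGroup;
import java.util.concurrent.TimeUnit;

public class GracefulExitDemo {
    public static void main(String[] args) {
        EventLoopGroup bossGroup = new NioEventLoopGroup(1);
        EventLoopGroup workerGroup = new NioEventLoopGroup();

        Runtime.getRuntime().addShutdownHook(new Thread(() -> {
            // Stop accepting new connections/messages, flush what is queued,
            // then release the threads and channels held by the groups.
            bossGroup.shutdownGracefully();
            workerGroup.shutdownGracefully();
            try {
                bossGroup.terminationFuture().await(30, TimeUnit.SECONDS);
                workerGroup.terminationFuture().await(30, TimeUnit.SECONDS);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "netty-shutdown-hook"));

        // ... bootstrap the server or client with bossGroup and workerGroup ...
    }
}
```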
2.5. Memory protection
2.5.1. Memory leak protection for buffers
To increase memory utilization, Netty provides memory and object pools. However, a cache pool implementation requires strict management of memory allocation and release, and it is easy to introduce memory leaks.
Without memory pooling, each object is created as a local variable of a method, and as long as it is no longer referenced after use, the JVM reclaims it automatically. Once the memory pool mechanism is introduced, however, the object's life cycle is managed by the memory pool, which usually holds a global reference; if the memory is not explicitly released back to the pool, the JVM will not reclaim it.
Netty users' skill levels vary greatly; some users who are not familiar with the JVM memory model and memory leak mechanisms may remember to request memory but forget to actively release it, especially Java programmers used to relying on garbage collection.
To prevent memory leaks caused by such user omissions, Netty automatically releases memory in the tail handler of the ChannelPipeline; the code is as follows:
Figure 2-33 TailHandler memory reclamation operation
For pooled memory, releasing really means recycling the buffer back into the memory pool; the code is as follows:
Figure 2-34 PooledByteBuf memory reclamation operation
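Complementing Netty's tail-handler safety net, the user-side discipline is to release a pooled ByteBuf explicitly once the business logic has consumed it; here is a hedged sketch using Netty's reference-counting utility:

```java
// Sketch only: explicitly releasing a pooled ByteBuf after it has been fully
// consumed, using ReferenceCountUtil from Netty's reference-counting API.
import io.netty.buffer.ByteBuf;
import io.netty.channel.ChannelHandlerContext;
import io.netty.channel.ChannelInboundHandlerAdapter;
import io.netty.util.ReferenceCountUtil;

public class ConsumingHandler extends ChannelInboundHandlerAdapter {
    @Override
    public void channelRead(ChannelHandlerContext ctx, Object msg) {
        ByteBuf buf = (ByteBuf) msg;
        try {
            // ... business processing that fully consumes the buffer ...
        } finally {
            // Return the buffer to the pool; without this (or forwarding it to a
            // later handler that releases it), pooled memory would leak.
            ReferenceCountUtil.release(buf);
        }
    }
}
```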
2.5.2. Buffer memory overflow protection
Anyone who has written a protocol stack knows that buffers need to be created when decoding messages. There are usually two ways to create a buffer:
1) pre-allocate the capacity and expand it during actual reading and writing if it is insufficient;
2) create the buffer based on the length field of the protocol message.
In a real production environment, if you encounter problems such as malformed stream attacks, protocol message encoding anomalies, or dropped messages, you may decode an abnormally large length field. I have encountered a similar problem: the value of the message length field exceeded 2 GB, and because one code branch did not enforce an upper limit on the length, a memory overflow occurred. The system restarted a few seconds after the overflow; fortunately the root cause was located in time, otherwise it could have led to a serious accident.
Since Netty provides a codec framework, protecting the upper limit of the decode buffer is very important. Below, let us look at how Netty protects the buffer upper limit:
First, specify the maximum buffer length when allocating memory:
Figure 2-35 Buffer Allocator can specify the maximum length of the buffer
Second, when writing to the buffer, if the capacity is insufficient and needs to be expanded, the maximum capacity is checked first; if the expanded capacity would exceed the upper limit, the expansion is rejected:
Figure 2-35 Buffer expansion upper-limit protection
Finally, during decoding, the message length is checked; if it exceeds the maximum capacity limit, a decoding exception is thrown and memory allocation is refused:
Figure 2-36 Half-packet decoding that exceeds the capacity limit fails
Figure 2-37 Throwing a TooLongFrameException
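In user code, the same upper-limit protection is typically obtained by bounding the frame decoder; the following sketch uses Netty's LengthFieldBasedFrameDecoder, where the 1 MB limit and the length-field layout (a 4-byte length at offset 0) are assumptions for illustration:

```java
// Sketch only: bounding the decode buffer with LengthFieldBasedFrameDecoder.
// A corrupted or malicious length field larger than maxFrameLength results in
// a TooLongFrameException instead of unbounded memory allocation.
import io.netty.channel.ChannelInitializer;
import io.netty.channel.socket.SocketChannel;
import io.netty.handler.codec.LengthFieldBasedFrameDecoder;

public class BoundedDecoderInitializer extends ChannelInitializer<SocketChannel> {
    private static final int MAX_FRAME_LENGTH = 1024 * 1024;   // refuse frames over 1 MB (assumed)

    @Override
    protected void initChannel(SocketChannel ch) {
        ch.pipeline().addLast(
                new LengthFieldBasedFrameDecoder(MAX_FRAME_LENGTH, 0, 4, 0, 4));
        // ... followed by the business decoder and handlers ...
    }
}
```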
2.6. Traffic shaping
Most commercial systems consist of multiple network elements or components; participating in SMS interaction, for example, involves handsets, base stations, the SMS center, SMS gateways, SPs/CPs, and other network elements. Different network elements or components have different processing performance. To prevent a downstream network element from being overwhelmed by surging traffic or by its own lower performance, the system sometimes needs to provide a traffic shaping function.
Let us look at the definition of traffic shaping: traffic shaping is a measure for actively adjusting the output rate of traffic. A typical application is controlling the output of local traffic based on the TP indicators of downstream network nodes. The main difference between traffic shaping and traffic policing is that traffic shaping caches the packets that traffic policing would discard, usually by putting them in buffers or queues. When the token bucket has enough tokens, these cached packets are sent out at an even rate. Another difference is that shaping may increase latency, while policing introduces almost no additional delay.
The principle of traffic shaping is as follows:
Figure 2-38 Flow shaping schematic diagram
As a high-performance NIO framework, Netty's traffic shaping serves two purposes:
1) to prevent the business process from being interrupted because a downstream network element is crushed by the mismatch in performance between upstream and downstream elements;
2) to prevent the communication module from receiving messages faster than the back-end business threads can process them, which would overwhelm the back end.
Below we look at the traffic shaping function in Netty in detail.
2.6.1. Global traffic shaping
The scope of global traffic shaping is the whole process: no matter how many Channels you create, it applies to all of them.
The user can set the following parameters: the message receive rate, the message send rate, and the shaping period. The relevant interfaces are as follows:
Figure 2-39 Global Traffic shaping parameter settings
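Since the original figure is not reproduced here, the following hedged sketch shows how a process-wide shaper might be installed with Netty's GlobalTrafficShapingHandler; the 1 MB/s rates and the 1-second check interval are illustrative assumptions:

```java
// Sketch only: installing GlobalTrafficShapingHandler with a write limit,
// a read limit and a check (shaping) interval. Rates/interval are placeholders.
import io.netty.channel.ChannelInitializer;
import io.netty.channel.socket.SocketChannel;
import io.netty.handler.traffic.GlobalTrafficShapingHandler;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;

public class GlobalShapingInitializer extends ChannelInitializer<SocketChannel> {
    private final GlobalTrafficShapingHandler shaper;

    public GlobalShapingInitializer() {
        ScheduledExecutorService executor = Executors.newScheduledThreadPool(1);
        // writeLimit = 1 MB/s, readLimit = 1 MB/s, checkInterval = 1000 ms (assumed values)
        this.shaper = new GlobalTrafficShapingHandler(executor, 1024 * 1024, 1024 * 1024, 1000);
    }

    @Override
    protected void initChannel(SocketChannel ch) {
        // The same (sharable) instance is reused across all channels,
        // which is what makes the shaping process-wide.
        ch.pipeline().addLast("globalShaper", shaper);
        // ... other handlers ...
    }
}
```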
The principle of Netty traffic shaping is to compute, on each read, the number of readable bytes in the ByteBuf to obtain the current traffic, and then compare it with the traffic shaping threshold. If the threshold has been reached or exceeded, the wait delay is calculated and the current ByteBuf is placed into the task cache of the scheduled task pool, to be processed by the timer thread pool after the delay. The relevant code is as follows:
Figure 2-40 Dynamic calculation of current traffic
If the shaping threshold is reached, the newly received ByteBuf is cached and placed in the thread pool's message queue to be processed later; the code is as follows:
Figure 2-41 Caching the current ByteBuf
The delay of the scheduled task is calculated from the detection period T and the traffic shaping threshold; the code is as follows:
Figure 2-42 Calculating the cache wait period
It should be noted that the larger the traffic shaping threshold, the higher the shaping accuracy. Traffic shaping is a reliability safeguard, and it cannot be 100% accurate; this is related to the back-end codec and the buffer processing strategy, which we will not go into here. Interested readers can think about why Netty does not achieve 100% accuracy.
The biggest difference between traffic shaping and flow control is that flow control rejects messages, whereas traffic shaping neither rejects nor discards them; no matter how much it receives, it always sends messages out at an approximately constant rate, similar in principle and function to a transformer.
2.6.2. Single link traffic shaping
In addition to global traffic shaping, Netty also supports per-link traffic shaping; the relevant interfaces are defined as follows:
Figure 2-43 Single-link traffic shaping
The biggest difference between single-link traffic shaping and global traffic shaping is that its scope is a single link, so different shaping policies can be applied to different links.
Its implementation principle is similar to global traffic shaping and will not be repeated here. It is worth noting that Netty supports user-defined traffic shaping policies: a custom policy can be implemented by inheriting AbstractTrafficShapingHandler and overriding its doAccounting method. The relevant interfaces are defined as follows:
Figure 2-44 Customizing traffic shaping strategies
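As an illustration, the sketch below subclasses ChannelTrafficShapingHandler (a per-link subclass of AbstractTrafficShapingHandler) and overrides doAccounting() to hook the periodic traffic statistics; the logging policy, rates and interval are assumptions:

```java
// Sketch only: a per-link shaping handler whose doAccounting() override exports
// throughput statistics once per check interval. Values are placeholders.
import io.netty.handler.traffic.ChannelTrafficShapingHandler;
import io.netty.handler.traffic.TrafficCounter;

public class AccountingTrafficShapingHandler extends ChannelTrafficShapingHandler {
    public AccountingTrafficShapingHandler(long writeLimit, long readLimit, long checkInterval) {
        super(writeLimit, readLimit, checkInterval);
    }

    @Override
    protected void doAccounting(TrafficCounter counter) {
        // Called once per check interval: export the throughput of this link.
        System.out.printf("link throughput: read=%d B/s, write=%d B/s%n",
                counter.lastReadThroughput(), counter.lastWriteThroughput());
        super.doAccounting(counter);
    }
}

// Usage per channel (a new instance per link, since the shaping state is link-local):
//   ch.pipeline().addLast(new AccountingTrafficShapingHandler(64 * 1024, 64 * 1024, 1000));
```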
3. Summary
Although Netty has made many careful design decisions for architectural reliability and has protected the system in many ways through defensive programming, system reliability is a process of continuous investment and improvement. It cannot be accomplished in a single release; there is still a long way to go.
From the business perspective, different industries and application scenarios have different reliability requirements: the telecommunications industry requires five nines, while special industries such as railways require even more, reaching six nines. For some peripheral enterprise IT systems the reliability requirements are lower.
Reliability is an investment. For an enterprise, pursuing extreme reliability imposes a heavy R&D burden; conversely, if a system neglects reliability, the losses from an unfortunate production accident are often staggering.
For architects and designers, weighing architectural reliability against other features is a big challenge. We hope that studying Netty's reliability design brings readers some inspiration.
4. Netty Study Recommended Books
There are many articles introducing Netty on the market; readers who want to study Netty systematically are recommended two books:
1) "Netty in Action"
2) "Netty authoritative guide"
5. About the author
Li Linfeng graduated from Northeastern University in 2007 and joined Huawei in 2008 to work on the design and development of high-performance communication software. He has 6 years of experience in NIO design and development and is proficient in Netty, Mina, and other NIO frameworks. He is the founder of the Netty Chinese community and the author of the Netty Authoritative Guide.
Original link: http://www.infoq.com/cn/articles/netty-reliability