Netty Graceful exit mechanism and principle

Source: Internet
Author: User

1. Graceful exit of the process

1.1. Problems with kill -9 PID

Forcibly killing a process with kill -9 PID on Linux is simple and effective, so many program stop scripts use kill -9 PID.

Whether a process is forced to exit with kill -9 PID on Linux or taskkill /f /pid on Windows, there are side effects: for the application, the effect is equivalent to a sudden power failure, which can lead to problems such as the following:

    1. Data in the cache has not yet been persisted to disk, resulting in data loss;
    2. A file write is in progress and the process exits before the update completes, resulting in file corruption;
    3. Request messages that a thread's message queue has received but not yet processed are lost;
    4. A database operation has completed (for example, an account balance update) and the reply is ready to return to the client, but it is still waiting in the communication thread's send queue. Because of the forced exit the reply is never returned, the client times out and retries, and the update is applied twice;
    5. Other problems.

1.2. Java graceful exit

Graceful shutdown in Java is usually implemented by registering a JDK ShutdownHook. When the system receives the exit instruction, it first marks itself as exiting and stops accepting new messages, then finishes processing the backlog of messages, then calls the resource-recycling interfaces to destroy resources, and finally each thread finishes and exits.
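The sequence just described (mark as exiting, reject new messages, drain the backlog, release resources) can be sketched with nothing but the JDK. This is a minimal illustration, not code from the article: the class name GracefulService and its methods are hypothetical, and the "processing" and "resource release" steps are stand-ins.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical service used only to illustrate the exit sequence.
public class GracefulService {
    private final AtomicBoolean accepting = new AtomicBoolean(true);
    private final Queue<String> backlog = new ConcurrentLinkedQueue<>();
    private final List<String> processed = new ArrayList<>();
    private volatile boolean resourcesReleased;

    /** Accepts a message unless the service is already exiting. */
    public boolean submit(String message) {
        if (!accepting.get()) {
            return false; // exiting: no new messages are accepted
        }
        return backlog.offer(message);
    }

    /** Exit sequence: stop accepting, drain the backlog, release resources. */
    public void shutdown() {
        if (!accepting.compareAndSet(true, false)) {
            return; // shutdown was already started
        }
        String message;
        while ((message = backlog.poll()) != null) {
            processed.add(message); // stands in for real message handling
        }
        resourcesReleased = true; // stands in for closing files, sockets, pools
    }

    public List<String> processed() {
        return processed;
    }

    public boolean resourcesReleased() {
        return resourcesReleased;
    }

    public static void main(String[] args) {
        GracefulService service = new GracefulService();
        // The JVM starts this thread on normal exit or on a catchable signal such as SIGTERM.
        Runtime.getRuntime().addShutdownHook(new Thread(service::shutdown, "shutdown-hook"));
        service.submit("request-1");
        service.submit("request-2");
        // main returns here; the hook then drains the two queued requests.
    }
}
```

Note that a hook of this kind never runs after kill -9, which is exactly the problem described in section 1.1.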

2. How to achieve graceful exit in Netty

To achieve graceful exit in Netty, you first need to understand how graceful exit is implemented for a generic Java process. We first explain the principle, then walk through the actual code, and finally look at how to achieve graceful exit in Netty.

2.0.1. Introduction to Signals

A signal is a software-level simulation of the interrupt mechanism. In principle, a process receiving a signal is analogous to a processor receiving an interrupt request; signals are an asynchronous communication mechanism between processes. Taking the Linux kill command as an example, kill -s SIGKILL pid (that is, kill -9 pid) immediately kills the process with the specified PID, and SIGKILL is the signal sent to that process.

Signals are platform-dependent. Some of the process-terminating signals supported on Linux are as follows:

Signal Name    Use
SIGKILL        Terminate process; forcibly kills the process
SIGTERM        Terminate process; software termination signal
SIGTSTP        Stop process; stop signal from the terminal
SIGPROF        Terminate process; profiling timer expired
SIGUSR1        Terminate process; user-defined signal 1
SIGUSR2        Terminate process; user-defined signal 2
SIGINT         Terminate process; interrupt from the terminal
SIGQUIT        Terminate process and generate a core file

The Windows platform differs somewhat. Some of its signals are: SIGINT (Ctrl+C interrupt), SIGILL, SIGTERM (software termination, as sent by kill), and SIGBREAK (Ctrl+Break interrupt).

Signal selection: to avoid interfering with normal signal handling while still delivering an asynchronous notification to the Java process, we need to pick a special signal on Linux. The signal list shows that SIGUSR1 and SIGUSR2 are reserved for user-defined purposes, so we choose SIGUSR2. For testing convenience, we choose SIGINT on Windows.

2.0.2. Graceful exit of Java program

First look at the flowchart of the graceful exit of the generic Java process:

The first step is to initialize the Signal instance when the application process starts. A code example follows:

Signal sig = new Signal(getOSSignalType());

The argument to the Signal constructor is a String: the signal name described in section 2.0.1.

The second step is to obtain the signal name that corresponds to the operating system. The code is as follows:

private String getOSSignalType() {
    return System.getProperties().getProperty("os.name").toLowerCase().startsWith("win") ? "INT" : "USR2";
}

This checks whether the operating system is Windows. If so, it selects SIGINT, which is delivered by Ctrl+C; otherwise it selects the USR2 signal, which is delivered by SIGUSR2 (equivalent to kill -12 pid).

The third step is to register the instantiated SignalHandler with the JDK's Signal class. Once the Java process receives kill -12 or Ctrl+C, the handle method is called back. A code example follows:

Signal.handle(sig, shutdownHandler);

Here shutdownHandler implements the handle(Signal sig) method of the SignalHandler interface.
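The handler itself is not shown in the text. The following is one possible shape for it, offered as a sketch only. It assumes the JDK-internal sun.misc.Signal and sun.misc.SignalHandler classes, which are unsupported API and produce compiler warnings. The exit action is passed in as a Runnable (a choice made here, not in the article) so that the handler can be exercised without actually terminating the JVM.

```java
import sun.misc.Signal;
import sun.misc.SignalHandler;

// Hypothetical handler; sun.misc.* is JDK-internal, unsupported API.
public class ShutdownHandler implements SignalHandler {
    private final Runnable exitAction;
    private volatile String lastSignal;

    public ShutdownHandler(Runnable exitAction) {
        this.exitAction = exitAction;
    }

    /** Called back by the JDK when the registered signal arrives. */
    @Override
    public void handle(Signal sig) {
        lastSignal = sig.getName();
        System.out.println("Received signal " + sig.getName() + " (" + sig.getNumber() + ")");
        exitAction.run(); // in the article's flow: register the hook, then exit the JVM
    }

    public String lastSignal() {
        return lastSignal;
    }

    /** Same OS check as getOSSignalType() in step two. */
    static String osSignalType() {
        return System.getProperty("os.name").toLowerCase().startsWith("win") ? "INT" : "USR2";
    }

    public static void main(String[] args) throws InterruptedException {
        Signal sig = new Signal(osSignalType());
        Signal.handle(sig, new ShutdownHandler(() -> Runtime.getRuntime().exit(0)));
        System.out.println("Waiting for SIG" + sig.getName() + " ...");
        Thread.sleep(Long.MAX_VALUE); // keep the process alive until the signal arrives
    }
}
```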

The fourth step: in the handle callback that receives the signal, initialize the JDK ShutdownHook thread and register it with the Runtime. Sample code follows:

private void invokeShutdownHook() {
    Thread t = new Thread(new ShutdownHook(), "ShutdownHook-Thread");
    Runtime.getRuntime().addShutdownHook(t);
}

The fifth step: after receiving the process exit signal, perform the virtual machine exit in the handle callback. Sample code follows:

Runtime.getRuntime().exit(0);

When the virtual machine exits, it automatically checks whether the user has registered any ShutdownHook tasks; if so, it starts the ShutdownHook thread and executes its run method. The user only needs to release resources in the ShutdownHook. Sample code follows:

import java.util.concurrent.TimeUnit;

class ShutdownHook implements Runnable {
    @Override
    public void run() {
        System.out.println("ShutdownHook execute start...");
        System.out.print("Netty NioEventLoopGroup shutdownGracefully...");
        try {
            TimeUnit.SECONDS.sleep(10); // Simulates the processing done before the application process exits
        } catch (InterruptedException e) {
            e.printStackTrace();
        }
        System.out.println("ShutdownHook execute end...");
        System.out.println("System shutdown over, the cost time is 10000MS");
    }
}

Below we test the generic Java graceful exit program in a Windows environment. Open the CMD console and start the program under test as follows:

Start the process:

View the thread information; the registered ShutdownHook thread has not started, as expected:

Press Ctrl+C in the console to exit the process, as shown in the following example:

As shown, the ShutdownHook thread we defined is executed when the JVM exits. As a test program, it sleeps for 10 seconds and then exits. The console prints the following:

Below we summarize the technical essentials of graceful exit for general-purpose Java programs:

2.0.3. Graceful exit in Netty

In real projects, Netty, as a high-performance asynchronous NIO communication framework, is often used as the underlying communication framework for the access, parsing, and scheduling of various protocols. For example, RPC and distributed service frameworks often use Netty as the underlying communication framework for their internal private protocols.

When the application process exits gracefully, Netty, as the communication framework, also needs to exit gracefully, mainly for the following reasons:

    1. NIO threads, handles, and other resources should be released as soon as possible;
    2. If flush is used for batched sending, the backlog of messages accumulated in the send queue needs to be sent;
    3. Messages that are in the middle of being written or read need to be processed;
    4. Scheduled tasks set in the NioEventLoop thread's scheduler need to be executed or cleaned up.

Let's look at the main operations and resource objects involved in Netty graceful exit:

Netty's graceful exit can be summarized in three major steps:

    1. Set the state of the NIO thread to ST_SHUTTING_DOWN and stop processing new messages (sending messages out is no longer allowed);
    2. Pre-exit processing: send the messages that are still in the send queue or are being sent, run the scheduled tasks that have expired or will expire before the exit timeout, and run the exit hook tasks that the user registered with the NIO thread;
    3. Resource release: release all channels, deregister from and close the multiplexer, clear all queues and scheduled tasks, and finally exit the NIO thread.

Below we look specifically at how to achieve graceful exit in Netty:

The interface and overall entry point for Netty graceful exit is EventLoopGroup: call its shutdownGracefully method. The relevant code is as follows:

bossGroup.shutdownGracefully();
workerGroup.shutdownGracefully();

In addition to the no-argument shutdownGracefully method, there is an overload that lets you specify the quiet period and the timeout for the exit: shutdownGracefully(long quietPeriod, long timeout, TimeUnit unit).

How EventLoopGroup's shutdownGracefully works is covered in detail in the next chapter. Combined with the Java graceful exit mechanism, it can implement graceful exit for Netty. The related pseudo-code is as follows:

// Define a unified JVM exit event and publish it as a topic within the process
// All consumers that need graceful exit subscribe to the JVM exit event topic
// The ShutdownHook that monitors JVM exit publishes the JVM exit event once it is started
// Each consumer hears the JVM exit event and begins its own graceful exit
// If all non-daemon threads complete graceful exit successfully, the process exits on its own
// If the process still has not exited normally when the exit timeout expires, the stop script forcibly kills it with kill -9 PID
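The pseudo-code above could be turned into plain Java roughly as follows. This is an illustrative sketch under stated assumptions: JvmExitEventBus, ExitListener, publishExit and install are hypothetical names, and a Netty consumer would call shutdownGracefully on its EventLoopGroup instances inside onJvmExit.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical in-process "JVM exit" topic.
public class JvmExitEventBus {
    /** A consumer that knows how to exit gracefully. */
    public interface ExitListener {
        void onJvmExit();
    }

    private final List<ExitListener> listeners = new CopyOnWriteArrayList<>();

    public void subscribe(ExitListener listener) {
        listeners.add(listener);
    }

    /** Notifies every subscriber in parallel; returns true if all finished within the timeout. */
    public boolean publishExit(long timeout, TimeUnit unit) {
        List<ExitListener> snapshot = new ArrayList<>(listeners);
        CountDownLatch done = new CountDownLatch(snapshot.size());
        for (ExitListener listener : snapshot) {
            Thread worker = new Thread(() -> {
                try {
                    listener.onJvmExit();
                } finally {
                    done.countDown();
                }
            }, "graceful-exit-worker");
            worker.setDaemon(true); // a stuck listener must not keep the JVM alive
            worker.start();
        }
        try {
            return done.await(timeout, unit);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    /** Registers the ShutdownHook that publishes the exit event. */
    public void install(long timeout, TimeUnit unit) {
        Runtime.getRuntime().addShutdownHook(new Thread(() -> {
            if (!publishExit(timeout, unit)) {
                System.err.println("Graceful exit timed out; the stop script may fall back to kill -9");
            }
        }, "jvm-exit-publisher"));
    }

    public static void main(String[] args) {
        JvmExitEventBus bus = new JvmExitEventBus();
        bus.subscribe(() -> System.out.println("closing event loop groups..."));
        bus.install(15, TimeUnit.SECONDS);
    }
}
```

The timeout in the hook only bounds how long the process waits for its own consumers; the final kill -9 remains the responsibility of the external stop script.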

To summarize: after the JVM ShutdownHook is triggered, call the shutdownGracefully method of every EventLoopGroup instance to exit gracefully. Because Netty supports graceful exit well, the implementation is relatively simple.

2.0.4. Some myths

In practice, because the principles of graceful exit and of resource release are not well understood, or because Netty's interfaces are not well understood, it is easy to confuse graceful exit with resource release, which causes a variety of problems.

Consider the following case: the intention is to close a single channel, but the code calls shutdownGracefully on the EventLoop associated with the channel. This closes every channel registered on that EventLoop thread, as well as the thread's multiplexer. The erroneous code is:

ctx.channel().eventLoop().shutdownGracefully();

The correct approach is to call the channel's close method, which closes the link and releases the resources associated with that channel:

ctx.channel().close();

Unless the whole process is exiting gracefully, the shutdownGracefully methods of EventLoopGroup and EventLoop are not normally called; what is usually needed is to close the link's channel and release its resources.

3. Analysis of the Netty graceful exit principle

Netty graceful exit involves thread groups, threads, links, scheduled tasks, and so on, and the underlying implementation details are quite complex. The following breaks it down layer by layer and analyzes the implementation through the source code.

3.1. NioEventLoopGroup

NioEventLoopGroup is in fact a group of NioEventLoop threads. Its graceful exit is relatively simple: it iterates over the EventLoop array and calls shutdownGracefully on each element.
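The traversal can be modeled in a few lines. This is a simplified stand-in written for illustration, not Netty's source: LoopGroupModel and LoopModel are hypothetical classes, and the 2-second quiet period and 15-second timeout used by the no-argument form are illustrative values chosen here.

```java
import java.util.concurrent.TimeUnit;

// Simplified model of a loop group delegating shutdown to its children.
public class LoopGroupModel {
    /** Stand-in for one event loop; it only records the shutdown request. */
    public static final class LoopModel {
        public volatile boolean shuttingDown;
        public long quietPeriodNanos;
        public long timeoutNanos;

        public void shutdownGracefully(long quietPeriod, long timeout, TimeUnit unit) {
            quietPeriodNanos = unit.toNanos(quietPeriod);
            timeoutNanos = unit.toNanos(timeout);
            shuttingDown = true;
        }
    }

    public final LoopModel[] children;

    public LoopGroupModel(int nThreads) {
        children = new LoopModel[nThreads];
        for (int i = 0; i < nThreads; i++) {
            children[i] = new LoopModel();
        }
    }

    /** The group has no state of its own to shut down; it forwards to every child. */
    public void shutdownGracefully(long quietPeriod, long timeout, TimeUnit unit) {
        for (LoopModel child : children) {
            child.shutdownGracefully(quietPeriod, timeout, unit);
        }
    }

    /** No-argument form; the values are assumptions made for this sketch. */
    public void shutdownGracefully() {
        shutdownGracefully(2, 15, TimeUnit.SECONDS);
    }

    public static void main(String[] args) {
        LoopGroupModel group = new LoopGroupModel(4);
        group.shutdownGracefully();
        for (LoopModel child : group.children) {
            System.out.println("child shutting down: " + child.shuttingDown);
        }
    }
}
```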

3.2. NioEventLoop

Calling shutdownGracefully on a NioEventLoop first changes the state of the thread to shutting down. The method is implemented in the parent class SingleThreadEventExecutor; the inheritance relationship is as follows:

The shutdownGracefully code in SingleThreadEventExecutor is relatively simple: it changes the state of the thread. Note that the change must guard against concurrent calls: if the caller is the NioEventLoop thread itself, no lock is needed; otherwise a lock is required. The code is as follows:

Why lock? Because shutdownGracefully is a public method, any code that can obtain the NioEventLoop can call it. In Netty, business code does not usually need to obtain the NioEventLoop and manipulate it directly, but Netty wraps a lot of functionality into NioEventLoop: it not only reads and writes messages, but also runs scheduled tasks and executes user-defined tasks as a thread pool. For that reason, the method for getting the NioEventLoop from a channel is public, which means that any user who can reach a channel could, in theory, call shutdownGracefully concurrently. Graceful exit is therefore protected against concurrent calls.
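The locking rule described above can be sketched as follows. This is a simplified model written for illustration, not Netty's implementation: LoopStateModel, attachLoopThread and the reduced set of state constants are assumptions made here.

```java
import java.util.stream.IntStream;

// Simplified model of the guarded state change; not Netty's source.
public class LoopStateModel {
    public static final int ST_NOT_STARTED = 1;
    public static final int ST_STARTED = 2;
    public static final int ST_SHUTTING_DOWN = 3;

    private final Object stateLock = new Object();
    private volatile Thread loopThread; // thread that runs the loop, if attached
    private volatile int state = ST_STARTED;

    public void attachLoopThread(Thread thread) {
        loopThread = thread;
    }

    public boolean inEventLoop() {
        return Thread.currentThread() == loopThread;
    }

    public int state() {
        return state;
    }

    /** Returns true only for the single call that performs the state change. */
    public boolean shutdownGracefully() {
        if (inEventLoop()) {
            return markShuttingDown(); // called by the loop thread itself: no lock taken
        }
        synchronized (stateLock) {     // any other caller: serialize the state change
            return markShuttingDown();
        }
    }

    private boolean markShuttingDown() {
        if (state >= ST_SHUTTING_DOWN) {
            return false; // someone else already started the shutdown
        }
        state = ST_SHUTTING_DOWN;
        return true;
    }

    public static void main(String[] args) {
        LoopStateModel loop = new LoopStateModel();
        // 64 concurrent callers, none of them the loop thread.
        long winners = IntStream.range(0, 64).parallel()
                .filter(i -> loop.shutdownGracefully())
                .count();
        System.out.println("state changes performed: " + winners);
    }
}
```

However many threads call shutdownGracefully, only one of them observes the pre-shutdown state inside the lock, so the state change happens once.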

After the state change is completed, the remaining operations are performed mainly in NioEventLoop, with the following code:

Next, look at the implementation of closeAll. In principle it closes every channel registered on the selector, but a channel that is in the middle of sending a message cannot be closed yet and must be closed later. The core code is as follows:

closeAll loops over the channels and calls the close method of each channel's Unsafe. We now jump to Unsafe and analyze its close method.

3.3. AbstractUnsafe

The close method of AbstractUnsafe mainly does the following:

1. Determine whether the current link has a message being sent; if so, wrap the close action in a task and execute it later in the EventLoop:

2. Clear the send queue and disallow sending new messages:

3. Call the close method of SocketChannel to close the link:

4. Call the pipeline's fireChannelInactive to trigger the link-closed notification event:

5. Finally, call deregister to cancel the SelectionKey from the multiplexer:

Now that this part of the graceful exit process has completed, does it mean that the NioEventLoop thread can exit? In fact, it does not.

So far, only the channel has been closed and deregistered from the selector, summarized as follows:

    1. inFlush0 is used to determine whether a message is currently being sent. If so, the channel close action is not performed immediately; instead, close() is placed in the NIO thread's task queue and executed later;
    2. Because new messages are no longer allowed to join the send queue, once the send operation completes, the link is closed, the link-closed event is triggered, and the channel is deregistered from the selector.
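The deferral behavior summarized above can be modeled without Netty. The class below is a hypothetical, single-threaded stand-in (ChannelCloseModel is not a Netty class); only the inFlush0 check, the deferred close task, and the discarding of unsent messages are modeled, and the socket close, channelInactive event and deregistration are reduced to a comment.

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Simplified single-threaded model of close-while-flushing; not Netty's source.
public class ChannelCloseModel {
    private final Queue<Runnable> taskQueue = new ArrayDeque<>(); // event loop task queue
    private final Queue<String> outbound = new ArrayDeque<>();    // messages waiting to be sent
    private boolean inFlush0;
    private boolean open = true;
    private int discarded;

    public void write(String msg) {
        if (open) {
            outbound.add(msg);
        }
    }

    /** Sends at most `capacity` messages, then runs `duringFlush` while still inside the flush. */
    public void flush(int capacity, Runnable duringFlush) {
        inFlush0 = true;
        try {
            for (int i = 0; i < capacity && !outbound.isEmpty(); i++) {
                outbound.poll(); // stands in for a successful socket write
            }
            duringFlush.run();   // e.g. a close() request arriving mid-send
        } finally {
            inFlush0 = false;    // reset when this send attempt ends, backlog or not
        }
    }

    public void close() {
        if (inFlush0) {
            taskQueue.add(this::close); // a send is in progress: close later
            return;
        }
        if (!open) {
            return;
        }
        discarded = outbound.size();    // whatever is still queued is dropped
        outbound.clear();
        open = false;
        // A real channel would now close the socket, fire channelInactive and deregister.
    }

    /** Runs the deferred tasks, as the event loop does after handling I/O events. */
    public void runPendingTasks() {
        Runnable task;
        while ((task = taskQueue.poll()) != null) {
            task.run();
        }
    }

    public boolean isOpen() {
        return open;
    }

    public int discarded() {
        return discarded;
    }

    public static void main(String[] args) {
        ChannelCloseModel ch = new ChannelCloseModel();
        ch.write("a");
        ch.write("b");
        ch.write("c");
        ch.flush(2, ch::close); // close requested mid-flush, so it is deferred
        System.out.println("open after flush: " + ch.isOpen());
        ch.runPendingTasks();
        System.out.println("open after tasks: " + ch.isOpen() + ", discarded: " + ch.discarded());
    }
}
```

In this model, the message that did not fit into the flush is dropped once the deferred close runs, which mirrors the message-loss scenario discussed in section 3.5.2.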

As mentioned earlier, besides I/O reads and writes, NioEventLoop is also responsible for executing scheduled tasks, running the shutdown hooks registered with it, and so on. If a scheduled task has expired, it still needs to be executed even though the channels are closed, and the thread cannot exit yet. Below we analyze the taskQueue processing in detail.

3.4. Taskqueue

After NioEventLoop executes closeAll(), it calls confirmShutdown to determine whether it can actually exit. Its processing logic is as follows:

1. Execute the tasks queued in taskQueue, with the following code:

2. Execute the ShutdownHooks registered with NioEventLoop, with the following code:

3. Determine whether the timeout specified for graceful exit has been reached; if it has been reached or exceeded, exit immediately. The code is as follows:

4. If the specified timeout has not been reached, do not exit yet: check every 100 ms whether new tasks have been added, and if so, continue to execute them.
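The four steps can be modeled as a small decision function. This is a sketch under assumptions, not Netty's confirmShutdown: ConfirmShutdownModel is a hypothetical class, the clock is injected so the logic can be checked deterministically, and the "quiet period" (time without new work before exit is allowed) is how this sketch interprets step 4. A real loop would sleep about 100 ms between calls.

```java
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.function.LongSupplier;

// Simplified model of the exit decision; not Netty's source.
public class ConfirmShutdownModel {
    private final Queue<Runnable> taskQueue = new ArrayDeque<>();
    private final Queue<Runnable> shutdownHooks = new ArrayDeque<>();
    private final LongSupplier clock;
    private final long quietPeriod;
    private final long timeout;
    private long shutdownStart = -1;
    private long lastExecution;

    public ConfirmShutdownModel(LongSupplier clock, long quietPeriod, long timeout) {
        this.clock = clock;
        this.quietPeriod = quietPeriod;
        this.timeout = timeout;
    }

    public void execute(Runnable task) {
        taskQueue.add(task);
    }

    public void addShutdownHook(Runnable hook) {
        shutdownHooks.add(hook);
    }

    /** Runs everything in the queue; returns true if at least one entry ran. */
    private boolean drain(Queue<Runnable> queue) {
        boolean ran = false;
        Runnable r;
        while ((r = queue.poll()) != null) {
            r.run();
            ran = true;
        }
        if (ran) {
            lastExecution = clock.getAsLong();
        }
        return ran;
    }

    /** Returns true when the loop thread may exit; call repeatedly until it does. */
    public boolean confirmShutdown() {
        if (shutdownStart < 0) {
            shutdownStart = clock.getAsLong();
            lastExecution = shutdownStart;
        }
        boolean ranTasks = drain(taskQueue);     // step 1: queued tasks
        boolean ranHooks = drain(shutdownHooks); // step 2: registered shutdown hooks
        if (ranTasks || ranHooks) {
            return false;                        // work was done: check again on the next pass
        }
        long now = clock.getAsLong();
        if (now - shutdownStart > timeout) {
            return true;                         // step 3: timeout exceeded, exit immediately
        }
        if (now - lastExecution <= quietPeriod) {
            return false;                        // step 4: keep waiting for late tasks
        }
        return true;                             // quiet long enough: safe to exit
    }

    public static void main(String[] args) {
        long[] now = {0};
        ConfirmShutdownModel loop = new ConfirmShutdownModel(() -> now[0], 2, 15);
        loop.execute(() -> System.out.println("pending task runs before exit"));
        while (!loop.confirmShutdown()) {
            now[0]++; // advance the fake clock instead of sleeping
        }
        System.out.println("loop may exit at tick " + now[0]);
    }
}
```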

The confirmShutdown method also contains some handling for the deprecated shutdown() method, such as:

When the new shutdownGracefully family of methods is called, that condition is never true, so the handling logic for the deprecated shutdown is not analyzed in detail here.

At this point the confirmShutdown method is complete. When confirmShutdown returns true, the NioEventLoop thread formally exits and Netty graceful exit is complete. The code is as follows:

3.5. Question and Answer

3.5.1. Repeated execution of runAllTasks

The run method of NioEventLoop already calls runAllTasks, so why does confirmShutdown call runAllTasks again? The code in question is as follows:

There are two main reasons:

1. To prevent scheduled tasks or user-defined tasks from taking up too much of the NioEventLoop thread's scheduling resources, Netty balances the time the thread spends on I/O and non-I/O operations by limiting the execution time of non-I/O operations. Because of this execution time limit, expired scheduled tasks and normal tasks may not all be finished and must wait for the next selector poll to continue. Before the thread exits, tasks that should have executed but have not yet finished must be completed, so confirmShutdown calls runAllTasks again;

2. After runAllTasks is called and before confirmShutdown executes, the user may have added new normal or scheduled tasks to NioEventLoop, so the task queue must be traversed and processed again before exiting.
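The first reason can be illustrated with a small model of time-limited task execution. This is a hypothetical sketch, not Netty's runAllTasks: TimeBudgetModel is an invented class, the clock is injected, and the budget check after every task is a simplification chosen here.

```java
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.function.LongSupplier;

// Simplified model of running tasks under a time budget; not Netty's source.
public class TimeBudgetModel {
    private final Queue<Runnable> taskQueue = new ArrayDeque<>();
    private final LongSupplier clock;

    public TimeBudgetModel(LongSupplier clock) {
        this.clock = clock;
    }

    public void execute(Runnable task) {
        taskQueue.add(task);
    }

    public int pending() {
        return taskQueue.size();
    }

    /** Runs tasks until the queue is empty or the budget is used up; returns how many ran. */
    public int runAllTasks(long budget) {
        long deadline = clock.getAsLong() + budget;
        int ran = 0;
        Runnable task;
        while ((task = taskQueue.poll()) != null) {
            task.run();
            ran++;
            if (clock.getAsLong() >= deadline) {
                break; // budget exhausted: the rest stays queued for a later pass
            }
        }
        return ran;
    }

    /** Unbounded variant, as needed before the thread exits. */
    public int runAllTasks() {
        int ran = 0;
        Runnable task;
        while ((task = taskQueue.poll()) != null) {
            task.run();
            ran++;
        }
        return ran;
    }

    public static void main(String[] args) {
        long[] now = {0};
        TimeBudgetModel loop = new TimeBudgetModel(() -> now[0]);
        for (int i = 0; i < 5; i++) {
            loop.execute(() -> now[0] += 10); // each task "takes" 10 ticks
        }
        System.out.println("ran within budget: " + loop.runAllTasks(25));
        System.out.println("left over: " + loop.pending());
        System.out.println("ran before exit: " + loop.runAllTasks());
    }
}
```

The tasks left over after the budgeted pass are the ones that the second, unbounded call before exit has to pick up.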

3.5.2. Does graceful exit ensure that all messages queued in the communication thread are sent?

In fact it does not. It only guarantees that if a message is in the middle of being sent when the graceful exit method is called, the link is not closed immediately and the send continues. Once that send operation completes, whether or not unsent messages remain, the link is closed in the next round of selector polling, and messages that were not fully sent are discarded, even partially written (half-packet) messages. The processing is illustrated as follows:

The principle is relatively complex; the main processing logic is explained below:

    1. When graceful exit is called, whether the link is closed is decided by whether inFlush0 is true; if it is false, the link close operation is performed;
    2. If the user sends in batches, for example by triggering flush every N messages or on a timer, and the graceful exit method is called between flushes, inFlush0 is false, the link is closed, and the backlog of pending messages is discarded;
    3. If the link is in the middle of sending a message when graceful exit is called, it is not closed immediately; it is closed in the next selector poll, after the send completes. There are two possible scenarios:

Scenario A: if the backlog of messages is sent in a single write, with no partial write (write half-packet), no message is lost;

Scenario B: if the messages cannot all be sent in a single write, Netty listens for the write event, which triggers the next selector poll and sends the remaining messages, with the following code:

On each selector poll, read and write events are processed first, and then scheduled and normal tasks, so there is one last chance to continue sending before the link is closed, with the following code:

If, unluckily, the second send still does not finish the backlog and a write half-packet occurs again, the AbstractUnsafe.close task closes the link regardless of whether a backlog remains, because Netty sets inFlush0 to false as soon as a send operation completes. The code is as follows:

After the link is closed, all messages that have not been sent are discarded.

Some readers may wonder whether inFlush0 could be set back to true if flush is called after the second send and before AbstractUnsafe.close executes. This cannot happen, because the threading model changed in Netty 4.x: flush is not executed by the user thread but by the NioEventLoop thread that owns the channel, so inFlush0 cannot be modified between the two.

The threading model from Netty 4.x onward is as follows:

In addition, because graceful exit has a timeout, messages can also be discarded if the backlog is not fully sent within the timeout period.

For these scenarios, the application layer must guarantee reliability itself, or the graceful exit mechanism of Netty must be optimized.
