Netty Heartbeat Service: IdleStateHandler Source Code Analysis

Source: Internet
Author: User

Introduction: the heartbeat mechanism provided by Netty

As a network framework, Netty provides many features. Besides the ready-made codecs discussed earlier, it also ships a service that is very important in networking: the heartbeat mechanism. Checking whether the peer is still alive through heartbeats is an essential feature of any RPC framework.

Netty provides IdleStateHandler, ReadTimeoutHandler, and WriteTimeoutHandler to detect whether a connection is still valid. You could also write a scheduled task yourself, but today we will not use a custom task and will rely on what Netty provides.

The three handlers work as follows.

No. | Name | Function
1 | IdleStateHandler | When the connection has been idle (no read or no write) for too long, an IdleStateEvent is triggered. You can handle the event by overriding the userEventTriggered method in your ChannelInboundHandler.
2 | ReadTimeoutHandler | If no read event occurs within the specified time, a ReadTimeoutException is thrown and the connection is closed automatically. You can handle the exception in the exceptionCaught method.
3 | WriteTimeoutHandler | When a write operation cannot be completed within a certain amount of time, a WriteTimeoutException is thrown and the connection is closed. You can also handle this exception in the exceptionCaught method.

Note:
The description of WriteTimeoutHandler in the well-known book "Netty in Action" (both the Chinese edition and the English original) is outdated. The original description reads:

Throws a WriteTimeoutException if no outbound data is written within the specified time interval.

When the book was published, this matched Netty's documentation, but on December 28, 2015, a contributor changed the logic, as the git log shows.

The contributor appears to be a Chinese developer. The documentation now reads:

Raises a {@link WriteTimeoutException} when a write operation cannot finish in a certain period of time.
In other words, the exception concerns a single write operation that fails to complete in time, not the absence of writes.

Both ReadTimeoutHandler and WriteTimeoutHandler close the connection automatically, and both work through exception handling, so they are only mentioned briefly here. We focus on IdleStateHandler.

1. What is IdleStateHandler
    • A recap of IdleStateHandler:

When the connection has been idle (no read or no write) for too long, an IdleStateEvent is triggered. You can then handle the event by overriding the userEventTriggered method in your ChannelInboundHandler.

    • How to use it?

IdleStateHandler is both an outbound and an inbound handler, since it extends ChannelDuplexHandler. It is typically added to the pipeline in the initChannel method. You then override userEventTriggered in your own handler; when an idle event (read or write) occurs, that method is invoked and the specific event is passed in.
At that point, you can try to write data to the target socket through the context object and attach a listener that closes the socket if the send fails (Netty provides the ChannelFutureListener.CLOSE_ON_FAILURE listener for exactly this purpose).
In this way, a simple heartbeat service is implemented.

2. Source Code Analysis
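    • 1. Constructors. This class has three constructors, which mainly assign values to four fields:

    private final boolean observeOutput;     // Whether to take slow outbound writes into account. Defaults to false (not considered).
    private final long readerIdleTimeNanos;  // Read idle time; 0 disables the event
    private final long writerIdleTimeNanos;  // Write idle time; 0 disables the event
    private final long allIdleTimeNanos;     // Read-or-write idle time; 0 disables the event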
    • 2. The handlerAdded method

When the handler is added to the pipeline, the initialize method is called:

    private void initialize(ChannelHandlerContext ctx) {
        switch (state) {
        case 1:
        case 2:
            return;
        }

        state = 1;
        initOutputChanged(ctx);

        lastReadTime = lastWriteTime = ticksInNanos();
        if (readerIdleTimeNanos > 0) {
            // schedule() calls the EventLoop's schedule method to add a timed task to the queue
            readerIdleTimeout = schedule(ctx, new ReaderIdleTimeoutTask(ctx),
                    readerIdleTimeNanos, TimeUnit.NANOSECONDS);
        }
        if (writerIdleTimeNanos > 0) {
            writerIdleTimeout = schedule(ctx, new WriterIdleTimeoutTask(ctx),
                    writerIdleTimeNanos, TimeUnit.NANOSECONDS);
        }
        if (allIdleTimeNanos > 0) {
            allIdleTimeout = schedule(ctx, new AllIdleTimeoutTask(ctx),
                    allIdleTimeNanos, TimeUnit.NANOSECONDS);
        }
    }

For each timeout that is greater than 0, a corresponding timed task is created. At the same time, state is set to 1 to prevent repeated initialization. The initOutputChanged method is called to initialize the fields used for monitoring outbound data. The code is as follows:

    private void initOutputChanged(ChannelHandlerContext ctx) {
        if (observeOutput) {
            Channel channel = ctx.channel();
            Unsafe unsafe = channel.unsafe();
            ChannelOutboundBuffer buf = unsafe.outboundBuffer();

            // Record outbound-buffer data: the identity hash code of the current
            // message and the number of bytes still pending in the buffer
            if (buf != null) {
                lastMessageHashCode = System.identityHashCode(buf.current());
                lastPendingWriteBytes = buf.totalPendingWriteBytes();
            }
        }
    }

First, the role of observeOutput, the "monitor outbound data" flag. This parameter did not exist originally; it was added after someone raised an issue on GitHub. Why is it needed?

Suppose the peer is so slow that each piece of data takes 30 seconds to be written out, while your write idle time is 25 seconds. A write idle event is then triggered before the data has finished being written. That is not logical, because your application is not idle at all.

How to solve it?

Netty's solution is to record information about the last outbound message and to use the firstXxxIdleEvent flags (firstReaderIdleEvent, firstWriterIdleEvent, firstAllIdleEvent) to indicate whether the connection has been active again. Each read or write activity sets the corresponding flag to true; if the flag is false, no read or write has happened since the last idle event. In addition, if the outbound data recorded at the previous check differs from the outbound data seen at the current check, the data is going out slowly and there is no need to trigger an idle event.

In short, this field deals with the extreme case where the client receives data so slowly that a single write takes longer than the idle time. For that reason, Netty disables it by default.

    • 3. The three timed task classes inside this class

The three timed tasks correspond to read idle, write idle, and read-or-write (all) idle events respectively. They share a parent class, which provides a template method:

When the channel is closed, the task is not performed. Otherwise, the run method of the subclass is executed.
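The template-method idea described above can be sketched with plain JDK classes. This is a simplified model, not Netty's code: FakeChannel, AbstractIdleTaskSketch, RecordingIdleTask, and doRun are hypothetical names introduced only for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a channel; only the open/closed flag matters here. */
final class FakeChannel {
    boolean open = true;
}

/** Template method: the final run() checks the channel, subclasses supply doRun(). */
abstract class AbstractIdleTaskSketch implements Runnable {
    private final FakeChannel channel;

    AbstractIdleTaskSketch(FakeChannel channel) {
        this.channel = channel;
    }

    @Override
    public final void run() {
        if (!channel.open) {
            return; // channel closed: skip the task entirely
        }
        doRun(); // channel open: delegate to the concrete task
    }

    protected abstract void doRun();
}

/** Example subclass that only records that it was executed. */
final class RecordingIdleTask extends AbstractIdleTaskSketch {
    final List<String> log = new ArrayList<>();

    RecordingIdleTask(FakeChannel channel) {
        super(channel);
    }

    @Override
    protected void doRun() {
        log.add("idle check executed");
    }
}
```

Because run() is final, every concrete task gets the closed-channel guard without repeating it.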

1. The run method of the read idle task

The code is as follows:

    protected void run(ChannelHandlerContext ctx) {
        long nextDelay = readerIdleTimeNanos;
        if (!reading) {
            nextDelay -= ticksInNanos() - lastReadTime;
        }

        if (nextDelay <= 0) {
            // Reader is idle - set a new timeout and notify the callback.
            // Submit the task again; the returned future can be used to cancel it.
            readerIdleTimeout = schedule(ctx, this, readerIdleTimeNanos, TimeUnit.NANOSECONDS);

            boolean first = firstReaderIdleEvent;
            firstReaderIdleEvent = false;

            try {
                IdleStateEvent event = newIdleStateEvent(IdleState.READER_IDLE, first);
                // Hand the event to the user's handler
                channelIdle(ctx, event);
            } catch (Throwable t) {
                ctx.fireExceptionCaught(t);
            }
        } else {
            // Read occurred before the timeout - set a new timeout with shorter delay.
            readerIdleTimeout = schedule(ctx, this, nextDelay, TimeUnit.NANOSECONDS);
        }
    }

The method is simple:

    1. Get the timeout configured by the user.
    2. If no read is in progress (reading is false, which is set in channelReadComplete), subtract from the timeout the time elapsed since the last read (the current time minus lastReadTime, which is also set in channelReadComplete). If the result is less than or equal to 0, the event is triggered. Otherwise, the task is put back into the queue, with the remaining time as the new delay.
    3. The trigger logic is: first put the task back into the queue with the originally configured timeout; the call returns a future that can be used to cancel the task. Then set the first flag to false, meaning the next idle event is no longer the first one. This flag is set back to true in the channelRead method.
    4. Create an IdleStateEvent for the read idle state and pass it to the user's userEventTriggered method. This completes the triggering of the event.

In short, each read operation records a timestamp. When the scheduled task runs, it computes the interval between the current time and the last read time, and if that interval exceeds the configured timeout, the userEventTriggered method is invoked. It is that simple.
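The delay calculation summarized above can be isolated as a pure function. This is a minimal sketch assuming time is passed in explicitly as nanosecond values; ReaderIdleSketch and its method names are hypothetical and not part of Netty.

```java
final class ReaderIdleSketch {
    /**
     * Remaining delay in nanoseconds before the connection counts as read idle.
     * A result <= 0 means the idle event should fire now.
     */
    static long nextDelay(long readerIdleTimeNanos, boolean reading,
                          long nowNanos, long lastReadTimeNanos) {
        long nextDelay = readerIdleTimeNanos;
        if (!reading) {
            // Only subtract the elapsed time when no read is in progress.
            nextDelay -= nowNanos - lastReadTimeNanos;
        }
        return nextDelay;
    }

    /** True when the read idle event should be triggered. */
    static boolean isIdle(long readerIdleTimeNanos, boolean reading,
                          long nowNanos, long lastReadTimeNanos) {
        return nextDelay(readerIdleTimeNanos, reading, nowNanos, lastReadTimeNanos) <= 0;
    }
}
```

With a 5-second timeout and the last read at time 0, a check at 2 seconds leaves 3 seconds of delay, while a check at 6 seconds reports the connection as idle. While a read is still in progress, the full timeout is always returned.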

Next, look at the write idle task.

2. The run method of the write idle task

The logic of the write task is basically the same as that of the read task. The only difference is an additional check for slow outbound data:

    if (hasOutputChanged(ctx, first)) {
        return;
    }

If this method returns true, the event is not triggered, even though the time is up. Here is the implementation of the method:

    private boolean hasOutputChanged(ChannelHandlerContext ctx, boolean first) {
        if (observeOutput) {
            // If the last write time differs from the previously recorded one,
            // a write has happened, so update the recorded value
            if (lastChangeCheckTimeStamp != lastWriteTime) {
                lastChangeCheckTimeStamp = lastWriteTime;

                // If the write happened in the gap between two calls of this method,
                // the event is still not triggered
                if (!first) { // #firstWriterIdleEvent or #firstAllIdleEvent
                    return true;
                }
            }

            Channel channel = ctx.channel();
            Unsafe unsafe = channel.unsafe();
            ChannelOutboundBuffer buf = unsafe.outboundBuffer();

            // If there is an outbound buffer
            if (buf != null) {
                // Identity hash code of the current message in the outbound buffer
                int messageHashCode = System.identityHashCode(buf.current());
                // Total number of bytes pending in the buffer
                long pendingWriteBytes = buf.totalPendingWriteBytes();

                // If either value differs from the previous record, the output has changed:
                // refresh the "last message reference" and the "pending bytes" records
                if (messageHashCode != lastMessageHashCode || pendingWriteBytes != lastPendingWriteBytes) {
                    lastMessageHashCode = messageHashCode;
                    lastPendingWriteBytes = pendingWriteBytes;

                    // Not the first idle event: the write is slow, so do not trigger an idle event
                    if (!first) {
                        return true;
                    }
                }
            }
        }
        return false;
    }

The code above carries some comments; here is the logic step by step:

    1. If the user has not asked to observe outbound data, return false and continue with triggering the event.
    2. Otherwise, continue. If the last write time differs from the previously recorded time, a write has just completed, so the record is updated. The first flag still has to be checked: if it is false, the write completed in the gap between two calls of this method, and the event is not triggered.
    3. If neither condition applies, fetch the outbound buffer. If there is no buffer, no slow write is in progress and the idle event is triggered. Otherwise, read the identity hash code of the current message and the number of pending bytes and compare them with the previous records. If either differs, the data has changed, which means data is being written out slowly. The two records are then updated for the next check.
    4. Check first again: if it is false, this is not the first idle event, so there is no need to trigger it.
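The four steps above can be modeled without Netty by passing the observed values in as parameters. This is a simplified sketch under that assumption: OutputChangeSketch and the parameters hasBuffer, messageHashCode, and pendingWriteBytes are hypothetical stand-ins for what the real method reads from the channel's outbound buffer.

```java
final class OutputChangeSketch {
    private final boolean observeOutput;
    private long lastChangeCheckTimeStamp;
    private int lastMessageHashCode;
    private long lastPendingWriteBytes;

    OutputChangeSketch(boolean observeOutput) {
        this.observeOutput = observeOutput;
    }

    /**
     * Returns true when the output changed since the previous check and the
     * idle event should therefore be suppressed.
     *
     * @param first             true if this would be the first idle event since the last activity
     * @param lastWriteTime     timestamp of the most recent completed write
     * @param hasBuffer         whether an outbound buffer exists
     * @param messageHashCode   identity hash of the message at the head of the buffer
     * @param pendingWriteBytes bytes still waiting in the buffer
     */
    boolean hasOutputChanged(boolean first, long lastWriteTime, boolean hasBuffer,
                             int messageHashCode, long pendingWriteBytes) {
        if (!observeOutput) {
            return false; // step 1: observation disabled, let the event fire
        }
        if (lastChangeCheckTimeStamp != lastWriteTime) {
            lastChangeCheckTimeStamp = lastWriteTime; // step 2: a write completed
            if (!first) {
                return true;
            }
        }
        if (hasBuffer
                && (messageHashCode != lastMessageHashCode
                    || pendingWriteBytes != lastPendingWriteBytes)) {
            lastMessageHashCode = messageHashCode; // step 3: buffer contents moved
            lastPendingWriteBytes = pendingWriteBytes;
            if (!first) {
                return true; // step 4: slow write, not idle
            }
        }
        return false;
    }
}
```

In this model, a call with first set to true always lets the event through while still refreshing the records. A later call with first set to false suppresses the event only if the pending byte count, the head message, or the last write time changed in between.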

Here is a question: why must the event be triggered the first time? Suppose the client has become very slow. When the timed task fires and reaches this check, the outbound buffer data already differs from what was last recorded, yet the event is still triggered. Why?

This is probably a deliberate consideration in Netty. If writes really are extremely slow, the likely outcome is an OOM, which is far more serious than an idle connection. So why trigger the event the first time? If it were not triggered, the user would have no idea what is going on. When a write idle event is triggered and an OOM follows, the user can infer that writes may be too slow and that later data could not be written out, which caused the OOM. A warning at this point is therefore still necessary.

Of course, this is only my guess. If needed, you can open an issue in the Netty project.

That concludes the special handling of slow writes to the client. Next, look at the logic of the remaining task.

3. The run method of the all idle task

This class is called AllIdleTimeoutTask, which means it monitors all events: both reads and writes are taken into account. The code logic is basically the same as for the write idle task, except here:

    long nextDelay = allIdleTimeNanos;
    if (!reading) {
        // Subtract the time elapsed since the last read or write;
        // if the result is <= 0, the timeout has expired
        nextDelay -= ticksInNanos() - Math.max(lastReadTime, lastWriteTime);
    }

The time calculation takes the later of the last read time and the last write time. Then, as in the write idle task, it checks whether a slow write is in progress. Finally, the ctx.fireUserEventTriggered(evt) method is called.
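As a worked example of that calculation, here is a minimal sketch with the clock passed in explicitly. AllIdleSketch and its parameter names are hypothetical; only the formula mirrors the snippet above.

```java
final class AllIdleSketch {
    /**
     * Remaining delay in nanoseconds before the connection counts as idle in
     * both directions. A result <= 0 means the all idle event is due.
     */
    static long nextDelay(long allIdleTimeNanos, boolean reading, long nowNanos,
                          long lastReadTimeNanos, long lastWriteTimeNanos) {
        long nextDelay = allIdleTimeNanos;
        if (!reading) {
            // The most recent activity, read or write, resets the idle clock.
            long lastActivity = Math.max(lastReadTimeNanos, lastWriteTimeNanos);
            nextDelay -= nowNanos - lastActivity;
        }
        return nextDelay;
    }
}
```

With a timeout of 30 time units, a last read at 0 and a last write at 20, a check at time 40 leaves 10 units, because only the write at 20 counts. A check at time 55 yields -5, so the event is due.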

This is usually the most commonly used task. The constructor call generally looks like this:

    pipeline.addLast(new IdleStateHandler(0, 0, 30, TimeUnit.SECONDS));

The read and write values are 0, which disables them. The value 30 means that the event is triggered if no read or write occurs for 30 seconds. Note that when more than one of the values is non-zero, the corresponding tasks overlap.

Summary

IdleStateHandler can be used to implement a heartbeat: when the server and client have had no read or write interaction for longer than the given time, it triggers the userEventTriggered method of the user's handler. In that method, users can try to send a message to the peer and close the connection if the send fails.

The implementation of IdleStateHandler is based on EventLoop timed tasks. Each read and write records a timestamp, and when the timed task runs, it determines whether the connection is idle from the current time, the configured timeout, and the time of the last event.

Internally there are three timed tasks, corresponding to read events, write events, and read-or-write events. It is usually sufficient for the user to listen for read-or-write events.

IdleStateHandler also considers an extreme situation: the client receives data slowly, and receiving a single piece of data takes longer than the configured idle time. The observeOutput constructor parameter determines whether Netty inspects the outbound buffer to detect this.

If outbound data is slow, Netty does not consider the connection idle and does not trigger idle events. The first event, however, is always triggered, because at that point it is impossible to tell whether the outbound path is slow or idle. And if the outbound path really is slow, the OOM is a bigger problem than the idle event.

So when your application suffers from memory overflow or OOM while write idle events rarely occur (with observeOutput set to true), you should suspect that outbound data is moving too slowly.

observeOutput is false by default, meaning that even if your application's outbound data is slow, Netty treats the connection as write idle.

The role of observeOutput is therefore not that important. If outbound data really is slow, whether the connection counts as idle hardly matters; what matters is the OOM. That is why Netty chose false as the default.

One more note: ReadTimeoutHandler, mentioned at the beginning, extends IdleStateHandler. When the read idle event is triggered, it calls ctx.fireExceptionCaught with a ReadTimeoutException and then closes the socket.

WriteTimeoutHandler is not based on IdleStateHandler. Its principle is that each call to write creates a timed task, which decides whether the write has timed out based on the completion state of the promise passed in. When the timed task runs at the specified time and finds that the promise's isDone method returns false, the write has not finished, so it has timed out and an exception is thrown. When the write finishes, the timed task is cancelled.

That covers Netty's built-in heartbeat classes. These features are critical for developing a stable, high-performance RPC framework.
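The write-timeout principle described above can be sketched with JDK classes only. This is a model under stated assumptions, not Netty's implementation: CompletableFuture stands in for the write promise, a ScheduledExecutorService stands in for the EventLoop, and WriteTimeoutSketch.watch and the onTimeout callback are hypothetical names.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

final class WriteTimeoutSketch {
    /**
     * Schedules a timeout check for one write.
     *
     * @param timer       scheduler standing in for the EventLoop
     * @param writeFuture completes when the write finishes (stand-in for the promise)
     * @param onTimeout   invoked if the write is still pending when the timer fires
     * @return the scheduled check, cancelled automatically once the write completes
     */
    static ScheduledFuture<?> watch(ScheduledExecutorService timer,
                                    CompletableFuture<Void> writeFuture,
                                    long timeout, TimeUnit unit,
                                    Runnable onTimeout) {
        ScheduledFuture<?> task = timer.schedule(() -> {
            // The timer fired: the write timed out only if it is still not done.
            if (!writeFuture.isDone()) {
                onTimeout.run();
            }
        }, timeout, unit);

        // When the write finishes (normally or exceptionally), drop the pending check.
        writeFuture.whenComplete((result, error) -> task.cancel(false));
        return task;
    }
}
```

A caller would pass a callback that raises the timeout exception and closes the connection. The isDone check inside the task guards against the race where the write completes just as the timer fires.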

Good luck!!!
