The Mobile IM Development Guide series covers all aspects of building an IM app, including technology selection, login optimization, and more. In addition, the author draws on many years of experience developing the iOS IM SDK at NetEase Cloud to analyze in depth the common problems encountered in real development.
Recommended Reading
Mobile IM Development Guide 1: How to choose a technology
Mobile IM Development Guide 3: How to optimize login modules
What is the heartbeat command?
IM services built on long-lived TCP connections almost always involve a heartbeat. A heartbeat generally means that one end (in most cases the client) sends a custom command to its peer at a fixed interval to determine whether both sides are still alive. Because it is sent at regular intervals, much like a heartbeat, it is called the heartbeat command.
Why implement a heartbeat at the application layer?
So why do we need a heartbeat at the application layer? Isn't TCP a reliable connection? Can't we rely on TCP itself to detect disconnection, for example by using TCP's KeepAlive mechanism? Is the application-layer heartbeat really the current best practice? And what kind of heartbeat is best practice?
You may never have thought about these questions before; after all, it is just a simple heartbeat!
For the client, the biggest motivation for using a long-lived TCP connection is that, as long as the current connection is usable, each request is simply a matter of sending and receiving data, with no DNS resolution, connection establishment, and so on. This greatly speeds up requests and also makes it easy to receive real-time messages pushed by the server.
But this holds only if the connection is usable. If the connection is not maintained well, every request becomes a matter of luck. With good luck, the request is sent over the long connection and a response comes back. With bad luck, the current connection is already dead, the request gets no response until it times out, and a new connection still has to be established afterward, which makes it even less efficient than plain HTTP. Keeping a connection therefore requires detecting whether it is usable, and proactively discarding it and establishing a new one when it is not.
Given this premise, there must be a mechanism for detecting connection availability. At the same time, the particularities of mobile networks require the client to send some signaling during idle periods so that the connection is not reclaimed by the carrier's network (for example, by a NAT gateway that times out idle connections).
For the server, knowing connection availability in a timely manner is just as important. On the one hand, the server needs to clean up dead connections to reduce load. On the other hand, the business logic may need it: for example, the server behind a game instance has to handle players dropping offline.
The above explains why keeping the connection alive matters; now back to the concrete implementation. Why use an application-layer heartbeat for detection instead of relying directly on TCP's own features?
We know that TCP is a connection-oriented protocol whose connection state is maintained by a state machine. Once the connection is set up, both sides are in the ESTABLISHED state, and that state does not change on its own. This means that if the upper layer makes no calls and leaves the TCP connection idle, the connection stays "connected" even though no data flows, whether for a day, a week, or even a month, and even if the intermediate routers crash and restart countless times in the meantime. Here is an example we often run into in practice: we SSH into our own VPS and then accidentally unplug the network cable. TCP does not detect this network change, and when we plug the cable back in, the SSH session still works, with no TCP reconnection having taken place.
Some will ask: doesn't TCP have a KeepAlive mechanism, and can't that do the job? In fact, TCP KeepAlive is not really suited to this. Once KeepAlive is enabled, the TCP layer sends KeepAlive probes after a set idle time to determine whether the connection is usable. The typical default is a 7200 s idle time before the first probe, followed by up to 10 retries on failure, each with its own timeout (on Linux, for instance, the defaults are 7200 s of idle time, 9 probes, and 75 s between probes). Clearly the defaults do not meet our needs, but would tuning the settings fix that? The answer is still no. TCP KeepAlive detects whether the connection is dead or alive, whereas a heartbeat does one more thing: it detects whether the communicating parties themselves are alive. The two may sound alike, but they are quite different. Consider a server that is overloaded for some reason, with the CPU at 100%, and unable to respond to any business request. A TCP probe will still report the connection as healthy. This is the typical case where the connection is alive but the service provider is dead. For the client, the best option at this point is to disconnect and reconnect to another server, rather than keep believing the current server is available and keep sending it requests that are bound to fail.
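For reference, this is roughly what tuning KeepAlive on a single socket looks like. It is only a sketch: the helper name `enable_keepalive` and the timer values are illustrative assumptions, and the per-socket tuning constants are platform-specific, so each one is applied only if the running Python build defines it.

```python
import socket

def enable_keepalive(sock, idle_s=60, interval_s=10, probes=3):
    """Turn on TCP keepalive and tune its timers where the platform allows it.

    The helper name and the default values are illustrative assumptions.
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    applied = {}
    # These three options are not available on every platform, so each one
    # is set only if the socket module exposes the constant.
    for name, value in (("TCP_KEEPIDLE", idle_s),
                        ("TCP_KEEPINTVL", interval_s),
                        ("TCP_KEEPCNT", probes)):
        opt = getattr(socket, name, None)
        if opt is not None:
            sock.setsockopt(socket.IPPROTO_TCP, opt, value)
            applied[name] = value
    return applied

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
applied = enable_keepalive(sock)
keepalive_on = sock.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0
sock.close()
```

Even with shorter timers, the probes are answered by the peer's TCP stack rather than by the application, which is why they cannot tell an overloaded server from a healthy one.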
As we can see from the above, KeepAlive is not suited to detecting whether both parties are alive; that scenario depends on an application-layer heartbeat. An application-layer heartbeat also offers more flexibility: it can control the timing, interval, and handling of each check, and can even carry additional information in the heartbeat packet. From this perspective, the application-layer heartbeat is indeed a best practice.
How do you implement a heartbeat command?
From the above we can conclude that the application-layer heartbeat is currently the best practice for checking whether a connection is valid and whether both parties are alive. The remaining question is how to implement it.
The simplest, most brute-force approach is a fixed-interval heartbeat: for example, send a heartbeat every 30 seconds, and if no heartbeat reply arrives within 15 seconds, treat the current connection as dead, disconnect, and reconnect. This approach is the most direct and the easiest to implement. Its only problem is that it consumes comparatively more power and data traffic. With a 5-byte protocol packet and 2,880 heartbeats a day, counting both the request and the reply, one month comes to 5 * 2 * 2880 * 30 bytes, or roughly 0.8 MB of traffic. If the phone runs a few more IM apps, heartbeats alone cost several megabytes of traffic each month, not to mention the battery drain caused by such frequent heartbeats.
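The traffic estimate can be reproduced with a few lines of arithmetic. This is a back-of-the-envelope sketch: the function name and the 30-day month are assumptions, and like the figure in the text it counts only the 5-byte payloads, ignoring TCP/IP header overhead.

```python
def monthly_heartbeat_bytes(packet_bytes=5, interval_s=30, days=30):
    """Estimate payload bytes spent on heartbeats over a month.

    Counts one request and one reply per heartbeat; protocol headers
    below the application layer are deliberately ignored.
    """
    beats_per_day = 24 * 60 * 60 // interval_s   # 2880 at a 30 s interval
    return packet_bytes * 2 * beats_per_day * days

total = monthly_heartbeat_bytes()
print(total, round(total / 2**20, 2))            # 864000 0.82

# Stretching the interval to 5 minutes cuts the payload traffic tenfold.
print(monthly_heartbeat_bytes(interval_s=300))   # 86400
```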
Since frequent heartbeats cost power and traffic, the direction for improvement is to reduce the heartbeat frequency without hurting the timeliness of connection detection too much. To that end, the heartbeat interval can be adjusted according to the app's state. When the app is in the background (this mainly concerns Android), lengthen the heartbeat interval as much as possible; 5 minutes or even 10 minutes is acceptable. When the app is in the foreground, follow the original rules. The reliability check can also be relaxed: to avoid declaring the connection dead after a single heartbeat timeout, accumulate errors and only decide the connection is unusable after the heartbeat has timed out n times. There are also a few small tricks. For example, the heartbeat timer can be counted from the last command packet received instead of running on a fixed schedule, which also reduces the number of heartbeats to some extent.
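The three refinements above (a longer interval in the background, error accumulation, and timing from the last received packet) can be sketched as a small piece of pure scheduling logic. Everything here is a hypothetical illustration: the class name `HeartbeatMonitor`, its parameters, and the choice to resend immediately after a timeout are assumptions, and the caller is expected to do the actual network I/O.

```python
class HeartbeatMonitor:
    """Decides when to send a heartbeat or give up; does no I/O itself."""

    def __init__(self, fg_interval=30.0, bg_interval=300.0,
                 reply_timeout=15.0, max_failures=3, now=0.0):
        self.fg_interval = fg_interval
        self.bg_interval = bg_interval
        self.reply_timeout = reply_timeout
        self.max_failures = max_failures
        self.in_background = False
        self.failures = 0
        self.last_received = now    # when any packet last arrived
        self.ping_sent_at = None    # send time of the unanswered heartbeat

    def interval(self):
        return self.bg_interval if self.in_background else self.fg_interval

    def on_packet_received(self, now):
        # Any inbound packet proves the peer is alive, so it restarts the
        # heartbeat timer and clears the accumulated errors.
        self.last_received = now
        self.ping_sent_at = None
        self.failures = 0

    def poll(self, now):
        """Return 'send', 'reconnect', or 'wait' for the given time."""
        if self.ping_sent_at is not None:
            if now - self.ping_sent_at < self.reply_timeout:
                return "wait"
            self.failures += 1
            self.ping_sent_at = None
            if self.failures >= self.max_failures:
                return "reconnect"
            self.ping_sent_at = now     # assumption: retry right away
            return "send"
        if now - self.last_received >= self.interval():
            self.ping_sent_at = now
            return "send"
        return "wait"

monitor = HeartbeatMonitor(max_failures=2)
monitor.on_packet_received(20)          # a business packet arrives at t=20
actions = [monitor.poll(t) for t in (30, 50, 60, 65, 80)]
print(actions)  # ['wait', 'send', 'wait', 'send', 'reconnect']
```

With `max_failures=1` and equal foreground and background intervals, this reduces to the plain fixed-interval heartbeat. After a `'reconnect'` result the caller would tear down the connection and start over with a fresh monitor.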
The above is NetEase Yunxin's understanding and practice of the heartbeat command. The third article in the "Mobile IM Development Guide" series will cover how to optimize the login module, so stay tuned.
Mobile IM Development Guide 2: The Heartbeat Command