In a serious network program, an application-level heartbeat protocol is indispensable: heartbeat messages should be used to judge whether the peer process is still working properly, and "kicking idle connections" is only a stopgap. What I really want to illustrate here is the use of shared_ptr and weak_ptr.
The task: if a connection receives no data for a certain period (say 8 seconds), disconnect it. There are two simple, brute-force ways to do this:
1. Each connection records lastReceiveTime, the time data last arrived, and a single repeating timer traverses all connections every second, disconnecting any connection with now - connection.lastReceiveTime > 8s. This needs only one repeating timer globally, but every connection must be checked on every tick; with a large number of connections (tens of thousands), this step can be time-consuming. (A sketch of this sweep follows the two options.)
2. Set a one-shot timer for each connection with a timeout of 8s, and disconnect the connection when the timer fires. Of course, the timer must be reset every time data is received. This approach needs many one-shot timers that are updated frequently; with a large number of connections, it puts pressure on the reactor's timer queue.
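For concreteness, here is a minimal toy sketch of the first option. All names in it (Conn, g_connections, kIdleSeconds, sweepIdleConnections) are purely illustrative and not part of the muduo example:

// Toy sketch of option 1: a single repeating timer sweeps every connection
// once per second. Names here are illustrative only.
#include <ctime>
#include <list>

struct Conn
{
  std::time_t lastReceiveTime;   // refreshed whenever data arrives
  void shutdown() { /* close the connection */ }
};

const int kIdleSeconds = 8;
std::list<Conn*> g_connections;  // every live connection

// Called once per second by the single repeating timer.
void sweepIdleConnections()
{
  const std::time_t now = std::time(NULL);
  for (std::list<Conn*>::iterator it = g_connections.begin();
       it != g_connections.end(); ++it)
  {
    if (now - (*it)->lastReceiveTime > kIdleSeconds)
    {
      (*it)->shutdown();         // kick the idle connection
    }
  }
  // The sweep costs O(N) on every tick, which is the drawback noted above.
}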
A timing wheel avoids the drawbacks of both methods above. Timing wheel can be translated as "time wheel" or "dial".
Connection timeouts do not require precise timing: as long as a connection is disconnected roughly 8 seconds after it goes idle, being off by less than a second does not matter. So connection timeouts can be handled with a simple data structure: a circular queue of eight buckets. The first bucket holds the connections that will time out within one second, the second holds those that will time out within two seconds, and so on. Whenever a connection receives data, it puts itself into the eighth bucket; then, once per second, a callback disconnects every connection in the first bucket and moves that (now empty) bucket to the tail of the queue. This way a connection is closed after eight seconds without data, and more importantly, there is no need to check every connection each time: only the connections in the first bucket are examined, so the work is spread out.
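Before looking at the muduo code, here is a toy, muduo-free illustration of this bucket scheme. ConnId and the function names are placeholders, and the linear "move" in onData is deliberately naive; the muduo example later in this section avoids it by using reference counting instead.

// Toy illustration of the eight-bucket circular queue described above
// (placeholder types; not the muduo implementation shown later).
#include <deque>
#include <set>
#include <string>

typedef std::string ConnId;          // stand-in for a connection handle
typedef std::set<ConnId> Bucket;

std::deque<Bucket> g_wheel(8);       // front = expires next, back = newest

// Whenever data arrives on a connection, "move" it into the newest bucket.
void onData(const ConnId& conn)
{
  for (std::deque<Bucket>::iterator b = g_wheel.begin(); b != g_wheel.end(); ++b)
  {
    b->erase(conn);                  // drop any older registration
  }
  g_wheel.back().insert(conn);
}

// Called once per second: everything still in the oldest bucket has been
// silent for about eight seconds, so disconnect it, then recycle the bucket.
void onTick()
{
  const Bucket& expiring = g_wheel.front();
  for (Bucket::const_iterator it = expiring.begin(); it != expiring.end(); ++it)
  {
    // disconnect(*it);              // close the idle connection here
  }
  g_wheel.pop_front();
  g_wheel.push_back(Bucket());
}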
Timing wheel principle
The paper "Hashed and hierarchical timing wheels: efficient data structures for implementing a timer facility" compares data structures for implementing timers and proposes new structures such as the hierarchical timing wheel and the hashed timing wheel. Given the characteristics of the problem addressed here, we do not need a general-purpose timer; a simple timing wheel is enough.
The basic structure of a simple timing wheel is a circular queue plus a pointer (tail) to the tail of the queue. The pointer advances one slot per second, like the hour hand on a clock face, which is how the timing wheel gets its name.
The following figure shows the state of the timing wheel at some moment. The number in each slot is a countdown (the opposite of a conventional timing wheel), indicating the remaining life of the connections in that slot (bucket).
One second later, the tail pointer has moved forward one slot; the slot at four o'clock is cleared, and the connections in it are disconnected.
Disconnection timeout
Suppose that at some point conn 1 arrives and is put into the current slot; its remaining life is 7 seconds. After that, no data is received on conn 1.
One second later, tail points to the next slot, and conn 1's remaining life is 6 seconds.
A few seconds later, tail points to the slot just before conn 1's, and conn 1 is about to be disconnected.
In the next second, tail points to conn 1's original slot, that slot is cleared, and conn 1 is disconnected.
Connection refresh
If data is received on conn 1 before it is disconnected, conn 1 is moved into the current slot.
Having received data, conn 1's life is extended to 7 seconds.
As time goes on, conn 1 lives longer than it did in the previous scenario.
Multiple connections
Each slot in the timing wheel is a hash set that can hold more than one connection.
For example, conn 1 arrives first.
Then conn 2 arrives before tail has moved, so the two connections are in the same slot and have the same remaining life. (The figure draws them as a linked list; the code uses a hash table.)
A few seconds later, conn 1 receives data while conn 2 receives nothing, so conn 1 is moved into the current slot. conn 1 now has a longer life than conn 2.
Code implementation and improvement
We use the EchoServer that has appeared several times before to show how to implement a timing wheel. For the full code, see http://code.google.com/p/muduo/source/browse/trunk/examples/idleconnection
In the concrete implementation, the slots hold not connections but a dedicated Entry struct. Each Entry contains a weak_ptr to a TcpConnection; the Entry destructor checks (through the weak_ptr) whether the connection still exists and, if so, closes it.
Data structure:
typedef boost::weak_ptr<muduo::net::TcpConnection> WeakTcpConnectionPtr;

struct Entry : public muduo::copyable
{
  explicit Entry(const WeakTcpConnectionPtr& weakConn)
    : weakConn_(weakConn)
  {
  }

  ~Entry()
  {
    // Runs when the last EntryPtr goes away, i.e. the connection no longer
    // appears in any bucket: shut it down if it still exists.
    muduo::net::TcpConnectionPtr conn = weakConn_.lock();
    if (conn)
    {
      conn->shutdown();
    }
  }

  WeakTcpConnectionPtr weakConn_;
};

typedef boost::shared_ptr<Entry> EntryPtr;
typedef boost::weak_ptr<Entry> WeakEntryPtr;
typedef boost::unordered_set<EntryPtr> Bucket;
typedef boost::circular_buffer<Bucket> WeakConnectionList;
In the implementation, for simplicity, we do not actually move a connection from one slot to another; instead we manage the Entry with shared_ptr and rely on reference counting. Whenever data is received on a connection, the corresponding EntryPtr is put into the current (newest) bucket, which raises its reference count. When the Entry's reference count drops to zero, it no longer appears in any slot, which means the connection has timed out, and the Entry destructor disconnects it.
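Concretely, the refresh step could look roughly like the sketch below. It assumes the connectionBuckets_ circular buffer member that appears in the constructor shown next, and that onConnection (discussed below) has stored a WeakEntryPtr in the connection's context; the exact code in the linked example may differ in details such as logging.

void EchoServer::onMessage(const muduo::net::TcpConnectionPtr& conn,
                           muduo::net::Buffer* buf,
                           muduo::Timestamp time)
{
  muduo::string msg(buf->retrieveAsString());
  conn->send(msg);  // echo back, as in the plain EchoServer

  // Promote the weak reference saved by onConnection in the connection's
  // context. Inserting the resulting EntryPtr into the newest bucket bumps
  // the Entry's reference count, extending the connection's life.
  WeakEntryPtr weakEntry(boost::any_cast<WeakEntryPtr>(conn->getContext()));
  EntryPtr entry(weakEntry.lock());
  if (entry)
  {
    connectionBuckets_.back().insert(entry);
  }
}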
The timing wheel itself is implemented with boost::circular_buffer; each Bucket element is a hash set of EntryPtr.
In the constructor, register a once-per-second callback with EventLoop::runEvery() to call EchoServer::onTimer(), and set the timing wheel to the appropriate size.
EchoServer::EchoServer(EventLoop* loop,
                       const InetAddress& listenAddr,
                       int idleSeconds)
  : loop_(loop),
    server_(loop, listenAddr, "EchoServer"),
    connectionBuckets_(idleSeconds)
{
  server_.setConnectionCallback(
      boost::bind(&EchoServer::onConnection, this, _1));
  server_.setMessageCallback(
      boost::bind(&EchoServer::onMessage, this, _1, _2, _3));
  loop->runEvery(1.0, boost::bind(&EchoServer::onTimer, this));
  connectionBuckets_.resize(idleSeconds);
}
The implementation of EchoServer::onTimer() is just one line: push an empty Bucket onto the back of the queue, which makes circular_buffer automatically pop the Bucket at the front and destroy it. As that Bucket is destroyed, it destructs its EntryPtr elements one by one, so we never have to worry about the Entry reference counts ourselves; C++ value semantics take care of everything.
void EchoServer::onTimer()
{
  // Pushing an empty bucket pops (and destroys) the oldest one at the front.
  connectionBuckets_.push_back(Bucket());
}
When a connection is established, create an Entry object and put it at the tail of the timing wheel. We also need to save a weak reference to the Entry in the TcpConnection's context, because the Entry is needed again when data is received. (Thinking question: what would happen if TcpConnection::setContext stored a strongly referenced EntryPtr instead?)
void EchoServer::onConnection(const TcpConnectionPtr& conn)
{
  LOG_INFO << "EchoServer - " << conn->peerAddress().toHostPort() << " -> "
           << conn->localAddress().toHostPort() << " is