C++ Server Design (IV): Timeout Management Mechanism Design


The first four chapters introduced the design of the system layer. Starting with this chapter, we move on to the design of the service layer.

Connection Disconnect

In a common server scenario, the server closes client connections passively. That is, once the client has finished requesting service from the server, the client actively closes its connection to the server. From the server's point of view, the read system call on the client's connection socket returns 0, which triggers the shutdown logic, so the server closes the connection passively.
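The passive-close trigger described above can be illustrated with a small sketch. It assumes a POSIX system; the names `ReadResult`, `read_once`, and `demo_passive_close` are hypothetical, and a `socketpair` stands in for a real client/server connection.

```cpp
#include <sys/socket.h>
#include <unistd.h>

// Outcome of a single read attempt on a connection socket.
enum class ReadResult { Data, PeerClosed, Error };

// Reads once from fd. A return value of 0 from read() means the peer has
// closed its end, which is what triggers the passive-close path.
// (A real server would also treat EINTR/EAGAIN separately from hard errors.)
ReadResult read_once(int fd) {
    char buf[4096];
    ssize_t n = ::read(fd, buf, sizeof(buf));
    if (n > 0) return ReadResult::Data;
    if (n == 0) return ReadResult::PeerClosed;
    return ReadResult::Error;
}

// Hypothetical demo: one end of a socketpair plays the client,
// the other end plays the server.
ReadResult demo_passive_close() {
    int fds[2];
    if (::socketpair(AF_UNIX, SOCK_STREAM, 0, fds) != 0) return ReadResult::Error;
    ::close(fds[0]);                    // the "client" closes actively
    ReadResult r = read_once(fds[1]);   // the "server" observes end-of-file
    ::close(fds[1]);                    // the server releases its descriptor
    return r;
}
```

The server only learns about the close when it actually reads from the socket, which is why a silently vanished peer goes unnoticed.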

However, in some scenarios the client has actually disconnected, but the server fails to detect in time that the connection it still maintains is already broken. Because the server only closes connections passively, it does not release the resources tied to such a connection on its own initiative. These unreleased resources include file descriptors, memory, and other scarce system resources. If a large number of such zombie connections are not dealt with promptly, system resources will be exhausted, which can seriously degrade server performance and even crash the server.

In general, a broken connection shows up in the result of an I/O call: read returns 0 when the peer has closed the connection, and write fails with an error when the connection has been dropped. With long-lived connections, however, there may well be no data exchanged for a long time. In theory such a connection stays connected, but in practice it is hard to know whether an intermediate node has failed. Some firewalls even automatically drop a connection that has carried no data for a certain amount of time.

TCP itself provides the SO_KEEPALIVE option. By default, however, it sends its first probe only after the connection has been idle for 2 hours, which is far too slow for promptly detecting disconnections caused by a power failure, an unplugged cable, or a firewall. Because the option works inside the protocol stack, it also makes it difficult to handle disconnections at the application logic level.
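For reference, here is a minimal sketch of turning this option on with POSIX `setsockopt`. The function name `enable_keepalive` and its parameters are hypothetical. The `TCP_KEEPIDLE`, `TCP_KEEPINTVL`, and `TCP_KEEPCNT` tuning options are Linux-specific, so the sketch only uses them when they are defined.

```cpp
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

// Enables TCP keepalive on fd and, where the platform supports it, shortens
// the probe timing. All timing values are supplied by the caller; none of
// them come from the article.
bool enable_keepalive(int fd, int idle_secs, int interval_secs, int probes) {
    int on = 1;
    if (::setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof(on)) != 0) return false;
#if defined(TCP_KEEPIDLE) && defined(TCP_KEEPINTVL) && defined(TCP_KEEPCNT)
    // Idle time before the first probe, interval between probes, and the
    // number of unanswered probes before the connection is dropped.
    if (::setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE, &idle_secs, sizeof(idle_secs)) != 0) return false;
    if (::setsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL, &interval_secs, sizeof(interval_secs)) != 0) return false;
    if (::setsockopt(fd, IPPROTO_TCP, TCP_KEEPCNT, &probes, sizeof(probes)) != 0) return false;
#else
    (void)idle_secs; (void)interval_secs; (void)probes;  // no portable tuning available
#endif
    return true;
}
```

Even when tuned, keepalive only reports a failure through socket errors, so the application still has no logical-level notion of "this client stopped talking", which is what the rest of the chapter addresses.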

For all these reasons, we need to devise a mechanism that helps the server check whether a connection has been disconnected, proactively close connections to clients suspected of being disconnected, and release their resources.

Introduction to the heartbeat mechanism

Software commonly solves the connection-checking problem with a heartbeat mechanism. Typically, the client must send a packet to the server at regular intervals to tell the server that it is still online, optionally carrying some data that may be needed. The server, for its part, maintains a timing record for each connection and refreshes it whenever a valid message arrives on that connection. If the server receives no message on a connection for a certain period, it considers the connection inactive, forcibly closes it, and releases its resources.

There are typically two ways for the server to maintain the timing of each connection:

    • Each connection stores the time at which it last received data, and this time is refreshed whenever the connection receives data. A system-level timer then traverses all connections every second and checks each connection's last-received time. Once that time is older than the timeout period, the connection is considered timed out.
    • Set a timer for each connection with the timeout period, and reset the timer whenever data is received. Once a connection's timer fires, the connection is considered timed out.

In scenario one, there is only one timer, but every check must traverse all connections, and with a large number of connections each traversal is expensive. In addition, the thread that performs timeout detection may not be the thread that manages the connections: while the detection thread is reading a connection's last-received time, another thread may be modifying it. We therefore also need to protect the last-received time with a read-write lock.
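Scenario one could be sketched roughly as follows, assuming C++17. The class name `LastSeenTable`, the use of integer connection IDs, and passing the current time in as a parameter are all illustrative assumptions, not details from the article.

```cpp
#include <chrono>
#include <cstdint>
#include <mutex>
#include <shared_mutex>
#include <unordered_map>
#include <vector>

using Clock = std::chrono::steady_clock;

// Scenario one: store the last-received time per connection and scan them all.
class LastSeenTable {
public:
    explicit LastSeenTable(Clock::duration timeout) : timeout_(timeout) {}

    // Called by a connection-managing thread whenever data arrives (writer lock).
    void touch(std::uint64_t id, Clock::time_point now) {
        std::unique_lock<std::shared_mutex> lock(mutex_);
        last_seen_[id] = now;
    }

    // Called by the timer thread (reader lock). Visits every connection,
    // so each check costs O(n) in the number of connections.
    std::vector<std::uint64_t> scan(Clock::time_point now) const {
        std::shared_lock<std::shared_mutex> lock(mutex_);
        std::vector<std::uint64_t> expired;
        for (const auto& entry : last_seen_) {
            if (now - entry.second >= timeout_) expired.push_back(entry.first);
        }
        return expired;
    }

private:
    Clock::duration timeout_;
    mutable std::shared_mutex mutex_;  // the read-write lock the text calls for
    std::unordered_map<std::uint64_t, Clock::time_point> last_seen_;
};
```

The full traversal in `scan` is the cost that the timeout queue later in this chapter tries to avoid.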

In scenario two, a timer is created for each connection, and the timer's expiry must be reset every time data is received on that connection. This puts the timing management of each connection in the same thread as the management of the connection itself, which avoids locking overhead in a multithreaded environment. However, with a large number of connections and frequent timer updates, the reactor's timer mechanism may come under considerable pressure.
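As a rough illustration of scenario two, the sketch below uses an ordered `std::set` of deadlines as a hypothetical stand-in for a reactor's timer facility. The class `PerConnectionTimers` and its methods are assumptions made for this example. It is meant to be used from a single thread, so it takes no locks.

```cpp
#include <chrono>
#include <cstdint>
#include <set>
#include <unordered_map>
#include <utility>
#include <vector>

using Clock = std::chrono::steady_clock;

// Scenario two: one timer per connection, kept ordered by deadline.
class PerConnectionTimers {
public:
    explicit PerConnectionTimers(Clock::duration timeout) : timeout_(timeout) {}

    // Arms the timer of a new connection or re-arms an existing one.
    // Every received message pays this O(log n) erase-and-insert.
    void reset(std::uint64_t id, Clock::time_point now) {
        const Clock::time_point deadline = now + timeout_;
        auto it = deadline_of_.find(id);
        if (it != deadline_of_.end()) {
            timers_.erase(std::make_pair(it->second, id));  // drop the old timer
            it->second = deadline;
        } else {
            deadline_of_.emplace(id, deadline);
        }
        timers_.emplace(deadline, id);
    }

    // Fires every timer whose deadline has passed and returns the IDs of
    // the connections that timed out, earliest deadline first.
    std::vector<std::uint64_t> fire(Clock::time_point now) {
        std::vector<std::uint64_t> fired;
        while (!timers_.empty() && timers_.begin()->first <= now) {
            const std::uint64_t id = timers_.begin()->second;
            fired.push_back(id);
            deadline_of_.erase(id);
            timers_.erase(timers_.begin());
        }
        return fired;
    }

private:
    Clock::duration timeout_;
    std::set<std::pair<Clock::time_point, std::uint64_t>> timers_;
    std::unordered_map<std::uint64_t, Clock::time_point> deadline_of_;
};
```

Finding expired connections is cheap here, but the per-message `reset` cost is exactly the pressure on the timer mechanism that the paragraph above warns about.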

Neither scenario is difficult to implement, but both are rather crude: as the number of connections and updates grows, they may hurt server performance. We therefore need to optimize on the basis of these two scenarios.

Timeout Queue Design

In scenario one above, the largest overhead is traversing the receive times of all connections every time we check for timed-out connections, which has O(n) time complexity. We can optimize this with a strategy that lets the check find all timed-out connections in the shortest possible time.

Looking at the timeout period, we note that the system sets the same timeout for all connections. If no new data arrives, a connection that received data earlier must therefore time out before a connection that received data later. This resembles a FIFO queue, so we try managing timeouts with a queue.

Figure 4-1 Timeout queue

As shown in Figure 4-1, we create a first-in, first-out queue to manage connection timeouts. To keep the queue thread-safe, it must be locked every time a node is added or removed. Each queue node stores the time at which the corresponding connection last received a message. For now, we do not consider the case where a connection receives a new message and its last-received time must be updated.

Whenever a new connection is established, we first acquire the timeout queue lock, then create a new queue node and store the connection information and the current time in it. Finally, we add the node for the new connection to the head of the timeout queue. Suppose the timeout queue now holds nodes for connections 1 through 5. The receive times of the corresponding connections increase from one node to the next: connection 1 received data earlier than connection 2, connection 2 earlier than connection 3, and so on.

The system sets up a timer that periodically checks the timeout queue for expired connections. It first checks the node at the tail of the queue. If the receive time of that node's connection shows that it has not timed out, then none of the connections for the other nodes in the queue have timed out either, and the timer event ends immediately. If the connection for the tail node has timed out, the connection information is recorded and the node is removed from the tail of the queue. The system then checks whether the new tail node has timed out; if so, it is likewise recorded and removed, and this repeats until the new tail node has not timed out. Finally, the system sends a timeout notification to every connection found to have timed out in this pass and then forcibly disconnects those connections.

Because connection 1 received data earlier than connection 2, if connection 1 has not timed out, connection 2 obviously has not timed out either. If connection 1 has timed out, however, we cannot tell whether the subsequent connections have timed out, so we must go on to check connection 2, and so on, until we find a node in the timeout queue that has not timed out, at which point neither that node nor any node after it has timed out.
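The tail-first check described above can be sketched as follows. The class `FifoTimeoutQueue`, its method names, and the use of integer IDs to stand for connection information are illustrative assumptions. As in the text so far, receive times are never updated once a node is queued.

```cpp
#include <chrono>
#include <cstdint>
#include <list>
#include <mutex>
#include <vector>

using Clock = std::chrono::steady_clock;

// FIFO timeout queue: newest connection at the head, oldest at the tail.
class FifoTimeoutQueue {
    struct Node {
        std::uint64_t id;                  // stands in for the connection information
        Clock::time_point last_received;   // time the connection last received a message
    };

public:
    explicit FifoTimeoutQueue(Clock::duration timeout) : timeout_(timeout) {}

    // Called when a new connection is established: lock, then add at the head.
    void add(std::uint64_t id, Clock::time_point now) {
        std::lock_guard<std::mutex> lock(mutex_);
        nodes_.push_front(Node{id, now});
    }

    // Called by the periodic timer: pop from the tail until the tail node
    // has not timed out. Everything ahead of that node is newer still.
    std::vector<std::uint64_t> collect_expired(Clock::time_point now) {
        std::lock_guard<std::mutex> lock(mutex_);
        std::vector<std::uint64_t> expired;
        while (!nodes_.empty() && now - nodes_.back().last_received >= timeout_) {
            expired.push_back(nodes_.back().id);
            nodes_.pop_back();
        }
        return expired;
    }

private:
    Clock::duration timeout_;
    std::mutex mutex_;
    std::list<Node> nodes_;
};
```

The check now costs time proportional to the number of expired connections rather than to the total number of connections.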

At this point our timeout queue handles the case where receive times are never updated. We now add support for updating the receive time when a new message arrives. In the same scenario, when connection 3 receives a new heartbeat message, we need to reset the timing for that connection. The operation is simple: acquire the timeout queue lock, locate the node in the timeout queue, move it back to the head of the queue, and update the last-received time stored in the node. The connection for this node thereby becomes the most recently active of all connections managed by the timeout queue, and it will be the last to time out if nothing else changes.

One problem remains: how does a connection find its corresponding node in the timeout queue? Without some kind of mapping, finding a connection's node means traversing all the nodes in the timeout queue, and finding that node is a required step of every timeout update. If every connection has to traverse the timeout queue on every update, and the queue holds thousands of connections, this imposes a significant performance overhead on the whole system.

We can use the STL's std::list as the underlying implementation of the timeout queue. The list data structure guarantees that each node stays at the same memory address throughout its lifetime. So we can store, in each connection object, a pointer to its node in the timeout queue, and when we need to reach a connection's timeout node we simply dereference that pointer, eliminating the overhead of traversing the list.
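A minimal sketch of this idea follows. It stores a `std::list` iterator rather than a raw pointer, which is an assumption of this example; list iterators stay valid when other elements are inserted, erased, or relinked. The names `TimeoutNode`, `Connection`, `add_connection`, and `refresh` are hypothetical, and locking is omitted for brevity.

```cpp
#include <chrono>
#include <cstdint>
#include <list>

using Clock = std::chrono::steady_clock;

struct TimeoutNode {
    std::uint64_t id;
    Clock::time_point last_received;
};
using TimeoutList = std::list<TimeoutNode>;

// Hypothetical connection object that remembers where its node lives.
struct Connection {
    std::uint64_t id;
    TimeoutList::iterator node;
};

// Adds a node at the head of the queue and hands its position to the connection.
Connection add_connection(TimeoutList& queue, std::uint64_t id, Clock::time_point now) {
    queue.push_front(TimeoutNode{id, now});
    return Connection{id, queue.begin()};
}

// Updates the receive time and relinks the node at the head in O(1).
// splice moves the node without copying it, so conn.node remains valid.
void refresh(TimeoutList& queue, Connection& conn, Clock::time_point now) {
    conn.node->last_received = now;
    queue.splice(queue.begin(), queue, conn.node);
}
```

Note that `refresh` reaches directly into the queue's internals, which a real implementation would have to guard with the queue's lock.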

This approach carries a certain danger, however. The node the pointer refers to is internal data of the timeout queue, so handing the pointer to the connection object amounts to exposing the queue's internals, which can compromise the safety of the whole queue. For example, every access to the timeout queue in a multithreaded environment is supposed to take the lock to ensure thread safety, but the pointer now makes it possible to bypass the lock and read or even modify data inside the timeout queue.

In the system implementation, we designed a data structure called LinkedHashMap. It maintains the timeout queue and also provides a key-value mapping, so that the corresponding queue node can be obtained from a key.

Our LinkedHashMap is similar to a HashMap in that it provides a key-value mapping, but it also preserves the order in which key-value pairs were inserted. It therefore satisfies both the first-in, first-out requirement for the key-value data and the need to quickly look up a value by its key.

Internally, the LinkedHashMap consists of an STL std::list and a std::unordered_map. Whenever a new key-value pair is passed in, a node is created at the head of the list to hold the key-value data, and a mapping from the key to the iterator of that node is added to the unordered_map. The list thus implements a queue ordered by insertion, and the key mapping lets us quickly find the iterator of the corresponding node and, through it, the data in that node.

Figure 4-2 LinkedHashMap implementation of the timeout queue

In the system implementation, we assign a unique ID to each connection object and maintain a mapping from ID to connection object, through which the corresponding connection object is easy to obtain. In the timeout management implementation itself, we use the connection ID in place of the actual connection object.

As shown in Figure 4-2, we use the ID as the LinkedHashMap key and the pair of ID and last-received time of the corresponding connection as the value to build the LinkedHashMap structure described above. In the figure, we added connections 1 through 5 in turn, then updated the last-received time of connection 3, and connection 1 was removed because it timed out.

When we add a new connection, we simply take the connection's ID and last-received time and insert them into the LinkedHashMap, which places the new node at the head of the queue and records the mapping from the ID to the node. When we need to update the last-received time of a connection, we look up the node by ID, modify its last-received time, and relink the node at the head of the queue. When the system detects that a node has timed out, we identify the timed-out connection object from the ID stored in the node and remove that ID's data from both the list and the unordered_map. These insert, update, and delete operations on the LinkedHashMap can generally be completed in constant time.
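Putting these operations together, a LinkedHashMap-style timeout manager might look like the sketch below. The article does not show its actual implementation, so the class name `TimeoutManager`, the method names, and the single `touch` entry point for both insert and update are assumptions made for illustration.

```cpp
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <list>
#include <mutex>
#include <unordered_map>
#include <vector>

using Clock = std::chrono::steady_clock;
using ConnId = std::uint64_t;

class TimeoutManager {
    struct Node {
        ConnId id;
        Clock::time_point last_received;
    };

public:
    explicit TimeoutManager(Clock::duration timeout) : timeout_(timeout) {}

    // Inserts a new connection or refreshes an existing one (average O(1)).
    void touch(ConnId id, Clock::time_point now) {
        std::lock_guard<std::mutex> lock(mutex_);
        auto found = index_.find(id);
        if (found == index_.end()) {
            order_.push_front(Node{id, now});        // new node at the head
            index_.emplace(id, order_.begin());      // ID -> node iterator
        } else {
            found->second->last_received = now;
            order_.splice(order_.begin(), order_, found->second);  // relink at head
        }
    }

    // Removes a connection that closed normally; returns false if unknown.
    bool erase(ConnId id) {
        std::lock_guard<std::mutex> lock(mutex_);
        auto found = index_.find(id);
        if (found == index_.end()) return false;
        order_.erase(found->second);
        index_.erase(found);
        return true;
    }

    // Pops expired nodes from the tail and returns their IDs, oldest first.
    std::vector<ConnId> collect_expired(Clock::time_point now) {
        std::lock_guard<std::mutex> lock(mutex_);
        std::vector<ConnId> expired;
        while (!order_.empty() && now - order_.back().last_received >= timeout_) {
            expired.push_back(order_.back().id);
            index_.erase(order_.back().id);
            order_.pop_back();
        }
        return expired;
    }

    std::size_t size() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return order_.size();
    }

private:
    Clock::duration timeout_;
    mutable std::mutex mutex_;
    std::list<Node> order_;                                       // head = newest
    std::unordered_map<ConnId, std::list<Node>::iterator> index_;
};
```

Because callers only ever pass IDs, the list iterators never leave the class, and every access goes through the lock.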

With this design, we can manage connection timeouts efficiently. At the same time, the timeout mechanism is tied to the actual connection objects only through an ID value, which keeps the coupling between modules low.
