How to Handle Timeouts in the I/O Completion Port (IOCP) Model
Author: Que Rongwen, 2011-07-12
Preface
The I/O completion port (IOCP) is the most complex, and the best-performing, of all the Windows I/O models. One of the difficulties in IOCP programming is timeout control.
The following uses an HTTP server program as an example.
In fact, timeout control is not conceptually difficult. The problem is that the Windows IOCP model itself provides no timeout support (perhaps a later version will?), so everything must be done by the programmer. And server programs genuinely need timeout control: for every new client connection, an HTTP server must post a WSARecv() operation on the completion port to receive the client's request. If the client never sends any data (a so-called malicious connection), that posted request will never be dequeued from the completion port, tying up server resources; with enough malicious connections, the server is soon overwhelmed. The server must therefore attach a timeout to every request it posts to the completion port.
So how to implement timeout control?
There are two ideas:
1. Create a separate thread that polls all pending I/O requests at a fixed interval and cancels any request that has timed out.
Advantages:
Simple: one thread, one loop.
Disadvantages:
Precision and efficiency are hard to achieve together. For example, if the timeout is 60 seconds and all sockets are polled once every 60 seconds, a request may wait up to (60 - 1 + 60) seconds before its timeout is detected. Raising the polling frequency hurts performance, because the socket list must be locked during each scan, so choosing an appropriate polling interval is a dilemma. Some programs use a min-heap to optimize the scan, which improves efficiency considerably.
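As a rough illustration of the min-heap optimization mentioned above, here is a portable C++ sketch. The names (TimeoutHeap, IoRequestId) and the callback shape are illustrative assumptions, not part of any Windows API; a real server would cancel the expired request (e.g. close its socket) inside the callback.

```cpp
#include <chrono>
#include <cstdint>
#include <functional>
#include <queue>
#include <vector>

using Clock = std::chrono::steady_clock;
using IoRequestId = std::uint64_t;

struct Entry {
    Clock::time_point deadline;
    IoRequestId id;
    bool operator>(const Entry& o) const { return deadline > o.deadline; }
};

class TimeoutHeap {
public:
    // Register a deadline when an I/O request is posted.
    void arm(IoRequestId id, Clock::duration timeout) {
        heap_.push({Clock::now() + timeout, id});
    }
    // Pop every entry whose deadline has passed; the caller cancels each one.
    // Only expired entries are inspected, not the whole socket list.
    template <class Fn>
    void expire(Clock::time_point now, Fn&& on_timeout) {
        while (!heap_.empty() && heap_.top().deadline <= now) {
            on_timeout(heap_.top().id);
            heap_.pop();
        }
    }
    // The polling thread only needs to wake at the earliest deadline.
    bool next_deadline(Clock::time_point& out) const {
        if (heap_.empty()) return false;
        out = heap_.top().deadline;
        return true;
    }
private:
    std::priority_queue<Entry, std::vector<Entry>, std::greater<Entry>> heap_;
};
```

Because the heap keeps the earliest deadline on top, the scan cost per pass is proportional to the number of requests that actually expired, rather than to the total connection count.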
2. Set a timer for each posted I/O request.
Advantages:
High precision: Windows timers are accurate to roughly 15 milliseconds.
Disadvantages:
Resource consumption is high: obviously, a large number of connections requires an equally large number of timers. Fortunately, for applications that need many timers Windows provides the timer queue. Compared with a timer created by SetTimer(), CreateTimerQueueTimer() creates an optimized, lightweight object, and the system optimizes the timer queue itself, for example by running timeout callbacks on thread-pool threads. It is unclear how many timer-queue timers one process can create at most; I found no statement on MSDN, so this could become the bottleneck for the maximum number of connections a service supports. (I tested on my own machine (Win7 Home Basic + VS2010). The first run of the code in Appendix 3 made the machine nearly unresponsive, but produced no error. On the second run I added a few conditional breakpoints; by the time 30,000 timers existed, timeout callbacks were firing and the machine was still very responsive. So there is probably either no limit on the number of timer-queue timers or a very large one, but I have no authoritative information.)
Both methods are acceptable. The specific implementation depends on the program requirements.
When designing Que's HTTP Server, I used the timer queue. As my needs dictated, I allocate two timer-queue timers per socket: one enforces the session timeout (the maximum time a socket may stay connected to the server), and the other enforces the dead-connection timeout: if a connection neither sends nor receives any data within the specified time, it is judged dead and closed. Each time data is successfully received or sent, the server calls ChangeTimerQueueTimer() to reset the timer. Unfortunately, my conditions were limited and I have not tested under high load, only for a few days at up to 200 connections and 80 Mb/s of bandwidth, with several hundred ChangeTimerQueueTimer() resets per second; the observed timeout error was about 8 to 15 milliseconds, which is acceptable.
Notes for HTTP Server programming
1. While an I/O request is in flight, the LPWSAOVERLAPPED pointer passed to it must remain valid. This must be guaranteed unconditionally by the program design, or the program will crash; how to guarantee it is a programming problem, not an IOCP problem. The structure that LPWSAOVERLAPPED points into may only be released after the I/O operation has been dequeued from the completion port, that is, only after GetQueuedCompletionStatus() has returned it. If the same WSAOVERLAPPED structure is used by several I/O requests, keep a reference count: decrement it each time GetQueuedCompletionStatus() returns the structure, and free it when the count reaches zero (though it is best to avoid this design).
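A minimal, portable sketch of the reference-counting scheme just described, assuming one per-I/O context shared by several in-flight operations and freed only when the last completion has been dequeued. PerIoContext and its fields are illustrative names, not any Windows API; in a real IOCP program the struct would begin with an OVERLAPPED member.

```cpp
#include <atomic>

struct PerIoContext {
    std::atomic<int> pending;  // number of in-flight I/O operations sharing this context
    explicit PerIoContext(int ops) : pending(ops) {}
};

// Called once per operation as GetQueuedCompletionStatus() hands it back.
// Returns true when this was the last reference and the context was freed.
bool on_completion_dequeued(PerIoContext* ctx) {
    if (ctx->pending.fetch_sub(1) == 1) {  // we held the last reference
        delete ctx;
        return true;
    }
    return false;
}
```

The atomic decrement matters because several completion-port worker threads may dequeue completions for the same context concurrently.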
2. How do I cancel a posted I/O request?
The answer is that it cannot be canceled directly. Of course, closing the completion port handle cancels all I/O requests, but that only applies when the program exits. For an HTTP server, however, closing a socket marks all I/O requests associated with that socket as failed, and they are returned from GetQueuedCompletionStatus() (the return value is not necessarily FALSE; see the next section). So the timeout callback only needs to close the corresponding socket, releasing nothing else; the completion-port worker thread will then receive each of that socket's I/O requests back from GetQueuedCompletionStatus(), and once they have all been drained from the completion queue, the resources (chiefly the LPWSAOVERLAPPED pointer that was passed in) can be safely freed.
Correction: CancelIoEx(hSocket, NULL) can be used to cancel all pending I/O operations on a socket. Of course, as noted above, closing the socket handle also fails all pending I/O operations, which achieves the same effect.
3. The return value of GetQueuedCompletionStatus() (see MSDN). The prototype is:
BOOL WINAPI GetQueuedCompletionStatus(
  _In_  HANDLE       CompletionPort,
  _Out_ LPDWORD      lpNumberOfBytes,
  _Out_ PULONG_PTR   lpCompletionKey,
  _Out_ LPOVERLAPPED *lpOverlapped,
  _In_  DWORD        dwMilliseconds
);
(1) If the I/O operation (WSASend()/WSARecv()) completed successfully, the return value is TRUE and lpNumberOfBytes receives the number of bytes transferred. Note that the number of bytes transferred may be smaller than the number you asked to send or receive.
(2) If the peer closes the socket, there are two cases:
(a) The I/O operation had already partially completed. For example, if a WSASend() request for 1 KB had sent 512 bytes, the return value is TRUE, lpNumberOfBytes receives 512, and lpOverlapped is valid.
(b) If the I/O operation had not completed at all, the return value is FALSE, lpNumberOfBytes receives 0, and lpCompletionKey and lpOverlapped are still valid.
(3) If our own program closes the socket, the behavior is the same as in case (2).
(4) If some other error occurs, the return value is FALSE and lpCompletionKey and lpOverlapped are NULL. In this case, call GetLastError() for details and leave the GetQueuedCompletionStatus() wait loop.
4. Every call to a network function (such as WSARecv() or WSASend()) must have its return value checked and handled. Network events are complex and anything can happen; the program will only be robust if every function's return value is examined.
Postscript
While learning IOCP, I searched for and read many related articles online. I have picked two and appended them to this post; thanks to the original authors.
Appendix 1. http://club.itqun.net/showtopic-82514.html
In that thread, the netizen WinEggDrop put it very clearly on the 36th floor, and I agree:
---------------------------------------------
The last post in this discussion mainly describes the several methods I mention below. The original poster's topic is essentially a timeout-detection mechanism. Many server programs need one, because too many idle connections still consume a certain amount of system resources. Some servers, FTP servers for example, also cap the number of logged-in connections; if malicious connections are never disconnected by the system, normal users may be unable to log in to the FTP server (because the connection limit has already been reached).
1. Use setsockopt to set SO_RCVTIMEO
This method is simple and easy to use, but it only works for blocking sockets, and sometimes an abnormal disconnection by the peer cannot be detected.
2. Before receiving data, call select(), and only call recv() and similar APIs once select() reports the socket readable.
This method is just as simple and easy to use, but it too mainly suits blocking sockets. It can generally be used with non-blocking sockets as well, but spinning in a loop that keeps checking select()'s return value wastes resources.
3. Periodically scan all client sockets (the method the original poster is using). This method records the time of each socket's last data transfer, compares it with the current time during the scan, and disconnects any socket whose idle time exceeds the timeout limit.
This method is also very easy to use: just create one thread that periodically scans the list of client sockets. It is broadly applicable and works with every socket model. Note that the scan must be properly protected by critical sections, or problems are quite likely: if a socket disconnects normally and its resources are released during a scan without locking, the scan will very likely touch freed memory. The drawback is that the timeout error is relatively large: with a detection interval of N seconds, errors of up to N - 1 seconds are possible, and the longer the interval, the larger the error. Because the whole client-socket list must be scanned each time, choosing the interval is awkward when there are many sockets: too short, and the frequent scans drag down system resources and program performance; too long, and the timing error grows.
4. Use the system's Timer
Standard timers: use SetTimer() to set a timer and KillTimer() to delete it. This works on all systems and all socket models; the disadvantage is that timer firing is delayed if too many messages are queued for processing. NT kernel timers: the advantage is high accuracy, the disadvantage that they only work on NT-based systems.
I have tried all of the above methods in server programs I wrote earlier, and in the end I chose the NT kernel timer. I do not know whether it is the most efficient method, but I prefer it and believe it to be a relatively efficient one (in truth it may not be, and I have no way to measure).
---------------------------------------------
Appendix 2. http://blog.sina.com.cn/s/blog_62b4e3ff0100nu84.html
Study notes: the mysterious IOCP completion port
[What is IOCP]
IOCP is a kernel object in Windows. Through this object, an application can receive completion notifications for asynchronous I/O.
Here are several roles:
Role 1: the asynchronous-I/O requester thread. Simply put, a thread that calls the WSAxxx() functions (such as WSARecv and WSASend).
Because the I/O is "asynchronous", when a role-1 thread sees the return value of a WSAxxx() call, it does not yet know whether the I/O has actually completed.
Note: when a WSAxxx() call returns success immediately, the data has already been read or sent (the result was obtained synchronously). Even so, to keep the logic uniform, we still let a role-2 thread process the I/O result.
Role 2: the asynchronous-I/O completion-handling thread. Simply put, a thread that calls GetQueuedCompletionStatus().
When role 1 posts an asynchronous I/O request M, a role-2 thread is guaranteed eventually to receive M's result (which is nothing more than I/O success or failure).
Role 3: the operating system, which communicates with roles 1 and 2. The OS receives all asynchronous I/O requests posted by role 1, queues them, and performs the actual reads and writes; the OS programmers are very good at squeezing the most out of the CPU and the network.
The OS places every I/O result into {IOCP completion queue C}.
The OS also schedules the running and sleeping of role-2 threads, controlling how many of them run simultaneously.
Role 2 uses GetQueuedCompletionStatus() to read completed I/O requests out of {IOCP completion queue C}.
[How many role-2 threads should be created]
The CreateIoCompletionPort() function creates a completion port and takes the NumberOfConcurrentThreads parameter.
This parameter means: the number of role-2 threads the programmer expects to run simultaneously; 0 means default to the machine's CPU count.
The programmer may nevertheless create any number of role-2 threads.
For example, NumberOfConcurrentThreads may be set to 2 while 6, 100, or even 0 role-2 threads are actually created.
How should we understand the difference between these two numbers?
The OS strives to keep NumberOfConcurrentThreads role-2 threads running concurrently, even if I create 100 of them.
If few {I/O result items} are waiting in {IOCP completion queue C} and a role-2 thread can process them quickly, then in practice only one role-2 thread may be working while all the others sleep (even with NumberOfConcurrentThreads set to 100, only one thread works).
If many {I/O result items} are waiting in {IOCP completion queue C} and processing them costs a lot of CPU time, then many role-2 threads may in fact be woken, provided, of course, that I actually created that many. In the extreme case, if all role-2 threads exit, {IOCP completion queue C} simply backs up.
So, in general, why set NumberOfConcurrentThreads to 2 but actually create 6 role-2 threads?
Because our role-2 threads do not just do CPU computation: they may also write log files, call Sleep(), or wait on a Mutex object (causing the thread to be scheduled out and sleep). In those moments the OS wakes some of the "reserve army" of role-2 threads to keep servicing {IOCP completion queue C}; that is what the extra threads among the six are for. If a role-2 thread is purely CPU-bound (perhaps with a little critical-section access, never voluntarily yielding the CPU), then creating exactly as many role-2 threads as CPUs is enough; creating more does not help (though it does no harm, as the OS keeps them asleep as reserves).
[How to handle the byte counts of asynchronous reads and writes]
Normally, when the network is healthy, {the number of bytes actually sent} (T) equals {the number of bytes requested} (R). I experimented, growing the buffer from 1 MB to 2 MB and beyond; only with a very large buffer did T < R finally appear.
If our application needs to send a large amount of data in one call, it must check whether T < R; when fewer bytes were sent than requested, the remaining (unsent) part must be sent again.
How many bytes should a WSARecv ask for? If the application-layer protocol says our messages are variable-length, this is a tricky problem. Typically, the protocol defines a logical message as a fixed-length header followed by a variable-length body, with the header carrying the body length in bytes. So first we receive the fixed-length header, parse out the "body length" field, and then post a second WSARecv for the body. I call this the "two-phase header-then-body receive".
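A minimal, portable sketch of the "two-phase header-then-body receive" described above. The 4-byte little-endian length prefix is an assumed wire format, not something the article specifies.

```cpp
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

constexpr std::size_t kHeaderSize = 4;  // assumed fixed header: body length only

// Phase 1: parse the fixed-length header to learn the body length.
std::uint32_t parse_body_length(const unsigned char* header) {
    return static_cast<std::uint32_t>(header[0])
         | static_cast<std::uint32_t>(header[1]) << 8
         | static_cast<std::uint32_t>(header[2]) << 16
         | static_cast<std::uint32_t>(header[3]) << 24;
}

// Build a framed message (header + body), as a sender would.
std::vector<unsigned char> frame(const std::string& body) {
    std::vector<unsigned char> out(kHeaderSize + body.size());
    const std::uint32_t n = static_cast<std::uint32_t>(body.size());
    out[0] = n & 0xff;         out[1] = (n >> 8) & 0xff;
    out[2] = (n >> 16) & 0xff; out[3] = (n >> 24) & 0xff;
    std::memcpy(out.data() + kHeaderSize, body.data(), body.size());
    return out;
}
```

In a real IOCP server, the first WSARecv would ask for exactly kHeaderSize bytes; once those arrive, parse_body_length() tells you how many bytes the second WSARecv should request.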
[How to control timeouts of asynchronous reads and writes]
Suppose we want to receive a packet and post a WSARecv, {asynchronous I/O: X}. This {asynchronous I/O: X} may fail to produce a result for a very long time, for instance if a malicious client simply never sends any data. IOCP itself provides no timeout control whatsoever; the timeout can only be enforced by us programmers. After issuing the WSARecv, we maintain some {data structure: D} that records the time of the call; later, our program inspects {data structure: D} to decide whether the WSARecv has produced a result. Updating the state of {data structure: D} is, of course, the job of the {role-2 threads}.
If a {role-2 thread} obtains the result of {asynchronous I/O: X} through its GetQueuedCompletionStatus() call, it changes the state of {data structure: D}. If the state of {data structure: D} never changes, then {asynchronous I/O: X} has not completed (the client has sent nothing).
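A hedged sketch of what {data structure: D} might look like: one record per posted WSARecv, stamped when the call is issued and marked complete by the completion-handling thread. PendingIo and its fields are illustrative names, not from the article's code.

```cpp
#include <chrono>

using Clock = std::chrono::steady_clock;

struct PendingIo {
    Clock::time_point issued = Clock::now();  // stamped when the WSARecv is posted
    bool completed = false;                   // set by the role-2 thread

    // Checked periodically (or from a timer callback): has this I/O been
    // outstanding longer than `limit` without completing?
    bool timed_out(Clock::duration limit, Clock::time_point now) const {
        return !completed && (now - issued) > limit;
    }
};
```

In a multithreaded server, the `completed` flag would need to be atomic or lock-protected, since the role-2 thread writes it while the checker reads it; that is omitted here for brevity.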
Timeout control is often intertwined with byte-count control. If a malicious client sends only some of the expected bytes, we must handle that case too.
If the protocol expects 100 bytes and the client sends only 10, we could simply kill the connection, but that policy is a bit harsh; the remaining 90 bytes may merely be delayed by the network and arrive shortly. A gentler policy is to keep receiving the remaining 90 bytes within a deadline, and kill the client only when that deadline expires.
[The IOCP resource-exhaustion problem]
If we have 10,000 connected client sockets, we must post 10,000 WSARecv calls in advance to receive whatever they send.
If each asynchronous read is given a 10 KB application buffer, the user buffers total 10,000 * 10 KB ≈ 97 MB of memory. Windows must "lock" those pages for the duration of the I/O, which consumes a great deal of OS resources, so the program may run out of resources merely because 10,000 clients are connected at once. The WSAENOBUFS error is related to this problem.
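A quick sanity check of the arithmetic above (the helper name is illustrative):

```cpp
// Memory pinned by per-connection receive buffers, in MB.
// 10,000 connections * 10 KB each = 100,000 KB ≈ 97.7 MB.
double locked_buffer_mb(long long connections, long long buffer_kb) {
    return connections * buffer_kb / 1024.0;  // KB -> MB
}
```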
The solution is to post zero-byte WSARecv calls. Pseudocode:
WSABUF DataBuf;
DataBuf.len = 0;
DataBuf.buf = 0;
WSARecv(socket, &DataBuf, 1, ...);
When data arrives, this asynchronous I/O produces a result in a role-2 thread. Because it is a zero-byte read, it touches none of the data sitting in the socket's kernel buffer; at almost no cost (saving roughly 10 KB per connection), we learn which client's data has arrived. Do not underestimate the per-connection saving; summed over all connections it is considerable. If the client count is small, the technique is pointless.
[Killing role-2 threads gracefully]
The PostQueuedCompletionStatus() function pushes one record into {IOCP completion queue C}; a role-2 thread then receives this "fake, simulated" asynchronous-I/O completion event. Why would a programmer want to forge an entry in {IOCP completion queue C}? Think about it; there is more than one use. Most commonly, we use it to "kill role-2 threads gracefully". Pseudocode:
typedef struct
{
    OVERLAPPED Overlapped;
    OP_CODE op_type;
    ...
} PER_IO_DATA;

PER_IO_DATA* PerIOData = ...
PerIOData->op_type = OP_KILL; // the operation type says: kill the thread
PostQueuedCompletionStatus(... PerIOData ...);
// With N role-2 threads, call this N times so that {IOCP completion queue C} holds N such entries.

In the role-2 thread:
PER_IO_DATA* PerIOData = 0;
GetQueuedCompletionStatus(... &PerIOData ...);
if (PerIOData->op_type == OP_KILL) { return; } // returning naturally from the thread function is the graceful exit
[Handling GetQueuedCompletionStatus errors]
Error handling for the GetQueuedCompletionStatus() function is complicated.
1. If GetQueuedCompletionStatus returns FALSE:
1.1 If the Overlapped pointer is non-NULL:
Congratulations: the asynchronous I/O you posted did produce a result, just a failure result. At least the messenger came back with a letter.
This is most likely because the socket connection was broken.
1.1.1 If GetLastError() returns ERROR_OPERATION_ABORTED:
Something must have called CancelIo(socket); all asynchronous I/O requests on that socket are canceled.
1.1.2 If GetLastError() returns something else:
The I/O simply failed, for example because the socket connection was broken.
1.2 If the Overlapped pointer is NULL:
This may be bad news: it can mean the IOCP itself has a serious problem, for example we accidentally passed the IOCP handle to CloseHandle().
1.2.1 If GetLastError() returns WAIT_TIMEOUT:
Probably the dwMilliseconds timeout passed to GetQueuedCompletionStatus() was not INFINITE; just call GetQueuedCompletionStatus() again and keep waiting.
1.2.2 If GetLastError() returns ERROR_ABANDONED_WAIT_0:
The IOCP itself is gone; the role-2 thread should find itself a new home, or simply exit.
2. If GetQueuedCompletionStatus returns TRUE:
Congratulations: the asynchronous I/O succeeded.
Read the details from the lpNumberOfBytes, lpCompletionKey, and lpOverlapped parameters.
lpNumberOfBytes: the number of bytes actually transferred (possibly fewer than the number requested).
lpCompletionKey: the famous per-handle data; it tells you which socket connection this result belongs to.
lpOverlapped: the famous PER_IO_DATA, associated with one asynchronous I/O call. For example, if you called WSASend() with the Overlapped parameter equal to 0x123, you will get lpOverlapped == 0x123 back; from this pointer you know which WSASend() call this I/O result corresponds to.
I thought my error handling was watertight until I ran a test: I posted 100 WSARecv calls on one socket, then deliberately shut down the client. All of those asynchronous I/O results came back through GetQueuedCompletionStatus() in the role-2 threads, and to my surprise GetQueuedCompletionStatus returned TRUE, with GetLastError() returning 0!
Luckily, lpNumberOfBytes was 0 (otherwise something would be badly wrong). So do not celebrate too early when GetQueuedCompletionStatus returns TRUE:
2.1 Interpret the lpOverlapped pointer as a PER_IO_DATA structure. If PerIOData->op_type == OP_KILL, this may be an I/O completion event forged via PostQueuedCompletionStatus.
2.2 Check whether lpNumberOfBytes == 0. If the result really came from a WSAxxx() call rather than a forged PostQueuedCompletionStatus, the socket behind this I/O has probably been disconnected.
2.3 If lpNumberOfBytes > 0, this is a genuine I/O-completion event. Perhaps 99.9% of the time, execution lands in this branch.
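The decision tree above can be condensed into one function. This is a portable sketch: the enum names are illustrative assumptions, the inputs are plain values standing in for the Win32 out-parameters, and only the WAIT_TIMEOUT constant (258) is the real Windows value.

```cpp
#include <cstdint>

enum class GqcsOutcome {
    IoSuccess,    // TRUE, bytes > 0: real data transferred
    PeerClosed,   // TRUE, bytes == 0, real WSAxxx result: socket likely closed
    KillThread,   // forged entry posted via PostQueuedCompletionStatus
    IoFailed,     // FALSE, overlapped != NULL: the posted I/O failed
    KeepWaiting,  // FALSE, overlapped == NULL, WAIT_TIMEOUT: just wait again
    PortDead      // FALSE, overlapped == NULL, anything else: the port is gone
};

constexpr std::uint32_t kWaitTimeout = 258;  // WAIT_TIMEOUT

GqcsOutcome classify(bool ok, std::uint32_t bytes, bool has_overlapped,
                     bool is_kill_op, std::uint32_t last_error) {
    if (ok) {
        if (is_kill_op) return GqcsOutcome::KillThread;
        return bytes > 0 ? GqcsOutcome::IoSuccess : GqcsOutcome::PeerClosed;
    }
    if (has_overlapped) return GqcsOutcome::IoFailed;
    return last_error == kWaitTimeout ? GqcsOutcome::KeepWaiting
                                      : GqcsOutcome::PortDead;
}
```

The ordering matters: the forged-entry check must come before the bytes == 0 check, because a PostQueuedCompletionStatus entry also arrives with TRUE and zero bytes.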
[Posting multiple asynchronous I/Os on the same socket at once]
Posting several WSASend calls at once, WSASend(1234, &Buff1, ...); WSASend(1234, &Buff2, ...); ..., seems unproblematic.
Posting several WSARecv calls at once, WSARecv(1234, &Buff1, ...); WSARecv(1234, &Buff2, ...);, raises some points that need clarifying.
First: Windows guarantees that data arriving from the network is placed into Buff1 and Buff2 in the order in which you posted the WSARecv calls.
If the incoming data is AAAAUUUU, and Buff1 and Buff2 are each 4 bytes long,
then Buff1 is guaranteed to receive AAAA and Buff2 to receive UUUU.
Second: with multiple role-2 threads, thread-scheduling "race conditions" may let one thread finish processing Buff2 first. If I print the received data from the role-2 threads, the output may be UUUUAAAA. This does not violate TCP; it is purely a multithreading issue. The solution is actually simple; here is some pseudocode:
typedef struct
{
    OVERLAPPED Overlapped;
    ...
    int Package_Number; // the sequence number of this socket's I/O calls
    ...
} PER_IO_DATA;

PER_IO_DATA* PerIOData1 = ...
PerIOData1->Package_Number = 1; // the first call
WSARecv(1234, &Buff1, ... PerIOData1 ...);
PER_IO_DATA* PerIOData2 = ...
PerIOData2->Package_Number = 2; // the second call
WSARecv(1234, &Buff2, ... PerIOData2 ...);
We must maintain some data structure that remembers we issued two WSARecv calls.
When the I/O results come in, the program must wait until both calls have produced results in the role-2 threads before splicing Buff1 and Buff2 in sequence-number order, giving the correctly ordered AAAAUUUU. There are certainly better approaches; this merely shows the basic principle.
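A portable sketch of the sequence-number splicing just described: each completed buffer is filed under its Package_Number, and the stream is reassembled in numeric order once all expected pieces have arrived. Reassembler is an illustrative name, not from the article's code.

```cpp
#include <map>
#include <string>

class Reassembler {
public:
    explicit Reassembler(int expected) : expected_(expected) {}

    // Called from a completion handler, in whatever order the threads finish.
    void on_complete(int package_number, std::string data) {
        pieces_[package_number] = std::move(data);
    }

    bool ready() const { return static_cast<int>(pieces_.size()) == expected_; }

    // Concatenate in Package_Number order (std::map iterates keys sorted).
    std::string splice() const {
        std::string out;
        for (const auto& kv : pieces_) out += kv.second;
        return out;
    }

private:
    int expected_;
    std::map<int, std::string> pieces_;
};
```

With multiple role-2 threads, on_complete() would of course need a lock around the map; that is omitted to keep the principle visible.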
Third: must we really post multiple WSARecv calls on the same socket at once?
This question does not conflict with [The IOCP resource-exhaustion problem] above. Assume that when we post multiple WSARecv calls, we have already foreseen a large amount of data arriving on that socket. According to material on the net, this can make full use of multi-CPU parallelism: one CPU processes Buff1 while another processes Buff2.
With a small number of connected clients, each of which may suddenly transfer a large amount of data, this may speed up copying data from the socket buffer into the application buffers (my personal speculation).
With a huge number of client connections (10,000), each transferring only a little data, I personally think it is pointless; with, say, only two CPUs, how much spare capacity is there to exploit?
Another important reason for handing Windows multiple buffers: suppose I expect a socket to deliver 2 MB of data at once, but I have no 2 MB buffer, only 1 MB buffers. I would then have to call WSARecv once, wait until the first 1 MB has been received, and only then post the next WSARecv; or, alternatively, hand the Windows system two 1 MB buffers up front.
Fourth: if we really do need to post multiple buffers at a time to receive data, must WSARecv be called multiple times?
Here is a possible alternative, in pseudocode:
char* raw1 = new char[BUFF_SIZE];
WSABUF wsabuf[2];
wsabuf[0].buf = raw1;
wsabuf[0].len = BUFF_SIZE;
char* raw2 = new char[BUFF_SIZE];
wsabuf[1].buf = raw2;
wsabuf[1].len = BUFF_SIZE;
WSARecv(1234, wsabuf, 2, ...);
// The key is the third parameter: the number of WSABUF structures, here 2. Most IOCP examples pass 1.
I find this method simpler; I do not know whether I am the one being silly here or the people on the net are. Posting several WSARecv calls at once also makes gathering the scattered pieces of I/O troublesome. The scatter-gather I/O of UNIX systems is a similar mechanism.
-----------
Appendix 3
long g_nCalled = 0;

void CALLBACK TimerCallback(PVOID lpParameter, BOOLEAN TimerOrWaitFired)
{
    InterlockedIncrement(&g_nCalled);
}

void CreateAsManyTimerAsPossbile() // create as many TimerQueueTimers as possible
{
    CString strMessage(_T(""));
    HANDLE hTimerQueue = CreateTimerQueue();
    if (NULL == hTimerQueue)
    {
        strMessage.Format(_T("Unable to create timer queue, error code: %d."), GetLastError());
    }
    else
    {
        int nTimerCount = 0;
        while (1)
        {
            HANDLE hTimer = NULL;
            if (!CreateTimerQueueTimer(&hTimer, hTimerQueue, TimerCallback, NULL, 100, 0, 0))
            {
                strMessage.Format(_T("Failed to create timer queue timer, current timer count: %d, timer callback called: %d, error code: %d."),
                                  nTimerCount, g_nCalled, GetLastError());
                break;
            }
            if (++nTimerCount >= 5000)
            {
                // ASSERT(0);
            }
        }
        DeleteTimerQueueEx(hTimerQueue, NULL);
    }
    AfxMessageBox(strMessage);
}