Developing Highly Scalable Winsock Applications Using Completion Ports
Author: Anthony Jones & Amol Deshpande
Translation: Liu xiqi
Source: http://msdn.microsoft.com/msdnmag/issues/1000/Winsock/
Developing network applications is not easy, yet only a few key principles are needed to get started: create a socket, connect it, then send and receive data. What is really difficult is writing a network application that scales from a single connection to several thousand. This article discusses how to use Winsock 2 to develop highly scalable Winsock applications on Windows NT and Windows 2000. The focus is on the server side of the client/server model, although many of the points apply to both sides.
APIs and Scalability
Through the Win32 overlapped I/O mechanism, an application can submit an I/O operation and have it complete in the background while the submitting thread goes on to other work. When the overlapped operation finishes, the thread is notified. This mechanism is particularly useful for time-consuming operations. Functions such as WSAAsyncSelect() (dating from Windows 3.1) and select() (from UNIX) are easier to use, but they do not scale. The completion port mechanism is optimized inside the operating system, and on Windows NT and Windows 2000 overlapped I/O combined with completion ports is what allows a server to scale well.
Completion Ports
A completion port is essentially a notification queue into which the operating system places notifications of completed overlapped I/O requests. Once an I/O operation completes, a worker thread that can process the result receives the notification. After a socket is created, it can be associated with a completion port at any time.
Typically an application creates a fixed number of worker threads to process these notifications. The number depends on the application's needs. Ideally there is one thread per processor, but that requires that no thread ever performs a blocking operation, such as a synchronous read or write or a wait on an event. Each thread is given a slice of CPU time in which to run, after which another thread receives its time slice and starts executing. If a thread blocks, the operating system takes away the remainder of its time slice and lets other threads run; in other words, the blocked thread did not use its full time slice. An application whose threads may block should therefore have additional threads ready to make use of that time.
Using a completion port takes two steps. First, create the completion port, as in the following code:
    HANDLE hIocp;

    hIocp = CreateIoCompletionPort(
        INVALID_HANDLE_VALUE,
        NULL,
        (ULONG_PTR)0,
        0);
    if (hIocp == NULL) {
        // Error
    }
After the port is created, each socket that will use it must be associated with it. This is done by calling CreateIoCompletionPort() again, this time with the first parameter, FileHandle, set to the socket handle and the second parameter, ExistingCompletionPort, set to the handle of the port just created.
The following code creates a socket and associates it with the completion port created above:
    SOCKET s;

    s = socket(AF_INET, SOCK_STREAM, 0);
    if (s == INVALID_SOCKET) {
        // Error
    }
    if (CreateIoCompletionPort((HANDLE)s, hIocp, (ULONG_PTR)0, 0) == NULL) {
        // Error
    }
    ...
This completes the association between the socket and the completion port. The completion of any overlapped operation on this socket is now reported through the port. Note that the third parameter of CreateIoCompletionPort() sets a "completion key" for the socket. Each time a completion notification arrives, the application can read the corresponding completion key, so the key can be used to carry per-socket context information.
After creating the completion port and associating one or more sockets with it, we need to create several threads to process the completion notifications. These threads call GetQueuedCompletionStatus() in a loop; each call returns one completion notification.
Next, consider how the application keeps track of its overlapped operations. When an application calls an overlapped function, it passes a pointer to an OVERLAPPED structure as one of the parameters. When the operation completes, GetQueuedCompletionStatus() hands the same pointer back. However, the OVERLAPPED structure alone does not tell the application which operation has completed. To track operations, define your own structure that contains an OVERLAPPED structure plus whatever tracking information you need.
Whenever an overlapped function such as WSASend() or WSARecv() is called, pass the OVERLAPPED member of an OVERLAPPEDPLUS structure as its lpOverlapped parameter. This lets you attach state to each individual overlapped call. When the operation completes, GetQueuedCompletionStatus() gives you back the pointer, from which you can reach your custom structure. Note that the OVERLAPPED member does not have to be the first field of the extended structure: given a pointer to the OVERLAPPED member, the CONTAINING_RECORD macro yields a pointer to the enclosing extended structure.
The extended OVERLAPPEDPLUS structure is defined as follows:
    typedef struct _OVERLAPPEDPLUS {
        OVERLAPPED ol;
        SOCKET     s, sclient;
        int        OpCode;
        WSABUF     wbuf;
        DWORD      dwBytes, dwFlags;
        // other useful information
    } OVERLAPPEDPLUS;

    #define OP_READ   0
    #define OP_WRITE  1
    #define OP_ACCEPT 2
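To make the pointer recovery described above concrete, here is a minimal, portable sketch of what CONTAINING_RECORD does. The names FAKE_OVERLAPPED, OP_CONTEXT, MY_CONTAINING_RECORD and context_from_overlapped are stand-ins invented for this illustration so that it compiles without the Windows headers; a real program would use OVERLAPPED and the SDK's CONTAINING_RECORD macro. The embedded member is deliberately not the first field.

```c
#include <stddef.h>   /* offsetof, NULL */

/* Stand-in for the Win32 OVERLAPPED structure (illustration only). */
typedef struct {
    unsigned long internal;
    unsigned long internal_high;
    void *event;
} FAKE_OVERLAPPED;

/* Hypothetical per-operation context. The overlapped member is
   deliberately NOT the first field. */
typedef struct {
    int op_code;          /* e.g. 0 = read, 1 = write, 2 = accept */
    FAKE_OVERLAPPED ol;   /* a pointer to this member is handed to the I/O call */
    int socket_id;
} OP_CONTEXT;

/* Same idea as CONTAINING_RECORD: subtract the member's offset from the
   member's address to obtain the address of the enclosing structure. */
#define MY_CONTAINING_RECORD(addr, type, field) \
    ((type *)((char *)(addr) - offsetof(type, field)))

/* Recover the context from the pointer a completion would hand back. */
static OP_CONTEXT *context_from_overlapped(FAKE_OVERLAPPED *pol)
{
    return MY_CONTAINING_RECORD(pol, OP_CONTEXT, ol);
}
```

Because the offset is computed by the compiler, fields can be added before or after the embedded member without touching the code that recovers the context.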
Now let's look at the worker thread.
Worker thread code:
    DWORD WINAPI WorkerThread(LPVOID lpParam)
    {
        ULONG_PTR       *PerHandleKey;
        OVERLAPPED      *Overlap;
        OVERLAPPEDPLUS  *OverlapPlus, *newolp;
        DWORD            dwBytesXfered;
        int              ret;

        while (1)
        {
            ret = GetQueuedCompletionStatus(
                hIocp,
                &dwBytesXfered,
                (PULONG_PTR)&PerHandleKey,
                &Overlap,
                INFINITE);
            if (ret == 0)
            {
                // Operation failed
                continue;
            }
            OverlapPlus = CONTAINING_RECORD(Overlap, OVERLAPPEDPLUS, ol);

            switch (OverlapPlus->OpCode)
            {
            case OP_ACCEPT:
                // Client socket is contained in OverlapPlus->sclient
                // Add client to completion port
                CreateIoCompletionPort(
                    (HANDLE)OverlapPlus->sclient,
                    hIocp,
                    (ULONG_PTR)0,
                    0);

                // Need a new OVERLAPPEDPLUS structure
                // for the newly accepted socket. Perhaps
                // keep a look aside list of free structures.
                newolp = AllocateOverlappedPlus();
                if (!newolp)
                {
                    // Error
                    break;
                }
                newolp->s = OverlapPlus->sclient;
                // The operation posted below is a send, so tag it as a write
                newolp->OpCode = OP_WRITE;

                // This function prepares the data to be sent
                PrepareSendBuffer(&newolp->wbuf);

                ret = WSASend(
                    newolp->s,
                    &newolp->wbuf,
                    1,
                    &newolp->dwBytes,
                    0,
                    &newolp->ol,
                    NULL);
                if (ret == SOCKET_ERROR)
                {
                    if (WSAGetLastError() != WSA_IO_PENDING)
                    {
                        // Error
                    }
                }

                // Put structure in look aside list for later use
                FreeOverlappedPlus(OverlapPlus);

                // Signal accept thread to issue another AcceptEx
                SetEvent(hAcceptThread);
                break;

            case OP_READ:
                // Process the data read
                // ...

                // Repost the read if necessary, reusing the same
                // receive buffer as before
                memset(&OverlapPlus->ol, 0, sizeof(OVERLAPPED));
                ret = WSARecv(
                    OverlapPlus->s,
                    &OverlapPlus->wbuf,
                    1,
                    &OverlapPlus->dwBytes,
                    &OverlapPlus->dwFlags,
                    &OverlapPlus->ol,
                    NULL);
                if (ret == SOCKET_ERROR)
                {
                    if (WSAGetLastError() != WSA_IO_PENDING)
                    {
                        // Error
                    }
                }
                break;

            case OP_WRITE:
                // Process the data sent, etc.
                break;
            } // switch
        } // while
    } // WorkerThread
The PerHandleKey variable receives the completion key that was set when the socket was associated with the completion port; the Overlap variable receives a pointer to the OVERLAPPED member of the OVERLAPPEDPLUS structure that was used to start the overlapped operation.
Remember that if an overlapped call fails immediately (that is, it returns SOCKET_ERROR and the error is not WSA_IO_PENDING), no completion notification is posted to the completion port. If the call succeeds, or fails with WSA_IO_PENDING, a completion notification is always posted to the port.
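The rule above decides when a per-operation structure may be released right away and when it must be kept until the worker thread sees the completion. A minimal sketch of that decision follows. The helper name completion_expected is hypothetical, and the two constants are defined here only so the fragment compiles on its own; they carry the values used by the Windows SDK headers, which real code should include instead.

```c
/* Values as defined by the Windows SDK headers; repeated here only so
   this sketch is self-contained. */
#ifndef SOCKET_ERROR
#define SOCKET_ERROR   (-1)
#endif
#ifndef WSA_IO_PENDING
#define WSA_IO_PENDING 997L
#endif

/* Returns 1 if a completion notification will be posted for an overlapped
   call, 0 if the call failed outright and its context can be freed now.
   ret        : return value of WSASend()/WSARecv()
   last_error : WSAGetLastError() captured immediately after the call */
static int completion_expected(int ret, long last_error)
{
    if (ret != SOCKET_ERROR)
        return 1;   /* succeeded immediately; a notification is still posted */
    return last_error == WSA_IO_PENDING;   /* pending: notification comes later */
}
```

A caller would free or recycle its OVERLAPPEDPLUS structure only when this returns 0; in every other case the worker thread owns the structure once the notification arrives.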
Windows NT and Windows 2000 Sockets Architecture
A basic understanding of the sockets architecture of Windows NT and Windows 2000 is helpful when developing scalable Winsock applications.
[Figure: the Winsock architecture in Windows 2000]
Unlike some other operating systems, the transport protocols in Windows NT and Windows 2000 do not expose a sockets-style interface that applications can talk to directly. Instead, they implement a lower-level API called the Transport Driver Interface (TDI). The Winsock kernel-mode driver, implemented in AFD.SYS, handles connection and buffer management in order to provide socket emulation to the application, and it communicates with the underlying transport driver.
Who Manages the Buffers?
As mentioned above, the application talks to the transport protocol driver through Winsock, and AFD.SYS manages buffers on the application's behalf. When an application calls send() or WSASend(), AFD.SYS copies the data into its own internal buffers (subject to the SO_SNDBUF setting), and send() or WSASend() returns immediately. AFD.SYS then sends the data in the background. However, if the application sends a buffer larger than the SO_SNDBUF setting, the WSASend() call blocks until all the data has been sent.
Receiving data from a remote client works similarly. As long as the application has no receive call outstanding and the SO_RCVBUF limit has not been exceeded, AFD.SYS copies incoming data into its internal buffers. When the application calls recv() or WSARecv(), the data is copied from the internal buffers into the buffer the application provides.
In most cases this architecture works well, especially for applications written in the traditional sockets style with non-overlapped send() and recv() calls. Although SO_SNDBUF and SO_RCVBUF can be set to 0 with setsockopt(), the programmer must understand exactly what the consequences of turning off AFD.SYS's internal buffering are, so that avoiding the buffer copy on send and receive does not put the stability of the system at risk.
For example, suppose an application turns off send buffering by setting SO_SNDBUF to 0 and then issues a blocking send(). The kernel locks the application's buffer, and the send() call does not return until the receiver has acknowledged the entire buffer. This may look like a simple way to find out whether the other side has received all of your data, but it is not. Even when the remote TCP stack has acknowledged the data, that does not mean the data has been delivered to the client application; for example, the remote machine may be short of resources, so that its AFD.SYS cannot copy the data to the application. A more important problem is that each thread can have only one send call in progress at a time, which is extremely inefficient.
Setting SO_RCVBUF to 0 to turn off AFD.SYS receive buffering does not improve performance; it only forces received data to be buffered at a level below Winsock. A buffer copy is still needed when you post a receive call, so the attempt to avoid the copy fails.
It should now be clear that turning off buffering is not a good idea for most applications. As long as the application takes care to keep a few overlapped WSARecv() calls outstanding on a connection at all times, there is usually no need to turn off receive buffering: if AFD.SYS always has an application-supplied buffer available, it has no need to use its internal buffers.
High-performance server applications can turn off send buffering without losing performance. Such an application must, however, take great care to always keep multiple overlapped sends outstanding, rather than waiting for one overlapped send to complete before posting the next. An application that sends strictly one after another wastes the interval between two sends. In short, the transport driver must be able to move on to another buffer immediately after it finishes sending one.
Resource Constraints
Robustness is the primary goal when designing any server application. That is, your application must be able to cope with any unexpected condition, such as a peak in concurrent client requests, a temporary shortage of available memory, and other short-lived phenomena. This requires the designer to be aware of the resource constraints of Windows NT and Windows 2000 and to handle such emergencies gracefully.
The most basic resource under your direct control is network bandwidth. Applications that use the User Datagram Protocol (UDP) are generally more conscious of bandwidth limits because they want to minimize packet loss. With TCP connections, however, the server must also be careful not to overload the network for any extended period; otherwise many packets may have to be retransmitted or many connections may be dropped. How to manage bandwidth depends on the application and is beyond the scope of this article.
Virtual memory use must also be managed carefully. Allocating and freeing memory conservatively, or using lookaside lists (a kind of cache) to reuse allocated memory, keeps the server application's footprint small and lets the operating system keep more of the application's address space resident in physical memory instead of paging it out. You can also use the Win32 API SetProcessWorkingSetSize() to ask the operating system to let your application use more physical memory.
Winsock applications can run into two other, indirect resource limits. The first is the limit on locked memory pages. If you disable AFD.SYS buffering, all pages of the application's buffers are locked into physical memory while the application sends and receives data, because the kernel driver needs to access that memory and the pages must not be paged out during that time. Trouble arises if the operating system needs to give pageable physical memory to other applications and not enough is left. The goal is to avoid writing an ill-behaved program that locks all physical memory and brings the system down. In other words, the memory your program locks must not exceed the system's limit on locked pages.
On Windows NT and Windows 2000, the total memory that all applications together can lock is about one-eighth of physical memory (a rough estimate only, not something to base calculations on). If your application ignores this and posts too many overlapped send and receive calls whose I/O has not yet completed, it may occasionally see ERROR_INSUFFICIENT_RESOURCES. When that happens, reduce the amount of memory being locked. Note also that the system locks every whole page that contains part of your buffer, so buffers that straddle page boundaries cost extra (translator's note: if a buffer extends past a page boundary by even 1 byte, the page containing that byte is locked as well).
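The cost of a buffer that straddles a page boundary can be estimated with a little arithmetic. The sketch below is illustrative only: pages_locked is a hypothetical helper, and the page size is passed in as a parameter (4096 bytes is assumed in the examples), whereas a Windows program would normally obtain it from GetSystemInfo().

```c
#include <stddef.h>
#include <stdint.h>

/* Number of whole pages that must be locked to cover a buffer of `len`
   bytes starting at address `addr`, for a given page size. */
static size_t pages_locked(uintptr_t addr, size_t len, size_t page_size)
{
    uintptr_t first, last;

    if (len == 0 || page_size == 0)
        return 0;
    first = addr / page_size;               /* page holding the first byte */
    last  = (addr + len - 1) / page_size;   /* page holding the last byte  */
    return (size_t)(last - first + 1);
}
```

With 4096-byte pages, a 4096-byte buffer that starts on a page boundary locks one page, while a 2-byte buffer starting at offset 4095 locks two. Aligning I/O buffers to page boundaries and sizing them in whole pages keeps the locked-page count at its minimum.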
The other limit your program may hit is the system's non-paged pool. The non-paged pool is a region of memory that is never paged out. It holds data that must be accessible to kernel components, some of which cannot access memory that has been paged out. Windows NT and Windows 2000 drivers can allocate memory from this non-paged pool.
When an application creates a socket (or opens a file in a similar way), the kernel allocates a certain amount of memory from the non-paged pool, and it allocates more when the socket is bound or connected. If you observe this behavior you will see that issuing I/O requests (such as sends and receives) allocates still more non-paged pool, for example for the small structure needed to track each pending I/O operation. Eventually this can become a problem, because the operating system limits the amount of non-paged memory.
On Windows NT and Windows 2000 the amount of non-paged memory allocated per connection differs, and it may differ again in future versions of Windows. To keep your application long-lived, do not build in assumptions about the exact non-paged pool requirements.
Your program must avoid exhausting the non-paged pool. When too little non-paged pool remains free, kernel drivers that have nothing to do with your application may misbehave and even crash the system; this is particularly likely (and unpredictable) when third-party devices or drivers are present. Remember too that other applications on the same computer may also be consuming non-paged pool, so be especially conservative when estimating resource usage in your design.
Handling resource shortages is complicated, because none of the conditions above produces a specific error code. Usually all you receive is a generic WSAENOBUFS or ERROR_INSUFFICIENT_RESOURCES error. To handle these errors, first increase the application's working set to a reasonable maximum; for memory tuning, see http://msdn.microsoft.com/msdnmag/issues/1000/Bugslayer/Bugslayer1000.asp. If the errors continue, check whether network bandwidth is the limiting factor. Then make sure you have not posted too many send and receive calls at the same time. Finally, if you still receive out-of-resources errors, you have most likely run short of non-paged pool. To free non-paged pool, close a good portion of the application's connections and wait for the transient condition to pass.
Accepting Connections
One of the most common tasks for a server is accepting connection requests from clients. The only API that accepts connections on a socket using overlapped I/O is AcceptEx(). Interestingly, the ordinary synchronous accept() function returns a new socket, whereas AcceptEx() takes the accepting socket as one of its parameters. Because AcceptEx() is an overlapped operation, you must create the socket in advance (without binding or connecting it) and pass it to AcceptEx(). Typical pseudocode for using AcceptEx() looks like this:
    do {
        - Wait until the previous AcceptEx completes
        - Create a new socket and associate it with the completion port
        - Set up the context structure, and so on
        - Issue an AcceptEx request
    } while (TRUE);
A responsive server must keep enough AcceptEx() calls outstanding that incoming client connection requests are answered immediately. How many to post depends on the traffic the server expects. For example, when the connection rate is high (because connections are short-lived or traffic arrives in bursts), more AcceptEx() calls need to be outstanding than when clients connect only occasionally. It is wise to have the application analyze the traffic and adjust the number of outstanding AcceptEx() calls rather than fix it at a single value.
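One way to adjust the number of outstanding AcceptEx() calls is to size it from recently observed traffic. The following is a sketch of one possible policy, not a recommendation from the article: the function names, the sampling interval and the 25 percent headroom are all assumptions chosen for illustration.

```c
/* Hypothetical tuning policy: keep enough accepts posted to cover the
   connections accepted during the last sampling interval plus 25 percent
   headroom, clamped to the range [min_pending, max_pending]. */
static unsigned target_pending_accepts(unsigned accepted_last_interval,
                                       unsigned min_pending,
                                       unsigned max_pending)
{
    unsigned target;

    if (accepted_last_interval >= max_pending)
        return max_pending;   /* also avoids overflow in the sum below */
    target = accepted_last_interval + accepted_last_interval / 4;
    if (target < min_pending)
        target = min_pending;
    if (target > max_pending)
        target = max_pending;
    return target;
}

/* How many additional accepts to post right now. */
static unsigned accepts_to_post(unsigned target, unsigned currently_pending)
{
    return target > currently_pending ? target - currently_pending : 0;
}
```

The accept thread would call target_pending_accepts() once per interval and then post accepts_to_post() new AcceptEx() calls. The upper bound matters because every posted accept holds resources, as the section on resource constraints explains.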
On Windows 2000, Winsock provides a mechanism to help determine whether enough AcceptEx() calls are outstanding. When creating the listening socket, also create an event, and use WSAEventSelect() to associate the socket with the event, registering for FD_ACCEPT notification. When a connection request arrives and no AcceptEx() call is pending to accept it, the event is signaled. You can use this event to tell that you have not posted enough AcceptEx() calls, or to detect a misbehaving client (described below). This mechanism is not available on Windows NT 4.0.
One of the major advantages of AcceptEx() is that a single call can both accept the client connection and receive data (through the lpOutputBuffer parameter). That is, if the client sends data as soon as it connects, your AcceptEx() call completes once the connection is established and the client data has been received. This can be useful, but it can also cause problems, because AcceptEx() does not complete until client data has arrived. Specifically, when you pass lpOutputBuffer in an AcceptEx() call, AcceptEx() is no longer an atomic operation but a two-step process: accept the client connection, then wait for data. There is no mechanism that tells your application "the connection has been established and is now waiting for client data", so a client can connect and then send nothing. If your server accumulates too many such connections, it can no longer accept legitimate clients. This is a common form of denial-of-service (DoS) attack.
To guard against such attacks, the accepting thread should periodically check the sockets pending in AcceptEx() by calling getsockopt() with the SO_CONNECT_TIME option. The option value returned is the number of seconds the socket has been connected, or -1 if the socket is not yet connected. The WSAEventSelect() mechanism described above is a good trigger for this check. If a connection has been established but no data has arrived for a long time, terminate it by closing the socket that was passed to AcceptEx(). Note: under most non-emergency conditions, do not close a socket that has been passed to AcceptEx() and is still waiting for a connection. For performance reasons, the kernel-mode data structures created for a pending AcceptEx() are not cleaned up until a connection comes in or the listening socket itself is closed, even if the waiting socket has been closed.
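The check described above can be sketched as follows. The decision logic is kept in a small portable function; the part that actually queries the socket is compiled only on Windows. The names is_stale_connection, reap_if_stale and max_idle_seconds are hypothetical, and the time limit is an application-chosen value rather than something the article specifies.

```c
#include <stdint.h>

#ifdef _WIN32
#include <winsock2.h>
#include <mswsock.h>   /* SO_CONNECT_TIME */
#endif

/* SO_CONNECT_TIME reports this value (-1 as an unsigned 32-bit number)
   for a socket that is not connected yet. */
#define NOT_CONNECTED 0xFFFFFFFFu

/* Decide whether a socket pending in AcceptEx should be closed.
   connect_seconds  : value read with the SO_CONNECT_TIME option
   max_idle_seconds : hypothetical limit chosen by the application */
static int is_stale_connection(uint32_t connect_seconds,
                               uint32_t max_idle_seconds)
{
    if (connect_seconds == NOT_CONNECTED)
        return 0;   /* still waiting for a client: leave it alone */
    return connect_seconds > max_idle_seconds;
}

#ifdef _WIN32
/* Query one socket and close it if it has been connected too long
   without sending the data the pending AcceptEx is waiting for. */
static void reap_if_stale(SOCKET s, uint32_t max_idle_seconds)
{
    DWORD seconds = 0;
    int   len = sizeof(seconds);

    if (getsockopt(s, SOL_SOCKET, SO_CONNECT_TIME,
                   (char *)&seconds, &len) == 0 &&
        is_stale_connection((uint32_t)seconds, max_idle_seconds)) {
        closesocket(s);   /* the pending AcceptEx then completes with an error */
    }
}
#endif
```

Sockets that report "not connected" are deliberately left untouched, in line with the advice not to close sockets that are still waiting in AcceptEx().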
It might seem natural for the thread that posts AcceptEx() calls to be one of the worker threads that are associated with the completion port and process other I/O completion notifications. However, remember that blocking operations should be avoided in those threads. One side effect of Winsock 2's layered architecture is that the overhead of a socket() or WSASocket() call can be significant. Every AcceptEx() call requires a new socket to be created, so it is best to have a separate thread that posts AcceptEx() calls and is not involved in other I/O processing. You can also use this thread for other tasks, such as event logging.
A final note about AcceptEx(): Winsock 2 implementations from other providers are not required to implement these APIs. The same applies to other Microsoft-specific APIs such as TransmitFile() and GetAcceptExSockAddrs(), and to any that may be added in newer versions of Windows. On Windows NT and Windows 2000 these APIs are implemented in Microsoft's base provider DLL (mswsock.dll); they can be called by linking with mswsock.lib, or their function pointers can be obtained dynamically through WSAIoctl() with the SIO_GET_EXTENSION_FUNCTION_POINTER option.
If the function is called directly without obtaining the function pointer in advance (that is, the program links statically against mswsock.lib and calls the function directly), performance suffers considerably. Because AcceptEx() sits outside the Winsock 2 architecture, each such call is forced to obtain the function pointer through WSAIoctl(). To avoid this penalty, applications that use these APIs should call WSAIoctl() themselves to obtain the function pointers directly from the underlying provider.
See the socket architecture figure referenced earlier.
TransmitFile and TransmitPackets
Winsock provides two functions that are specially optimized for transmitting file and memory data. TransmitFile() is available on both Windows NT 4.0 and Windows 2000, while TransmitPackets() will be implemented in a future version of Windows.
TransmitFile() transmits the contents of a file over Winsock. The usual way to send a file is to open it with CreateFile() and then call ReadFile() and WSASend() repeatedly until all the data has been sent. This is inefficient, because every ReadFile() and WSASend() call involves a transition from user mode to kernel mode. With TransmitFile(), you supply only a handle to the open file and the number of bytes to send. A mode transition occurs only once when CreateFile() opens the file and once more when TransmitFile() is called, which is far more efficient.
TransmitPackets() goes a step further than TransmitFile(): it lets the caller send multiple specified files and memory buffers in a single call. The function prototype is as follows:
    BOOL TransmitPackets(
        SOCKET hSocket,
        LPTRANSMIT_PACKET_ELEMENT lpPacketArray,
        DWORD nElementCount,
        DWORD nSendSize,
        LPOVERLAPPED lpOverlapped,
        DWORD dwFlags
    );
Here, lpPacketArray is an array of structures. Each element can be either a file handle or a memory buffer. The structure is defined as follows:
    typedef struct _TRANSMIT_PACKETS_ELEMENT {
        DWORD dwElFlags;
        DWORD cLength;
        union {
            struct {
                LARGE_INTEGER nFileOffset;
                HANDLE        hFile;
            };
            PVOID pBuffer;
        };
    } TRANSMIT_PACKETS_ELEMENT;
The fields are self-explanatory:
dwElFlags: specifies whether the element is a file handle or a memory buffer (the constants TF_ELEMENT_FILE and TF_ELEMENT_MEMORY, respectively).
cLength: specifies the number of bytes to send from the data source (for a file, 0 means send the entire file).
The unnamed union: holds either the memory buffer pointer or the file handle (with an optional file offset).
Another advantage of these two APIs is that the socket handle can be reused by specifying the TF_REUSE_SOCKET and TF_DISCONNECT flags. When the API finishes transmitting the data, it disconnects at the transport level, and the socket can then be handed to AcceptEx() again. Programming this way reduces the load on the dedicated thread that creates sockets (as mentioned above).
Both APIs share a weakness: on Windows NT Workstation and Windows 2000 Professional, only two requests are processed at a time. Full support is available only on Windows NT Server, Windows 2000 Server, Windows 2000 Advanced Server, and Windows 2000 Datacenter Server.
Putting It All Together
The preceding sections covered the functions, techniques, and potential resource bottlenecks involved in developing high-performance, scalable applications. What does this mean for you? That depends on how your server and client are built. The more control you have over the design of both, the better you can avoid bottlenecks.
Consider a sample scenario: a server that handles clients which connect, send a request, receive data, and disconnect. The server creates a listening socket, associates it with a completion port, and creates one worker thread per CPU. It creates one more thread dedicated to posting AcceptEx() calls. Because we know the client sends data immediately after connecting, things are simpler if a receive buffer is supplied with each AcceptEx() call. And do not forget to poll the sockets used in AcceptEx() calls from time to time (using the SO_CONNECT_TIME option) to make sure no malicious connections are left hanging.
An important question in this design is how many AcceptEx() calls to keep outstanding. Each AcceptEx() call is posted with a receive buffer, so many pages will be locked in memory (as noted earlier, each overlapped operation consumes a small amount of non-paged pool and locks every buffer involved). There is no definitive answer. The best approach is to make the number tunable and find the best value for a typical deployment through repeated performance tests.
Once that estimate is settled, the next issue is sending data. The key question is how many concurrent connections the server should handle. In general, the server should limit both the number of concurrent connections and the number of outstanding send calls. More concurrent connections consume more non-paged pool; more outstanding sends lock more pages of memory (take care not to exceed the limit). Again, the answer comes only from repeated testing.
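The two limits discussed above can be enforced with a pair of counters that are checked before a connection is admitted or a send is posted. The sketch below is an assumption-laden illustration: IO_BUDGET and its functions are hypothetical names, the limits themselves must come from load testing, and the code is single-threaded for clarity (a real server with several worker threads would need interlocked operations or a lock around the counters).

```c
#include <assert.h>

/* Hypothetical per-server budget. */
typedef struct {
    unsigned max_connections;     /* bounds non-paged pool consumption */
    unsigned max_pending_sends;   /* bounds the number of locked pages */
    unsigned connections;
    unsigned pending_sends;
} IO_BUDGET;

/* Returns 1 and counts the connection if it fits the budget, else 0. */
static int try_add_connection(IO_BUDGET *b)
{
    if (b->connections >= b->max_connections)
        return 0;
    b->connections++;
    return 1;
}

static void remove_connection(IO_BUDGET *b)
{
    assert(b->connections > 0);
    b->connections--;
}

/* Call before posting an overlapped send; on 0, queue the data instead. */
static int try_begin_send(IO_BUDGET *b)
{
    if (b->pending_sends >= b->max_pending_sends)
        return 0;
    b->pending_sends++;
    return 1;
}

/* Call when the send's completion notification has been processed. */
static void end_send(IO_BUDGET *b)
{
    assert(b->pending_sends > 0);
    b->pending_sends--;
}
```

Keeping both limits as runtime values rather than compile-time constants makes it easy to repeat the performance tests with different settings.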
In the scenario above there is no need to disable buffering on individual sockets, because only one receive is performed, as part of AcceptEx(), and providing a receive buffer for each incoming connection is not difficult. However, if the client/server interaction changes so that the client sends more data after its first request, disabling receive buffering is a bad idea unless you guarantee that an overlapped receive is always posted on each connection to take the additional data.
Conclusion
Developing a scalable Winsock server is not daunting. It comes down to setting up a listening socket, accepting connections, and issuing overlapped send and receive calls. The main challenge is choosing a reasonable number of outstanding overlapped calls so that the non-paged pool is not exhausted. By following the principles discussed above, you can develop server applications that scale well.