Another advantage of the overlapped I/O model is that Microsoft provides a set of extension functions unique to it. When the overlapped I/O model is used, you can also choose among several completion-notification methods.
The overlapped I/O model with event-object notification does not scale well, because each thread that calls WSAWaitForMultipleEvents can wait on at most 64 event objects (WSA_MAXIMUM_WAIT_EVENTS), and can therefore service at most 64 sockets at a time. If this model must manage more than 64 sockets simultaneously, additional worker threads have to be created to wait on more event objects. Because the operating system can process only a limited number of event objects at a time, the event-object-based I/O model is not scalable.
The overlapped I/O model with completion-routine notification is also not the best choice for a high-performance server, for two reasons. First, many of the extension functions do not allow completion notification through an APC (Asynchronous Procedure Call). Second, because of the way the system schedules APCs, an application thread may wait indefinitely without receiving a completion notification. When a thread enters an "alertable state", all pending APCs are executed in FIFO order. Consider a server that has accepted a connection and called WSARecv with a completion-routine pointer to post an overlapped I/O request. When the data arrives (that is, when the I/O completes), the completion routine runs and calls WSARecv again to post another overlapped request. An I/O operation posted from inside an APC takes some time to complete, so during that period another completion routine may already be waiting to run (for example, because a new client connected while the previous WSARecv was still outstanding), and more data must be read because the data sent by the previous client has not yet been consumed. As long as "pending" (unread) data remains on the socket on which WSARecv is posted, the calling thread can stay blocked for a long time.
The overlapped I/O model based on completion-port notification is the I/O model provided by Windows NT that truly supports high scalability. In the previous chapter we discussed the common Winsock I/O models and explained that the completion port is the best choice for handling large numbers of client connections, because it offers the best scalability.
The performance test results for the different Winsock I/O models are shown in Figure 1. The server was a Pentium 4 Xeon at 1.7 GHz; the clients were three PCs: a Pentium II 233 MHz with 128 MB of memory, a Pentium II 350 MHz with 128 MB of memory, and an Itanium 733 MHz with 1 GB of memory. Both the server and the clients ran Windows XP.
Figure 1 Performance Comparison of Different I/O models
1. Analysis of the results in Figure 1 shows that blocking mode has the worst performance of all the I/O models tested. In this test program the server creates two threads for each client: one to handle receiving data and one to handle sending data. The problem observed across repeated tests is that blocking mode cannot cope with large numbers of client connections, because creating so many threads consumes too many system resources. Once the server has created too many threads, further calls to CreateThread fail with ERROR_NOT_ENOUGH_MEMORY, indicating that memory is insufficient, and clients attempting to connect receive WSAECONNREFUSED, indicating that the connection attempt was rejected.
Let's take a look at the listener function listen. Its prototype is as follows:
WINSOCK_API_LINKAGE int WSAAPI listen(SOCKET s, int backlog);
The first parameter, s, is the listening socket, which has already been bound to an address.
The second parameter, backlog, specifies the maximum length of the queue of pending connections.
The backlog parameter is important because several connection requests may arrive at the server at the same time. For example, if backlog is 2 and three clients issue connection requests simultaneously, the first two are placed in a "pending" queue so that the application can serve them in order, while the third request fails with WSAECONNREFUSED. Once the server accepts a connection, that request is removed from the queue, making room for further client requests. In other words, if the queue is full when a new connection request arrives, the client receives WSAECONNREFUSED. The maximum value of backlog is limited, and the limit is determined by the protocol provider.
In blocking mode, then, concurrency is extremely difficult to scale up because of these system-resource limits.
2. Non-blocking mode performs better than blocking mode, but consumes too much CPU time. The test server places all client socket handles into an FD_SET, calls the select function to filter out the sockets with pending events and update the set, and then uses the FD_ISSET macro to determine whether a particular socket is still in the set. As the number of client connections grows, the limitations of this model become more and more apparent: just to determine whether any socket has a network event, the entire FD_SET must be traversed. The server scans the FD_SET updated by select with a linear search, and the bottleneck is that it must be able to scan the sockets with network events quickly; the scan could be sped up with a more sophisticated lookup algorithm, such as a hash-based search. Note also that usage of the non-paged pool (memory allocated directly from physical memory) is extremely high. This is because both AFD (the Ancillary Function Driver, afd.sys, a kernel-mode driver that provides the underlying support for Windows Sockets applications and manages Winsock TCP/IP communication) and TCP use I/O buffering, and because the rate at which the server can read data is limited, I/O throughput is essentially zero relative to the CPU's processing speed.
3. The WSAAsyncSelect model, based on the Windows message mechanism, can handle a certain number of client connections, but it does not scale well, because the message pump is soon overwhelmed and message processing slows down. In several tests the server could handle only about a third of the client connections. Excess connection requests failed with WSAECONNREFUSED, indicating that the server could not process the FD_ACCEPT messages in time, so the pending connections in the listen queue backed up and new requests were rejected. Moreover, the figures above show that the average throughput of the connections that were established was extremely low (even for clients whose bit rate was limited).
4. The WSAEventSelect model, based on event notification, performed exceptionally well. In most of the tests the server handled essentially all client connections while maintaining high data throughput. The drawback of this model is that the thread pool must be managed dynamically as connections arrive, because each thread can wait on only 64 event objects; whenever the number of client connections exceeds another multiple of 64, a new thread must be created. In the final test, after more than 45,000 client connections had been established, system response became very slow: the large number of threads created to service the connections consumed excessive system resources. At 791 threads the server reached its limit and could not accept further connections, failing with WSAENOBUFS because no buffer space was available and no more sockets could be created. In addition, the client programs had reached their own limits and could not maintain the established connections.
The overlapped I/O model with event notification is similar to the WSAEventSelect model in scalability: both depend on a thread pool that waits for event notifications, and context switching among a large number of threads constrains how many clients they can serve. The test results for the overlapped I/O model and the WSAEventSelect model were very similar; both performed well until the number of threads exceeded the limit.
5. Finally, we test the performance of the overlapped I/O model based on completion-port notification. The figures above show it to be the best performer of all the I/O models. Its memory usage (both the user paged pool and the non-paged pool) and the number of client connections it supports are essentially the same as those of the event-notification overlapped I/O model and the WSAEventSelect model. The difference is CPU usage: the completion port model consumed only about 60% of the CPU, while the other two models needed more CPU to maintain the same number of connections. Another obvious advantage of the completion port is that it sustains higher throughput.
Analyzing these models, we find that the client-server data-exchange pattern used in the test is itself a bottleneck. The test server was designed to respond in the simplest way possible: it echoes back whatever data the client sends. The clients (even with a bit-rate limit) keep sending data to the server, which causes a large amount of data to pile up on the server's socket for that client (in both the TCP buffers and AFD's per-socket buffers, all in the non-paged pool). In the three best-performing models, only one receive operation is posted at a time, which means that most of the time a large amount of data remains "pending". The server could be modified to receive data asynchronously so that data is buffered as soon as it arrives, but the drawback is that when one client sends continuously and its data is received asynchronously, other clients can be starved, since neither the calling thread nor a worker thread is free to handle other events or completion notifications. Usually, calling a non-blocking asynchronous receive function first returns WSAEWOULDBLOCK, and data then arrives intermittently rather than as a continuous stream.
From the test results it is clear that the WSAEventSelect model and the overlapped I/O model give the best performance. In the two event-based models, creating a thread pool to wait for event notifications and perform the follow-up processing is cumbersome, but it does not hurt the performance of a medium-sized server. Once the number of threads grows with the number of client connections, however, the CPU spends a great deal of time on thread context switches, which limits the server's scalability: after the connection count reaches a certain level, throughput saturates. The completion port model offers the best scalability; thanks to its low CPU usage, it supports more client connections than any other model.
I/O Model Selection
The tests and analysis in the previous section make it clear how to select the I/O model that best suits an application. Compared with developing a simple multithreaded blocking-mode application, the other I/O models require more complex programming. We therefore suggest the following guidelines for choosing a model for client and server application development.
1. Client Side
If you plan to develop a client application that manages one or more sockets at a time, we recommend the overlapped I/O or WSAEventSelect model, which improves performance to a certain extent. However, if you are developing a Windows-based application that must manage window messages, the WSAAsyncSelect model may be the better choice, because WSAAsyncSelect is itself built on the Windows message model, and such a program must already have a message-processing function.
2. Server Side
If you are developing a server application that controls several sockets at a given time, we recommend the overlapped I/O model, again from the performance point of view. But if the server must service a large number of I/O requests at any given time, you should consider the I/O completion port model to achieve better performance.