Performance analysis of the five Windows I/O models


Another advantage of the overlapped I/O model is that Microsoft provides several extension functions unique to it. When using the overlapped I/O model, you can also choose among different completion-notification methods.

The overlapped I/O model with event-object notification is not scalable, because each thread that calls WSAWaitForMultipleEvents can wait on at most 64 event objects, and can therefore service at most 64 sockets at a time. If the model has to manage more than 64 sockets simultaneously, additional worker threads must be created to wait on more event objects. Because the operating system can only wait on a limited number of event objects per thread, this event-based I/O model does not scale well.
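For illustration, here is a minimal sketch of the wait loop such a worker thread runs, assuming a hypothetical PerSocketData structure that holds each connection's socket, its WSAOVERLAPPED (whose hEvent was created with WSACreateEvent), and a receive buffer; error handling and the re-posting of WSARecv are elided.

#include <winsock2.h>
#include <vector>

struct PerSocketData {
    SOCKET        s;
    WSAOVERLAPPED ov;        // ov.hEvent = WSACreateEvent()
    char          buf[4096];
};

// Each worker thread owns at most WSA_MAXIMUM_WAIT_EVENTS (64) handles;
// socket number 65 forces the application to start another worker thread.
DWORD WINAPI OverlappedEventWorker(LPVOID param)
{
    std::vector<PerSocketData*>* conns =
        static_cast<std::vector<PerSocketData*>*>(param);   // at most 64 entries
    WSAEVENT events[WSA_MAXIMUM_WAIT_EVENTS];
    for (size_t i = 0; i < conns->size(); ++i)
        events[i] = (*conns)[i]->ov.hEvent;

    for (;;) {
        DWORD idx = WSAWaitForMultipleEvents((DWORD)conns->size(), events,
                                             FALSE, WSA_INFINITE, FALSE);
        if (idx == WSA_WAIT_FAILED)
            break;
        PerSocketData* ctx = (*conns)[idx - WSA_WAIT_EVENT_0];
        WSAResetEvent(ctx->ov.hEvent);

        DWORD bytes = 0, flags = 0;
        WSAGetOverlappedResult(ctx->s, &ctx->ov, &bytes, FALSE, &flags);
        // ... process 'bytes' of data in ctx->buf, then post the next WSARecv ...
    }
    return 0;
}

The WSA_MAXIMUM_WAIT_EVENTS cap on a single WSAWaitForMultipleEvents call is exactly the 64-sockets-per-thread limit discussed above.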
The overlapped I/O model with completion-routine notification is also not the best choice for a high-performance server, for several reasons. First, many of the extension functions do not allow completion notification through APCs (Asynchronous Procedure Calls). Second, because of the way the system handles APCs, an application thread may wait indefinitely without ever receiving a completion notification. When a thread enters an alertable state, all pending APCs are processed in first-in, first-out (FIFO) order. Now consider a server that has accepted a connection and posted an overlapped WSARecv carrying a completion-routine pointer. When data arrives (that is, the I/O completes), the completion routine runs and calls WSARecv again to post another overlapped request. The I/O posted from within the APC takes some time to complete, so another completion routine may already be waiting to run (for example, a new client connects while the current WSARecv has not yet completed), because there is still more data to read (the previous client's data has not been consumed). As long as there is "pending" (unread) data on the socket for which a WSARecv was posted, the calling thread can remain blocked for a long time.
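As an illustration of the scenario just described, here is a minimal sketch of the WSARecv-plus-completion-routine pattern, assuming a hypothetical RecvContext structure for the per-socket state and omitting error handling; PumpApcs shows the alertable wait that allows queued APCs to run.

#include <winsock2.h>

struct RecvContext {
    SOCKET        s;
    WSAOVERLAPPED ov;
    WSABUF        wsabuf;
    char          buf[4096];
};

void CALLBACK RecvCompleted(DWORD dwError, DWORD cbTransferred,
                            LPWSAOVERLAPPED lpOverlapped, DWORD /*dwFlags*/)
{
    // Recover the per-socket context from the embedded WSAOVERLAPPED.
    RecvContext* ctx = CONTAINING_RECORD(lpOverlapped, RecvContext, ov);
    if (dwError != 0 || cbTransferred == 0)
        return;                               // error or connection closed

    // ... consume cbTransferred bytes from ctx->buf ...

    // Re-post the receive: another overlapped request, another future APC.
    DWORD flags = 0;
    ctx->wsabuf.buf = ctx->buf;
    ctx->wsabuf.len = (ULONG)sizeof(ctx->buf);
    WSARecv(ctx->s, &ctx->wsabuf, 1, NULL, &flags, &ctx->ov, RecvCompleted);
}

// The thread that issued the WSARecv must sit in an alertable wait so that
// its queued APCs (the completion routines) can execute.
void PumpApcs()
{
    for (;;)
        SleepEx(INFINITE, TRUE);              // returns WAIT_IO_COMPLETION after APCs run
}

Because the completion routine always runs on the thread that issued the original request, a busy socket can monopolize that thread in exactly the way described above.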
The overlapped I/O model with completion-port notification is the truly scalable I/O model provided by Windows NT-based systems. In the previous chapter we examined several common Winsock I/O models and explained that the completion port is the best choice for handling large numbers of client connections, because it offers the best scalability.
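For comparison, here is a minimal sketch of the completion-port pattern, assuming a hypothetical PerIoData structure that embeds the OVERLAPPED used for each request and using the SOCKET itself as the completion key; socket creation, the posting of WSARecv, and error handling are elided.

#include <winsock2.h>

struct PerIoData {
    OVERLAPPED ov;
    WSABUF     wsabuf;
    char       buf[4096];
};

// Created once at startup: CreateIoCompletionPort(INVALID_HANDLE_VALUE, NULL, 0, 0)
HANDLE g_iocp;

void AssociateSocket(SOCKET s)
{
    // Bind the socket to the port; completions of overlapped WSARecv/WSASend
    // calls on 's' are queued to g_iocp and picked up by any worker thread.
    CreateIoCompletionPort((HANDLE)s, g_iocp, (ULONG_PTR)s, 0);
}

DWORD WINAPI IocpWorker(LPVOID)
{
    for (;;) {
        DWORD        bytes = 0;
        ULONG_PTR    key   = 0;
        LPOVERLAPPED pov   = NULL;
        BOOL ok = GetQueuedCompletionStatus(g_iocp, &bytes, &key, &pov, INFINITE);
        if (pov == NULL)
            break;                                    // port closed or fatal error
        SOCKET s = (SOCKET)key;
        PerIoData* io = CONTAINING_RECORD(pov, PerIoData, ov);

        if (!ok || bytes == 0) {                      // client error or disconnect
            closesocket(s);
            continue;
        }
        // ... process io->buf, then post the next overlapped WSARecv on 's' ...
    }
    return 0;
}

A small, fixed pool of worker threads (typically on the order of the number of CPUs) services completions for any number of sockets, which is why this model avoids the thread-per-connection and 64-event-per-thread limits of the other models.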
Performance test results for the different Winsock I/O models are shown in Figure 1. The server used a Pentium 4 Xeon 1.7 GHz CPU with 768 MB of memory. The clients were three PCs: a Pentium II 233 MHz with 128 MB of memory, a second Pentium II with 128 MB of memory, and an Itanium 733 MHz with 1 GB of memory. Both the server and the clients ran Windows XP.

Figure 1 Performance comparisons for different I/O models

1. Analysis of the test results in Figure 1 shows that blocking mode has the worst performance of all the I/O models tested. In this test program the server creates two threads for each client: one responsible for receiving data and one responsible for sending data. The recurring problem across the tests is that blocking mode has difficulty handling large numbers of client connections, because creating a thread per connection consumes too many system resources. When the server has created too many threads, a further call to CreateThread fails with ERROR_NOT_ENOUGH_MEMORY, indicating that memory is exhausted, and clients issuing connection requests receive WSAECONNREFUSED, indicating that the connection attempt was refused. A sketch of this blocking design is shown after the discussion of listen below.
Let's take a look at the listen function, whose prototype is as follows:
WINSOCK_API_LINKAGE int WSAAPI listen(SOCKET s, int backlog);
The first parameter, s, is a listening socket to which an address has already been bound.
The second parameter, backlog, specifies the maximum length of the queue of pending connections.
The backlog parameter matters because several connection requests may well arrive at the server at the same time. For example, if backlog is 2 and three clients issue connection requests simultaneously, the first two are placed in the pending queue so the application can serve them in turn, while the third request fails with WSAECONNREFUSED. Once the server accepts a connection, that request is removed from the queue, freeing a slot for further requests from other clients. In other words, if the queue is full when a connection request arrives, the client receives WSAECONNREFUSED. The maximum value of backlog itself is limited, and that limit is determined by the protocol provider.
Therefore, in blocking mode, the limits of system resources make it very difficult to push the number of concurrent connections any higher.
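To make the two failure modes above concrete, here is a minimal sketch of the blocking-mode design from item 1: listen with a backlog, then a blocking accept loop that spawns a thread per client. HandleClientRecv is a hypothetical receive thread; the matching send thread and all setup and error handling are elided.

#include <winsock2.h>

DWORD WINAPI HandleClientRecv(LPVOID param)
{
    SOCKET client = (SOCKET)(ULONG_PTR)param;
    char buf[4096];
    int n;
    while ((n = recv(client, buf, (int)sizeof(buf), 0)) > 0) {
        // ... blocking processing of n bytes ...
    }
    closesocket(client);
    return 0;
}

void RunBlockingServer(SOCKET listener)       // 'listener' is already bound
{
    listen(listener, SOMAXCONN);              // backlog: let the provider choose its maximum

    for (;;) {
        SOCKET client = accept(listener, NULL, NULL);   // blocks until a request is queued
        if (client == INVALID_SOCKET)
            continue;

        HANDLE h = CreateThread(NULL, 0, HandleClientRecv,
                                (LPVOID)(ULONG_PTR)client, 0, NULL);
        if (h == NULL) {
            // With thousands of clients this is where ERROR_NOT_ENOUGH_MEMORY shows up.
            closesocket(client);
            continue;
        }
        CloseHandle(h);                        // the thread keeps running; we do not track it here
    }
}

A client that connects while the backlog queue is already full is the one that sees WSAECONNREFUSED on its side.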


2. Non-blocking mode performs slightly better than blocking mode, but it consumes too much CPU time. The test server places all client sockets into an fd_set collection, then calls select to filter out the sockets in the set on which an event has occurred, updating the set in the process. It then uses the FD_ISSET macro to check whether a given socket is still in the set that was originally built. As the number of client connections grows, the limitations of this model become apparent: to determine whether any socket has a pending network event, the entire fd_set must be traversed. Performance can be improved by iterating over the fd_set that select has updated, but the bottleneck remains that the server must be able to quickly find which sockets in the fd_set have network events; more sophisticated lookup schemes, such as a hash-based search, are more efficient here. Another issue to note is the very heavy use of the non-paged pool (memory allocated directly from physical memory). This is because AFD (the Ancillary Function Driver, Afd.sys, the kernel-mode driver that supports Windows Sockets applications and manages Winsock TCP/IP traffic) and TCP both buffer I/O, and because the speed at which the server reads data is limited, the effective I/O throughput is essentially zero relative to the CPU's processing speed and data backs up in those buffers.
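A minimal sketch of one iteration of such a select loop follows, assuming the connected sockets are kept in a plain array (FD_SETSIZE bounds how many fit in one fd_set); accept handling and error paths are elided.

#include <winsock2.h>

void PollClients(SOCKET* clients, int count)
{
    fd_set readSet;
    FD_ZERO(&readSet);
    for (int i = 0; i < count; ++i)
        FD_SET(clients[i], &readSet);          // rebuild the set on every pass

    timeval tv = { 0, 0 };                     // poll without blocking
    if (select(0, &readSet, NULL, NULL, &tv) <= 0)
        return;                                // nothing ready (or an error)

    // The costly part: walk the whole collection to find the ready sockets.
    for (int i = 0; i < count; ++i) {
        if (FD_ISSET(clients[i], &readSet)) {
            char buf[4096];
            int n = recv(clients[i], buf, (int)sizeof(buf), 0);
            // ... handle n bytes, or n <= 0 for close/error ...
        }
    }
}

Both loops are linear in the number of clients, which is exactly the traversal cost described above.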


3. The WSAAsyncSelect model, based on the Windows messaging mechanism, can handle a certain number of client connections, but its scalability is limited, because the message pump soon becomes a bottleneck and slows message processing. In several tests the server could handle only about one third of the client connections. Excess client connection requests fail with WSAECONNREFUSED, indicating that the server could not process the FD_ACCEPT messages in time, so the queue of pending connections in the listen backlog filled up and the connections were refused. Moreover, the data in Figure 1 show that the average throughput of the connections that were established is extremely low (even for clients whose bit rate was throttled).
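For reference, a minimal sketch of the WSAAsyncSelect pattern; WM_SOCKET is a hypothetical user-defined message, hWnd is an existing window, and only the relevant part of the window procedure is shown.

#include <winsock2.h>
#include <windows.h>

#define WM_SOCKET (WM_USER + 1)

void RegisterListener(SOCKET listener, HWND hWnd)
{
    // Network events on 'listener' are now delivered as WM_SOCKET messages.
    WSAAsyncSelect(listener, hWnd, WM_SOCKET, FD_ACCEPT | FD_READ | FD_CLOSE);
}

LRESULT CALLBACK WndProc(HWND hWnd, UINT msg, WPARAM wParam, LPARAM lParam)
{
    if (msg == WM_SOCKET) {
        SOCKET s = (SOCKET)wParam;
        if (WSAGETSELECTERROR(lParam)) { closesocket(s); return 0; }

        switch (WSAGETSELECTEVENT(lParam)) {
        case FD_ACCEPT: {
            SOCKET client = accept(s, NULL, NULL);
            WSAAsyncSelect(client, hWnd, WM_SOCKET, FD_READ | FD_CLOSE);
            break;
        }
        case FD_READ: {
            char buf[4096];
            recv(s, buf, (int)sizeof(buf), 0);   // the message pump is the bottleneck here
            break;
        }
        case FD_CLOSE:
            closesocket(s);
            break;
        }
        return 0;
    }
    return DefWindowProc(hWnd, msg, wParam, lParam);
}

Because every FD_ACCEPT and FD_READ must be pumped through this single window procedure, a slow pump translates directly into refused connections.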


4. The WSAEventSelect model, based on event notification, performed surprisingly well. In all of the tests the server could, most of the time, handle all of the client connections while sustaining high data throughput. The drawback of this model is that the thread pool must be managed dynamically whenever new connections arrive, because each thread can wait on only 64 event objects; once there are more than 64 client connections and new clients arrive, additional threads must be created. In the final test, after more than 45,000 client connections had been established, system responsiveness dropped sharply, because so many threads had been created to service the connections that too many system resources were consumed. At 791 threads the server essentially hit its limit and could not accept any more connections; socket creation failed with WSAENOBUFS (no buffer space available). In addition, the client programs reached their own limits and could not keep the already-established connections alive.
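A minimal sketch of one WSAEventSelect worker thread follows; EventGroup is a hypothetical structure that the accepting code fills with up to WSA_MAXIMUM_WAIT_EVENTS (64) socket/event pairs, where each event was created with WSACreateEvent and registered via WSAEventSelect(socket, event, FD_READ | FD_CLOSE).

#include <winsock2.h>

struct EventGroup {
    SOCKET   sockets[WSA_MAXIMUM_WAIT_EVENTS];
    WSAEVENT events[WSA_MAXIMUM_WAIT_EVENTS];
    DWORD    count;                     // at most 64; connection 65 needs a new thread
};

DWORD WINAPI EventSelectWorker(LPVOID param)
{
    EventGroup* g = static_cast<EventGroup*>(param);

    for (;;) {
        DWORD idx = WSAWaitForMultipleEvents(g->count, g->events, FALSE,
                                             WSA_INFINITE, FALSE);
        if (idx == WSA_WAIT_FAILED)
            break;
        idx -= WSA_WAIT_EVENT_0;

        WSANETWORKEVENTS ne;
        WSAEnumNetworkEvents(g->sockets[idx], g->events[idx], &ne);   // also resets the event

        if (ne.lNetworkEvents & FD_READ) {
            char buf[4096];
            recv(g->sockets[idx], buf, (int)sizeof(buf), 0);
            // ... handle the received data ...
        }
        if (ne.lNetworkEvents & FD_CLOSE)
            closesocket(g->sockets[idx]);
    }
    return 0;
}

The dynamic thread-pool management described above amounts to creating one more EventGroup and one more worker thread for every 64 additional connections.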
The overlapped I/O model with event notification and the WSAEventSelect model are similar in scalability: both rely on a pool of threads waiting for event notifications, and the heavy thread context switching is their common constraint when serving many clients. The test results for the two models are very close, and both perform well until the number of threads exceeds the limit.


5. Finally, the performance test of the overlapped I/O model with completion-port notification: as the data in Figure 1 show, it delivers the best performance of all the I/O models. Memory usage (both the user paged pool and the non-paged pool) and the number of supported client connections are essentially the same as for the event-notification overlapped I/O model and the WSAEventSelect model. The real difference is CPU usage: the completion-port model consumed only about 60% of the CPU, whereas the other two models (the event-notification overlapped I/O model and the WSAEventSelect model) needed more CPU to maintain the same number of connections. Another obvious advantage of the completion port is that it sustains higher throughput.
Analysis of the models above shows that the data-exchange scheme between client and server is itself a bottleneck. In these tests the server was designed simply to echo back the data sent by each client, while the clients (even with rate limiting) kept sending data to the server, which caused large amounts of data to back up on the server-side socket for that client (in either the TCP buffer or AFD's per-socket buffer, both of which live in the non-paged pool). In the three better-performing models, only one receive operation is outstanding at a time, which means that most of the time a large amount of data is still left "pending". The server program could be modified to receive data asynchronously, so that as soon as data is available it is pulled off the socket and cached by the application. The drawback of this approach is that when a client sends data continuously, the asynchronous receives pull in a large amount of data from it, which can starve the other clients, because neither the calling thread nor the worker threads get a chance to handle other events or completion notifications. Typically, calling a non-blocking asynchronous receive function first returns WSAEWOULDBLOCK, and the data then arrives intermittently rather than being received in one sequential pass.
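As a sketch of this "receive and cache" idea, assuming the socket has already been put into non-blocking mode with ioctlsocket(s, FIONBIO, ...) and using a hypothetical application-side cache, a drain loop might look like this.

#include <winsock2.h>
#include <vector>

static std::vector<char> g_cache;                   // hypothetical per-connection cache

static void CacheData(const char* data, int len)
{
    g_cache.insert(g_cache.end(), data, data + len);
}

void DrainSocket(SOCKET s)
{
    char buf[4096];
    for (;;) {
        int n = recv(s, buf, (int)sizeof(buf), 0);
        if (n > 0) {
            CacheData(buf, n);                      // keep the AFD/TCP buffers empty
            continue;
        }
        if (n == 0) {                               // peer closed the connection
            closesocket(s);
            return;
        }
        if (WSAGetLastError() == WSAEWOULDBLOCK)
            return;                                 // no more data pending right now
        closesocket(s);                             // genuine error
        return;
    }
}

The trade-off described above is visible here: a client that never stops sending keeps this loop (and the thread running it) busy at the expense of other clients.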
From the test results above it is clear that the WSAEventSelect model and the overlapped I/O model give the best performance. In the two event-notification models, setting up a thread pool to wait for event notifications and do the follow-up processing is cumbersome, but it does not prevent good performance for a medium-sized server. Once the number of threads grows with the number of client connections, however, the CPU spends much of its time on thread context switches, which limits the server's scalability: throughput saturates after a certain number of connections. The completion-port model provides the best scalability, because its CPU usage is low and it supports the largest number of client connections of all these models.
Choice of I/O model
With the tests from the previous section in hand, it becomes clear how to choose the I/O model that best suits a given application. Every one of the other I/O models requires more complex programming than a simple multithreaded, blocking-mode application. Accordingly, the following guidelines apply when choosing a model for client and server development.
1. Client
If you plan to develop a client application that manages one or more sockets at a time, using the overlapped I/O or WSAEventSelect model is recommended, as it improves performance to some extent. However, if you are developing a Windows-based application that already manages window messages, the WSAAsyncSelect model is probably the best choice, because it is built directly on the Windows messaging model and the program already needs message-handling routines.
2. Server-side
If you are developing a server application that controls several sockets at a time, the overlapped I/O model is recommended, again from a performance standpoint. However, if the server will be servicing a large number of I/O requests at any given time, consider the I/O completion-port model for better performance.
