A Comprehensive Analysis of Several Winsock I/O Models

Source: Internet
Author: User
Tags: APC

Summary 

A socket is the basis of communication and the fundamental interface through which network protocols exchange data. Winsock provides several interesting I/O models that let an application manage communication on one or more sockets at a time in an "asynchronous" way. These models are select, WSAAsyncSelect (asynchronous selection), WSAEventSelect (event selection), overlapped I/O, and the completion port.

① Select model:

The select model is the most widely used model in Winsock. Its core is the select function, which determines whether readable data exists on a socket or whether data can be written to it. It keeps the application from blocking when a socket is in blocking mode and, for non-blocking sockets, avoids generating a flood of WSAEWOULDBLOCK errors. Its advantage is that a single thread can manage multiple connections and perform I/O on multiple sockets, so the server does not need an ever-growing number of threads each blocked on its own socket.
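The following is a minimal sketch of the select model, assuming a listening socket listenSock has already been created, bound, and placed in listen mode (the buffer size and echo behavior are illustrative only):

```cpp
// Minimal select-model server loop (assumes WSAStartup has been called
// and listenSock is a bound, listening socket).
#include <winsock2.h>
#include <vector>
#pragma comment(lib, "ws2_32.lib")

void SelectLoop(SOCKET listenSock)
{
    std::vector<SOCKET> clients;

    for (;;)
    {
        // Rebuild the fd_set on every iteration, as select modifies it.
        fd_set readSet;
        FD_ZERO(&readSet);
        FD_SET(listenSock, &readSet);
        for (SOCKET s : clients)
            FD_SET(s, &readSet);

        // Block until at least one socket is readable
        // (the first parameter is ignored on Windows).
        if (select(0, &readSet, NULL, NULL, NULL) == SOCKET_ERROR)
            break;

        if (FD_ISSET(listenSock, &readSet))
            clients.push_back(accept(listenSock, NULL, NULL));

        for (SOCKET s : clients)
        {
            if (!FD_ISSET(s, &readSet))
                continue;
            char buf[4096];
            int n = recv(s, buf, sizeof(buf), 0);
            if (n > 0)
                send(s, buf, n, 0);   // echo the data back
            // (a real server would also close and remove sockets when n <= 0)
        }
    }
}
```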

② WSAAsyncSelect model:

This model is message-based. The key is the WSAAsyncSelect function, which posts socket event messages to an hWnd window; the window procedure then handles the corresponding FD_READ, FD_WRITE, and other messages. Advantage: the WSAAsyncSelect and WSAEventSelect models provide asynchronous notification of readiness to read and write (though not asynchronous data transfer, which only overlapped I/O and completion ports provide), and they can handle many connections at once with low system overhead, whereas the select model must also rebuild the fd_set structure on every call. Disadvantage: a window is required to receive the messages, so the model cannot cope with thousands of sockets.
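A minimal sketch of the WSAAsyncSelect model follows; WM_SOCKET is an assumed user-defined message, and hWnd and listenSock are assumed to come from the application's normal window and socket setup:

```cpp
#include <winsock2.h>
#include <windows.h>
#pragma comment(lib, "ws2_32.lib")

#define WM_SOCKET (WM_USER + 1)   // assumed user-defined message

// During setup: ask Winsock to post WM_SOCKET to hWnd for these events.
void RegisterListener(SOCKET listenSock, HWND hWnd)
{
    WSAAsyncSelect(listenSock, hWnd, WM_SOCKET, FD_ACCEPT | FD_CLOSE);
}

// Window procedure: socket events arrive as ordinary window messages.
LRESULT CALLBACK WndProc(HWND hWnd, UINT msg, WPARAM wParam, LPARAM lParam)
{
    if (msg == WM_SOCKET)
    {
        SOCKET s = (SOCKET)wParam;
        switch (WSAGETSELECTEVENT(lParam))
        {
        case FD_ACCEPT:
        {
            SOCKET client = accept(s, NULL, NULL);
            WSAAsyncSelect(client, hWnd, WM_SOCKET, FD_READ | FD_CLOSE);
            break;
        }
        case FD_READ:
        {
            char buf[4096];
            int n = recv(s, buf, sizeof(buf), 0);
            if (n > 0) send(s, buf, n, 0);   // echo the data back
            break;
        }
        case FD_CLOSE:
            closesocket(s);
            break;
        }
        return 0;
    }
    return DefWindowProc(hWnd, msg, wParam, lParam);
}
```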

③ WSAEventSelect model:

This model also delivers network event notifications, but unlike WSAAsyncSelect it works through event object handles rather than a window. Advantage: no window is required. Disadvantage: each wait can cover at most 64 events, so a thread pool must be organized to handle larger numbers of sockets, and its scalability is therefore inferior to the completion port discussed below.
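Below is a minimal sketch of the WSAEventSelect model for a single batch of at most 64 sockets (socks and count are assumed inputs); a larger server would run one such loop in each worker thread:

```cpp
// Minimal WSAEventSelect loop. WSAWaitForMultipleEvents can wait on at
// most WSA_MAXIMUM_WAIT_EVENTS (64) event objects per call, which is why
// larger servers need a pool of worker threads.
#include <winsock2.h>
#pragma comment(lib, "ws2_32.lib")

void EventLoop(SOCKET socks[], int count)   // count <= WSA_MAXIMUM_WAIT_EVENTS
{
    WSAEVENT events[WSA_MAXIMUM_WAIT_EVENTS];

    for (int i = 0; i < count; ++i)
    {
        events[i] = WSACreateEvent();
        WSAEventSelect(socks[i], events[i], FD_READ | FD_CLOSE);
    }

    for (;;)
    {
        DWORD idx = WSAWaitForMultipleEvents(count, events, FALSE, WSA_INFINITE, FALSE);
        if (idx == WSA_WAIT_FAILED)
            break;
        int i = idx - WSA_WAIT_EVENT_0;

        WSANETWORKEVENTS ne;
        WSAEnumNetworkEvents(socks[i], events[i], &ne);   // also resets the event

        if (ne.lNetworkEvents & FD_READ)
        {
            char buf[4096];
            int n = recv(socks[i], buf, sizeof(buf), 0);
            if (n > 0) send(socks[i], buf, n, 0);   // echo the data back
        }
        if (ne.lNetworkEvents & FD_CLOSE)
            closesocket(socks[i]);
    }
}
```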

④ Overlapped I/O model:

This model lets a program achieve better system performance. The basic idea is that the application uses overlapped data structures to post one or more I/O requests at a time, and then services those requests once they complete. There are two ways to implement it: event-based notification and completion routines.
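As a minimal sketch of the event-based variant (the completion-routine variant passes a callback instead of an event handle), the following posts a single overlapped receive on an assumed connected socket s and waits for it to complete:

```cpp
// Minimal overlapped receive using event notification.
#include <winsock2.h>
#pragma comment(lib, "ws2_32.lib")

void OverlappedRecv(SOCKET s)
{
    char buf[4096];
    WSABUF wsaBuf = { sizeof(buf), buf };
    WSAOVERLAPPED ov = {};
    ov.hEvent = WSACreateEvent();

    DWORD bytes = 0, flags = 0;
    // Post the receive; it may complete later (WSA_IO_PENDING).
    if (WSARecv(s, &wsaBuf, 1, &bytes, &flags, &ov, NULL) == SOCKET_ERROR
        && WSAGetLastError() != WSA_IO_PENDING)
    {
        WSACloseEvent(ov.hEvent);
        return;                               // a real error occurred
    }

    // Wait for the request to complete, then collect the result.
    WSAWaitForMultipleEvents(1, &ov.hEvent, TRUE, WSA_INFINITE, FALSE);
    WSAGetOverlappedResult(s, &ov, &bytes, FALSE, &flags);
    WSACloseEvent(ov.hEvent);
    // `buf` now holds `bytes` bytes of received data.
}
```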

⑤ Completion port model:

The completion port offers the best scalability and usually the best system performance, and it is the first choice for handling thousands of sockets. Essentially, the completion port model creates a Windows completion port object and uses a specified number of threads to manage overlapped I/O requests, servicing them as they complete.
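A minimal sketch of the completion port model is shown below; PerIoData is an assumed per-request context structure, and the comments at the end indicate where the port is created and sockets are associated with it:

```cpp
// Minimal completion port worker. One IOCP object plus a small pool of
// worker threads services completed overlapped requests.
#include <winsock2.h>
#include <windows.h>
#pragma comment(lib, "ws2_32.lib")

struct PerIoData                 // assumed per-request context
{
    WSAOVERLAPPED ov;
    WSABUF        wsaBuf;
    char          buf[4096];
};

DWORD WINAPI WorkerThread(LPVOID param)
{
    HANDLE iocp = (HANDLE)param;
    for (;;)
    {
        DWORD bytes = 0;
        ULONG_PTR key = 0;             // the socket, passed as completion key
        LPOVERLAPPED ov = NULL;
        if (!GetQueuedCompletionStatus(iocp, &bytes, &key, &ov, INFINITE) && ov == NULL)
            continue;                  // wait failed, no packet dequeued

        PerIoData* io = CONTAINING_RECORD(ov, PerIoData, ov);
        SOCKET s = (SOCKET)key;
        if (bytes == 0)                // connection closed by the peer
            { closesocket(s); continue; }

        send(s, io->buf, bytes, 0);    // echo, then post the next receive
        DWORD flags = 0;
        io->wsaBuf.len = sizeof(io->buf);
        io->wsaBuf.buf = io->buf;
        ZeroMemory(&io->ov, sizeof(io->ov));
        WSARecv(s, &io->wsaBuf, 1, NULL, &flags, &io->ov, NULL);
    }
}

// Setup (not shown): create the port, associate each accepted socket with
// it, and post the first overlapped WSARecv per socket, e.g.
//   HANDLE iocp = CreateIoCompletionPort(INVALID_HANDLE_VALUE, NULL, 0, 0);
//   CreateIoCompletionPort((HANDLE)clientSock, iocp, (ULONG_PTR)clientSock, 0);
```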

Performance Analysis of the Five I/O Models

Another advantage of the overlapped I/O model is that Microsoft provides a set of extension functions specifically for it. When overlapped I/O is used, several different completion-notification methods are available.

The overlapped I/O model with event-object notification is not scalable, because each thread that calls WSAWaitForMultipleEvents can wait on at most 64 event objects, and therefore service at most 64 sockets at a time. To manage more than 64 sockets with this model, additional worker threads must be created to wait on more event objects. Because the operating system can only handle a limited number of event objects at once, the event-object-based I/O model does not scale.

The overlapped I/O model with completion-routine notification is also not the best choice for a high-performance server, for two reasons. First, many of the extension functions do not allow completion notification through an APC (asynchronous procedure call). Second, because of the way the system handles APCs, an application thread may wait indefinitely without ever receiving a completion notification. When a thread enters an "alertable state", all pending APCs are processed in FIFO order. Suppose a server has accepted a connection and called WSARecv with a completion-routine pointer to post an overlapped I/O request. When data arrives (that is, when the I/O completes), the completion routine executes and calls WSARecv again to post another overlapped request. An I/O operation posted from an APC takes some time to complete, so during that time another completion routine may already be waiting to run (for example, because a new client sent data while the previous WSARecv was still outstanding), since more data remains to be read (the data sent by the earlier client has not yet been consumed). As long as "pending" (uncollected) data remains on the socket for which WSARecv was posted, the calling thread can stay blocked for a long time.
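To illustrate the mechanism described above, here is a minimal sketch of the completion-routine variant; RecvDone and PostRecv are assumed names, and the global buffer is for illustration only:

```cpp
// Minimal completion-routine (APC) variant: the routine runs only while
// the posting thread is in an alertable wait.
#include <winsock2.h>
#include <windows.h>
#pragma comment(lib, "ws2_32.lib")

static char   g_buf[4096];
static WSABUF g_wsaBuf = { sizeof(g_buf), g_buf };

void CALLBACK RecvDone(DWORD err, DWORD bytes, LPWSAOVERLAPPED ov, DWORD flags)
{
    if (err == 0 && bytes > 0)
    {
        // ... process g_buf here, then typically re-post the next
        // overlapped WSARecv from inside this routine ...
    }
}

void PostRecv(SOCKET s, WSAOVERLAPPED* ov)
{
    DWORD recvFlags = 0;
    // Post the overlapped receive with a completion-routine pointer.
    WSARecv(s, &g_wsaBuf, 1, NULL, &recvFlags, ov, RecvDone);

    // The posting thread must enter an alertable state for queued APCs
    // (the completion routines) to run.
    SleepEx(INFINITE, TRUE);
}
```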

The overlapping I/O model based on port notification is a truly scalable I/O model provided by the Windows NT System. In the previous chapter, we discussed several common I/O models of WinSock, and explained that port completion is the best choice when dealing with large-scale customer connections, because it provides the best scalability.

The performance test results for the different Winsock I/O models are shown in Figure 1. The server uses a Pentium 4 1.7 GHz Xeon CPU with 768 MB of memory. The client side consists of three PCs: a Pentium II 233 MHz with 128 MB of memory, a Pentium II 350 MHz with 128 MB of memory, and an Itanium 733 MHz with 1 GB of memory. Both the server and the clients run Windows XP.

1. Analysis of the test results in Figure 1 shows that blocking mode has the worst performance of the I/O models tested. In this test program, the server creates two threads for each client: one to handle receiving data and the other to handle sending data. The recurring problem across the tests is that blocking mode cannot cope with large numbers of client connections, because creating so many threads consumes too many system resources. Once the server has created too many threads, further calls to the CreateThread function return ERROR_NOT_ENOUGH_MEMORY, indicating insufficient memory, and clients sending connection requests receive WSAECONNREFUSED, indicating that the connection attempt was refused.

Let's take a look at the listen function. Its prototype is as follows:

WINSOCK_API_LINKAGE int WSAAPI listen(SOCKET s, int backlog);

Parameter 1, s, is the listening socket, which has already been bound to a local address.

Parameter 2, backlog, specifies the maximum length of the queue of pending connections.

The backlog parameter is important because several connection requests may arrive at the server at the same time. For example, if backlog is 2 and three clients send connection requests simultaneously, the first two are placed in a "pending" queue so the application can service them in turn, while the third connection request fails with WSAECONNREFUSED. Once the server accepts a connection, that request is removed from the queue, making room for connection requests from other clients. In other words, if the queue is full when a connection request arrives, the client receives a WSAECONNREFUSED error. The maximum value of backlog is limited and is determined by the protocol provider.
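As an illustration, the following minimal sketch creates a listening socket with backlog set to 2 (port 5150 is an arbitrary example value, and WSAStartup is assumed to have been called already):

```cpp
// Minimal listener setup showing the backlog parameter.
#include <winsock2.h>
#pragma comment(lib, "ws2_32.lib")

SOCKET CreateListener()
{
    SOCKET s = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP);

    sockaddr_in addr = {};
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(5150);            // example port
    bind(s, (sockaddr*)&addr, sizeof(addr));

    // With backlog = 2, a third simultaneous connection attempt that arrives
    // before accept() drains the queue is refused (the client sees WSAECONNREFUSED).
    listen(s, 2);
    return s;
}
```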

Therefore, in blocking mode, the limit on concurrent connections is very hard to push past because of system resource constraints.

2. Non-blocking mode provides better performance than blocking mode, but consumes too much CPU time. The test server places all the client sockets in an fd_set, calls the select function to filter out the sockets in the set that have pending events and update the set, and then uses the FD_ISSET macro to check whether a particular socket is still in the fd_set. As the number of client connections grows, the limitations of this model become increasingly apparent: just to determine whether any socket has a network event, the whole fd_set has to be traversed. The test program scans the fd_set updated by select with a linear search, so the bottleneck is that the server must be able to scan the fd_set for sockets with network events very quickly; a more sophisticated scanning algorithm, such as a hash lookup, would make this far more efficient. Note also that non-paged pool usage (memory allocated directly from physical memory) is extremely high. This is because both AFD (the Ancillary Function Driver, afd.sys, the kernel-mode driver underlying Windows Sockets that manages Winsock TCP/IP communication) and TCP use I/O buffering, and the server reads data slowly relative to the rate at which it arrives, so data accumulates in those buffers.

3. The WSAAsyncSelect model, based on the Windows message mechanism, can handle a certain number of client connections, but its scalability is poor, because the message pump quickly becomes congested and slows down message processing. In several tests the server could only handle about a third of the client connections; the excess connection requests came back with WSAECONNREFUSED, indicating that the server could not process the FD_ACCEPT messages in time, so the connections failed and the queue of pending connection requests in the listen backlog filled up. Moreover, the data in the table above shows that the average throughput of the connections that were established is extremely low (even for clients whose bit rate was limited).

4. The WSAEventSelect model, based on event notification, performed exceptionally well. In most tests the server handled essentially all client connections and maintained high data throughput. The disadvantage of this model is that the thread pool must be managed dynamically, because each thread can wait on at most 64 event objects; every time the number of client connections grows past another multiple of 64, a new thread must be created. In the final test, after more than 45,000 client connections were established, the system became very slow to respond, because the large number of threads created to handle the connections consumed excessive system resources. A maximum of 791 threads was reached and the server could accept no more connections; the reason was WSAENOBUFS, meaning no buffer space was available and new sockets could not be created. In addition, the client programs had reached their own limits and could not maintain the established connections.

The overlapping I/O model using event notification is similar to the wsaeventselect model in scalability. Both models depend on the thread pool waiting for Event Notifications. switching between a large number of thread contexts is a common constraint for customer communication. The testing results of the overlapping I/O model and the wsaeventselect model are very similar and both of them perform well until the number of threads exceeds the limit.

5. Finally, we test the performance of the overlapped I/O model with completion port notification. The data in the table above shows that it performs best of all the I/O models. Its memory usage (both the user paged pool and the non-paged pool) and the number of client connections it supports are essentially the same as the event-notification overlapped I/O model and the WSAEventSelect model. The real difference lies in CPU usage: the completion port model consumed only about 60% of the CPU, while the other two models (the event-notification overlapped I/O model and the WSAEventSelect model) needed more CPU to maintain the same number of connections. Another obvious advantage of the completion port is that it sustains higher throughput.

Analyzing the models above, we find a bottleneck in the client-server data exchange used in the tests. The server was designed to respond in the simplest possible way, echoing back whatever the client sent, while the clients (even with bit-rate limiting) kept sending data to the server; this caused large amounts of data to queue up on the server's socket for each client (in both the TCP buffers and the per-socket AFD buffers, which live in the non-paged pool). In the three best-performing models, only one receive operation is posted at a time, which means most of the data sits in a "pending" state most of the time. The server program could be modified to receive data asynchronously so that data is buffered as soon as it arrives, but the drawback of that approach is that when one client sends continuously, a large amount of its data is received asynchronously and other clients cannot get service, because neither the calling thread nor the worker threads are free to handle other events or completion notifications. In general, when a non-blocking asynchronous receive function is called it first returns WSAEWOULDBLOCK, and data then arrives intermittently rather than as a continuous stream.

The test results show that the WSAEventSelect model and the overlapped I/O model deliver the best raw performance. In the two event-based models, creating a thread pool to wait for event notifications and do the follow-up processing is cumbersome, but it does not hurt the performance of a medium-sized server. Once the number of threads grows with the number of client connections, however, the CPU spends a great deal of time on thread context switches, which limits the server's scalability: after the connection count reaches a certain level, performance saturates. The completion port model provides the best scalability; because of its low CPU usage, it supports more client connections than any other model.

I/O Model Selection

The tests and analysis of the various models in the previous section make it clear how to choose the I/O model that best suits an application. Compared with developing a simple multi-threaded blocking-mode application, the other I/O models require more complex programming. The following principles therefore apply when choosing a model for client and server development.

1. Client

If you plan to develop a client application that manages one or more sockets at a time, we recommend the overlapped I/O or WSAEventSelect model, which improves performance to a certain extent. However, if the application is Windows-based and must manage window messages, WSAAsyncSelect is probably the better choice, because it is itself built on the Windows message model and such a program already has a message-handling function.

2. Server Side

If you are developing a server application that controls several sockets at a given time, we recommend the overlapped I/O model, again from a performance standpoint. But if the server must service a large number of I/O requests at any given time, you should consider the I/O completion port model for better performance.
