I/O Completion Ports (Windows Core Programming)

Source: Internet
Author: User

A service application can be structured in one of two ways:

    • In the serial model, a single thread waits for a client to send a request (usually over the network). When a request arrives, the thread wakes up and processes it.
    • In the concurrent model, a single thread waits for a client request and then creates a new thread to handle it. While the new thread processes the request, the original thread loops back and waits for the next one. The request-handling thread exits when its work is done.

 

The problem with the serial model is that it cannot handle many simultaneous requests, so it is suitable only for the simplest services. A ping server is a good example of a serial server.

 

The concurrent model is therefore the most common: it creates a new thread for each request, and its performance scales easily with added hardware.

 

When the concurrent model was implemented on Windows NT, the Microsoft NT team noticed that these applications did not perform as well as expected, especially when many threads were running. Because all of these threads were runnable (not suspended or waiting for something), Microsoft realized that the NT kernel was spending too much time switching thread contexts, leaving the threads little time to do their actual work.

 

// Here is an analogy of my own: one afternoon it took me four hours to travel to the Terracotta Warriors, yet I stayed there for only 40 minutes. The example is exaggerated, but it helps with understanding.

 

To make NT a powerful server environment, Microsoft had to solve this problem. The solution was a kernel object called the I/O completion port, introduced for the first time in NT 3.5. The theory behind the I/O completion port is that there must be an upper limit on the number of threads running concurrently: 500 concurrent client requests do not require 500 running threads. But what is the right number of concurrent threads? As soon as the number of runnable threads exceeds the number of CPUs, the operating system must spend time switching thread contexts.

 

Another inefficiency of the concurrent model is that a new thread is created for every client request. Creating a thread is cheaper than creating a process, but it is far from free. Performance can be improved further if a pool of threads is created at application initialization and those threads remain available for the lifetime of the application. The I/O completion port is designed to be used with such a thread pool.
The I/O completion port may be the most complex kernel object Win32 provides. To create one, call CreateIoCompletionPort:

 

HANDLE CreateIoCompletionPort(HANDLE hFileHandle, HANDLE hExistingCompletionPort, DWORD dwCompletionKey, DWORD dwNumberOfConcurrentThreads);

 

The first three parameters are meaningful only when a device is being associated with the completion port. If you are only creating the port and not associating a device, pass INVALID_HANDLE_VALUE, NULL, and 0 for them. The last parameter specifies the maximum number of threads allowed to run concurrently on the completion port; passing 0 uses the default, which is the number of CPUs on the machine. You can experiment with different values to see which performs best. Incidentally, this is the only Win32 function that creates a kernel object yet has no LPSECURITY_ATTRIBUTES parameter. That is because a completion port is intended for use within a single process only.
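To put the call in context, here is a minimal sketch of creating a port and later associating a device with it. The handle names, the file name, and the completion-key value are illustrative assumptions, not part of the original text.

```c
#include <windows.h>
#include <stdio.h>

int main(void)
{
    /* Create a completion port with no device attached yet.
     * Passing 0 as the last parameter means "one concurrent
     * thread per CPU" (the default described above). */
    HANDLE hIOCompPort = CreateIoCompletionPort(
        INVALID_HANDLE_VALUE, NULL, 0, 0);
    if (hIOCompPort == NULL) {
        fprintf(stderr, "CreateIoCompletionPort failed: %lu\n",
                GetLastError());
        return 1;
    }

    /* Later, associate a device (here a file opened for overlapped
     * I/O) with the port; the completion key is a value of our own
     * choosing that identifies the device when completions arrive. */
    HANDLE hFile = CreateFileA("data.bin", GENERIC_READ, 0, NULL,
        OPEN_EXISTING, FILE_FLAG_OVERLAPPED, NULL);
    if (hFile != INVALID_HANDLE_VALUE) {
        CreateIoCompletionPort(hFile, hIOCompPort,
                               /* dwCompletionKey */ 1, 0);
    }

    CloseHandle(hIOCompPort);
    return 0;
}
```

Note that the same function serves both purposes: the first call creates the port, and the second call associates a device with the existing port.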

When you create an I/O completion port, the kernel actually creates five different data structures.

 

The first is the device list. Every device associated with the completion port appears in this list. Each entry has the form:

 

hDevice    dwCompletionKey

An entry is added when CreateIoCompletionPort is called to associate a device, and removed when the device handle is closed.

The device can be a file, socket, mailslot, or pipe. The completion key is a value of your own choosing.

 

The second data structure is the I/O completion queue. When an asynchronous I/O request on a device completes, the system checks whether the device is associated with a completion port. If it is, the system appends the completed request to the port's I/O completion queue. Each entry in the queue records the number of bytes transferred, the 32-bit completion key, a pointer to the request's OVERLAPPED structure, and an error code.

dwBytesTransferred    dwCompletionKey    pOverlapped    dwError

An entry is added when an I/O request completes or when PostQueuedCompletionStatus is called, and removed when the completion port hands it to a thread from the waiting-thread queue.

When the service application initializes, it should create an I/O completion port and then create a pool of threads to handle client requests. How many threads the pool should contain is a difficult question to answer; the standard answer is twice the number of CPUs on the machine.
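The initialization just described might be sketched as follows. The worker function name and its body are placeholders assumed for this illustration; only the "CPUs × 2" sizing rule comes from the text.

```c
#include <windows.h>
#include <process.h>

/* Placeholder worker: each pool thread would loop on
 * GetQueuedCompletionStatus, as described in the text below. */
unsigned __stdcall WorkerThread(void *pv)
{
    HANDLE hIOCompPort = (HANDLE)pv;
    (void)hIOCompPort;
    /* ... loop calling GetQueuedCompletionStatus(hIOCompPort, ...) ... */
    return 0;
}

void CreatePool(HANDLE hIOCompPort)
{
    SYSTEM_INFO si;
    GetSystemInfo(&si);

    /* The "standard answer": twice as many pool threads as CPUs. */
    DWORD cThreads = si.dwNumberOfProcessors * 2;
    for (DWORD i = 0; i < cThreads; i++) {
        HANDLE hThread = (HANDLE)_beginthreadex(NULL, 0,
            WorkerThread, hIOCompPort, 0, NULL);
        if (hThread != NULL)
            CloseHandle(hThread); /* the handles are not needed here */
    }
}
```

Using _beginthreadex rather than CreateThread keeps the C runtime's per-thread state initialized correctly.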

All threads in the pool should execute the same thread function. Typically, after some initialization, this function enters a loop that ends when the service process terminates. Inside the loop, the thread puts itself to sleep waiting for device I/O requests to complete on the completion port. This is done by calling GetQueuedCompletionStatus:

 

BOOL GetQueuedCompletionStatus(HANDLE hCompletionPort, LPDWORD lpdwNumberOfBytesTransferred, LPDWORD lpdwCompletionKey, LPOVERLAPPED *lpOverlapped, DWORD dwMilliseconds);

 

The first parameter specifies the completion port the thread monitors. Many service applications use a single I/O completion port to which all I/O completion notifications are directed. Simply put, GetQueuedCompletionStatus puts the calling thread to sleep until an entry appears in the specified port's I/O completion queue or the timeout expires.

 

The third data structure of the I/O completion port is the waiting-thread queue:

 

dwThreadId

When a thread in the pool calls GetQueuedCompletionStatus, the calling thread's ID is placed in the waiting-thread queue. This is how the I/O completion port object always knows which threads are waiting to handle completed I/O requests. When an entry appears in the completion queue, the port wakes one thread from the waiting queue and passes it the entry's information through the function's parameters.

Pay attention to how the results returned by GetQueuedCompletionStatus should be handled:



DWORD dwNumberOfBytesTransferred, dwCompletionKey;
LPOVERLAPPED lpOverlapped;
...
BOOL fOk = GetQueuedCompletionStatus(hIOCompPort,
    &dwNumberOfBytesTransferred, &dwCompletionKey, &lpOverlapped, 1000);
DWORD dwError = GetLastError();
if (fOk)
{
    // The call succeeded: process the completed I/O request.
}
else
{
    if (lpOverlapped != NULL)
    {
        // The I/O request failed; dwError contains the error code.
    }
    else
    {
        if (dwError == WAIT_TIMEOUT)
        {
            // The call timed out before a completion entry appeared.
        }
        else
        {
            // GetQueuedCompletionStatus itself was called incorrectly;
            // dwError contains the error code.
        }
    }
}

 


Entries are removed from the I/O completion queue in FIFO order. However, threads that call GetQueuedCompletionStatus are awakened in LIFO order. The reason is performance. Suppose four threads are in the waiting queue. If a completed I/O entry appears, the thread that called GetQueuedCompletionStatus most recently is awakened to process it. When it is done, it calls GetQueuedCompletionStatus again and rejoins the waiting queue. If another completion entry appears, the same thread is awakened to handle the new one. As long as I/O requests complete slowly enough for a single thread to keep up, the system keeps waking that same thread, and the other three remain asleep. With this LIFO algorithm, the memory resources (such as stack space) of the threads that are never scheduled can be swapped out to disk and flushed from the processor's cache. This means there is no harm in having many threads waiting on a completion port.

Now consider why I/O completion ports are so useful. First, when you create an I/O completion port, you specify how many threads may run concurrently; as noted above, this should normally be the number of CPUs on the machine. As completed I/O entries enter the queue, the completion port wakes waiting threads, but never more than the specified concurrency count. So if the concurrency value is one and two I/O requests complete while two threads are waiting in GetQueuedCompletionStatus, the port wakes only one thread; the other continues to sleep. When that thread finishes processing an entry, it calls GetQueuedCompletionStatus again, and if more entries are waiting, the same thread is woken to process them.

 

If you think about this carefully, you will notice an apparent problem: if the completion port allows only the specified number of threads to wake concurrently, why have extra threads waiting in the pool at all?

The I/O completion port is very intelligent. When it wakes a thread, it places the thread's ID in the fourth data structure associated with the port, the released-thread list:

dwThreadId

This lets the completion port remember which threads it has released and monitor their execution. If a released thread calls a function that makes it wait, the port detects this, updates its internal data structures, and moves the thread's ID from the released-thread list to the paused-thread list (the last of the I/O completion port's data structures):

dwThreadId

The completion port's goal is to keep the number of entries in the released-thread list equal to the concurrency value specified when the port was created. If a released thread blocks for any reason, the released-thread list shrinks and the completion port releases another waiting thread. If a paused thread later wakes, it leaves the paused-thread list and re-enters the released-thread list. This means the released-thread list can temporarily hold more threads than the maximum concurrency value allows.

Now let's put it all together. Suppose we are running on a dual-CPU machine. We create a completion port that allows at most two threads to be awake concurrently, and we create four threads to wait for completed I/O requests. If three completed I/O requests are queued on the port, only two threads wake to process them. This limits the number of runnable threads and saves context-switch time. If one of the running threads then calls Sleep, WaitForSingleObject, or some other function that makes it unable to run, the I/O completion port detects this and immediately wakes the third thread.

Eventually the first thread runs again, making the number of running threads greater than the number of CPUs. The completion port is aware of this too, and will not wake another thread until the running count drops back below the concurrency limit. The assumption is that the number of running threads exceeds the maximum only briefly; it falls again as soon as threads loop around and call GetQueuedCompletionStatus. This is why the thread pool should contain more threads than the port's concurrency value.

Now consider how many threads the pool should contain. First, when the service application initializes, create a minimum set of threads so that you do not have to create and destroy threads at runtime. Remember that creating and destroying threads wastes CPU time, so keep it to a minimum. Second, set a maximum number of threads, because creating too many wastes system resources.

You may want to experiment with different thread counts. The IIS server uses a fairly sophisticated algorithm to manage its thread pool: the maximum number of threads it creates is dynamic. At initialization, IIS allows up to 10 threads per CPU, but this maximum can grow in response to client demand. The ceiling IIS places on the maximum is twice the memory size of the computer. (Jeffrey asked the IIS team how they arrived at this formula and was told that it simply "felt right." You should likewise find a formula that "feels right" for your own application.)

We have discussed raising the pool's maximum thread count. When the maximum changes, new threads are not added to the pool immediately. A new thread is created only when a client request arrives while every thread in the pool is busy (assuming the current thread count is below the maximum). IIS uses a counter to know how many threads are busy: the counter is incremented just before GetQueuedCompletionStatus is called and decremented after it returns. (You can implement this with the InterlockedIncrement and InterlockedDecrement functions.)

One important thing to remember: always keep at least one thread in the pool available to accept incoming client requests.

 

Simulate an I/O Request

BOOL PostQueuedCompletionStatus(HANDLE hCompletionPort, DWORD dwNumberOfBytesTransferred, DWORD dwCompletionKey, LPOVERLAPPED lpOverlapped);

This function lets you manually post a completed I/O entry to a completion port's I/O completion queue. It is very useful for communicating with all the threads in the pool. For example, when you want to terminate the service application, you need all the threads to exit cleanly. But if the threads are waiting on the completion port and no I/O requests arrive, they never wake up. By calling PostQueuedCompletionStatus once for each thread in the pool, you wake every thread; each examines the values returned by GetQueuedCompletionStatus, recognizes that the application is terminating, and cleans up and exits properly.
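A hedged sketch of this shutdown technique follows. The sentinel completion key and the function names are assumptions of this example, not part of the API; also note that modern SDKs type the completion key as ULONG_PTR rather than the DWORD shown in the text, so the sketch uses that.

```c
#include <windows.h>

#define SHUTDOWN_KEY ((ULONG_PTR)-1)  /* sentinel chosen by this example */

/* Wake every pool thread once so each can see a shutdown entry. */
void ShutdownPool(HANDLE hIOCompPort, DWORD cThreadsInPool)
{
    for (DWORD i = 0; i < cThreadsInPool; i++)
        PostQueuedCompletionStatus(hIOCompPort, 0, SHUTDOWN_KEY, NULL);
}

/* Inside the worker loop, a thread checks the key after waking.
 * Returns FALSE when the thread should exit its loop. */
BOOL ProcessOneEntry(HANDLE hIOCompPort)
{
    DWORD cbTransferred;
    ULONG_PTR dwKey;
    LPOVERLAPPED lpOverlapped;

    if (GetQueuedCompletionStatus(hIOCompPort, &cbTransferred,
            &dwKey, &lpOverlapped, INFINITE)) {
        if (dwKey == SHUTDOWN_KEY)
            return FALSE;   /* posted by ShutdownPool: time to exit */
        /* ... handle a real completed I/O request here ... */
    }
    return TRUE;
}
```

Because each thread exits after seeing the sentinel and never calls GetQueuedCompletionStatus again, one posted entry per thread is enough, which is exactly the caveat discussed next.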

Be careful when using this technique. The example above works because the pool threads are terminating and never call GetQueuedCompletionStatus again. If instead you want to notify the threads of an event and have them loop back and call GetQueuedCompletionStatus again, there can be a problem: the threads wake in LIFO order, so you must use additional thread synchronization in your application to ensure that every thread gets a chance to see its simulated I/O entry. Otherwise one thread might receive the same notification several times.
