The following is an excerpt from the MSDN article "I/O Completion Ports", translated by smallfool. The original article is available in the CSDN documentation center:
http://dev.csdn.net/Develop/article/29%5C29240.shtm
An I/O completion port is a mechanism by which an application creates a thread pool at startup and uses it to process asynchronous I/O requests. These threads exist for the sole purpose of processing I/O requests. For applications that handle a large number of concurrent asynchronous I/O requests, a completion port is faster and more efficient than creating a new thread each time an I/O request arrives.
The CreateIoCompletionPort function associates an I/O completion port with one or more file handles. When an asynchronous I/O operation started on a file handle associated with the port completes, an I/O completion packet is queued on the port. For multiple file handles, this mechanism gathers the synchronization points of all the handles into a single object. (In other words, where we would normally need one synchronization object, such as an event, per file handle, a completion port lets us associate many files with one port: each time an asynchronous operation on any of the files completes, a completion packet is queued, and that single queue synchronizes all the file handles.)
By calling the GetQueuedCompletionStatus function, a thread waits for a completion packet to be queued on the port rather than waiting for a particular asynchronous I/O request to complete. Completion packets are removed from the queue in FIFO order, but threads that block on the port are released in LIFO order: when a completion packet arrives, the system releases the thread that blocked on the port most recently.
When a thread calls GetQueuedCompletionStatus, it becomes associated with the specified completion port and remains associated until it exits, is associated with a different completion port, or frees the association. A thread can be associated with at most one completion port at a time.
The most important property of a completion port is its concurrency value, which can be specified when the port is created. This value limits the number of runnable threads associated with the port. When the total number of runnable threads associated with the port reaches the concurrency value, the system blocks the execution of any further threads associated with the port until the number of runnable threads drops below the concurrency value. The most efficient scenario occurs when completion packets are waiting in the queue but no waits can be satisfied, because the port has reached its concurrency limit. In this case, when a running thread calls GetQueuedCompletionStatus, it immediately picks up the next completion packet from the queue. No context switch occurs: the running threads continuously pull completion packets off the queue, while the other threads remain unable to run.
The best value for the concurrency is usually the number of CPUs in the machine. If your transactions require lengthy computation, a larger concurrency value allows more threads to run; each transaction then takes longer to complete, but more transactions are processed at the same time. It is easy to tune the concurrency value for an application by experiment.
The PostQueuedCompletionStatus function lets an application queue its own special-purpose I/O completion packets without starting an asynchronous I/O operation. This is useful, for example, for notifying worker threads of external events.
When a completion port is no longer referenced, it must be released: release the completion port handle and all file handles associated with the port. Call CloseHandle to release the completion port handle.
The following code creates a simple thread pool using an I/O completion port.
/************************************************************************/
/* Test IoCompletePort.                                                 */
/************************************************************************/
DWORD WINAPI IocpWorkThread(PVOID pParam)
{
    HANDLE CompletePort = (HANDLE)pParam;
    PVOID UserParam;
    WORK_ITEM_PROC UserProc;
    LPOVERLAPPED pOverlapped;

    for (;;)
    {
        BOOL bRet = GetQueuedCompletionStatus(
                        CompletePort,
                        (LPDWORD)&UserParam,
                        (PULONG_PTR)&UserProc,
                        &pOverlapped,
                        INFINITE);
        _ASSERT(bRet);

        if (UserProc == NULL)    // Quit signal.
            break;

        // Execute user's proc.
        UserProc(UserParam);
    }
    return 0;
}
VOID TestIoCompletePort(BOOL bWaitMode, LONG ThreadNum)
{
    HANDLE CompletePort;
    OVERLAPPED Overlapped = { 0, 0, 0, 0, NULL };

    CompletePort = CreateIoCompletionPort(
                       INVALID_HANDLE_VALUE,
                       NULL,
                       0,
                       0);

    // Create threads.
    for (int i = 0; i < ThreadNum; i++)
    {
        HANDLE hThread = CreateThread(NULL,
                                      0,
                                      IocpWorkThread,
                                      CompletePort,
                                      0,
                                      NULL);
        CloseHandle(hThread);
    }

    CompleteEvent = CreateEvent(NULL, FALSE, FALSE, NULL);
    BeginTime = GetTickCount();
    ItemCount = 20;

    for (int i = 0; i < 20; i++)
    {
        PostQueuedCompletionStatus(
            CompletePort,
            (DWORD)bWaitMode,
            (ULONG_PTR)UserProc1,
            &Overlapped);
    }

    WaitForSingleObject(CompleteEvent, INFINITE);
    CloseHandle(CompleteEvent);

    // Destroy all threads.
    for (int i = 0; i < ThreadNum; i++)
    {
        PostQueuedCompletionStatus(
            CompletePort,
            0,
            (ULONG_PTR)NULL,
            &Overlapped);
    }
    Sleep(1000);    // Wait for all threads to exit.
    CloseHandle(CompletePort);
}
Bibliography
1. MSDN Library
2. Windows Advanced Programming Guide
3. Windows Core Programming
4. Windows 2000 Device Driver Design Guide
Asynchronous I/O, APC, I/O Completion Ports, Thread Pools, and High-Performance Servers, Part 4: Thread Pools
Thread Pools
The following is an excerpt from the MSDN article "Thread Pooling".
Many applications create threads that spend a great deal of time sleeping, waiting for an event to occur. Other threads enter a sleeping state only to be woken up periodically to poll for a change or update of status information. Thread pooling lets you use threads more effectively by providing your application with a pool of worker threads managed by the system. At least one thread monitors the status of all wait operations queued to the thread pool; when a wait operation completes, a worker thread from the pool executes the corresponding callback function.
You can also queue work items that are not related to a wait operation to the thread pool. To request that a work item be handled by a thread in the pool, call the QueueUserWorkItem function, passing the work item's function as a parameter. Once a work item has been queued to the thread pool, it cannot be canceled.
Timer-queue timers and registered wait operations also use the thread pool; their callback functions are queued to the thread pool as well. You can also use the BindIoCompletionCallback function to post asynchronous I/O operations: on completion of the I/O, the callback is likewise executed by a thread-pool thread.
The thread pool is created automatically the first time QueueUserWorkItem or BindIoCompletionCallback is called, or when a timer-queue timer or registered wait operation queues a callback function. The number of threads the pool can create is limited only by available memory. Each thread uses the default initial stack size and runs at the default priority.
There are two types of worker threads in the pool: I/O threads and non-I/O threads. An I/O thread waits in an alertable state, and work items are queued to it as APCs. If your work item must execute in a thread that waits alertably, queue it to an I/O thread.
A non-I/O worker thread waits on an I/O completion port. Using non-I/O threads is more efficient than using I/O threads, so use non-I/O threads whenever possible. Neither kind of worker thread exits while asynchronous I/O operations it issued are still pending. However, avoid issuing long-running asynchronous I/O requests from a non-I/O thread.
To use the thread pool correctly, the work item function and all functions it calls must be thread-pool safe. A safe function must not assume that the thread it runs in is a dedicated or persistent thread. In general, avoid thread-local storage and avoid asynchronous calls that require a persistent thread, such as the RegNotifyChangeKeyValue function. If such a function must run in a persistent thread, pass the WT_EXECUTEINPERSISTENTTHREAD option to QueueUserWorkItem.
Note that the thread pool is not compatible with COM's single-threaded apartment (STA) model.
To better appreciate the advantages of the thread pool implemented by the operating system, let us first implement a simple thread-pool model of our own.
The code is as follows:
/************************************************************************/
/* Test our own thread pool.                                            */
/************************************************************************/
typedef struct _THREAD_POOL
{
    HANDLE           QuitEvent;
    HANDLE           WorkItemSemaphore;
    LONG             WorkItemCount;
    LIST_ENTRY       WorkItemHeader;
    CRITICAL_SECTION WorkItemLock;
    LONG             ThreadNum;
    HANDLE          *ThreadsArray;
} THREAD_POOL, *PTHREAD_POOL;

typedef VOID (*WORK_ITEM_PROC)(PVOID Param);

typedef struct _WORK_ITEM
{
    LIST_ENTRY     List;
    WORK_ITEM_PROC UserProc;
    PVOID          UserParam;
} WORK_ITEM, *PWORK_ITEM;
DWORD WINAPI WorkerThread(PVOID pParam)
{
    PTHREAD_POOL pThreadPool = (PTHREAD_POOL)pParam;
    HANDLE Events[2];

    Events[0] = pThreadPool->QuitEvent;
    Events[1] = pThreadPool->WorkItemSemaphore;

    for (;;)
    {
        DWORD dwRet = WaitForMultipleObjects(2, Events, FALSE, INFINITE);

        if (dwRet == WAIT_OBJECT_0)
            break;
        //
        // Execute user's proc.
        //
        else if (dwRet == WAIT_OBJECT_0 + 1)
        {
            PWORK_ITEM pWorkItem;
            PLIST_ENTRY pList;

            EnterCriticalSection(&pThreadPool->WorkItemLock);
            _ASSERT(!IsListEmpty(&pThreadPool->WorkItemHeader));
            pList = RemoveHeadList(&pThreadPool->WorkItemHeader);
            LeaveCriticalSection(&pThreadPool->WorkItemLock);

            pWorkItem = CONTAINING_RECORD(pList, WORK_ITEM, List);
            pWorkItem->UserProc(pWorkItem->UserParam);

            InterlockedDecrement(&pThreadPool->WorkItemCount);
            free(pWorkItem);
        }
        else
        {
            _ASSERT(0);
            break;
        }
    }
    return 0;
}
BOOL InitializeThreadPool(PTHREAD_POOL pThreadPool, LONG ThreadNum)
{
    pThreadPool->QuitEvent = CreateEvent(NULL, TRUE, FALSE, NULL);
    pThreadPool->WorkItemSemaphore = CreateSemaphore(NULL, 0, 0x7FFFFFFF, NULL);
    pThreadPool->WorkItemCount = 0;
    InitializeListHead(&pThreadPool->WorkItemHeader);
    InitializeCriticalSection(&pThreadPool->WorkItemLock);
    pThreadPool->ThreadNum = ThreadNum;
    pThreadPool->ThreadsArray = (HANDLE *)malloc(sizeof(HANDLE) * ThreadNum);

    for (int i = 0; i < ThreadNum; i++)
    {
        pThreadPool->ThreadsArray[i] =
            CreateThread(NULL, 0, WorkerThread, pThreadPool, 0, NULL);
    }
    return TRUE;
}
VOID DestroyThreadPool(PTHREAD_POOL pThreadPool)
{
    SetEvent(pThreadPool->QuitEvent);

    for (int i = 0; i < pThreadPool->ThreadNum; i++)
    {
        WaitForSingleObject(pThreadPool->ThreadsArray[i], INFINITE);
        CloseHandle(pThreadPool->ThreadsArray[i]);
    }

    free(pThreadPool->ThreadsArray);
    CloseHandle(pThreadPool->QuitEvent);
    CloseHandle(pThreadPool->WorkItemSemaphore);
    DeleteCriticalSection(&pThreadPool->WorkItemLock);

    while (!IsListEmpty(&pThreadPool->WorkItemHeader))
    {
        PWORK_ITEM pWorkItem;
        PLIST_ENTRY pList;

        pList = RemoveHeadList(&pThreadPool->WorkItemHeader);
        pWorkItem = CONTAINING_RECORD(pList, WORK_ITEM, List);
        free(pWorkItem);
    }
}
BOOL PostWorkItem(PTHREAD_POOL pThreadPool, WORK_ITEM_PROC UserProc, PVOID UserParam)
{
    PWORK_ITEM pWorkItem = (PWORK_ITEM)malloc(sizeof(WORK_ITEM));
    if (pWorkItem == NULL)
        return FALSE;

    pWorkItem->UserProc = UserProc;
    pWorkItem->UserParam = UserParam;

    EnterCriticalSection(&pThreadPool->WorkItemLock);
    InsertTailList(&pThreadPool->WorkItemHeader, &pWorkItem->List);
    LeaveCriticalSection(&pThreadPool->WorkItemLock);

    InterlockedIncrement(&pThreadPool->WorkItemCount);
    ReleaseSemaphore(pThreadPool->WorkItemSemaphore, 1, NULL);
    return TRUE;
}
VOID UserProc1(PVOID dwParam)
{
    WorkItem(dwParam);
}

VOID TestSimpleThreadPool(BOOL bWaitMode, LONG ThreadNum)
{
    THREAD_POOL ThreadPool;
    InitializeThreadPool(&ThreadPool, ThreadNum);

    CompleteEvent = CreateEvent(NULL, FALSE, FALSE, NULL);
    BeginTime = GetTickCount();
    ItemCount = 20;

    for (int i = 0; i < 20; i++)
    {
        PostWorkItem(&ThreadPool, UserProc1, (PVOID)bWaitMode);
    }

    WaitForSingleObject(CompleteEvent, INFINITE);
    CloseHandle(CompleteEvent);
    DestroyThreadPool(&ThreadPool);
}
We place the work items in a queue and use a semaphore to notify the thread pool; an arbitrary thread in the pool takes a work item and executes it, then returns to the pool to wait for the next one.
The number of threads in the pool is fixed: they are created in advance and are persistent, destroyed only when the pool itself is destroyed.
The threads in the pool have an equal, effectively random chance of obtaining a work item; nothing gives any particular thread priority in claiming work.
Moreover, the number of concurrently running threads is not limited in any way. In fact, in our demo code that executes computational tasks, all the threads run concurrently.
Next, let's see how the thread pool provided by the system accomplishes the same task.
/************************************************************************/
/* QueueUserWorkItem test.                                              */
/************************************************************************/
DWORD BeginTime;
LONG  ItemCount;
HANDLE CompleteEvent;

int Compute()
{
    srand(BeginTime);
    for (int i = 0; i < 20 * 1000 * 1000; i++)
        rand();
    return rand();
}
DWORD WINAPI WorkItem(LPVOID lpParameter)
{
    BOOL bWaitMode = (BOOL)lpParameter;

    if (bWaitMode)
        Sleep(1000);
    else
        Compute();

    if (InterlockedDecrement(&ItemCount) == 0)
    {
        printf("Time total %d ms.\n", GetTickCount() - BeginTime);
        SetEvent(CompleteEvent);
    }
    return 0;
}
VOID TestWorkItem(BOOL bWaitMode, DWORD Flag)
{
    CompleteEvent = CreateEvent(NULL, FALSE, FALSE, NULL);
    BeginTime = GetTickCount();
    ItemCount = 20;

    for (int i = 0; i < 20; i++)
    {
        QueueUserWorkItem(WorkItem, (PVOID)bWaitMode, Flag);
    }

    WaitForSingleObject(CompleteEvent, INFINITE);
    CloseHandle(CompleteEvent);
}
Very simple, isn't it? We only need to focus on our callback function. Compared with our simple simulation, however, the thread pool provided by the system has many more advantages.
First, the number of threads in the pool is adjusted dynamically. Second, the thread pool uses the I/O completion port feature to limit the number of concurrently running threads; by default the limit is the number of CPUs, which reduces thread switching. The port preferentially wakes the most recently executed thread and puts it back to work, avoiding unnecessary thread switches.
The policy behind the thread pool provided by the system will be discussed in the next section.
Asynchronous I/O, APC, I/O Completion Ports, Thread Pools, and High-Performance Servers: Performance Metrics and the Path to High Performance
Server Performance Metrics
For a network server program, performance is always the first metric. Performance can be defined as the number of tasks that can be processed in a given amount of time on given hardware. A design that extracts the most performance from the hardware is a good design.
A well-designed server should also provide fair service: it should serve each client evenly, so that no client goes unserved for a long period of time and ends up "starving".
Scalability matters too: as hardware capability increases, server performance should grow linearly with it.
High Performance
The workload of a real server is complicated, often mixing I/O-bound and CPU-bound computation. I/O-bound computation refers to workloads dominated by I/O, such as file servers and mail servers, which combine heavy network I/O with heavy file I/O. CPU-bound computation refers to tasks with little or no I/O, such as encryption/decryption, encoding/decoding, and mathematical computation.
For CPU-bound computation, single-threaded and multithreaded models perform about the same: "On a single-processor computer, CPU-bound tasks cannot execute faster concurrently than serially, but we can see that the extra overhead of thread creation and switching on Windows NT is very small; for very short computations, concurrent execution is only about 10% slower than serial execution, and as the length of the computation grows, the two times become very close."
Thus, for pure CPU-bound computation, the multithreaded model is a poor fit when there is only one CPU. Consider a service that performs intensive CPU computation: if dozens of such threads run concurrently, frequent task switching causes unnecessary performance loss.
In practice, however, a purely single-threaded model is inconvenient for server programming, so the thread-pool working model is a better fit for CPU-bound computation. The QueueUserWorkItem function is well suited to submitting a CPU-bound computation to the thread pool; the thread-pool implementation tries to minimize unnecessary thread switches and limits the number of concurrent threads to the number of CPUs.
What we really need to care about is I/O-bound computation. Network server programs usually involve a large amount of I/O. The way to improve performance is to avoid leaving the CPU idle while waiting for I/O to finish: exploit the hardware's ability to let one or more I/O devices run concurrently with the CPU. The asynchronous I/O, APC, and I/O completion port mechanisms described above all achieve this goal.
For a network server with relatively few concurrent client requests, a simple multithreaded model suffices: if a thread is suspended waiting for an I/O operation to complete, the operating system schedules another ready thread, so execution remains concurrent. Classic network server logic mostly adopts the multi-thread/multi-process model: when a client initiates a connection, the server creates a thread and lets that new thread handle all subsequent work. This style of representing each client with a dedicated thread/process is intuitive and easy to understand.
For large network server programs, this method has limitations. First, creating and destroying threads/processes is expensive, especially when the server uses TCP "short connections" or UDP. In HTTP, for example, the client initiates a connection and sends a request, and the server closes the connection after responding. If an HTTP server were designed in the classic style, the overly frequent creation and destruction of threads would badly hurt performance.
Second, even when a protocol uses TCP "persistent connections", with each client keeping its connection open after connecting to the server, the classic design still has drawbacks. If many clients issue requests concurrently and wait for the server's responses at the same time, too many threads execute concurrently, and the frequent thread switching consumes part of the computing power. In fact, when the number of concurrent threads grows too large, physical memory is often exhausted early, and most of the time is spent on thread switching, because thread switching also causes memory paging. In the end, server performance drops sharply.
For a network server that must handle a large number of concurrent client requests, a thread pool is the right solution: it not only avoids frequent thread creation and destruction, but also handles a large number of concurrent client requests with a small number of threads.
It is worth noting that none of the techniques above are recommended for a low-load network server program. Complicating things when a simple design can do the job is unwise.