In-depth analysis of overlapped I/O models

Source: Internet
Author: User
Tags: ReadFile
Summary: overlapped I/O, also known as asynchronous I/O, is an asynchronous I/O model. Asynchronous I/O differs from synchronous I/O, where the program is suspended until the I/O operation completes. With asynchronous I/O, you call a function to tell the OS to perform the I/O operation; the call returns immediately, and after the I/O completes the system sends you a notification.

Overlapped I/O is just a model. It can be implemented with the device kernel object (the handle itself), an event kernel object (hEvent), asynchronous procedure calls (APCs), or an I/O completion port. Overlapped I/O was designed to replace plain multithreading: multithreading needs synchronization and error handling, and with thousands of threads doing I/O, thread context switches consume a great deal of CPU. The overlapped I/O model lets the OS transfer the data for you, handle the context switching, and notify you when processing is done. The work moves from your program into the operating system, which internally still uses threads to do it.

The OVERLAPPED data structure:

typedef struct _OVERLAPPED {
    DWORD  Internal;     // usually reserved; when GetOverlappedResult() returns FALSE and
                         // GetLastError() does not return ERROR_IO_PENDING, this holds the
                         // system-dependent status
    DWORD  InternalHigh; // usually reserved; when GetOverlappedResult() returns FALSE, this
                         // is the length of the data actually transferred
    DWORD  Offset;       // byte offset, relative to the start of the file, at which the
                         // transfer begins; set before calling ReadFile or WriteFile;
                         // ignored for named pipes and communication devices
    DWORD  OffsetHigh;   // high-order word of the byte offset at which the transfer begins;
                         // ignored for named pipes and communication devices
    HANDLE hEvent;       // identifies an event that is set to the signaled state when the
                         // transfer completes; the calling process sets this member before
                         // calling ReadFile, WriteFile, ConnectNamedPipe or TransactNamedPipe
} OVERLAPPED, *LPOVERLAPPED;

Related functions: CreateEvent, ResetEvent, GetOverlappedResult, WaitForSingleObject, GetLastError.

The OVERLAPPED structure serves two important purposes: 1. it identifies each overlapped operation that is in progress; 2. it provides a shared area between the program and the system, through which parameters can be passed in both directions.

Releasing the OVERLAPPED structure and the data buffer: while a request is outstanding they must not be freed; they can be released only after the I/O request completes. If multiple overlapped requests are issued, each overlapped read or write on a file must carry its own file position (sockets do not use one). In addition, when several requests are outstanding the order in which the I/O operations execute is not guaranteed (each OVERLAPPED is an independent request operation).
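Because Offset and OffsetHigh together form a 64-bit starting position, a request that begins beyond 4 GB has to split the position across the two members. A minimal sketch (the 5 GB starting position is only an illustrative value):

    // Split a 64-bit file position across OVERLAPPED.Offset / OffsetHigh.
    OVERLAPPED ov;
    ULONGLONG  pos = 5ULL * 1024 * 1024 * 1024;   // example: read starting at the 5 GB mark
    memset(&ov, 0, sizeof(ov));
    ov.Offset     = (DWORD)(pos & 0xFFFFFFFF);    // low 32 bits of the starting offset
    ov.OffsetHigh = (DWORD)(pos >> 32);           // high 32 bits of the starting offset
    // ov is then passed to ReadFile/WriteFile on a handle opened with FILE_FLAG_OVERLAPPED.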
Kernel object (device handle) implementation. Example: use the overlapped model to read a disk file. 1. The device handle itself is used as the synchronization object; ReadFile sets the device handle to the non-signaled state, and for asynchronous I/O an OVERLAPPED structure must be supplied to ReadFile. 2. When the I/O completes, the system fills in the status information and sets the handle to the signaled state. 3. Use WaitForSingleObject or WaitForMultipleObjects to wait for completion, or call GetOverlappedResult for the asynchronous device.

int main()
{
    BOOL rc;
    HANDLE hFile;
    DWORD numread;
    OVERLAPPED overlap;
    char buf[READ_SIZE];
    char szPath[MAX_PATH];

    CheckOsVersion();
    GetWindowsDirectory(szPath, sizeof(szPath));
    strcat(szPath, "\\winhlp32.exe");

    hFile = CreateFile(szPath,
                       GENERIC_READ,
                       FILE_SHARE_READ | FILE_SHARE_WRITE,
                       NULL,
                       OPEN_EXISTING,
                       FILE_FLAG_OVERLAPPED,
                       NULL);
    if (hFile == INVALID_HANDLE_VALUE)
    {
        printf("Could not open %s\n", szPath);
        return -1;
    }

    memset(&overlap, 0, sizeof(overlap));
    overlap.Offset = 1500;

    rc = ReadFile(hFile, buf, READ_SIZE, &numread, &overlap);
    printf("Issued read request\n");

    if (rc)
    {
        printf("Request was returned immediately\n");
    }
    else
    {
        if (GetLastError() == ERROR_IO_PENDING)
        {
            printf("Request queued, waiting...\n");
            WaitForSingleObject(hFile, INFINITE);
            printf("Request completed.\n");
            rc = GetOverlappedResult(hFile, &overlap, &numread, FALSE);
            printf("Result was %d\n", rc);
        }
        else
        {
            printf("Error reading file\n");
        }
    }

    CloseHandle(hFile);
    return EXIT_SUCCESS;
}

Event kernel object (hEvent):

Problem with the device-handle implementation: the overlapped operations cannot be told apart. With several asynchronous operations outstanding on the same file handle (say, one reading the head of the file while another writes its tail), the handle is signaled when either operation completes and you cannot tell which one; you would have to call GetOverlappedResult for every OVERLAPPED still in progress.

Implementation with an event kernel object (hEvent): the OVERLAPPED member hEvent identifies an event kernel object. CreateEvent creates one event per request, and the hEvent member of each request is initialized with it (for multiple read/write requests against the same file, one event object is bound to each operation). Then call WaitForMultipleObjects and the like (waiting for any or for all of them). In addition, the event objects must be manual-reset; if an auto-reset event were used (and it were set before the wait began), WaitForSingleObject() and WaitForMultipleObjects() would never return.

Auto-reset event: WaitForSingleObject() and WaitForMultipleObjects() wait for the event to enter the signaled state and then automatically reset it to the non-signaled state, so only one of the threads waiting on the event is woken. Manual-reset event: you must call ResetEvent() to reset it; several threads may be waiting on the same event, and when it becomes signaled all of the waiting threads can run. SetEvent() sets an event object to the signaled state and ResetEvent() resets it to the non-signaled state; both take the event object handle as a parameter.

Example:

// ghFile, ghEvents, gOverlapped, gBuffers, READ_SIZE, MAX_REQUESTS, MAX_TRY_COUNT
// and the MTVERIFY macro are defined elsewhere in the sample.
int main()
{
    int i;
    BOOL rc;
    char szPath[MAX_PATH];

    CheckOsVersion();
    GetWindowsDirectory(szPath, sizeof(szPath));
    strcat(szPath, "\\winhlp32.exe");

    ghFile = CreateFile(szPath,
                        GENERIC_READ,
                        FILE_SHARE_READ | FILE_SHARE_WRITE,
                        NULL,
                        OPEN_EXISTING,
                        FILE_FLAG_OVERLAPPED,
                        NULL);
    if (ghFile == INVALID_HANDLE_VALUE)
    {
        printf("Could not open %s\n", szPath);
        return -1;
    }

    for (i = 0; i < MAX_REQUESTS; i++)
        QueueRequest(i, i * 16384, READ_SIZE);

    printf("QUEUED!!\n");

    // Wait for all the requests to complete
    MTVERIFY(WaitForMultipleObjects(MAX_REQUESTS, ghEvents, TRUE, INFINITE) != WAIT_FAILED);

    for (i = 0; i < MAX_REQUESTS; i++)
    {
        DWORD dwNumRead;
        rc = GetOverlappedResult(ghFile, &gOverlapped[i], &dwNumRead, FALSE);
        printf("Read #%d returned %d. %d bytes were read.\n", i, rc, dwNumRead);
        CloseHandle(gOverlapped[i].hEvent);
    }

    CloseHandle(ghFile);
    return EXIT_SUCCESS;
}
int QueueRequest(int nIndex, DWORD dwLocation, DWORD dwAmount)
{
    int i;
    BOOL rc;
    DWORD dwNumRead;
    DWORD err;

    MTVERIFY(ghEvents[nIndex] = CreateEvent(NULL,   // no security
                                            TRUE,   // manual reset - extremely important!
                                            FALSE,  // initially set event to non-signaled state
                                            NULL)); // no name

    gOverlapped[nIndex].hEvent = ghEvents[nIndex];
    gOverlapped[nIndex].Offset = dwLocation;

    for (i = 0; i < MAX_TRY_COUNT; i++)
    {
        rc = ReadFile(ghFile, gBuffers[nIndex], dwAmount, &dwNumRead, &gOverlapped[nIndex]);
        if (rc)
        {
            printf("Read #%d completed immediately.\n", nIndex);
            return TRUE;
        }

        err = GetLastError();
        if (err == ERROR_IO_PENDING)
        {
            // asynchronous I/O is still in progress
            printf("Read #%d queued for overlapped I/O.\n", nIndex);
            return TRUE;
        }
        if (err == ERROR_INVALID_USER_BUFFER ||
            err == ERROR_NOT_ENOUGH_QUOTA ||
            err == ERROR_NOT_ENOUGH_MEMORY)
        {
            Sleep(50);   // wait around and try later
            continue;
        }
        break;
    }

    printf("ReadFile failed.\n");
    return -1;
}

Asynchronous procedure calls (APCs):

Problem with the event kernel object (hEvent) approach: WaitForMultipleObjects can wait on at most 64 objects, and two extra arrays have to be created and tied together with gOverlapped[nIndex].hEvent = ghEvents[nIndex].

APC implementation: an asynchronous procedure call is a callback function; the system calls it after an overlapped I/O operation completes. The OS runs the callback only while the issuing thread is in an alertable wait state (many APCs may be pending), passing it the error code of the completed I/O request, the number of bytes transferred, and the address of the OVERLAPPED structure. Five functions can put the thread into that alertable state: SleepEx, WaitForSingleObjectEx, WaitForMultipleObjectsEx, SignalObjectAndWait and MsgWaitForMultipleObjectsEx. In the example below, the main thread waits in WaitForSingleObjectEx; while it waits, the pending APCs are processed and the callback FileIOCompletionRoutine is called.

VOID WINAPI FileIOCompletionRoutine(
    DWORD dwErrorCode,                 // completion code
    DWORD dwNumberOfBytesTransfered,   // number of bytes transferred
    LPOVERLAPPED lpOverlapped)         // pointer to structure with I/O information
{
    int nIndex = (int)(lpOverlapped->hEvent);

    printf("Read #%d returned %d. %d bytes were read.\n",
           nIndex, dwErrorCode, dwNumberOfBytesTransfered);

    if (++nCompletionCount == MAX_REQUESTS)
        SetEvent(ghEvent);             // cause the wait to terminate
}

int main()
{
    int i;
    char szPath[MAX_PATH];

    CheckOsVersion();

    MTVERIFY(ghEvent = CreateEvent(NULL,   // no security
                                   TRUE,   // manual reset - extremely important!
                                   FALSE,  // initially set event to non-signaled state
                                   NULL)); // no name

    GetWindowsDirectory(szPath, sizeof(szPath));
    strcat(szPath, "\\winhlp32.exe");

    ghFile = CreateFile(szPath,
                        GENERIC_READ,
                        FILE_SHARE_READ | FILE_SHARE_WRITE,
                        NULL,
                        OPEN_EXISTING,
                        FILE_FLAG_OVERLAPPED,
                        NULL);
    if (ghFile == INVALID_HANDLE_VALUE)
    {
        printf("Could not open %s\n", szPath);
        return -1;
    }

    for (i = 0; i < MAX_REQUESTS; i++)
        QueueRequest(i, i * 16384, READ_SIZE);

    printf("QUEUED!!\n");

    // Wait in an alertable state so the completion routines can run
    for (;;)
    {
        DWORD rc;
        rc = WaitForSingleObjectEx(ghEvent, INFINITE, TRUE);
        if (rc == WAIT_OBJECT_0)
            break;
        MTVERIFY(rc == WAIT_IO_COMPLETION);
    }

    CloseHandle(ghFile);
    return EXIT_SUCCESS;
}

int QueueRequest(int nIndex, DWORD dwLocation, DWORD dwAmount)
{
    int i;
    BOOL rc;
    DWORD err;

    gOverlapped[nIndex].hEvent = (HANDLE)nIndex;   // hEvent is unused here, so it carries the request index
    gOverlapped[nIndex].Offset = dwLocation;

    for (i = 0; i < MAX_TRY_COUNT; i++)
    {
        rc = ReadFileEx(ghFile, gBuffers[nIndex], dwAmount, &gOverlapped[nIndex],
                        FileIOCompletionRoutine);
        if (rc)
        {
            printf("Read #%d queued for overlapped I/O.\n", nIndex);
            return TRUE;
        }

        err = GetLastError();
        if (err == ERROR_INVALID_USER_BUFFER ||
            err == ERROR_NOT_ENOUGH_QUOTA ||
            err == ERROR_NOT_ENOUGH_MEMORY)
        {
            Sleep(50);   // wait around and try later
            continue;
        }
        break;
    }

    printf("ReadFileEx failed.\n");
    return -1;
}
Completion port (I/O completion port):

Problem with asynchronous procedure calls (APCs): only the thread that issued the overlapped request can run the callback function, so a specific thread has to serve its own specific I/O requests.

Advantages of the completion port: the number of handles is not limited, and thousands of connections can be handled. An I/O completion port lets one thread queue a request while another thread provides the actual service for it.

Concurrency model and thread pool: in the typical concurrency model, the server creates one thread per client. When many clients send requests at the same time, all these threads become runnable and the CPU switches among them one by one; the CPU ends up spending most of its time on thread switching, and each thread gets little CPU time. So how many threads should be created? According to Microsoft's documentation, roughly two per CPU. Ideally, threads would not be switched at all but reused, as in a thread pool, and the I/O completion port uses exactly such a thread pool.

Understanding and use. Step 1: before using the completion port, call the CreateIoCompletionPort function to create the port object. Definition:
HANDLE CreateIoCompletionPort(
    HANDLE    FileHandle,
    HANDLE    ExistingCompletionPort,
    ULONG_PTR CompletionKey,
    DWORD     NumberOfConcurrentThreads
);
FileHandle: a handle to a file or device. If INVALID_HANDLE_VALUE is passed, a port is created that is not yet associated with any file handle. (All kinds of handles, files and sockets alike, can be associated with a completion port.)
ExistingCompletionPort: NULL creates a new port; otherwise the handle is added to this existing port.
CompletionKey: a user-defined value handed to the service thread. When GetQueuedCompletionStatus is called, this completion key (for example, the address of a memory block we allocated when associating the handle) is returned unchanged, so the worker thread can effortlessly get at that memory block and use it.
NumberOfConcurrentThreads: specifies how many threads may run concurrently on the completion port. Ideally only one thread runs per processor, which avoids the overhead of thread context switches. If this parameter is 0, the number of concurrent threads equals the number of processors in the system.

We can create an I/O completion port with the following code. The secret behind port creation is that it really does two things:
1. Create the completion port itself:
   CreateIoCompletionPort(INVALID_HANDLE_VALUE, NULL, 0, dwNumberOfConcurrentThreads);
2. Build the device list, i.e. associate the port with one or more devices:
   CreateIoCompletionPort(hDevice, hCompPort, dwCompKey, 0);

Step 2: based on the number of processors, create two worker threads per CPU with CreateThread(NULL, 0, ServerWorkerThread, CompletionPort, 0, &ThreadID). At the same time the server calls WSASocket, bind, listen and WSAAccept, and then calls CreateIoCompletionPort((HANDLE)Accept, CompletionPort, ...) to bind each accepted socket handle to the completion port. The port is thus associated with one or more devices, so I/O requests are issued against these sockets, and the completion port is then relied upon to receive the notifications when the I/O operations complete. Looking at the program again: WSARecv(Accept, &(PerIoData->DataBuf), 1, &RecvBytes, &Flags, &(PerIoData->Overlapped), NULL) is then called; as mentioned above, because this is asynchronous I/O, the WSASend and WSARecv calls return immediately.

System processing: after an asynchronous I/O request on a device completes, the system checks whether the device is associated with a completion port. If it is, the system appends the completed I/O request to the I/O completion queue of that port. We then have to retrieve the result of the call from the completion queue (an OVERLAPPED structure is needed to receive it). How do we know that a finished result is already sitting in the queue? Call the GetQueuedCompletionStatus function.

Worker threads and the completion port: unlike an asynchronous procedure call (where the system calls the callback after an overlapped I/O completes, and only while the issuing thread is in an alertable wait, with many APCs possibly pending), GetQueuedCompletionStatus is called inside the worker thread:

BOOL GetQueuedCompletionStatus(
    HANDLE       CompletionPort,
    LPDWORD      lpNumberOfBytesTransferred,
    PULONG_PTR   lpCompletionKey,
    LPOVERLAPPED *lpOverlapped,
    DWORD        dwMilliseconds
);

CompletionPort: the completion port the thread is to monitor. Many server applications use a single I/O completion port; all notifications for completed I/O requests are delivered to this one port.
lpNumberOfBytesTransferred: the number of bytes of data transferred.
lpCompletionKey: the per-handle data of the completion port. Through this pointer we get back the memory block we supplied in CreateIoCompletionPort.
lpOverlapped: the OVERLAPPED structure of the overlapped I/O request. It points to the memory block we supplied when the overlapped request was issued; together with lpCompletionKey, this block can also be used to store any data we want to keep.
dwMilliseconds: the maximum wait time in milliseconds. If the call times out, lpOverlapped is set to NULL and the function returns FALSE.
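With the prototype and parameters in hand, here is a minimal sketch of the ServerWorkerThread loop described in Step 2, assuming the completion port handle is passed to the thread as its parameter and that lpOverlapped points into a per-I/O block such as the PerIoData used with WSARecv above; the actual processing of that block is only indicated in comments:

#include <windows.h>   // GetQueuedCompletionStatus, CreateIoCompletionPort

DWORD WINAPI ServerWorkerThread(LPVOID lpParam)
{
    HANDLE hCompletionPort = (HANDLE)lpParam;   // the port passed to CreateThread
    DWORD dwBytesTransferred;
    ULONG_PTR ulCompletionKey;
    LPOVERLAPPED lpOverlapped;

    for (;;)
    {
        BOOL ok = GetQueuedCompletionStatus(hCompletionPort,
                                            &dwBytesTransferred,
                                            &ulCompletionKey,
                                            &lpOverlapped,
                                            INFINITE);
        if (!ok && lpOverlapped == NULL)
        {
            // Bad port handle (or a timeout, if a finite dwMilliseconds had been used)
            break;
        }
        if (!ok)
        {
            // lpOverlapped is non-NULL: the I/O request itself failed;
            // GetLastError() says why
            continue;
        }

        // Success: ulCompletionKey is the per-handle data registered with
        // CreateIoCompletionPort, and lpOverlapped points into the per-I/O block
        // (e.g. PerIoData) that was passed to WSARecv/WSASend, so the buffer and
        // dwBytesTransferred can be processed here and the next WSARecv issued.
    }
    return 0;
}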
The GetQueuedCompletionStatus function and the secret behind it: GetQueuedCompletionStatus suspends the calling thread until an entry appears in the I/O completion queue of the specified port or the wait times out. (Entries appear in the I/O completion queue.) When GetQueuedCompletionStatus is called, the ID of the calling thread (there are CPU x 2 threads, and each ServerWorkerThread has its own thread ID) is placed in the waiting thread queue. The waiting thread queue is simple: it just records the IDs of these threads. The completion port moves thread IDs from this queue to the released thread list on a last-in, first-out basis. In this way the I/O completion port kernel object knows which threads are waiting to handle completed I/O requests. When an entry appears in the port's I/O completion queue, the completion port wakes up (from the sleeping state to the ready-to-run state) one thread in the waiting thread queue. That thread receives the information recorded in the completed I/O entry: the number of bytes transferred, the completion key (per-handle data) and the address of the OVERLAPPED structure. The thread obtains this information through GetQueuedCompletionStatus and then waits to be scheduled onto a CPU.

GetQueuedCompletionStatus can return for several reasons. If it is passed an invalid completion port handle, it returns FALSE and GetLastError returns an error such as ERROR_INVALID_HANDLE. If the call times out, it returns FALSE and GetLastError returns WAIT_TIMEOUT. If an entry representing a successfully completed I/O request is removed from the I/O completion queue, it returns TRUE.

The threads that call GetQueuedCompletionStatus are woken in last-in, first-out (LIFO) order. For example, if four threads are waiting and one I/O completes, the thread that called GetQueuedCompletionStatus most recently is woken to handle it; when it is done it calls GetQueuedCompletionStatus again and re-enters the waiting thread queue.

In-depth analysis of how the completion port schedules its thread pool: suppose we are running on a machine with 2 CPUs, we specify 2 concurrent threads when creating the completion port, we create 4 worker threads to join the pool and wait for completed I/O requests, and the completion queue (first in, first out) currently holds three completed I/O requests. The four worker threads are running; when each calls GetQueuedCompletionStatus, the calling thread goes to sleep and its ID is placed in the waiting thread queue, as shown below:

(Diagram: the waiting thread queue, last in, first out, with its enqueue and dequeue ends, holding Thread A, Thread B, Thread C and Thread D.)

Through this waiting thread queue, the I/O completion port kernel object knows which threads are waiting to handle completed I/O requests. When an entry appears in the port's I/O completion queue, the completion port wakes up (changes from sleeping to ready to run) one thread in the waiting thread queue (as mentioned earlier, the waiting thread queue is last in, first out). So thread D receives the information in the completed I/O entry: the number of bytes transferred, the completion key (per-handle data) and the address of the OVERLAPPED structure; the thread returns this information through GetQueuedCompletionStatus. Because we specified a concurrency of 2, the I/O completion port wakes two threads, thread D and thread C, while the other two (thread B and thread A) keep sleeping; when thread D finishes its work and finds that there are still entries queued, the same thread is woken again to continue processing.

(Diagram: the waiting thread queue, last in, first out, now holds Thread A and Thread B; Thread C and Thread D have moved to the released thread list.)

Thread concurrency: the concurrency value limits the number of runnable threads associated with the completion port, much like a valve. When the total number of runnable threads associated with the completion port reaches this concurrency value, the system blocks any further thread associated with the port from running, until the number of runnable threads associated with the port drops back below the concurrency value. This also explains why the number of running threads in the pool may sometimes exceed the configured concurrency. Its purpose: the most efficient situation is when completion packets are waiting in the queue but no wait can be satisfied, because the port has reached its concurrency limit. Then, whenever a running thread calls GetQueuedCompletionStatus, it immediately removes the next completion packet from the queue; no context switch occurs, because the running threads keep pulling completion packets off the queue while the other threads cannot run. Note: if all the threads in the pool are busy, client requests may be rejected, so tune this parameter to achieve the best performance. Thread concurrency, continued: thread D blocks, joins the paused thread list, and rejoins the released thread list after it wakes up.

(Diagram: Thread D has been moved to the paused thread list; Thread A, Thread B and Thread C are in the waiting thread queue, last in, first out.)
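Tying the sizing rules above together (concurrency equal to the number of CPUs, two worker threads per CPU), here is a small sketch; StartWorkerPool is only an illustrative helper name, and ServerWorkerThread is the worker routine shown earlier:

#include <windows.h>

// Sketch only: creates one completion port and 2 x CPU worker threads.
HANDLE StartWorkerPool(void)
{
    SYSTEM_INFO si;
    GetSystemInfo(&si);                      // si.dwNumberOfProcessors = number of CPUs

    // Concurrency value = number of CPUs (passing 0 would mean the same thing)
    HANDLE hPort = CreateIoCompletionPort(INVALID_HANDLE_VALUE, NULL, 0,
                                          si.dwNumberOfProcessors);
    if (hPort == NULL)
        return NULL;

    // Two worker threads per processor wait on the port; the port itself
    // only lets dwNumberOfProcessors of them run at the same time.
    for (DWORD i = 0; i < si.dwNumberOfProcessors * 2; i++)
    {
        DWORD threadId;
        HANDLE hThread = CreateThread(NULL, 0, ServerWorkerThread, hPort, 0, &threadId);
        if (hThread != NULL)
            CloseHandle(hThread);            // the pool does not need the thread handles
    }
    return hPort;                            // devices/sockets are associated with it later
}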

Safe exit of worker threads: the PostQueuedCompletionStatus function can be used to post a completion packet whose address points to a custom structure containing an OVERLAPPED member plus a status variable. When the status variable carries the exit flag, the worker thread performs its cleanup and then exits. Note: 1. before each WSASend and WSARecv operation, clear the OVERLAPPED structure with memset.
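A minimal sketch of this shutdown technique, assuming a per-I/O structure with an OVERLAPPED member plus a status field; the structure, field and flag names (PER_IO_DATA, OpCode, OP_EXIT) are illustrative, and the OVERLAPPED part is cleared with memset as the note above requires:

#include <windows.h>

// Illustrative per-I/O structure: an OVERLAPPED member plus a status variable.
typedef struct _PER_IO_DATA {
    OVERLAPPED Overlapped;
    int        OpCode;          // e.g. OP_READ, OP_WRITE, OP_EXIT
} PER_IO_DATA;

#define OP_EXIT 0xFFFF          // illustrative exit flag

// Ask nWorkers worker threads to shut down cleanly.
void ShutdownWorkers(HANDLE hPort, int nWorkers, PER_IO_DATA *exitPackets)
{
    for (int i = 0; i < nWorkers; i++)
    {
        memset(&exitPackets[i], 0, sizeof(PER_IO_DATA));   // clear the OVERLAPPED part
        exitPackets[i].OpCode = OP_EXIT;
        // Post one completion packet per worker; each wakes up in
        // GetQueuedCompletionStatus with lpOverlapped == &exitPackets[i].Overlapped.
        PostQueuedCompletionStatus(hPort, 0, 0, &exitPackets[i].Overlapped);
    }
}

// Inside the worker loop, after GetQueuedCompletionStatus succeeds:
//     PER_IO_DATA *pIoData = CONTAINING_RECORD(lpOverlapped, PER_IO_DATA, Overlapped);
//     if (pIoData->OpCode == OP_EXIT) { /* clean up */ return 0; }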
