I/O Completion Ports and High-Performance Server Development
Email: kruglinski_at_gmail_dot_com
Blog: kruglinski.blogchina.com
As early as two years ago I was already comfortable with I/O completion ports, but I never had the chance to use them in a project. In the meantime I have watched the technique get hyped until it seems almost mystical, so I decided to write an article explaining how it works. It is nowhere near as profound as the legend says. If you find any mistakes, please point them out. If you repost this article, please credit the source and the author. Thank you!
The example is a file transfer server. On my machine it serves file downloads to many clients at the same time using only two threads, and the program's performance grows linearly with the number of CPUs in the machine. I have tried to keep it clear and easy to follow. Although the program is small, it combines overlapped I/O, completion ports, and the thread pool API introduced in NT 5 (Windows 2000). A server built on this model should get the best performance available on NT-family systems.
First, since overlapped I/O is the foundation of completion ports, we need to understand overlapped I/O. That in turn requires familiarity with kernel objects and a few operating system concepts: what the signaled and nonsignaled states are, what a wait function is, what the side effects of a successful wait are, what it means for a thread to be suspended, and so on. If these basics are unfamiliar, read the relevant chapters of Windows Core Programming first. If you already understand them, overlapped I/O will not be difficult.
You can think of overlapped I/O as a client/server arrangement. Do not confuse the terms: here the server is the operating system, and the client is your program, the one performing I/O operations. When you perform an I/O operation (send, recv, WriteFile, ReadFile, ...) you are submitting an I/O request to the server. The server carries out the operation for you, and you are free to do anything else in the meantime; when the request completes, the server notifies you. A common technique is to issue an overlapped I/O request and then, in a loop, call PeekMessage, TranslateMessage, and DispatchMessage to keep the user interface updated while calling GetOverlappedResult to check whether the I/O operation has completed. A more efficient approach is to use an I/O completion routine to process the result returned by the server. However, not every function that supports overlapped I/O also supports completion routines; TransmitFile is one that does not.
Example 1. An overlapped write (GetOverlappedResult method):
1. Fill in an OVERLAPPED structure.
2. Issue the write and pass the overlapped parameter (a pointer to the OVERLAPPED structure above).
3. Do other things (such as updating the user interface).
4. Call GetOverlappedResult.
5. If the I/O request has not completed and no error occurred, go back to step 3.
6. Process the result of the I/O operation.
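The polling loop in Example 1 can be sketched in portable C++. This is only an analogy, not the Win32 API: std::async stands in for issuing the overlapped write, and a short wait_for stands in for a non-blocking GetOverlappedResult check. The names slow_write, WriteOutcome, and overlapped_style_write are hypothetical and exist only for this illustration.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <future>
#include <string>
#include <thread>

// Hypothetical stand-in for a slow device write; returns the bytes "written".
std::size_t slow_write(const std::string& data) {
    std::this_thread::sleep_for(std::chrono::milliseconds(50));
    return data.size();
}

struct WriteOutcome {
    std::size_t bytes_written;  // result handled in step 6
    int other_work_rounds;      // how many times step 3 ran while waiting
};

WriteOutcome overlapped_style_write(const std::string& data) {
    // Steps 1-2: start the write without blocking the caller.
    std::future<std::size_t> pending =
        std::async(std::launch::async, slow_write, data);
    assert(pending.valid());

    int rounds = 0;
    // Steps 4-5: check for completion; "not ready" sends us back to step 3.
    while (pending.wait_for(std::chrono::milliseconds(5)) !=
           std::future_status::ready) {
        ++rounds;  // step 3: update the UI, pump messages, and so on
    }
    // Step 6: the operation is finished, so the result can be processed.
    return WriteOutcome{pending.get(), rounds};
}
```

Calling overlapped_style_write("hello") reports 5 bytes written; the number of rounds depends on timing, which is exactly the point: the caller stays free to do other work while the write is in flight.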
Example 2. An overlapped write (completion routine method):
1. Fill in an OVERLAPPED structure.
2. Issue the write, pass the overlapped parameter (a pointer to the OVERLAPPED structure above), and specify the completion routine.
3. Do other things (such as updating the user interface).
4. When the completion routine is called, the I/O operation has either completed or failed, and you can process its result.
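The completion routine method of Example 2 can be modelled in the same spirit. On Windows a completion routine is queued to the issuing thread and runs when that thread enters an alertable wait; the sketch below imitates that with a plain callback queue that the issuing thread drains itself. CompletionRoutineQueue, start_write, and demo_completion_routine are hypothetical names, and the "write" is simulated.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <mutex>
#include <queue>
#include <string>
#include <thread>
#include <utility>

// Hypothetical queue of finished-operation callbacks, loosely modelled on
// the per-thread queue that Windows drains during an alertable wait.
class CompletionRoutineQueue {
public:
    void post(std::function<void()> routine) {
        std::lock_guard<std::mutex> lock(mutex_);
        routines_.push(std::move(routine));
    }
    // Runs every queued routine on the calling thread; returns how many ran.
    std::size_t drain() {
        std::queue<std::function<void()>> ready;
        {
            std::lock_guard<std::mutex> lock(mutex_);
            ready.swap(routines_);
        }
        std::size_t count = ready.size();
        while (!ready.empty()) {
            ready.front()();
            ready.pop();
        }
        return count;
    }
private:
    std::mutex mutex_;
    std::queue<std::function<void()>> routines_;
};

// Steps 1-2: start a simulated write and name its completion routine. The
// helper thread only queues the routine; it never runs it.
std::thread start_write(CompletionRoutineQueue& queue, std::string data,
                        std::function<void(std::size_t)> on_complete) {
    return std::thread([&queue, data, on_complete] {
        std::size_t written = data.size();  // pretend everything was written
        queue.post([on_complete, written] { on_complete(written); });
    });
}

std::size_t demo_completion_routine() {
    CompletionRoutineQueue queue;
    std::size_t reported = 0;
    std::thread io = start_write(queue, "hello, world",
                                 [&reported](std::size_t n) { reported = n; });
    io.join();      // step 3: stand-in for "do other things"
    queue.drain();  // step 4: stand-in for entering an alertable wait
    return reported;
}
```

The design point is that the routine runs on the thread that issued the request, so the result can be handled without extra locking around the caller's own data.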
If you have understood the concepts above, you are already close to I/O completion ports. What we have seen so far is ordinary overlapped I/O. It is already efficient, but combining it with multiple threads that operate on the same file or socket becomes very complicated, and most programmers find that complexity hard to manage. Completion ports exist precisely to get the full performance of multithreading combined with overlapped I/O. Many people call them complicated. In fact, if you have ever written a multithreaded program that performs overlapped I/O on a single file or socket (note: several threads issuing overlapped operations on one handle, not one thread started per handle), you will find that a completion port actually reduces the complexity of using overlapped I/O from multiple threads, and performs better as well. Where does the extra performance come from? Read on.
You have probably written a server like this:
Example 3. Main program:
1. Listen on a port.
2. Wait for a connection.
3. When a connection arrives:
4. Start a thread to serve the client.
5. Return to 2.
Service thread:
1. Read a request from the client.
2. If the client has no more requests, go to 6.
3. Process the request.
4. Return the result.
5. Return to 1.
6. Exit the thread.
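The thread-per-connection model of Example 3 can be sketched without any sockets. In this minimal illustration a FakeClient (a hypothetical type) holds a list of requests in place of a real connection, and serve_thread_per_connection plays the role of the accept loop by starting one thread per client.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <thread>
#include <vector>

// Hypothetical stand-in for a connected client: requests to read and a
// place to write the replies.
struct FakeClient {
    std::vector<std::string> requests;
    std::vector<std::string> replies;
};

// Service thread: read each request, process it, return the result, and
// exit when the client has nothing more to ask.
void serve_client(FakeClient& client) {
    for (const std::string& request : client.requests) {
        std::string reply = "OK " + request;  // "process" the request
        client.replies.push_back(reply);      // return the result
    }
}

// Main program: one new thread per "accepted" client. Returns the number
// of threads that were started.
std::size_t serve_thread_per_connection(std::vector<FakeClient>& clients) {
    std::vector<std::thread> threads;
    for (FakeClient& client : clients) {
        threads.emplace_back(serve_client, std::ref(client));
    }
    std::size_t started = threads.size();
    for (std::thread& t : threads) {
        t.join();
    }
    return started;
}
```

Three clients cost three threads, even if one of them never sends a request. That one-to-one cost is what the following models try to remove.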
This is the simplest network server model. Let's optimize it.
Example 4. Main program:
1. Create a thread pool containing as many threads as the machine can support; all of them start out suspended.
2. Listen on a port.
3. Wait for a connection.
4. When a connection arrives:
5. Resume one thread from the pool to serve the client.
6. Return to 3.
The service thread is the same as in Example 3, except that after it has handled all of a client's requests it returns to the thread pool instead of exiting: it suspends itself again, gives up its CPU time, and waits to serve the next client. While serving a client the thread may of course block on an I/O operation (or on some other blocking call), but it does not return to the pool while blocked, so it can still serve only one client at a time.
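A minimal sketch of the Example 4 pool, under the assumption that a condition variable wait is an acceptable stand-in for suspending and resuming threads. PooledClient, ClientPool, and serve_all are hypothetical names; a "request" is just a counter increment.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

// Hypothetical client: how many requests it will send and how many were served.
struct PooledClient {
    int request_count = 0;
    int served = 0;
};

class ClientPool {
public:
    explicit ClientPool(std::size_t size) {
        assert(size > 0);
        for (std::size_t i = 0; i < size; ++i) {
            workers_.emplace_back([this] { run(); });
        }
    }
    // Lets the workers finish every queued client, then joins them.
    ~ClientPool() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stopping_ = true;
        }
        wake_.notify_all();
        for (std::thread& t : workers_) {
            t.join();
        }
    }
    // Main program: hand an accepted client to one sleeping worker.
    void accept(PooledClient& client) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            waiting_.push(&client);
        }
        wake_.notify_one();
    }
private:
    void run() {
        for (;;) {
            PooledClient* client = nullptr;
            {
                std::unique_lock<std::mutex> lock(mutex_);
                wake_.wait(lock, [this] { return stopping_ || !waiting_.empty(); });
                if (waiting_.empty()) {
                    return;  // stopping and nothing left to serve
                }
                client = waiting_.front();
                waiting_.pop();
            }
            // The worker stays with this client until all requests are done.
            for (int i = 0; i < client->request_count; ++i) {
                ++client->served;
            }
        }
    }
    std::mutex mutex_;
    std::condition_variable wake_;
    std::queue<PooledClient*> waiting_;
    bool stopping_ = false;
    std::vector<std::thread> workers_;  // declared last: started after the rest
};

// Serves every client with a pool of the given size; returns requests served.
int serve_all(std::vector<PooledClient>& clients, std::size_t pool_size) {
    {
        ClientPool pool(pool_size);
        for (PooledClient& c : clients) {
            pool.accept(c);
        }
    }  // the destructor waits for the workers to finish
    int total = 0;
    for (const PooledClient& c : clients) {
        total += c.served;
    }
    return total;
}
```

With a pool of two, five clients are still all served, but never more than two at once: each worker is tied to one client from that client's first request to its last.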
This may be the most efficient server model you can think of. Compared with the first model, it makes far fewer transitions from user mode to kernel mode because threads are no longer created and destroyed for every connection, so it responds faster. If you think the difference is insignificant, you probably have not worked on large, high-performance servers such as online game servers. What if your server has to serve thousands upon thousands of clients? This is why the Microsoft Windows NT team added a thread pool to NT 5 and later systems.
Now think about what kind of model lets one thread serve multiple clients. We have to break out of the fixed habit of starting a thread for every connection. Instead, we shrink the unit of work a thread performs to a single read or a single write (note: one read or one write, not a read and a write together), rather than all the reads and writes between a client's connection and its disconnection. Each thread uses overlapped I/O for its reads and writes. After posting a read or write request, the thread returns to the thread pool and is available to serve other clients. When an operation completes or fails, a thread is woken to process the result and then returns to the pool.
Take a look at this server model:
Example 5. Main program:
1. Create a thread pool with twice as many threads as the machine has CPUs; all of them start out suspended, waiting to process the result of an overlapped I/O operation.
2. Listen on a port.
3. Wait for a connection.
4. When a connection arrives:
5. Post an overlapped read.
6. Return to 3.
Service thread:
1. If a read has completed, process the data that was read (for example, an HTTP GET command); otherwise go to 3.
2. Post an overlapped write (for example, return the web page requested by the HTTP GET command).
3. If a write has completed, either post another overlapped read to receive the client's next request, or close the connection (for example, when the connection is closed after each page is sent over HTTP).
4. Fetch the result of the next overlapped I/O operation. If no operation has completed, or none is outstanding, return to the thread pool.
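The service loop of Example 5 can be sketched with a blocking queue standing in for the completion port. This is a simulation in standard C++, not the Win32 API: CompletionQueue, Completion, worker, and run_server are hypothetical names, and the "kernel" finishes each posted write immediately by queuing its result.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

enum class OpType { ReadDone, WriteDone, Quit };

struct Completion {
    int client_id;
    OpType op;
};

// Hypothetical stand-in for a completion port: a blocking queue of finished
// operations that any worker may dequeue.
class CompletionQueue {
public:
    void post(Completion c) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            packets_.push(c);
        }
        ready_.notify_one();
    }
    Completion get() {
        std::unique_lock<std::mutex> lock(mutex_);
        ready_.wait(lock, [this] { return !packets_.empty(); });
        Completion c = packets_.front();
        packets_.pop();
        return c;
    }
private:
    std::mutex mutex_;
    std::condition_variable ready_;
    std::queue<Completion> packets_;
};

// Service thread: handles exactly one read or write result per iteration.
void worker(CompletionQueue& port, std::atomic<int>& closed) {
    for (;;) {
        Completion c = port.get();
        switch (c.op) {
        case OpType::ReadDone:
            // Request received: "post an overlapped write". The simulated
            // kernel completes it at once and queues the result.
            port.post({c.client_id, OpType::WriteDone});
            break;
        case OpType::WriteDone:
            ++closed;  // reply sent: close the connection
            break;
        case OpType::Quit:
            return;
        }
    }
}

// Main program: accept `clients` connections, post one read for each, and
// let `threads` workers handle every completion. Returns connections closed.
int run_server(int clients, std::size_t threads) {
    assert(threads > 0);
    CompletionQueue port;
    std::atomic<int> closed{0};
    std::vector<std::thread> pool;
    for (std::size_t i = 0; i < threads; ++i) {
        pool.emplace_back(worker, std::ref(port), std::ref(closed));
    }
    for (int id = 0; id < clients; ++id) {
        port.post({id, OpType::ReadDone});
    }
    while (closed.load() < clients) {
        std::this_thread::yield();
    }
    for (std::size_t i = 0; i < threads; ++i) {
        port.post({0, OpType::Quit});  // one quit packet per worker
    }
    for (std::thread& t : pool) {
        t.join();
    }
    return closed.load();
}
```

Here run_server(100, 2) serves a hundred simulated clients with two threads, because no thread is ever tied to a client: a worker touches a connection only for the moment it takes to handle one completed operation.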
Suppose this is a web server. The worker threads operate on the smallest unit of work, a single read or write, and the main program posts the first overlapped read for each connection.
When a read completes, a worker thread in the pool is activated. It fetches the result, processes the GET or POST command, and sends back the page content. Sending is also an overlapped operation, so the thread does not wait for it; it goes on to process the I/O results of other clients, and if there is nothing else to process it returns to the pool and waits. Note that in this model the receive and the send for one client may or may not be handled by the same thread.
When the send completes, a worker thread in the pool is activated. It closes the connection (as HTTP does in this example) and then processes other I/O results; if nothing else needs processing, it returns to the pool and waits.
Let's walk through how one thread serves multiple clients in this model, again simulating a web server.
Suppose the system has two threads, ThreadA and ThreadB, both waiting to process the result of an overlapped I/O operation.
When a client, ClientA, connects, the main program posts an overlapped read and goes back to waiting for the next connection. When the read completes, ThreadA is activated. It receives an HTTP GET command, sends a page to ClientA with an overlapped write, and immediately returns to the thread pool to wait for the next I/O result. The send has not yet completed when another client, ClientB, connects, and the main program posts another overlapped read. When that read completes, ThreadA (it could equally be ThreadB) is activated again, repeats the same steps, receives a GET command, and sends a page to ClientB with an overlapped write. This time, before it has had time to return to the pool, a third client, ClientC, connects and the main program posts yet another overlapped read. When that read completes, ThreadB is activated (because ThreadA has not yet returned to the pool). It receives an HTTP GET command and sends a page to ClientC with an overlapped write. Then ThreadB returns to the pool, and so does ThreadA.
At this point there are three pending sends: the pages ThreadA sent to ClientA and ClientB, and the page ThreadB sent to ClientC. All three are being processed by the operating system kernel. ThreadA and ThreadB are back in the pool and free to serve any other client.
When the overlapped write to ClientA completes, ThreadA (or ThreadB) is activated again and closes the connection to ClientA. Before it has returned to the pool, the overlapped write to ClientB also completes, so ThreadB is activated (because ThreadA has not yet returned). It closes the connection to ClientB and returns to the pool. Then the write to ClientC completes and ThreadB is activated again (ThreadA still has not returned). It closes the connection to ClientC. ThreadA returns to the pool, and so does ThreadB. All three clients have now been served. Throughout the process, "establish connection", "read data", "write data", and "close connection" are logically continuous for each client, but they are actually carried out separately.
In total, the two threads handled three reads and three writes. The state machine behind these reads and writes is fairly complicated; what I simulated here is simplified, and the real states are much more complex. Still, the more client requests there are, the greater this model's performance advantage over the previous two. With completion ports, this server model is easy to implement.
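The simplified per-connection state machine from the walk-through can be written down explicitly. This is a sketch of the simulated web server only (one request, one reply, then close), not of a full protocol; ConnState, IoEvent, next_state, and replay are hypothetical names.

```cpp
#include <cassert>
#include <vector>

enum class ConnState { AwaitingRequest, SendingReply, Closed };
enum class IoEvent { ReadCompleted, WriteCompleted, PeerClosed };

// Returns the state a connection moves to when a completion arrives.
// Events that make no sense in the current state leave it unchanged.
ConnState next_state(ConnState state, IoEvent event) {
    if (state == ConnState::Closed) {
        return ConnState::Closed;  // nothing can reopen a closed connection
    }
    if (event == IoEvent::PeerClosed) {
        return ConnState::Closed;
    }
    switch (state) {
    case ConnState::AwaitingRequest:
        return event == IoEvent::ReadCompleted ? ConnState::SendingReply : state;
    case ConnState::SendingReply:
        return event == IoEvent::WriteCompleted ? ConnState::Closed : state;
    case ConnState::Closed:
        break;
    }
    return state;
}

// Feeds a sequence of completions through the machine, as successive worker
// threads would, and returns the final state.
ConnState replay(const std::vector<IoEvent>& events) {
    ConnState state = ConnState::AwaitingRequest;
    for (IoEvent event : events) {
        state = next_state(state, event);
    }
    return state;
}
```

Because the state travels with the connection rather than with a thread, it does not matter whether ThreadA or ThreadB picks up the next completion: either one can look at the current state and the event and decide what to post next.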
Microsoft's IIS web server uses this model. Many people say the Apache server performs better than IIS, and I doubt it. Unless Apache can also divide its threads' work into units this small, I do not think that is possible. The completion port model already makes a single read or write the smallest unit of service, so I believe that on the same hardware IIS outperforms other web servers; I am reasoning from the implementation mechanism here, not from measurements. If there is a real performance gap, it may lie in the operating systems: perhaps the Linux kernel is better than the Windows kernel. Has anyone actually studied this, or are we all just repeating the hype?
State machines are used in many areas, including TCP/IP, compiler theory, and OpenGL. I managed to pick up enough discrete mathematics to follow them, and I think you can too if you spend some time reading.
Finally, here is the code for a simple file transfer server that uses only two threads (my machine has just one CPU) to serve multiple clients. While debugging, I had it serve file downloads to six nc clients at the same time, and more would not be a problem. If simply using the NT 5 thread pool and completion ports gives this kind of performance, imagine what IIS achieves!
I hope you will not get stuck in the framework of this program. Ctrl+C and Ctrl+V are meaningless; you need to understand the essence. The program was compiled with Visual C++ 6.0 SP5 plus the 2003 Platform SDK and tested on Windows XP Professional. The minimum requirement for running it is Windows 2000.
/********************************************************************
    Created:    2005/12/24
    Created:
    Modified:   2005/12/24
    Filename:   D:/vcwork/iocomp.cpp
    File Path:  D:/vcwork/iocomp
    File Base:  iocomp
    File Ext:   cpp
    Author:     kruglinski (kruglinski_at_gmail_dot_com)
    Purpose:    A high-performance file download server built on
                I/O completion ports.
*********************************************************************/
#define _WIN32_WINNT 0x0500

#include <cstdlib>
#include <cstring>
#include <ctime>
#include <iostream>   // using iostreams grows the executable by about 70 KB
#include <new>
#include <vector>
#include <algorithm>
#include <winsock2.h>
#include <mswsock.h>

using namespace std;

#pragma comment(lib, "ws2_32.lib")
#pragma comment(lib, "mswsock.lib")

const int MAX_BUFFER_SIZE = 1024;
const int PRE_SEND_SIZE   = 1024;
const int QUIT_TIME_OUT   = 3000;
const int PRE_DOT_TIMER   = QUIT_TIME_OUT / 80;

typedef enum { IOTransFile, IOSend, IORecv, IOQuit } IO_TYPE;

// Per-connection data, passed to the port as the completion key.
typedef struct
{
    SOCKET      hSocket;
    sockaddr_in ClientAddr;
} PRE_SOCKET_DATA, *PPRE_SOCKET_DATA;

// Per-operation data. The OVERLAPPED member must stay first because a
// pointer to this struct is passed wherever an LPOVERLAPPED is expected.
typedef struct
{
    OVERLAPPED oa;
    WSABUF     DataBuf;
    char       Buffer[MAX_BUFFER_SIZE];
    IO_TYPE    IoType;
} PRE_IO_DATA, *PPRE_IO_DATA;

typedef vector<PPRE_SOCKET_DATA> SocketDataVector;
typedef vector<PPRE_IO_DATA>     IoDataVector;

SocketDataVector gSockDataVec;
IoDataVector     gIoDataVec;
CRITICAL_SECTION csProtection;

char *TimeNow(void)
{
    time_t t = time(NULL);
    tm *localtm = localtime(&t);
    static char timemsg[512] = { 0 };
    strftime(timemsg, 512, "%Z: %B %d %x, %Y", localtm);
    return timemsg;
}

BOOL TransFile(PPRE_IO_DATA pIoData, PPRE_SOCKET_DATA pSocketData, DWORD dwNameLen)
{
    // This line is written for nc, which ends the request with a newline:
    // the last received byte is replaced by a terminator. Adjust as needed.
    pIoData->Buffer[dwNameLen - 1] = '\0';
    // The buffer holds a narrow-character file name, hence CreateFileA.
    HANDLE hFile = CreateFileA(pIoData->Buffer, GENERIC_READ, 0, NULL, OPEN_EXISTING, 0, NULL);
    BOOL bRet = FALSE;
    if (hFile != INVALID_HANDLE_VALUE)
    {
        cout << "transmit file " << pIoData->Buffer << " to client" << endl;
        pIoData->IoType = IOTransFile;
        memset(&pIoData->oa, 0, sizeof(OVERLAPPED));
        // Keep the file handle in the buffer so the worker can close it later.
        *reinterpret_cast<HANDLE *>(pIoData->Buffer) = hFile;
        TransmitFile(pSocketData->hSocket, hFile, GetFileSize(hFile, NULL), PRE_SEND_SIZE,
                     reinterpret_cast<LPOVERLAPPED>(pIoData), NULL, TF_USE_SYSTEM_THREAD);
        bRet = WSAGetLastError() == WSA_IO_PENDING;
    }
    else
        cout << "transmit file " << "error:" << GetLastError() << endl;
    return bRet;
}

DWORD WINAPI ThreadProc(LPVOID IocpHandle)
{
    DWORD dwRecv = 0;
    DWORD dwFlags = 0;
    HANDLE hIocp = reinterpret_cast<HANDLE>(IocpHandle);
    DWORD dwTransCount = 0;
    PPRE_IO_DATA pPreIoData = NULL;
    PPRE_SOCKET_DATA pPreHandleData = NULL;

    while (TRUE)
    {
        // Block until the port hands this thread one completed operation.
        if (GetQueuedCompletionStatus(hIocp, &dwTransCount,
                reinterpret_cast<PULONG_PTR>(&pPreHandleData),
                reinterpret_cast<LPOVERLAPPED *>(&pPreIoData), INFINITE))
        {
            // Zero bytes transferred (and not a quit packet): the client closed.
            if (0 == dwTransCount && IOQuit != pPreIoData->IoType)
            {
                cout << "client:"
                     << inet_ntoa(pPreHandleData->ClientAddr.sin_addr)
                     << ":" << ntohs(pPreHandleData->ClientAddr.sin_port)
                     << " is closed" << endl;
                closesocket(pPreHandleData->hSocket);
                EnterCriticalSection(&csProtection);
                IoDataVector::iterator itrIoDelete =
                    find(gIoDataVec.begin(), gIoDataVec.end(), pPreIoData);
                SocketDataVector::iterator itrSockDelete =
                    find(gSockDataVec.begin(), gSockDataVec.end(), pPreHandleData);
                delete *itrIoDelete;
                delete *itrSockDelete;
                gIoDataVec.erase(itrIoDelete);
                gSockDataVec.erase(itrSockDelete);
                LeaveCriticalSection(&csProtection);
                continue;
            }
            switch (pPreIoData->IoType) {
            case IOTransFile:
                cout << "client:"
                     << inet_ntoa(pPreHandleData->ClientAddr.sin_addr)
                     << ":" << ntohs(pPreHandleData->ClientAddr.sin_port)
                     << " transmit finished" << endl;
                CloseHandle(*reinterpret_cast<HANDLE *>(pPreIoData->Buffer));
                goto LReRecv;
            case IOSend:
                cout << "client:"
                     << inet_ntoa(pPreHandleData->ClientAddr.sin_addr)
                     << ":" << ntohs(pPreHandleData->ClientAddr.sin_port)
                     << " send finished" << endl;
            LReRecv:
                // Post the next overlapped read on this connection.
                pPreIoData->IoType = IORecv;
                pPreIoData->DataBuf.len = MAX_BUFFER_SIZE;
                memset(&pPreIoData->oa, 0, sizeof(OVERLAPPED));
                WSARecv(pPreHandleData->hSocket, &pPreIoData->DataBuf, 1,
                        &dwRecv, &dwFlags,
                        reinterpret_cast<LPWSAOVERLAPPED>(pPreIoData), NULL);
                break;
            case IORecv:
                cout << "client:"
                     << inet_ntoa(pPreHandleData->ClientAddr.sin_addr)
                     << ":" << ntohs(pPreHandleData->ClientAddr.sin_port)
                     << " recv finished" << endl;
                pPreIoData->IoType = IOSend;
                if (!TransFile(pPreIoData, pPreHandleData, dwTransCount))
                {
                    // The file could not be sent: report the error instead.
                    memset(&pPreIoData->oa, 0, sizeof(OVERLAPPED));
                    strcpy(pPreIoData->DataBuf.buf, "file transmit error!\r\n");
                    pPreIoData->DataBuf.len = static_cast<ULONG>(strlen(pPreIoData->DataBuf.buf));
                    WSASend(pPreHandleData->hSocket, &pPreIoData->DataBuf, 1,
                            &dwRecv, dwFlags,
                            reinterpret_cast<LPWSAOVERLAPPED>(pPreIoData), NULL);
                }
                break;
            case IOQuit:
                goto LQuit;
            default:
                ;
            }
        }
    }
LQuit:
    return 0;
}

HANDLE hIocp = NULL;
SOCKET hListen = INVALID_SOCKET;

BOOL WINAPI ShutdownHandler(DWORD dwCtrlType)
{
    PRE_SOCKET_DATA PreSockData = { 0 };
    PRE_IO_DATA PreIoData = { 0 };
    PreIoData.IoType = IOQuit;
    if (hIocp)
    {
        // Post a quit packet; the worker that dequeues it leaves its loop.
        PostQueuedCompletionStatus(hIocp, 1,
            reinterpret_cast<ULONG_PTR>(&PreSockData),
            reinterpret_cast<LPOVERLAPPED>(&PreIoData));
        cout << "shutdown at " << TimeNow() << endl;
        // Give up CPU time so the worker thread can exit.
        for (int t = 0; t < 80; t += 1)
        {
            Sleep(PRE_DOT_TIMER);
            cout << ".";
        }
        CloseHandle(hIocp);
    }
    size_t i = 0;
    for (; i < gSockDataVec.size(); i++)
    {
        PPRE_SOCKET_DATA pSockData = gSockDataVec[i];
        closesocket(pSockData->hSocket);
        delete pSockData;
    }
    for (i = 0; i < gIoDataVec.size(); i++)
    {
        PPRE_IO_DATA pIoData = gIoDataVec[i];
        delete pIoData;
    }
    DeleteCriticalSection(&csProtection);
    if (hListen != INVALID_SOCKET)
        closesocket(hListen);
    WSACleanup();
    exit(0);
    return TRUE;
}

LONG WINAPI MyExceptionFilter(struct _EXCEPTION_POINTERS *ExceptionInfo)
{
    ShutdownHandler(0);
    return EXCEPTION_EXECUTE_HANDLER;
}

u_short DefPort = 8182;

int main(int argc, char **argv)
{
    if (argc == 2)
        DefPort = static_cast<u_short>(atoi(argv[1]));
    InitializeCriticalSection(&csProtection);
    SetUnhandledExceptionFilter(MyExceptionFilter);
    SetConsoleCtrlHandler(ShutdownHandler, TRUE);
    // Create the completion port; sockets are associated with it later.
    hIocp = CreateIoCompletionPort(INVALID_HANDLE_VALUE, NULL, 0, 0);
    WSADATA data = { 0 };
    WSAStartup(0x0202, &data);
    hListen = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP);
    if (INVALID_SOCKET == hListen)
    {
        ShutdownHandler(0);
    }
    sockaddr_in addr = { 0 };
    addr.sin_family = AF_INET;
    addr.sin_port = htons(DefPort);
    if (bind(hListen, reinterpret_cast<sockaddr *>(&addr),
             sizeof(addr)) == SOCKET_ERROR)
    {
        ShutdownHandler(0);
    }
    if (listen(hListen, 256) == SOCKET_ERROR)
        ShutdownHandler(0);
    SYSTEM_INFO si = { 0 };
    GetSystemInfo(&si);
    si.dwNumberOfProcessors <<= 1;   // two worker threads per CPU
    for (DWORD i = 0; i < si.dwNumberOfProcessors; i++)
    {
        QueueUserWorkItem(ThreadProc, hIocp, WT_EXECUTELONGFUNCTION);
    }
    cout << "startup at " << TimeNow() << endl
         << "work on port " << DefPort << endl
         << "press Ctrl+C to shutdown" << endl;
    while (TRUE)
    {
        int nameLen = sizeof(addr);
        memset(&addr, 0, sizeof(addr));
        SOCKET hAccept = accept(hListen, reinterpret_cast<sockaddr *>(&addr), &nameLen);
        if (hAccept != INVALID_SOCKET)
        {
            cout << "accept a client:" << inet_ntoa(addr.sin_addr)
                 << ":" << ntohs(addr.sin_port) << endl;
            PPRE_SOCKET_DATA pPreHandleData = new PRE_SOCKET_DATA;
            pPreHandleData->hSocket = hAccept;
            memcpy(&pPreHandleData->ClientAddr, &addr, sizeof(addr));
            // Associate the socket with the port; the per-connection data
            // comes back as the completion key.
            CreateIoCompletionPort(reinterpret_cast<HANDLE>(hAccept),
                hIocp, reinterpret_cast<ULONG_PTR>(pPreHandleData), 0);
            PPRE_IO_DATA pPreIoData = new (nothrow) PRE_IO_DATA;
            if (pPreIoData)
            {
                EnterCriticalSection(&csProtection);
                gSockDataVec.push_back(pPreHandleData);
                gIoDataVec.push_back(pPreIoData);
                LeaveCriticalSection(&csProtection);
                memset(pPreIoData, 0, sizeof(PRE_IO_DATA));
                pPreIoData->IoType = IORecv;
                pPreIoData->DataBuf.len = MAX_BUFFER_SIZE;
                pPreIoData->DataBuf.buf = pPreIoData->Buffer;
                DWORD dwRecv = 0;
                DWORD dwFlags = 0;
                // Post the first overlapped read for this connection.
                WSARecv(hAccept, &pPreIoData->DataBuf, 1,
                        &dwRecv, &dwFlags,
                        reinterpret_cast<LPWSAOVERLAPPED>(pPreIoData), NULL);
            }
            else
            {
                delete pPreHandleData;
                closesocket(hAccept);
            }
        }
    }
    return 0;
}
References:
MSDN 2001
Windows Network Programming
Windows Core Programming
TCP/IP Illustrated
http://kruglinski.bokee.com/4000210.html