Summary
When writing a high-load server program that must handle a very large number of connections, the classic multi-threaded and select-based models are no longer adequate. They should be abandoned in favor of epoll, kqueue, or /dev/poll for capturing I/O events. Finally, AIO is briefly introduced.
Origin
Network services often become inefficient or even completely paralyzed when handling tens of thousands of client connections; this is the C10K problem. With the rapid development of the Internet, more and more network services face the C10K problem, so it is necessary for web developers to have some understanding of it. The main reference for this article is http://www.kegel.com/c10k.html.
The defining feature of the C10K problem is this: in a poorly designed program, the relationship between performance on one hand and the number of connections and machine performance on the other is often non-linear. For example, a classic select-based program written without the C10K problem in mind may handle 1000 concurrent connections on an old server, yet fail to reach 2000 connections on a new server with twice the performance. This is because the cost of many operations grows linearly with the current number of connections: when the resource consumption of a single task is O(n) in the number of connections, the resource consumption accumulated over simultaneous I/O handling for many connections becomes considerable, and system throughput collapses even though the machine itself is fast enough. To solve this problem, the strategy used to serve connections must be changed.
Basic Strategies
The strategy has two main aspects: 1. how the application software cooperates with the operating system to obtain I/O events and schedule I/O operations on sockets; 2. how the application software maps tasks onto threads/processes. For the former there are mainly three approaches: blocking I/O, non-blocking I/O, and asynchronous I/O. For the latter the main options are one process per task, one thread per task, a single thread, and multiple tasks sharing a thread pool, plus some more complex variants. The common classic strategies are as follows:
1. Serve one client with each thread/process, and use blocking I/O
This is the common strategy of small programs and Java, and is also a common choice for interactive long-connection applications (such as a BBS). It is hard to make this strategy meet the needs of high-performance programs; its advantage is that it is extremely simple to implement and makes it easy to embed complex interaction logic. Apache and ftpd both work in this mode.
2. Serve many clients with a single thread, and use nonblocking I/O and readiness notification
This is the classic model; datapipe and similar programs are implemented this way. The advantages are that the implementation is simple, it is easy to port, and it provides sufficient performance. The disadvantage is that it cannot take full advantage of multi-CPU machines, especially when the program itself has no complex business logic.
3. Serve many clients with each thread, and use nonblocking I/O and readiness notification
A simple improvement on classic model 2. Its drawbacks are that multi-threaded concurrency is prone to bugs, and some operating systems do not support multi-threaded readiness notification.
4. Serve many clients with each thread, and use asynchronous I/O
On an OS with AIO support, this can deliver very high performance. However, the AIO programming model differs considerably from the classic model, and it is basically difficult to write a framework that supports both AIO and the classic model, which reduces program portability. On Windows, this is essentially the only option.
This article mainly discusses the details of model 2, that is, how the application software handles socket I/O under model 2.
Select and poll
The typical flow of the original synchronous blocking I/O model is as follows:
[Figure: typical flow of the synchronous blocking I/O model]
From the application's point of view, a read call blocks for a long time, so the application needs multiple threads to handle concurrent access. Synchronous non-blocking I/O improves on this:
The typical single-threaded server program structure is often as follows:
do {
    Get readiness notification of all sockets
    Dispatch ready handles to corresponding handlers
    if (readable) {
        read the socket
        if (read done)
            handler process the request
    }
    if (writable)
        write response
    if (nothing to do)
        close socket
} while (true)
The typical flow of the synchronous non-blocking I/O model:
[Figure: typical flow of the synchronous non-blocking I/O model]
The key part is readiness notification: finding out which sockets have I/O events. The first implementation we usually learn from textbooks and example programs uses select, which is declared as follows:

int select(int n, fd_set *rd_fds, fd_set *wr_fds, fd_set *ex_fds, struct timeval *timeout);
Select uses the fd_set structure. From the man page we know that the handles an fd_set can hold are tied to FD_SETSIZE. In fact, under *nix an fd_set is an array of bit flags: each bit indicates whether the fd with that number is in the set, so an fd_set can only hold handles whose numbers are smaller than FD_SETSIZE. The default value of FD_SETSIZE is 1024; if a larger handle is put into an fd_set, the program crashes on the out-of-bounds array access. By default the maximum handle number of a process cannot exceed 1024, but this limit can be raised with the ulimit -n command or the setrlimit function. If, unfortunately, a program compiled in an FD_SETSIZE = 1024 environment runs with ulimit -n > 1024, you can only pray that it does not crash. In the ACE environment, ACE_Select_Reactor provides special protection against this, but functions such as recv_n still use select indirectly, which requires attention.
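Below is a minimal sketch of how an fd_set is typically used with select, assuming an already-connected socket descriptor sock whose value is smaller than FD_SETSIZE:

#include <sys/select.h>

void wait_readable(int sock)
{
    fd_set rd;
    struct timeval tv = { 5, 0 };          /* give up after 5 seconds */

    FD_ZERO(&rd);
    FD_SET(sock, &rd);                     /* sock must be < FD_SETSIZE */

    if (select(sock + 1, &rd, NULL, NULL, &tv) > 0 && FD_ISSET(sock, &rd)) {
        /* sock is readable: call read()/recv() here */
    }
}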
To address this problem with fd_set, *nix provides the poll function as a replacement for select. The poll interface is as follows:

int poll(struct pollfd *ufds, unsigned int nfds, int timeout);

The first parameter, ufds, is a pollfd array supplied by the caller, and its size is decided by the user, so the FD_SETSIZE trouble disappears. ufds is a complete replacement for fd_set, and porting from select to poll is straightforward. At this point, we can at least write a working program for C10K.
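A minimal sketch of the same kind of wait expressed with poll, assuming two existing sockets, listener and client:

#include <poll.h>

void wait_events(int listener, int client)
{
    struct pollfd ufds[2];

    ufds[0].fd = listener;  ufds[0].events = POLLIN;
    ufds[1].fd = client;    ufds[1].events = POLLIN | POLLOUT;

    if (poll(ufds, 2, 5000) > 0) {         /* 5000 ms timeout */
        if (ufds[0].revents & POLLIN)  { /* accept() the new connection */ }
        if (ufds[1].revents & POLLIN)  { /* read() from the client */ }
        if (ufds[1].revents & POLLOUT) { /* write() the pending reply */ }
    }
}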
However, the performance of both select and poll drops sharply as the number of connections grows. There are two reasons. First, on every select/poll call the operating system has to rebuild the list of events the current thread cares about and hang the thread on the (rather complex) wait queue of each event, which is quite time-consuming. Second, after the call returns, the application has to scan the whole handle list to dispatch events, which is also time-consuming. Both costs are proportional to the number of connections, and the density of I/O events also grows with the number of connections, so CPU consumption ends up growing roughly as O(n²) in the number of connections.
Epoll, kqueue, /dev/poll
For the reasons above, *nix hackers developed epoll, kqueue, and /dev/poll to help us out (let us all kneel for three minutes to thank these great gods). Epoll is the Linux solution, kqueue is FreeBSD's solution, and /dev/poll is the oldest, Solaris solution; their difficulty of use increases in that order. Simply put, these APIs do two things. 1. They avoid the overhead of the kernel rebuilding the event wait structures on every select/poll call: the kernel maintains a long-lived list of events of interest, and the application modifies this list and harvests I/O events through the API. 2. They avoid having the application scan the whole handle table after select/poll returns: the kernel returns the concrete list of events directly to the application.
Before getting into the specific APIs, let us look at the difference between edge triggering and level (condition) triggering. Edge triggering means an I/O event is generated every time the state changes; level triggering means an I/O event is generated as long as the condition holds. For example, suppose that after a long silence 100 bytes arrive. Both an edge-triggered and a level-triggered API generate a read-ready notification for the application. The application reads 50 bytes and then calls the API again to wait for I/O events. A level-triggered API will immediately return another read-ready notification, because 50 bytes are still readable; an edge-triggered API will wait indefinitely, because the readable state has not changed.
Therefore, with an edge-triggered API every read must drain the socket until it returns EWOULDBLOCK; otherwise the socket becomes useless. With a level-triggered API, do not watch a socket's writable event unless the application actually has something to write; otherwise write-ready notifications will be returned endlessly. The select that everyone commonly uses is level-triggered; I once kept watching a socket's write event for a long time and produced a CPU 100% bug that way.
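As a sketch of the drain-until-EWOULDBLOCK rule, assuming a non-blocking descriptor fd and a hypothetical handle_data() callback:

#include <errno.h>
#include <unistd.h>

void handle_data(const char *buf, ssize_t len);   /* application-defined */

void drain_socket(int fd)
{
    char buf[4096];

    for (;;) {
        ssize_t r = read(fd, buf, sizeof buf);
        if (r > 0) {
            handle_data(buf, r);          /* consume what arrived, keep reading */
        } else if (r == 0) {
            close(fd);                    /* peer closed the connection */
            break;
        } else if (errno == EAGAIN || errno == EWOULDBLOCK) {
            break;                        /* drained: wait for the next edge */
        } else {
            close(fd);                    /* real error */
            break;
        }
    }
}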
Epoll is called as follows:
int epoll_create(int size);
int epoll_ctl(int epfd, int op, int fd, struct epoll_event *event);
int epoll_wait(int epfd, struct epoll_event *events, int maxevents, int timeout);
epoll_create creates the table of watched events in the kernel, which is roughly equivalent to creating an fd_set. epoll_ctl modifies this table, equivalent to FD_SET and related operations. epoll_wait waits for I/O events to occur, equivalent to the select/poll call itself. Epoll is an upgraded version of select/poll, and the events it supports are exactly the same. Epoll supports both edge triggering and level triggering; generally, edge triggering performs better. Here is a simple example:
struct epoll_event ev, *events;

int kdpfd = epoll_create(100);
ev.events = EPOLLIN | EPOLLET;   /* note EPOLLET: this requests edge triggering */
ev.data.fd = listener;
epoll_ctl(kdpfd, EPOLL_CTL_ADD, listener, &ev);
for (;;) {
    nfds = epoll_wait(kdpfd, events, maxevents, -1);
    for (n = 0; n < nfds; ++n) {
        if (events[n].data.fd == listener) {
            client = accept(listener, (struct sockaddr *)&local, &addrlen);
            if (client < 0) {
                perror("accept");
                continue;
            }
            setnonblocking(client);
            ev.events = EPOLLIN | EPOLLET;
            ev.data.fd = client;
            if (epoll_ctl(kdpfd, EPOLL_CTL_ADD, client, &ev) < 0) {
                fprintf(stderr, "epoll set insertion error: fd=%d\n", client);
                return -1;
            }
        } else {
            do_use_fd(events[n].data.fd);
        }
    }
}
Brief Introduction to kqueue and /dev/poll
Kqueue is FreeBSD's darling. Kqueue is actually a feature-rich kernel event queue: it is not merely an upgrade of select/poll, it can also handle many kinds of events such as signals, directory-structure changes, and process events. Kqueue is level-triggered by default; the EV_CLEAR flag gives edge-triggered behaviour.
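A hedged sketch of the kqueue calls, assuming an existing socket sock and using EV_CLEAR to request edge-triggered behaviour:

#include <sys/types.h>
#include <sys/event.h>
#include <sys/time.h>

void kqueue_example(int sock)
{
    int kq = kqueue();

    /* register interest in readability of sock */
    struct kevent change;
    EV_SET(&change, sock, EVFILT_READ, EV_ADD | EV_CLEAR, 0, 0, NULL);
    kevent(kq, &change, 1, NULL, 0, NULL);

    /* harvest events; the kernel returns only the descriptors that fired */
    struct kevent events[64];
    int n = kevent(kq, NULL, 0, events, 64, NULL);
    for (int i = 0; i < n; ++i) {
        int fd = (int)events[i].ident;
        /* events[i].data is the number of bytes waiting to be read on fd */
        (void)fd;
    }
}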
/dev/poll is Solaris's creation and the earliest of this line of high-performance APIs. The kernel provides a special device file, /dev/poll. The application opens this file to obtain a handle to something like an fd_set, modifies it by writing pollfd structures to it, and uses a special ioctl call in place of select. Because it appeared so early, the /dev/poll interface looks clumsy and ridiculous today.
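A hedged sketch of that flow, assuming an existing socket sock (as I recall the Solaris interface, struct dvpoll holds dp_fds, dp_nfds, dp_timeout in that order):

#include <sys/devpoll.h>
#include <sys/ioctl.h>
#include <fcntl.h>
#include <poll.h>
#include <unistd.h>

void devpoll_example(int sock)
{
    int dp = open("/dev/poll", O_RDWR);

    struct pollfd pfd = { sock, POLLIN, 0 };
    write(dp, &pfd, sizeof pfd);               /* register interest in sock */

    struct pollfd results[64];
    struct dvpoll dvp = { results, 64, -1 };   /* dp_fds, dp_nfds, dp_timeout */
    int n = ioctl(dp, DP_POLL, &dvp);          /* wait; results[0..n-1] are ready */
    (void)n;

    close(dp);
}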
For C++ development, ACE 5.5 and later provide ACE_Dev_Poll_Reactor, which wraps the epoll and /dev/poll APIs; they must be enabled by defining ACE_HAS_EPOLL and ACE_HAS_DEV_POLL, respectively, in config.h. For Java development, the Selector in JDK 1.6 provides epoll support, and JDK 1.4 provides /dev/poll support. You only need to choose a sufficiently recent JDK.
Asynchronous I/O and Windows
Unlike the classic model, asynchronous I/O offers a different line of thought. In contrast with traditional synchronous I/O, asynchronous I/O allows a process to initiate many I/O operations without blocking or waiting for any of them to complete; later, or upon being notified that an I/O operation has completed, the process can retrieve its results.
The asynchronous non-blocking I/O model overlaps I/O with processing. A read request returns immediately, indicating only that the read has been successfully initiated. While the read completes in the background, the application goes on with other work. When the read response arrives, a signal is generated or a thread-based callback function is invoked to finish the I/O processing. The typical flow of the asynchronous I/O model:
[Figure: typical flow of the asynchronous non-blocking I/O model]
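As a sketch of this initiate-then-collect pattern, here is the POSIX AIO interface (aio_read and friends); the file name data.bin is illustrative, and on Linux this typically links against librt (-lrt):

#include <aio.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
    static char buf[4096];
    int fd = open("data.bin", O_RDONLY);

    struct aiocb cb;
    memset(&cb, 0, sizeof cb);
    cb.aio_fildes = fd;
    cb.aio_buf    = buf;
    cb.aio_nbytes = sizeof buf;
    cb.aio_offset = 0;

    aio_read(&cb);                   /* returns at once; the read runs in the background */

    /* ... the application is free to do other work here ... */

    const struct aiocb *list[1] = { &cb };
    aio_suspend(list, 1, NULL);      /* block until the operation completes */

    if (aio_error(&cb) == 0)
        printf("read %zd bytes\n", aio_return(&cb));
    return 0;
}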
For file operations, AIO has a further benefit: the application hands a whole batch of requests for scattered disk blocks to the operating system at once, and the operating system then has the opportunity to merge and reorder those requests, which synchronous calls cannot achieve unless one thread is created per request.
Linux kernel 2.6 provides limited AIO support: only the file system is supported. Libc may be able to simulate socket AIO with threads, but that is meaningless for performance. In general, Linux AIO is not yet mature. Windows supports AIO very well, with IOCP queues and IOCP callbacks, and even provides user-level asynchronous procedure calls (APC). On Windows, AIO is the only available high-performance solution; for details, see MSDN.
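A hedged sketch of overlapped file I/O collected through an I/O completion port; the file name data.bin and the completion key are illustrative:

#include <windows.h>
#include <stdio.h>

int main(void)
{
    HANDLE file = CreateFileA("data.bin", GENERIC_READ, 0, NULL,
                              OPEN_EXISTING, FILE_FLAG_OVERLAPPED, NULL);
    HANDLE iocp = CreateIoCompletionPort(file, NULL, 1 /* key */, 0);

    static char buf[4096];
    OVERLAPPED ov = {0};                          /* read from offset 0 */
    ReadFile(file, buf, sizeof buf, NULL, &ov);   /* returns immediately (pending) */

    DWORD bytes; ULONG_PTR key; OVERLAPPED *pov;
    if (GetQueuedCompletionStatus(iocp, &bytes, &key, &pov, INFINITE))
        printf("read %lu bytes (key=%lu)\n",
               (unsigned long)bytes, (unsigned long)key);

    CloseHandle(iocp);
    CloseHandle(file);
    return 0;
}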
This article comes from a CSDN blog. When reproducing it, please credit the source: http://blog.csdn.net/wpper/archive/2009/02/23/3924330.aspx