Epoll basic concepts and the c10k problem


Epoll: edge-triggered and level-triggered polling

Edge trigger and level trigger

An edge trigger fires an I/O event every time the state changes; a level trigger (also called a conditional trigger) fires an I/O event as long as the condition holds. For example, suppose that after a long silence 100 bytes arrive on a socket. Both edge-triggered and level-triggered APIs deliver a read-ready notification to the application. The application reads 50 bytes and then calls the API again to wait for I/O events. A level-triggered API immediately returns another read-ready notification, because 50 bytes are still readable. An edge-triggered API will wait indefinitely, because the readable state has not changed.

Therefore, when using an edge-triggered API, every read-ready notification must be followed by reading the socket until it returns EWOULDBLOCK; otherwise that socket becomes useless. When using a level-triggered API, do not register interest in the writable event if the application has nothing to write; otherwise write-ready notifications are returned endlessly. The commonly used select is level-triggered; I once kept a socket registered for write events for a long time and ended up with 100% CPU usage.

======================================

From: http://www.rosoo.net/a/201103/11088.html

In addition: c10k.pdf

Summary: When writing high-load server programs that handle a large number of connections, the classic multi-threaded and select models no longer apply; they should be abandoned in favor of epoll, kqueue, or /dev/poll for capturing I/O events. The article closes with a brief introduction to AIO.

Origin: Network services often become inefficient or even collapse completely when handling tens of thousands of client connections; this is known as the c10k problem. As the Internet grows rapidly, more and more network services face the c10k problem, so developers of large websites need some understanding of it. (The main reference for this article is http://www.kegel.com/c10k.html.) The defining feature of the c10k problem is that in a poorly designed program, the relationship between performance and the number of connections is often non-linear. For example, a classic select-based program that ignores the c10k problem and can handle 1000 concurrent connections on an old server often cannot handle 2000 concurrent connections on a new server with twice the performance. This is because, under that strategy, the cost of many operations is proportional to the current number of connections: the resource consumption of serving a single task is O(n). When the server must perform I/O on tens of thousands of sockets at once, the accumulated overhead becomes considerable, and the system throughput no longer matches the machine's capability. To solve this problem, the strategy for serving connections has to change.

Basic strategies:

There are two main design questions: 1. How does the application cooperate with the operating system to obtain I/O events and schedule I/O operations on multiple sockets? 2. How does the application map tasks onto threads/processes? For the former there are three answers: blocking I/O, non-blocking I/O, and asynchronous I/O. For the latter the main options are one process per task, one thread per task, a single thread, or multiple tasks sharing a thread pool, plus some more complex variants. The common classic strategies are:
1. Serve one client with each thread/process, and use blocking I/O. This is the common strategy for small programs and for Java, and it is also a common choice for interactive long-lived-connection applications (such as a BBS). It is hard to make this strategy meet the needs of high-performance programs; its advantage is that it is extremely simple to implement and makes it easy to embed complex interaction logic. Apache and ftpd work in this mode. (A minimal sketch of this strategy appears after this list.)
2. Serve many clients with a single thread, and use non-blocking I/O and readiness notification. This is the classic model; datapipe and similar programs are implemented this way. The advantages are that the implementation is simple and portable and that it can deliver sufficient performance; the disadvantage is that it cannot make full use of multi-CPU machines, especially when the program itself has no complex business logic.
3. Serve many clients with each thread, and use non-blocking I/O and readiness notification. This is a multi-threaded extension of classic model 2. The disadvantages are that multi-threaded concurrency easily introduces bugs, and some operating systems do not support readiness notification across multiple threads.
4. Serve many clients with each thread, and use asynchronous I/O. On operating systems with AIO support this can deliver very high performance. However, the AIO programming model differs substantially from the classic model, and it is essentially very hard to write a framework that supports both AIO and the classic model, which reduces portability. On Windows this is basically the only option.
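To make strategy 1 concrete, here is a minimal thread-per-connection sketch (not from the original article): listener is assumed to be an already-listening socket, request processing is left as a comment, and error handling is minimal.

#include <pthread.h>
#include <sys/socket.h>
#include <unistd.h>

/* Strategy 1 sketch: one thread per client, blocking I/O. */
static void *serve_client(void *arg)
{
    int fd = (int)(long)arg;
    char buf[4096];
    ssize_t n;
    while ((n = read(fd, buf, sizeof(buf))) > 0) {   /* blocks this thread only */
        /* process the request and write() the response */
    }
    close(fd);
    return NULL;
}

void accept_loop(int listener)
{
    for (;;) {
        int client = accept(listener, NULL, NULL);
        if (client < 0)
            continue;
        pthread_t tid;
        pthread_create(&tid, NULL, serve_client, (void *)(long)client);
        pthread_detach(tid);                         /* one thread per connection */
    }
}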
This article mainly discusses the details of model 2: how the application software handles socket I/O under that model.
Select and poll
The typical process of the original synchronous blocking I/O model is as follows:
[Figure: typical flow of the synchronous blocking I/O model]
From the application's point of view, a read call blocks for a long time, so the application has to be written with multiple threads to handle concurrent access. Synchronous non-blocking I/O improves on this.
The typical single-threaded server program structure is often as follows:
do {
    get readiness notification of all sockets
    dispatch ready handles to corresponding handlers
    if (readable) {
        read the socket
        if (read done)
            handler processes the request
    }
    if (writable)
        write response
    if (nothing to do)
        close socket
} while (true)
Typical non-blocking I/O model process:
[Figure: typical flow of the asynchronous blocking I/O model]
The key part is readiness notification: finding out which sockets have I/O events.
Generally the first thing we learn from textbooks and example programs is to implement this with select, which is declared as follows:
int select(int n, fd_set *rd_fds, fd_set *wr_fds, fd_set *ex_fds, struct timeval *timeout);
Select uses the fd_set structure. From the man page we know that the number of handles an fd_set can hold is governed by FD_SETSIZE. In fact, on *nix an fd_set is a bit-flag array: each bit indicates whether the fd with the corresponding index is in the set, so fd_set can only hold handles whose numbers are less than FD_SETSIZE.
The default value of FD_SETSIZE is 1024. If a larger handle is put into an fd_set, the program will crash when the array bounds are exceeded. By default the maximum handle number in a process cannot exceed 1024, but this limit can be raised with the ulimit -n command or the setrlimit function. If a program compiled with FD_SETSIZE = 1024 unluckily runs where ulimit -n > 1024, one can only pray that it does not crash.
In the ACE environment, ACE_Select_Reactor provides special protection against this, but note that some functions such as recv_n use select indirectly.
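To make the select-based loop concrete, here is a minimal level-triggered sketch; listener and the client bookkeeping are hypothetical and error handling is omitted. It shows both limits discussed above: every descriptor must stay below FD_SETSIZE, and the whole set is rebuilt and rescanned on every call.

#include <sys/select.h>

/* Minimal level-triggered loop built on select(); "listener" and the
   clients[] bookkeeping are illustrative placeholders. */
void select_loop(int listener)
{
    int clients[FD_SETSIZE];                    /* watched client sockets */
    int nclients = 0;
    for (;;) {
        fd_set rd;
        FD_ZERO(&rd);
        FD_SET(listener, &rd);                  /* fds must be < FD_SETSIZE */
        int maxfd = listener;
        for (int i = 0; i < nclients; i++) {
            FD_SET(clients[i], &rd);
            if (clients[i] > maxfd)
                maxfd = clients[i];
        }
        if (select(maxfd + 1, &rd, NULL, NULL, NULL) <= 0)
            continue;
        if (FD_ISSET(listener, &rd)) {
            /* accept() the new connection and append it to clients[] */
        }
        for (int i = 0; i < nclients; i++) {    /* O(n) scan after every call */
            if (FD_ISSET(clients[i], &rd)) {
                /* read from clients[i] and dispatch to its handler */
            }
        }
    }
}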
To address the fd_set problem, *nix provides the poll function as a substitute for select. The poll interface is as follows:
int poll(struct pollfd *ufds, unsigned int nfds, int timeout);
The first parameter, ufds, is a pollfd array provided by the caller, whose size the caller decides; this avoids the trouble caused by FD_SETSIZE. ufds completely replaces fd_set, and porting from select to poll is straightforward. At this point, at least, we can write a working program for c10k.
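A corresponding sketch with poll, under the same assumptions: the caller owns the pollfd array, so FD_SETSIZE no longer matters, but the per-call scan of all handles remains.

#include <poll.h>

/* Sketch only: watch "nfds" sockets stored in the caller-provided array,
   each entry pre-filled with fds[i].fd and fds[i].events = POLLIN. */
void poll_loop(struct pollfd *fds, int nfds)
{
    for (;;) {
        int ready = poll(fds, nfds, -1);        /* -1: block until an event occurs */
        if (ready <= 0)
            continue;
        for (int i = 0; i < nfds; i++) {        /* O(n) scan after every call */
            if (fds[i].revents & POLLIN) {
                /* read from fds[i].fd and dispatch to its handler */
            }
        }
    }
}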
However, as the number of connections grows, the performance of select and poll drops sharply, for two reasons. First, on every select/poll call the operating system has to rebuild the list of events the current thread cares about and hang the thread on the corresponding (fairly complicated) wait queues, which takes considerable time. Second, after select/poll returns, the application has to scan the entire handle list to dispatch events, which is also time-consuming. Both costs grow with the number of concurrent connections, and the density of I/O events also grows with the number of connections, so CPU usage ends up roughly O(n²) in the number of concurrent connections.
Epoll, kqueue, /dev/poll
For the above reasons, *nix hackers developed epoll, kqueue, and /dev/poll to help us out. Let us kneel down and thank these experts for three minutes. Epoll is the Linux solution, kqueue the FreeBSD solution, and /dev/poll the oldest, Solaris solution; their difficulty of use increases in that order.
To put it simply, these APIs do two things. 1. They avoid the overhead of having the kernel parse the arguments and build the event-wait structures on every select/poll call: the kernel maintains a long-lived list of events of interest, which the application modifies and queries through a handle. 2. After a call returns, instead of the application scanning the entire handle table, the kernel directly hands back the list of events that actually occurred.
Before getting into the specific APIs, it helps to understand the concepts of edge trigger and level trigger. An edge trigger fires an I/O event whenever the state changes; a level (conditional) trigger fires an I/O event as long as the condition holds. For example, if 100 bytes arrive after a long silence, both edge-triggered and level-triggered APIs deliver a read-ready notification. The application reads 50 bytes and then calls the API again to wait for I/O events. A level-triggered API immediately returns another read-ready notification because 50 bytes are still readable, while an edge-triggered API will wait for a long time because the readable state has not changed.
Therefore, when using an edge-triggered API, each notification must be followed by reading the socket until it returns EWOULDBLOCK; otherwise the socket is effectively lost. When using a level-triggered API, do not register interest in the writable event if the application has nothing to write; otherwise write-ready notifications will be returned endlessly. The commonly used select is level-triggered; in the past I kept watching a socket's write events for a long time and caused 100% CPU usage.
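A minimal sketch of the "read until EWOULDBLOCK" rule for edge-triggered mode (the buffer size is arbitrary and the socket is assumed to be non-blocking):

#include <errno.h>
#include <unistd.h>

/* Drain a non-blocking socket completely on each edge-triggered notification;
   leftover bytes would otherwise never be reported again. */
static void drain_socket(int fd)
{
    char buf[4096];
    for (;;) {
        ssize_t n = read(fd, buf, sizeof(buf));
        if (n > 0) {
            /* hand the n bytes to the application */
        } else if (n == 0) {
            break;                              /* peer closed the connection */
        } else if (errno == EAGAIN || errno == EWOULDBLOCK) {
            break;                              /* kernel buffer empty: safe to wait again */
        } else {
            break;                              /* real error; clean up elsewhere */
        }
    }
}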
Epoll is called as follows:
int epoll_create(int size);
int epoll_ctl(int epfd, int op, int fd, struct epoll_event *event);
int epoll_wait(int epfd, struct epoll_event *events, int maxevents, int timeout);
epoll_create creates the event-interest table in the kernel, roughly the equivalent of creating an fd_set.
epoll_ctl modifies this table, the equivalent of FD_SET and related operations.
epoll_wait waits for I/O events to occur, the equivalent of the select/poll call.
Epoll is an upgraded version of select/poll and supports exactly the same set of events. In addition, epoll supports both edge triggering and level triggering; edge triggering generally performs better. Here is a simple example:
struct epoll_event ev, *events;
int kdpfd = epoll_create(100);
ev.events = EPOLLIN | EPOLLET;    /* note EPOLLET, which requests edge triggering */
ev.data.fd = listener;
epoll_ctl(kdpfd, EPOLL_CTL_ADD, listener, &ev);
for (;;) {
    nfds = epoll_wait(kdpfd, events, maxevents, -1);
    for (n = 0; n < nfds; ++n) {
        if (events[n].data.fd == listener) {
            client = accept(listener, (struct sockaddr *) &local,
                            &addrlen);
            if (client < 0) {
                perror("accept");
                continue;
            }
            setnonblocking(client);
            ev.events = EPOLLIN | EPOLLET;
            ev.data.fd = client;
            if (epoll_ctl(kdpfd, EPOLL_CTL_ADD, client, &ev) < 0) {
                fprintf(stderr, "epoll set insertion error: fd=%d\n",
                        client);
                return -1;
            }
        } else
            do_use_fd(events[n].data.fd);
    }
}
Brief introduction to kqueue and /dev/poll
Kqueue is FreeBSD's darling. It is actually a feature-rich kernel event queue: not just an upgrade of select/poll, it can also handle several other kinds of events, such as signals, directory-structure changes, and process events.
Kqueue can be used edge-triggered (by setting EV_CLEAR on an event) as well as level-triggered, which is the default.
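For comparison with the epoll example above, here is a minimal kqueue sketch (FreeBSD/macOS; listener is again a placeholder and error handling is omitted). Adding EV_CLEAR to the flags in EV_SET would request edge-triggered behavior.

#include <sys/types.h>
#include <sys/event.h>
#include <sys/time.h>

/* Register interest in the listening socket, then wait for events. */
void kqueue_loop(int listener)
{
    int kq = kqueue();
    struct kevent change;
    EV_SET(&change, listener, EVFILT_READ, EV_ADD, 0, 0, NULL);
    kevent(kq, &change, 1, NULL, 0, NULL);              /* apply the change only */

    struct kevent events[64];
    for (;;) {
        int n = kevent(kq, NULL, 0, events, 64, NULL);  /* block for events */
        for (int i = 0; i < n; i++) {
            int fd = (int)events[i].ident;              /* the ready descriptor */
            (void)fd; /* accept() on the listener, read() on client sockets, etc. */
        }
    }
}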
/dev/poll is the Solaris solution and the earliest of this family of high-performance APIs. The kernel provides a special device file, /dev/poll; the application opens it to obtain a handle to the interest set, modifies the set by writing pollfd structures to that handle, and uses a special ioctl call in place of select. Because it appeared so early, the /dev/poll interface now looks rather clumsy.
C++ development: ACE 5.5 and later provide ACE_Dev_Poll_Reactor, which encapsulates the epoll and /dev/poll APIs; you must define ACE_HAS_EPOLL and ACE_HAS_DEV_POLL in config.h to enable them. Java development: the Selector in JDK 1.6 supports epoll, and JDK 1.4 supports /dev/poll; you only need to choose a sufficiently recent JDK.
Asynchronous I/O and Windows
Asynchronous I/O offers a different approach from the classic model. Unlike traditional synchronous I/O, asynchronous I/O lets a process initiate many I/O operations without blocking or waiting for any of them to complete; the process retrieves the result of an I/O operation later, or when it is notified that the operation has completed.
The asynchronous non-blocking I/O model overlaps I/O with processing. A read request returns immediately, indicating only that it was successfully initiated; while the read completes in the background, the application does other work. When the read completes, a signal is delivered or a thread-based callback is run to finish the I/O processing. The typical flow of the asynchronous I/O model:
[Figure: typical flow of the asynchronous non-blocking I/O model]
For file operations, AIO has a further benefit: when the application submits many scattered disk requests to the operating system concurrently, the OS gets the opportunity to merge and reorder those requests, which is impossible with synchronous calls unless one thread is created per request. Linux kernel 2.6 provides limited AIO support (file systems only); libc can simulate socket AIO with threads, but that brings no performance benefit. In general, AIO on Linux is not yet mature, whereas Windows supports AIO well, with completion port queues (IOCP), completion callbacks, and even user-level asynchronous procedure calls (APC). On Windows, AIO is the only high-performance option available; see MSDN for details.
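As an illustration of the submit-now, collect-later model, here is a minimal POSIX AIO sketch that reads a file (the file name is a placeholder; on Linux this goes through glibc's thread-based implementation and needs -lrt when linking):

#include <aio.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
    char buf[4096];
    int fd = open("data.bin", O_RDONLY);        /* placeholder file name */
    if (fd < 0)
        return 1;

    struct aiocb cb;
    memset(&cb, 0, sizeof(cb));
    cb.aio_fildes = fd;
    cb.aio_buf    = buf;
    cb.aio_nbytes = sizeof(buf);
    cb.aio_offset = 0;

    aio_read(&cb);                              /* returns immediately */

    while (aio_error(&cb) == EINPROGRESS) {
        /* the application is free to do other work here */
    }
    ssize_t n = aio_return(&cb);                /* result of the completed read */
    printf("read %zd bytes\n", n);
    close(fd);
    return 0;
}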

See http://www.kegel.com/c10k.html (The C10K problem) and http://www.yuanma.org/data/2007/1203/article_2906.htm (C10k problems in Network Programming) for details.
