The c10k problem
This is an old, well-known problem for server applications: a single server should support 10K concurrent connections, many of which may be kept alive.
There are two main approaches to solving it: allocate a separate process/thread to handle each connection, or have each process/thread handle several connections at once.
Each process/thread processes a connection
This is the most direct approach. However, each process/thread consumes considerable system resources, and managing a large number of them burdens the system, so this scheme scales poorly. It is infeasible when server resources are scarce, and inefficient even when they are plentiful.
- Problem: excessive resource consumption and poor scalability.
Each process/thread handles multiple connections

Traditional approach
The simplest way is to loop over the connections, handling each socket in turn, which works when every socket has data ready.
But when the application reads from a socket that is not yet ready, the entire application blocks there waiting on that file handle, and other file handles that are already ready cannot be processed.
- Idea: loop over the connections directly.
- Problem: one unready file handle can block the entire application.
select
To solve this blocking problem, the idea is simple: before reading from a file handle, check its state, process it if it is ready, and skip it if it is not.
That is the select scheme. An fd_set structure tells the kernel to monitor multiple file handles at once; the call returns when some handle's state changes as requested (for example, a handle becomes readable) or a timeout expires. The application can then use FD_ISSET to check, one by one, which file handles changed state.
For small numbers of connections this works fine, but when the number of connections is large (many file handles), checking the status one by one is slow. select therefore imposes a cap on the number of managed handles (FD_SETSIZE). Also, because a single field records both the requested and the returned events, the fd_set structures must be reinitialized before each call.
int select(int nfds, fd_set *readfds, fd_set *writefds, fd_set *exceptfds, struct timeval *timeout);
- Idea: check handle states first, then process only the ready ones.
- Problem: a cap on the number of handles, repeated initialization, and one-by-one checking of every file handle's state are inefficient.
poll
poll mainly solves the first two issues of select: passing an array of pollfd structures to the kernel removes the upper limit on the number of file handles, and using separate fields for the requested events and the returned events avoids repeated initialization.
int poll(struct pollfd *fds, nfds_t nfds, int timeout);
- Idea: design a new data structure for more efficient use.
- Problem: checking all file handles one by one is still inefficient.
epoll
Since checking all file handles one by one is inefficient, the natural fix is for the call, on return, to hand the application only the file handles whose state has changed (most likely because data is ready), so no per-handle check is needed at all.
epoll adopts this design and targets large-scale scenarios.
Experiments show that once the number of file handles exceeds 10, epoll outperforms select and poll, and by 10K file handles epoll is ahead of both by two orders of magnitude.
int epoll_wait(int epfd, struct epoll_event *events, int maxevents, int timeout);
- Idea: return only the file handles whose state has changed.
- Problem: tied to a specific platform (Linux).
libevent
libevent is a cross-platform library: it wraps the underlying platform calls behind a unified API and automatically chooses the appropriate mechanism on each platform.
From c10k to c10m
As technology evolved, epoll became able to handle the c10k problem well, but to scale further, say to 10M concurrent connections, the existing techniques fall short.
So, where is the new bottleneck?
The evolution above shows that the fundamental idea is to avoid blocking effectively, so that the CPU can spend its time on the core task.
With many connections, you first need many processes/threads to do the work. Many of them may sit in the ready state at once, requiring the system to switch between them rapidly, and context switching has a cost. Although the Linux scheduler is very efficient, it is still inadequate at a scale like 10M.
So there are two bottlenecks: the process/thread is too heavy as a unit of processing, and system scheduling costs too much.
Naturally, if there were a more lightweight unit of execution, and switching between such units could be done very quickly (preferably without locks), that would be ideal.
Such techniques now have implementations in several languages: coroutines, or cooperative routines. The coroutine models in Python and Lua and the goroutine model in the Go language are similar concepts; in fact, many languages (even C) can implement similar models.
They run many tasks on a small set of threads: when one task blocks, the same thread can continue running other tasks, avoiding a great deal of context switching. The system resource each coroutine monopolizes is often just its stack. Moreover, switching between coroutines is usually specified explicitly by user code (similar to callbacks), with no kernel involvement, which makes asynchronous processing easy to implement.
References
- http://www.ulduzsoft.com/2014/01/select-poll-epoll-practical-difference-for-system-architects/
Reprinted from: http://blog.csdn.net/yeasy/article/details/43152115