Linux Network Programming [5]: non-blocking communication with epoll

Document directory
  • PPC
  • TPC
  • Select
  • Poll
  • Epoll

Epoll Introduction

epoll was introduced in the Linux 2.6 kernel. It replaces the earlier select/poll model and can fully support large-scale concurrent network programs on Linux.

Comparison between epoll and other Linux concurrency models

PPC

The typical Apache model, Process Per Connection: a new process is created for each new connection to handle it.
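As a reference point, here is a minimal sketch (not from the original article) of the PPC pattern: an echo loop that forks one process per accepted connection. The listening descriptor listen_fd is assumed to be already created, bound, and listening.

#include <signal.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

void ppc_loop(int listen_fd)
{
    signal(SIGCHLD, SIG_IGN);              /* auto-reap exited children */
    for (;;) {
        int conn = accept(listen_fd, NULL, NULL);
        if (conn < 0)
            continue;
        if (fork() == 0) {                 /* one whole process per connection */
            close(listen_fd);
            char buf[128];
            ssize_t n;
            while ((n = recv(conn, buf, sizeof(buf), 0)) > 0)
                send(conn, buf, n, 0);     /* trivial echo */
            close(conn);
            _exit(0);
        }
        close(conn);                       /* parent keeps only listen_fd */
    }
}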

TPC

Thread Per Connection: a new thread is created for each new connection to handle it. Obviously, the overhead of both PPC and TPC is large, especially when the number of connections is high.
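For comparison, a minimal sketch of the TPC pattern under the same assumptions (listen_fd already listening; handle_conn is an illustrative name): each connection gets a thread, and with it a stack and a scheduler entry, which is exactly the overhead that grows with the connection count.

#include <pthread.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

static void *handle_conn(void *arg)
{
    int fd = (int)(long)arg;               /* fd smuggled through the void* */
    char buf[128];
    ssize_t n;
    while ((n = recv(fd, buf, sizeof(buf), 0)) > 0)
        send(fd, buf, n, 0);               /* trivial echo */
    close(fd);
    return NULL;
}

void tpc_loop(int listen_fd)
{
    for (;;) {
        int conn = accept(listen_fd, NULL, NULL);
        if (conn < 0)
            continue;
        pthread_t tid;                     /* one stack + scheduler entry per connection */
        pthread_create(&tid, NULL, handle_conn, (void *)(long)conn);
        pthread_detach(tid);
    }
}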

Select

1. Maximum concurrency limit: the set of FDs (file descriptors) that one select() call can watch is capped by FD_SETSIZE, whose default is 1024 or 2048 depending on the platform, so the maximum concurrency of the select model is limited accordingly. Modify FD_SETSIZE yourself? The idea sounds good, but first take a look at the following...

2. Efficiency problems: every select call linearly scans the entire FD set, so efficiency declines linearly as the set grows. The consequence of raising FD_SETSIZE is that every call slows down for everyone. What? They all timed out?!

3. Kernel/user-space memory copying: how should the kernel notify user space of FD events? select answers this with memory copies, copying the FD sets between kernel and user space on every call.
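A minimal sketch of the select pattern these three points criticize, assuming an array fds of nfds watched descriptors and a hypothetical handle_read handler: the whole set is rebuilt, copied to the kernel, and rescanned on every iteration.

#include <sys/select.h>

void select_loop(int *fds, int nfds)
{
    for (;;) {
        fd_set rset;
        FD_ZERO(&rset);
        int maxfd = 0;
        for (int i = 0; i < nfds; i++) {      /* linear pass #1: rebuild the set */
            FD_SET(fds[i], &rset);            /* every fds[i] must be < FD_SETSIZE */
            if (fds[i] > maxfd)
                maxfd = fds[i];
        }
        /* the whole set is copied into the kernel on every call */
        if (select(maxfd + 1, &rset, NULL, NULL, NULL) <= 0)
            continue;
        for (int i = 0; i < nfds; i++)        /* linear pass #2: find the ready ones */
            if (FD_ISSET(fds[i], &rset)) {
                /* handle_read(fds[i]); -- hypothetical handler */
            }
    }
}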

Poll

Basically, poll's efficiency is the same as select's. It removes select's drawback 1 (it supports a process watching a large number of socket descriptors, since the pollfd array has no fixed size limit), but drawbacks 2 and 3 remain.
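A minimal sketch under the same assumptions as above (fds/nfds given, handle_read hypothetical): the pollfd array may be any size, but the kernel still receives and scans the whole array on every call.

#include <poll.h>

void poll_loop(struct pollfd *fds, nfds_t nfds)
{
    for (nfds_t i = 0; i < nfds; i++)
        fds[i].events = POLLIN;               /* interested in readability */
    for (;;) {
        /* the whole array is still passed to the kernel on every call */
        if (poll(fds, nfds, -1) <= 0)
            continue;
        for (nfds_t i = 0; i < nfds; i++)     /* still a full linear scan */
            if (fds[i].revents & POLLIN) {
                /* handle_read(fds[i].fd); -- hypothetical handler */
            }
    }
}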

Epoll

1. What is most intolerable about select is that the number of FDs a single process can watch is limited by FD_SETSIZE, which defaults to 2048. For an IM server that needs to support tens of thousands of connections, this is clearly too few. You can choose to modify this macro and recompile, but the material also points out that this brings a decline in network efficiency. Alternatively, you can choose a multi-process solution (the traditional Apache solution); however, although the cost of creating a process on Linux is relatively small, it still cannot be ignored, and data synchronization between processes is far less efficient than synchronization between threads, so this is not a perfect solution either. epoll has no such limit: the FD ceiling it supports is the maximum number of files that can be opened, which is generally much larger than select's 2048. On a machine with 1 GB of memory it is about 100,000; you can check the exact number with cat /proc/sys/fs/file-max. In general this number depends heavily on system memory. A sketch of inspecting and raising the per-process limit follows.
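A minimal sketch (not from the original article, and a per-process view rather than the system-wide file-max) showing how a process can inspect and raise its own open-file ceiling, which is what bounds epoll rather than FD_SETSIZE:

#include <stdio.h>
#include <sys/resource.h>

int main(void)
{
    struct rlimit rl;
    if (getrlimit(RLIMIT_NOFILE, &rl) == 0)
        printf("soft limit=%ld hard limit=%ld\n",
               (long)rl.rlim_cur, (long)rl.rlim_max);
    rl.rlim_cur = rl.rlim_max;                /* raise the soft limit to the hard cap */
    if (setrlimit(RLIMIT_NOFILE, &rl) != 0)
        perror("setrlimit");
    return 0;
}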

2. IO efficiency does not decrease linearly as the number of FD increases

Another fatal weakness of traditional select/poll is that when you hold a large set of sockets, network latency means only some of the sockets are "active" at any given moment, yet every select/poll call linearly scans the whole set, so efficiency declines linearly. epoll does not have this problem: it only operates on the "active" sockets, because the kernel implementation of epoll hangs a callback function on each fd. Only an "active" socket will invoke its callback; sockets in the idle state will not. In this respect epoll implements a "pseudo" AIO, with the driving force in the OS kernel. In some benchmarks, if basically all the sockets are active, as in a high-speed LAN environment, epoll is no more efficient than select/poll; on the contrary, if epoll_ctl is called too often, efficiency even drops slightly. But once idle connections are used to simulate a WAN environment, epoll's efficiency is far higher than that of select/poll.

3. Use mmap to accelerate message passing between kernel and user space
This touches on epoll's concrete implementation. select, poll, and epoll all need the kernel to notify user space of FD events, so avoiding unnecessary memory copies is important; epoll does this by having user space mmap the same memory as the kernel. If, like the author, you have followed epoll since the 2.5 development kernels, you will not have forgotten the manual mmap step that was required back then.

4. Kernel fine-tuning
This is not really an advantage of epoll but of the Linux platform as a whole. You may have your doubts about Linux, but you cannot deny that the platform gives you the ability to fine-tune the kernel. For example, the kernel TCP/IP stack manages sk_buff structures with a memory pool, and you can adjust the size of this pool (skb_head_pool) at run time with echo XXXX > /proc/sys/net/core/hot_list_length. Likewise, the second parameter of listen() (the length of the queue of connections that have completed the TCP three-way handshake) can be dynamically tuned to match your platform's memory. On a special-purpose system that handles a huge number of packets where each packet itself is small, you can even try the latest NAPI NIC driver architecture.

Epoll in Detail

Data Structures

All the functions used by epoll are declared in the sys/epoll.h header file. The data structures and functions involved are briefly described below.
The data structures used:
typedef union epoll_data {
    void *ptr;
    int fd;
    __uint32_t u32;
    __uint64_t u64;
} epoll_data_t;

struct epoll_event {
    __uint32_t events;   /* epoll events */
    epoll_data_t data;   /* user data variable */
};
The epoll_event struct is used both to register the events you are interested in and to return the events to be handled, while the epoll_data union stores data associated with the file descriptor that triggered an event. For example, when a client connects to the server, the server can obtain the client's socket file descriptor from accept() and store it in the fd field of epoll_data, so that subsequent read/write operations can find that descriptor. The events field of epoll_event indicates the events of interest (on registration) and the triggered events (on return). Its possible values are as follows:
EPOLLIN: the corresponding file descriptor is readable;
EPOLLOUT: the corresponding file descriptor is writable;
EPOLLPRI: the corresponding file descriptor has urgent data to read;
EPOLLERR: an error occurred on the corresponding file descriptor;
EPOLLHUP: the corresponding file descriptor was hung up;
EPOLLET: requests edge-triggered notification for the corresponding file descriptor (the default is level-triggered).

Function

1. The epoll_create function
Function declaration: int epoll_create(int size)
This function creates a file descriptor dedicated to epoll. The size parameter is a hint for how many descriptors the caller expects to monitor (since Linux 2.6.8 the value is ignored, but it must be greater than zero).

2. The epoll_ctl function
Function declaration: int epoll_ctl(int epfd, int op, int fd, struct epoll_event *event)
This function controls the events registered on a file descriptor: you can register, modify, and delete events.
Parameters:
epfd: the epoll-specific file descriptor returned by epoll_create;
op: the operation to perform, one of EPOLL_CTL_ADD (register), EPOLL_CTL_MOD (modify), or EPOLL_CTL_DEL (delete);
fd: the file descriptor to operate on;
event: a pointer to an epoll_event structure;
Returns 0 on success and -1 on failure.
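A minimal sketch of registering a read event, assuming an epoll instance epfd and a socket descriptor sockfd already exist (watch_read is an illustrative name):

#include <stdio.h>
#include <sys/epoll.h>

/* register sockfd for read notifications on the epoll instance epfd */
int watch_read(int epfd, int sockfd)
{
    struct epoll_event ev;
    ev.events = EPOLLIN;                      /* interested in readability */
    ev.data.fd = sockfd;                      /* stored in the epoll_data union */
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, sockfd, &ev) == -1) {
        perror("epoll_ctl");
        return -1;
    }
    return 0;
}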

3. The epoll_wait function
Function declaration: int epoll_wait(int epfd, struct epoll_event *events, int maxevents, int timeout)
This function waits for I/O events to occur.
Parameters:
epfd: the epoll-specific file descriptor returned by epoll_create;
events: the array used to return the triggered events;
maxevents: the maximum number of events to return at a time;
timeout: the timeout in milliseconds to wait for an I/O event (-1 blocks indefinitely, 0 returns immediately);
Returns the number of triggered events.
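A minimal sketch of one epoll_wait round, again assuming epfd exists; handle_read/handle_write are hypothetical handlers. Note that, unlike select/poll, only ready descriptors come back, so there is no scan over idle ones.

#include <stdio.h>
#include <sys/epoll.h>

#define MAXEVENTS 64

/* drain one batch of ready events from the epoll instance epfd */
void wait_once(int epfd)
{
    struct epoll_event events[MAXEVENTS];
    int n = epoll_wait(epfd, events, MAXEVENTS, 1000);  /* 1000 ms timeout */
    if (n == -1) {
        perror("epoll_wait");
        return;
    }
    for (int i = 0; i < n; i++) {             /* only ready descriptors come back */
        if (events[i].events & EPOLLIN) {
            /* handle_read(events[i].data.fd); -- hypothetical handler */
        }
        if (events[i].events & EPOLLOUT) {
            /* handle_write(events[i].data.fd); -- hypothetical handler */
        }
    }
}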

Procedure

1. Use epoll_create() to create an epoll file descriptor, sized for the maximum number of socket descriptors to be managed.

2. Create the receiving thread(s) associated with the epoll instance. The application can create several receiving threads to handle read notification events from epoll; how many depends on the program's needs.

3. Create a listening socket descriptor listensock, set it to non-blocking mode, and call listen() to wait for new connection requests on it. Set the event type EPOLLIN in an epoll_event structure, register the event with epoll_ctl(), and finally start the network monitoring thread.

4. The network monitoring thread starts its loop, calling epoll_wait() to wait for epoll events.

5. If an epoll event indicates a new connection request, call accept() to obtain the user socket descriptor, store it in the epoll_data union, set the descriptor to non-blocking, and register it in an epoll_event structure with read and write as the event types to handle.

6. If an epoll event indicates that a socket descriptor has readable data, add the descriptor to the readable queue and notify the receiving thread to read the data into the received-data list. After logic processing, the response packet is placed on the send-data list, waiting for the sending thread to send it.

Example

//
// a simple echo server using epoll in linux
//
// 2009-11-05
// by sparkling
//
#include <sys/socket.h>
#include <sys/epoll.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <fcntl.h>
#include <unistd.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <strings.h>
#include <time.h>
#include <errno.h>
#include <iostream>
using namespace std;

#define MAX_EVENTS 500

struct myevent_s
{
    int fd;
    void (*call_back)(int fd, int events, void *arg);
    int events;
    void *arg;
    int status;       // 1: in epoll wait list, 0: not in
    char buff[128];   // recv data buffer
    int len;
    long last_active; // last active time
};

// set event
void EventSet(myevent_s *ev, int fd, void (*call_back)(int, int, void*), void *arg)
{
    ev->fd = fd;
    ev->call_back = call_back;
    ev->events = 0;
    ev->arg = arg;
    ev->status = 0;
    ev->last_active = time(NULL);
}

// add/mod an event to epoll
void EventAdd(int epollFd, int events, myevent_s *ev)
{
    struct epoll_event epv = {0, {0}};
    int op;
    epv.data.ptr = ev;
    epv.events = ev->events = events;
    if(ev->status == 1){
        op = EPOLL_CTL_MOD;
    }
    else{
        op = EPOLL_CTL_ADD;
        ev->status = 1;
    }
    if(epoll_ctl(epollFd, op, ev->fd, &epv) < 0)
        printf("Event Add failed[fd=%d]\n", ev->fd);
    else
        printf("Event Add OK[fd=%d]\n", ev->fd);
}

// delete an event from epoll
void EventDel(int epollFd, myevent_s *ev)
{
    struct epoll_event epv = {0, {0}};
    if(ev->status != 1) return;
    epv.data.ptr = ev;
    ev->status = 0;
    epoll_ctl(epollFd, EPOLL_CTL_DEL, ev->fd, &epv);
}

int g_epollFd;
myevent_s g_Events[MAX_EVENTS+1]; // g_Events[MAX_EVENTS] is used by the listen fd

void RecvData(int fd, int events, void *arg);
void SendData(int fd, int events, void *arg);

// accept new connections from clients
void AcceptConn(int fd, int events, void *arg)
{
    struct sockaddr_in sin;
    socklen_t len = sizeof(struct sockaddr_in);
    int nfd, i;
    // accept
    if((nfd = accept(fd, (struct sockaddr*)&sin, &len)) == -1)
    {
        if(errno != EAGAIN && errno != EINTR)
        {
            printf("%s: bad accept\n", __func__);
        }
        return;
    }
    do
    {
        // find a free slot in the event table
        for(i = 0; i < MAX_EVENTS; i++)
        {
            if(g_Events[i].status == 0)
            {
                break;
            }
        }
        if(i == MAX_EVENTS)
        {
            printf("%s: max connection limit[%d].\n", __func__, MAX_EVENTS);
            break;
        }
        // set nonblocking
        if(fcntl(nfd, F_SETFL, O_NONBLOCK) < 0) break;
        // add a read event for receiving data
        EventSet(&g_Events[i], nfd, RecvData, &g_Events[i]);
        EventAdd(g_epollFd, EPOLLIN|EPOLLET, &g_Events[i]);
        printf("new conn[%s:%d][time:%ld]\n", inet_ntoa(sin.sin_addr),
               ntohs(sin.sin_port), g_Events[i].last_active);
    }while(0);
}

// receive data
void RecvData(int fd, int events, void *arg)
{
    struct myevent_s *ev = (struct myevent_s*)arg;
    int len;
    // receive data
    len = recv(fd, ev->buff, sizeof(ev->buff)-1, 0);
    EventDel(g_epollFd, ev);
    if(len > 0)
    {
        ev->len = len;
        ev->buff[len] = '\0';
        printf("C[%d]:%s\n", fd, ev->buff);
        // change to a send event
        EventSet(ev, fd, SendData, ev);
        EventAdd(g_epollFd, EPOLLOUT|EPOLLET, ev);
    }
    else if(len == 0)
    {
        close(ev->fd);
        printf("[fd=%d] closed gracefully.\n", fd);
    }
    else
    {
        close(ev->fd);
        printf("recv[fd=%d] error[%d]:%s\n", fd, errno, strerror(errno));
    }
}

// send data
void SendData(int fd, int events, void *arg)
{
    struct myevent_s *ev = (struct myevent_s*)arg;
    int len;
    // send data
    len = send(fd, ev->buff, ev->len, 0);
    ev->len = 0;
    EventDel(g_epollFd, ev);
    if(len > 0)
    {
        // change back to a receive event
        EventSet(ev, fd, RecvData, ev);
        EventAdd(g_epollFd, EPOLLIN|EPOLLET, ev);
    }
    else
    {
        close(ev->fd);
        printf("send[fd=%d] error[%d]\n", fd, errno);
    }
}

void InitListenSocket(int epollFd, short port)
{
    int listenFd = socket(AF_INET, SOCK_STREAM, 0);
    fcntl(listenFd, F_SETFL, O_NONBLOCK); // set non-blocking
    printf("server listen fd=%d\n", listenFd);
    EventSet(&g_Events[MAX_EVENTS], listenFd, AcceptConn, &g_Events[MAX_EVENTS]);
    // add the listen socket
    EventAdd(epollFd, EPOLLIN|EPOLLET, &g_Events[MAX_EVENTS]);
    // bind & listen
    sockaddr_in sin;
    bzero(&sin, sizeof(sin));
    sin.sin_family = AF_INET;
    sin.sin_addr.s_addr = INADDR_ANY;
    sin.sin_port = htons(port);
    bind(listenFd, (const sockaddr*)&sin, sizeof(sin));
    listen(listenFd, 5);
}

int main(int argc, char **argv)
{
    short port = 12345; // default port
    if(argc == 2){
        port = atoi(argv[1]);
    }
    // create the epoll instance
    g_epollFd = epoll_create(MAX_EVENTS);
    if(g_epollFd <= 0) printf("create epoll failed.%d\n", g_epollFd);
    // create & bind the listen socket, set it non-blocking, and add it to epoll
    InitListenSocket(g_epollFd, port);
    // event loop
    struct epoll_event events[MAX_EVENTS];
    printf("server running:port[%d]\n", port);
    int checkPos = 0;
    while(1){
        // a simple timeout check, 100 fds at a time; a min-heap with timer events would be better
        long now = time(NULL);
        for(int i = 0; i < 100; i++, checkPos++) // doesn't check the listen fd
        {
            if(checkPos == MAX_EVENTS) checkPos = 0; // recycle
            if(g_Events[checkPos].status != 1) continue;
            long duration = now - g_Events[checkPos].last_active;
            if(duration >= 60) // 60s timeout
            {
                close(g_Events[checkPos].fd);
                printf("[fd=%d] timeout[%ld--%ld].\n", g_Events[checkPos].fd,
                       g_Events[checkPos].last_active, now);
                EventDel(g_epollFd, &g_Events[checkPos]);
            }
        }
        // wait for events to happen
        int fds = epoll_wait(g_epollFd, events, MAX_EVENTS, 1000);
        if(fds < 0){
            printf("epoll_wait error, exit\n");
            break;
        }
        for(int i = 0; i < fds; i++){
            myevent_s *ev = (struct myevent_s*)events[i].data.ptr;
            if((events[i].events&EPOLLIN)&&(ev->events&EPOLLIN)) // read event
            {
                ev->call_back(ev->fd, events[i].events, ev->arg);
            }
            if((events[i].events&EPOLLOUT)&&(ev->events&EPOLLOUT)) // write event
            {
                ev->call_back(ev->fd, events[i].events, ev->arg);
            }
        }
    }
    // free resources
    return 0;
}
