Analysis of Python select, poll, and epoll

Source: Internet
Author: User
This article describes the differences between select, poll, and epoll by analyzing how each of them is used from Python.

Select

select is the oldest of the three interfaces. It monitors arrays of file descriptors through the select() system call (on Linux everything is a file: block devices, socket connections, and so on). When select() returns, the kernel has marked the ready file descriptors in those arrays, so the process can pick them out and perform the subsequent read/write operations. For network I/O this means that every established connection is represented by a file descriptor; select() keeps watching the descriptors it was given, and once one of them changes to the ready state the process can operate on it.

SocketServer handles multiple requests with multiple threads: each connection is assigned its own thread. select, by contrast, works inside a single process. The code executed by one process is serial and a process has only one main thread, so the goal here is to get the effect of concurrency out of a single thread.

Why use a single process to achieve concurrency instead of multiple threads?

A: Because handling many connections in one process is usually cheaper than handling them with many threads. Starting threads has a noticeable overhead, and the CPU has to keep checking the status of every thread to decide which one can run, which also puts pressure on the system. A single process avoids both the overhead and the pressure.

How can a single process achieve concurrency?

A: By using the producer/consumer pattern asynchronously, so that producer and consumer do not block each other, and by letting the socket server receive multiple connections through select.

A plain socket server can serve only one connection at a time. It blocks new connections because it must first finish talking to the current client, and the two sides wait for each other: the client sends a message and waits for the reply, the server receives it and then waits for the next message. While that connection is open, another client can connect only after the previous one disconnects. In other words, a basic socket server is blocking. Creating a thread for each connection removes the blocking, but when the number of threads grows too large the overhead and the pressure on the CPU become high.

For a single socket, most of the blocking time is spent waiting for I/O (network operations are I/O operations too). To avoid this waiting, each client that connects registers a file handle on the server, and the server continuously polls the list of these handles. The main process establishes the connection itself, without starting a thread. If it then simply talked to that one client, no other client could connect. So that the main process can both exchange messages with connected clients and accept new ones, it polls very quickly, in an endless loop, and refreshes the list of file handles on every pass.

Once a client sends a message, the server reads it and places the reply in a second list of messages waiting to be returned. After that list has been flushed to the client, this round of communication is complete. The connection is not closed; the server simply moves on to the next polling round.
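The idea of one thread serving several connections can be sketched as follows. This is a minimal illustration in Python 3, not the article's server: socket.socketpair() stands in for real client connections, and the helper name echo_ready is hypothetical.

```python
import select
import socket


def echo_ready(server_ends, timeout=1.0):
    """Wait once with select() and answer every socket that has data.

    Returns the list of sockets that select() reported as readable.
    """
    readable, _, _ = select.select(server_ends, [], [], timeout)
    for sock in readable:
        data = sock.recv(1024)
        if data:
            sock.sendall(data.upper())
    return readable


# Three connected pairs stand in for three client connections (assumption).
pairs = [socket.socketpair() for _ in range(3)]
clients = [c for c, _ in pairs]
server_ends = [s for _, s in pairs]

# Only clients 0 and 2 send something; client 1 stays silent.
clients[0].sendall(b"hello")
clients[2].sendall(b"world")

# A single call, in a single thread, serves both active connections.
served = echo_ready(server_ends)
replies = [clients[0].recv(1024), clients[2].recv(1024)]
print(len(served), replies)

for c, s in pairs:
    c.close()
    s.close()
```

The silent connection costs nothing here: select() only hands back the sockets that actually have data, so the single thread never blocks on an idle client.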

Advantages of select

select is supported on almost all platforms, so its main strength is portability.

Disadvantages of select

Each time select is called, the fd set has to be copied from user space to kernel space. This overhead becomes significant when there are many fds.

There is an upper limit on the number of fds a single process can monitor. The default on Linux is 1024 (the limit comes from the FD_SETSIZE macro; raising it means changing the macro definition and recompiling).

In addition, select keeps the fds in an array and traverses the whole array linearly on every call, so this cost also grows with the number of fds.

Python select

The call is readable, writable, exceptional = select.select(rlist, wlist, xlist[, timeout]). The first three parameters are lists of waitable objects: each element is either an integer file descriptor or an object with a fileno() method that returns one.

rlist: wait until ready for reading

wlist: wait until ready for writing

xlist: wait for an "exceptional condition"

select() monitors the given file descriptors and returns the ones whose state has changed to ready.

1. Any of the three lists may be empty, but whether three empty lists are accepted depends on the platform (it is acceptable on Linux, but not allowed on Windows).

2. When descriptors in rlist become readable (a connection is ready for accept() or data is ready to read), they are returned in the readable list.

3. When descriptors in wlist become writable, they are returned in the writable list.

4. When an exceptional condition occurs on a descriptor in xlist, that descriptor is returned in the exceptional list.

5. When no timeout is given, select() blocks until at least one monitored descriptor becomes ready.

When timeout is 1, select() blocks for up to 1 second if nothing changes and then returns three empty lists. If a monitored descriptor (fd) becomes ready earlier, select() returns immediately.

6. Acceptable object types in the lists include Python file objects (such as sys.stdin, or objects returned by open() or os.popen()) and socket objects returned by socket.socket(). You can also define your own wrapper class, as long as it has an appropriate fileno() method (one that really returns a file descriptor, not just a random integer).
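The notes above can be tried out with a small sketch. It assumes Python 3 on a Unix-like system (on Windows select() only accepts sockets), and the wrapper class PipeReader is a hypothetical name used only for illustration.

```python
import os
import select


class PipeReader:
    """Hypothetical wrapper: any object with a fileno() method can be passed to select()."""

    def __init__(self, fd):
        self._fd = fd

    def fileno(self):
        # Must return a real, open file descriptor.
        return self._fd


read_fd, write_fd = os.pipe()
reader = PipeReader(read_fd)

# Nothing has been written yet: select() waits for the timeout (in seconds)
# and then returns three empty lists.
before = select.select([reader], [], [], 0.1)

os.write(write_fd, b"ping")

# The read end now has data and the write end has buffer space,
# so select() returns without waiting for the full timeout.
readable, writable, exceptional = select.select([reader], [write_fd], [reader], 1.0)
print(before, readable == [reader], writable == [write_fd], exceptional)

os.close(read_fd)
os.close(write_fd)
```

Note that select() returns the same objects that were passed in (the wrapper instance and the plain integer), not new descriptor numbers.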

Select example:

Python's select() function is a direct interface to the operating system's implementation. It monitors sockets, open files, and pipes (anything with a fileno() method that returns a valid file descriptor) until they become readable or writable, or until a communication error occurs. select() makes it easier to monitor multiple connections at the same time, and it is more efficient than writing a polling loop in Python, because the monitoring is done by the operating system rather than by the Python interpreter.

Server:

# coding: utf8
# Python 2 syntax (print >>sys.stderr, Queue module)
import select
import socket
import sys
import Queue

# Create a TCP/IP socket
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.setblocking(0)

# Address and port to listen on
server_address = ('localhost', 10000)
print >>sys.stderr, 'starting up on %s port %s' % server_address
server.bind(server_address)

# Maximum number of queued connections
server.listen(5)

inputs = [server]
outputs = []
message_queues = {}

while inputs:
    print >>sys.stderr, '\nwaiting for the next event'
    readable, writable, exceptional = select.select(inputs, outputs, inputs)

    # Handle inputs
    for s in readable:
        if s is server:
            # A "readable" server socket is ready to accept a connection
            connection, client_address = s.accept()
            print >>sys.stderr, 'new connection from', client_address
            connection.setblocking(0)
            inputs.append(connection)
            # Give the connection a queue for data we want to send
            message_queues[connection] = Queue.Queue()
        else:
            data = s.recv(1024)
            if data:
                # A readable client socket has data
                print >>sys.stderr, 'received "%s" from %s' % (data, s.getpeername())
                message_queues[s].put(data)  # here s is a client connection
                # Add output channel for response
                if s not in outputs:
                    outputs.append(s)
            else:
                # Interpret empty result as closed connection
                print >>sys.stderr, 'closing', client_address, 'after reading no data'
                # The client is gone, so no reply is needed:
                # drop the connection from outputs if it is still there
                if s in outputs:
                    outputs.remove(s)
                # Stop listening for input on the connection
                inputs.remove(s)
                s.close()
                # Remove message queue
                del message_queues[s]

    # Handle outputs
    for s in writable:
        try:
            next_msg = message_queues[s].get_nowait()
        except Queue.Empty:
            # No messages waiting so stop checking for writability
            print >>sys.stderr, 'output queue for', s.getpeername(), 'is empty'
            outputs.remove(s)
        else:
            print >>sys.stderr, 'sending "%s" to %s' % (next_msg, s.getpeername())
            s.send(next_msg.upper())

    # Handle "exceptional conditions"
    for s in exceptional:
        print >>sys.stderr, 'handling exceptional condition for', s.getpeername()
        # Stop listening for input on the connection
        inputs.remove(s)
        if s in outputs:
            outputs.remove(s)
        s.close()
        # Remove message queue
        del message_queues[s]

Code parsing:

select() takes three lists to monitor. The first contains the objects to check for incoming data, the second contains the objects that will receive outgoing data once there is room in their buffer, and the third contains the objects to check for errors. The next step is therefore to create the two lists of input and output sources that are passed to select().

# Sockets from which we expect to read
inputs = [server]

# Sockets to which we expect to write
outputs = []

Connections are added to and removed from these lists by the server's main loop. This version of the server waits until a connection is writable before sending any data (instead of replying immediately after receiving it), so each connection needs a queue that buffers the data to be sent through it.

# Outgoing message queues (socket: Queue)
message_queues = {}

The main portion of the server program loops, calling select() to block and wait for network activity, that is, until new connections or new data come in.

while inputs:
    # Wait for at least one of the sockets to be ready for processing
    print >>sys.stderr, '\nwaiting for the next event'
    readable, writable, exceptional = select.select(inputs, outputs, inputs)

After you pass inputs, outputs, and the list to check for errors (here inputs again) to select(), it returns three new lists, which we assign to readable, writable, and exceptional. Sockets in the readable list have incoming data that can be received with recv(), sockets in the writable list have free buffer space and can be written to, and sockets in the exceptional list have had an error during communication.

A socket in the readable list represents one of three possible cases. In the first case the socket is the main "server" socket, the one responsible for listening for client connections. If it appears in readable, the server is ready to accept a new connection. So that the main server can handle multiple connections at the same time, the code sets the server socket to non-blocking mode.

In the second case the socket is an established connection on which the client has sent data. The data can be read with recv() and is then placed on the queue so that it can be sent back to the client.

In the third case the client has disconnected, so recv() returns empty data, and the server can close its side of the connection.

There are fewer cases for sockets in the writable list. If there is data in the queue for the connection, the next message is taken from the queue and sent back to the client. Otherwise the connection is removed from the outputs list, so that the next select() call in the loop no longer reports it as writable.

Finally, if an error occurs while communicating with a socket, the connection object is removed from inputs, outputs, and message_queues, and the connection is closed.
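For readers on Python 3, the three readable cases and the writable handling described above can be condensed into the following sketch. It is an adaptation made for illustration, not the article's listing: the helper names serve and drop, the stop event, the 0.1 second select() timeout, and binding to port 0 (so the operating system picks a free port) are all assumptions added to make the example self-contained.

```python
import queue
import select
import socket
import threading


def drop(s, inputs, outputs, message_queues):
    """Forget a connection everywhere and close it."""
    inputs.remove(s)
    if s in outputs:
        outputs.remove(s)
    s.close()
    del message_queues[s]


def serve(server, stop):
    """Run the select() loop until `stop` is set; reply with upper-cased data."""
    inputs, outputs, message_queues = [server], [], {}
    while not stop.is_set():
        readable, writable, exceptional = select.select(inputs, outputs, inputs, 0.1)
        for s in readable:
            if s is server:
                # Case 1: the listening socket is readable, so accept() will not block.
                conn, _ = s.accept()
                conn.setblocking(False)
                inputs.append(conn)
                message_queues[conn] = queue.Queue()
                continue
            data = s.recv(1024)
            if data:
                # Case 2: an established connection sent data; queue the reply.
                message_queues[s].put(data)
                if s not in outputs:
                    outputs.append(s)
            else:
                # Case 3: an empty read means the client closed the connection.
                drop(s, inputs, outputs, message_queues)
        for s in writable:
            if s not in message_queues:
                continue  # dropped earlier in this pass
            try:
                msg = message_queues[s].get_nowait()
            except queue.Empty:
                outputs.remove(s)  # nothing left to send: stop watching for writability
            else:
                s.send(msg.upper())
        for s in exceptional:
            if s in message_queues:
                drop(s, inputs, outputs, message_queues)
    for s in inputs:
        s.close()


server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))  # port 0: let the OS choose a free port (assumption)
server.listen(5)
server.setblocking(False)
address = server.getsockname()

stop = threading.Event()
worker = threading.Thread(target=serve, args=(server, stop))
worker.start()

# Two clients talk to the single-threaded server at the same time.
clients = [socket.create_connection(address) for _ in range(2)]
for c in clients:
    c.sendall(b"in parts.")
replies = [c.recv(1024) for c in clients]
print(replies)

for c in clients:
    c.close()
stop.set()
worker.join()
```

The thread exists only so that the demo client and the server can run in one script; the server loop itself never starts a thread per connection.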

Client:

# coding: utf8
import socket
import sys

messages = ['This is the message. ',
            'It will be sent ',
            'in parts.',
            ]
# Must match the address and port the server listens on
server_address = ('localhost', 10000)

# Create two TCP/IP sockets
socks = [socket.socket(socket.AF_INET, socket.SOCK_STREAM),
         socket.socket(socket.AF_INET, socket.SOCK_STREAM),
         ]

# Connect the sockets to the port where the server is listening
print >>sys.stderr, 'connecting to %s port %s' % server_address
for s in socks:
    s.connect(server_address)

for message in messages:

    # Send messages on both sockets
    for s in socks:
        print >>sys.stderr, '%s: sending "%s"' % (s.getsockname(), message)
        s.send(message)

    # Read responses on both sockets
    for s in socks:
        data = s.recv(1024)
        print >>sys.stderr, '%s: received "%s"' % (s.getsockname(), data)
        if not data:
            print >>sys.stderr, 'closing socket', s.getsockname()
            s.close()

The client program uses two sockets to show how the select()-based server manages several connections at the same time: it sends each message to the server over both sockets in turn and then reads the responses from both.

Running result

Server:
starting up on localhost port 10000
waiting for the next event
new connection from ('127.0.0.1', 54812)
waiting for the next event
new connection from ('127.0.0.1', 54813)
received "This is the message. " from ('127.0.0.1', 54812)
waiting for the next event
received "This is the message. " from ('127.0.0.1', 54813)
sending "This is the message. " to ('127.0.0.1', 54812)
waiting for the next event
output queue for ('127.0.0.1', 54812) is empty
sending "This is the message. " to ('127.0.0.1', 54813)
waiting for the next event
output queue for ('127.0.0.1', 54813) is empty
waiting for the next event
received "It will be sent " from ('127.0.0.1', 54812)
received "It will be sent " from ('127.0.0.1', 54813)
waiting for the next event
sending "It will be sent " to ('127.0.0.1', 54812)
sending "It will be sent " to ('127.0.0.1', 54813)
waiting for the next event
output queue for ('127.0.0.1', 54812) is empty
output queue for ('127.0.0.1', 54813) is empty
waiting for the next event
received "in parts." from ('127.0.0.1', 54812)
received "in parts." from ('127.0.0.1', 54813)
waiting for the next event
sending "in parts." to ('127.0.0.1', 54812)
sending "in parts." to ('127.0.0.1', 54813)
waiting for the next event
output queue for ('127.0.0.1', 54812) is empty
output queue for ('127.0.0.1', 54813) is empty
waiting for the next event
closing ('127.0.0.1', 54813) after reading no data
closing ('127.0.0.1', 54813) after reading no data
waiting for the next event

Client:
connecting to localhost port 10000
('127.0.0.1', 54812): sending "This is the message. "
('127.0.0.1', 54813): sending "This is the message. "
('127.0.0.1', 54812): received "THIS IS THE MESSAGE. "
('127.0.0.1', 54813): received "THIS IS THE MESSAGE. "
('127.0.0.1', 54812): sending "It will be sent "
('127.0.0.1', 54813): sending "It will be sent "
('127.0.0.1', 54812): received "IT WILL BE SENT "
('127.0.0.1', 54813): received "IT WILL BE SENT "
('127.0.0.1', 54812): sending "in parts."
('127.0.0.1', 54813): sending "in parts."
('127.0.0.1', 54812): received "IN PARTS."
('127.0.0.1', 54813): received "IN PARTS."

Poll
poll was born in System V Release 3 in 1986. It is essentially no different from select, except that poll has no limit on the maximum number of file descriptors.

poll and select share the same disadvantage: the whole array of file descriptors is copied between user space and the kernel address space on every call, whether or not those descriptors are ready, so the overhead grows linearly with the number of file descriptors.

In addition, after select() or poll() has told the process that a file descriptor is ready, if the process does not perform I/O on it, the next call to select() or poll() reports the descriptor again. Readiness notifications are therefore generally not lost. This behavior is called level triggering (Level Triggered).
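Level triggering is easy to observe with a connected socket pair. The sketch below assumes Python 3 on a platform that provides select.poll() (for example Linux); socket.socketpair() is used here only as a convenient stand-in for a real connection.

```python
import select
import socket

a, b = socket.socketpair()
fd = b.fileno()

poller = select.poll()
poller.register(fd, select.POLLIN)

a.sendall(b"x")

# The data is not read between the two calls, so the fd is reported both times.
first = poller.poll(100)   # timeout in milliseconds
second = poller.poll(100)

b.recv(1024)               # consume the pending data
third = poller.poll(100)   # nothing is pending any more: times out with an empty list

print(first == second, third)

a.close()
b.close()
```

As long as unread data remains, every call keeps reporting the descriptor; the report stops only once the process has actually consumed the data.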

Calling poll in Python

select.poll() returns a poll object that supports registering and unregistering file descriptors.

poll.register(fd[, eventmask]) registers a file descriptor. After registration, the poll() method can be used to check whether an I/O event has occurred on it. fd can be an integer or an object with a fileno() method that returns an integer; file objects implement fileno(), so they can also be passed.

eventmask is an optional bitmask describing the types of events to check for. It can be a combination of the constants POLLIN, POLLPRI, and POLLOUT. If it is omitted, all three event types are checked.

Event constant    Meaning
POLLIN            There is data to read
POLLPRI           There is urgent data to read
POLLOUT           Ready for output: writing will not block
POLLERR           Error condition
POLLHUP           Hung up
POLLNVAL          Invalid request: descriptor not open

poll.modify(fd, eventmask) modifies an already registered fd. It has the same effect as poll.register(fd, eventmask). Attempting to modify an fd that was never registered raises an IOError with errno set to ENOENT.

poll.unregister(fd) removes an fd from the poll object. Attempting to unregister an fd that was never registered raises KeyError.

poll.poll([timeout]) polls the registered file descriptors and returns a possibly empty list of 2-tuples (fd, event), where fd is the file descriptor and event is the bitmask of events reported for it. An empty list means the call timed out and no file descriptor had an event. timeout is given in milliseconds; if it is set, the call waits at most that long. If timeout is omitted or None, the call blocks until an event occurs on one of the registered descriptors.
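The register/modify/unregister cycle and the two error cases described above can be checked with a short sketch. It assumes Python 3 (where IOError is an alias of OSError) on a platform with select.poll(); the variable names are illustrative only.

```python
import errno
import select
import socket

a, b = socket.socketpair()
poller = select.poll()

# Register for readability only. Objects with a fileno() method are accepted.
poller.register(a, select.POLLIN)
idle = poller.poll(50)             # 50 ms and nothing to read: empty list

# Switch the same descriptor to writability.
poller.modify(a, select.POLLOUT)
ready = poller.poll(50)            # the send buffer has room, so POLLOUT is reported

poller.unregister(a)
try:
    poller.unregister(a)           # the fd is no longer registered
    second_unregister = "ok"
except KeyError:
    second_unregister = "KeyError"

try:
    poller.modify(b, select.POLLIN)  # b was never registered
    modify_errno = None
except OSError as exc:
    modify_errno = exc.errno

print(idle, ready, second_unregister, modify_errno == errno.ENOENT)

a.close()
b.close()
```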

# coding: utf-8
import select
import socket

response = b"hello world"

serversocket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
serversocket.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
serversocket.bind(('localhost', 10000))
serversocket.listen(1)
serversocket.setblocking(0)

poll = select.poll()
poll.register(serversocket.fileno(), select.POLLIN)

connections = {}
while True:
    for fd, event in poll.poll():
        # event is a bitmask, so test it with & rather than ==
        if event & select.POLLIN:
            if fd == serversocket.fileno():
                # The listening socket is readable: accept the new connection
                con, addr = serversocket.accept()
                poll.register(con.fileno(), select.POLLIN)
                connections[con.fileno()] = con
            else:
                con = connections[fd]
                data = con.recv(1024)
                if data:
                    # Request received: wait until the socket is writable
                    poll.modify(con.fileno(), select.POLLOUT)
                else:
                    # Empty read: the client closed the connection
                    poll.unregister(fd)
                    con.close()
                    del connections[fd]
        elif event & select.POLLOUT:
            # Send the response, then stop watching and close the connection
            con = connections[fd]
            con.send(response)
            poll.unregister(con.fileno())
            con.close()
            del connections[fd]

Epoll
It was not until Linux 2.6 that the kernel directly supported such a mechanism, namely epoll. It has almost all the advantages mentioned earlier and is widely regarded as the best-performing I/O readiness notification method for multiplexed I/O on Linux 2.6.

epoll supports both level triggering and edge triggering (Edge Triggered). With edge triggering, epoll only tells the process which file descriptors have just changed to the ready state, and it tells it only once: if the process takes no action, it is not notified again. In theory edge triggering performs better, but the code is considerably more complicated to write.
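The difference between the two modes can be seen by polling twice without reading in between. This is a minimal sketch that assumes Python 3 on Linux (select.epoll is Linux-only); the helper name two_polls is hypothetical.

```python
import select
import socket


def two_polls(flags):
    """Register one end of a socket pair with `flags`, make it readable once,
    then poll twice without reading. Returns the event masks seen by each poll."""
    a, b = socket.socketpair()
    ep = select.epoll()
    ep.register(b.fileno(), flags)
    a.sendall(b"x")                                 # b changes to readable once
    first = [event for _, event in ep.poll(0.1)]    # timeout in seconds
    second = [event for _, event in ep.poll(0.1)]
    ep.close()
    a.close()
    b.close()
    return first, second


level = two_polls(select.EPOLLIN)                   # level triggering (default)
edge = two_polls(select.EPOLLIN | select.EPOLLET)   # edge triggering

print("level:", level)
print("edge: ", edge)
```

In level-triggered mode both polls report the descriptor because the byte is still unread. In edge-triggered mode only the first poll reports it, which is why edge-triggered code has to keep reading until the call would block before it polls again.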

epoll also reports only the file descriptors that are ready. When we call epoll_wait(), the return value is not a descriptor but the number of ready descriptors; the process then reads that many entries, in order, from the array it handed to epoll_wait(). Because the set of monitored descriptors is registered once and kept in the kernel, it does not have to be copied from user space on every call as with select and poll; only the ready entries are passed back.

Another essential improvement is that epoll uses event-based readiness notification. With select/poll, the kernel scans all monitored file descriptors only when the call is made. With epoll, a file descriptor is registered in advance through epoll_ctl(). Once that descriptor becomes ready, the kernel uses a callback mechanism to mark it as ready right away, and the process is notified when it calls epoll_wait().

Calling epoll in Python

select.epoll([sizehint=-1]) returns an epoll object.

Eventmask

Event constant    Meaning
EPOLLIN           Available for read
EPOLLOUT          Available for write
EPOLLPRI          Urgent data to read
EPOLLERR          Error condition happened on the associated fd
EPOLLHUP          Hang-up happened on the associated fd
EPOLLET           Set edge-triggered (ET) behavior; the default is level-triggered
EPOLLONESHOT      Set one-shot behavior: after one event is pulled out, the fd is disabled internally
EPOLLRDNORM       Equivalent to EPOLLIN
EPOLLRDBAND       Priority data band can be read
EPOLLWRNORM       Equivalent to EPOLLOUT
EPOLLWRBAND       Priority data may be written
EPOLLMSG          Ignored

epoll.close() closes the control file descriptor of the epoll object.

epoll.fileno() returns the file descriptor number of the control fd.

epoll.fromfd(fd) creates an epoll object from the given fd.

epoll.register(fd[, eventmask]) registers a file descriptor with the epoll object. (If the file descriptor is already registered, an IOError is raised.)

epoll.modify(fd, eventmask) modifies a registered file descriptor.

epoll.unregister(fd) removes a registered file descriptor.

epoll.poll([timeout=-1[, maxevents=-1]]) waits for events. timeout is a float, in seconds.
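A few of these calls can be exercised together in a short sketch. It assumes Python 3 on Linux, where IOError is an alias of OSError and the epoll object can be used as a context manager; the one-shot scenario and the variable names are chosen only for illustration.

```python
import select
import socket

a, b = socket.socketpair()
fd = b.fileno()
a.sendall(b"x")          # leave one unread byte so that fd is readable

with select.epoll() as ep:                       # the control fd is closed on exit
    ep.register(fd, select.EPOLLIN | select.EPOLLONESHOT)

    try:
        ep.register(fd, select.EPOLLIN)          # fd is already registered
        double_register = "ok"
    except OSError:
        double_register = "error"

    first = ep.poll(timeout=0.1, maxevents=1)    # timeout is in seconds (float)
    second = ep.poll(0.1)                        # one-shot: fd is disabled after one event

    # modify() re-arms a one-shot descriptor
    ep.modify(fd, select.EPOLLIN | select.EPOLLONESHOT)
    third = ep.poll(0.1)

    ep.unregister(fd)

print(double_register, len(first), second, len(third), ep.closed)

a.close()
b.close()
```

Unlike poll.poll(), whose timeout is in milliseconds, epoll.poll() takes seconds, which is an easy detail to trip over when porting code between the two.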

# coding: utf8
import socket, select

EOL1 = b'\n\n'
EOL2 = b'\n\r\n'
response  = b'HTTP/1.0 200 OK\r\nDate: Mon, 1 Jan 1996 01:01:01 GMT\r\n'
response += b'Content-Type: text/plain\r\nContent-Length: 13\r\n\r\n'
response += b'Hello, world!'

serversocket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
serversocket.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
serversocket.bind(('localhost', 10000))
serversocket.listen(1)
serversocket.setblocking(0)

epoll = select.epoll()
epoll.register(serversocket.fileno(), select.EPOLLIN)

try:
    connections = {}; requests = {}; responses = {}
    while True:
        events = epoll.poll(1)
        for fileno, event in events:
            if fileno == serversocket.fileno():
                # New client: accept it and watch it for incoming data
                connection, address = serversocket.accept()
                connection.setblocking(0)
                epoll.register(connection.fileno(), select.EPOLLIN)
                connections[connection.fileno()] = connection
                requests[connection.fileno()] = b''
                responses[connection.fileno()] = response
            elif event & select.EPOLLIN:
                # Collect the request until the end-of-headers marker arrives
                requests[fileno] += connections[fileno].recv(1024)
                if EOL1 in requests[fileno] or EOL2 in requests[fileno]:
                    epoll.modify(fileno, select.EPOLLOUT)
                    print('-'*40 + '\n' + requests[fileno].decode()[:-2])
            elif event & select.EPOLLOUT:
                # Send as much of the response as the socket accepts
                byteswritten = connections[fileno].send(responses[fileno])
                responses[fileno] = responses[fileno][byteswritten:]
                if len(responses[fileno]) == 0:
                    epoll.modify(fileno, 0)
                    connections[fileno].shutdown(socket.SHUT_RDWR)
            elif event & select.EPOLLHUP:
                # The peer hung up: stop watching and release the connection
                epoll.unregister(fileno)
                connections[fileno].close()
                del connections[fileno]
finally:
    epoll.unregister(serversocket.fileno())
    epoll.close()
    serversocket.close()

