How Tornado works at the TCP layer
The previous section covered HTTP at the application layer, which relies on TCP at the transport layer: how does the server bind its port? When is the HTTP server's handle_stream called? This section focuses on the implementation of the TCP layer, so that it can be connected to the program flow from the previous section.
First, a few words about TCP itself. It is a connection-oriented protocol that guarantees reliable delivery. Because it is connection-oriented, the server must allocate memory to track each client connection, and the client likewise tracks the server. To guarantee reliable delivery, many mechanisms are introduced, such as timed retransmission and the SYN/ACK handshake, which makes the TCP implementation in every operating system quite complex. This article does not attempt to discuss them in depth (nor could I say much about them anyway). Still, it helps to have a mental picture of TCP; start with the state transition diagram from UNIX Network Programming.
Here is the classic three-step skeleton of a TCP server (C implementation):
```c
// Create the listening socket
int sfd = socket(AF_INET, SOCK_STREAM, 0);

// Bind the socket to an address/port and start listening.
// The second parameter of listen() is the backlog, the length
// of the pending-connection queue.
bind(sfd, (struct sockaddr *)&s_addr, sizeof(struct sockaddr));
listen(sfd, 10);

// Accept connections in a loop
while (1)
    cfd = accept(sfd, (struct sockaddr *)&cli_addr, &addr_size);
```
Error handling and variable declarations are omitted above; the names speak for themselves. For more detail, search for Linux TCP server programming. In summary, TCP server programming means: create a listening socket, bind it to an address and port, start listening, then block in accept. This is exactly the work that Tornado's TCPServer does.
The TCPServer class is defined in tcpserver.py. There are two ways to use it: bind + start, or listen.
The first usage supports multi-process operation, but at the TCP level the two are the same, so let's take listen as an example. TCPServer's __init__ does little worth noting beyond keeping a reference to the IOLoop singleton, which the next section analyzes (it is the key to Tornado's asynchronous performance). The listen method takes two parameters, a port and an address; the code is as follows:
```python
def listen(self, port, address=""):
    """Starts accepting connections on the given port.

    This method may be called more than once to listen on multiple ports.
    `listen` takes effect immediately; it is not necessary to call
    `TCPServer.start` afterwards.  It is, however, necessary to start
    the `.IOLoop`.
    """
    sockets = bind_sockets(port, address=address)
    self.add_sockets(sockets)
```
First, bind_sockets takes the address and port, creates a list of sockets, binds them, and starts listening (completing the first two steps of the TCP trilogy); add_sockets then registers read/timeout events on these sockets. For high-performance concurrent server design, see the programming models in UNIX Network Programming. Tornado can be seen as a single-threaded event-driven server: the third step of the trilogy, accept, is moved into an event callback, so a listener event must be registered for every file descriptor, including the listening sockets. Once this is done, the IOLoop singleton's start method can safely be called to begin the event loop. For the details, refer to the event models of modern high-performance web servers (nginx, lighttpd, etc.); more on this later.
In short, an event-driven server such as Tornado does the following: create a socket, bind it to a port and listen, then register events with their corresponding callbacks and accept new connections inside the callback.
The bind_sockets method is defined in netutil. It is straightforward: it creates the listening sockets for asynchronous use, sets them to non-blocking mode, then binds and listens on them. The add_sockets method receives the socket list, records each socket keyed by its file descriptor, and calls add_accept_handler. That function is also defined in netutil, with the following code:
```python
def add_accept_handler(sock, callback, io_loop=None):
    """Adds an `.IOLoop` event handler to accept new connections on ``sock``.

    When a connection is accepted, ``callback(connection, address)`` will
    be run (``connection`` is a socket object, and ``address`` is the
    address of the other end of the connection).  Note that this signature
    is different from the ``callback(fd, events)`` signature used for
    `.IOLoop` handlers.
    """
    if io_loop is None:
        io_loop = IOLoop.current()

    def accept_handler(fd, events):
        while True:
            try:
                connection, address = sock.accept()
            except socket.error as e:
                if e.args[0] in (errno.EWOULDBLOCK, errno.EAGAIN):
                    return
                raise
            callback(connection, address)

    io_loop.add_handler(sock.fileno(), accept_handler, IOLoop.READ)
```
The parameter to note is callback, which at this point refers to TCPServer's _handle_connection method. The flow of add_accept_handler: first ensure there is an IOLoop object, then call add_handler to register the read event and the callback function accept_handler for this fd on the IOLoop. accept_handler is an IOLoop-level callback, invoked whenever the event occurs; its body accepts the new socket and the client address, then calls callback to pass the event one layer up. So when a read event occurs, accept_handler is called, which in turn calls callback, i.e. _handle_connection.
_handle_connection is simpler. Skipping the SSL handling, it reduces to two lines: stream = IOStream(connection, io_loop=self.io_loop) and self.handle_stream(stream, address). Here IOStream represents the IO layer (more on it later; for now, assume reads and writes are taken care of). Then handle_stream is called. Whatever the application-layer protocol (including custom protocols), a new connection goes through the same callback chain described above; only the handle_stream method differs. That method is overridden by subclasses, and we already saw its HTTP implementation in the previous section.
At this point, the code flow connects back to the previous section. How is the callback invoked when an event occurs? What happens inside IOLoop.instance().start() in app.py? To be continued: the root of Tornado's asynchronous high performance.
Notes on the design of the Tornado TCPServer class
As mentioned before, HTTPServer derives from TCPServer, which is natural at the protocol level.
Looking at its implementation, TCPServer is a generic server framework, designed basically along the lines of BSD sockets: the create-bind-listen triad is all there.
Chasing down from helloworld.py, you can see:
- The main function in helloworld.py creates the HTTPServer.
- HTTPServer inherits from TCPServer, and HTTPServer's constructor directly calls TCPServer's constructor.
Next we look at the implementation of the TCPServer class. The code lives in tornado/tcpserver.py, which is only a bit over 200 lines, not much at all. All of it implements the single TCPServer class.
TCPServer
The comments on the TCPServer class first emphasize that it is a non-blocking, single-threaded TCP server.
How do you understand it?
Non-blocking means the server does not use blocking APIs.
What is a blocking design? In the BSD socket API, for instance, the recv function blocks by default: when the server calls recv to read client data and the peer has not sent any, the call simply does not return. A server designed this way has to use multiple threads or processes, to avoid one blocked call preventing it from doing anything else. Blocking APIs are very common; roughly, a blocking design means "whether or not there is data, the server calls the API to read, and if nothing can be read, the call does not come back."
Non-blocking, for recv, means that when there is no data to read it does not hang; it returns immediately. You might think this is worse than blocking, because the server cannot predict whether there is data and would have to call recv over and over again. Isn't that a waste of CPU?
Of course it is not that naive. The non-blocking design Tornado refers to is much more sophisticated and follows a different way of thinking: the server does not poll for data itself; instead, it arranges a "monitor" with the operating system, with TCP connections as the monitored objects. When data arrives on a connection, the operating system notifies the server as agreed: data has arrived on this connection, come and handle it. Only then does the server call the API to fetch the data. The server neither creates a large number of threads to block on each connection, nor busy-polls each connection to check for data; it just sits and waits for the operating system's notification, which guarantees that each recv call will not come back empty-handed.
The other highlighted feature of Tornado is that it is single-threaded. This is possible because the "monitor" is very efficient and can watch the state of tens of thousands of connections in a single thread, so there is essentially no need for extra threads. In practice this turns out to be more efficient than blocking multi-threaded or multi-process designs. Of course, it relies on strong cooperation from the operating system; all mainstream operating systems now provide such high-end "monitor" mechanisms, e.g. epoll and kqueue.
The author notes that this class is generally not instantiated directly; instead you derive a subclass from it and instantiate that.
To reinforce this design, the author defines an interface method that is deliberately left unimplemented: handle_stream().
```python
def handle_stream(self, stream, address):
    """Override to handle a new `.IOStream` from an incoming connection."""
    raise NotImplementedError()
```
This is a neat technique: it forces subclasses to override the method, or an error is thrown in your face!
TCPServer supports SSL. Thanks to Python, supporting SSL is no trouble at all. To start a TCPServer with SSL enabled, just hand it your certfile and keyfile:
```python
TCPServer(ssl_options={
    "certfile": os.path.join(data_dir, "mydomain.crt"),
    "keyfile": os.path.join(data_dir, "mydomain.key"),
})
```
For the background on these two files, search for an introduction to how digital certificates work.
Three ways to initialize TCPServer
There are three ways to initialize a TCPServer.
1. Single-process form
```python
server = TCPServer()
server.listen(8888)
IOLoop.instance().start()
```
This is the usage we saw in helloworld.py; no need to repeat it.
2. Multi-process form.
```python
server = TCPServer()
server.bind(8888)
server.start(0)  # forks multiple sub-processes
IOLoop.instance().start()
```
The difference lies mainly in server.start(0). After analyzing the listen() and start() member functions, you will see how they relate to processes.
Note: when starting in this mode, you cannot pass an IOLoop object to the TCPServer constructor; doing so causes the TCPServer to start directly in single-process mode.
3. Advanced multi-process form.
```python
sockets = bind_sockets(8888)
tornado.process.fork_processes(0)
server = TCPServer()
server.add_sockets(sockets)
IOLoop.instance().start()
```
Advanced means complex. Although the code above differs by only a line or two, the internal flow is quite different.
The key benefit of this approach is that tornado.process.fork_processes(0) offers more flexibility in how processes are created. Saying so now is of course vague; after drilling into the code we will verify this claim.
Everything above comes from the doc string of the TCPServer class. The next section starts looking at the code itself.