Python Web Server Gateway Interface (WSGI) for Building Web Sites
While building a Web server and a Web framework for a Python Web site, we worked out the concepts of Web servers, Web applications, and Web frameworks. Python has more and more Web frameworks, which gives us more choices, but the choice of framework can also limit which Web servers we can use. Java has many Web frameworks too, yet thanks to the servlet API, an application written with any Java Web framework can run on any Web server that implements that API.
The Python community needs a similar API between Web servers and applications. This API is WSGI (the Python Web Server Gateway Interface), which is described in detail in PEP 3333. Put simply, WSGI is a bridge between the Web server and the Web application. In one direction, it takes the raw HTTP data from the Web server, puts it into a unified format, and hands it to the Web application. In the other direction, the application/framework side runs the business logic, generates the response content, and hands it back to the server.
The detailed process of coupling a Web server and a framework through WSGI is shown in the figure:
Figure: WSGI server adaptation
The specific explanation is as follows:
1. The application (or Web framework) provides a callable object named application (WSGI does not specify how this object must be implemented).
2. Each time the server receives a request from an HTTP client, it calls application, passing it a dictionary named environ and a callable object named start_response.
3. The framework/application generates the HTTP status code and the HTTP response headers and passes both to start_response, where the server stores them.
4. The framework/application returns the response body.
5. The server combines the status code, response headers, and response body into an HTTP response and returns it to the client (this step is not part of the WSGI protocol).
The following sections describe the server side and the application side of WSGI in turn.
Server
We know that each HTTP request sent by a client (usually a browser) consists of three parts: the request line, the message headers, and the request body. Together they carry the details of the request. For example:
Method: the method to be performed on the resource identified by the Request-URI, such as GET or POST;
User-Agent: lets the client tell the server about its operating system, browser, and other properties.
After the server receives an HTTP request from the client, the WSGI interface must put these request fields into a unified form so that they can be passed to the application side (in practice, to the framework). Which data does the Web server pass to the application? CGI (Common Gateway Interface) already defined this in detail, in what are called CGI environment variables. WSGI reuses the CGI environment variables and requires the Web server to create a dictionary to hold them (generally named environ). In addition to the CGI-defined variables, environ must also hold some WSGI-defined variables, and it may also hold operating system environment variables. See the "environ Variables" section of PEP 3333 for the specific variables.
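To get a feel for what environ contains, the standard library helper wsgiref.util.setup_testing_defaults can fill a dictionary with typical values. This is only a sketch for exploration: the values are testing defaults, not what a real server would send, and sample_environ is a name made up for this example.

```python
from wsgiref.util import setup_testing_defaults


def sample_environ():
    """Return an environ dict filled with wsgiref's testing defaults."""
    environ = {}
    setup_testing_defaults(environ)
    return environ


if __name__ == "__main__":
    environ = sample_environ()
    # CGI-style variables describe the HTTP request itself
    for key in ("REQUEST_METHOD", "SERVER_NAME", "SERVER_PORT", "PATH_INFO"):
        print(key, "=", environ[key])
    # wsgi.* variables are the ones defined by WSGI
    for key in sorted(k for k in environ if k.startswith("wsgi.")):
        print(key, "=", environ[key])
```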
Next, the WSGI interface must hand environ to the application for processing. WSGI requires the application to provide a callable object, application; the server calls it to obtain the HTTP response body. When the server calls the application, it must pass two arguments: the environ dictionary described above, and a callable object start_response, through which the application supplies the status code and response headers. With the status, headers, and body in hand, the server has a complete HTTP response, which it returns to the client. That completes one HTTP request-response cycle.
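The server's half of this exchange can be sketched without any networking. The helper run_application and the demo hello_app below are hypothetical names for illustration. The sketch is simplified: start_response does not return the write() callable that PEP 3333 also defines, and the environ is far from complete.

```python
import io


def run_application(application, environ):
    """Call a WSGI application once and assemble the raw HTTP response bytes."""
    captured = {}

    def start_response(status, response_headers, exc_info=None):
        # The server only stores the status and headers at this point
        captured["status"] = status
        captured["headers"] = response_headers

    body_chunks = application(environ, start_response)
    try:
        body = b"".join(body_chunks)
    finally:
        # PEP 3333 asks the server to call close() on the iterable if present
        if hasattr(body_chunks, "close"):
            body_chunks.close()

    lines = ["HTTP/1.1 " + captured["status"]]
    lines += ["%s: %s" % (name, value) for name, value in captured["headers"]]
    head = "\r\n".join(lines) + "\r\n\r\n"
    return head.encode("latin-1") + body


def hello_app(environ, start_response):
    # Hypothetical application used only for this demonstration
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"Hello, " + environ["PATH_INFO"].encode("latin-1")]


environ = {
    "REQUEST_METHOD": "GET",
    "PATH_INFO": "/wsgi",
    "SERVER_NAME": "localhost",
    "SERVER_PORT": "8000",
    "wsgi.version": (1, 0),
    "wsgi.url_scheme": "http",
    "wsgi.input": io.BytesIO(b""),
}
response = run_application(hello_app, environ)
```

Here response holds the status line, the headers, a blank line, and then the body, which is the combination step that WSGI leaves to the server.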
Wsgiref Analysis
Python ships with a Web server that implements the WSGI interface. It lives in the wsgiref module and is a reference implementation of a WSGI server written in pure Python. Let's take a brief look at how it works, starting a Web server with the following code:
from wsgiref.simple_server import make_server

# `application` is a WSGI callable defined elsewhere

# Instantiate the server
httpd = make_server(
    'localhost',   # The host name
    8051,          # A port number where to wait for the request
    application    # The application object name, in this case a function
)

# Wait for a single request, serve it and quit
httpd.handle_request()
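For a version that can be run end to end, the sketch below serves a single request from a background thread and fetches it with http.client. The application function and the fetch_once helper are hypothetical names, and port 0 is used so that the operating system picks a free port; none of this is prescribed by wsgiref.

```python
import http.client
import threading
from wsgiref.simple_server import make_server


def application(environ, start_response):
    # Hypothetical application: always answers with the same plain-text body
    body = b"Hello from wsgiref\n"
    start_response("200 OK", [("Content-Type", "text/plain"),
                              ("Content-Length", str(len(body)))])
    return [body]


def fetch_once():
    """Serve exactly one request in a thread and return (status, body)."""
    httpd = make_server("127.0.0.1", 0, application)  # port 0: any free port
    port = httpd.server_address[1]
    worker = threading.Thread(target=httpd.handle_request)
    worker.start()
    conn = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
    try:
        conn.request("GET", "/")
        resp = conn.getresponse()
        status, body = resp.status, resp.read()
    finally:
        conn.close()
        worker.join()
        httpd.server_close()
    return status, body


if __name__ == "__main__":
    print(fetch_once())
```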
The Web server then receives a request, generates environ, and calls the application to process the request. We follow this main line through the source code; the call process is shown in the figure:
Figure: WSGI server call process
There are three main classes: WSGIServer, WSGIRequestHandler, and ServerHandler. WSGIServer is the Web server class; it is initialized with a server_address (IP:port) and the WSGIRequestHandler class to produce a server object. This object listens on the given port. When an HTTP request arrives, it creates a WSGIRequestHandler instance through finish_request. While that instance is being initialized, a ServerHandler instance is created; its run(application) method is then called, which in turn calls the application object provided by the application side to generate the response.
The inheritance relationships of these three classes are shown in the figure:
Figure: WSGI class inheritance relationships
Specifically, TCPServer uses sockets to handle TCP communication, while HTTPServer handles the HTTP level. Similarly, StreamRequestHandler handles stream sockets, while BaseHTTPRequestHandler handles HTTP-level content. This part has little to do with the WSGI interface. It is more about how a Web server is implemented and can be skipped.
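Since the diagram is not reproduced here, the inheritance chains can be printed directly from the standard library. This is a small inspection sketch; class_chain is a helper name made up for this example.

```python
from wsgiref.simple_server import ServerHandler, WSGIRequestHandler, WSGIServer


def class_chain(cls):
    """Return the names of cls and its ancestors in method resolution order."""
    return [base.__name__ for base in cls.__mro__]


if __name__ == "__main__":
    for cls in (WSGIServer, WSGIRequestHandler, ServerHandler):
        print(" -> ".join(class_chain(cls)))
```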
Microserver instance
If wsgiref seems too complex, we can implement a tiny Web server ourselves to better understand how the server side implements the WSGI interface. The code is extracted from the article on building your own network server (part 2) and placed on gist. The main structure is as follows:
import socket


class WSGIServer(object):
    # Socket parameters
    address_family, socket_type = socket.AF_INET, socket.SOCK_STREAM
    request_queue_size = 1

    def __init__(self, server_address):
        # TCP server initialization: create a socket, bind the address,
        # listen on the port, and record the server host name and port
        ...

    def set_app(self, application):
        # Store the application provided by the framework
        self.application = application

    def serve_forever(self):
        # Handle TCP connections: read the request and call the handler
        ...

    def handle_request(self):
        # Parse the HTTP request, build environ, call the application,
        # and return the HTTP response
        env = self.get_environ()
        result = self.application(env, self.start_response)
        self.finish_response(result)

    def parse_request(self, text):
        # Parse the HTTP request
        ...

    def get_environ(self):
        # Build environ. Only a sample is shown; a real server sets many more keys
        env = {}
        env['wsgi.url_scheme'] = 'http'
        ...
        env['REQUEST_METHOD'] = self.request_method  # GET
        ...
        return env

    def start_response(self, status, response_headers, exc_info=None):
        # Store the status code and the response headers
        # (server_headers, the headers added by the server, is defined in the full code)
        self.headers_set = [status, response_headers + server_headers]

    def finish_response(self, result):
        # Send the HTTP response back to the client
        ...


SERVER_ADDRESS = (HOST, PORT) = '', 8888


# Create a server instance
def make_server(server_address, application):
    server = WSGIServer(server_address)
    server.set_app(application)
    return server
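The request-parsing half of such a server can be tried without opening a socket. The sketch below is a standalone take on parse_request and get_environ, not the code from the gist: the function signatures and the handling of the query string and body are assumptions made for this example, and request headers are not copied into environ.

```python
import io
import sys


def parse_request(text):
    """Split the request line of a raw HTTP request into its three fields."""
    request_line = text.splitlines()[0]
    method, target, version = request_line.split()
    return method, target, version


def get_environ(request_text, server_name, server_port):
    """Build a minimal environ dict for one request."""
    method, target, _version = parse_request(request_text)
    path, _, query = target.partition("?")
    _head, _, body = request_text.partition("\r\n\r\n")
    return {
        # WSGI-defined variables
        "wsgi.version": (1, 0),
        "wsgi.url_scheme": "http",
        "wsgi.input": io.BytesIO(body.encode("latin-1")),
        "wsgi.errors": sys.stderr,
        "wsgi.multithread": False,
        "wsgi.multiprocess": False,
        "wsgi.run_once": False,
        # CGI-defined variables
        "REQUEST_METHOD": method,
        "SCRIPT_NAME": "",
        "PATH_INFO": path,
        "QUERY_STRING": query,
        "SERVER_NAME": server_name,
        "SERVER_PORT": str(server_port),
    }


raw_request = "GET /hello?name=wsgi HTTP/1.1\r\nHost: localhost\r\n\r\n"
env = get_environ(raw_request, "localhost", 8888)
```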
There are now many mature Web servers that support WSGI, and Gunicorn is a good one. It originated from Unicorn in the Ruby community and was ported to Python as a WSGI HTTP server. It has the following advantages:
Easy to configure;
Multiple worker processes can be managed automatically;
Different backend worker types (such as sync, gevent, and tornado) can be selected.
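As an illustration of these points, a Gunicorn configuration file is itself a Python module. The sketch below uses the bind, workers, and worker_class settings; the concrete values are assumptions chosen for the example, not recommendations from this article.

```python
# gunicorn.conf.py (a sketch; the values are illustrative assumptions)
import multiprocessing

bind = "127.0.0.1:8000"                        # address and port to listen on
workers = multiprocessing.cpu_count() * 2 + 1  # number of worker processes
worker_class = "sync"                          # or "gevent", "tornado", ...
```

Assuming a hypothetical module myapp that exposes a WSGI callable named application, the server could then be started with gunicorn -c gunicorn.conf.py myapp:application.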
Application
Compared with the server side, the application side (which you can also think of as the framework) has much less to do. It only needs to provide a callable object (conventionally named application) that receives the two parameters passed by the server: environ and start_response. The callable object can be not only a function but also a class (the second example below) or an instance with a __call__ method. In short, anything works as long as it accepts the two parameters above and returns a value that the server can iterate over.
The job of the application is to perform its business processing based on the HTTP request information in environ and to return an iterable object. The server iterates over this object to obtain the body of the HTTP response. If there is no response body, the application can return an empty iterable.
The application also calls the start_response callable provided by the server to set the HTTP status code and response headers. Its prototype is as follows:
def start_response(self, status, response_headers, exc_info=None):
The application must supply status, a string giving the HTTP response status (for example '200 OK'), and response_headers, a list of tuples of the form (header_name, header_value) that represent the HTTP response headers. exc_info is optional; the application passes it when an error has occurred and it wants to return error information to the browser.
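The optional exc_info argument is easiest to see in an error path. In this sketch, failing_app and recording_start_response are hypothetical names: the application catches its own exception and passes sys.exc_info() along with an error status, while the stand-in start_response only records what it was given.

```python
import sys


def failing_app(environ, start_response):
    """Hypothetical application whose business logic raises an error."""
    try:
        raise ValueError("something went wrong")  # stand-in for real work
    except ValueError:
        status = "500 Internal Server Error"
        response_headers = [("Content-Type", "text/plain")]
        # Pass the exception details so the server can log or re-raise them
        start_response(status, response_headers, sys.exc_info())
        return [b"An error occurred\n"]


calls = []


def recording_start_response(status, response_headers, exc_info=None):
    # Stand-in for the server's start_response; it only records its arguments
    calls.append((status, response_headers, exc_info))


body = failing_app({}, recording_start_response)
```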
So far, we can implement a simple application, as shown below:
def simple_app(environ, start_response):
    """Simplest possible application function"""
    # On Python 3 the response body must be bytes
    HELLO_WORLD = b"Hello world!\n"
    status = '200 OK'
    response_headers = [('Content-type', 'text/plain')]
    start_response(status, response_headers)
    return [HELLO_WORLD]
Or implement it with a class, as follows:
class AppClass:
    """Produce the same output, but using a class"""

    def __init__(self, environ, start_response):
        self.environ = environ
        self.start = start_response

    def __iter__(self):
        ...
        # On Python 3 the response body must be bytes
        HELLO_WORLD = b"Hello world!\n"
        yield HELLO_WORLD
Note that the AppClass class itself is the application here: the server "calls" it (that is, instantiates it) with environ and start_response. The call returns an instance, which is iterable and therefore meets WSGI's requirements for an application.
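A complete version of the class-based application might look like the sketch below, which follows the example in PEP 3333; fake_start_response is a hypothetical stand-in for the server. It also shows that start_response is not called at instantiation time, only when the server starts iterating.

```python
class AppClass:
    """WSGI application as a class: instantiating it handles one request."""

    def __init__(self, environ, start_response):
        self.environ = environ
        self.start = start_response

    def __iter__(self):
        status = "200 OK"
        response_headers = [("Content-Type", "text/plain")]
        # Status and headers are sent lazily, once iteration begins
        self.start(status, response_headers)
        yield b"Hello world!\n"


seen = []


def fake_start_response(status, response_headers, exc_info=None):
    # Stand-in for the server's start_response; it records the status
    seen.append(status)


result = AppClass({}, fake_start_response)  # the "call" only creates an instance
seen_before_iteration = list(seen)          # still empty at this point
body = b"".join(result)                     # iterating runs __iter__
```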
If you want to use an instance of the AppClass class as the application instead, you must add a __call__ method to the class that accepts environ and start_response as parameters and returns an iterable object, as shown below:
class AppClass:
    """Produce the same output, but using an object"""

    def __call__(self, environ, start_response):
        ...
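One reason to prefer an instance with __call__ is that the object can carry configuration. The Greeter class below is a hypothetical example of that pattern, not part of any library.

```python
class Greeter:
    """Hypothetical WSGI application object configured at construction time."""

    def __init__(self, greeting):
        self.greeting = greeting

    def __call__(self, environ, start_response):
        body = ("%s world!\n" % self.greeting).encode("utf-8")
        start_response("200 OK", [
            ("Content-Type", "text/plain; charset=utf-8"),
            ("Content-Length", str(len(body))),
        ])
        return [body]


# The instance, not the class, is what gets handed to the server
application = Greeter("Hello")
```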
This part involves some more advanced Python features, such as yield and magic methods. You can refer to the Python language notes I have summarized to understand them.
WSGI in Flask
Flask is a lightweight Python Web framework that complies with the WSGI specification. Its initial version is only a little over 600 lines long, which makes it relatively easy to understand. Let's look at the part of that initial version that deals with the WSGI interface.
def wsgi_app(self, environ, start_response):
    """The actual WSGI application.  This is not implemented in
    `__call__` so that middlewares can be applied:

        app.wsgi_app = MyMiddleware(app.wsgi_app)
    """
    with self.request_context(environ):
        rv = self.preprocess_request()
        if rv is None:
            rv = self.dispatch_request()
        response = self.make_response(rv)
        response = self.process_response(response)
        return response(environ, start_response)

def __call__(self, environ, start_response):
    """Shortcut for :attr:`wsgi_app`"""
    return self.wsgi_app(environ, start_response)
Here, wsgi_app plays the role of the application callable described above. rv is the return value produced by handling the request, and response is the response object the framework builds from it. The response object is itself a WSGI callable, which is why it is called with environ and start_response at the end. The Flask source code is not explained in more depth here. If you are interested, you can download it from GitHub and check out the initial version.
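The reason for keeping __call__ as a thin shortcut can be shown with a toy stand-in. TinyFramework and LoggingMiddleware below are hypothetical classes written for this example, not Flask code: because __call__ looks up self.wsgi_app on every request, wsgi_app can be rebound to a wrapped version while the server keeps a reference to the same app object.

```python
class TinyFramework:
    """Hypothetical stand-in for a framework object such as a Flask app."""

    def wsgi_app(self, environ, start_response):
        start_response("200 OK", [("Content-Type", "text/plain")])
        return [b"from the framework\n"]

    def __call__(self, environ, start_response):
        # Look up self.wsgi_app at call time, so a wrapped version is honoured
        return self.wsgi_app(environ, start_response)


class LoggingMiddleware:
    """Hypothetical middleware that records each requested path."""

    def __init__(self, wrapped):
        self.wrapped = wrapped
        self.paths = []

    def __call__(self, environ, start_response):
        self.paths.append(environ.get("PATH_INFO", ""))
        return self.wrapped(environ, start_response)


app = TinyFramework()
app.wsgi_app = LoggingMiddleware(app.wsgi_app)  # the server still receives `app`
```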
Middleware
The docstring of the wsgi_app function in the Flask code above notes that the application logic is not implemented directly in __call__ so that middleware can be applied. So why use middleware?
Recall the application/server interface described above. For each HTTP request, the server calls one application to process it and returns the result once the application has finished. This is enough for ordinary scenarios, but it is not perfect. Consider the following scenarios:
For different requests (such as different URLs), the server needs to call different applications. How does it choose which one to call?
For load balancing or remote processing, applications running on other hosts on the network must handle the request;
The content returned by the application must be processed further before it can be used as the HTTP response.
What these scenarios have in common is that some necessary operations fit neither the server side nor the application (framework) side. From the application's point of view, these operations should be done by the server; from the server's point of view, they should be done by the application. Middleware is introduced to handle this situation.
Middleware is like a bridge through which the application side and the server side communicate. To the server, middleware looks like an application; to the application, it looks like a server. This is shown in the figure:
Figure: Middleware
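The third scenario above, post-processing the application's output, makes a compact illustration of this dual role. UppercaseMiddleware and inner_app are hypothetical names for this sketch; upper-casing is chosen only because it leaves the body length unchanged.

```python
class UppercaseMiddleware:
    """Hypothetical middleware that upper-cases each chunk of the body."""

    def __init__(self, app):
        self.app = app

    def __call__(self, environ, start_response):
        # Towards the application, the middleware plays the server: it calls it
        chunks = self.app(environ, start_response)
        try:
            # Towards the server, it plays the application: it yields the body
            for chunk in chunks:
                yield chunk.upper()
        finally:
            if hasattr(chunks, "close"):
                chunks.close()


def inner_app(environ, start_response):
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"hello ", b"middleware\n"]


wrapped = UppercaseMiddleware(inner_app)
statuses = []
body = b"".join(
    wrapped({}, lambda status, headers, exc_info=None: statuses.append(status))
)
```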
Middleware implementation
The flask framework uses middleware in the Flask class initialization code:
self.wsgi_app = SharedDataMiddleware(self.wsgi_app, {
    self.static_path: target
})
The effect here is the same as that of a decorator in Python: SharedDataMiddleware logic runs before and after self.wsgi_app is executed. What middleware does is very similar to what a Python decorator does. The SharedDataMiddleware middleware is provided by the werkzeug library and serves the static content hosted on the site. In addition, the DispatcherMiddleware middleware supports calling different applications for different requests. In this way, the problems in scenarios 1 and 2 can be solved.
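The decorator analogy can be made literal. The add_header decorator factory below is a hypothetical example, not part of werkzeug: decorating application with it has the same shape as rebinding self.wsgi_app to a wrapped version.

```python
import functools


def add_header(name, value):
    """Hypothetical decorator factory: append one header to every response."""
    def decorator(app):
        @functools.wraps(app)
        def middleware(environ, start_response):
            def patched_start_response(status, headers, exc_info=None):
                # Runs "around" the application's own start_response call
                extended = list(headers) + [(name, value)]
                return start_response(status, extended, exc_info)
            return app(environ, patched_start_response)
        return middleware
    return decorator


@add_header("X-Powered-By", "WSGI")
def application(environ, start_response):
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"decorated\n"]
```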
Let's take a look at the implementation of DispatcherMiddleware:
class DispatcherMiddleware(object):
    """Allows one to mount middlewares or applications in a WSGI application.
    This is useful if you want to combine multiple WSGI applications::

        app = DispatcherMiddleware(app, {
            '/app2': app2,
            '/app3': app3
        })
    """

    def __init__(self, app, mounts=None):
        self.app = app
        self.mounts = mounts or {}

    def __call__(self, environ, start_response):
        script = environ.get('PATH_INFO', '')
        path_info = ''
        while '/' in script:
            if script in self.mounts:
                app = self.mounts[script]
                break
            script, last_item = script.rsplit('/', 1)
            path_info = '/%s%s' % (last_item, path_info)
        else:
            app = self.mounts.get(script, self.app)
        original_script_name = environ.get('SCRIPT_NAME', '')
        environ['SCRIPT_NAME'] = original_script_name + script
        environ['PATH_INFO'] = path_info
        return app(environ, start_response)
When the middleware is initialized, a mounts dictionary is provided that maps URL paths to applications. For each request, the middleware checks the request path and selects the appropriate application to handle it.
That basically covers how WSGI works. Next, I will introduce my understanding of the Flask framework.
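To trace how the path is split between SCRIPT_NAME and PATH_INFO, the same prefix-matching idea can be exercised in a standalone sketch. dispatch_by_prefix, make_echo_app, and request are hypothetical helpers written for this example; this is a simplified function-based variant, not werkzeug's code.

```python
def dispatch_by_prefix(default_app, mounts):
    """Return a WSGI app that routes by the longest mounted path prefix."""
    def dispatcher(environ, start_response):
        script = environ.get("PATH_INFO", "")
        path_info = ""
        app = None
        while "/" in script:
            if script in mounts:
                app = mounts[script]
                break
            # Move the last path segment from the prefix to the remainder
            script, last_item = script.rsplit("/", 1)
            path_info = "/%s%s" % (last_item, path_info)
        if app is None:
            app = mounts.get(script, default_app)
        environ["SCRIPT_NAME"] = environ.get("SCRIPT_NAME", "") + script
        environ["PATH_INFO"] = path_info
        return app(environ, start_response)
    return dispatcher


def make_echo_app(name):
    """Build a hypothetical app that reports which app ran and what it saw."""
    def app(environ, start_response):
        start_response("200 OK", [("Content-Type", "text/plain")])
        text = "%s SCRIPT_NAME=%s PATH_INFO=%s" % (
            name, environ["SCRIPT_NAME"], environ["PATH_INFO"])
        return [text.encode("utf-8")]
    return app


router = dispatch_by_prefix(make_echo_app("main"),
                            {"/app2": make_echo_app("app2")})


def request(path):
    """Send a fake request for `path` through the router and return the body."""
    return b"".join(router({"PATH_INFO": path},
                           lambda status, headers, exc_info=None: None))
```

A request for /app2/users/1 reaches the mounted application with the mount point moved into SCRIPT_NAME, while any unmatched path falls through to the default application with PATH_INFO unchanged.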