A Python View of the Web: An Overview of Web Architecture (for Traditional Programmers)
Problems Faced by Traditional Web Servers
In a traditional web server, a process opens a socket and listens; each incoming request spawns a new process (or thread), or is handled by blocking, to produce a response, while the parent keeps listening. This is the network model most readers of Unix Network Programming are familiar with. But that model is old, and network requirements have changed dramatically in recent years. The most important change is the demand for concurrency.
Rising concurrency has changed the architecture of the web server on a single machine, and "single machine" itself no longer means what it used to: a request may not be handled by one machine at all, so coordination and synchronization across multiple server machines is another problem that must be solved.
Modern Web Server Requirements
The first change is local. Two things changed on a single machine: concurrency increased, and so did the demand for rapid development. Take Python as an example. C++ and the like perform well but are slow to change, so because of the need for fast development and iteration, most network-system backends today are first built in Python, with the plan of switching to a high-performance C++, C, or Go backend once the product stabilizes. Unfortunately, the Internet never stabilizes...
Improving Concurrency on a Single Machine
In the traditional model described above, if a million requests arrive, do we spawn a million processes? (On Linux a thread is also a process from the scheduler's point of view, and it occupies a PID.) Process management itself consumes system resources, so no matter how powerful the hardware, concurrency will sooner or later exhaust it. To solve this, the concept of the fiber, or coroutine, is applied. A process is the scheduling entity the operating system provides to users, and the OS does the scheduling. A coroutine is a "pseudo-process" the application implements itself inside a single process: multiple coroutines are multiple code execution paths, and the application does the scheduling itself (the usual practice is that each coroutine voluntarily yields the execution right, for example with the yield keyword). In this way one OS-level process contains many independent, interleaved execution units, and communication between coroutines is trivial: because they live in the same process, switching between coroutines looks much like ordinary function calls inside the process. This mechanism goes a long way toward meeting single-machine concurrency requirements.
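The idea can be sketched with plain Python generators: each "coroutine" yields to hand the execution right back, and a tiny application-level scheduler interleaves them. (The names `worker` and `run_all` are illustrative, not from any library.)

```python
from collections import deque

def worker(name, steps):
    # A coroutine: does one unit of work, then yields the execution right back.
    for i in range(steps):
        yield f"{name}:{i}"

def run_all(coroutines):
    # A tiny round-robin scheduler living entirely inside one process.
    queue = deque(coroutines)
    trace = []
    while queue:
        coro = queue.popleft()
        try:
            trace.append(next(coro))   # give this coroutine the execution right
            queue.append(coro)         # it yielded; put it back in the queue
        except StopIteration:
            pass                       # this coroutine has finished
    return trace

print(run_all([worker("a", 2), worker("b", 2)]))  # → ['a:0', 'b:0', 'a:1', 'b:1']
```

Note that nothing here is preemptive: a coroutine that never yields would starve the others, which is exactly why scheduling policy matters.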
In Python, the keyword for building coroutines is yield. By itself, yield is not a complete coroutine implementation, but Python's strength is that modules can fill the gap. Two well-known coroutine implementations grew out of this idea: greenlet and Stackless. The main difference is that Stackless schedules coroutines automatically, while greenlet requires each coroutine to give up the execution right manually. Each has its merits, but greenlet's simple design for fine-grained control clearly matches the spirit of Python's yield keyword: it gives the user more power, at the cost of more code.
gevent is built on top of greenlet. It is a network library based on coroutines, and because of that its defining feature is high concurrency. What is a network library? It wraps socket usage and the process/thread model: for example, it can dynamically start multiple processes (or threads), each running greenlet coroutines, or maintain a pool directly. greenlet defines coroutines and the switch operation, but says nothing about when to switch; sockets block easily, and gevent schedules a switch exactly at those blocking points. In other words, greenlet supplies the mechanism of coroutines, and gevent implements a policy on top of it.
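A minimal sketch of that policy, assuming the third-party gevent package is installed (here `gevent.sleep` stands in for a blocking socket operation; the `fetch` function is illustrative):

```python
import gevent

results = []

def fetch(name, delay):
    # gevent.sleep is a blocking point: gevent switches to another
    # coroutine here, just as it would on blocking socket I/O.
    gevent.sleep(delay)
    results.append(name)

# Both "requests" run concurrently inside one process, no threads needed.
gevent.joinall([gevent.spawn(fetch, "slow", 0.02),
                gevent.spawn(fetch, "fast", 0.01)])
print(results)  # the task with the shorter "block" finishes first
```

The application code looks sequential; the concurrency comes entirely from gevent switching greenlets at blocking points.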
As mentioned above, to speed up web development iteration, the industry gradually moved to Python. Everyone expects to switch back to a more efficient language someday, but schedules never allow it, which is why you find many large websites today running directly on Python. The language itself does increase development speed, but for network applications Python goes further by standardizing network development. The standard is WSGI, a web interface specification that defines the interfaces between the application, the server, and middleware. Most implementations of this standard are middleware, followed by servers. But wait: isn't a web server just something that opens a socket, listens, and then processes requests? Why is there a middleware layer at all? Because opening the socket, parsing HTTP (and HTTPS) packets, and maintaining and tracking sessions, cookies, and so on are common to every HTTP server. Python's WSGI standard factors these common operations out and defines interfaces for them. All backend code programs against the WSGI-defined interface, which makes both server programming and server programs very simple.
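To make the interface concrete, here is a complete WSGI application exercised by hand, using only the standard library. (The `app` function and the little test harness are illustrative, the way any WSGI server or middleware would drive an application.)

```python
from wsgiref.util import setup_testing_defaults

# A complete WSGI application: a callable taking the request environ dict
# and a start_response function, returning an iterable of byte chunks.
def app(environ, start_response):
    body = ("Hello from " + environ["PATH_INFO"]).encode("utf-8")
    start_response("200 OK", [("Content-Type", "text/plain"),
                              ("Content-Length", str(len(body)))])
    return [body]

# Drive the app directly, the way a WSGI server or middleware would.
environ = {}
setup_testing_defaults(environ)   # fill in a minimal valid environ
environ["PATH_INFO"] = "/demo"

status_holder = []
def start_response(status, headers):
    status_holder.append(status)

result = b"".join(app(environ, start_response))
print(status_holder[0], result)   # → 200 OK b'Hello from /demo'
```

Everything a server like gunicorn does ultimately reduces to calling such a callable; everything a framework like Bottle does is help you produce one.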
Many Python libraries implement the server side of the WSGI standard, such as gunicorn. On the application side there are also many libraries: Bottle, Django, Flask, Tornado, and so on. So in typical WSGI fashion, a machine starts gunicorn as the WSGI server, with an application framework (say, Bottle) as the actual backend, and backend user code is written against the interfaces a library like Bottle provides. What does that programming experience look like?
    @route('/helloworld/:yourword', method=['GET', 'POST'])  # URL route; the colon marks a path parameter
    def hello(yourword):
        return 'Hello world. ' + yourword

The route decorator binds a URL to the logic that runs when that URL is requested. Compared with traditional CGI backend programming, doesn't this feel refreshing? The handler is completely clean and deals only with business logic. That is what the WSGI standard buys us. Gunicorn can be started on its own, but its concurrency is then unremarkable, so we can plug in the highly concurrent gevent network library mentioned above. Between the two, gevent supplies the mechanism and gunicorn applies the policy. Through this layer of encapsulation and calling conventions, each component can be swapped for a different implementation in different scenarios to achieve different ends. This building-block flexibility is something traditional web development does not have.
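Wiring the pieces together is then a deployment choice rather than code. A plausible invocation, assuming gunicorn and gevent are installed and the application object lives in `myapp.py` as `app` (the module and variable names are illustrative):

```shell
# Serve myapp:app with 4 worker processes, each using gevent
# coroutines instead of blocking threads for per-request concurrency.
gunicorn --workers 4 --worker-class gevent myapp:app
```

Swapping `--worker-class` is exactly the building-block flexibility described above: the application code does not change when the concurrency policy does.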
Cross-Machine Concurrency
Raising concurrency and adopting standards on one machine significantly improves a single machine's fighting capability, but it is still not enough for modern network demand. Multiple machines must provide the service together. Once multiple machines are involved, the classic cloud question arises: at any given moment, which machine should serve a given request? This brings requirements for load balancing, network proxying, availability, and consistency. Much existing software covers several of these roles at once: nginx can act as a network proxy or a load balancer; dedicated load-balancing software and hardware such as LVS and F5 exist; keepalived handles availability; and many solutions (RAID among them, at the storage level) address redundancy and consistency. These concerns are logically separate, though a single piece of software may implement several of them. Today's popular software may be obsolete at some point, but the requirements themselves will not change. Together with firewalls, these components form the standard backend of a modern web service.
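As one concrete instance of the proxy and load-balancing role, here is a minimal nginx reverse-proxy configuration that spreads requests across two backend machines (the addresses and upstream name are placeholders):

```nginx
# Round-robin load balancing across two backend application servers.
upstream app_backend {
    server 10.0.0.1:8000;
    server 10.0.0.2:8000;
}

server {
    listen 80;
    location / {
        proxy_pass http://app_backend;  # forward each request to one backend
    }
}
```

The backends behind the upstream block can themselves be gunicorn instances like the one above, which is how the single-machine and cross-machine layers compose.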