Python micro-frameworks (blue), NodeJS and Go (green), and Japronto (purple)
Errata: User @heppu points out that if you write Go's stdlib HTTP server carefully, it can be 12% faster than shown in this chart. There is also fasthttp, an excellent Go server that is apparently only about 18% slower than Japronto in this particular benchmark. Great! More details can be found in https://github.com/squeaky-pl/japronto/pull/12 and https://github.com/squeaky-pl/japronto/pull/14.
We can see that the Meinheld WSGI server is almost on par with NodeJS and Go. Despite its blocking design, it is much faster than the four entries before it, all of which use asynchronous Python solutions. So don't blindly believe anyone who tells you that asynchronous systems are always faster than synchronous ones. They both deal with concurrency, but the reality is far less simple than it seems.
Although I only used a "Hello world" application for this micro-framework comparison, it clearly shows the raw request-handling capacity of the various server frameworks.
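For reference, the kind of application being benchmarked is a single-route "Hello world" app. Here is a minimal sketch using Japronto's documented API (the exact API may have changed since this benchmark was run):

```python
# Minimal "Hello world" app of the kind used in the benchmark.
# Based on Japronto's documented API; details may differ in newer versions.
from japronto import Application

app = Application()

def hello(request):
    # Return a plain-text response; Japronto handles the rest in C.
    return request.Response(text='Hello world!')

app.router.add_route('/', hello)

if __name__ == '__main__':
    app.run()
```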
The tests were done on an Amazon AWS EC2 c4.2xlarge instance with 8 vCPUs, launched in the São Paulo region, with shared tenancy, HVM virtualization, and regular disks. The operating system was Ubuntu 16.04.1 LTS (Xenial Xerus) with the Linux 4.4.0-53-generic x86_64 kernel. The CPU reported by the operating system was a Xeon® E5-2666 v3 @ 2.90 GHz. The Python version I used was 3.6, freshly compiled from source.
To be fair, all contestants, including Go, ran on a single processor core only. The benchmarking tool was wrk, with 1 thread, 100 connections, and 24 pipelined requests per connection (2,400 concurrent requests in total).
System calls, and moving data between kernel space and user space, are much more expensive than moving data within a process. That is why it pays to make as few system calls as possible.
When Japronto receives data and successfully parses several pipelined requests out of it, it executes all of them as quickly as possible, merges the responses back in the correct order, and then writes them to the client with a single system call. In fact, the scatter/gather I/O system calls could do the merging work for you, but Japronto does not use them yet.
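The idea can be sketched in plain Python as follows. Note that `parse_requests` and `build_response` are hypothetical stand-ins for Japronto's C internals; this only illustrates batching pipelined responses into a single write:

```python
import socket

def handle_readable(conn: socket.socket, parse_requests, build_response):
    """Illustrative only: parse_requests/build_response stand in for Japronto's C code."""
    data = conn.recv(65536)                    # one read may contain several pipelined requests
    if not data:
        conn.close()
        return
    requests = parse_requests(data)            # split the buffer into individual HTTP requests
    responses = [build_response(r) for r in requests]   # process them, keeping the order
    conn.sendall(b"".join(responses))          # merge and write back in one go
    # os.writev(conn.fileno(), responses) would avoid the join via scatter/gather I/O,
    # but as noted above, Japronto does not use it yet.
```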
Yet things are not always that perfect: some requests take a long time to process, and waiting for them to complete before flushing the batch adds unnecessary latency.
When applying this optimization, you have to weigh the cost of the system calls against the expected completion time of the requests.
With this optimization, Japronto achieved 1,214,440 RPS.
Besides exploiting client request pipelining and optimizing the write calls, there are other techniques at work.
Japronto is written almost entirely in C. The parser, protocol, connection management, router, request, and response objects are all implemented as C extensions.
Japronto tries hard to delay creating Python objects until they are actually needed. For example, the dictionary of HTTP headers is only created when the request's headers are accessed, and a number of other objects are created lazily on first use.
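A rough Python equivalent of this lazy-creation idea (Japronto does it in C, so this is only illustrative) could look like this:

```python
class Request:
    """Illustrative sketch: the headers dict is built only on first access."""

    def __init__(self, raw_header_pairs):
        # e.g. [(b'Host', b'example.com'), (b'Accept', b'*/*')]
        self._raw_header_pairs = raw_header_pairs
        self._headers = None                 # not materialised yet

    @property
    def headers(self):
        if self._headers is None:            # build the Python dict lazily
            self._headers = {
                name.decode('latin-1').title(): value.decode('latin-1')
                for name, value in self._raw_header_pairs
            }
        return self._headers
```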
Japronto uses the super-cool picohttpparser C library to parse the status line, headers, and chunked body of HTTP messages. picohttpparser directly uses the SSE4.2 text-processing instructions built into modern CPUs (almost any x86_64 CPU from the last ten years has them) to quickly match the boundaries of HTTP tokens. I/O is handled by the awesome uvloop, a wrapper around libuv which, at the lowest level, uses epoll to provide asynchronous read/write notifications.
picohttpparser relies on the SSE4.2 CMPESTRI x86_64 instruction for parsing
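The way uvloop plugs into asyncio can be sketched as follows; the echo protocol here is only an illustration, since Japronto's actual protocol class is written in C:

```python
import asyncio
import uvloop

class EchoProtocol(asyncio.Protocol):
    def connection_made(self, transport):
        self.transport = transport

    def data_received(self, data):
        # Called when the event loop (epoll underneath, via libuv) reports readable data;
        # Japronto would hand the bytes to picohttpparser here instead of echoing them.
        self.transport.write(data)

async def main():
    loop = asyncio.get_running_loop()
    server = await loop.create_server(EchoProtocol, '127.0.0.1', 8080)
    async with server:
        await server.serve_forever()

# Route all asyncio event loops through uvloop (libuv/epoll).
asyncio.set_event_loop_policy(uvloop.EventLoopPolicy())
asyncio.run(main())
```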
Python is a garbage-collected language, so care has to be taken when designing high-performance systems not to put unnecessary pressure on the garbage collector. Japronto's internals are designed to avoid reference cycles and to allocate and free memory as little as possible: it pre-allocates regions of memory to hold objects, and tries to reuse Python objects that are no longer referenced for subsequent requests instead of throwing them away.
These pre-allocated memory regions are sized to multiples of 4 KB. Internal structures are laid out carefully within these contiguous areas so that frequently used data sits close together, reducing the likelihood of cache misses.
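The object-reuse part of this strategy can be sketched with a simple free list; `factory` and `reset()` are hypothetical names, and Japronto's real implementation lives in C on top of the pre-allocated regions described above:

```python
class RequestPool:
    """Illustrative free list: recycle objects instead of letting the GC reclaim them."""

    def __init__(self, factory, size=1024):
        self._factory = factory                        # hypothetical callable creating fresh objects
        self._free = [factory() for _ in range(size)]  # pre-allocate up front

    def acquire(self):
        # Reuse a recycled object if one is available, otherwise create a new one.
        return self._free.pop() if self._free else self._factory()

    def release(self, obj):
        obj.reset()               # hypothetical method clearing per-request state
        self._free.append(obj)    # keep it for the next request instead of discarding it
```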
Japronto avoids unnecessary copies between buffers as much as possible and performs operations in place. For example, URL decoding of the path is done before route matching.
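As an illustration of such an in-place operation, here is a sketch of percent-decoding a path inside a mutable buffer without allocating a copy (Japronto does the equivalent in C; this version skips input validation):

```python
def percent_decode_inplace(buf: bytearray) -> int:
    """Decode %XX escapes in buf in place and return the new length (sketch only)."""
    read = write = 0
    while read < len(buf):
        if buf[read] == 0x25 and read + 2 < len(buf):        # 0x25 is '%'
            buf[write] = int(buf[read + 1:read + 3].decode(), 16)
            read += 3
        else:
            buf[write] = buf[read]
            read += 1
        write += 1
    del buf[write:]        # shrink to the decoded length; no second buffer needed
    return write
```

For example, `bytearray(b'/hello%20world')` becomes `b'/hello world'` inside the same buffer.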
Open source contributors, I need your help.
I have been working on Japronto continuously for more than three months, on weekdays and weekends alike. Apart from my day job, I have devoted all my time and energy to this project.
I think it's time to share the fruits of my labor with the community.
Japronto has reliably implemented the following features: