Pyparallel is a research project initiated by Trent Nelson. Its goal is to bring the power of Windows I/O completion ports (IOCP) to Python in a way that delivers high-performance asynchronous support.
Python's asynchronous support has a problem: it is designed around the UNIX/Linux model of asynchronous, non-blocking I/O, in which a thread repeatedly polls for incoming data and then dispatches it to the code that handles it. Linux is tuned for this pattern, but on Windows machines the approach is a disaster for performance, because copying the data over to the thread that actually handles the task is expensive.
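As a rough illustration of the readiness-based model described above, here is a minimal sketch using Python's standard `selectors` module: one thread polls for readiness, then dispatches to a callback, which performs the actual read itself. The helper names `poll_and_dispatch` and `on_readable` are hypothetical, and this is ordinary CPython code, not Pyparallel code.

```python
import selectors
import socket


def poll_and_dispatch(sel, timeout=1.0):
    """One iteration of a readiness-based loop: poll, then dispatch each ready socket."""
    for key, _events in sel.select(timeout):
        handler = key.data  # callback stored when the socket was registered
        handler(key.fileobj)


received = []


def on_readable(sock):
    # The OS only reported "data is ready"; the application still has to read it.
    received.append(sock.recv(1024))


sel = selectors.DefaultSelector()
reader, writer = socket.socketpair()
reader.setblocking(False)
sel.register(reader, selectors.EVENT_READ, data=on_readable)

writer.sendall(b"hello")
poll_and_dispatch(sel)

# Clean up the selector and both ends of the socket pair.
sel.unregister(reader)
sel.close()
reader.close()
writer.close()
```

The point of the sketch is the two-step shape: readiness notification first, then a separate copy of the data by whichever thread runs the handler.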
What Pyparallel brings is the true asynchrony of native IOCP. Under the IOCP model there is one thread per CPU core. Each thread handles the completion of an I/O request (for example, copying data from the network card) and then executes the application-level callback associated with that request.
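The completion-based shape can be approximated in plain CPython with `concurrent.futures.ThreadPoolExecutor`, sized to one worker per core. This is only an analogy, assuming a pool thread stands in for an IOCP worker: the same thread finishes the (simulated) I/O and then runs the callback for that request. `fake_read`, `handle`, and `read_then_callback` are hypothetical names, and under CPython's GIL these workers still do not execute Python code in parallel.

```python
import os
import threading
from concurrent.futures import ThreadPoolExecutor


def read_then_callback(io_operation, callback):
    """Completion-style work item: the same pool thread completes the I/O
    and then runs the application callback for that request."""
    data = io_operation()
    return callback(data, threading.current_thread().name)


def fake_read():
    # Hypothetical stand-in for copying bytes off the network card.
    return b"payload"


def handle(data, thread_name):
    # Hypothetical application-level callback.
    return len(data), thread_name


# One worker per core, mirroring the IOCP "one thread per core" guideline.
pool = ThreadPoolExecutor(max_workers=os.cpu_count() or 1)
futures = [pool.submit(read_then_callback, fake_read, handle) for _ in range(4)]
results = [f.result() for f in futures]
pool.shutdown()
```

Unlike the polling sketch, no separate dispatcher thread touches the data: whichever worker completes the request also handles it.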
This alone is not enough to scale Python across cores; the problem posed by the GIL (Global Interpreter Lock) must also be addressed, or we are still limited to executing one thread at a time. Replacing the GIL with fine-grained locks would be worse, and software transactional memory, as in PyPy, tends to end up with one thread making progress while the other N-1 threads keep retrying. So another solution is needed.
The Pyparallel team's solution is to disallow free thread creation: applications cannot create new threads on their own. Instead, parallel operations are tied to the asynchronous callback mechanism and to the concept of parallel contexts.
Before diving into parallel contexts, note that they take turns with the main thread: when parallel contexts are running, the main thread is not, and vice versa. The main thread is what you would think of in normal Python development: it holds the GIL and has full access to the global namespace.
In contrast, a parallel context has only read-only access to the global namespace. This means developers need to be aware of whether a given object belongs to the main thread or to a parallel context. COM programmers who have dealt with apartment threading models will be familiar with this pain.
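To make the read-only rule concrete, the sketch below hands a task a read-only view of a namespace using the standard `types.MappingProxyType`. This is an analogy only, assuming a dict plays the role of the global namespace; it is not how Pyparallel enforces the restriction internally, and `CONFIG` and `parallel_task` are hypothetical names.

```python
from types import MappingProxyType

# Hypothetical "global namespace" owned by the main thread.
CONFIG = {"greeting": "hello"}


def parallel_task(global_view):
    # Reading from the shared namespace needs no lock...
    value = global_view["greeting"]
    # ...but any attempt to write through the view is rejected.
    try:
        global_view["greeting"] = "changed"
    except TypeError:
        return value, "write rejected"
    return value, "write allowed"


result = parallel_task(MappingProxyType(CONFIG))
```

Here `result` is `("hello", "write rejected")`, and `CONFIG` is left unchanged.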
For non-I/O tasks, the main thread queues work with the Async.submit_work function and then switches to the parallel contexts by calling Async.run. This suspends the main thread and activates the parallel interpreter. Multiple parallel contexts can run concurrently, and the Windows operating system manages the thread pool.
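The submit-then-run flow can be mimicked in plain CPython to show its control flow. `MockAsync` below is a hypothetical stand-in, not Pyparallel's actual API, and its signatures are assumptions: `submit_work` only queues a callable, and `run` blocks the calling thread until every queued item has finished on a pool, echoing "suspend the main thread, activate the parallel contexts". (The object is not named `async` because that is a reserved keyword in Python 3.7 and later.)

```python
import os
from concurrent.futures import ThreadPoolExecutor


class MockAsync:
    """Hypothetical stand-in for the submit_work/run flow; NOT Pyparallel's API."""

    def __init__(self):
        self._queue = []

    def submit_work(self, func, *args):
        # Queue the work; nothing executes yet.
        self._queue.append((func, args))

    def run(self):
        # The caller blocks here until all queued work completes on the pool.
        work, self._queue = self._queue, []
        with ThreadPoolExecutor(max_workers=os.cpu_count() or 1) as pool:
            futures = [pool.submit(func, *args) for func, args in work]
            return [f.result() for f in futures]


scheduler = MockAsync()
scheduler.submit_work(sum, range(10))
scheduler.submit_work(max, [3, 9, 4])
results = scheduler.run()  # [45, 9]
```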
Parallelism alongside the GIL
It is important to note that no additional processes are created here. Although multiprocessing is common in Python development, Pyparallel keeps everything in a single process to avoid the cost of inter-process communication. Normally this would not be possible, because the CPython interpreter is not thread-safe in several respects:
Global static data is used extensively
Reference counts are not updated atomically
Objects are not protected by locks
Garbage collection is not thread-safe
Creation of interned strings is not thread-safe
The bucket memory allocator is not thread-safe
The arena memory allocator is not thread-safe
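To see why a non-atomic reference count is a problem, consider that an increment like `ob_refcnt++` is really a load, an add, and a store. The deterministic simulation below (hypothetical function names, plain Python variables standing in for a C field) plays out one unlucky interleaving of two threads and shows the classic lost update.

```python
def serial_incref(refcnt):
    """Two threads incrementing one after the other: no overlap."""
    refcnt += 1  # thread A
    refcnt += 1  # thread B
    return refcnt


def interleaved_incref(refcnt):
    """Two threads running a non-atomic load/add/store, interleaved badly."""
    a_loaded = refcnt      # thread A loads the count
    b_loaded = refcnt      # thread B loads it before A has stored
    refcnt = a_loaded + 1  # thread A stores its result
    refcnt = b_loaded + 1  # thread B stores, overwriting A's increment
    return refcnt


serial_incref(1)       # 3
interleaved_incref(1)  # 2, not 3: one reference went uncounted
```

An undercounted reference means an object can be freed while still in use, which is why the interpreter needs the GIL (or atomic operations) around these updates.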
Greg Stein tried to solve the problem by adding fine-grained locking to Python 1.4, but his patch slowed single-threaded code by 40%, so it was rejected. Trent Nelson therefore took a different approach. On the main thread, the GIL works as usual. When code runs in a parallel context, however, thread-safe alternatives are substituted for the interpreter's core functions.
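The substitution idea can be sketched as a dispatch check: a core function asks "am I on the main thread?" and, if not, routes to a thread-safe alternative. The Python decorator below is a hypothetical illustration of that routing only; Pyparallel does this inside the C interpreter, and `parallel_safe`, `alloc`, and `px_alloc` are invented names.

```python
import functools
import threading


def parallel_safe(alternative):
    """Hypothetical: route a core function to a thread-safe alternative off the main thread."""
    def decorator(main_impl):
        @functools.wraps(main_impl)
        def dispatch(*args, **kwargs):
            if threading.current_thread() is threading.main_thread():
                return main_impl(*args, **kwargs)  # normal, GIL-protected path
            return alternative(*args, **kwargs)    # parallel-context path
        return dispatch
    return decorator


def px_alloc(n):
    return ("parallel heap", n)


@parallel_safe(px_alloc)
def alloc(n):
    return ("main heap", n)


from_main = alloc(16)
from_worker = []
worker = threading.Thread(target=lambda: from_worker.append(alloc(32)))
worker.start()
worker.join()
```

Because the check is a single comparison on the main-thread path, it hints at how such a scheme can keep single-threaded overhead very small.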
The overhead of Trent's approach is about 0.01%, far lower than that of Greg's. By comparison, PyPy's software transactional memory probably costs 200% to 500% in single-threaded code.
An interesting aspect of the design is that code running in a parallel context does not need to acquire a lock to read objects in the global namespace. In exchange, it can only read them.
Parallel contexts have no garbage collector
To avoid locks around memory allocation, object access, and garbage collection, Pyparallel uses a shared-nothing model. Each parallel context has its own heap, and no garbage collector is associated with a parallel context. In practice this means:
Memory is allocated with a simple block allocator: each allocation just advances a pointer.
New pages of 4 KB or 2 MB are allocated as needed, depending on the parallel context's large-page setting.
Reference counting is not used.
When the parallel context ends, all of its pages are released at once.
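The allocation scheme in the list above is essentially a bump-pointer arena. The sketch below simulates one in Python, assuming `page_size` stands in for the 4 KB versus 2 MB large-page choice; it tracks offsets into `bytearray` pages rather than managing real memory, and `BumpArena` is a hypothetical name, not a Pyparallel type.

```python
class BumpArena:
    """Per-context arena sketch: allocating advances an offset; release drops every page."""

    def __init__(self, page_size=4096):
        self.page_size = page_size
        self.pages = []
        self.offset = 0

    def alloc(self, size):
        if size > self.page_size:
            raise ValueError("allocation larger than one page")
        if not self.pages or self.offset + size > self.page_size:
            # Current page is full (or none exists yet): take a fresh page.
            self.pages.append(bytearray(self.page_size))
            self.offset = 0
        start = self.offset
        self.offset += size  # the entire cost of an allocation
        return len(self.pages) - 1, start  # (page index, offset within page)

    def release(self):
        # Context finished: free all pages at once, with no per-object bookkeeping.
        self.pages.clear()
        self.offset = 0


arena = BumpArena(page_size=4096)
first = arena.alloc(1000)   # (0, 0)
second = arena.alloc(1000)  # (0, 1000)
third = arena.alloc(3000)   # does not fit on page 0, so (1, 0)
pages_before_release = len(arena.pages)
arena.release()
```

Note what is missing: no free list, no per-object header, no reference count. That absence is what makes the allocator fast, and it only works because nothing allocated in the arena outlives the context.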
This design avoids the cost of a thread-safe garbage collector or thread-safe reference counting. It also makes possible the block allocator described above, which is arguably the fastest way to allocate memory.
The Pyparallel team believes this design can succeed because parallel contexts are intended for short-lived, narrowly scoped work. Good examples are a parallel sorting algorithm or a web request handler.
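A parallel sort shows the kind of short-lived, narrowly scoped work meant here: each task sorts one slice and finishes. The sketch below uses ordinary CPython threads and `heapq.merge`; `parallel_sort` is a hypothetical helper. It illustrates only the shape of the workload. Returning the sorted slices to the calling thread is fine in CPython, but it is not how results would be handed back in Pyparallel, where objects created in a parallel context must not escape to the main thread.

```python
import heapq
import os
from concurrent.futures import ThreadPoolExecutor


def parallel_sort(data, chunks=None):
    """Sort slices concurrently, then merge them on the calling thread."""
    chunks = chunks or (os.cpu_count() or 1)
    size = max(1, -(-len(data) // chunks))  # ceiling division
    pieces = [data[i:i + size] for i in range(0, len(data), size)]
    # Each short-lived task sorts its own slice and produces a new list.
    with ThreadPoolExecutor(max_workers=chunks) as pool:
        sorted_pieces = list(pool.map(sorted, pieces))
    # The k-way merge runs back on the caller's thread.
    return list(heapq.merge(*sorted_pieces))


parallel_sort([5, 3, 9, 1, 7, 2, 8], chunks=3)  # [1, 2, 3, 5, 7, 8, 9]
```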
For this design to work correctly, objects created in a parallel context must not escape into the main thread. This is guaranteed by restricting parallel contexts to read-only access to the global namespace.