Http://randomtaste.appspot.com/view/page/coproc
Introduction
This article introduces the design ideas behind the concurrency model of coproc, a C++ service framework still under development. Starting from libevent and the basic reactor model, coproc step by step builds lightweight processes and a concurrency model similar to Unix fork-wait, then uses the ucontext coroutine mechanism to implement real "process" context switching. In this way the model evolves from event-driven to sequential processing, and from asynchronous callbacks to synchronous waiting. It implements the traditional UNIX concurrency abstraction while keeping high runtime efficiency.
Reactor
Reactor is a common concurrent processing model that organizes business logic into event-driven objects. A reactor that listens for and accepts new TCP connections looks like this:
class ListenReactor : public Reactor
{
    ...
    // Called back when listen_fd is readable, i.e. a connection is pending.
    void OnSockEvent(int listen_fd, int ev)
    {
        int sock = accept(listen_fd, NULL, NULL);
        ...
    }
};
libevent serves as the event driver. With a thin wrapper, a reactor can register the events of a handle and have OnSockEvent called back. This code works, and it is very simple.
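The registration and callback step can be sketched without any dependency. The sketch below is an assumption-laden stand-in, not the framework's code: it uses poll(2) instead of libevent, and EventLoop, Register, RunOnce, ByteReader, and DemoDispatch are hypothetical names. Only OnSockEvent comes from the article.

```cpp
#include <poll.h>
#include <unistd.h>
#include <cassert>
#include <map>
#include <vector>

// Hypothetical base class; only the OnSockEvent name comes from the article.
class Reactor {
public:
    virtual ~Reactor() {}
    virtual void OnSockEvent(int fd, int ev) = 0;
};

// Stand-in for the libevent wrapper: maps fds to reactors and dispatches.
class EventLoop {
public:
    void Register(int fd, Reactor* r) { handlers_[fd] = r; }

    // Polls once and calls back every reactor whose fd is readable.
    // Returns the number of callbacks made.
    int RunOnce(int timeout_ms) {
        std::vector<pollfd> fds;
        for (const auto& kv : handlers_) {
            pollfd p;
            p.fd = kv.first;
            p.events = POLLIN;
            p.revents = 0;
            fds.push_back(p);
        }
        if (poll(fds.data(), fds.size(), timeout_ms) <= 0) return 0;
        int fired = 0;
        for (const pollfd& p : fds) {
            if (p.revents & POLLIN) {
                handlers_[p.fd]->OnSockEvent(p.fd, p.revents);
                ++fired;
            }
        }
        return fired;
    }

private:
    std::map<int, Reactor*> handlers_;
};

// Example reactor: reads one byte whenever its fd becomes readable.
class ByteReader : public Reactor {
public:
    char last = 0;
    void OnSockEvent(int fd, int /*ev*/) override {
        if (read(fd, &last, 1) != 1) last = 0;
    }
};

// Demo: write to a pipe, let the loop dispatch, return what the reactor read.
char DemoDispatch() {
    int fds[2];
    if (pipe(fds) != 0) return 0;
    EventLoop loop;
    ByteReader reader;
    loop.Register(fds[0], &reader);
    char c = 'x';
    if (write(fds[1], &c, 1) != 1) return 0;
    loop.RunOnce(1000);
    close(fds[0]);
    close(fds[1]);
    return reader.last;
}
```

A real wrapper would hand the fd to libevent instead of rebuilding a pollfd array on every pass. The point here is only the shape: the reactor registers a handle and is called back when it is ready.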
Note: in the implementation, a task/message queue provides asynchronous dispatch of multiple types of messages and events. All event triggers, message sends, and method calls mentioned below (they are the same thing) follow this asynchronous model. A synchronous direct call is only a shortcut, and this will not be repeated below. The message queue is the key to supporting cross-thread collaboration and capacity control, and is also the focus of performance optimization. In the future it will also take on OS-like task scheduling (hence the name Scheduler), but that is unrelated to the questions discussed here.
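A minimal single-threaded sketch of such a queue is shown below. The article only says that a queue exists and is called Scheduler, so Post, Run, and DemoOrder are assumed names, and cross-thread support and capacity control are left out.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <string>
#include <utility>

// Hypothetical scheduler: a FIFO of pending tasks, drained by Run().
class Scheduler {
public:
    // Queue a task; it never runs inside the caller's stack frame.
    void Post(std::function<void()> task) { queue_.push_back(std::move(task)); }

    // Runs tasks until the queue is empty; tasks may post further tasks.
    // Returns the number of tasks executed.
    int Run() {
        int executed = 0;
        while (!queue_.empty()) {
            std::function<void()> task = std::move(queue_.front());
            queue_.pop_front();
            task();
            ++executed;
        }
        return executed;
    }

private:
    std::deque<std::function<void()>> queue_;
};

// Demo: "b" is posted from inside "a" but runs only after "a" and "c".
std::string DemoOrder() {
    Scheduler s;
    std::string log;
    s.Post([&] {
        s.Post([&] { log += "b"; });
        log += "a";
    });
    s.Post([&] { log += "c"; });
    s.Run();
    return log;
}
```

The demo returns "acb". A task posted from inside another task is deferred to the back of the queue, which is what makes every "call" asynchronous.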
Model Improvement
A task such as "listen for and accept new connections" is event-driven by nature, but a task such as "read n bytes from a socket" is not. It is a procedural task. Most of the work a web application service has to do belongs to the latter kind and does not fit the event-driven model. Organizing a web application with complex business processes and IO interactions into event-driven reactors is a great challenge for design, programming, debugging, and later maintenance alike.
We need to implement asynchronous call-return semantics, the most common pattern in procedural processing logic:
- Each task is handled by its own reactor. For example, instead of a single reactor that provides unified I/O services and manages every socket in the program, a dedicated reactor is created to complete each I/O interaction task.
- Standard events are introduced: OnInit(), triggered when the reactor task starts, and OnStop(), triggered when the reactor stops for any reason (it stops by itself or is killed).
- A link mechanism is introduced: if A is linked to B, A's OnLinkReturn(B) event is triggered when B stops.
With these, reactors implement a model that corresponds to call-return in procedural processing:
- Call: create a subtask reactor, link it to the parent task, and start it. We call this step spawn.
- Return: when the subtask completes, the parent task's OnLinkReturn event is triggered. In the handler, the parent obtains information such as the subtask's completion status and destroys the subtask reactor, which ends that reactor's lifecycle.
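The call-return pair described above can be sketched as a small base class. This is an illustration under assumptions rather than the framework's implementation: the global g_queue stands in for the message queue, and Start, RunAll, Child, Parent, and DemoCallReturn are hypothetical names. OnInit, OnStop, OnLinkReturn, Spawn, Return, and GetResult follow the names used in the article.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <utility>

static std::deque<std::function<void()>> g_queue;  // stand-in message queue

class Reactor {
public:
    virtual ~Reactor() {}
    virtual void OnInit() {}                          // task starts
    virtual void OnStop() {}                          // task stops
    virtual void OnLinkReturn(Reactor* /*child*/) {}  // a linked child stopped

    void Start() { g_queue.push_back([this] { OnInit(); }); }

    // Call: link the child to this reactor and start it asynchronously.
    void Spawn(Reactor* child) {
        child->parent_ = this;
        child->Start();
    }

    // Return: stop this reactor and notify the linked parent, if any.
    void Return(int result) {
        result_ = result;
        OnStop();
        Reactor* parent = parent_;
        if (parent) {
            g_queue.push_back([parent, this] { parent->OnLinkReturn(this); });
        }
    }

    int GetResult() const { return result_; }

private:
    Reactor* parent_ = nullptr;
    int result_ = 0;
};

// Drain the queue; each event may enqueue further events.
static void RunAll() {
    while (!g_queue.empty()) {
        std::function<void()> ev = std::move(g_queue.front());
        g_queue.pop_front();
        ev();
    }
}

class Child : public Reactor {
public:
    void OnInit() override { Return(21); }
};

class Parent : public Reactor {
public:
    Child child;
    void OnInit() override { Spawn(&child); }
    void OnLinkReturn(Reactor* src) override { Return(src->GetResult() * 2); }
};

// Demo: the parent "calls" the child and doubles the value it "returns".
int DemoCallReturn() {
    Parent root;
    root.Start();
    RunAll();
    return root.GetResult();
}
```

The child here is a member of the parent, so its lifetime is tied to the parent. Destroying a heap-allocated subtask inside OnLinkReturn, as the article describes, would follow the same flow.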
In this model we can use familiar ideas to build hierarchical encapsulation of logic. In a well-encapsulated design, apart from the low-level IO reactors, every other reactor only needs to handle the OnLinkReturn event: it updates its state based on the result returned by each subtask and drives the next step until the task is complete.
The processing of an RPC client can be implemented as follows:
class RPCClient : public Reactor
{
    int m_stage;

    // Start by sending the request.
    virtual void OnInit()
    {
        m_stage = STAGE_SEND;
        m_send_reactor.SetRequest(request);
        Spawn(&m_send_reactor);
    }

    // A subtask finished: either start receiving, or return the result.
    virtual void OnLinkReturn(Reactor *src)
    {
        if (m_stage == STAGE_SEND) {
            m_stage = STAGE_RECV;
            Spawn(&m_recv_reactor);
        } else {
            Return(m_recv_reactor.GetResult());
        }
    }
};
Proc
In the preceding example, we need to wait for the send to complete, start the read task, and fetch the result after the read completes. Waiting is a natural attribute of this task, and we need to support it. Of course, we will not block an OS process or thread on the wait. What we do is actually a small improvement to the code above.
class RPCClient : public Reactor
{
    int m_next_stage;

    void OnInit() { Process(STAGE_INIT); }

    // Each case is one non-blocking stage of the task.
    void Process(int stage)
    {
        switch (stage) {
        case STAGE_INIT:
            m_send_reactor.SetRequest(request);
            Spawn(&m_send_reactor);
            return SetWait(STAGE_SENT);
        case STAGE_SENT:
            Spawn(&m_recv_reactor);
            return SetWait(STAGE_RECV);
        case STAGE_RECV:
            return Return(m_recv_reactor.GetResult());
        }
    }

    // Record the stage to resume at when the subtask returns.
    void SetWait(int stage) { m_next_stage = stage; }

    void OnLinkReturn(Reactor *src) { Process(m_next_stage); }
};
Here we divide the business processing into several stages, each of which is non-blocking. A stage spawns a subtask, records which stage to enter when the subtask returns, puts itself in the waiting state, and exits. When the only event it waits for arrives, namely the return of the sub-reactor, OnLinkReturn performs one uniform job: it calls Process with the stage recorded last time to run the next stage, until the task is finished.
We did not introduce new features; we only restructured the code. Yet the handling of the only remaining event is now uniform and can be taken over by the framework, and the processing logic is organized into code that looks very much like the original procedure. The simple implementation above only waits for one subtask. With a slight modification, it can wait for all subtasks to return. These modes may not be the most flexible, but they are commonly needed and easy to use. Although this is only a simple code transformation, our model is no longer an event-handling model.
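The "wait for all subtasks" variant mentioned above could look like the sketch below, assuming a pending counter that Spawn increments and each child's return decrements. WaitAll, Start, Leaf, Sum, DemoWaitAll, and the global g_events queue are hypothetical names introduced for illustration.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <utility>

static std::deque<std::function<void()>> g_events;  // stand-in message queue

class Proc {
public:
    virtual ~Proc() {}
    virtual void Process(int stage) = 0;
    void Start() { g_events.push_back([this] { Process(0); }); }
    int result() const { return result_; }

protected:
    void Spawn(Proc* child) {
        child->parent_ = this;
        ++pending_;
        child->Start();
    }
    // Resume at `stage` once every spawned child has returned.
    void WaitAll(int stage) { next_stage_ = stage; }
    void Return(int r) {
        result_ = r;
        Proc* p = parent_;
        if (p) g_events.push_back([p] { p->OnLinkReturn(); });
    }

private:
    // Taken over by the framework: count returns, then run the next stage.
    void OnLinkReturn() {
        if (--pending_ == 0) Process(next_stage_);
    }
    Proc* parent_ = nullptr;
    int pending_ = 0, next_stage_ = 0, result_ = 0;
};

class Leaf : public Proc {
public:
    explicit Leaf(int v) : value_(v) {}
    void Process(int) override { Return(value_); }
private:
    int value_;
};

class Sum : public Proc {
public:
    Leaf a{1}, b{2}, c{3};
    void Process(int stage) override {
        switch (stage) {
        case 0:
            Spawn(&a);
            Spawn(&b);
            Spawn(&c);
            return WaitAll(1);
        case 1:
            return Return(a.result() + b.result() + c.result());
        }
    }
};

// Demo: three children run concurrently; the parent resumes after the last.
int DemoWaitAll() {
    Sum root;
    root.Start();
    while (!g_events.empty()) {
        std::function<void()> ev = std::move(g_events.front());
        g_events.pop_front();
        ev();
    }
    return root.result();
}
```

Because children start through the queue, none of them can return before the current stage has called WaitAll, so the counter never reaches zero prematurely.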
I call this the Spawn-Wait mode, or fork-wait mode. It is the simplest and most direct way to implement concurrency in UNIX: fork a bunch of child processes to do the work, then wait until all of them return. We introduced the concept of process state into the reactor: ready, wait, and done. We implemented basic process semantics in a very lightweight way. The difference is that we can now create thousands of processes very efficiently and let them run concurrently. I call this a lightweight process. In the implementation it is just a very simple class derived from Reactor, called Proc.
We organized tasks into a tree of parent and child processes/reactors, which also gives C++ a new way to handle its notoriously troublesome resource management. In the current implementation, a Proc has a timer (you can think of it as sending SIGALRM) that forcibly terminates the Proc when it times out. Because a Proc maintains the list of all its pending sub-procs/reactors, it terminates all of these children as well and effectively reclaims all resources (memory, handles, registrations with libevent).
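The timeout-driven cleanup of a Proc tree can be sketched as follows. This is a simplified illustration under assumptions: the timer is simulated by passing the current time to OnTimer, resource release is reduced to an OnStop hook, and Kill, OnTimer, Worker, and DemoTimeout are hypothetical names.

```cpp
#include <cassert>
#include <vector>

static int g_released = 0;  // counts OnStop() calls in the demo

class Proc {
public:
    explicit Proc(long deadline_ms) : deadline_ms_(deadline_ms) {}
    virtual ~Proc() {}

    // Hook where a real Proc would free memory, close handles, and
    // remove its libevent registrations.
    virtual void OnStop() {}
    bool done() const { return done_; }

    // Track the child so it can be terminated together with this Proc.
    void Spawn(Proc* child) { children_.push_back(child); }

    // Forced termination: pending children are killed before the parent stops.
    void Kill() {
        if (done_) return;
        done_ = true;
        std::vector<Proc*> pending;
        pending.swap(children_);
        for (Proc* child : pending) child->Kill();
        OnStop();
    }

    // Stand-in for the timer callback; now_ms is supplied by the caller.
    void OnTimer(long now_ms) {
        if (now_ms >= deadline_ms_) Kill();
    }

private:
    long deadline_ms_;
    bool done_ = false;
    std::vector<Proc*> children_;
};

class Worker : public Proc {
public:
    using Proc::Proc;
    void OnStop() override { ++g_released; }
};

// Demo: root times out at 100 ms and takes its whole subtree with it.
// Returns the number of Procs that released their resources, or -1.
int DemoTimeout() {
    g_released = 0;
    Worker root(100), a(1000), b(1000), leaf(1000);
    root.Spawn(&a);
    root.Spawn(&b);
    a.Spawn(&leaf);
    root.OnTimer(50);   // before the deadline: nothing happens
    if (g_released != 0) return -1;
    root.OnTimer(100);  // deadline reached: the whole tree is stopped
    bool all_done = root.done() && a.done() && b.done() && leaf.done();
    return all_done ? g_released : -1;
}
```

Killing the root is enough to stop all four Procs, because each one knows its own pending children.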
Coproc
However, the processing mode above is still too clumsy. To put it bluntly, relying on stage jumps amounts to writing programs with goto, which makes it hard to organize more complex logic or to port existing synchronous business code.
In Proc, the stage is used to manage the processing context: when a result returns, the stage leads back to the appropriate processing logic. In fact, Linux already provides a userland context-switch mechanism, ucontext. For details, see the makecontext(3) family of manual pages.
Using this mechanism, we can implement yield. Yield can switch away from the current flow at any point and put the Proc into the waiting state. When OnLinkReturn wakes it up, the context saved by yield is restored and processing continues from the statement after yield. (Note: this yield is more powerful than the yield in Python, because it can be called from any depth of the call stack. The code below is an example.)
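A minimal sketch of such a yield on top of getcontext, makecontext, and swapcontext is shown here. It assumes Linux with glibc and a single coroutine. Resume, Helper, Body, DemoYield, and the 64 KiB stack size are illustrative choices, not part of coproc.

```cpp
#include <ucontext.h>
#include <cassert>
#include <string>
#include <vector>

static ucontext_t g_main_ctx;  // context of the code that resumes the coroutine
static ucontext_t g_co_ctx;    // context of the coroutine itself
static std::string g_log;

// Switch from the coroutine back to whoever resumed it.
static void Yield() { swapcontext(&g_co_ctx, &g_main_ctx); }

// Switch into the coroutine until it yields or finishes.
static void Resume() { swapcontext(&g_main_ctx, &g_co_ctx); }

static void Helper() {
    g_log += "b";
    Yield();  // suspends from a nested frame: Helper called by Body
    g_log += "d";
}

static void Body() {
    g_log += "a";
    Helper();
    g_log += "e";
}  // returning here switches to uc_link, i.e. g_main_ctx

// Demo: interleave the coroutine with its caller and record the order.
std::string DemoYield() {
    g_log.clear();
    std::vector<char> stack(64 * 1024);  // the reserved coroutine stack
    getcontext(&g_co_ctx);
    g_co_ctx.uc_stack.ss_sp = stack.data();
    g_co_ctx.uc_stack.ss_size = stack.size();
    g_co_ctx.uc_link = &g_main_ctx;
    makecontext(&g_co_ctx, Body, 0);

    Resume();  // runs "a" and "b", then yields
    g_log += "c";
    Resume();  // continues after Yield(): "d" and "e", then Body returns
    g_log += "f";
    return g_log;
}
```

The demo returns "abcdef". The yield happens inside Helper, one call below the coroutine entry point, and the whole stack is preserved across the switch.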
The RPC client example is not well suited to showing this, so here is a different one. The following logic calls DB reactors concurrently, two at a time, and stops once the total number of returned results reaches 1000.
class Searcher : public Coproc
{
    static const int NUM_DBS = 256;

    // Query two DBs concurrently and return their combined result count.
    int GetDouble(DB *db1, DB *db2)
    {
        Spawn(db1);
        Spawn(db2);
        Yield();  // wait here until the spawned DB reactors return
        return db1->GetNumResult() + db2->GetNumResult();
    }

    virtual int Main()
    {
        int total = 0;
        DB dbs[NUM_DBS];
        for (int i = 0; i < NUM_DBS; i += 2) {
            total += GetDouble(&dbs[i], &dbs[i + 1]);
            if (total >= 1000) {
                break;
            }
        }
        return total;
    }
};
As you can see, the code is basically the same as synchronous code. We call this model coproc.
- Compared with Proc, Coproc is closer to an ordinary process. The price is the stack space reserved for each ucontext and the cost of ucontext switching, so it is heavier.
- Compared with OS processes or threads, Coproc is less general. Switching points are controlled by you: without a yield, a long CPU-bound task will never be switched out and can block the whole system. It also does not automatically handle blocking interfaces such as read(2). In exchange, it consumes far fewer resources.
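To make the Searcher example concrete, here is one way a Coproc base class could combine Spawn, Yield, and OnLinkReturn on top of ucontext. Everything outside the Searcher logic is an assumption made for illustration: the g_events queue, the DB stand-in that always reports 100 results, Run, Entry, the 256 KiB stack, and DemoSearch are hypothetical, and Yield is assumed to wait for all spawned children.

```cpp
#include <ucontext.h>
#include <cassert>
#include <deque>
#include <functional>
#include <utility>
#include <vector>

static std::deque<std::function<void()>> g_events;  // stand-in event queue

class Coproc;

// Hypothetical stand-in for the article's DB reactor: reports n results.
class DB {
public:
    explicit DB(int n = 100) : n_(n) {}
    int GetNumResult() const { return n_; }
    void Start(Coproc* parent);  // completes asynchronously via g_events
private:
    int n_;
};

class Coproc {
public:
    Coproc() : stack_(256 * 1024) {}
    virtual ~Coproc() {}
    virtual int Main() = 0;

    // Start Main() on its own stack, then pump events until it finishes.
    int Run() {
        getcontext(&ctx_);
        ctx_.uc_stack.ss_sp = stack_.data();
        ctx_.uc_stack.ss_size = stack_.size();
        ctx_.uc_link = &sched_;  // where to continue when Entry() returns
        starting_ = this;
        makecontext(&ctx_, &Coproc::Entry, 0);
        swapcontext(&sched_, &ctx_);
        while (!done_ && !g_events.empty()) {
            std::function<void()> ev = std::move(g_events.front());
            g_events.pop_front();
            ev();
        }
        return result_;
    }

    // A child returned; wake the coroutine once nothing is pending.
    void OnLinkReturn() {
        if (--pending_ == 0) swapcontext(&sched_, &ctx_);
    }

protected:
    void Spawn(DB* db) {
        ++pending_;
        db->Start(this);
    }
    // Suspend until every spawned child has returned.
    void Yield() {
        if (pending_ > 0) swapcontext(&ctx_, &sched_);
    }

private:
    static void Entry() {
        Coproc* self = starting_;
        self->result_ = self->Main();
        self->done_ = true;
    }
    static Coproc* starting_;
    ucontext_t ctx_, sched_;
    std::vector<char> stack_;
    int pending_ = 0, result_ = 0;
    bool done_ = false;
};
Coproc* Coproc::starting_ = nullptr;

void DB::Start(Coproc* parent) {
    g_events.push_back([parent] { parent->OnLinkReturn(); });
}

// The Searcher from the article, with unchanged logic.
class Searcher : public Coproc {
    static const int NUM_DBS = 256;
    int GetDouble(DB* db1, DB* db2) {
        Spawn(db1);
        Spawn(db2);
        Yield();
        return db1->GetNumResult() + db2->GetNumResult();
    }
    int Main() override {
        int total = 0;
        DB dbs[NUM_DBS];
        for (int i = 0; i < NUM_DBS; i += 2) {
            total += GetDouble(&dbs[i], &dbs[i + 1]);
            if (total >= 1000) break;
        }
        return total;
    }
};

int DemoSearch() {
    Searcher s;
    return s.Run();
}
```

With 100 results per DB, the loop stops after five pairs and DemoSearch returns 1000. The event loop in Run never blocks. Only the coroutine's own flow is suspended at Yield, which is the trade-off the two points above describe.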
Application
Although so much has been built on top of the reactor, it is striking that, from the point of view of the underlying mechanism, the model has hardly changed: a reactor still handles events. The event handling of Proc and Coproc is merely taken over by the framework, and it is still non-blocking. For libevent, the whole system is unchanged. So at the bottom layer, libevent and the reactor pattern keep working the way they always did and preserve the system's concurrent processing capability. At the upper layer, Reactor and Coproc are mutually compatible and can call each other. Developers can choose the most suitable model according to the trade-offs they need.
Take a web application service as an example. Port listening is still implemented by a basic Reactor. The relatively stable, shared business framework and the data access layer are implemented as Procs. With all the complicated IO patterns hidden, the business logic can be implemented as Coprocs, and code written for an earlier synchronous framework can be ported easily. When a request arrives, the listening reactor directly creates a service-framework Proc to handle it. This is the most basic accept-fork model in UNIX. I like this beautiful and natural pattern, and now I can create thousands of Procs. The business-framework Proc directs the entire business process, which involves multiple business logic modules written as Coprocs. Although the code logic is synchronous, IO naturally runs in parallel.
Summary
From Reactor to Coproc, we kept weighing trade-offs. Through appropriate restrictions and conventions on the model, plus a few additional mechanisms, we evolved an event-driven model that is hard to design, program, and debug into one that is almost the same as the ordinary synchronous model. In this process we have, to a large extent, reinvented a wheel: the OS. We kept adding OS mechanisms to our implementation. In fact, both IO tasks and the OS itself are ultimately event-driven, by hardware interrupts. We simply use a slightly different, very lightweight way to turn event-driven processing into sequential processing, retracing asynchronously the path of synchronous blocking.
I think this is worth doing. The operating system itself provides a beautiful and mature environment for parallelism and, more importantly, abstractions that everyone is familiar with. However, as a general-purpose OS, *nix has to consider many more cases and provide more complete encapsulation and abstraction, so the basic unit of its concurrency model, the process, carries heavy overhead. In the past we had to compromise on this concurrency model: for example, we gave up forking a process per request in favor of process or thread pools, and avoided starting a bunch of processes for concurrent requests. Or we gave up the model completely, for example by adopting a fully event-driven model. My idea is that, under certain conditions, we can keep this mature model while drawing on how operating systems are implemented.
This idea is obviously influenced by exokernel research and by Erlang. The lightweight process model here is clearly not as flexible or powerful as Erlang's, but I believe it will be more familiar and easier to use for Unix programmers.
The model proposed in this article is similar to cerl, but the specific design differs greatly. The contribution of this article is to show the path from the mature open-source reactor model to the coproc model, with reference to OS design. It shows that choosing a model is not an either-or decision: the model can be chosen case by case and defined flexibly as needed (besides Proc, the reactor has also evolved into many other models, such as condition variables). The cerl project has a broad vision and hopes to build a complete framework or system, while coproc focuses on the model itself and leaves IO event handling (libevent), service protocols and descriptions (protobuf etc.), and the like to other mature open-source software.