Concepts and differences between Reactor and Proactor in network programming

Source: Internet
Author: User
Tags: callback, epoll, POSIX, socket

1. Definition

Two I/O multiplexing modes: Reactor and Proactor

Generally, an I/O multiplexing mechanism depends on an event demultiplexer. The demultiplexer separates I/O events from the event source and dispatches them to the corresponding read/write event handlers. Developers pre-register the events to be handled and their event handlers (or callback functions); the event demultiplexer is responsible for delivering request events to the event handlers. The two patterns built around the event demultiplexer are Reactor and Proactor: Reactor uses synchronous I/O, while Proactor uses asynchronous I/O.

In Reactor, the event demultiplexer waits for a file descriptor or socket to become ready for reading or writing, then passes the ready event to the corresponding handler, and the handler performs the actual read/write work.


In Proactor mode, the handler (or the event demultiplexer acting on its behalf) only initiates asynchronous read/write operations; the actual I/O is performed by the operating system. The parameters passed to the operating system include a user-defined buffer address and size, from which the OS takes the data to write, or into which it stores the data read from the socket. The event demultiplexer captures the I/O completion event and delivers it to the corresponding handler. For example, on Windows the handler initiates an asynchronous I/O operation and the event demultiplexer waits for the IOCompletion event. This typical asynchronous implementation relies on the operating system providing asynchronous APIs; we call it "system-level" or "true" asynchrony, because the application relies entirely on the operating system to perform the real I/O work.


The following example illustrates the difference between a Reactor and a Proactor, using a read operation (write operations are similar).

Implementing read in Reactor:

-Register the read-ready event and its event handler
-The event demultiplexer waits for events
-When the event arrives, the demultiplexer is activated and calls the handler registered for that event
-The event handler performs the actual read, processes the data, registers a new event if needed, and returns control

Implementing read in Proactor:

-The handler initiates an asynchronous read operation (note: the operating system must support asynchronous IO). The handler ignores IO readiness events and focuses only on completion events
-The event demultiplexer waits for the operation-completion event
-While the demultiplexer waits, the operating system performs the actual read in a parallel kernel thread, stores the result in the user-defined buffer, and finally notifies the demultiplexer that the read is complete
-The event demultiplexer calls the handler
-The event handler processes the data in the user-defined buffer, starts a new asynchronous operation if needed, and returns control to the demultiplexer

As you can see, the two patterns share the idea of event notification for an I/O event (telling some module that an I/O operation can be performed, or has been completed). Structurally they are also similar: the demultiplexer is responsible for submitting I/O operations (asynchronous) or querying whether a device is ready to operate (synchronous), and then calling back the handler when the condition is met. The difference is that in the asynchronous (Proactor) case, the callback means the I/O operation has already completed; in the synchronous (Reactor) case, the callback means the I/O device is ready to operate (readable or writable).

2. General understanding

Both the Proactor and Reactor frameworks can greatly simplify the development of network applications, but their focus differs.

In the Reactor framework, your operation is invoked before the actual I/O. For example, if the operation you define writes data to a socket, it is called when the socket is ready to accept data. In the Proactor framework, your operation is invoked after the actual I/O. For example, if the operation you define displays data read from a socket, it is called only after the read has completed.

Both Proactor and Reactor are design patterns in concurrent programming. In my opinion, both are used to dispatch I/O events, where an I/O event is an operation such as read or write, and "dispatch" means notifying an upper-layer module of an individual I/O event. The difference between the two is that Proactor is used with asynchronous I/O while Reactor is used with synchronous I/O.

3. Remarks

In fact, both patterns are embodied in ACE, an open-source network framework. To learn more about them, refer to the ACE source code; it is worth studying.



Differences between Reactor and Proactor


System I/O can be classified into three types: blocking, non-blocking synchronous, and non-blocking asynchronous.

Blocking I/O returns control to the caller only after the operation has completed.


Non-blocking synchronous I/O returns control to the caller immediately; the caller does not need to wait. The call yields one of two results: either it succeeds, or it returns an error code indicating that the resource is not currently available and the caller should wait or try again. For example, if a read() is issued on a socket with no data available, it immediately returns EWOULDBLOCK/EAGAIN, telling the caller "the data is not ready yet; please try again later."


Non-blocking asynchronous calls are slightly different: the function returns immediately, and the caller is told that the request has been started. The system completes the operation using other resources or threads and notifies the caller (for example, through a callback function) when it finishes. With POSIX aio_read(), the call returns immediately while the operating system performs the read in the background; that is, the work is handed over to the kernel.


Of the three types of I/O above, non-blocking asynchronous I/O provides the highest performance and the best scalability.


Two IO Multiplexing Solutions: Reactor and Proactor


Generally, an I/O multiplexing mechanism requires an event demultiplexer, whose role is to dispatch read/write events from the event sources to the corresponding handlers. It is like a courier shouting downstairs: "Who has a package? Come and get it." At the beginning, developers register the events they are interested in with the demultiplexer and provide the corresponding event handlers or callback functions; when the time comes, the demultiplexer delivers the requested events to those handlers or callbacks.


The two patterns that involve the event demultiplexer are called Reactor and Proactor. Reactor is based on synchronous I/O, while Proactor is tied to asynchronous I/O. In Reactor mode, the event demultiplexer waits for an event to occur (for example, a file descriptor or socket becoming readable or writable), then passes the event to the previously registered handler or callback, which performs the actual read/write.


In Proactor mode, the event handler (or the event demultiplexer on its behalf) directly initiates an asynchronous read/write operation (effectively a request), and the actual work is done by the operating system. When initiating the request, you must provide: a buffer to hold the data being read (or holding the data to be written), its size, and a callback to invoke when the request completes. The event demultiplexer learns of the request, silently waits for it to complete, and then forwards the completion event to the corresponding handler or callback. This asynchronous mode is typically implemented on top of the operating system's underlying asynchronous API, so we can call it "system-level" or "true" asynchrony, because the actual read/write is performed by the operating system.


Here is another example to better understand the differences between the Reactor and Proactor modes. We focus only on the read operation, since write is similar. Reactor works as follows:

An event handler declares interest in read events on a socket;

The event demultiplexer waits for the event to occur;

When the event occurs, the demultiplexer wakes up and notifies the handler;

The handler receives the notification and reads the data from the socket itself. If necessary, it declares interest in read events on the socket again and repeats the steps above;


Next, let's look at how the truly asynchronous Proactor works:

The event handler directly initiates an asynchronous read operation (of course, the operating system must support it). At this point the handler does not care about readiness events at all; it merely issues the request and waits for the completion event of that read. This handler is quite hands-off: it gives an order and does nothing further until someone else (the system) reports back that the job is done;

The event demultiplexer waits for the completion of the read (contrast this with the Reactor, which waits for readiness);

While the demultiplexer waits quietly, the operating system is already at work: it reads data from the target, stores it in the user-provided buffer, and finally notifies the demultiplexer, "I'm done";

The event demultiplexer notifies the handler: the thing you ordered has been handled;

The event handler finds that the data to be read is already in its own buffer, so everything is ready. If necessary, it initiates another asynchronous read as before and repeats the steps above.


The standard, classic Reactor mode:


Step 1) Wait for the event (the Reactor's job)

Step 2) Dispatch the "readable" event to the pre-registered event handler or callback (the Reactor's job)

Step 3) Read the data (the user code's job)

Step 4) Process the data (the user code's job)


A simulated Proactor mode:


Step 1) Wait for the event (the Proactor's job)

Step 2) Read the data (note: this is now the Proactor's job)

Step 3) Deliver the "data is ready" notification to the user's pre-registered handler, i.e. the event handler (the Proactor's job)

Step 4) Process the data (the user code's job)

On operating systems that do not provide an underlying asynchronous I/O API, this approach hides the differences between socket interfaces (in performance and otherwise) and exposes a fully usable, unified "asynchronous interface." This lets us develop a platform-independent, general-purpose interface.


So what are the differences between the two?

Simple and intuitive understanding:

1. Reactor mode waits for the event of interest to occur, then hands the actual operation over to the user-space application; the Reactor event demultiplexer cares only about events and leaves everything else to the application. Proactor mode cares only about the result returned after the operating system (kernel) completes the asynchronous, non-blocking operation;

2. Proactor requires asynchronous non-blocking syscalls (system calls), while Reactor uses synchronous non-blocking syscalls;


For the concepts of synchronous, asynchronous, blocking, and non-blocking I/O, take a look at this primer:


What are the differences between synchronous and asynchronous I/O, and between blocking and non-blocking I/O? Different people may give different answers; for example, Wikipedia treats asynchronous I/O and non-blocking I/O as the same thing. This is because people have different backgrounds and discuss the question in different contexts. So, to answer it properly, I will first limit the context of this article.

The context of this article is network I/O on Linux.

The most important reference for this article is Section 6.2, "I/O Models," of Richard Stevens's "UNIX Network Programming, Volume 1, Third Edition: The Sockets Networking API," which describes in detail the features and differences of the various I/O models. If your English is good enough, I recommend reading it directly; Stevens is famous for the depth and clarity of his writing, so there is nothing to worry about. The flowcharts in this article are also taken from that reference.

In that section, Stevens compares five I/O models:

Blocking IO

Nonblocking IO

IO multiplexing

Signal driven IO

Asynchronous IO

Because signal-driven IO is rarely used in practice, I will cover only the remaining four I/O models.

First, let's look at the objects and steps involved when I/O occurs.

For a network I/O operation (taking read as an example), two system objects are involved: the process (or thread) that invokes the I/O, and the system kernel. When a read occurs, it goes through two phases:

1. Waiting for data preparation (Waiting for the data to be ready)

2. Copy data from the kernel to the process (Copying the data from the kernel to the process)

Keep these two phases in mind, because the differences between the I/O models lie in how each behaves during these two phases.


Blocking IO


In Linux, all sockets are blocking by default. A typical read operation proceeds like this:


When the user process calls the recvfrom system call, the kernel starts the first phase of I/O: preparing the data. For network I/O, the data often has not arrived yet (for example, a complete UDP packet has not been received), so the kernel waits for enough data to arrive. During this time, the user process is blocked. When the kernel has the data ready, it copies it from kernel space to user memory and then returns the result; only then does the user process unblock and resume running.

So the defining feature of blocking I/O is that the process is blocked during both phases of the I/O.


Non-blocking IO


In Linux, a socket can be set to non-blocking. A read on a non-blocking socket proceeds like this:


As the figure shows, when the user process issues a read and the kernel's data is not ready, the kernel does not block the process; it returns an error immediately. From the process's perspective, the read returns a result at once, with no waiting. When the process sees that the result is an error, it knows the data is not ready, so it can issue the read again. Once the data is ready and the process's system call arrives again, the kernel immediately copies the data to user memory and returns.

So with non-blocking I/O, the user process must actively and repeatedly ask the kernel whether the data is ready.


IO multiplexing


"I/O multiplexing" may sound unfamiliar, but if I say select or epoll, you will probably understand. This style of I/O is also called event-driven I/O in some places. As we all know, the benefit of select/epoll is that a single process can handle the I/O of many network connections at once. The basic principle is that select/epoll continuously polls all the sockets it is responsible for, and when any socket has data, it notifies the user process. The flow is shown in the following figure:


When the user process calls select, the whole process blocks. Meanwhile, the kernel "monitors" all the sockets under select's watch; when the data in any of them is ready, select returns. The user process then calls a read operation to copy the data from the kernel to user space.

This figure is not much different from the blocking I/O one; in fact, it is slightly worse, because two system calls (select and recvfrom) are needed, whereas blocking I/O uses only one (recvfrom). The advantage of select, however, is that it can handle many connections at once. (So if the number of connections to handle is not very high, a web server using select/epoll will not necessarily outperform one using multi-threading plus blocking I/O, and may even have higher latency. The advantage of select/epoll is not faster handling of a single connection, but the ability to handle many more connections.)

In the I/O multiplexing model, each socket is generally set to non-blocking in practice. However, as the figure above shows, the user process is blocked the whole time; it is just blocked by the select call rather than by socket I/O.


Asynchronous I/O


Asynchronous I/O is rarely used in Linux. Let's look at its flow:


After the user process initiates the read, it can immediately go do other things. From the kernel's perspective, when it receives the asynchronous read it returns immediately, so the user process is not blocked at all. The kernel then waits for the data to be ready and copies it to user memory; when all of this is done, it sends a signal to the user process telling it that the read is complete.

With all four I/O models introduced, let's return to the earlier questions: what is the difference between blocking and non-blocking, and between synchronous and asynchronous I/O?

Answer the simplest one first: blocking vs. non-blocking. The earlier discussion makes the difference clear: a blocking I/O call blocks the corresponding process until the operation completes, while a non-blocking I/O call returns immediately even while the kernel is still preparing the data.

Before describing the difference between synchronous and asynchronous I/O, we must first define the two. The definitions given by Stevens (actually the POSIX definitions) are:

A synchronous I/O operation causes the requesting process to be blocked until that I/O operation completes;

An asynchronous I/O operation does not cause the requesting process to be blocked;

The difference is that synchronous I/O blocks the process while performing the "I/O operation." By this definition, the blocking I/O, non-blocking I/O, and I/O multiplexing described earlier all count as synchronous I/O. Some may object that non-blocking I/O is not blocked. Here is the "tricky" part: the "I/O operation" in the definition refers to the real I/O operation, that is, the recvfrom system call in our example. With non-blocking I/O, if the kernel's data is not ready, the recvfrom call does not block the process; but once the data is ready, recvfrom copies it from the kernel to user memory, and during that copy the process is blocked. Asynchronous I/O is different: when the process initiates the operation, the call returns at once and the process ignores it entirely until the kernel sends a signal saying the I/O is complete. Throughout the whole sequence, the process is never blocked.

The comparison of the I/O models is shown in the following figure:


After this introduction, the difference between non-blocking I/O and asynchronous I/O should be quite clear. With non-blocking I/O, although the process is not blocked most of the time, it must actively check, and once the data is ready it must actively call recvfrom again to copy the data to user memory. Asynchronous I/O is completely different: the user process hands the entire I/O operation off to someone else (the kernel), who sends a signal when it finishes. During that time, the process neither checks the I/O status nor copies the data itself.

Finally, here are a few (not entirely accurate) analogies to illustrate the four I/O models:

Four people, A, B, C, and D, go fishing:

A uses the most old-fashioned fishing rod, so he has to watch it the whole time and pull the rod the moment a fish bites;

B's rod has an indicator that shows whether a fish has bitten, so B can chat with the girl next to him and just glance at the indicator now and then; if it shows a bite, he quickly pulls the rod;

C's rod is similar to B's, but he has a better idea: he sets up several rods at once and stands by; as soon as any indicator shows a bite, he pulls the corresponding rod;

D is rich: he simply hires someone to fish for him, and the moment that person catches a fish, he sends D a text message.

