In real-world software development, the following scenario comes up all the time: one module is responsible for generating data, and that data is processed by another module (module here is used loosely; it can be a class, a function, a thread, a process, and so on). The module that produces the data is naturally called the producer, and the module that processes the data is called the consumer.
Abstracting out producers and consumers alone does not yet make the producer/consumer pattern, though. The pattern also needs a buffer between the producer and the consumer as an intermediary: the producer puts data into the buffer, and the consumer takes data out of it.
◇ Decoupling
Suppose the producer and the consumer are two classes. If the producer calls a method of the consumer directly, the producer depends on the consumer (that is, they are coupled). If the consumer's code changes later, it may affect the producer. If instead both depend only on a buffer, they no longer depend on each other directly, and the coupling is correspondingly reduced.
◇ Supporting concurrency
Having the producer call the consumer directly has another drawback. Since a function call is synchronous (blocking), the producer has to wait until the consumer's method returns. If the consumer processes data slowly, the producer wastes its time idling.
With the producer/consumer pattern, the producer and the consumer can be two independent concurrent entities (the common kinds of concurrency are processes and threads; later posts discuss applications under both). The producer drops the data it has generated into the buffer and can immediately move on to producing the next piece of data, largely independent of the consumer's processing speed.
In fact, this model is mainly used to deal with concurrency problems.
◇ Smoothing out uneven load
The buffer has one more benefit. If data is produced in bursts, sometimes fast and sometimes slow, the buffer proves its worth: when data arrives faster than the consumer can process it, the unhandled data can sit in the buffer temporarily; once the producer slows down, the consumer gradually catches up.
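The three benefits above can be sketched with Python threads and a bounded queue as the buffer. This is a minimal illustration, not code from the original post; the item values, the doubling step, and the buffer size are all arbitrary choices for the demo.

```python
# Minimal producer/consumer sketch: two threads decoupled by a bounded queue.
import queue
import threading

def producer(buf: queue.Queue, n: int) -> None:
    for i in range(n):
        buf.put(i)        # blocks only when the buffer is full
    buf.put(None)         # sentinel: tells the consumer there is no more data

def consumer(buf: queue.Queue, out: list) -> None:
    while True:
        item = buf.get()  # blocks only when the buffer is empty
        if item is None:
            break
        out.append(item * 2)  # "process" the data unit

results: list = []
buf: queue.Queue = queue.Queue(maxsize=4)  # a small buffer absorbs bursts
t1 = threading.Thread(target=producer, args=(buf, 10))
t2 = threading.Thread(target=consumer, args=(buf, results))
t1.start(); t2.start()
t1.join(); t2.join()
print(results)  # [0, 2, 4, 6, 8, 10, 12, 14, 16, 18]
```

Note that the producer never calls the consumer: both sides talk only to the queue, which is exactly the decoupling described above.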
Data unit
Simply put, a data unit is whatever the producer puts into the buffer in one operation, and whatever the consumer takes out of the buffer in one operation. In the letter-sending example from an earlier post, each individual letter can be regarded as a data unit.
★ Features of the data unit
To analyze the data unit, consider the following features:
◇ Association with a business object
First, a data unit must be associated with some business object. When considering this issue, you must understand the business logic of the current producer/consumer scenario in order to make an appropriate judgment.
Because the business logic of sending letters is fairly simple, it is easy to tell what the data unit is. Real systems are rarely that forgiving: most business logic is complex, with business objects at many layers and of many types, and in that case the decision is harder.
This step matters: choosing the wrong business object can significantly increase the complexity of the subsequent design and coding, raising development and maintenance costs.
◇ Completeness
Completeness means that the integrity of a data unit is guaranteed during transfer: either the entire data unit is delivered to the consumer, or none of it is. Delivering only part of a unit is not allowed.
In the letter example, you cannot drop half a letter into the mailbox; likewise, when the postman takes a letter out of the mailbox, he cannot take only part of it.
◇ Independence
Independence means that data units do not depend on one another: the failure to transfer one data unit should affect neither the units already transferred nor the units not yet transferred.
Why would a transfer fail at all? If the producer outpaces the consumer for a while, the buffer keeps growing until it reaches its upper limit, after which new data units get discarded. If the data units are independent, the remaining units keep being processed normally until the producer slows down and the situation recovers. If, on the other hand, there is coupling between data units, the discarded units affect the processing of later units, and the program logic becomes much more complicated.
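The drop-on-full behavior just described can be sketched as follows. This is an illustrative fragment, not part of the original post; the buffer size and item values are made up.

```python
# Sketch: when the bounded buffer is full, the producer discards the new
# data unit instead of blocking. Because the units are independent, the
# units already in the buffer are still processed normally.
import queue

buf = queue.Queue(maxsize=3)
dropped = []
for unit in range(5):
    try:
        buf.put_nowait(unit)      # raises queue.Full once the buffer is full
    except queue.Full:
        dropped.append(unit)      # discard; earlier units are unaffected

processed = []
while not buf.empty():
    processed.append(buf.get_nowait())

print(processed, dropped)  # [0, 1, 2] [3, 4]
```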
◇ Granularity
As mentioned earlier, a data unit needs to be associated with a business object. Must data units and business objects correspond one to one? In many cases they do.
However, sometimes, for reasons such as performance, N business objects are packed into a single data unit. How to choose this N is the question of granularity. Picking the granularity takes care: too coarse a granularity leads to a certain kind of waste; too fine a granularity causes performance problems. The trade-off depends on several factors, and often on empirical values as well.
Take the letter example again. If the granularity is too fine (say, 1), the postman takes out only one letter per trip; when there are many letters, he has to make many round trips, wasting time.
If the granularity is too coarse (say, 100), the sender has to wait until 100 letters have accumulated before they go into the mailbox. Someone who rarely writes letters would wait a very long time, which is hardly pleasant.
You might ask whether the producer and the consumer can use different granularities (say, 1 for the sender and 100 for the postman). In theory they can, but in some cases this increases the complexity of the program logic and the code.
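Batching N business objects into one data unit can be sketched like this. The function name, the batch size, and the "letter" strings are illustrative choices, not anything from the original post.

```python
# Sketch of granularity: pack batch_size "letters" into one data unit
# before putting it in the buffer. batch_size is the N discussed above.
import queue

def batch_producer(letters, buf, batch_size):
    batch = []
    for letter in letters:
        batch.append(letter)
        if len(batch) == batch_size:
            buf.put(batch)   # one data unit = batch_size letters
            batch = []
    if batch:                # flush a partial final batch
        buf.put(batch)
    buf.put(None)            # sentinel: end of data

buf = queue.Queue()
batch_producer([f"letter-{i}" for i in range(7)], buf, batch_size=3)

units = []
while (unit := buf.get()) is not None:
    units.append(unit)
print([len(u) for u in units])  # [3, 3, 1]
```

Flushing the partial final batch is one way to avoid the "wait for 100 letters" problem; another common variant adds a timeout so a batch is emitted after a maximum delay.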
Queue buffers
When a single producer corresponds to a single consumer, a queue (FIFO) can be used as the buffer.
★ Thread-based approach
Let's first look at using a queue between concurrent threads, and the associated pros and cons.
◇ Performance of memory allocation
In the thread-based approach, the producer and the consumer are each a thread. The producer writes data into one end of the queue (hereafter push), and the consumer reads data from the other end (hereafter pop). When the queue is empty, the consumer idles (takes a break); when the queue is full (its maximum length has been reached), the producer idles. The overall process is not complicated.
So what is the problem with this process? One major issue is the performance overhead of memory allocation. With a typical queue implementation, each push may involve allocating heap memory and each pop may involve releasing it. If producer and consumer are both busy, pushing and popping frequently, the memory-allocation overhead is considerable: allocating heap memory (new or malloc) involves locking and may involve user-mode/kernel-mode switches.
◇ Synchronization and Mutex performance
In addition, because the two threads share one queue, thread issues such as synchronization, mutual exclusion, and deadlock naturally come into play, along with the performance overhead of synchronization and mutexes. In many cases, primitives such as semaphores and mutexes are themselves fairly expensive (in some implementations they may also trigger user-mode/kernel-mode switches). If, as just mentioned, producer and consumer are both busy, this overhead should not be underestimated.
How to deal with all that? Stay tuned for a later post, "Producer/consumer pattern [4]: double buffering".
◇ When queues fit
So, if your data flow is not very large, the benefits of a queue buffer are obvious: clear logic, simple code, and easy maintenance.
★ Process-based approach
Having covered the thread-based approach, let's move on to process-based concurrency.
A cross-process producer/consumer pattern depends heavily on the specific inter-process communication (IPC) mechanism. There are too many kinds of IPC to enumerate them all here, so let's pick a few that are cross-platform and well supported across programming languages.
◇ Anonymous pipes
A pipe is the IPC type that most resembles a queue. The producer process puts data into the write end of the pipe, and the consumer process takes data out of the read end. The overall effect is very similar to using a queue between threads, except that with pipes you don't have to worry about thread safety, memory allocation, and so on (the operating system quietly takes care of all that for you).
Pipes come in two kinds, named pipes and anonymous pipes; today we mainly discuss anonymous pipes, because named pipes differ significantly across operating systems (for example, the API interfaces and feature sets of named pipes differ markedly between Win32 and POSIX, and some platforms, such as Windows CE, do not support named pipes at all).
Admittedly, the anonymous-pipe APIs also differ across platforms (for example, Win32's CreatePipe and POSIX's pipe are used quite differently). However, we can restrict ourselves to standard input and standard output (hereafter stdio) for data in and data out, and then use the shell's pipe character to connect the producer process with the consumer process. In fact, many commands that ship with operating systems, especially POSIX-style ones, exploit exactly this mechanism for data transfer (more, grep, and so on).
There are several benefits to doing so:
1. Virtually all operating systems support pipe characters in the shell, so cross-platform support is easy to achieve.
2. Most programming languages can operate on stdio, so working across programming languages is also easy.
3. As mentioned above, the pipe approach saves you the chores of thread safety, which helps reduce development and debugging costs.
Of course, this approach also has its drawbacks:
1. The producer process and the consumer process must run on the same host and cannot communicate across machines. This drawback is fairly significant.
2. It works very well one-to-one, but scaling it to one-to-many or many-to-one is awkward, so its extensibility suffers. If you anticipate such extensions, this drawback becomes more significant.
3. Because the pipe is created by the shell, it is invisible to both processes (each program sees only stdio). In some cases this makes it hard for the program to manipulate the pipe (for example, to resize the pipe buffer). This drawback is relatively minor.
4. Finally, this approach carries data in one direction only. Fortunately, in most cases the consumer process does not need to send data back to the producer process. If you really do need feedback (from consumer to producer), it gets difficult, and you may have to consider another IPC mechanism.
A few additional notes:
1. Reads and writes on stdio are blocking. For example, if the pipe contains no data, the consumer process's read blocks until data appears in the pipe again.
2. Because stdio has its own buffering (not to be confused with the pipe's buffer), it can sometimes lead to surprising behavior (for example, the producer process has written data, but the consumer process does not see it immediately).
◇ Sockets (TCP)
Socket communication over TCP is another IPC mechanism that resembles a queue. It likewise guarantees that data arrives in order, and likewise provides a buffering mechanism. And, like the shell pipe character just introduced, it is cross-platform and cross-language.
What advantages do sockets have over the shell pipe character? Mainly the following:
1. Sockets can cross machine boundaries (making distribution easy to achieve). This is a major advantage.
2. Sockets make it convenient to extend to one-to-many or many-to-one later. This is also a major advantage.
3. Sockets can be set to blocking or non-blocking mode, which makes them more flexible to use. This is a minor advantage.
4. Sockets support two-way communication, making it easy for the consumer to send feedback.
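As a rough sketch of the TCP approach (not code from the original post): here the producer and consumer run as two threads in one process for the sake of a self-contained demo, whereas in practice they would be separate processes, possibly on different machines. Binding to port 0 lets the OS pick a free port; the newline-per-unit framing is an illustrative choice.

```python
# Sketch: producer sends data units over TCP; consumer receives them.
import socket
import threading

def consumer(server_sock, results):
    conn, _ = server_sock.accept()       # wait for the producer to connect
    with conn:
        data = b""
        while (chunk := conn.recv(1024)):  # b"" means the producer closed
            data += chunk
        results.extend(int(x) for x in data.split())  # one unit per line

server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))            # port 0: let the OS choose
server.listen(1)
port = server.getsockname()[1]

results = []
t = threading.Thread(target=consumer, args=(server, results))
t.start()

# Producer side: connect and send five data units, one per line.
with socket.create_connection(("127.0.0.1", port)) as producer_sock:
    for i in range(5):
        producer_sock.sendall(f"{i}\n".encode())

t.join()
server.close()
print(results)  # [0, 1, 2, 3, 4]
```

Because TCP is a byte stream with no built-in message boundaries, real code must choose a framing scheme (newlines, length prefixes, etc.) to preserve the completeness of each data unit.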
Ring buffer
The previous section mentioned the potential performance problems of queue buffers; the workaround is the ring buffer. You should consider a ring buffer only when the allocation/release of storage is very frequent and demonstrably has a noticeable impact.
★ Ring Buffer vs Queue Buffer
◇ Similar external interface
Before introducing the ring buffer, let's review the ordinary queue. An ordinary queue has a write end and a read end: when the queue is empty, the read end cannot read data; when the queue is full (its maximum size has been reached), the write end cannot write data.
To its users, a ring buffer looks much like a queue buffer: it also has a write end (for push) and a read end (for pop), and it also has "full" and "empty" states. Switching from a queue buffer to a ring buffer is therefore a fairly smooth transition for the calling code.
◇ Different internal structure
Although the two have similar external interfaces, their internal structure and operation differ greatly. There is no need to dwell on the queue's internals here; let's focus on the ring buffer's.
You can picture the ring buffer's read end (hereafter R) and write end (hereafter W) as two runners chasing each other around a circular track (R chasing W). When R catches up with W, the buffer is empty; when W catches up with R from behind (W laps R), the buffer is full.
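The runway analogy maps directly onto code. Below is a minimal single-producer/single-consumer ring buffer sketch (class and method names are my own); storage is allocated once up front, which is precisely how the ring buffer avoids the per-push/per-pop allocation overhead discussed earlier. One slot is deliberately kept unused so that "full" and "empty" can be told apart.

```python
# Minimal ring buffer: R and W are the read and write indices from the
# runway analogy. Storage is allocated once; push/pop never allocate.
class RingBuffer:
    def __init__(self, capacity: int):
        self.buf = [None] * (capacity + 1)  # one spare slot distinguishes full/empty
        self.r = 0  # read index  (R)
        self.w = 0  # write index (W)

    def empty(self) -> bool:
        return self.r == self.w                        # R has caught up with W

    def full(self) -> bool:
        return (self.w + 1) % len(self.buf) == self.r  # W is about to lap R

    def push(self, item) -> bool:
        if self.full():
            return False
        self.buf[self.w] = item
        self.w = (self.w + 1) % len(self.buf)
        return True

    def pop(self):
        if self.empty():
            return None
        item = self.buf[self.r]
        self.r = (self.r + 1) % len(self.buf)
        return item

rb = RingBuffer(3)
print([rb.push(x) for x in "abcd"])  # [True, True, True, False]: full after 3
print(rb.pop(), rb.pop())            # a b
```

Note that this sketch is not thread-safe by itself; using it between two threads still requires the synchronization discussed in the queue section (or a lock-free single-producer/single-consumer design).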
This article is from the "Small Stop" blog, please be sure to keep this source http://10541556.blog.51cto.com/10531556/1837348