4. Concurrency Programming Models


A concurrent system can be implemented using any of several concurrency models. A concurrency model specifies how threads in the system collaborate to complete the jobs they are given. Different concurrency models split the jobs in different ways, and the threads collaborate and interact in different ways too. This tutorial introduces several of the most popular concurrency models.

Similarities Between Concurrency Models and Distributed Systems

The concurrency models described in this text are similar to many of the architectures used in distributed systems. In a concurrent system, threads communicate with each other. In a distributed system, processes communicate with each other (and the processes may be on different machines). Threads and processes are quite similar in nature, which is why the concurrency models often resemble distributed system architectures.

Of course, distributed systems face extra challenges such as network failures and remote hosts or processes crashing. But a concurrent system running on a big server may encounter similar problems: a CPU failing, a NIC (network card) failing, a disk going bad, and so on. The probability of failure may be low, but it can still happen.

Because concurrency models are similar to distributed system architectures, they can often borrow ideas from each other. For example, models for distributing work among workers (threads) resemble load balancing in distributed systems, and so do error-handling techniques such as logging, failover, and idempotence.
[Note: an idempotent operation has the same effect whether it is performed once or many times.]

Parallel Workers

The first concurrency model is what I call the parallel worker model. Incoming jobs are assigned to different workers. [Figure: the parallel worker model]

In the parallel worker model, a delegator assigns the incoming jobs to different workers. Each worker completes the full job on its own. The workers run in parallel, on different threads and possibly on different CPUs.

If the parallel worker model were implemented in a car factory, each car would be produced by a single worker. That worker would get the car's build specification and handle everything from start to finish.

The parallel worker model is the most commonly used concurrency model in Java applications (even if that is changing). Many of the concurrency utilities in the java.util.concurrent package are designed for this model, and you can also see traces of it in the design of Java Enterprise Edition (J2EE) application servers.

Advantages of the Parallel Worker Model

The advantage of the parallel worker model is that it is easy to understand. To increase the parallelism of the system, you just add more workers.

For example, if you were building a web crawler, you could try crawling a fixed set of pages with different numbers of workers and see which number gives the shortest total time (meaning the highest performance). Since web crawling is IO-intensive, you will probably end up with a few threads per CPU or core in your machine. One thread per CPU would be too few, because the CPU would sit idle for long stretches while waiting for downloads.
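As a sketch of the idea, here is a minimal parallel worker setup in Java using a fixed thread pool. The URL list and the fetch() helper are hypothetical placeholders, and the thread count per CPU is just an illustrative guess you would tune by measurement:

    import java.util.List;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    // A minimal sketch of the parallel worker model for an IO-bound crawler.
    public class CrawlerPool {
        public static void main(String[] args) {
            int cpus = Runtime.getRuntime().availableProcessors();
            // IO-bound work: several threads per CPU keeps the CPUs busy
            // while other threads wait on network IO.
            ExecutorService workers = Executors.newFixedThreadPool(cpus * 4);

            List<String> urls = List.of("http://example.com/a", "http://example.com/b");
            for (String url : urls) {
                workers.submit(() -> fetch(url)); // each worker handles a whole job
            }
            workers.shutdown();
        }

        private static void fetch(String url) {
            // placeholder for the actual download-and-parse logic
            System.out.println("Fetched " + url);
        }
    }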

Disadvantages of the Parallel Worker Model

Although the parallel worker model looks simple, it hides some disadvantages. In the following sections, I will analyze some of the most obvious weaknesses.

Shared state can get complicated

In practice, the parallel worker model may be much more complex than shown above. Parallel workers often need to access shared data, whether in memory or in a shared database. [Figure: the parallel worker model with shared state]

Some of this shared state lives in communication mechanisms such as job queues. Other shared state is business data, data caches, database connection pools, and so on.

As soon as shared state sneaks into the parallel worker model, things get complicated. Threads must access the shared data in a way that makes one thread's modifications visible to the others (the changes must be pushed to main memory, not just sit in the cache of the CPU executing the thread). Threads also need to avoid race conditions, deadlocks, and the many other concurrency problems that shared state brings.
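To make the visibility requirement concrete, here is a minimal sketch in Java of shared state guarded by synchronization. The class name is hypothetical; the point is that synchronized both serializes the read-modify-write and publishes the change so other threads see it:

    // A minimal sketch of guarding shared state between parallel workers.
    // Without synchronization (or volatile), one worker's write may sit in a
    // CPU cache and never become visible to the others.
    public class SharedCounter {
        private long count = 0;

        // synchronized makes the read-modify-write atomic and forces the
        // update to be visible to other threads (flushed to main memory).
        public synchronized void increment() {
            count++;
        }

        public synchronized long get() {
            return count;
        }
    }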

Furthermore, some of the parallelism is lost when threads wait on each other to access shared data structures. Many concurrent data structures are blocking, meaning only one or a few threads can access them at any given time. That leads to contention on these shared structures, and high contention essentially serializes the parts of the execution that access them.

Modern non-blocking concurrency algorithms may reduce contention and increase performance, but non-blocking algorithms are hard to implement.
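For illustration, here is a minimal non-blocking counter built on compare-and-swap via Java's AtomicLong. A thread whose CAS fails does not block; it simply retries. (In practice you would just call incrementAndGet(); the explicit loop is shown only to expose the CAS retry pattern.)

    import java.util.concurrent.atomic.AtomicLong;

    // A minimal non-blocking counter using compare-and-swap (CAS).
    public class NonBlockingCounter {
        private final AtomicLong count = new AtomicLong();

        public long increment() {
            long current, updated;
            do {
                current = count.get();
                updated = current + 1;
            } while (!count.compareAndSet(current, updated)); // retry on contention
            return updated;
        }
    }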

Persistent data structures are another alternative. A persistent data structure always preserves the previous version of itself when it is modified. So, if multiple threads point to the same persistent data structure and one of them modifies it, the modifying thread gets a reference to the new structure, while all other threads keep their references to the old structure. The old structure is never modified and therefore stays consistent. The Scala standard library contains several persistent data structures.
[Note: "persistent" here refers to the data structure, not to persistent storage; Java's String class and the CopyOnWriteArrayList class are examples.]
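A small demonstration of the copy-on-write idea using Java's CopyOnWriteArrayList: an iterator obtained before a modification keeps seeing the old, consistent snapshot.

    import java.util.Iterator;
    import java.util.concurrent.CopyOnWriteArrayList;

    // CopyOnWriteArrayList copies the backing array on every modification,
    // so an iterator created earlier keeps seeing the old snapshot.
    public class SnapshotDemo {
        public static void main(String[] args) {
            CopyOnWriteArrayList<String> list = new CopyOnWriteArrayList<>();
            list.add("a");
            Iterator<String> snapshot = list.iterator(); // snapshot of ["a"]
            list.add("b");                               // new internal array
            while (snapshot.hasNext()) {
                System.out.println(snapshot.next());     // prints only "a"
            }
        }
    }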

While persistent data structures are an elegant solution to concurrent modification of shared data, they often don't perform that well.

For instance, a persistent list prepends new elements at the head and returns a reference to the newly added node (which points to the rest of the list). All other threads still keep a reference to the previous head of the list, and to those threads the list appears unchanged: they cannot see the newly added element.
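Here is a minimal sketch of such a persistent list in Java (a hypothetical illustration, not a production data structure): prepending allocates one new node and shares the rest, so existing references keep seeing the old list unchanged.

    // A minimal persistent (immutable) linked list: prepending returns a new
    // head that shares the old nodes; existing references are never mutated.
    public final class PersistentList<T> {
        public final T head;                  // value at this node (null for empty)
        public final PersistentList<T> tail;  // rest of the list

        private PersistentList(T head, PersistentList<T> tail) {
            this.head = head;
            this.tail = tail;
        }

        public static <T> PersistentList<T> empty() {
            return new PersistentList<>(null, null);
        }

        // O(1): builds one new node; the old list remains valid and unchanged.
        public PersistentList<T> prepend(T value) {
            return new PersistentList<>(value, this);
        }
    }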

Such persistent lists are typically implemented as linked lists, and linked lists don't perform well on modern hardware. Each element is a separate object that may be scattered all over memory. Modern CPUs are much faster at accessing data sequentially, so on modern hardware you get higher performance from a list implemented on top of an array. An array stores its data sequentially, and the CPU cache can load a large chunk of it at once, after which the CPU accesses the cached data directly. That is not possible with a linked list whose elements are scattered all over RAM.

Stateless workers

Shared state can be modified by other threads in the system. Therefore, a worker must re-read the state every time it needs it, to make sure it is working on the latest copy, regardless of whether the shared state is kept in memory or in an external database. A worker that keeps no state internally (but re-reads it every time it is needed) is called stateless.

Re-reading data every time you need it can get slow, especially if the state is stored in an external database.

Job ordering is nondeterministic

Another disadvantage of the parallel worker model is that the job execution order is nondeterministic. There is no way to guarantee which job is executed first or last. Job A may be given to a worker before job B, yet job B may be executed before job A.

This nondeterminism makes it hard to reason about the state of the system at any given point in time. It also makes it harder (if not impossible) to guarantee that one job happens before another.

Assembly Line (Pipeline) Model

The second concurrency model is what I call the pipeline concurrency model. I chose that name to fit with the "parallel workers" metaphor from earlier. Other developers use other names (e.g. reactor systems or event-driven systems) depending on the platform or community. [Figure: the pipeline concurrency model]

The workers resemble workers on an assembly line in a factory. Each worker performs only part of the total job and, when that part is done, forwards the job to the next worker. Each worker runs in its own thread and shares no state with the other workers. This is why it is also sometimes called the shared-nothing concurrency model.

Systems using the pipeline concurrency model are usually designed around non-blocking IO. Non-blocking IO means that once a worker starts an IO operation (e.g. reading a file, or reading data from a network connection), it does not wait for the operation to finish. IO operations are slow, and waiting for them wastes CPU time; the CPU could be doing other work in the meantime. When the IO operation completes, its result (e.g. the data read, or the status of a write) is passed on to the next worker.

With non-blocking IO, the IO operations determine the boundaries between workers. A worker does as much as it can until it has to start an IO operation, at which point it gives up control of the job. When the IO operation finishes, the next worker in the pipeline continues working on the job, until it too has to start an IO operation, and so on.
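A minimal sketch of this hand-over of control, using Java's CompletableFuture to chain the stages. readFileAsync() is a hypothetical helper that in a real system would be backed by genuinely non-blocking IO (e.g. an AsynchronousFileChannel):

    import java.util.concurrent.CompletableFuture;

    // Each stage is a "worker"; control is handed over between stages
    // instead of blocking a thread on IO.
    public class Pipeline {
        public static void main(String[] args) {
            readFileAsync("jobs.txt")                    // stage 1: start IO
                .thenApply(Pipeline::parse)              // stage 2: transform result
                .thenAccept(r -> System.out.println(r))  // stage 3: consume
                .join();
        }

        static CompletableFuture<String> readFileAsync(String path) {
            // stand-in for a real non-blocking read
            return CompletableFuture.supplyAsync(() -> "contents of " + path);
        }

        static String parse(String raw) {
            return raw.toUpperCase();
        }
    }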

In practice, jobs may not flow along a single pipeline. Since most systems can perform more than one kind of job, how a job flows from worker to worker depends on what needs to be done, so in reality several virtual pipelines may be running simultaneously. [Figure: how jobs may flow through a pipeline system in reality]

Jobs may even be forwarded to more than one worker for concurrent processing. For instance, a job might be forwarded to both a job executor and a job logger. [Figure: three pipelines finishing by forwarding their jobs to the same worker, the last worker of the middle pipeline]

The pipeline is sometimes more complex than this situation.

Reactor and event-driven system

Systems using the pipeline concurrency model are sometimes also called reactor systems or event-driven systems. The system's workers react to events occurring in the system, either received from the outside world or emitted by other workers. Examples of events could be an incoming HTTP request, or a file finishing loading into memory. At the time of writing, there are already a number of interesting reactor/event-driven platforms available, and more will be coming soon. The popular ones seem to be:

  • Vert.x
  • Akka
  • Node.JS (JavaScript)

I personally find Vert.x quite interesting (especially for those of us on Java/the JVM).

Actors and Channels

Actors and channels are two examples of similar pipeline (or reactor/event-driven) models.

In the actor model, each worker is called an actor. Actors can send messages directly to each other, and messages are sent and processed asynchronously. Actors can be used to implement one or more job processing pipelines, as described earlier. [Figure: the actor model]
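Here is a minimal actor-style sketch in Java (not a real actor library such as Akka): the actor owns a mailbox and a single processing thread, so its internal state needs no locking, and send() is asynchronous.

    import java.util.concurrent.BlockingQueue;
    import java.util.concurrent.LinkedBlockingQueue;

    // A minimal actor sketch: one mailbox, one thread, no shared state.
    public class PrinterActor {
        private final BlockingQueue<String> mailbox = new LinkedBlockingQueue<>();

        public PrinterActor() {
            Thread loop = new Thread(() -> {
                try {
                    while (true) {
                        handle(mailbox.take()); // process one message at a time
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            loop.setDaemon(true);
            loop.start();
        }

        public void send(String message) {     // asynchronous send
            mailbox.offer(message);
        }

        private void handle(String message) {  // runs only on the actor's thread
            System.out.println("got: " + message);
        }
    }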

In the channel model, workers do not communicate directly with each other. Instead, they publish their messages (events) on different channels. Other workers can listen for messages on those channels without the sender knowing who is listening. [Figure: the channel model]

At the time of writing, the channel model seems more flexible to me. A worker does not need to know who will process the job later in the pipeline; it only needs to know which channel to forward the job (message) to. Listeners on a channel can subscribe and unsubscribe at will without affecting the workers writing to the channel. This allows for a looser coupling between workers.
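A minimal channel sketch in Java to illustrate that decoupling: senders publish to the channel without knowing who listens, and listeners can subscribe or unsubscribe without affecting the senders. The class and method names are hypothetical.

    import java.util.List;
    import java.util.concurrent.BlockingQueue;
    import java.util.concurrent.CopyOnWriteArrayList;
    import java.util.concurrent.LinkedBlockingQueue;

    // Senders call publish(); listeners each get their own inbox queue.
    public class Channel<T> {
        private final List<BlockingQueue<T>> subscribers = new CopyOnWriteArrayList<>();

        public BlockingQueue<T> subscribe() {
            BlockingQueue<T> inbox = new LinkedBlockingQueue<>();
            subscribers.add(inbox);
            return inbox;
        }

        public void publish(T message) {
            for (BlockingQueue<T> inbox : subscribers) {
                inbox.offer(message); // fan out to every current listener
            }
        }
    }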

Advantages of the Pipeline Model

Compared with the parallel worker model, the pipeline concurrency model has several advantages. In the following sections, I will cover the most important ones.

No shared state

Since workers share no state, they don't have to consider any of the concurrency problems that arise from concurrent access to shared objects. This makes workers very easy to implement: you implement each worker as if it were the only thread doing the work — essentially a single-threaded implementation.

Stateful workers

Workers can be stateful when they know that no other threads modify their data. By stateful I mean that they can keep the data they need to operate on in memory, writing only the final changes back to external storage systems. A stateful worker can therefore often be faster than a stateless one.

Better Hardware Conformity

Single-threaded code often conforms better with the underlying hardware. First, when you can assume the code executes in single-threaded mode, you can usually create more optimized data structures and algorithms.

Second, as described above, single-threaded stateful workers can cache data in memory. Data cached in memory is also more likely to be cached in the CPU cache of the CPU executing the thread, which makes access to it even faster.

By hardware conformity I mean code written in a way that naturally benefits from how the underlying hardware works. Some developers call this mechanical sympathy. I prefer the term hardware conformity because computers have very few mechanical parts, and "sympathy" is used there only as a metaphor for "matching better", which I think the word "conform" conveys just as well. Anyway, this is nitpicking; use whichever term you prefer.

Job ordering is possible

A concurrent system based on the pipeline model can, to some degree, guarantee the order of jobs. Ordered jobs make it much easier to reason about the state of the system at any given point in time. Furthermore, you can write all incoming jobs to a log. Should any part of the system fail, that log can be replayed to rebuild the system's state. The jobs are written to the log in a specific order, and that order becomes the guaranteed job order. [Figure: a possible design that logs incoming jobs to preserve their order]

Implementing a guaranteed job order is not necessarily easy, but it is often feasible. If you can, it greatly simplifies tasks like backup, restoring data, and replicating data, all of which can be done via the log files.
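A minimal sketch of this design in Java, assuming a single-threaded writer: each job is appended to a log before it is executed, and replay() rebuilds state after a crash by re-running the jobs in their original order. applyJob() is a hypothetical stand-in for real job execution.

    import java.io.IOException;
    import java.nio.charset.StandardCharsets;
    import java.nio.file.*;

    // Ordered job logging: log first, execute second, replay on restart.
    public class JobLog {
        private final Path log = Path.of("jobs.log");

        public void accept(String job) throws IOException {
            Files.writeString(log, job + System.lineSeparator(),
                    StandardCharsets.UTF_8,
                    StandardOpenOption.CREATE, StandardOpenOption.APPEND);
            applyJob(job); // execute only after the job is safely logged
        }

        public void replay() throws IOException {
            if (Files.exists(log)) {
                for (String job : Files.readAllLines(log)) {
                    applyJob(job); // re-run jobs in their original order
                }
            }
        }

        private void applyJob(String job) {
            System.out.println("executing: " + job);
        }
    }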

Disadvantages of the Pipeline Model

The biggest drawback of the pipeline concurrency model is that the execution of a job is often spread out over multiple workers, and thus over multiple classes in your project. It therefore becomes harder to see exactly what code is being executed for a given job.

It may also make the code harder to write. Worker code is sometimes written as callback handlers, and nesting too many callbacks leads to what is often called callback hell: it becomes very hard to track what the code is doing across all the callbacks, and to make sure each callback only accesses the data it needs.

The parallel worker model keeps this simpler: you can open the worker code and read the executed code pretty much from start to finish. Parallel worker code may of course also be spread over many classes, but the execution sequence is usually easier to follow from the code.

Functional Parallelism

The third concurrency model is functional parallelism, a model that has received a lot of attention recently (as of 2015). The basic idea of functional parallelism is to implement your program using function calls. A function can be seen as an "agent" or "actor", and functions sending messages to each other resembles the pipeline model (AKA reactor or event-driven system): one function calling another is similar to sending a message.

All parameters are passed to a function by copying, so no entity outside the receiving function can manipulate its data. That copying is essential to avoiding race conditions on shared data, and it makes a function execution resemble an atomic operation: each function call executes independently of any other function call.

When each function call can execute independently, each can be run on its own CPU. That means a functionally implemented algorithm can be executed in parallel across multiple processors.

The ForkJoinPool added to java.util.concurrent in Java 7 can help us implement something resembling functional parallelism, and in Java 8 parallel streams help us iterate over large collections in parallel. Keep in mind that some developers have criticized the ForkJoinPool (you can find a link to that criticism in my ForkJoinPool tutorial).
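A small example of both: a recursive ForkJoinPool task that splits a sum into independent subtasks, and the Java 8 parallel stream one-liner that does the same splitting internally. The threshold of 10,000 is an arbitrary illustrative choice.

    import java.util.concurrent.ForkJoinPool;
    import java.util.concurrent.RecursiveTask;
    import java.util.stream.LongStream;

    // Functional parallelism sketch: a sum is split recursively into
    // independent subtasks that can run on different CPUs.
    public class ParallelSum extends RecursiveTask<Long> {
        private final long[] data;
        private final int from, to;

        ParallelSum(long[] data, int from, int to) {
            this.data = data; this.from = from; this.to = to;
        }

        @Override
        protected Long compute() {
            if (to - from <= 10_000) {            // small enough: just do the work
                long sum = 0;
                for (int i = from; i < to; i++) sum += data[i];
                return sum;
            }
            int mid = (from + to) / 2;            // otherwise: fork two halves
            ParallelSum left = new ParallelSum(data, from, mid);
            ParallelSum right = new ParallelSum(data, mid, to);
            left.fork();
            return right.compute() + left.join();
        }

        public static void main(String[] args) {
            long[] data = LongStream.rangeClosed(1, 1_000_000).toArray();
            long total = ForkJoinPool.commonPool()
                    .invoke(new ParallelSum(data, 0, data.length));

            // Java 8 parallel streams do the same splitting for you:
            long viaStream = LongStream.rangeClosed(1, 1_000_000).parallel().sum();
            System.out.println(total + " == " + viaStream);
        }
    }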

The hard part about functional parallelism is knowing which function calls to parallelize. Coordinating function calls across CPUs carries an overhead, and the unit of work completed by a function needs to be large enough to be worth that overhead. If the function calls are very small, parallelizing them may actually be slower than single-threaded, single-CPU execution.

In my opinion (which may not be entirely correct), you can implement an algorithm with a reactor or event-driven model and achieve a breakdown of work similar to functional parallelism. The event-driven model just gives you more precise control over how to parallelize (in my opinion).

Besides, splitting a task over multiple CPUs, with the coordination overhead that implies, only makes sense if that task is the only task the program is executing. If the system is already performing multiple other tasks (as web servers, database servers, and many similar systems do), there is no point in parallelizing a single task: the other CPUs in the machine are busy with other work anyway, and there is no reason to disturb them with a slower, functionally parallel task. A pipeline (reactor) concurrency model is probably a better fit then, since it has lower overhead (executing sequentially in single-threaded mode) and conforms better with the underlying hardware.

Which concurrency model is the best?

So which concurrency model is better?

Usually, the answer depends on what your system is supposed to do. If your jobs are naturally parallel and independent, with no need for shared state, you may be able to implement your system with the parallel worker model. Many jobs are not naturally parallel and independent, though. For that kind of system, I believe the pipeline concurrency model has more advantages than drawbacks, and more advantages than the parallel worker model.

You don't even have to write all that pipeline infrastructure yourself. Modern platforms like Vert.x have already implemented much of it for you. Personally, I will be exploring designs running on top of platforms like Vert.x for my next project. Java EE, I feel, no longer has the edge.

Original article from the Concurrent Programming Network (ifeve.com): Concurrent Programming Model. Please credit the source when reprinting.
