Anyone who has visited a factory assembly line will be familiar with the concept: semi-finished products flow along a conveyor belt through a series of stations, each station performs its own step of the assembly, and the result is passed on to the next station. Modern high-performance CPUs all adopt this pipeline design, dividing a computing task into several stages: instruction fetch, decode, execute, memory access, and write-back. The biggest advantage of pipeline design is increased system throughput. For example, while the first instruction is in the execute stage, the decode unit can translate the second instruction and the third instruction can be fetched. Some stages can even run in parallel; a modern MIMD (multiple instruction, multiple data) machine, for instance, can execute multiple instructions or update multiple data items at the same time.
After Intel realized that clock frequency had become the bottleneck of CPU performance, multi-core processors emerged. Today, the foundation of high-performance programming has shifted to making full use of CPU resources to process data more quickly. Intel's open-source TBB library cleverly applies the pipeline idea to implement an adaptive, high-performance software pipeline: tbb::pipeline. This article takes text_filter as an example to briefly introduce the implementation principle of pipeline and some of its key technical points, in the hope of sparking further discussion.
Before introducing tbb::pipeline, I have to talk about the engine of the TBB library: the task scheduler. It has been called the heart of the TBB library [Intel TBB's nutshell book]; it is the basic component underlying all of the algorithms and drives the operation of the entire library. For example, the parallel_for algorithm provided by the TBB library contains traces of the task scheduler, and pipeline is no exception.
Let's take a look at the implementation of parallel_for:
template<typename Range, typename Body>
void parallel_for( const Range& range, const Body& body, const simple_partitioner& partitioner = simple_partitioner() ) {
    internal::start_for<Range, Body, simple_partitioner>::run(range, body, partitioner);
}
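For context, here is a minimal usage sketch of parallel_for with the classic blocked_range API; the Doubler body and the array it operates on are hypothetical examples, not part of the TBB library itself:

#include "tbb/parallel_for.h"
#include "tbb/blocked_range.h"
#include "tbb/task_scheduler_init.h"

// The Body: the scheduler invokes it on subranges of the full range.
struct Doubler {
    int* a;
    void operator()( const tbb::blocked_range<size_t>& r ) const {
        for (size_t i = r.begin(); i != r.end(); ++i)
            a[i] *= 2;  // each subrange may run on a different worker thread
    }
};

int main() {
    tbb::task_scheduler_init init;  // start the scheduler's worker threads
    int a[1000] = { 0 };
    Doubler body = { a };
    // Under the hood this calls start_for::run(), which splits the range
    // into tasks and hands them to the task scheduler.
    tbb::parallel_for(tbb::blocked_range<size_t>(0, 1000), body);
    return 0;
}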
Now let's look at start_for itself:
template<typename Range, typename Body, typename Partitioner>
class start_for : public task {
    Range my_range;
    const Body my_body;
    typename Partitioner::partition_type my_partition;
    /*override*/ task* execute();
    //! Constructor for root task.
    start_for( const Range& range, const Body& body, Partitioner& partitioner ) :
    ...
};
As you can see, class start_for inherits from task, and this task class is the basic element of scheduling in the task scheduler; it is also the soul of the TBB library. Compared with a raw thread library such as POSIX threads (pthreads), the TBB library can be regarded as a higher-level encapsulation of multithreading: it no longer uses the thread as the basic unit of work, but uses the task as the basic abstraction, which allows it to better integrate computing resources and schedule work optimally. All the advantages of the TBB library, such as automatic workload balancing and system scalability, are attributable to the task scheduler. Every algorithm provided by TBB has its own intended application background; if none of them meets your needs, you can derive a new class from the task class and implement your own task execution and scheduling logic. This idea runs through the entire design of TBB, and tbb::pipeline is a typical embodiment of it.
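To make the task abstraction concrete, here is a minimal sketch of deriving from the task class, based on the classic (2009-era) TBB task API; the class name HelloTask and its empty payload are hypothetical:

#include "tbb/task.h"
#include "tbb/task_scheduler_init.h"

// execute() is the hook the scheduler calls when the task is dispatched.
class HelloTask : public tbb::task {
    /*override*/ tbb::task* execute() {
        // ... do the actual work of this task here ...
        return NULL; // NULL lets the scheduler pick the next ready task
    }
};

int main() {
    tbb::task_scheduler_init init; // start the scheduler's worker threads
    // Root tasks are allocated from the scheduler's own pool:
    tbb::task& root = *new( tbb::task::allocate_root() ) HelloTask;
    tbb::task::spawn_root_and_wait(root); // run it and block until done
    return 0;
}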
tbb::pipeline offers the following advantages:
Guaranteed order of data processing
Automatic thread load balancing
Higher cache hit rate
System scalability
Suppose we have the following task: parse the contents of a file, change the first character of each string to uppercase, and write the result into a new file.
A traditional serial execution solution is:
Open file descriptors for reading and writing
while (!EOF)
{
    Read a string from the file
    Convert its first character to uppercase
    Write the string to the output file
}
Close the read and write file descriptors
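In C++, this serial solution might look like the following minimal sketch (the file names input.txt and output.txt are placeholders, and error handling is omitted):

#include <cstdio>
#include <cctype>

int main() {
    FILE* in  = fopen("input.txt",  "r");  // file for reading
    FILE* out = fopen("output.txt", "w");  // file for writing
    char buf[256];
    while (fgets(buf, sizeof(buf), in)) {          // read a string
        buf[0] = toupper((unsigned char)buf[0]);   // uppercase its first character
        fputs(buf, out);                           // write it to the new file
    }
    fclose(in);   // close the file descriptors
    fclose(out);
    return 0;
}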
Can such a simple procedure gain performance through tbb::pipeline? Let's take a look at the pipeline solution:
1. Create file descriptors for reading and writing respectively.
2. Create three tasks: "read a string from the file", "convert the first character to uppercase", and "write the string to the file", specifying that the "read a string from the file" and "write the string to the file" tasks must execute serially. (Why serially? Think about it yourself, or consult Intel TBB's nutshell book.)
3. Start the pipeline, which schedules these tasks through the built-in task scheduler (see the sketch below).
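Here is a condensed sketch of that three-stage pipeline, modeled on the text_filter example from Intel TBB's nutshell book and the classic tbb::pipeline/tbb::filter API; the file names, buffer sizes, and token count are illustrative assumptions:

#include "tbb/pipeline.h"
#include "tbb/task_scheduler_init.h"
#include <cstdio>
#include <cctype>

// Serial input stage: reads one string per invocation.
class InputFilter : public tbb::filter {
    FILE* input;
    static const size_t n = 4;  // must cover the number of live tokens
    char buffers[n][256];
    size_t next;
public:
    InputFilter(FILE* f) : tbb::filter(/*is_serial=*/true), input(f), next(0) {}
    /*override*/ void* operator()(void*) {
        char* buf = buffers[next];
        next = (next + 1) % n;
        if (!fgets(buf, sizeof(buffers[0]), input)) return NULL; // NULL stops the pipeline
        return buf;
    }
};

// Parallel middle stage: uppercases the first character.
class TransformFilter : public tbb::filter {
public:
    TransformFilter() : tbb::filter(/*is_serial=*/false) {}
    /*override*/ void* operator()(void* item) {
        char* buf = static_cast<char*>(item);
        buf[0] = toupper((unsigned char)buf[0]);
        return buf;
    }
};

// Serial output stage: writes strings in their original order.
class OutputFilter : public tbb::filter {
    FILE* output;
public:
    OutputFilter(FILE* f) : tbb::filter(/*is_serial=*/true), output(f) {}
    /*override*/ void* operator()(void* item) {
        fputs(static_cast<char*>(item), output);
        return NULL;
    }
};

int main() {
    tbb::task_scheduler_init init;
    FILE* in  = fopen("input.txt", "r");
    FILE* out = fopen("output.txt", "w");
    InputFilter f1(in);
    TransformFilter f2;
    OutputFilter f3(out);
    tbb::pipeline p;
    p.add_filter(f1);
    p.add_filter(f2);
    p.add_filter(f3);
    p.run(/*max_number_of_live_tokens=*/4); // at most 4 strings in flight
    fclose(in);
    fclose(out);
    return 0;
}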
Using a 29 MB file as a test case, serial execution on my dual-core machine takes 0.527582 seconds, while the pipeline version takes 0.446161 seconds. For more complex logic, pipeline's performance improvement is even more significant. The secret of the speedup lies in pipeline's ability to automatically run the "convert the first character to uppercase" task in parallel, based on the system's current conditions.
For the complete sample code and usage of pipeline, refer to Intel TBB's nutshell book. Before reading the implementation, consider these questions:
1. How does pipeline guarantee the order of data processing? Since TBB ultimately executes tasks on multiple threads, what prevents a string that was read later from being processed by the next task before a string that was read earlier? Does pipeline contain something like a FIFO (first-in, first-out) queue?
2. How can pipeline automatically run the "convert the first character to uppercase" task in parallel? And if that task runs in parallel, how is the ordering in point 1 still guaranteed?
3. How does pipeline ensure that the serial tasks really execute serially?
4. What exactly does "automatic task scheduling based on system conditions" mean?
These questions are also the key technical points of pipeline. If you are interested, take a look at the pipeline source code first.
Intel TBB's nutshell book: "Intel Threading Building Blocks: Outfitting C++ for Multi-Core Processor Parallelism"
(To be continued)
This article is from a CSDN blog; when reproducing it, please indicate the source: http://blog.csdn.net/softarts/archive/2009/04/25/4123957.aspx