1 Basic Concepts
Shared-memory multi-core architecture : A single package contains multiple interconnected processor cores, and all cores can access main memory. Shared-memory multi-core systems also have microarchitectural features such as core parking and overclocking.
Core parking : When only some cores are in use, the operating system puts the remaining cores to sleep; when those cores are needed, the operating system wakes them up.
Overclocking : The process of raising a core's frequency. When a core is heavily loaded, it runs at a higher frequency.
Distributed memory system : Consists of multiple processors (CPUs), each with its own private memory. The processors may be located on different computers, and the computers may be connected by different types of communication channels.
Physical core : A truly independent processing unit.
Hardware thread (logical core, logical processor) : Each physical core may provide multiple logical cores.
Software thread (commonly just called a thread) : The smallest unit of a program's execution flow, sometimes called a lightweight process (LWP). Each software thread shares the private, unique memory space of its parent process, but has its own stack, registers, and private local storage.
Process : A running instance of a program operating on a data set. It is the system's basic unit of resource allocation and scheduling and the foundation of the operating system's structure. In early, process-oriented computer architectures, the process was the basic execution entity of a program; in contemporary, thread-oriented architectures, the process is a container for threads. A program is a description of instructions, data, and their organization; a process is the running entity of that program. Every running program in Windows is a process.
Main thread : When a program starts, the operating system (OS) creates a process, and a thread starts running at the same time; this thread is usually called the program's main thread. Every process has at least one thread, its main thread.
Oversubscription : Occurs when an application uses more concurrent threads than there are logical cores.
Load balancing : Tasks are assigned equal amounts of work so that processor resources are used efficiently.
Load imbalance : Tasks are assigned unequal amounts of work, so some tasks finish early and sit idle, and processor resources are not used effectively.
Concurrency : Multiple instructions are executed within the same time period.
Parallelism : When the system has more than one CPU and a CPU is idle, two threads do not have to compete for CPU resources and can run at the same instant. This is called parallelism.
Interleaved concurrency : Only one thread executes at a time, and the instructions of the two threads are interleaved.
Race : The result of a computation depends on the order in which statements execute, and that order is neither controlled nor synchronized.
Race condition : A condition under which a race arises.
Deadlock : Two or more processes block each other during execution, either because they compete for resources or because they wait on communication with one another; without outside intervention, none of them can proceed.
Livelock : Similar to a deadlock, except that the threads in a livelock keep switching between two states instead of being blocked.
Lock contention : Multiple threads compete for the same lock.
Lock convoy : Occurs when multiple threads of the same priority repeatedly contend for the same lock. Unlike in a deadlock or a livelock, the threads in a lock convoy still make progress, but each time a thread tries to acquire the lock and fails, it gives up the rest of its scheduling time slice and forces a context switch.
Critical section : The serial period between two parallel parts, during which execution must proceed sequentially.
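The definitions of race, lock contention, and serialized execution above can be illustrated with a minimal sketch. It is written in Python rather than C# only so that it is self-contained, and the function name count_with_lock and its parameters are hypothetical. Several threads increment a shared counter; the lock forces the read-modify-write step to run serially, so the result no longer depends on how the threads are scheduled.

```python
import threading


def count_with_lock(num_threads: int = 4, increments: int = 10_000) -> int:
    """Increment a shared counter from several threads, guarded by one lock."""
    counter = 0
    lock = threading.Lock()

    def worker() -> None:
        nonlocal counter
        for _ in range(increments):
            # Without the lock, this read-modify-write could interleave with
            # another thread's and lose updates (a race). With the lock, only
            # one thread at a time executes it, so this part runs serially.
            with lock:
                counter += 1

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter


if __name__ == "__main__":
    print(count_with_lock())  # 4 threads * 10,000 increments = 40000
```

All threads here contend for the same lock, so the sketch also shows lock contention: the more time each thread spends inside the locked region, the less the program benefits from additional threads.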
TPL : .NET Framework 4 introduced the TPL (Task Parallel Library), which uses a new lightweight concurrency model. The TPL supports data parallelism, task parallelism, and pipelining.
1) Data parallelism: When a large amount of data must be processed, the same operation is performed on each piece of data.
2) Task parallelism: Different operations run concurrently.
3) Pipelining: A combination of data parallelism and task parallelism that coordinates multiple concurrent tasks.
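The difference between the first two patterns can be sketched as follows. This is an illustration in Python using the standard concurrent.futures module, not the TPL API itself; the function names data_parallel and task_parallel are hypothetical. In the TPL, the corresponding entry points would roughly be Parallel.ForEach for data parallelism and Parallel.Invoke for task parallelism.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple


def square(x: int) -> int:
    return x * x


def data_parallel(values: List[int]) -> List[int]:
    """Data parallelism: the same operation is applied to every item."""
    with ThreadPoolExecutor() as pool:
        # map() returns results in the same order as the inputs.
        return list(pool.map(square, values))


def task_parallel(values: List[int]) -> Tuple[int, int]:
    """Task parallelism: two different operations are submitted concurrently."""
    with ThreadPoolExecutor() as pool:
        total = pool.submit(sum, values)    # task A: sum of the values
        largest = pool.submit(max, values)  # task B: maximum of the values
        return total.result(), largest.result()


if __name__ == "__main__":
    print(data_parallel([1, 2, 3]))  # [1, 4, 9]
    print(task_parallel([1, 2, 3]))  # (6, 3)
```

In both functions the code describes the work to be done and leaves thread management to the pool, which is the same idea as focusing on tasks rather than on the underlying threads.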
2 Concept Analysis
1) Concurrency and interleaved concurrency
Figure 1 shows concurrency with two threads, numbered 0 and 1. Each thread has two instructions: 0-0 is the first instruction of the first thread, 0-1 is the second instruction of the first thread, and so on.
Figure 2 shows interleaved concurrency with the same two threads and the same instruction labels. The instructions of the two threads execute alternately, and only one instruction executes at a time.
Figure 1 Concurrency
Figure 2 Interleaved concurrency
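As a rough illustration of interleaved concurrency, the following Python sketch simulates one possible interleaving of the two threads described above, using the same instruction labels. The function name interleave is hypothetical, and the strict round-robin order is an assumption; a real scheduler may choose a different order.

```python
from typing import List


def interleave(threads: List[List[str]]) -> List[str]:
    """Return a round-robin execution order: one instruction at a time,
    taking turns between the threads."""
    order: List[str] = []
    longest = max((len(instructions) for instructions in threads), default=0)
    for step in range(longest):
        for instructions in threads:
            if step < len(instructions):
                order.append(instructions[step])
    return order


if __name__ == "__main__":
    thread_0 = ["0-0", "0-1"]
    thread_1 = ["1-0", "1-1"]
    print(interleave([thread_0, thread_1]))  # ['0-0', '1-0', '0-1', '1-1']
```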
2) Physical cores and hardware threads
The Intel Core i5-3470 Processor has four physical cores with one hardware thread per core.
The Intel Xeon Processor E7-8893 v4 has four physical cores with two hardware threads per core.
As these examples show, a processor's number of physical cores does not necessarily equal its number of hardware threads.
Figure 3 Intel Core i5-3470 Processor
Figure 4 Intel Xeon Processor E7-8893 v4
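Because the number of hardware threads determines when oversubscription starts, a program can query the logical core count and cap its thread count accordingly. The following is a minimal Python sketch; the helper names logical_core_count and capped_worker_count are hypothetical. In .NET, Environment.ProcessorCount reports the same kind of value.

```python
import os


def logical_core_count() -> int:
    """Number of logical cores (hardware threads) reported by the OS.

    os.cpu_count() may return None if the count cannot be determined,
    in which case this sketch falls back to 1.
    """
    return os.cpu_count() or 1


def capped_worker_count(requested: int) -> int:
    """Clamp a requested thread count to the range [1, logical cores]
    so the program does not oversubscribe the processor."""
    return max(1, min(requested, logical_core_count()))


if __name__ == "__main__":
    print("logical cores:", logical_core_count())
    print("workers used for a request of 64:", capped_worker_count(64))
```

On the Xeon E7-8893 v4 described above, the reported count would be eight (four physical cores with two hardware threads each), provided the operating system exposes all hardware threads.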
3 Amdahl's Law and Gustafson's Law
Amdahl's Law
Purpose : Predicts the maximum theoretical performance improvement (speedup) of a multiprocessor system.
Formula : Maximum speedup (multiple) = 1/((1-P) + (P/N))
where: P is the proportion of the code that can run fully in parallel
N is the number of available computing units (processors or physical cores)
Flaws of the law :
Only the change in the number of physical cores is considered; it ignores the possibility of adding new functionality to existing applications to take advantage of the increased parallelism.
Only hardware changes are considered; the size of the problem being solved is not taken into account.
The overhead associated with parallelism is not considered.
The possibility of converting a serial part into an algorithm that benefits from parallelization is not considered.
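The Amdahl formula above can be evaluated directly. The following Python sketch uses a hypothetical function name, amdahl_speedup, with P and N as defined above; the sample values are chosen only for illustration.

```python
def amdahl_speedup(p: float, n: int) -> float:
    """Maximum speedup = 1 / ((1 - P) + (P / N)).

    p: proportion of the code that can run fully in parallel (0.0 to 1.0)
    n: number of available computing units (processors or physical cores)
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    if n < 1:
        raise ValueError("n must be at least 1")
    return 1.0 / ((1.0 - p) + p / n)


if __name__ == "__main__":
    # With 90% parallel code, adding cores gives diminishing returns:
    print(f"{amdahl_speedup(0.9, 4):.2f}")     # 3.08
    print(f"{amdahl_speedup(0.9, 1000):.2f}")  # 9.91
```

Even with 1000 cores, the 10% serial portion keeps the speedup below 10, which is why the guidelines in this section stress shrinking the serial part of a program.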
Gustafson's Law
Purpose : Predicts the amount of work that can be performed in a fixed amount of time, taking the problem size into account.
Formula : Total work (in units) = S + N*P
where: S is the number of units of work that must be executed sequentially
P is the size of each unit of work that can run fully in parallel
N is the number of available execution units (processors or physical cores)
Flaws of the law :
The overhead associated with parallelism is not considered.
The possibility of converting a serial part into an algorithm that benefits from parallelization is not considered.
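The Gustafson formula can be evaluated in the same way. In this Python sketch the function name gustafson_total_work is hypothetical, and the figures (50 sequential units and 50 parallel units per execution unit) are assumed purely for illustration.

```python
def gustafson_total_work(s: float, p: float, n: int) -> float:
    """Total work (in units) = S + N * P.

    s: units of work that must be executed sequentially
    p: size of each unit of work that can run fully in parallel
    n: number of available execution units (processors or physical cores)
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    return s + n * p


if __name__ == "__main__":
    # In the same fixed time, more execution units complete more work:
    for cores in (1, 2, 8):
        print(cores, gustafson_total_work(50, 50, cores))
    # 1 100
    # 2 150
    # 8 450
```

Unlike the Amdahl formula, the time is held fixed here and the problem size grows with the number of execution units.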
Programming guidelines derived from the two laws :
Consider both the time consumed by the program's serial portion and the problem size; adjust the problem size to obtain a better speedup.
When the amount of data that can be processed in parallel is limited, add new features to take advantage of the parallel processing power of modern hardware.
Minimize critical sections.
4 Multi-core Parallel Programming Principles
(1) Think in parallel
(2) Use abstractions, taking full advantage of the new features provided by the TPL (Task Parallel Library) in .NET Framework 4
(3) Program in terms of tasks (things to do) rather than threads (CPU cores); focus on the tasks, not the underlying threads
(4) Design with the option of turning concurrency off, so that the program can also run on a single-core processor
(5) Avoid using locks
(6) Use tools and libraries designed to help with concurrency
(7) Use a scalable memory allocator
(8) Design so that the program scales as the workload increases
-----------------------------------------------------------------------------------------
Please credit the source when reprinting or quoting.
This was written in haste and my knowledge is limited; if anything is incorrect, corrections are welcome.
.NET Multithreaded Programming: Prerequisite Knowledge