In current computer applications, the demand for high-speed parallel computing is extensive. The application requirements can be summarized into three types:
- Compute-intensive applications, such as large-scale scientific and engineering computation and numerical simulation;
- Data-intensive applications, such as digital libraries, data warehouses, data mining, and computational visualization;
- Network-intensive applications, such as collaborative work, remote control, and telemedicine diagnostics.
There are three main types of parallel programming models: the multi-threaded programming model for shared memory, the message-passing programming model for distributed memory, and a hybrid model that combines the two.
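The contrast between the first two models can be sketched in Python (the notes name no particular language; the function names here are illustrative). In the shared-memory model, threads read and write one address space and synchronize with a lock; in the message-passing model, processes own separate address spaces and exchange partial results through explicit messages.

```python
import threading
import multiprocessing as mp

def shared_memory_sum(data, n_threads=4):
    """Shared-memory model: threads share one address space."""
    total = 0
    lock = threading.Lock()

    def worker(chunk):
        nonlocal total
        s = sum(chunk)      # local work needs no synchronization
        with lock:          # only the shared update is serialized
            total += s

    chunks = [data[i::n_threads] for i in range(n_threads)]
    threads = [threading.Thread(target=worker, args=(c,)) for c in chunks]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total

def _mp_worker(chunk, q):
    # Each process computes on its own copy of the data and sends
    # the partial result back as a message.
    q.put(sum(chunk))

def message_passing_sum(data, n_procs=4):
    """Message-passing model: separate address spaces, explicit messages."""
    q = mp.Queue()
    chunks = [data[i::n_procs] for i in range(n_procs)]
    procs = [mp.Process(target=_mp_worker, args=(c, q)) for c in chunks]
    for p in procs:
        p.start()
    total = sum(q.get() for _ in procs)
    for p in procs:
        p.join()
    return total

if __name__ == "__main__":
    data = list(range(1000))
    print(shared_memory_sum(data))    # 499500
    print(message_passing_sum(data))  # 499500
```

The same decomposition (split the data, combine partial sums) underlies both; what differs is how the partial results travel back, which is exactly the shared-memory vs. distributed-memory distinction.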
In a computer system, a processor accesses the storage closest to itself fastest: L1 cache → L2 cache → local node memory → remote node memory/disk. The storage capacity at each level grows in the opposite direction of the access speed.

In parallel computing, the design of the parallel algorithm is the key to performance. Some problems have inherently good parallelism, e.g. the data set to be processed decomposes cleanly; others require complex formula derivation and transformation to fit parallel computing. At the same time, to avoid bottlenecks during computation, task partitioning should take full account of load balancing, especially dynamic load balancing. The idea of "equivalence" (peer design) is one of the keys to maintaining load balance and scalability: avoid master/slave and client/server modes at design time.
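Dynamic load balancing can be sketched as self-scheduling: instead of statically handing each worker an equal-sized chunk, idle workers pull the next task themselves, so uneven task costs even out automatically. A minimal Python sketch, with task costs and names chosen purely for illustration:

```python
import queue
import threading
import time

def run_tasks(tasks, n_workers=4):
    """Self-scheduling: workers pull (cost, value) tasks until the queue is empty."""
    q = queue.Queue()
    for t in tasks:
        q.put(t)
    results = []
    lock = threading.Lock()

    def worker():
        while True:
            try:
                cost, value = q.get_nowait()
            except queue.Empty:
                return              # no tasks left; this worker retires
            time.sleep(cost)        # simulate uneven amounts of work
            with lock:
                results.append(value * value)

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sorted(results)

if __name__ == "__main__":
    # Task costs vary; fast workers automatically absorb more tasks.
    tasks = [(0.01 * (i % 3), i) for i in range(12)]
    print(run_tasks(tasks))
```

A statically partitioned version would leave some workers idle while the unlucky one finishes its expensive tasks; self-scheduling keeps all workers busy until the queue drains.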
1. Parallel Machine Systems
Parallel machines developed from SIMD to MIMD, from which four classic architectural patterns derive:
- SMP (symmetric shared-memory multiprocessor): e.g. common multi-core machines; poor scalability, typically 8–16 processors;
- DSM (distributed shared memory): physical memory is distributed across the processing nodes while the logical address space is uniformly addressed, so it still counts as shared storage; access time is limited by network bandwidth;
- MPP (massively parallel processor): a large-scale system consisting of hundreds of processors, a symbol of a country's comprehensive strength;
- Cluster: an aggregation of interconnected homogeneous or heterogeneous independent computers; each node has its own memory, I/O, and operating system and can be used as a standalone machine; nodes are interconnected by commodity networks, making the system flexible.
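A cluster of SMP nodes is where the hybrid programming model fits: message passing between nodes, shared-memory threading within each node. A toy Python sketch, where a process stands in for a node and a thread for a core (all names are illustrative):

```python
import multiprocessing as mp
import threading

def _node(chunk, q, threads_per_node=2):
    """One 'node': threads share this process's memory for the local sum."""
    partials = [0] * threads_per_node

    def core(i):
        # Each 'core' writes only its own slot, so no lock is needed.
        partials[i] = sum(chunk[i::threads_per_node])

    ts = [threading.Thread(target=core, args=(i,)) for i in range(threads_per_node)]
    for t in ts:
        t.start()
    for t in ts:
        t.join()
    q.put(sum(partials))    # message passing between 'nodes'

def hybrid_sum(data, n_nodes=2):
    """Hybrid model: distribute chunks to nodes, thread within each node."""
    q = mp.Queue()
    chunks = [data[i::n_nodes] for i in range(n_nodes)]
    procs = [mp.Process(target=_node, args=(c, q)) for c in chunks]
    for p in procs:
        p.start()
    total = sum(q.get() for _ in procs)
    for p in procs:
        p.join()
    return total

if __name__ == "__main__":
    print(hybrid_sum(list(range(100))))  # 4950
```

In practice this two-level structure is typically MPI across nodes plus OpenMP or pthreads within a node; the sketch only mirrors the shape of that decomposition.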
Hardware: multi-core CPUs (Intel, AMD), GPUs (Nvidia), Cell BE (Sony & Toshiba & IBM; one master processing unit and 8 co-processing units).
Concepts: data bus, address bus, control bus; register bit width.
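The bus widths listed above translate directly into capacities: n address lines can select 2^n distinct byte addresses, which is why a 32-bit address bus tops out at 4 GiB. A one-line illustration (the function name is mine):

```python
def addressable_bytes(address_lines):
    """n address lines can select 2**n distinct byte addresses."""
    return 2 ** address_lines

if __name__ == "__main__":
    print(addressable_bytes(16))  # 65536 (64 KiB)
    print(addressable_bytes(32))  # 4294967296 (4 GiB)
```

The data bus width similarly bounds how many bits move per transfer, and the register bit width sets the natural word size of the machine.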
Lao Li's talk: Parallel Computing Fundamentals & Programming Models and Tools (1)