Parallel Computing Fundamentals & Programming Models and Tools


In current computer applications, the demand for fast parallel computing is extensive. Summing up, there are three main types of application requirements:

    1. Compute-intensive applications, such as large-scale scientific computation and numerical simulation;
    2. Data-intensive applications, such as digital libraries, data warehouses, data mining, and computational visualization;
    3. Network-intensive applications, such as collaborative work, remote control, and telemedicine diagnostics.

There are three main types of parallel programming models: the multithreaded programming model for shared memory, the message-passing programming model for distributed memory, and hybrid models that combine the two.

In a computer system, the processor always tries to access the fastest storage available: L1 cache -> L2 cache -> local node memory -> remote node memory/disk. Storage capacity at each level grows in the opposite direction to access speed.

In parallel computing, the design of the parallel algorithm is the key determinant of performance. Some problems are inherently easy to parallelize, for example when the data set to be processed can be cleanly decomposed, while others require complex formula derivation and transformation before they fit parallel computation. Potential bottlenecks in the computation should be avoided, and task partitioning must take full account of load balancing, especially dynamic load balancing. The idea of "peering" is one of the keys to maintaining load balance and scalability: avoid master/slave and client/server patterns as much as possible at design time, as the sketch below illustrates.
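
As a small illustration of dynamic load balancing among peers, the following C sketch (assuming POSIX threads; build with gcc -pthread; the task and worker counts are invented for the example) has worker threads claim tasks from a shared counter rather than having a master hand out fixed slices, so faster workers naturally process more tasks:

    /* Peer workers with dynamic load balancing: each thread claims the
       next task from a shared counter, so no master assigns work. */
    #include <pthread.h>
    #include <stdio.h>

    #define NTASKS   100
    #define NWORKERS 4

    static int next_task = 0;   /* index of the next unclaimed task */
    static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
    static double results[NTASKS];

    static void *worker(void *arg) {
        (void)arg;
        for (;;) {
            pthread_mutex_lock(&lock);
            int t = next_task++;              /* claim a task */
            pthread_mutex_unlock(&lock);
            if (t >= NTASKS)                  /* queue drained */
                break;
            double x = 0.0;                   /* stand-in for real work; */
            for (int i = 0; i <= t * 1000; i++)
                x += i;                       /* cost varies per task */
            results[t] = x;
        }
        return NULL;
    }

    int main(void) {
        pthread_t threads[NWORKERS];
        for (int i = 0; i < NWORKERS; i++)
            pthread_create(&threads[i], NULL, worker, NULL);
        for (int i = 0; i < NWORKERS; i++)
            pthread_join(threads[i], NULL);
        printf("results[%d] = %.0f\n", NTASKS - 1, results[NTASKS - 1]);
        return 0;
    }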

1. Parallel Machine Systems

Parallel machines developed from SIMD to MIMD, from which four classic architectural patterns derive:

    1. SMP (symmetric shared-memory multiprocessor): for example, the frequently used multi-core machine; scalability is poor, with processor counts around 8~16.
    2. DSM (distributed shared memory): physical memory is distributed across the processing nodes, but a single logical address space provides unified addressing, so it still counts as shared storage; access time is constrained by network bandwidth.
    3. MPP (massively parallel processor): a large-scale system consisting of hundreds of processors, often a symbol of a country's comprehensive strength.
    4. Cluster: a set of interconnected homogeneous or heterogeneous independent computers; each node has its own memory, I/O, and operating system and can be used as a standalone machine, while nodes are interconnected by commodity networks, which makes clusters very flexible.

Hardware: multi-core CPUs (Intel, AMD), GPUs (NVIDIA), Cell BE (Sony, Toshiba & IBM; one master processing unit plus eight co-processing units).

Concepts: data bus, address bus, control bus, and (register) bit width.

2. Parallel Programming Models and Tools

–MPI–

MPI (Message Passing Interface) is a message-passing programming model that serves inter-process communication. It is not any particular implementation but a standard, a representative specification: it is a library, not a language, easy to use and highly portable. Put bluntly, it is a set of programming interfaces.
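
As a minimal sketch of what these interfaces look like (assuming an MPI implementation such as MPICH or Open MPI, with the usual mpicc/mpirun wrappers), the C program below starts several processes and passes a single integer between two of them:

    /* Each process reports its rank; rank 1 then sends one integer to
       rank 0. Compile with mpicc, run with e.g. mpirun -np 4 ./a.out */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv) {
        int rank, size;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);  /* who am I?       */
        MPI_Comm_size(MPI_COMM_WORLD, &size);  /* how many of us? */
        printf("Hello from process %d of %d\n", rank, size);

        if (size >= 2) {
            int value;
            if (rank == 1) {
                value = 42;  /* arbitrary payload */
                MPI_Send(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD);
            } else if (rank == 0) {
                MPI_Recv(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD,
                         MPI_STATUS_IGNORE);
                printf("Rank 0 received %d from rank 1\n", value);
            }
        }
        MPI_Finalize();
        return 0;
    }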

–OpenMP–

OpenMP (Open Multi-Processing) is a portable parallel programming model for shared-memory multiprocessor architectures. The interface was initiated by SGI.

It includes three parts: compiler directives, a runtime library, and environment variables. It offers serial equivalence (whether a program is executed with one thread or many, it produces the same result, which makes it easier to maintain and understand) and incremental parallelism (the programmer starts from a serial program and then looks for the fragments worth parallelizing).

OpenMP's execution model is fork-join: a master thread forks a team of worker threads and joins them at the end of each parallel region. This reduces the difficulty and complexity of parallel programming.

Because the parallelism is expressed through compiler directives (supported by, for example, Visual Studio), an OpenMP program can be viewed either as a parallel program or as a serial one, and a serial program can easily be rewritten as a parallel program while the serial parts are kept intact.
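
A minimal C sketch of incremental parallelism and serial equivalence follows; the array size is arbitrary. The single pragma turns a serial loop into a parallel one, and compiling without OpenMP support still yields the same result:

    /* The loop is a valid serial program if the pragma is ignored and a
       parallel one when built with OpenMP (e.g. gcc -fopenmp). */
    #include <stdio.h>
    #ifdef _OPENMP
    #include <omp.h>
    #endif

    #define N 1000000

    static double a[N];

    int main(void) {
        double sum = 0.0;

        /* Fork: the master thread spawns a team; join at the loop end.
           The reduction clause gives each thread a private partial sum. */
        #pragma omp parallel for reduction(+:sum)
        for (int i = 0; i < N; i++) {
            a[i] = i * 0.5;
            sum += a[i];
        }

    #ifdef _OPENMP
        printf("ran with up to %d threads\n", omp_get_max_threads());
    #endif
        printf("sum = %f\n", sum);
        return 0;
    }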

–MapReduce–

Proposed by Google; used for computations such as PageRank and the construction of inverted-file indexes.

Map transforms the input into intermediate key/value pairs; Reduce merges the intermediate values that share a key and produces the final output.
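
As a sketch of this dataflow (a single-process C illustration, not Hadoop's actual API; the sample input and buffer sizes are invented), the canonical word-count example looks like this:

    /* Word count in MapReduce style, in one process: map emits a
       (word, 1) pair per record, sorting plays the role of the shuffle
       that groups equal keys, and reduce sums each group. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    typedef struct { char key[32]; int value; } Pair;

    static int by_key(const void *a, const void *b) {
        return strcmp(((const Pair *)a)->key, ((const Pair *)b)->key);
    }

    int main(void) {
        const char *input[] = { "map", "reduce", "map", "map", "reduce" };
        int n = sizeof input / sizeof *input;
        Pair mid[16];

        /* Map: one intermediate (word, 1) pair per input record. */
        for (int i = 0; i < n; i++) {
            strcpy(mid[i].key, input[i]);
            mid[i].value = 1;
        }

        /* Shuffle: group identical keys together, here via a sort. */
        qsort(mid, n, sizeof(Pair), by_key);

        /* Reduce: sum the values of each run of equal keys. */
        for (int i = 0; i < n; ) {
            int sum = 0, j = i;
            while (j < n && strcmp(mid[j].key, mid[i].key) == 0)
                sum += mid[j++].value;
            printf("%s\t%d\n", mid[i].key, sum);
            i = j;
        }
        return 0;
    }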

–Hadoop–

An open-source implementation of MapReduce: HDFS, with a NameNode (JobTracker) and DataNodes (TaskTrackers), in a cluster architecture.

–CUDA–

A parallel computing platform for GPUs developed by NVIDIA.
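
A minimal CUDA C sketch of the model follows; the kernel name, array size, and launch configuration are illustrative choices, and unified memory is used to keep the host code short:

    /* Vector addition on the GPU: one thread per element. Compile with
       nvcc; cudaMallocManaged gives memory visible to host and device. */
    #include <cstdio>
    #include <cuda_runtime.h>

    __global__ void vecAdd(const float *a, const float *b, float *c, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;  /* global id */
        if (i < n)
            c[i] = a[i] + b[i];
    }

    int main() {
        const int n = 1 << 20;
        size_t bytes = n * sizeof(float);
        float *a, *b, *c;
        cudaMallocManaged(&a, bytes);
        cudaMallocManaged(&b, bytes);
        cudaMallocManaged(&c, bytes);
        for (int i = 0; i < n; i++) { a[i] = 1.0f; b[i] = 2.0f; }

        int threads = 256;                        /* threads per block */
        int blocks = (n + threads - 1) / threads;
        vecAdd<<<blocks, threads>>>(a, b, c, n);
        cudaDeviceSynchronize();                  /* wait for the kernel */

        printf("c[0] = %f (expect 3.0)\n", c[0]);
        cudaFree(a); cudaFree(b); cudaFree(c);
        return 0;
    }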

–Cell BE–

The main goal of Cell BE was to raise processor performance to ten times that of the PlayStation 2; in 2006 IBM introduced the Cell Blade computer system.

References: Fundamentals of Parallel Computer Programming & CUDA course.
