Parallel Computing Fundamentals & Programming Models and Tools


In current computer applications, the demand for high-speed parallel computing is extensive. Summed up, there are three types of application requirements:

    1. Compute-intensive applications, such as large-scale scientific and engineering calculations and numerical simulations;
    2. Data-intensive applications, such as digital libraries, data warehouses, data mining, and computational visualization;
    3. Network-intensive applications, such as collaborative work, remote control, and telemedicine diagnostics.

There are three main types of parallel programming models: the multi-threaded programming model for shared memory, the message-passing programming model for distributed memory, and hybrid programming models that combine the two.

In a computer system, the processor always accesses the storage closest to itself fastest: L1 cache -> L2 cache -> local node memory -> remote node memory/disk; at each level, storage capacity grows in the opposite direction of access speed.

In parallel computing, the design of the parallel algorithm is the key to performance. Some problems are inherently parallel, for example when the data set to be processed decomposes cleanly; others require complex formula derivation and transformation before they fit parallel computation. At the same time, to avoid bottlenecks during the computation, task partitioning should take full account of load balancing, especially dynamic load balancing. The idea of peer "equivalence" is one of the keys to maintaining load balance and scalability, that is, avoiding Master/Slave and Client/Server patterns at design time.
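
As a minimal sketch of dynamic load balancing in C, the fragment below uses OpenMP's dynamic loop scheduling (OpenMP itself is covered later in this article). The work() function and the chunk size of 16 are illustrative assumptions; the point is that idle threads pull the next chunk of iterations instead of the partition being fixed in advance:

    #include <stdio.h>

    /* Illustrative task whose cost varies a lot with the index. */
    static double work(int i) {
        double x = 0.0;
        for (int k = 0; k < (i % 1000) * 1000; k++)
            x += 1.0 / (k + 1.0);
        return x;
    }

    int main(void) {
        double total = 0.0;
        /* schedule(dynamic, 16): whichever thread finishes first grabs the
           next 16 iterations, so uneven tasks do not leave threads idle. */
        #pragma omp parallel for schedule(dynamic, 16) reduction(+:total)
        for (int i = 0; i < 10000; i++)
            total += work(i);
        printf("total = %f\n", total);
        return 0;
    }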

1. Parallel Machine Systems

Parallel machines developed from SIMD to MIMD, and four classic architectural patterns have been derived:

    1. SMP (symmetric shared-memory multiprocessor): common multi-core machines; scalability is poor, with processor counts around 8~16;
    2. DSM (distributed shared memory): physical memory is distributed across the processing nodes while the logical address space is uniformly addressed, so it still belongs to shared storage; access time is limited by network bandwidth;
    3. MPP (massively parallel processor): a large-scale system consisting of hundreds of processors, a symbol of a country's comprehensive strength;
    4. Cluster: an aggregation of interconnected homogeneous or heterogeneous independent computers, where each node has its own memory, I/O, and operating system and can be used as a single machine; nodes are interconnected by commodity networks, which makes clusters flexible.

Hardware: multi-core CPUs (Intel, AMD), GPUs (NVIDIA), Cell BE (Sony, Toshiba & IBM; one master processing element and eight co-processing elements).

Concepts: data bus, address bus, control bus; register width in bits.

2. Parallel Programming Models and Tools

–MPI–

MPI (Message Passing Interface) is a message-passing programming model that serves inter-process communication. It does not refer to any particular implementation; rather, it is a standard and specification. It is a library description, not a language, and it is easy to use and highly portable. Bluntly put, it is a set of programming interfaces.
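
As a minimal sketch of the interface style (assuming an MPI implementation such as MPICH or Open MPI is installed; compile with mpicc and run with mpirun -np 2), the following C program passes one message between two processes:

    #include <stdio.h>
    #include <mpi.h>

    int main(int argc, char **argv) {
        int rank, msg;
        MPI_Init(&argc, &argv);                 /* enter the MPI environment */
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* which process am I? */
        if (rank == 0) {
            msg = 42;                           /* illustrative payload */
            MPI_Send(&msg, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
        } else if (rank == 1) {
            MPI_Recv(&msg, 1, MPI_INT, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            printf("process 1 received %d from process 0\n", msg);
        }
        MPI_Finalize();                         /* leave the MPI environment */
        return 0;
    }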

–OpenMP–

OpenMP (Open Multi-Processing) is a portable parallel programming model for shared-memory multiprocessor architectures, with an interface initiated by SGI. It consists of three parts: compiler directives, a run-time function library, and environment variables. It offers serial equivalence (whether a program runs with one thread or with many, it gives the same results, which makes programs easier to maintain and understand) and incremental parallelism (the programmer starts from a serial program and then looks for the fragments worth parallelizing). OpenMP's execution model adopts the fork-join form: a master thread forks a team of threads at a parallel region and joins them at its end, which reduces the difficulty and complexity of parallel programming.

The compiler directives, supported by compilers including Visual Studio's, allow the same OpenMP code to be treated either as a parallel program or as a serial one, and make it easy for a user to rewrite a serial program into a parallel program while keeping the serial parts intact.
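
A minimal sketch of this incremental style in C (compile with a flag such as gcc -fopenmp; the array and its size are illustrative): removing the directive line leaves a valid serial program that computes the same sum, which is exactly the serial equivalence described above.

    #include <stdio.h>
    #include <omp.h>

    #define N 1000000

    int main(void) {
        static double a[N];
        double sum = 0.0;
        /* The single directive below is the only change to the serial code;
           reduction(+:sum) keeps the result identical to the serial run. */
        #pragma omp parallel for reduction(+:sum)
        for (int i = 0; i < N; i++) {
            a[i] = 0.5 * i;
            sum += a[i];
        }
        printf("sum = %f, max threads = %d\n", sum, omp_get_max_threads());
        return 0;
    }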

–MapReduce–

MapReduce comes from Google, where it serves computations such as PageRank and the construction of inverted indexes. Map transforms the input into intermediate key/value pairs; Reduce merges all intermediate values belonging to the same key into the final output.
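
A conceptual word-count sketch in C follows; the emit functions are hypothetical stand-ins for framework callbacks, and a real runtime would group the intermediate pairs by key between the two phases:

    #include <stdio.h>
    #include <string.h>

    /* Hypothetical framework callbacks, assumed for illustration only. */
    static void emit_intermediate(const char *key, int value) {
        printf("intermediate: (%s, %d)\n", key, value);
    }
    static void emit(const char *key, int value) {
        printf("final: (%s, %d)\n", key, value);
    }

    /* Map: split one document into words and emit (word, 1) for each. */
    static void map(const char *doc_name, char *contents) {
        (void)doc_name;
        for (char *w = strtok(contents, " \t\n"); w != NULL;
             w = strtok(NULL, " \t\n"))
            emit_intermediate(w, 1);
    }

    /* Reduce: sum all intermediate counts collected for one word. */
    static void reduce(const char *word, const int *counts, int n) {
        int total = 0;
        for (int i = 0; i < n; i++)
            total += counts[i];
        emit(word, total);
    }

    int main(void) {
        char doc[] = "the quick fox jumps over the lazy dog";
        map("doc1", doc);
        int the_counts[] = {1, 1};  /* what the runtime would hand reduce */
        reduce("the", the_counts, 2);
        return 0;
    }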

–Hadoop–

The open-source implementation of MapReduce. It is built on HDFS, with a NameNode master (running the JobTracker) and DataNode workers (running TaskTrackers) in a cluster architecture.

–CUDA–

The GPU parallel computing platform developed by NVIDIA.

–Cell BE–

The main design goal of Cell BE was a tenfold increase over the processor performance of the PlayStation 2; in 2006, IBM introduced the Cell Blade computer system.

Reference: Fundamentals of Parallel Computer Programming & CUDA course.
