Basic parameters in the BSP Model
The BSP (Bulk Synchronous Parallel) model serves as a bridge between programming languages and computer architecture. It describes a distributed-memory multicomputer using the following three parameters:
● Processor/memory modules. P denotes the number of processor/memory modules;
● A router that transmits messages point-to-point between processor/memory modules. g denotes the router throughput (also known as the bandwidth factor);
● A barrier synchronizer that performs periodic barrier synchronization, where L denotes the time interval between successive global synchronizations.
Computing in the BSP Model
The BSP model can be represented in the following figure. In the BSP model, a computation consists of a series of computation tasks separated by global synchronizations with period L. These computation tasks are called supersteps. In each superstep, every processor performs local computation and sends and receives messages through the router. A global check then determines whether all processors have completed the superstep. If so, execution proceeds to the next superstep; otherwise, the next period of L is allocated to the unfinished superstep.
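The superstep structure described above can be sketched as a small sequential simulator. This is only an illustration under simplifying assumptions: the names `run_bsp` and `make_shift_step` are hypothetical, processors are run one after another instead of in parallel, and the period L is not modelled. What it does show is the order of events: local computation and sending, then a barrier, after which the messages become visible.

```python
from typing import Callable, List, Tuple

# Hypothetical types for this sketch: a message is (destination, payload).
Message = Tuple[int, int]
# A step function receives (pid, local value, inbox) and returns
# (new local value, list of outgoing messages).
StepFn = Callable[[int, int, List[int]], Tuple[int, List[Message]]]


def run_bsp(num_procs: int, init: List[int], step: StepFn, supersteps: int) -> List[int]:
    """Simulate `supersteps` supersteps, running the processors one by one."""
    state = list(init)
    inboxes: List[List[int]] = [[] for _ in range(num_procs)]
    for _ in range(supersteps):
        next_inboxes: List[List[int]] = [[] for _ in range(num_procs)]
        # Local computation and sending. A processor only sees messages that
        # were delivered at the end of the previous superstep.
        for pid in range(num_procs):
            state[pid], outgoing = step(pid, state[pid], inboxes[pid])
            for dest, payload in outgoing:
                next_inboxes[dest].append(payload)
        # Barrier: every processor has finished, so the messages are delivered.
        inboxes = next_inboxes
    return state


def make_shift_step(num_procs: int) -> StepFn:
    """Adopt the value received from the left neighbour, then pass it to the right."""
    def step(pid: int, value: int, inbox: List[int]) -> Tuple[int, List[Message]]:
        if inbox:
            value = inbox[0]
        return value, [((pid + 1) % num_procs, value)]
    return step


# One superstep changes nothing, because the messages are not yet visible.
print(run_bsp(4, [10, 20, 30, 40], make_shift_step(4), 1))  # [10, 20, 30, 40]
# After a second superstep every value has moved one position to the right.
print(run_bsp(4, [10, 20, 30, 40], make_shift_step(4), 2))  # [40, 10, 20, 30]
```

The simulator keeps two sets of inboxes so that a message sent in one superstep cannot be read before the barrier.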
Cost analysis in the BSP Model
For a single superstep, the BSP cost model can be abstracted as follows:
Cost of a superstep = $\max_{i} w_i + g \cdot \max_{i} h_i + L$
Here $w_i$ is the local computation time of process i, $h_i$ is the maximum number of communication packets sent or received by process i, g is the reciprocal of the bandwidth (time steps per packet), and L is the barrier synchronization time (note that I/O time is not taken into account in the BSP cost model). Therefore, if a BSP computation uses s supersteps, the total running time is:
$T_{BSP} = \sum_{k=1}^{s} w^{(k)} + g \sum_{k=1}^{s} h^{(k)} + sL$
where $w^{(k)} = \max_{i} w_i$ and $h^{(k)} = \max_{i} h_i$ are taken over the processes in superstep k.
This formula makes it straightforward to analyze the performance of algorithms and programs.
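As a worked illustration of the cost model, the following sketch computes the cost of one superstep and of a whole computation. The function names and all numbers are made up for the example. The traffic matrix is an assumed input format in which `traffic[i][j]` is the number of packets process i sends to process j.

```python
from typing import List, Sequence, Tuple


def h_relation(traffic: Sequence[Sequence[int]]) -> int:
    """Largest number of packets sent or received by any single process."""
    p = len(traffic)
    sent = [sum(row) for row in traffic]
    received = [sum(traffic[i][j] for i in range(p)) for j in range(p)]
    return max(max(s, r) for s, r in zip(sent, received))


def superstep_cost(work: Sequence[float], h: int, g: float, L: float) -> float:
    """Cost of one superstep: max local work + g * h + L."""
    return max(work) + g * h + L


def total_cost(steps: List[Tuple[Sequence[float], int]], g: float, L: float) -> float:
    """Sum of the superstep costs; `steps` holds (work per process, h) pairs."""
    return sum(superstep_cost(work, h, g, L) for work, h in steps)


# Illustrative numbers only: 3 processes, g = 2, L = 50.
traffic = [[0, 2, 1],
           [1, 0, 0],
           [3, 0, 0]]
h = h_relation(traffic)   # 4, because process 0 receives 1 + 3 packets
print(superstep_cost([100, 80, 120], h, g=2, L=50))                   # 120 + 2*4 + 50 = 178
print(total_cost([([100, 80, 120], h), ([10, 10, 10], 1)], g=2, L=50))  # 178 + 62 = 240
```

With these numbers the second superstep costs 62, of which 50 is the barrier term L, so synchronization dominates when a superstep does little local work.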
The nature and features of the BSP Model
The BSP model is a distributed-memory MIMD computing model with the following features:
● It separates the processors from the router, emphasizing the separation of computation tasks from communication tasks. The router only performs point-to-point message transmission and does not provide functions such as combining, replication, or broadcast. This both hides the specific interconnection network topology and simplifies the communication protocol;
● Global synchronization is implemented in hardware at a controllable coarse granularity, which provides a way to execute tightly coupled synchronous parallel algorithms without placing an excessive burden on programmers;
● When analyzing the performance of the BSP model, it is assumed that a local operation completes in one time step and that, in each superstep, a single processor sends or receives at most h messages (referred to as an h-relation). Let s be the transmission startup time, so the time to send h messages is gh + s. Provided the startup time s is small relative to gh, L should be at least gh. Clearly, the hardware can make L as small as possible (for example, by using pipelining or a large communication bandwidth to make g as small as possible), while the software can set an upper bound on L (because the larger L is, the coarser the parallel granularity). In practice, g can be defined as the ratio of the number of local computations the processor can perform per second to the amount of data the router can transmit per second. If computation and communication are properly balanced, the BSP model has major advantages in programmability, and when algorithms are implemented directly on the BSP model (rather than compiled automatically), this advantage becomes more pronounced as g increases;
● Algorithms designed for the PRAM model can be implemented by having each BSP processor simulate a number of PRAM processors. Theoretical analysis shows that this simulation is optimal to within a constant factor, provided there is enough parallel slackness, that is, provided the number of PRAM processors simulated by each BSP processor is large enough. Concurrent accesses to the distributed memory by multiple processors may cause contention, but hashing can be used to spread the program's memory accesses evenly over the distributed memory. For the EREW PRAM, if the chosen hash function is effective enough, L need only be at least logarithmic for the simulation to be optimal. The reason is that we simulate v virtual processors, with v/P virtual processors allocated to each physical processor. In a superstep, the v access requests can be distributed evenly, about v/P per processor, so the machine executes the superstep in the optimal time O(v/P) with high probability. Similarly, for a CRCW PRAM with v processors, under suitable conditions on the parameters, the BSP model can achieve an optimal simulation in O(v/P) time.
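The role of hashing in spreading memory accesses can be illustrated with a small experiment. This is a sketch under stated assumptions, not the scheme analyzed in the literature: SHA-256 from Python's `hashlib` stands in for the unspecified hash function, the helper names are hypothetical, and the access pattern (addresses that are all multiples of P) is chosen deliberately as a bad case for a plain modulo placement.

```python
import hashlib
from typing import Callable, Iterable, List


def module_by_modulo(address: int, num_modules: int) -> int:
    """Naive placement: the memory module is the address modulo P."""
    return address % num_modules


def module_by_hash(address: int, num_modules: int) -> int:
    """Hashed placement (assumption: SHA-256 as a stand-in hash function)."""
    digest = hashlib.sha256(str(address).encode("ascii")).digest()
    return int.from_bytes(digest[:8], "big") % num_modules


def loads(addresses: Iterable[int], num_modules: int,
          placement: Callable[[int, int], int]) -> List[int]:
    """Count how many access requests land on each memory module."""
    counts = [0] * num_modules
    for address in addresses:
        counts[placement(address, num_modules)] += 1
    return counts


P = 8        # physical processor/memory modules
v = 1024     # access requests in one superstep, one per virtual processor
addresses = [k * P for k in range(v)]   # strided pattern: every address is a multiple of P

print(loads(addresses, P, module_by_modulo))  # all 1024 requests hit module 0
print(loads(addresses, P, module_by_hash))    # roughly v / P = 128 requests per module
```

With the modulo placement, one module receives every request, so the superstep takes time proportional to v. With the hashed placement, the largest load stays close to v/P, which is the behaviour the O(v/P) bound relies on.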
Evaluation of the BSP Model
● In parallel computing, Valiant set out to build a bridge between software and hardware similar to the one the von Neumann machine provides for sequential computing. He argued that the BSP model can play this role, which is why the BSP model is also called a bridging model;
● In general, distributed-memory MIMD models have poor programmability, but in the BSP model, if computation and communication are properly balanced (for example, g = 1), programmability becomes a major advantage;
● Some important algorithms (such as matrix multiplication, parallel prefix operations, FFT, and sorting) have been implemented directly on the BSP model, and they all avoid the additional overhead of automatic memory management;
● The BSP model can be implemented efficiently on hypercube networks and with optical crossbar interconnection technology, which shows that the model is independent of any specific technology, as long as the router provides a certain communication throughput;
● In the BSP model, a superstep must be long enough to accommodate an arbitrary h-relation, which is its least desirable property;
● In the BSP model, a message sent in one superstep can only be used in the next superstep, even if the network latency is shorter than the length of the superstep;
● The global barrier synchronization in the BSP model is assumed to be supported by special hardware, which many parallel machines may not have;
● In the programming simulation environment proposed by Valiant, the constants in the algorithm simulation may not be small. They can be large once process switching is taken into account (which involves not only saving registers but also part of the cache).
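Two of the points above, the reliance on global barrier synchronization and the rule that a message only becomes usable in the next superstep, can be made concrete with threads. The sketch below is one possible software stand-in, not something the model prescribes: it assumes Python's `threading.Barrier` in place of hardware barrier support, and `neighbour_exchange` is a hypothetical name.

```python
import threading
from typing import List


def neighbour_exchange(values: List[int], supersteps: int) -> List[int]:
    """Each thread sends (its value + 1) to its right neighbour in every superstep."""
    p = len(values)
    barrier = threading.Barrier(p)   # software barrier standing in for hardware support
    current = list(values)           # values usable during the current superstep
    outbox = [0] * p                 # written before the barrier, read only after it

    def worker(pid: int) -> None:
        for _ in range(supersteps):
            # Local computation and send: each slot has exactly one writer.
            outbox[(pid + 1) % p] = current[pid] + 1
            barrier.wait()           # every send of this superstep is complete
            current[pid] = outbox[pid]
            barrier.wait()           # every receive is complete before outbox is reused

    threads = [threading.Thread(target=worker, args=(pid,)) for pid in range(p)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return current


print(neighbour_exchange([1, 2, 3, 4], 1))  # [5, 2, 3, 4]
print(neighbour_exchange([1, 2, 3, 4], 2))  # [5, 6, 3, 4]
```

Each superstep here pays for two barrier waits regardless of how little work the threads do, which mirrors the fixed synchronization term in the cost model.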