Characteristics of graph computation:
Strong correlation between data items (vertices are linked by edges);
Poor memory access locality;
Very little computation per vertex;
The degree of parallelism changes over the course of the computation.
Large-scale graph computation systems fall mainly into two kinds:
Real-time graph databases based on traversal algorithms, such as Neo4j, OrientDB, DEX and InfiniteGraph;
Vertex-centric parallel engines, such as GoldenOrb, Giraph, Pregel and Hama. These are parallel graph processing systems based on the BSP model.
BSP: a BSP (Bulk Synchronous Parallel) computation consists of a series of global supersteps (a superstep is one iteration of the computation). Each superstep has three components:
Local computation: each participating processor works on its own computing task
Communication: the processors exchange data with each other
Barrier synchronization: when a processor reaches the barrier (the "fence"), it waits until all other processors have completed their computation steps
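The three components of a superstep can be sketched with ordinary threads. This is only a minimal illustration, not how any of the systems named above is implemented: the function name run_bsp, the ring topology and the "add what you received" update rule are all assumptions made for the example. Messages sent in one superstep become visible in the next, which is why two inboxes are used alternately.

```python
import threading

def run_bsp(values, supersteps):
    """Toy BSP run (hypothetical example): workers sit on a ring and, in
    every superstep, add the message received from their left neighbour."""
    n = len(values)
    current = list(values)
    # Two inboxes used alternately: one is read in this superstep,
    # the other collects messages for the next superstep.
    inboxes = [[0] * n, [0] * n]
    barrier = threading.Barrier(n)

    def worker(rank):
        for step in range(supersteps):
            read_box = inboxes[step % 2]
            write_box = inboxes[(step + 1) % 2]
            # 1. Local computation: use only local state and delivered messages.
            current[rank] += read_box[rank]
            # 2. Communication: send the new value to the right neighbour.
            write_box[(rank + 1) % n] = current[rank]
            # 3. Barrier synchronization: wait for every worker to finish.
            barrier.wait()

    threads = [threading.Thread(target=worker, args=(r,)) for r in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return current

if __name__ == "__main__":
    print(run_bsp([1, 2, 3], 3))
```

Because no worker can enter the next superstep until all workers have passed the barrier, no message is overwritten before it has been read.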
Google's "troika":
Caffeine: builds a large-scale web page index
Dremel: real-time interactive data analysis
Pregel: parallel graph computation based on BSP
Pregel is a parallel graph processing system implemented on the BSP model.
To solve the problem of distributed computation over large-scale graphs, Pregel provides a scalable, fault-tolerant platform with a flexible API that can express a wide range of graph computations.
As a framework for distributed graph computation, Pregel is mainly used for graph traversal, shortest paths, PageRank and similar computations.
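To make the vertex-centric idea concrete, here is a small single-machine sketch of single-source shortest paths written in the Pregel style. It is not Pregel's actual API: the function name pregel_sssp, the dictionary-based graph format and the sequential loop standing in for distributed workers are all assumptions for illustration, and edge weights are assumed to be non-negative. In each superstep a vertex sees only its own value and its incoming messages, and the computation halts when no messages remain.

```python
import math

def pregel_sssp(edges, source):
    """Pregel-style shortest paths (hypothetical sketch).
    edges maps each vertex to a list of (neighbour, weight) pairs."""
    vertices = set(edges)
    for neighbours in edges.values():
        for v, _ in neighbours:
            vertices.add(v)
    dist = {v: math.inf for v in vertices}

    inbox = {source: [0]}  # initial message: the source is at distance 0
    supersteps = 0
    while inbox:  # halt when no messages are in flight
        outbox = {}
        for v, messages in inbox.items():
            # Per-vertex compute step: keep the best distance seen so far.
            best = min(messages)
            if best < dist[v]:
                dist[v] = best
                # Propose improved distances to the out-neighbours.
                for nbr, weight in edges.get(v, []):
                    outbox.setdefault(nbr, []).append(best + weight)
        # Barrier: messages sent now are delivered in the next superstep.
        inbox = outbox
        supersteps += 1
    return dist, supersteps

if __name__ == "__main__":
    graph = {"a": [("b", 1), ("c", 4)], "b": [("c", 2)], "c": []}
    print(pregel_sssp(graph, "a"))
```

In the small graph above, vertex c first learns a distance of 4 directly from a and then improves it to 3 via b one superstep later, so the run takes three supersteps.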