MapReduce implementation of single source shortest path algorithm (Metis version)

1. MapReduce Framework

1.1 MapReduce Introduction

MapReduce is a distributed computing framework proposed by Google that lets users process data in parallel across multiple machines with little effort. The framework is built around two functions, Map and Reduce. The map function processes the input data and produces intermediate key-value pairs (key, value) according to user-defined logic. The reduce function handles these key-value pairs, and all pairs with the same key are processed by the same Reduce process. The results are finally merged. The overall flow is therefore: split the input, map, reduce, merge.
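The flow described above can be illustrated with a minimal, single-threaded C++ sketch. It is not code from Google's framework, Phoenix, or Metis; word counting is used only as a stand-in workload, and the names map_split, reduce_values, and run_word_count are hypothetical.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

using KV = std::pair<std::string, int>;  // intermediate (key, value) pair

// Map: turn one input split into intermediate (word, 1) pairs.
std::vector<KV> map_split(const std::string& split) {
    std::vector<KV> out;
    std::istringstream in(split);
    std::string word;
    while (in >> word) out.emplace_back(word, 1);
    return out;
}

// Reduce: combine all values that share one key.
int reduce_values(const std::vector<int>& values) {
    assert(!values.empty());  // a key only exists if some map emitted it
    int sum = 0;
    for (int v : values) sum += v;
    return sum;
}

// Driver: map every split, group the pairs by key, reduce each group,
// and collect the reduced values into one merged result.
std::map<std::string, int> run_word_count(const std::vector<std::string>& splits) {
    std::map<std::string, std::vector<int>> groups;
    for (const std::string& s : splits)
        for (const KV& kv : map_split(s))
            groups[kv.first].push_back(kv.second);

    std::map<std::string, int> result;
    for (const auto& g : groups) result[g.first] = reduce_values(g.second);
    return result;
}
```

In a real framework the map calls run in parallel on different splits and each reduce process receives the groups for its own keys; the sequential loops here only show the data flow.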

1.2 Phoenix & Metis

Phoenix is a MapReduce framework developed at Stanford for multi-core and multiprocessor systems. It automatically manages thread creation, dynamic task scheduling, data partitioning, and fault tolerance. In essence, a Phoenix program is no different from a hand-written multithreaded program, but the framework makes it easier to write data processing jobs that can be divided into the "map-reduce-merge" pattern.

The Metis framework improves on Phoenix. In the map phase, Phoenix uses a hash table in which each entry is stored as a sorted array; Metis replaces the sorted array with a B+ tree, which improves speed.

2. Metis Framework

2.1 Three data processing modes
    • Map_reduce
    • Map_group
    • Map_only
2.2 Framework Use
    struct SPFA : public map_reduce {
        bool split(split_t *out, int ncores);
        void map_function(split_t *ma);
        void reduce_function(void *k, void **v, size_t length);
        int key_compare(const void *s1, const void *s2);
    };
    • split: The data partitioning function. The user defines how to divide the input and, based on the ncores parameter, stores the information about each data shard in out.

    • map_function: Processes one data shard and produces key-value pairs.

    • reduce_function: Processes the key-value pairs.

    • key_compare: The user-defined comparison function for keys.

2.3 Program Execution Flow
    SPFA app;
    app.set_reduce_task(reduce_tasks);
    app.set_ncore(nprocs);
    mapreduce_appbase::initialize();
    app.sched_run();
    mapreduce_appbase::deinitialize();
    • set_reduce_task: Sets the number of reduce threads.

    • set_ncore: Sets the number of cores to use.

    • sched_run: Starts the program; internally it invokes data partitioning, map, reduce, merge, and the other stages.

    • Number of map threads: Controlled indirectly by the data partitioning function; the number of data shards corresponds to map_tasks.

3. Single-Source Shortest Path Algorithms

3.1 SPFA algorithm

The idea behind SPFA is simple: it is the Bellman-Ford algorithm optimized with a queue, using the relaxation operation to update distances. Introducing the queue brings two optimizations:

    • Fewer relaxation operations: unlike Bellman-Ford, SPFA does not use every node to relax the other nodes in each iteration. Only nodes whose distance was updated enter the queue and are used for relaxation, which reduces the number of relaxations.

    • Negative cycle detection: the number of times a node enters the queue reveals whether a negative cycle exists. If a node enters the queue n times, where n is the number of nodes, the graph contains a negative cycle.
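A minimal sequential sketch of SPFA with both optimizations, assuming the graph is an adjacency matrix in which the constant kInf marks a missing edge. The names spfa, kInf, in_queue, and enqueued are chosen for this illustration and do not come from the article's implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <queue>
#include <vector>

using Matrix = std::vector<std::vector<long long>>;
// Assumption: kInf in w[u][v] means "no edge from u to v".
constexpr long long kInf = std::numeric_limits<long long>::max() / 4;

// Fills dist with shortest distances from src.
// Returns false if a negative cycle reachable from src is detected.
bool spfa(const Matrix& w, std::size_t src, std::vector<long long>& dist) {
    const std::size_t n = w.size();
    assert(src < n);
    dist.assign(n, kInf);
    std::vector<bool> in_queue(n, false);
    std::vector<std::size_t> enqueued(n, 0);  // times each node entered the queue
    std::queue<std::size_t> q;

    dist[src] = 0;
    q.push(src);
    in_queue[src] = true;
    enqueued[src] = 1;

    while (!q.empty()) {
        const std::size_t u = q.front();
        q.pop();
        in_queue[u] = false;
        // Only nodes taken from the queue are used for relaxation.
        for (std::size_t v = 0; v < n; ++v) {
            if (w[u][v] == kInf) continue;
            if (dist[u] + w[u][v] < dist[v]) {
                dist[v] = dist[u] + w[u][v];
                if (!in_queue[v]) {
                    // Entering the queue n times signals a negative cycle.
                    if (++enqueued[v] >= n) return false;
                    q.push(v);
                    in_queue[v] = true;
                }
            }
        }
    }
    return true;
}
```

Using an adjacency matrix keeps the sketch close to the two-dimensional array storage used later in the article; an adjacency list would avoid scanning all n columns per node.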

3.2 Dijkstra algorithm

Dijkstra's algorithm is greedy: in each step it selects, from the nodes that have been updated but not yet finalized, the one with the smallest distance. As long as the graph has no negative edge weights, that node's distance can no longer be improved by the remaining nodes.
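The greedy selection can be sketched as the classic array-based O(n^2) form of Dijkstra's algorithm. This is an illustration under the assumption of an adjacency matrix with non-negative weights, where kInf marks a missing edge; the names dijkstra, kInf, and done are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

using Matrix = std::vector<std::vector<long long>>;
// Assumption: kInf in w[u][v] means "no edge"; all real weights are >= 0.
constexpr long long kInf = std::numeric_limits<long long>::max() / 4;

// Returns the shortest distance from src to every node (kInf if unreachable).
std::vector<long long> dijkstra(const Matrix& w, std::size_t src) {
    const std::size_t n = w.size();
    assert(src < n);
    std::vector<long long> dist(n, kInf);
    std::vector<bool> done(n, false);  // true once a node's distance is final
    dist[src] = 0;

    for (std::size_t round = 0; round < n; ++round) {
        // Greedy step: pick the unfinished node with the smallest distance.
        std::size_t u = n;
        for (std::size_t v = 0; v < n; ++v)
            if (!done[v] && dist[v] != kInf && (u == n || dist[v] < dist[u])) u = v;
        if (u == n) break;  // every remaining node is unreachable
        done[u] = true;

        // Relax the edges leaving the selected node.
        for (std::size_t v = 0; v < n; ++v)
            if (!done[v] && w[u][v] != kInf && dist[u] + w[u][v] < dist[v])
                dist[v] = dist[u] + w[u][v];
    }
    return dist;
}
```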

4. Design and Implementation

4.1 SPFA algorithm implementation

Graph storage: two-dimensional array
Data processing mode: map_only
MapReduce processing flow:

  1. Data partitioning: The node set of the whole graph is divided evenly by the number of map_tasks, and each map processes one part.

  2. Map: Within a data shard, the nodes in the update queue [1] are used to relax the nodes of the shard. If an update occurs, the node is recorded in the tag array [2] and its value in the dist array [3] is modified.

  3. Replace the update queue: The update queue is emptied and the tagged nodes are added to it; these nodes are used to relax the other nodes in the next iteration.

  4. If the update queue is empty, the iteration ends; otherwise the next iteration begins.

Description
[1] Update queue: Holds the nodes that were updated; the nodes in this queue are then used to relax the nodes of the whole graph.
[2] Tag array: Marks whether a node was updated in the current iteration, which decides whether it joins the update queue in the next iteration.
[3] Dist array: The shortest distance from the source node to each other node; the initial value is infinity.
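The four steps can be sketched without Metis as a plain loop in which each shard is handled by one call to a hypothetical map_shard function; in the real program these calls would be the parallel map tasks. Two details are assumptions of this sketch and are not stated in the article: each iteration reads a snapshot (prev) of the distances so that shards do not depend on each other's writes, and the loop gives up after n iterations to guard against negative cycles.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

using Matrix = std::vector<std::vector<long long>>;
// Assumption: kInf in w[u][v] means "no edge from u to v".
constexpr long long kInf = std::numeric_limits<long long>::max() / 4;

// One map task: relax the nodes in [lo, hi) using every node in the update queue.
// It writes only dist[v] and tag[v] for nodes v of its own shard.
void map_shard(const Matrix& w, const std::vector<std::size_t>& update_queue,
               const std::vector<long long>& prev, std::size_t lo, std::size_t hi,
               std::vector<long long>& dist, std::vector<char>& tag) {
    for (std::size_t u : update_queue)
        for (std::size_t v = lo; v < hi; ++v)
            if (w[u][v] != kInf && prev[u] + w[u][v] < dist[v]) {
                dist[v] = prev[u] + w[u][v];
                tag[v] = 1;  // remember that v was updated in this iteration
            }
}

// Returns false if the update queue is still non-empty after n iterations,
// which this sketch treats as a negative cycle.
bool spfa_map_only(const Matrix& w, std::size_t src, std::size_t map_tasks,
                   std::vector<long long>& dist) {
    const std::size_t n = w.size();
    assert(src < n && map_tasks > 0);
    const std::size_t chunk = (n + map_tasks - 1) / map_tasks;  // nodes per shard

    dist.assign(n, kInf);
    dist[src] = 0;
    std::vector<char> tag(n, 0);  // char, not bool, so shards touch separate bytes
    std::vector<std::size_t> update_queue(1, src);

    for (std::size_t iter = 0; !update_queue.empty(); ++iter) {
        if (iter >= n) return false;
        const std::vector<long long> prev = dist;  // snapshot read by all shards
        for (std::size_t lo = 0; lo < n; lo += chunk)  // step 2: one call per shard
            map_shard(w, update_queue, prev, lo, std::min(n, lo + chunk), dist, tag);

        update_queue.clear();  // step 3: replace the queue with the tagged nodes
        for (std::size_t v = 0; v < n; ++v)
            if (tag[v]) {
                update_queue.push_back(v);
                tag[v] = 0;
            }
    }
    return true;  // step 4: the queue is empty, so the distances are final
}
```

Because every shard writes only to its own slice of dist and tag, no reduce phase is needed, which fits the map_only mode chosen above.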

4.2 Dijkstra Algorithm Implementation

Graph storage: two-dimensional array
Data processing mode: map_reduce
MapReduce processing flow:

    1. Data partitioning: The node set of the whole graph is divided evenly by the number of map_tasks, and each map processes one part.

    2. Map: Within a data shard, the node currently nearest to the source is used to relax the distances of the nodes in the shard. If an update occurs, a key-value pair (node number, node distance) is emitted for the node, and the node number % reduce_tasks selects the reduce task that receives it.

    3. Reduce: Each reduce task finds, among the key-value pairs sent to it (node number, node distance), the node nearest to the source.

    4. Find the nearest node: From the final result array result_, the node nearest to the source is selected and used to update the other nodes in the next iteration. If result_ is empty, the iteration exits.
      Note: The node nearest to the source is found in two stages: each reduce first finds its local nearest node, and the global nearest node is then chosen from that set of local results.
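The two-stage search for the nearest node can be sketched as a sequential simulation, with the shard loop standing in for the map tasks and the bucket loop standing in for the reduce tasks. The function and variable names are hypothetical and no Metis API is used. One deviation from the steps above is made for correctness in this self-contained form: the map emits every unfinished node with a finite distance, not only the nodes updated in the current iteration, so that a node reached in an earlier iteration can still be selected later.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <limits>
#include <utility>
#include <vector>

using Matrix = std::vector<std::vector<long long>>;
using Pair = std::pair<std::size_t, long long>;  // (node number, distance)
// Assumption: kInf in w[u][v] means "no edge"; all real weights are >= 0.
constexpr long long kInf = std::numeric_limits<long long>::max() / 4;

bool closer(const Pair& a, const Pair& b) { return a.second < b.second; }

std::vector<long long> dijkstra_map_reduce(const Matrix& w, std::size_t src,
                                           std::size_t map_tasks,
                                           std::size_t reduce_tasks) {
    const std::size_t n = w.size();
    assert(src < n && map_tasks > 0 && reduce_tasks > 0);
    const std::size_t chunk = (n + map_tasks - 1) / map_tasks;  // nodes per shard
    std::vector<long long> dist(n, kInf);
    std::vector<char> done(n, 0);
    dist[src] = 0;
    std::size_t cur = src;  // node currently nearest to the source

    while (true) {
        done[cur] = 1;
        // Map: each shard relaxes its nodes through cur and emits candidates.
        std::vector<std::vector<Pair>> buckets(reduce_tasks);
        for (std::size_t lo = 0; lo < n; lo += chunk) {
            const std::size_t hi = std::min(n, lo + chunk);
            for (std::size_t v = lo; v < hi; ++v) {
                if (done[v]) continue;
                if (w[cur][v] != kInf && dist[cur] + w[cur][v] < dist[v])
                    dist[v] = dist[cur] + w[cur][v];
                // Deviation noted in the lead-in: emit all reachable candidates.
                if (dist[v] != kInf)
                    buckets[v % reduce_tasks].emplace_back(v, dist[v]);
            }
        }
        // Reduce: each reduce task keeps only its local nearest node.
        std::vector<Pair> result;
        for (const std::vector<Pair>& b : buckets)
            if (!b.empty())
                result.push_back(*std::min_element(b.begin(), b.end(), closer));

        if (result.empty()) break;  // nothing left to select
        // Final stage: the global nearest node among the local results.
        cur = std::min_element(result.begin(), result.end(), closer)->first;
    }
    return dist;
}
```

The reduce stage shrinks the candidate set to at most reduce_tasks entries, so the final sequential scan stays short even for large graphs.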

Copyright notice: This is an original article by the blog author and may not be reproduced without the author's permission.
