Giraph source code analysis: starting the ZooKeeper service


Note:

(1) Experiment environment.

Three servers: test165, test62, and test63. Test165 is both JobTracker and TaskTracker.

Test example: the SSSP program provided on the official website. The input data is generated by simulation.

Run the command: hadoop jar giraph-examples-1.0.0-for-hadoop-0.20.203.0-jar-with-dependencies.jar org.apache.giraph.GiraphRunner org.apache.giraph.examples.SimpleShortestPathsVertex -vif org.apache.giraph.io.formats.JsonLongDoubleFloatDoubleVertexInputFormat -vip /user/giraph/SSSP -of org.apache.giraph.io.formats.IdWithValueTextOutputFormat -op /user/giraph/output-sssp-debug-7 -w 5

(2) To save space, only core code snippets are shown below.

(3) In core-site.xml, hadoop.tmp.dir is set to /home/hadoop/hadooptmp.

(4) This article was written after multiple rounds of debugging, so the JobIDs that appear may differ; they can be understood as the same JobID.

(5) Subsequent articles also follow the above rules.

1. The org.apache.giraph.graph.GraphMapper class

In Giraph, the org.apache.giraph.graph.GraphMapper class extends Hadoop's org.apache.hadoop.mapreduce.Mapper class and overrides its setup(), map(), cleanup(), and run() methods. The GraphMapper class is described as follows:

This mapper that will execute the BSP graph tasks alloted to this worker. All tasks will be performed by calling the GraphTaskManager object managed by this GraphMapper wrapper class. Since this mapper will not be passing data by key-value pairs through the MR framework, the Mapper parameter types are irrelevant, and set to Object type.

The BSP computation logic is encapsulated in the GraphMapper class, which holds a GraphTaskManager object for managing the Job's tasks. Each GraphMapper object is equivalent to a compute node in BSP.

In the setup() method of the GraphMapper class, the GraphTaskManager object is created and its setup() method is called for initialization, as follows:

  @Override
  public void setup(Context context)
    throws IOException, InterruptedException {
    // Execute all Giraph-related role(s) assigned to this compute node.
    // Roles can include "master," "worker," "zookeeper," or . . . ?
    graphTaskManager = new GraphTaskManager<I, V, E, M>(context);
    graphTaskManager.setup(
      DistributedCache.getLocalCacheArchives(context.getConfiguration()));
  }
 
The map() method is empty because all operations are encapsulated in the GraphTaskManager class. The run() method calls the execute() method of the GraphTaskManager object to perform the BSP iterative computation.
  @Override
  public void run(Context context) throws IOException, InterruptedException {
    // Notify the master quicker if there is worker failure rather than
    // waiting for ZooKeeper to timeout and delete the ephemeral znodes
    try {
      setup(context);
      while (context.nextKeyValue()) {
        graphTaskManager.execute();
      }
      cleanup(context);
      // Checkstyle exception due to needing to dump ZooKeeper failure
    } catch (RuntimeException e) {
      graphTaskManager.zooKeeperCleanup();
      graphTaskManager.workerFailureCleanup();
    }
  }

2. The org.apache.giraph.graph.GraphTaskManager class

Function: The Giraph-specific business logic for a single BSP compute node in whatever underlying type of cluster our Giraph job will run on. The owning object will provide the glue into the underlying cluster framework and will call this object to perform Giraph work.

The following describes the setup() method. The code is as follows:

  /**
   * Called by owner of this GraphTaskManager on each compute node
   * @param zkPathList the path to the ZK jars we need to run the job
   */
  public void setup(Path[] zkPathList)
    throws IOException, InterruptedException {
    context.setStatus("setup: Initializing Zookeeper services.");
    locateZookeeperClasspath(zkPathList);
    serverPortList = conf.getZookeeperList();
    if (serverPortList == null && startZooKeeperManager()) {
      return; // ZK connect/startup failed
    }
    if (zkManager != null && zkManager.runsZooKeeper()) {
      LOG.info("setup: Chosen to run ZooKeeper...");
    }
    context.setStatus("setup: Connected to Zookeeper service " + serverPortList);
    this.graphFunctions = determineGraphFunctions(conf, zkManager);
    instantiateBspService(serverPortList, sessionMsecTimeout);
  }
The functions of each method are described in sequence:

1) locateZookeeperClasspath(zkPathList): finds the local copy of the ZooKeeper jar that will later be used to start the ZooKeeper service. In this experiment the path is /home/hadoop/hadooptmp/mapred/local/taskTracker/root/jobcache/job_201403270456_0001/jars/job.jar. A simplified sketch of the idea follows.
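The sketch below only illustrates the idea of picking a localized jar for the ZooKeeper classpath; it is an assumption-level illustration, not the actual Giraph implementation, and the class name ZkClasspathSketch and method locateZkJar are invented for this example.

  import org.apache.hadoop.fs.Path;

  /** Minimal sketch of locating the ZooKeeper jar; not the actual Giraph code. */
  final class ZkClasspathSketch {
    /** Pick the localized jar that the ZooKeeper process will be started with. */
    static String locateZkJar(Path[] zkPathList) {
      for (Path path : zkPathList) {
        if (path.getName().endsWith(".jar")) {
          // e.g. .../mapred/local/taskTracker/.../jobcache/<jobId>/jars/job.jar
          return path.toString();
        }
      }
      throw new IllegalStateException("No jar found in the distributed cache");
    }
  }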
2) startZooKeeperManager(): initializes and configures the ZooKeeperManager. It is defined as follows:

  /**
   * Instantiate and configure ZooKeeperManager for this job. This will
   * result in a Giraph-owned Zookeeper instance, a connection to an
   * existing quorum as specified in the job configuration, or task failure
   * @return true if this task should terminate
   */
  private boolean startZooKeeperManager()
    throws IOException, InterruptedException {
    zkManager = new ZooKeeperManager(context, conf);
    context.setStatus("setup: Setting up Zookeeper manager.");
    zkManager.setup();
    if (zkManager.computationDone()) {
      done = true;
      return true;
    }
    zkManager.onlineZooKeeperServers();
    serverPortList = zkManager.getZooKeeperServerPortString();
    return false;
  }

The org.apache.giraph.zk.ZooKeeperManager class manages the election of ZooKeeper servers, starting/stopping the services, etc.

The setup() method of the ZooKeeperManager class is defined as follows:

  /**
   * Create the candidate stamps and decide on the servers to start if
   * you are partition 0.
   */
  public void setup() throws IOException, InterruptedException {
    createCandidateStamp();
    getZooKeeperServerList();
  }
The createCandidateStamp() method creates a file for each task in the _bsp/_defaultZkManagerDir/job_201403301409_0006/_task directory on HDFS. The file content is empty; the file name is the hostname of the local machine plus its taskPartition. In this experiment:

Five workers were specified at run time (-w 5), plus one master, so there are six tasks and therefore six such files.
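The effect of createCandidateStamp() can be sketched as follows. This is a simplified, assumption-level illustration rather than the actual Giraph code; the class name CandidateStampSketch and the method parameters are invented, while the file-naming scheme (hostname plus taskPartition, empty content) follows the description above.

  import java.io.IOException;
  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.FileSystem;
  import org.apache.hadoop.fs.Path;

  /** Simplified sketch of the candidate stamp; not the actual Giraph code. */
  final class CandidateStampSketch {
    /** Each task writes an empty file "<hostname> <taskPartition>" into taskDirectory. */
    static void createCandidateStamp(Configuration conf, Path taskDirectory,
        String hostname, int taskPartition) throws IOException {
      FileSystem fs = FileSystem.get(conf);
      Path myCandidacyPath =
          new Path(taskDirectory, hostname + " " + taskPartition);
      // The file content is empty; only the name carries information.
      fs.createNewFile(myCandidacyPath);
    }
  }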

In the getZooKeeperServerList() method, the task whose taskPartition is 0 calls the createZooKeeperServerList() method to create the ZooKeeper server list: an empty file whose name describes the chosen ZooKeeper server(s).

The core code of createZooKeeperServerList() is as follows:

  /**
   * Task 0 will call this to create the ZooKeeper server list.  The result is
   * a file that describes the ZooKeeper servers through the filename.
   */
  private void createZooKeeperServerList() throws IOException,
      InterruptedException {
    Map<String, Integer> hostnameTaskMap = Maps.newTreeMap();
    while (true) {
      FileStatus[] fileStatusArray = fs.listStatus(taskDirectory);
      hostnameTaskMap.clear();
      if (fileStatusArray.length > 0) {
        for (FileStatus fileStatus : fileStatusArray) {
          String[] hostnameTaskArray =
              fileStatus.getPath().getName().split(HOSTNAME_TASK_SEPARATOR);
          if (!hostnameTaskMap.containsKey(hostnameTaskArray[0])) {
            hostnameTaskMap.put(hostnameTaskArray[0],
                new Integer(hostnameTaskArray[1]));
          }
        }
        if (hostnameTaskMap.size() >= serverCount) {
          break;
        }
        Thread.sleep(pollMsecs);
      }
    }
  }
 
First, the files in the taskDirectory (_bsp/_defaultZkManagerDir/job_201403301409_0006/_task) directory are listed. For each file found, the file name (hostname + taskPartition) is split, and the hostname and taskPartition are stored in hostnameTaskMap. After scanning the taskDirectory directory, if the size of hostnameTaskMap is greater than or equal to serverCount (the ZOOKEEPER_SERVER_COUNT setting in GiraphConstants.java, whose default is 1), the outer loop stops. The outer loop is needed because, in a distributed setting, each task creates its own file in taskDirectory, so when task 0 builds the server list here, other tasks may not have created their files yet. By default, Giraph starts one ZooKeeper service per Job; that is, only one task starts the ZooKeeper service.

After multiple tests, task 0 is always selected as the ZooKeeper server: when it scans taskDirectory, usually only its own task file exists (the other task files have not been generated yet), so after the for loop the size of hostnameTaskMap is 1, which already reaches serverCount, and the while loop exits immediately. In this experiment, test162 0 is selected.

Finally, the file _bsp/_defaultZkManagerDir/job_201403301409_0006/zkServerList_test162 0 is created.

onlineZooKeeperServers(): according to the zkServerList_test162 0 file, task 0 generates the zoo.cfg configuration file and uses ProcessBuilder to create the ZooKeeper service process. Task 0 then connects to the ZooKeeper service process through a socket, and finally creates the file _bsp/_defaultZkManagerDir/job_201403301409_0006/_zkServer/test162 0 to mark that the master task is ready. The workers keep polling for _bsp/_defaultZkManagerDir/job_201403301409_0006/_zkServer/test162 0; in other words, each worker waits until the ZooKeeper service on the master node has been started.

The command used to start the ZooKeeper service is essentially a java invocation of org.apache.zookeeper.server.quorum.QuorumPeerMain with the generated zoo.cfg as its argument, using the classpath located by locateZookeeperClasspath().
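The overall flow on the task that runs the ZooKeeper server can be sketched roughly as follows. This is a heavily simplified, assumption-level illustration, not the actual Giraph code; the class name ZkLaunchSketch, the method name, its parameters, and the 3-second poll interval are invented, while zoo.cfg, QuorumPeerMain, the socket check, and the marker file follow the description above.

  import java.io.File;
  import java.io.IOException;
  import java.net.Socket;
  import java.util.ArrayList;
  import java.util.List;
  import org.apache.hadoop.fs.FileSystem;
  import org.apache.hadoop.fs.Path;

  /** Rough sketch of onlineZooKeeperServers() on the chosen task; not the actual Giraph code. */
  final class ZkLaunchSketch {
    static void launchAndMarkReady(File zooCfg, String zkClasspath,
        String hostname, int zkPort, FileSystem fs, Path serverDirectory,
        int taskPartition) throws IOException, InterruptedException {
      // 1) Build the launch command: java -cp <classpath> QuorumPeerMain zoo.cfg
      List<String> command = new ArrayList<String>();
      command.add(System.getProperty("java.home") + "/bin/java");
      command.add("-cp");
      command.add(zkClasspath); // jar located by locateZookeeperClasspath()
      command.add("org.apache.zookeeper.server.quorum.QuorumPeerMain");
      command.add(zooCfg.getAbsolutePath());
      new ProcessBuilder(command).start();

      // 2) Poll the server port with a plain socket until ZooKeeper accepts connections.
      while (true) {
        try {
          new Socket(hostname, zkPort).close();
          break;
        } catch (IOException e) {
          Thread.sleep(3000);
        }
      }

      // 3) Create the readiness marker that the workers are polling for, e.g.
      //    _bsp/_defaultZkManagerDir/<jobId>/_zkServer/<hostname> <taskPartition>
      fs.createNewFile(new Path(serverDirectory, hostname + " " + taskPartition));
    }
  }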

3) determineGraphFunctions().

The GraphTaskManager class includes the CentralizedServiceMaster object and the CentralizedServiceWorker object, which correspond to the master and worker respectively. The role determination logic of each BSP compute node is as follows:

A) If master and worker are not split, every task does everything, and the task chosen to run ZooKeeper also runs ZooKeeper.

B) If master and worker are split, the master also runs ZooKeeper.

C) If master and worker are split and giraph.zkList is set, the master will not instantiate a ZooKeeper instance, but will assume a quorum is already active on the cluster for Giraph to use.

These rules are implemented in the static method determineGraphFunctions() in the GraphTaskManager class. The code snippet is as follows:

  private static GraphFunctions determineGraphFunctions(
      ImmutableClassesGiraphConfiguration conf,
      ZooKeeperManager zkManager) {
    // What functions should this mapper do?
    if (!splitMasterWorker) {
      if ((zkManager != null) && zkManager.runsZooKeeper()) {
        functions = GraphFunctions.ALL;
      } else {
        functions = GraphFunctions.ALL_EXCEPT_ZOOKEEPER;
      }
    } else {
      if (zkAlreadyProvided) {
        int masterCount = conf.getZooKeeperServerCount();
        if (taskPartition < masterCount) {
          functions = GraphFunctions.MASTER_ONLY;
        } else {
          functions = GraphFunctions.WORKER_ONLY;
        }
      } else {
        if ((zkManager != null) && zkManager.runsZooKeeper()) {
          functions = GraphFunctions.MASTER_ZOOKEEPER_ONLY;
        } else {
          functions = GraphFunctions.WORKER_ONLY;
        }
      }
    }
    return functions;
  }

By default, Giraph splits master and worker. The ZooKeeper service is started on the master and not on the workers, so Task 0 acts as master + ZooKeeper and the other tasks are workers.
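For reference, the two inputs to this decision can be set on the job configuration. Below is a minimal sketch under the assumption that the property keys giraph.SplitMasterWorker and giraph.zkList (the key mentioned in rule C above) are used; the class name and the host:port value are only illustrative.

  import org.apache.giraph.conf.GiraphConfiguration;

  /** Minimal sketch of configuring the role-determination inputs; illustrative only. */
  public class GiraphZkConfigSketch {
    public static GiraphConfiguration configure() {
      GiraphConfiguration conf = new GiraphConfiguration();
      // Case A): do not split master/worker, so each task does everything:
      // conf.setBoolean("giraph.SplitMasterWorker", false);
      // Case C): split master/worker and use an existing external quorum, so
      // Giraph does not start its own ZooKeeper ("test165:2181" is illustrative).
      conf.setBoolean("giraph.SplitMasterWorker", true);
      conf.set("giraph.zkList", "test165:2181");
      return conf;
    }
  }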
