Analysis of the worker heartbeat and master high availability in the Tachyon framework

Source: Internet
Author: User
0 Overview

Many distributed frameworks use a master-slave architecture. The slave nodes carry out the actual work, and the master distributes tasks or stores the related metadata. One master usually serves many slave nodes. When assigning tasks, the master needs to know which slave nodes can accept its commands (a slave node may go down for various reasons), so it maintains an internal list of all live slave nodes. The HMaster of HBase works this way, as do the NameNode of HDFS and the master node of Tachyon. Slave nodes communicate with the master through heartbeats. The master treats a slave node whose heartbeat reports keep arriving as alive and otherwise treats it as failed. Besides keeping this in-memory liveness list up to date, the master usually uses the heartbeat response to return the commands the slave node should execute. The slave node receives and executes those commands, which completes one interaction.

The master is the core of the system. If it fails, the impact on the entire system is fatal, so the single point of failure (SPOF) is a problem every distributed framework has to consider. ZooKeeper, which implements a Paxos-style consensus protocol (ZAB), is a powerful tool for solving the consistency problem. HDFS, Storm, and HBase all use ZooKeeper to hold the metadata needed for HA, and Tachyon is no exception.

1 Worker heartbeat

1.1 Overall process

Through its heartbeat, a Tachyon worker node reports the memory currently used on the worker and its data block information to the master. After receiving the report, the master returns a command for that worker to execute. Typical commands are Register, Free, and Nothing.

 

Note the following:

(1) The default worker heartbeat interval is 1 second. It is set by the tachyon.worker.to.master.heartbeat.interval.ms parameter.

(2) The worker heartbeat timeout defaults to 10 seconds. It is set by the tachyon.worker.heartbeat.timeout.ms parameter.

(3) The master can return five commands to the worker: Unknown, Nothing, Register, Free, and Delete. Nothing does nothing. Register makes the worker register with the master, and the master returns a worker ID that the worker stores locally. Free releases the data held in the worker's memory. Delete removes the data both from memory and from disk.

(4) The heartbeat itself is sent by calling the heartbeat method of WorkerStorage.

(5) checkStatus checks the memory usage managed by the current worker node.
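
The command handling described in note (3) can be pictured as a small dispatcher on the worker side. The following is a minimal Python sketch, not Tachyon's Java code: the Command and WorkerState classes, their fields, and the register callback are hypothetical names chosen for illustration. Only the five command names come from the text above.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, List, Optional, Set


class CommandType(Enum):
    UNKNOWN = 0
    NOTHING = 1
    REGISTER = 2
    FREE = 3
    DELETE = 4


@dataclass
class Command:
    # Hypothetical shape: a command type plus the block IDs it applies to.
    type: CommandType
    block_ids: List[int] = field(default_factory=list)


@dataclass
class WorkerState:
    # Hypothetical stand-in for the worker's local storage bookkeeping.
    worker_id: Optional[int] = None
    memory_blocks: Set[int] = field(default_factory=set)
    disk_blocks: Set[int] = field(default_factory=set)

    def handle(self, cmd: Command, register: Callable[[], int]) -> None:
        """Apply one command returned by a heartbeat."""
        if cmd.type is CommandType.NOTHING:
            return
        if cmd.type is CommandType.REGISTER:
            # The master hands back an ID that the worker keeps locally.
            self.worker_id = register()
        elif cmd.type is CommandType.FREE:
            # Free only drops the in-memory copies.
            self.memory_blocks -= set(cmd.block_ids)
        elif cmd.type is CommandType.DELETE:
            # Delete drops the blocks from memory and from disk.
            self.memory_blocks -= set(cmd.block_ids)
            self.disk_blocks -= set(cmd.block_ids)
        else:
            raise ValueError(f"unexpected command: {cmd.type}")
```

In this sketch an Unknown command is treated as an error. How the real worker reacts to it is not stated in the text, so that choice is an assumption.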

1.2 Normal heartbeat processing

 

Note the following:

(1) First, collect all blocks that need to be removed from the memory of the current worker node. Blocks are generally removed in three situations: the master sends a Free command, WorkerStorage is initialized, or memory runs short and blocks have to be evicted with the LRU algorithm.

(2) Call the worker_heartbeat method of MasterClient to report the heartbeat to the master. MasterClient uses a MasterService.Client object to call the master's Thrift service and transmit the message, similar to the dynamic-proxy RPC communication in HDFS.

(3) The connection is established by the connect method of the MasterClient class. Its main purpose is to create a MasterService.Client object, that is, the client of the Thrift service. The procedure is as follows:

Step 1: Call the cleanConnect method to close the Thrift transport and, if the current HeartbeatThread object is not null, stop it. If mIsShutdown is true, a TException is thrown. On the first connection there is almost nothing to clean up.

Step 2: After the cleanup, enter the while loop and prepare to obtain the master address and establish a connection. The loop condition is tries++ < MAX_CONNECT_TRY && !mIsShutdown, which gives five attempts by default.

Step 3: Get the current address of the master. The getMasterAddress method looks up all nodes under the leader directory in ZooKeeper and, based on their creation time, picks the most recently created node as the active master to connect to.

Step 4: Initialize the communication protocol between the Thrift client and the server. TBinaryProtocol is used here.

Step 5: Initialize the Thrift client, a MasterService.Client object.

Step 6: Open the protocol's transport, which is the data transmission channel, so that it is ready for reading and writing. If opening fails, a TTransportException is thrown and the HeartbeatThread used to keep the connection alive is stopped. After sleeping for 1 second, the while loop tries again. If the maximum number of attempts is reached without success, a TException is thrown.

Step 7: Initialize the HeartbeatThread, which keeps exchanging heartbeats with the server. Its timeout is the value of the tachyon.user.master.client.timeout.ms property, 10 seconds by default. This thread keeps the established Thrift connection alive. If the heartbeat times out, the cleanConnect method is called, the Thrift transport is closed, and the heartbeat thread is terminated.
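
The retry loop in Steps 2 to 6 can be sketched as follows. This is a simplified Python illustration under stated assumptions, not the MasterClient code itself: the function and parameter names are hypothetical, OSError stands in for TTransportException, ConnectError stands in for TException, and the keep-alive thread of Step 7 is left out.

```python
import time
from typing import Callable

MAX_CONNECT_TRY = 5  # default number of attempts mentioned in Step 2


class ConnectError(Exception):
    """Stands in for the TException raised when all attempts fail."""


def connect(get_master_address: Callable[[], str],
            open_transport: Callable[[str], object],
            is_shutdown: Callable[[], bool] = lambda: False,
            sleep: Callable[[float], None] = time.sleep) -> object:
    """Try to open a connection to the current master, retrying on failure."""
    tries = 0
    while tries < MAX_CONNECT_TRY and not is_shutdown():
        tries += 1
        # The address is looked up again on every attempt, so a newly
        # elected master is picked up automatically.
        address = get_master_address()
        try:
            return open_transport(address)
        except OSError:
            # Opening the channel failed: wait 1 second, then retry.
            sleep(1.0)
    raise ConnectError(f"could not connect after {tries} attempts")
```

Passing the address lookup and the sleep function in as parameters is a design choice of the sketch. It makes the loop easy to exercise without a real ZooKeeper or network.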

(4) After connect succeeds, the worker_heartbeat method of MasterService.Client is called to perform the heartbeat. It returns a Command. The worker_heartbeat process is as follows:

Step 1: Call the send_worker_heartbeat method, which sets the worker ID, the memory used, and the IDs of the blocks the worker has removed.

Step 2: Create a worker_heartbeat_result object. worker_heartbeat_result is a static nested class of MasterService. It defines two fields: SUCCESS_FIELD_DESC with field ID 0 and E_FIELD_DESC with field ID 1.

Step 3: The request is sent to the master through the Thrift service and handled by the workerHeartbeat method of MasterInfo. If the master cannot find the worker that reported the heartbeat, it returns CommandType.Register to the worker. If memory needs to be released, it returns CommandType.Free. Otherwise it returns CommandType.Nothing.
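
The master-side decision in Step 3 can be sketched like this. It is a minimal Python illustration, not the MasterInfo implementation: the MasterWorkers class, its two dictionaries, and the way pending frees are tracked are assumptions made for the example. Only the three possible answers come from the text above.

```python
from typing import Dict, List, Set, Tuple

REGISTER, FREE, NOTHING = "Register", "Free", "Nothing"


class MasterWorkers:
    """Hypothetical in-memory view of the workers known to the master."""

    def __init__(self) -> None:
        self.used_bytes: Dict[int, int] = {}
        self.to_free: Dict[int, Set[int]] = {}

    def register(self, worker_id: int) -> None:
        self.used_bytes[worker_id] = 0
        self.to_free[worker_id] = set()

    def worker_heartbeat(self, worker_id: int, used_bytes: int,
                         removed_blocks: List[int]) -> Tuple[str, List[int]]:
        """Record one heartbeat and pick the command to send back."""
        if worker_id not in self.used_bytes:
            # Unknown worker: ask it to register first.
            return REGISTER, []
        self.used_bytes[worker_id] = used_bytes
        pending = self.to_free[worker_id]
        # Blocks the worker already removed no longer need to be freed.
        pending.difference_update(removed_blocks)
        if pending:
            blocks = sorted(pending)
            pending.clear()
            return FREE, blocks
        return NOTHING, []
```

A worker that is unknown to the master gets Register back no matter what it reports, which is also what lets a worker rejoin after a master failover.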

1.3 Heartbeat exception handling

Two main exceptions can occur while TachyonWorker reports its heartbeat: BlockInfoException and TException.

(1) If a BlockInfoException occurs, TachyonWorker goes on to call WorkerStorage.checkStatus(). If the heartbeat loop condition still holds, that is, mStop is false, heartbeat reporting continues.

(2) If a TException occurs, TachyonWorker calls the resetMasterClient method of WorkerStorage to reset the MasterClient object, which then reconnects to the Thrift server through the connect method. Note that the worker heartbeat timeout (10 seconds by default) is checked at this point. If the timeout has been reached, a RuntimeException is thrown and the heartbeat thread terminates. If it has not, execution proceeds to checkStatus of WorkerStorage, re-checks the heartbeat loop condition, and enters the next heartbeat.
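
The two exception paths can be summarized in a small loop. This is a hedged Python sketch rather than the TachyonWorker code: BlockInfoError and TransportError stand in for BlockInfoException and TException, all callables are hypothetical hooks, and the per-heartbeat sleep is omitted.

```python
import time
from typing import Callable


class BlockInfoError(Exception):
    """Stands in for BlockInfoException."""


class TransportError(Exception):
    """Stands in for TException."""


def heartbeat_loop(send_heartbeat: Callable[[], None],
                   reset_master_client: Callable[[], None],
                   check_status: Callable[[], None],
                   should_stop: Callable[[], bool],
                   timeout_s: float = 10.0,
                   now: Callable[[], float] = time.monotonic) -> None:
    """Keep sending heartbeats until stopped or until the timeout is hit."""
    last_ok = now()
    while not should_stop():
        try:
            send_heartbeat()
            last_ok = now()
        except BlockInfoError:
            # Nothing to repair: fall through to the status check.
            pass
        except TransportError:
            # Drop the broken client so the next attempt reconnects.
            reset_master_client()
            if now() - last_ok >= timeout_s:
                raise RuntimeError("worker heartbeat timed out")
        check_status()
```

The timeout is measured from the last successful heartbeat, so a few failed attempts in a row are tolerated as long as one succeeds within the window.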

2 Master HA

During initialization the master node creates the journal directory. If the underlying file system is HDFS, the directory is created directly on HDFS. The directory must also be formatted (formatting here simply means creating an empty file that marks it as formatted).

Tachyon stores its file system metadata in an fsimage and an edits log (the image.data and log.data files, respectively). The edits log is an incremental log of changes to the file system metadata, and the fsimage is a snapshot taken at a point in time. When the Tachyon master starts, it first reads the file system metadata from the fsimage file, that is, the information for the various kinds of nodes (files, directories, raw tables, checkpoints, dependencies, and so on). It then replays the incremental operation records from the subsequent edits logs (there may be more than one). The entries in the edits log correspond to operations issued by Tachyon file system clients, such as creating, deleting, and renaming files and adding data blocks. The edits log contains only metadata, not the actual file content. If a file's cached content is lost, the file has not been persisted, and no lineage information is attached to it, the content of that file is lost. Once loading is finished, the Tachyon master writes the current metadata into a new fsimage.
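
The recovery sequence (load the snapshot, then replay the edits logs in order) can be illustrated with a toy model. The sketch below is Python and purely illustrative: the operation tuples, the dictionary layout, and the function names are assumptions and do not reflect the real image.data or log.data formats.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical operation records:
#   ("create", path), ("delete", path),
#   ("rename", src, dst), ("add_block", path, block_id)
Op = Tuple


def apply_edit(meta: Dict[str, dict], op: Op) -> None:
    """Apply a single edits-log record to the in-memory metadata."""
    kind = op[0]
    if kind == "create":
        meta[op[1]] = {"blocks": []}
    elif kind == "delete":
        meta.pop(op[1], None)
    elif kind == "rename":
        meta[op[2]] = meta.pop(op[1])
    elif kind == "add_block":
        meta[op[1]]["blocks"].append(op[2])
    else:
        raise ValueError(f"unknown operation: {kind}")


def recover(image: Dict[str, dict],
            edit_logs: Iterable[List[Op]]) -> Dict[str, dict]:
    """Rebuild metadata from a snapshot plus the edits logs that follow it."""
    # Copy the snapshot so that replaying never mutates the image itself.
    meta = {path: {"blocks": list(info["blocks"])}
            for path, info in image.items()}
    for log in edit_logs:          # logs must be replayed in order
        for op in log:
            apply_edit(meta, op)
    return meta
```

The result of recover is what the master would then write out as the new snapshot. Only metadata flows through this path, which matches the point above that file content is never part of the edits log.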

When ZooKeeper is used as the master HA mechanism, the master in the standby role periodically merges the edits log and produces its own fsimage. If no standby master is running, a new fsimage is produced by merging the edits log only when the master starts.

The active master is elected through the LeaderSelectorClient class. If the current master is elected leader, the edits log rolling is stopped and the init method of MasterInfo is called to initialize the relevant state. The master then starts the web service UIWebServer (a master in standby state has no web UI), initializes the master service handler MasterServiceHandler, and starts the Thrift service.

Note that:

(1) The journal path, the prefix of the format marker file, and the IP addresses and ports of the master service and the web UI are all configured in the MasterConf class.

(2) The init method of MasterInfo does the following, in order:

Step 1: Load the edits log into memory.

Step 2: Create a new image file.

Step 3: Create a new edits log file.

Step 4: Create the heartbeat executor MasterInfoHeartbeatExecutor and start it.

Step 5: Create the lost-file recovery scheduler RecomputationScheduler and start it.

(3) The master node also has a heartbeat of its own, but it only performs periodic system status checks:

Step 1: Get the list (a BlockingQueue) of expired workers and remove those workers from the worker list kept on the master side.

Step 2: Try to recover the files that were lost with the timed-out workers.

Step 3: Restart all workers that have timed out. This is important!
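
Steps 1 and 2 of this periodic check can be sketched as follows. This is a simplified Python illustration, not the MasterInfoHeartbeatExecutor code: the dictionaries, the function names, and the 10-second master-side timeout constant are assumptions. Restarting the workers (Step 3) is left out because it depends on the deployment.

```python
from typing import Dict, List, Set, Tuple

WORKER_TIMEOUT_MS = 10_000  # hypothetical master-side timeout


def find_expired(last_heartbeat_ms: Dict[int, int], now_ms: int,
                 timeout_ms: int = WORKER_TIMEOUT_MS) -> List[int]:
    """Return the IDs of workers whose last heartbeat is too old."""
    return sorted(w for w, t in last_heartbeat_ms.items()
                  if now_ms - t > timeout_ms)


def master_heartbeat(last_heartbeat_ms: Dict[int, int],
                     worker_blocks: Dict[int, Set[int]],
                     now_ms: int,
                     timeout_ms: int = WORKER_TIMEOUT_MS
                     ) -> Tuple[List[int], List[int]]:
    """Drop expired workers and report blocks that lost their only copy."""
    expired = find_expired(last_heartbeat_ms, now_ms, timeout_ms)
    candidates: Set[int] = set()
    for w in expired:
        del last_heartbeat_ms[w]
        candidates |= worker_blocks.pop(w, set())
    # A block is only lost if no surviving worker still holds it.
    still_held = set().union(*worker_blocks.values())
    return expired, sorted(candidates - still_held)
```

The list of lost blocks returned here is the input a recovery step would need in order to decide which files have to be recomputed or reloaded.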

(4) The master creates two nodes in ZooKeeper. The first is the election node, whose child nodes are ephemeral nodes of type CreateMode.EPHEMERAL_SEQUENTIAL. The second is the leader node, whose child nodes are of type CreateMode.PERSISTENT and record information about the currently active master.

(5) Because the active master may change, a worker choosing the master to talk to first retrieves all master nodes under the leader directory in ZooKeeper and iterates over them. If there is only one master, it is returned directly. Otherwise the master with the largest ctime is taken as the current active node, and the worker communicates with it. When reporting a heartbeat, the worker retries five times and throws a TException if it still fails. The catch block in TachyonWorker's run method then calls the resetMasterClient method to reset the client, so the latest active master address is fetched from ZooKeeper on every connection attempt. If the master is still unreachable after a period of time, the worker stops the heartbeat and calls the cleanConnect method.
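
The selection rule in note (5) is short enough to write down directly. The Python sketch below assumes the children of the leader directory have already been read from ZooKeeper into a list. The LeaderNode tuple and the function name are hypothetical.

```python
from typing import List, NamedTuple


class LeaderNode(NamedTuple):
    address: str   # master address stored under the leader directory
    ctime: int     # creation time of the ZooKeeper node


def pick_active_master(nodes: List[LeaderNode]) -> str:
    """Return the address of the most recently created leader node."""
    if not nodes:
        raise LookupError("no master registered under the leader directory")
    if len(nodes) == 1:
        return nodes[0].address
    return max(nodes, key=lambda n: n.ctime).address
```

Because the child nodes of the leader node are persistent, entries of earlier leaders can remain in the directory, which is why the largest ctime rather than mere presence identifies the active master.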
