Spark Technology Insider: Source Code Implementation of Master High Availability (HA) Based on ZooKeeper

Last Update: 2014-06-25 Source: Internet

Author: User

When Spark is deployed in standalone mode, it uses a typical master/slave architecture in which the master is a single point of failure (SPOF). Spark can use ZooKeeper to make the master highly available (HA).

ZooKeeper provides a leader election mechanism that ensures only one master in the cluster is active while all the others are on standby. When the active master fails, one of the standby masters is elected to take over. Because the cluster's state (worker, driver, and application information) has already been persisted (to ZooKeeper or to a file system, depending on the recovery mode), only the submission of new jobs is affected during the switchover; jobs that are already running are not affected.

[Figure: overall architecture of a cluster that uses ZooKeeper]

1. Master Restart Policy

When the master starts, its restart policy after a failure is determined by the startup parameters:

ZOOKEEPER: HA is implemented through ZooKeeper.
FILESYSTEM: the master can restart without data loss; the cluster's runtime data is saved to a local or network file system.
NONE: all previous data is discarded and the master restarts from scratch.

Master.preStart() implements these three strategies:

```scala
override def preStart() {
  logInfo("Starting Spark master at " + masterUrl)
  // ...
  // persistenceEngine persists worker, driver, and application information,
  // so that submitted jobs are not affected when the master restarts
  persistenceEngine = RECOVERY_MODE match {
    case "ZOOKEEPER" =>
      logInfo("Persisting recovery state to ZooKeeper")
      new ZooKeeperPersistenceEngine(SerializationExtension(context.system), conf)
    case "FILESYSTEM" =>
      logInfo("Persisting recovery state to directory: " + RECOVERY_DIR)
      new FileSystemPersistenceEngine(RECOVERY_DIR, SerializationExtension(context.system))
    case _ =>
      new BlackHolePersistenceEngine()
  }

  // leaderElectionAgent elects the leader
  leaderElectionAgent = RECOVERY_MODE match {
    case "ZOOKEEPER" =>
      context.actorOf(Props(classOf[ZooKeeperLeaderElectionAgent], self, masterUrl, conf))
    case _ =>
      // With only one master in the cluster, the current master is always the active one
      context.actorOf(Props(classOf[MonarchyLeaderAgent], self))
  }
}
```
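The two pattern matches above reduce to a lookup from the recovery-mode string to a pair of components. The following Java sketch restates that dispatch outside of Akka so it can run on its own. The interfaces `PersistenceEngineSketch` and `LeaderAgentSketch` and the `kind()` labels are hypothetical stand-ins for Spark's Scala classes and are not part of Spark.

```java
// Hypothetical stand-ins for Spark's PersistenceEngine implementations.
interface PersistenceEngineSketch { String kind(); }

// Hypothetical stand-ins for Spark's leader election agents.
interface LeaderAgentSketch { String kind(); }

final class RecoveryWiring {
    final PersistenceEngineSketch engine;
    final LeaderAgentSketch agent;

    private RecoveryWiring(PersistenceEngineSketch engine, LeaderAgentSketch agent) {
        this.engine = engine;
        this.agent = agent;
    }

    // Mirrors the two `RECOVERY_MODE match { ... }` expressions in Master.preStart().
    static RecoveryWiring forMode(String recoveryMode, String recoveryDir) {
        PersistenceEngineSketch engine;
        switch (recoveryMode) {
            case "ZOOKEEPER":
                engine = () -> "zookeeper";
                break;
            case "FILESYSTEM":
                engine = () -> "filesystem:" + recoveryDir;
                break;
            default:
                // Any other value, including the default "NONE", keeps nothing (black hole).
                engine = () -> "blackhole";
        }
        // Only ZOOKEEPER mode runs an election; otherwise the single master always leads.
        LeaderAgentSketch agent = "ZOOKEEPER".equals(recoveryMode)
                ? () -> "zookeeper-election"
                : () -> "monarchy";
        return new RecoveryWiring(engine, agent);
    }

    public static void main(String[] args) {
        for (String mode : new String[] {"ZOOKEEPER", "FILESYSTEM", "NONE"}) {
            RecoveryWiring w = forMode(mode, "/tmp/spark-recovery");
            System.out.println(mode + " -> " + w.engine.kind() + ", " + w.agent.kind());
        }
    }
}
```

The dispatch is an exact string comparison, like a Scala string pattern. A mode spelled in lower case (for example `zookeeper`) therefore falls through to the default branch in this sketch.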

RECOVERY_MODE is a string read from the configuration key spark.deploy.recoveryMode, which can be set in spark-env.sh:

```scala
val RECOVERY_MODE = conf.get("spark.deploy.recoveryMode", "NONE")
```
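`conf.get(key, default)` returns the default when the key has not been set. The minimal Java model below shows that lookup. It assumes, as SparkConf does when it loads defaults, that every JVM system property whose name starts with `spark.` is picked up. `ConfSketch` is a hypothetical name, and the real SparkConf does much more.

```java
import java.util.Properties;

// Hypothetical, minimal model of SparkConf.get(key, defaultValue).
final class ConfSketch {
    private final Properties settings = new Properties();

    // Copies every property whose name starts with "spark.", which is what SparkConf does
    // with JVM system properties when it loads defaults.
    ConfSketch(Properties source) {
        for (String name : source.stringPropertyNames()) {
            if (name.startsWith("spark.")) {
                settings.setProperty(name, source.getProperty(name));
            }
        }
    }

    // Returns the configured value, or defaultValue when the key was never set.
    String get(String key, String defaultValue) {
        return settings.getProperty(key, defaultValue);
    }

    public static void main(String[] args) {
        // Run with -Dspark.deploy.recoveryMode=ZOOKEEPER to see the value change from NONE.
        ConfSketch conf = new ConfSketch(System.getProperties());
        String recoveryMode = conf.get("spark.deploy.recoveryMode", "NONE");
        System.out.println("RECOVERY_MODE = " + recoveryMode);
    }
}
```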

If spark.deploy.recoveryMode is not set, all of the cluster's runtime data is lost when the master restarts. This follows from the implementation of BlackHolePersistenceEngine:

```scala
private[spark] class BlackHolePersistenceEngine extends PersistenceEngine {
  override def addApplication(app: ApplicationInfo) {}
  override def removeApplication(app: ApplicationInfo) {}
  override def addWorker(worker: WorkerInfo) {}
  override def removeWorker(worker: WorkerInfo) {}
  override def addDriver(driver: DriverInfo) {}
  override def removeDriver(driver: DriverInfo) {}
  override def readPersistedData() = (Nil, Nil, Nil)
}
```

It implements every method as a no-op. PersistenceEngine is a trait; for comparison, here is the ZooKeeper implementation:

```scala
class ZooKeeperPersistenceEngine(serialization: Serialization, conf: SparkConf)
  extends PersistenceEngine
  with Logging
{
  val WORKING_DIR = conf.get("spark.deploy.zookeeper.dir", "/spark") + "/master_status"
  val zk: CuratorFramework = SparkCuratorUtil.newClient(conf)

  SparkCuratorUtil.mkdir(zk, WORKING_DIR)

  // Serialize the application info into a znode under WORKING_DIR
  override def addApplication(app: ApplicationInfo) {
    serializeIntoFile(WORKING_DIR + "/app_" + app.id, app)
  }

  override def removeApplication(app: ApplicationInfo) {
    zk.delete().forPath(WORKING_DIR + "/app_" + app.id)
  }

  // ... (remaining methods omitted)
}
```
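The following hedged Java sketch shows why the choice of engine decides whether state survives a restart. It contrasts a no-op engine with a ZooKeeper-style one. A shared HashMap stands in for the ZooKeeper ensemble, Java serialization stands in for Spark's Akka serializer, and only applications are tracked. The names `AppStore`, `BlackHoleStore`, and `ZnodeMapStore` are hypothetical. The path layout (ZooKeeper dir + `/master_status/app_` + id) follows the Scala code above.

```java
import java.io.*;
import java.util.*;

// Hypothetical, simplified version of Spark's PersistenceEngine trait (applications only).
interface AppStore {
    void addApplication(String appId, String description);
    void removeApplication(String appId);
    List<String> readPersistedData();
}

// Like BlackHolePersistenceEngine: every write is dropped, so nothing survives a restart.
final class BlackHoleStore implements AppStore {
    public void addApplication(String appId, String description) {}
    public void removeApplication(String appId) {}
    public List<String> readPersistedData() { return Collections.emptyList(); }
}

// Like ZooKeeperPersistenceEngine, but a shared Map stands in for the ZooKeeper ensemble.
final class ZnodeMapStore implements AppStore {
    private final Map<String, byte[]> znodes; // znode path -> serialized payload
    private final String workingDir;

    ZnodeMapStore(Map<String, byte[]> znodes, String zkDir) {
        this.znodes = znodes;
        this.workingDir = zkDir + "/master_status";
    }

    public void addApplication(String appId, String description) {
        znodes.put(workingDir + "/app_" + appId, serialize(description));
    }

    public void removeApplication(String appId) {
        znodes.remove(workingDir + "/app_" + appId);
    }

    // Reads back every persisted application, ordered by znode path.
    public List<String> readPersistedData() {
        List<String> apps = new ArrayList<>();
        for (Map.Entry<String, byte[]> e : new TreeMap<>(znodes).entrySet()) {
            if (e.getKey().startsWith(workingDir + "/app_")) {
                apps.add((String) deserialize(e.getValue()));
            }
        }
        return apps;
    }

    private static byte[] serialize(Serializable value) {
        try (ByteArrayOutputStream bytes = new ByteArrayOutputStream();
             ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(value);
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    private static Object deserialize(byte[] data) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}

final class PersistenceDemo {
    public static void main(String[] args) {
        Map<String, byte[]> ensemble = new HashMap<>(); // outlives any single master
        new ZnodeMapStore(ensemble, "/spark").addApplication("app-1", "WordCount");
        new BlackHoleStore().addApplication("app-1", "WordCount");
        // A "restarted" master builds a fresh engine and reads back whatever was persisted.
        System.out.println(new ZnodeMapStore(ensemble, "/spark").readPersistedData()); // [WordCount]
        System.out.println(new BlackHoleStore().readPersistedData());                  // []
    }
}
```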

Spark does not use the ZooKeeper API directly. Instead, it uses org.apache.curator.framework.CuratorFramework and org.apache.curator.framework.recipes.leader.{LeaderLatchListener, LeaderLatch}. Curator provides a friendly wrapper around ZooKeeper.

2. Configuring Cluster Startup Parameters

Here is a brief summary of the parameter settings. From the code above, using ZooKeeper requires at least the following parameters (in fact, these are the only ones needed). Set them via spark-env.sh:

```
spark.deploy.recoveryMode=ZOOKEEPER
spark.deploy.zookeeper.url=zk_server_1:2181,zk_server_2:2181
spark.deploy.zookeeper.dir=/dir

# or, equivalently, in spark-env.sh:
export SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=ZOOKEEPER"
export SPARK_DAEMON_JAVA_OPTS="${SPARK_DAEMON_JAVA_OPTS} -Dspark.deploy.zookeeper.url=zk_server_1:2181,zk_server_2:2181"
```

Parameter meanings:

| Parameter | Default | Description |
|---|---|---|
| spark.deploy.recoveryMode | NONE | Recovery mode (master restart policy): ZOOKEEPER, FILESYSTEM, or NONE |
| spark.deploy.zookeeper.url | (none) | ZooKeeper server addresses, e.g. zk_server_1:2181,zk_server_2:2181 |
| spark.deploy.zookeeper.dir | /spark | ZooKeeper directory that stores the cluster metadata (workers, drivers, and applications) |
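The defaults in the table can be captured in a small resolver. `HaSettings` below is hypothetical and not a Spark class. It applies the documented defaults and derives the `master_status` and `leader_election` paths that the two ZooKeeper classes in this article build from spark.deploy.zookeeper.dir. As an assumption of this sketch only, it rejects ZOOKEEPER mode when no server URL is given.

```java
import java.util.*;

// Hypothetical resolver for the three settings in the table above.
final class HaSettings {
    final String recoveryMode;
    final List<String> zkServers;
    final String masterStatusDir;   // used by the persistence engine
    final String leaderElectionDir; // used by the leader election agent

    HaSettings(Map<String, String> conf) {
        recoveryMode = conf.getOrDefault("spark.deploy.recoveryMode", "NONE");
        String zkDir = conf.getOrDefault("spark.deploy.zookeeper.dir", "/spark");
        String url = conf.get("spark.deploy.zookeeper.url");
        // Sketch assumption: fail fast when ZooKeeper mode has no servers to connect to.
        if ("ZOOKEEPER".equals(recoveryMode) && (url == null || url.trim().isEmpty())) {
            throw new IllegalArgumentException("spark.deploy.zookeeper.url is required in ZOOKEEPER mode");
        }
        List<String> servers = new ArrayList<>();
        if (url != null) {
            for (String s : url.split(",")) {
                if (!s.trim().isEmpty()) {
                    servers.add(s.trim());
                }
            }
        }
        zkServers = Collections.unmodifiableList(servers);
        masterStatusDir = zkDir + "/master_status";
        leaderElectionDir = zkDir + "/leader_election";
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put("spark.deploy.recoveryMode", "ZOOKEEPER");
        conf.put("spark.deploy.zookeeper.url", "zk_server_1:2181,zk_server_2:2181");
        HaSettings s = new HaSettings(conf);
        System.out.println(s.zkServers + " " + s.masterStatusDir + " " + s.leaderElectionDir);
    }
}
```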

3. Introduction to CuratorFramework

CuratorFramework greatly simplifies working with ZooKeeper. It offers a high-level API and adds many features on top of ZooKeeper, including:

Automatic connection management: the client's connection to ZooKeeper can be interrupted. Curator handles this and reconnects automatically, transparently to the client.
Simple API: wraps the raw ZooKeeper methods and events in a simple, easy-to-use interface.
Recipe implementations (see the Curator recipes documentation):

Leader election
Shared locks
Caching and monitoring
Distributed queues
Distributed priority queues
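Automatic reconnection ultimately rests on a retry policy. Curator ships its own policies, for example ExponentialBackoffRetry, which randomizes its delays. The Java sketch below is a simpler, deterministic exponential backoff written from scratch to show the idea. `BackoffRetry` and its parameters are hypothetical.

```java
import java.util.concurrent.Callable;

// Hypothetical, deterministic exponential backoff: retry n waits baseSleepMs * 2^n ms,
// capped at maxSleepMs, for at most maxRetries retries after the first attempt.
final class BackoffRetry {
    private final long baseSleepMs;
    private final long maxSleepMs;
    private final int maxRetries;

    BackoffRetry(long baseSleepMs, long maxSleepMs, int maxRetries) {
        this.baseSleepMs = baseSleepMs;
        this.maxSleepMs = maxSleepMs;
        this.maxRetries = maxRetries;
    }

    long sleepMsFor(int retryCount) {
        // Bound the shift so large retry counts do not blow up the delay computation.
        long delay = baseSleepMs << Math.min(retryCount, 30);
        return Math.min(delay, maxSleepMs);
    }

    // Runs the operation; on failure it sleeps and retries until the retry budget is used up,
    // then rethrows the last failure.
    <T> T call(Callable<T> operation) throws Exception {
        for (int retry = 0; ; retry++) {
            try {
                return operation.call();
            } catch (Exception e) {
                if (retry >= maxRetries) {
                    throw e;
                }
                Thread.sleep(sleepMsFor(retry));
            }
        }
    }

    public static void main(String[] args) throws Exception {
        BackoffRetry policy = new BackoffRetry(10, 1_000, 3);
        int[] attempts = {0};
        // Simulates a connection that fails twice before succeeding.
        String result = policy.call(() -> {
            if (++attempts[0] < 3) {
                throw new IllegalStateException("connection lost");
            }
            return "connected";
        });
        System.out.println(result + " after " + attempts[0] + " attempts");
    }
}
```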

CuratorFramework instances are created through CuratorFrameworkFactory and are thread-safe.

CuratorFrameworkFactory.newClient() is a simple way to create an instance, and its parameters give full control over it. After obtaining the instance, call start() before using it, and call close() when you are done.

```java
/**
 * Create a new client
 *
 * @param connectString       list of servers to connect to
 * @param sessionTimeoutMs    session timeout
 * @param connectionTimeoutMs connection timeout
 * @param retryPolicy         retry policy to use
 * @return client
 */
public static CuratorFramework newClient(String connectString, int sessionTimeoutMs,
                                         int connectionTimeoutMs, RetryPolicy retryPolicy)
{
    return builder().
        connectString(connectString).
        sessionTimeoutMs(sessionTimeoutMs).
        connectionTimeoutMs(connectionTimeoutMs).
        retryPolicy(retryPolicy).
        build();
}
```
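The factory method above is a thin wrapper over a builder, and the client it returns has a start-then-close lifecycle. The Java sketch below imitates that shape with a hypothetical `SketchClient`. It opens no connection, and its default timeouts are placeholders. It only enforces that start() runs once and that the client is not used before start(). Implementing AutoCloseable lets try-with-resources guarantee the close() call.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical client that mimics only the builder and lifecycle shape described above;
// it never opens a network connection.
final class SketchClient implements AutoCloseable {
    enum State { LATENT, STARTED, STOPPED }

    private final String connectString;
    private final int sessionTimeoutMs;
    private final int connectionTimeoutMs;
    private final AtomicReference<State> state = new AtomicReference<>(State.LATENT);

    private SketchClient(Builder b) {
        this.connectString = b.connectString;
        this.sessionTimeoutMs = b.sessionTimeoutMs;
        this.connectionTimeoutMs = b.connectionTimeoutMs;
    }

    static Builder builder() { return new Builder(); }

    // Convenience factory in the style of newClient(...): every argument goes through the builder.
    static SketchClient newClient(String connectString, int sessionTimeoutMs, int connectionTimeoutMs) {
        return builder()
                .connectString(connectString)
                .sessionTimeoutMs(sessionTimeoutMs)
                .connectionTimeoutMs(connectionTimeoutMs)
                .build();
    }

    // start() must be called exactly once before the client is used.
    void start() {
        if (!state.compareAndSet(State.LATENT, State.STARTED)) {
            throw new IllegalStateException("Cannot be started more than once");
        }
    }

    String describe() {
        if (state.get() != State.STARTED) {
            throw new IllegalStateException("Client must be started before use");
        }
        return connectString + " session=" + sessionTimeoutMs + "ms connection=" + connectionTimeoutMs + "ms";
    }

    @Override
    public void close() {
        state.set(State.STOPPED);
    }

    static final class Builder {
        private String connectString;
        private int sessionTimeoutMs = 60_000;    // placeholder default
        private int connectionTimeoutMs = 15_000; // placeholder default

        Builder connectString(String value) { connectString = value; return this; }
        Builder sessionTimeoutMs(int value) { sessionTimeoutMs = value; return this; }
        Builder connectionTimeoutMs(int value) { connectionTimeoutMs = value; return this; }

        SketchClient build() {
            if (connectString == null) {
                throw new IllegalStateException("connectString is required");
            }
            return new SketchClient(this);
        }
    }

    public static void main(String[] args) {
        // try-with-resources guarantees close() even if an exception is thrown.
        try (SketchClient client = newClient("zk_server_1:2181,zk_server_2:2181", 60_000, 15_000)) {
            client.start();
            System.out.println(client.describe());
        }
    }
}
```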

Spark also uses two recipes: org.apache.curator.framework.recipes.leader.{LeaderLatchListener, LeaderLatch}.

First, let's look at LeaderLatchListener, which is notified when the state of a LeaderLatch changes:

When this node is elected leader, isLeader() is called.
When this node loses leadership, notLeader() is called.

Because notifications are asynchronous, the state may no longer be accurate when a method is called. Check whether LeaderLatch.hasLeadership() is true or false before acting. Spark's implementation does exactly this.

```java
/**
 * LeaderLatchListener can be used to be notified asynchronously about when the state of the LeaderLatch has changed.
 *
 * Note that just because you are in the middle of one of these method calls, it does not necessarily mean that
 * hasLeadership() is the corresponding true/false value. It is possible for the state to change behind the scenes
 * before these methods get called. The contract is that if that happens, you should see another call to the other
 * method pretty quickly.
 */
public interface LeaderLatchListener
{
    /**
     * This is called when the LeaderLatch's state goes from hasLeadership = false to hasLeadership = true.
     *
     * Note that it is possible that by the time this method call happens, hasLeadership has fallen back to false. If
     * this occurs, you can expect {@link #notLeader()} to also be called.
     */
    public void isLeader();

    /**
     * This is called when the LeaderLatch's state goes from hasLeadership = true to hasLeadership = false.
     *
     * Note that it is possible that by the time this method call happens, hasLeadership has become true. If
     * this occurs, you can expect {@link #isLeader()} to also be called.
     */
    public void notLeader();
}
```
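The re-check that this javadoc asks for can be shown deterministically. In the hypothetical Java model below, state changes apply immediately while callbacks are queued and run later, which is enough to produce a stale isLeader() call. The guarded callbacks follow the pattern Spark uses, but `AsyncLatchModel` itself is not Curator code.

```java
import java.util.*;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical model: leadership changes take effect immediately, but listener callbacks are
// queued and run later, so a callback can observe a state newer than the one that triggered it.
final class AsyncLatchModel {
    final AtomicBoolean hasLeadership = new AtomicBoolean(false);
    private final Deque<Runnable> pendingCallbacks = new ArrayDeque<>();
    private final List<String> handled = new ArrayList<>();

    // Flips the state and queues the matching callback, as an asynchronous notifier would.
    void setLeadership(boolean leader) {
        if (hasLeadership.getAndSet(leader) != leader) {
            if (leader) {
                pendingCallbacks.add(this::isLeader);
            } else {
                pendingCallbacks.add(this::notLeader);
            }
        }
    }

    // Guarded callbacks in the style of ZooKeeperLeaderElectionAgent: re-check before acting.
    private synchronized void isLeader() {
        if (!hasLeadership.get()) {
            handled.add("ignored stale isLeader");
            return;
        }
        handled.add("handled isLeader");
    }

    private synchronized void notLeader() {
        if (hasLeadership.get()) {
            handled.add("ignored stale notLeader");
            return;
        }
        handled.add("handled notLeader");
    }

    // Runs every queued callback in order and returns what each one did.
    List<String> drain() {
        while (!pendingCallbacks.isEmpty()) {
            pendingCallbacks.poll().run();
        }
        return handled;
    }

    public static void main(String[] args) {
        AsyncLatchModel latch = new AsyncLatchModel();
        latch.setLeadership(true);  // queues isLeader()
        latch.setLeadership(false); // leadership lost before the first callback ran
        System.out.println(latch.drain());
    }
}
```

In `main`, leadership is gained and lost before the queue drains. drain() then returns `[ignored stale isLeader, handled notLeader]`: the stale callback is skipped instead of acting on out-of-date state.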

LeaderLatch elects a leader among all the competitors connected to the ZooKeeper cluster. The details of the election mechanism live in its implementation, which LeaderLatch encapsulates well. All we need to know is that, once the instance is created, we attach a listener to it and call start():

```java
public class LeaderLatch implements Closeable
{
    private final Logger log = LoggerFactory.getLogger(getClass());
    private final CuratorFramework client;
    private final String latchPath;
    private final String id;
    private final AtomicReference<State> state = new AtomicReference<State>(State.LATENT);
    private final AtomicBoolean hasLeadership = new AtomicBoolean(false);
    private final AtomicReference<String> ourPath = new AtomicReference<String>();
    private final ListenerContainer<LeaderLatchListener> listeners = new ListenerContainer<LeaderLatchListener>();
    private final CloseMode closeMode;
    private final AtomicReference<Future<?>> startTask = new AtomicReference<Future<?>>();

    ...

    /**
     * Attaches a listener to this LeaderLatch
     * <p/>
     * Attaching the same listener multiple times is a noop from the second time on.
     * <p/>
     * All methods for the listener are run using the provided Executor.  It is common to pass in a single-threaded
     * executor so that you can be certain that listener methods are called in sequence, but if you are fine with
     * them being called out of order you are welcome to use multiple threads.
     *
     * @param listener the listener to attach
     */
    public void addListener(LeaderLatchListener listener)
    {
        listeners.addListener(listener);
    }

    ...
}
```
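The javadoc above promises two behaviors: registering the same listener again is ignored, and a single-threaded executor delivers callbacks in order. The small hypothetical Java container below provides both. It is not Curator's ListenerContainer, only an illustration of those two guarantees.

```java
import java.util.Set;
import java.util.concurrent.CopyOnWriteArraySet;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Consumer;
import java.util.function.IntConsumer;

// Hypothetical listener container with the two guarantees from the javadoc above:
// re-adding a listener is a no-op, and callbacks are delivered in submission order.
final class ListenerContainerSketch<L> {
    // Insertion-ordered set: a second add() of the same listener is ignored.
    private final Set<L> listeners = new CopyOnWriteArraySet<>();
    // One worker thread, so submitted notifications run strictly one after another.
    private final ExecutorService executor = Executors.newSingleThreadExecutor();

    boolean addListener(L listener) {
        return listeners.add(listener); // false if it was already registered
    }

    // Queues a notification for every registered listener.
    Future<?> forEach(Consumer<L> notification) {
        return executor.submit(() -> listeners.forEach(notification));
    }

    // Already-queued notifications still run; no new ones are accepted.
    void shutdown() {
        executor.shutdown();
    }

    public static void main(String[] args) {
        ListenerContainerSketch<IntConsumer> container = new ListenerContainerSketch<>();
        IntConsumer printer = n -> System.out.println("callback " + n);
        container.addListener(printer);
        container.addListener(printer); // ignored: already registered
        for (int i = 0; i < 3; i++) {
            final int n = i;
            container.forEach(listener -> listener.accept(n));
        }
        container.shutdown(); // the queued callbacks still run in order, then the worker exits
    }
}
```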

addListener() attaches a listener implementation to the LeaderLatch. In the listener, the logic for gaining and for losing leadership goes into the two interface methods.
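Under the hood, LeaderLatch builds on the standard ZooKeeper election recipe. Each candidate creates an ephemeral sequential znode under the latch path, and the candidate with the lowest sequence number leads. The in-memory Java model below (`ElectionModel`, a hypothetical name) captures only that ordering rule plus the effect of a lost session. It leaves out watches, networking, and Curator's actual znode naming.

```java
import java.util.Optional;
import java.util.TreeMap;

// Hypothetical in-memory model of the ZooKeeper leader-election recipe behind LeaderLatch.
final class ElectionModel {
    // Sequence number -> candidate id; TreeMap keeps the lowest sequence first.
    private final TreeMap<Integer, String> nodes = new TreeMap<>();
    private int nextSequence = 0;

    // Like creating an EPHEMERAL_SEQUENTIAL znode: the candidate gets the next sequence number.
    int join(String candidateId) {
        int sequence = nextSequence++;
        nodes.put(sequence, candidateId);
        return sequence;
    }

    // Like the candidate's session ending: its ephemeral node disappears.
    void sessionLost(String candidateId) {
        nodes.values().removeIf(candidateId::equals);
    }

    // The candidate owning the lowest remaining sequence number is the leader.
    Optional<String> leader() {
        if (nodes.isEmpty()) {
            return Optional.empty();
        }
        return Optional.of(nodes.firstEntry().getValue());
    }

    boolean hasLeadership(String candidateId) {
        return leader().map(candidateId::equals).orElse(false);
    }

    public static void main(String[] args) {
        ElectionModel zk = new ElectionModel();
        zk.join("master-1");
        zk.join("master-2");
        zk.join("master-3");
        System.out.println("leader: " + zk.leader().orElse("none")); // master-1
        zk.sessionLost("master-1");                                  // active master fails
        System.out.println("leader: " + zk.leader().orElse("none")); // master-2
        zk.join("master-1");                                         // restarted master rejoins
        System.out.println("leader: " + zk.leader().orElse("none")); // still master-2
    }
}
```

A restarted master rejoins with a new, higher sequence number. It therefore waits as a standby instead of taking leadership back.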

4. Implementation of ZooKeeperLeaderElectionAgent

Thanks to Curator, Spark's implementation of master HA is very simple. ZooKeeperLeaderElectionAgent implements the LeaderLatchListener interface. When isLeader() confirms that its master has been elected leader, it sends an ElectedLeader message to the master, which changes its state to ALIVE. When notLeader() is called, it sends a RevokedLeadership message to the master, which then shuts down.
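On the master side, these two messages drive a simple state change. The Java model below is hypothetical: `MasterModel` and its state names are illustrative. As a simplification of this sketch, it moves straight from STANDBY to ALIVE without any state-recovery phase.

```java
// Hypothetical model of a master reacting to the two messages from the election agent.
final class MasterModel {
    enum Message { ELECTED_LEADER, REVOKED_LEADERSHIP }
    enum State { STANDBY, ALIVE, SHUT_DOWN }

    private State state = State.STANDBY;

    State state() { return state; }

    void receive(Message message) {
        switch (message) {
            case ELECTED_LEADER:
                // Simplification: go straight to ALIVE, with no state-recovery phase.
                if (state == State.STANDBY) {
                    state = State.ALIVE;
                }
                break;
            case REVOKED_LEADERSHIP:
                // A master that loses leadership stops serving.
                state = State.SHUT_DOWN;
                break;
        }
    }

    public static void main(String[] args) {
        MasterModel master = new MasterModel();
        master.receive(Message.ELECTED_LEADER);
        System.out.println(master.state()); // ALIVE
        master.receive(Message.REVOKED_LEADERSHIP);
        System.out.println(master.state()); // SHUT_DOWN
    }
}
```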

```scala
private[spark] class ZooKeeperLeaderElectionAgent(val masterActor: ActorRef,
    masterUrl: String, conf: SparkConf)
  extends LeaderElectionAgent with LeaderLatchListener with Logging {

  val WORKING_DIR = conf.get("spark.deploy.zookeeper.dir", "/spark") + "/leader_election"

  // zk: the ZooKeeper client instance, created through CuratorFrameworkFactory
  private var zk: CuratorFramework = _
  // leaderLatch: the Curator recipe responsible for electing the leader
  private var leaderLatch: LeaderLatch = _
  private var status = LeadershipStatus.NOT_LEADER

  override def preStart() {
    logInfo("Starting ZooKeeper LeaderElection agent")
    zk = SparkCuratorUtil.newClient(conf)
    leaderLatch = new LeaderLatch(zk, WORKING_DIR)
    leaderLatch.addListener(this)
    leaderLatch.start()
  }

  // ...
}
```

In preStart(), the LeaderLatch is started so that it takes part in the leader election in ZooKeeper. As analyzed in the previous section, the main logic lives in isLeader() and notLeader():

```scala
override def isLeader() {
  synchronized {
    // Leadership may already have been lost by now; see Curator's implementation
    if (!leaderLatch.hasLeadership) {
      return
    }
    logInfo("We have gained leadership")
    updateLeadershipStatus(true)
  }
}

override def notLeader() {
  synchronized {
    // Leadership may already have been regained by now; see Curator's implementation
    if (leaderLatch.hasLeadership) {
      return
    }
    logInfo("We have lost leadership")
    updateLeadershipStatus(false)
  }
}
```

The logic of updateLeadershipStatus is simple: it sends a message to the master.

```scala
def updateLeadershipStatus(isLeader: Boolean) {
  if (isLeader && status == LeadershipStatus.NOT_LEADER) {
    status = LeadershipStatus.LEADER
    masterActor ! ElectedLeader
  } else if (!isLeader && status == LeadershipStatus.LEADER) {
    status = LeadershipStatus.NOT_LEADER
    masterActor ! RevokedLeadership
  }
}
```
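Here is a Java transliteration of updateLeadershipStatus in which the actor send is replaced by appending to a list. `sentToMaster` is a hypothetical stand-in for `masterActor ! ...`. The transliteration shows why the method is safe to call repeatedly: only a real status transition produces a message.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical Java version of updateLeadershipStatus with the actor send replaced by a list.
final class LeadershipTracker {
    enum Status { LEADER, NOT_LEADER }

    private Status status = Status.NOT_LEADER;
    // Stands in for `masterActor ! ElectedLeader` / `masterActor ! RevokedLeadership`.
    final List<String> sentToMaster = new ArrayList<>();

    synchronized void updateLeadershipStatus(boolean isLeader) {
        if (isLeader && status == Status.NOT_LEADER) {
            status = Status.LEADER;
            sentToMaster.add("ElectedLeader");
        } else if (!isLeader && status == Status.LEADER) {
            status = Status.NOT_LEADER;
            sentToMaster.add("RevokedLeadership");
        }
        // Any other combination repeats the current status and sends nothing.
    }

    public static void main(String[] args) {
        LeadershipTracker tracker = new LeadershipTracker();
        for (boolean b : new boolean[] {false, true, true, false, false}) {
            tracker.updateLeadershipStatus(b);
        }
        System.out.println(tracker.sentToMaster); // [ElectedLeader, RevokedLeadership]
    }
}
```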

5. Design Philosophy

To eliminate the master's SPOF in standalone mode, Spark uses the leader election provided by ZooKeeper. Spark does not use ZooKeeper's native Java API; it uses Curator, a framework that wraps ZooKeeper. With Curator, Spark does not need to manage its connection to ZooKeeper, because connection handling is transparent to Spark. As a result, Spark implements master HA in only about 100 lines of code. Of course, Spark is standing on the shoulders of giants: why reinvent the wheel?

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
