If Spark is deployed in standalone mode, a typical Master/Slaves architecture, the Master is a SPOF (single point of failure). Spark can use ZooKeeper to implement HA for the Master.
ZooKeeper provides a leader election mechanism, which ensures that only one Master in the cluster is active while all the others are standby. When the active Master fails, one of the standby Masters is elected in its place. Because the cluster state, including the worker, driver, and application information, has been persisted, only the submission of new jobs is affected during the switchover; jobs that are already running are not affected. The overall architecture of a cluster with ZooKeeper added is shown in the accompanying figure.
1. Master Restart Policy
When the Master node starts, the Master failure/restart policy is determined by its startup parameters:
- ZOOKEEPER: implements HA via ZooKeeper.
- FILESYSTEM: allows the Master to restart without data loss; the cluster's runtime state is saved to a local or network file system (see the configuration note after this list).
- NONE: discards all original data and simply restarts.
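For the FILESYSTEM mode, the recovery directory is configured via spark.deploy.recoveryDirectory; for example, in spark-env.sh (the path is a placeholder):

export SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=FILESYSTEM -Dspark.deploy.recoveryDirectory=/path/to/recovery/dir"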
Master.preStart() shows how these three different strategies are implemented.
override def preStart() {
  logInfo("Starting Spark master at " + masterUrl)
  ...
  // persistenceEngine persists the worker, driver, and application information,
  // so that jobs already submitted are not affected when the Master restarts.
  persistenceEngine = RECOVERY_MODE match {
    case "ZOOKEEPER" =>
      logInfo("Persisting recovery state to ZooKeeper")
      new ZooKeeperPersistenceEngine(SerializationExtension(context.system), conf)
    case "FILESYSTEM" =>
      logInfo("Persisting recovery state to directory: " + RECOVERY_DIR)
      new FileSystemPersistenceEngine(RECOVERY_DIR, SerializationExtension(context.system))
    case _ =>
      new BlackHolePersistenceEngine()
  }
  // leaderElectionAgent is responsible for electing the leader.
  leaderElectionAgent = RECOVERY_MODE match {
    case "ZOOKEEPER" =>
      context.actorOf(Props(classOf[ZooKeeperLeaderElectionAgent], self, masterUrl, conf))
    case _ =>
      // There is only one Master in the cluster, so the current Master is the active one.
      context.actorOf(Props(classOf[MonarchyLeaderAgent], self))
  }
}
RECOVERY_MODE is a string that can be set from spark-env.sh.
val RECOVERY_MODE = conf.get("spark.deploy.recoveryMode", "NONE")
If spark.deploy.recoveryMode is not set, all of the cluster's runtime data will be lost when the Master restarts. This conclusion follows from the implementation of BlackHolePersistenceEngine.
private[spark] class BlackHolePersistenceEngine extends PersistenceEngine {
  override def addApplication(app: ApplicationInfo) {}
  override def removeApplication(app: ApplicationInfo) {}
  override def addWorker(worker: WorkerInfo) {}
  override def removeWorker(worker: WorkerInfo) {}
  override def addDriver(driver: DriverInfo) {}
  override def removeDriver(driver: DriverInfo) {}

  override def readPersistedData() = (Nil, Nil, Nil)
}
It implements every interface method as a no-op. PersistenceEngine is a trait. For comparison, let's take a look at the ZooKeeper implementation.
class ZooKeeperPersistenceEngine(serialization: Serialization, conf: SparkConf)
  extends PersistenceEngine with Logging {

  val WORKING_DIR = conf.get("spark.deploy.zookeeper.dir", "/spark") + "/master_status"
  val zk: CuratorFramework = SparkCuratorUtil.newClient(conf)

  SparkCuratorUtil.mkdir(zk, WORKING_DIR)

  // Serialize the application information to WORKING_DIR/app_{app.id}
  override def addApplication(app: ApplicationInfo) {
    serializeIntoFile(WORKING_DIR + "/app_" + app.id, app)
  }

  override def removeApplication(app: ApplicationInfo) {
    zk.delete().forPath(WORKING_DIR + "/app_" + app.id)
  }
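The FILESYSTEM mode works along the same lines, but writes one file per object into the recovery directory (RECOVERY_DIR in the code above). The following is only a rough, illustrative sketch of that idea, not Spark's actual FileSystemPersistenceEngine (which serializes with Akka); the class and method names here are made up:

import java.io._

// Illustrative sketch only: a hypothetical class, not Spark's FileSystemPersistenceEngine.
// It persists each object as one file in a recovery directory using plain Java serialization.
class SimpleFileRecovery(dir: String) {
  new File(dir).mkdirs()

  def persist(name: String, obj: Serializable): Unit = {
    val out = new ObjectOutputStream(new FileOutputStream(new File(dir, name)))
    try out.writeObject(obj) finally out.close()
  }

  def unpersist(name: String): Unit = {
    new File(dir, name).delete()
  }

  def readAll[T](prefix: String): Seq[T] = {
    new File(dir).listFiles().filter(_.getName.startsWith(prefix)).toSeq.map { f =>
      val in = new ObjectInputStream(new FileInputStream(f))
      try in.readObject().asInstanceOf[T] finally in.close()
    }
  }
}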
Spark does not use the ZooKeeper API directly; instead it uses org.apache.curator.framework.CuratorFramework and org.apache.curator.framework.recipes.leader.{LeaderLatchListener, LeaderLatch}. Curator is a friendly wrapper around ZooKeeper.
2. Configuring Cluster Startup Parameters
A brief summary of the parameter settings. From the code analysis above, we know that at least the following parameters must be set to use ZooKeeper (in fact, these are the only ones that need to be set). They can be set through spark-env.sh:
spark.deploy.recoveryMode=ZOOKEEPER
spark.deploy.zookeeper.url=zk_server_1:2181,zk_server_2:2181
spark.deploy.zookeeper.dir=/dir

# or set
export SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=ZOOKEEPER"
export SPARK_DAEMON_JAVA_OPTS="${SPARK_DAEMON_JAVA_OPTS} -Dspark.deploy.zookeeper.url=zk_server_1:2181,zk_server_2:2181"
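With HA enabled, Workers and applications can be pointed at all Masters at once by listing them in the master URL, for example (host names are placeholders):

spark://master_host_1:7077,master_host_2:7077

They register with the currently active Master and, if it dies, reconnect to the newly elected leader.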
The meanings of the parameters:

Parameter                  | Default Value | Description
spark.deploy.recoveryMode  | NONE          | Recovery mode (Master restart mode): ZOOKEEPER, FILESYSTEM, or NONE
spark.deploy.zookeeper.url | (none)        | The ZooKeeper server addresses
spark.deploy.zookeeper.dir | /spark        | The ZooKeeper directory that stores the cluster metadata, including workers, drivers, and applications
3. Introduction to CuratorFramework
CuratorFramework greatly simplifies the use of ZooKeeper. It provides high-level APIs and adds many features on top of ZooKeeper, including:
- Automatic connection management: the connection between the client and ZooKeeper may be interrupted; Curator handles this situation, and automatic reconnection is transparent to the client.
- A simpler API: it simplifies the raw ZooKeeper methods and events and provides a simple, easy-to-use interface.
- Implementations of recipes (for more information, see the Curator recipes documentation):
- Leader Selection
- Shared lock
- Cache and monitoring
- Distributed Queue
- Distributed priority queue
CuratorFramework instances are created by CuratorFrameworkFactory and are thread-safe ZooKeeper clients.
CuratorFrameworkFactory.newClient() provides a simple way to create a ZooKeeper client instance; different parameters can be passed in to fully control the instance. After obtaining the instance, you must start it with start(), and you must call close() when you are done.
/**
 * Create a new client
 *
 * @param connectString list of servers to connect to
 * @param sessionTimeoutMs session timeout
 * @param connectionTimeoutMs connection timeout
 * @param retryPolicy retry policy to use
 * @return client
 */
public static CuratorFramework newClient(String connectString, int sessionTimeoutMs, int connectionTimeoutMs, RetryPolicy retryPolicy)
{
    return builder().
        connectString(connectString).
        sessionTimeoutMs(sessionTimeoutMs).
        connectionTimeoutMs(connectionTimeoutMs).
        retryPolicy(retryPolicy).
        build();
}
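As a quick usage sketch in Scala (the ZooKeeper address and the znode path below are placeholders, not anything Spark itself creates): create the client, start it, do some work, and close it.

import org.apache.curator.framework.CuratorFrameworkFactory
import org.apache.curator.retry.ExponentialBackoffRetry

object CuratorClientSketch {
  def main(args: Array[String]): Unit = {
    // connect string, session timeout, connection timeout, retry policy
    val client = CuratorFrameworkFactory.newClient(
      "localhost:2181", 60000, 15000, new ExponentialBackoffRetry(1000, 3))
    client.start()  // must be started before use
    try {
      // smoke test: create a znode (the path is illustrative only)
      client.create().creatingParentsIfNeeded().forPath("/curator_demo/hello", "hi".getBytes)
    } finally {
      client.close()  // must be closed when done
    }
  }
}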
Two recipes are also used: org.apache.curator.framework.recipes.leader.{LeaderLatchListener, LeaderLatch}.
First, let's take a look at LeaderLatchListener. It is notified when the state of the LeaderLatch changes:
- When the node is elected leader, isLeader() is called.
- When the node loses leadership, notLeader() is called.
Because the notification is asynchronous, the state may no longer be accurate by the time these methods are called; you should check whether hasLeadership() of the LeaderLatch is really true/false. This is reflected in Spark's implementation.
/**
 * LeaderLatchListener can be used to be notified asynchronously about when the state of the LeaderLatch has changed.
 *
 * Note that just because you are in the middle of one of these method calls, it does not necessarily mean that
 * hasLeadership() is the corresponding true/false value. It is possible for the state to change behind the scenes
 * before these methods get called. The contract is that if that happens, you should see another call to the other
 * method pretty quickly.
 */
public interface LeaderLatchListener
{
    /**
     * This is called when the LeaderLatch's state goes from hasLeadership = false to hasLeadership = true.
     *
     * Note that it is possible that by the time this method call happens, hasLeadership has fallen back to false. If
     * this occurs, you can expect {@link #notLeader()} to also be called.
     */
    public void isLeader();

    /**
     * This is called when the LeaderLatch's state goes from hasLeadership = true to hasLeadership = false.
     *
     * Note that it is possible that by the time this method call happens, hasLeadership has become true. If
     * this occurs, you can expect {@link #isLeader()} to also be called.
     */
    public void notLeader();
}
LeaderLatch is responsible for electing a leader among the many contenders connected to the ZooKeeper cluster. The election mechanism itself can be found in ZooKeeper's implementation; LeaderLatch wraps it nicely. We only need to know what to do after the instance is initialized:
public class LeaderLatch implements Closeable
{
    private final Logger log = LoggerFactory.getLogger(getClass());
    private final CuratorFramework client;
    private final String latchPath;
    private final String id;
    private final AtomicReference<State> state = new AtomicReference<State>(State.LATENT);
    private final AtomicBoolean hasLeadership = new AtomicBoolean(false);
    private final AtomicReference<String> ourPath = new AtomicReference<String>();
    private final ListenerContainer<LeaderLatchListener> listeners = new ListenerContainer<LeaderLatchListener>();
    private final CloseMode closeMode;
    private final AtomicReference<Future<?>> startTask = new AtomicReference<Future<?>>();
    ...
    /**
     * Attaches a listener to this LeaderLatch
     * <p/>
     * Attaching the same listener multiple times is a noop from the second time on.
     * <p/>
     * All methods for the listener are run using the provided Executor. It is common to pass in a single-threaded
     * executor so that you can be certain that listener methods are called in sequence, but if you are fine with
     * them being called out of order you are welcome to use multiple threads.
     *
     * @param listener the listener to attach
     */
    public void addListener(LeaderLatchListener listener)
    {
        listeners.addListener(listener);
    }
addListener attaches our listener implementation to the LeaderLatch. In the listener's two interface methods we implement the logic for being elected leader and for being stripped of the leader role.
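Putting LeaderLatch and LeaderLatchListener together, a minimal stand-alone sketch (the ZooKeeper address and latch path are placeholders) of competing for leadership and reacting to the notifications could look like this:

import org.apache.curator.framework.CuratorFrameworkFactory
import org.apache.curator.framework.recipes.leader.{LeaderLatch, LeaderLatchListener}
import org.apache.curator.retry.ExponentialBackoffRetry

object LeaderLatchSketch {
  def main(args: Array[String]): Unit = {
    val client = CuratorFrameworkFactory.newClient(
      "localhost:2181", new ExponentialBackoffRetry(1000, 3))
    client.start()

    val latch = new LeaderLatch(client, "/demo/leader_election")
    latch.addListener(new LeaderLatchListener {
      // Re-check hasLeadership because the notifications are asynchronous.
      override def isLeader(): Unit =
        if (latch.hasLeadership) println("We have gained leadership")
      override def notLeader(): Unit =
        if (!latch.hasLeadership) println("We have lost leadership")
    })
    latch.start()            // start competing for leadership

    Thread.sleep(60 * 1000)  // stay in the election for a minute

    latch.close()            // give up leadership / leave the election
    client.close()
  }
}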
4. Implementation of ZooKeeperLeaderElectionAgent
In fact, thanks to Curator, Spark's implementation of Master HA becomes very simple. ZooKeeperLeaderElectionAgent implements the LeaderLatchListener interface. Once isLeader() confirms that the Master it belongs to has been elected leader, it sends an ElectedLeader message to the Master, and the Master changes its state to ALIVE. When notLeader() is called, it sends a RevokedLeadership message to the Master, and the Master shuts itself down.
private[spark] class ZooKeeperLeaderElectionAgent(val masterActor: ActorRef,
    masterUrl: String, conf: SparkConf)
  extends LeaderElectionAgent with LeaderLatchListener with Logging {

  val WORKING_DIR = conf.get("spark.deploy.zookeeper.dir", "/spark") + "/leader_election"
  // zk is the ZooKeeper instance created via CuratorFrameworkFactory
  private var zk: CuratorFramework = _
  // leaderLatch: Curator's LeaderLatch is responsible for electing the leader
  private var leaderLatch: LeaderLatch = _
  private var status = LeadershipStatus.NOT_LEADER

  override def preStart() {
    logInfo("Starting ZooKeeper LeaderElection agent")
    zk = SparkCuratorUtil.newClient(conf)
    leaderLatch = new LeaderLatch(zk, WORKING_DIR)
    leaderLatch.addListener(this)
    leaderLatch.start()
  }
In preStart, the LeaderLatch is started to handle leader election through ZK. As analyzed in the previous section, the main logic is in isLeader and notLeader.
override def isLeader() {
  synchronized {
    // Could have lost leadership by now; see Curator's implementation for details.
    if (!leaderLatch.hasLeadership) {
      return
    }
    logInfo("We have gained leadership")
    updateLeadershipStatus(true)
  }
}

override def notLeader() {
  synchronized {
    // Could have regained leadership by now; see Curator's implementation for details.
    if (leaderLatch.hasLeadership) {
      return
    }
    logInfo("We have lost leadership")
    updateLeadershipStatus(false)
  }
}
The logic of updateLeadershipStatus is simple: it sends a message to the Master.
def updateLeadershipStatus(isLeader: Boolean) {
  if (isLeader && status == LeadershipStatus.NOT_LEADER) {
    status = LeadershipStatus.LEADER
    masterActor ! ElectedLeader
  } else if (!isLeader && status == LeadershipStatus.LEADER) {
    status = LeadershipStatus.NOT_LEADER
    masterActor ! RevokedLeadership
  }
}
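On the receiving side, the Master turns these two messages into state changes, becoming ALIVE (possibly after recovering the persisted state) or shutting down, as described earlier. The following is a simplified, self-contained sketch of just that decision logic, not the actual Master code (which also schedules the recovery of workers, drivers, and applications):

object MasterStateSketch {
  sealed trait Message
  case object ElectedLeader extends Message
  case object RevokedLeadership extends Message

  object RecoveryState extends Enumeration {
    val STANDBY, ALIVE, RECOVERING = Value
  }

  var state = RecoveryState.STANDBY

  def handle(msg: Message, hasPersistedData: Boolean): Unit = msg match {
    case ElectedLeader =>
      // With persisted workers/drivers/apps the Master first recovers them;
      // otherwise it becomes ALIVE directly.
      state = if (hasPersistedData) RecoveryState.RECOVERING else RecoveryState.ALIVE
      println(s"Elected leader, new state: $state")
    case RevokedLeadership =>
      // The real Master logs an error and exits the JVM at this point.
      println("Leadership has been revoked -- shutting down")
  }
}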
5. Design Philosophy
To solve the SPOF of the Master in standalone mode, Spark uses the election functionality provided by ZooKeeper. Spark does not use ZooKeeper's native Java API; instead it uses Curator, a framework that wraps ZooKeeper. With Curator, Spark does not need to manage the connection to ZooKeeper itself; that is transparent to Spark. Spark implements Master HA with only about 100 lines of code. Of course, Spark is standing on the shoulders of giants: why reinvent the wheel?