A Spark standalone cluster follows the master-slaves architecture. Like most master-slaves clusters, it has a single point of failure (SPOF) at the master node. Spark provides two solutions to this single-point-of-failure problem:
- Single-node recovery with the local file system
- Standby masters with ZooKeeper
ZooKeeper provides a leader election mechanism, which ensures that although the cluster has multiple masters, only one of them is active and the others are standby. When the active master fails, another standby master is elected. Because the cluster information, including the worker, driver, and application information, has been persisted (to ZooKeeper in this mode), only the submission of new jobs is affected during the switchover; jobs that are already running are not affected.
[Figure: overall architecture of the cluster with ZooKeeper]
The tests in this article were done on Spark 0.9.0 standalone; they also apply to Spark 1.0.0 standalone and later versions.
1. File System-based single-point recovery
It is mainly used in development or test environments. Spark provides a directory for storing the registration information of Spark applications and workers, and writes their recovery state to this directory. Once the master node fails, you can recover the registration information of the running Spark applications and workers by restarting the master process (sbin/start-master.sh).
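To make the mechanism concrete, here is a minimal, self-contained Scala sketch of the idea (illustrative only; the FilePersistence class and its method names are invented for this example, not Spark's actual code):

import java.io._

// Sketch of filesystem-based recovery: persist each registered
// application/worker as a serialized file in the recovery directory,
// and read everything back when a new master process starts.
class FilePersistence(dir: File) {
  dir.mkdirs()

  def persist(name: String, obj: Serializable): Unit = {
    val out = new ObjectOutputStream(new FileOutputStream(new File(dir, name)))
    try out.writeObject(obj) finally out.close()
  }

  def unpersist(name: String): Unit = new File(dir, name).delete()

  // e.g., readAll("app_") after a restart returns all persisted applications
  def readAll(prefix: String): Seq[AnyRef] =
    dir.listFiles().filter(_.getName.startsWith(prefix)).toSeq.map { f =>
      val in = new ObjectInputStream(new FileInputStream(f))
      try in.readObject() finally in.close()
    }
}

Spark's actual FileSystemPersistenceEngine works along these lines, storing serialized application, driver, and worker data in the configured recovery directory.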
File system-based single-point recovery is configured by setting SPARK_DAEMON_JAVA_OPTS in spark-env.sh:
| System Property | Meaning |
| --- | --- |
| spark.deploy.recoveryMode | Set to FILESYSTEM to enable single-node recovery mode (default: NONE). |
| spark.deploy.recoveryDirectory | The directory in which Spark will store recovery state, accessible from the master's perspective. |
You can use an NFS shared directory to store the Spark recovery state, so that a master restarted on another node can also read it.
1.1 Configuration
[root@bigdata001 spark]# vi conf/spark-env.sh
Add the property:
export SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=FILESYSTEM -Dspark.deploy.recoveryDirectory=/nfs/spark/recovery"
1.2 Test
1. Start the Spark standalone cluster: [root@bigdata001 spark]# ./sbin/start-all.sh
2. Start a spark-shell client and run some operations (a minimal example follows the commands below), then use sbin/stop-master.sh to kill the master process:
[root@bigdata003 spark]# MASTER=spark://bigdata001:7077 bin/spark-shell
[root@bigdata001 spark]# ./sbin/stop-master.sh
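As the "some operations" in step 2, any small job will do; for example, in the spark-shell (the numbers are arbitrary, purely illustrative):

// run a trivial job so the application is registered with the master
val data = sc.parallelize(1 to 100)
data.filter(_ % 2 == 0).count()  // returns 50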
3. Test result: on bigdata003 you can see that the spark-shell can no longer connect to the master:
14/08/26 13:54:01 WARN AppClient$ClientActor: Connection to akka.tcp://sparkMaster@bigdata001:7077 failed; waiting for master to reconnect...
14/08/26 13:54:01 WARN SparkDeploySchedulerBackend: Disconnected from Spark cluster! Waiting for reconnection...
14/08/26 13:54:01 WARN AppClient$ClientActor: Connection to akka.tcp://sparkMaster@bigdata001:7077 failed; waiting for master to reconnect...
14/08/26 13:54:01 WARN AppClient$ClientActor: Could not connect to akka.tcp://sparkMaster@bigdata001:7077: akka.remote.EndpointAssociationException: Association failed with [akka.tcp://sparkMaster@bigdata001:7077]
4. Restart the master to restore normal operation:
[root@bigdata001 spark]# ./sbin/start-master.sh
2. Standby masters with ZooKeeper
This mode is used in production. The basic principle is to use ZooKeeper to elect one active master, while the other masters remain in STANDBY state.
Connect the standalone cluster to the same ZooKeeper ensemble and start multiple masters. Using the election and state-storage functions provided by ZooKeeper, one master is elected and the others remain in STANDBY state. If the current master dies, another master is elected, recovers the old master's state, and then resumes scheduling. The entire recovery process may take 1 to 2 minutes.
Note:
- This process only affects the scheduling of new applications; applications that were already running during the failure are not affected.
- Because there are multiple masters, submitting an application changes slightly: the application needs to know the address and port of the current master. This HA solution handles this simply; you only need to pass a master list to SparkContext, such as spark://host1:port1,host2:port2,host3:port3, and the application tries the masters in the list in turn, as shown in the sketch below.
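A minimal standalone-application sketch against the Spark 0.9-era API (the HAExample object and the app name are made up for illustration; the host names follow this article's cluster):

import org.apache.spark.SparkContext

object HAExample {
  def main(args: Array[String]): Unit = {
    // Pass the full master list; the client tries each master until it
    // finds the active one, and fails over when the active master changes.
    val sc = new SparkContext(
      "spark://bigdata001:7077,bigdata002:7077",
      "ha-example")
    println(sc.parallelize(1 to 10).sum())
    sc.stop()
  }
}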
This HA solution is easy to deploy: first start a ZooKeeper cluster, then start masters on different nodes. Note that these masters must have the same ZooKeeper configuration (ZooKeeper URL and directory).
| System Property | Meaning |
| --- | --- |
| spark.deploy.recoveryMode | Set to ZOOKEEPER to enable standby master recovery mode (default: NONE). |
| spark.deploy.zookeeper.url | The ZooKeeper cluster URL (e.g., 192.168.1.100:2181,192.168.1.101:2181). |
| spark.deploy.zookeeper.dir | The directory in ZooKeeper to store recovery state (default: /spark). |
Masters can be added or removed at any time. In case of failover, the new master contacts all previously registered applications and workers to inform them of the change of master.
Note: the master address must not be defined in conf/spark-env.sh but given directly to the application. The parameter involved is SPARK_MASTER_IP (as in export SPARK_MASTER_IP=bigdata001); leave it unconfigured or empty, otherwise multiple masters cannot be started.
2.1 Configuration
[root@bigdata001 spark]# vi conf/spark-env.sh
Add the property:
# ZK HA
export SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=ZOOKEEPER -Dspark.deploy.zookeeper.url=bigdata001:2181,bigdata002:2181,bigdata003:2181 -Dspark.deploy.zookeeper.dir=/spark"
2.2 Test
1. Prerequisites: The Zookeeper cluster has been started.
2. Stop and then restart the Spark cluster:
[root@bigdata001 spark]# ./sbin/stop-all.sh
[root@bigdata001 spark]# ./sbin/start-all.sh
3. Start a new master on another node: [root@bigdata002 spark]# ./sbin/start-master.sh
4. View the web UI: http://bigdata001:8081/
5. Start a spark-shell client: [root@bigdata003 spark]# MASTER=spark://bigdata001:7077,bigdata002:7077 bin/spark-shell
MASTER is spark://bigdata001:7077,bigdata002:7077
RUNNER=/home/zjw/jdk1.7/jdk1.7.0_51//bin/java
CLASSPATH=/home/zjw/tachyon/tachyon-0.4.1/target/tachyon-0.4.1-jar-with-dependencies.jar:/src/java/target/mesos-0.19.0.jar:/root/spark/conf:/root/spark/assembly/target/scala-2.10/spark-assembly-0.9.0-incubating-hadoop2.2.0.jar (duplicate classpath entries omitted)
JAVA_OPTS=-Dspark.executor.uri=hdfs://192.168.1.101:8020/user/spark/spark-0.9.2.tar.gz -Dspark.akka.frameSize=20 -Djava.library.path= -Xms512m -Xmx512m
6. Stop the active master: [root@bigdata001 spark]# ./sbin/stop-master.sh
After sbin/stop-master.sh killed the master process on bigdata001, spark-shell took about 30 seconds to switch to the master on bigdata002, printing the following:
14/08/26 13:54:01 WARN AppClient$ClientActor: Connection to akka.tcp://sparkMaster@bigdata001:7077 failed; waiting for master to reconnect...
14/08/26 13:54:01 WARN SparkDeploySchedulerBackend: Disconnected from Spark cluster! Waiting for reconnection...
14/08/26 13:54:01 WARN AppClient$ClientActor: Connection to akka.tcp://sparkMaster@bigdata001:7077 failed; waiting for master to reconnect...
14/08/26 13:54:01 WARN AppClient$ClientActor: Could not connect to akka.tcp://sparkMaster@bigdata001:7077: akka.remote.EndpointAssociationException: Association failed with [akka.tcp://sparkMaster@bigdata001:7077]
(the last two warnings repeat a few more times)
14/08/26 13:54:30 INFO AppClient$ClientActor: Master has changed, new master is at spark://bigdata002:7077
7. Check the web UI at http://bigdata002:8082/: the active master is now bigdata002, and the resources of the running application are unchanged.
Design Concept
To solve the master SPOF in standalone mode, Spark uses the election function provided by ZooKeeper. Spark does not use ZooKeeper's native Java API directly; instead it uses Curator, a framework that wraps ZooKeeper. With Curator, Spark does not need to manage the connection to ZooKeeper itself; that part is transparent to Spark. Thanks to this, Spark implements master HA in only about 100 lines of code.
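As an illustration of what Curator provides, here is a minimal leader-election sketch in Scala using Curator's LeaderLatch recipe (this is not Spark's actual source; the latch path /spark/leader_election is just an example under this article's spark.deploy.zookeeper.dir, and the listener callbacks are where a master would switch between ACTIVE and STANDBY):

import org.apache.curator.framework.CuratorFrameworkFactory
import org.apache.curator.framework.recipes.leader.{LeaderLatch, LeaderLatchListener}
import org.apache.curator.retry.ExponentialBackoffRetry

object LeaderElectionSketch {
  def main(args: Array[String]): Unit = {
    // Connect to the same ZooKeeper ensemble as configured in spark-env.sh.
    val client = CuratorFrameworkFactory.newClient(
      "bigdata001:2181,bigdata002:2181,bigdata003:2181",
      new ExponentialBackoffRetry(1000, 3))
    client.start()

    // All masters compete for the same latch path; ZooKeeper elects one leader.
    val latch = new LeaderLatch(client, "/spark/leader_election")
    latch.addListener(new LeaderLatchListener {
      override def isLeader(): Unit = println("Elected leader: become ACTIVE master")
      override def notLeader(): Unit = println("Lost leadership: revert to STANDBY")
    })
    latch.start()

    Thread.sleep(Long.MaxValue) // keep the process alive for the demo
  }
}

By delegating connection management, retries, and the election recipe to Curator, the Spark-side logic stays tiny, which is how master HA fits in roughly 100 lines.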
Advanced source-code reading: Spark Technology Insider: ZooKeeper-based master high availability (HA) source code implementation.
References:
http://www.cnblogs.com/hseagle/p/3673147.html
https://spark.apache.org/docs/0.9.0/spark-standalone.html#standby-masters-with-zookeeper