Preface:
Spark has been very popular lately. This article does not discuss Spark internals; instead it walks through the scripts used to bring up a Spark cluster and its services, in the hope of understanding a Spark cluster from the perspective of its startup scripts. The Spark version is 1.0.1, and the cluster is deployed in standalone mode. The basic architecture is master-slave (worker): a single master node plus multiple slave (worker) nodes.
Script directory
start-all.sh: starts the entire cluster
stop-all.sh: stops the entire cluster
start-master.sh: starts the master node
stop-master.sh: stops the master node
start-slaves.sh: starts all slave (worker) nodes in the cluster
start-slave.sh: starts a single slave (worker) node
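As a quick orientation, a typical way to use these entry points on the master node looks like the following (run from the Spark installation directory; host names and paths are your own):
sbin/start-all.sh   # bring up the master and all workers listed in conf/slaves
sbin/stop-all.sh    # shut the whole cluster down again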
The dependency graph of these scripts is as follows:
*) Analysis of start-all.sh
# Load the Spark configuration
. "$sbin/spark-config.sh"

# Start Master
"$sbin"/start-master.sh $TACHYON_STR

# Start Workers
"$sbin"/start-slaves.sh $TACHYON_STR
Comments:
#1. Source spark-config.sh to load the common configuration
#2. Start the master node
#3. Start each slave (worker) node
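One detail omitted from the snippet above: the $sbin variable used throughout these scripts is resolved at the top of each sbin/*.sh file. Roughly (a sketch of the standard prologue, not reproduced verbatim here):
sbin=`dirname "$0"`      # directory containing the running script
sbin=`cd "$sbin"; pwd`   # normalize it to an absolute path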
*) First, analysis of sbin/spark-config.sh
export SPARK_PREFIX=`dirname "$this"`/..
export SPARK_HOME=${SPARK_PREFIX}
export SPARK_CONF_DIR="$SPARK_HOME/conf"
Comments:
# spark-config.sh exports the common environment variables SPARK_HOME and SPARK_CONF_DIR (plus SPARK_PREFIX)
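For example, assuming a hypothetical installation under /opt/spark-1.0.1, sourcing spark-config.sh from a script in sbin/ yields:
# $this resolves to /opt/spark-1.0.1/sbin/spark-config.sh, so:
# SPARK_PREFIX   = /opt/spark-1.0.1/sbin/..
# SPARK_HOME     = /opt/spark-1.0.1/sbin/..
# SPARK_CONF_DIR = /opt/spark-1.0.1/sbin/../conf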
*) Analysis of start-master.sh
. "$sbin/spark-config.sh" . "$SPARK_PREFIX/bin/load-spark-env.sh""$sbin"/spark-daemon.sh start org.apache.spark.deploy.master.Master 1 --ip $SPARK_MASTER_IP --port $SPARK_MASTER_PORT --webui-port $SPARK_MASTER_WEBUI_PORT
Comments:
# Source spark-config.sh first, then load-spark-env.sh
# Start the master service via the spark-daemon.sh script, passing in the relevant parameters: the Master's bind IP, port, and web UI port (defaults shown below)
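The values passed on that command line come from defaults set earlier in start-master.sh and omitted from the snippet above. Roughly, and subject to the exact 1.0.1 source, the defaults are:
if [ "$SPARK_MASTER_PORT" = "" ]; then
  SPARK_MASTER_PORT=7077
fi

if [ "$SPARK_MASTER_IP" = "" ]; then
  SPARK_MASTER_IP=`hostname`
fi

if [ "$SPARK_MASTER_WEBUI_PORT" = "" ]; then
  SPARK_MASTER_WEBUI_PORT=8080
fi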
*) Analysis of load-spark-env.sh
if [ -z "$SPARK_ENV_LOADED" ]; then export SPARK_ENV_LOADED=1 # Returns the parent of the directory this script lives in. parent_dir="$(cd `dirname $0`/..; pwd)" use_conf_dir=${SPARK_CONF_DIR:-"$parent_dir/conf"} if [ -f "${use_conf_dir}/spark-env.sh" ]; then # Promote all variable declarations to environment (exported) variables set -a . "${use_conf_dir}/spark-env.sh" set +a fifi
Comments:
# The key step is sourcing conf/spark-env.sh, which brings all user-defined variables into effect and overrides the default values
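For reference, a minimal conf/spark-env.sh might look like the following (the host name and port values are placeholders, not defaults):
# conf/spark-env.sh (sourced by load-spark-env.sh)
export SPARK_MASTER_IP=master-host        # host the Master binds to
export SPARK_MASTER_PORT=7077             # Master RPC port
export SPARK_MASTER_WEBUI_PORT=8080       # Master web UI port
export SPARK_WORKER_INSTANCES=2           # worker processes per machine
export SPARK_WORKER_WEBUI_PORT=8081       # web UI port of the first worker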
*) Analysis of start-slaves.sh
# Launch the slaves
if [ "$SPARK_WORKER_INSTANCES" = "" ]; then
  exec "$sbin/slaves.sh" cd "$SPARK_HOME" \; "$sbin/start-slave.sh" 1 spark://$SPARK_MASTER_IP:$SPARK_MASTER_PORT
else
  if [ "$SPARK_WORKER_WEBUI_PORT" = "" ]; then
    SPARK_WORKER_WEBUI_PORT=8081
  fi
  for ((i=0; i<$SPARK_WORKER_INSTANCES; i++)); do
    "$sbin/slaves.sh" cd "$SPARK_HOME" \; "$sbin/start-slave.sh" $(( $i + 1 )) spark://$SPARK_MASTER_IP:$SPARK_MASTER_PORT --webui-port $(( $SPARK_WORKER_WEBUI_PORT + $i ))
  done
fi
Comments:
# $SPARK_WORKER_INSTANCES specifies the number of worker processes to run on a single machine
# Specific flow: for each worker instance, sbin/slaves.sh is invoked with the command "sbin/start-slave.sh ..." as its argument, and each slave (worker) instance is assigned its own web UI port (see the example below)
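A quick illustration of the multi-instance branch (values are purely illustrative):
# Two worker processes per machine; their web UIs land on ports 8081 and 8082
export SPARK_WORKER_INSTANCES=2
export SPARK_WORKER_WEBUI_PORT=8081
sbin/start-slaves.sh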
*) Analysis of sbin/slaves.sh
. "$SPARK_PREFIX/bin/load-spark-env.sh"if [ "$HOSTLIST" = "" ]; then if [ "$SPARK_SLAVES" = "" ]; then export HOSTLIST="${SPARK_CONF_DIR}/slaves" else export HOSTLIST="${SPARK_SLAVES}" fifi# By default disable strict host key checkingif [ "$SPARK_SSH_OPTS" = "" ]; then SPARK_SSH_OPTS="-o StrictHostKeyChecking=no"fifor slave in `cat "$HOSTLIST"|sed "s/#.*$//;/^$/d"`; do ssh $SPARK_SSH_OPTS $slave $"${@// /\\ }" 2>&1 | sed "s/^/$slave: /" & if [ "$SPARK_SLAVE_SLEEP" != "" ]; then sleep $SPARK_SLAVE_SLEEP fidone
Comments:
# The sbin/slaves.sh script reads the conf/slaves file (which lists the slave nodes); see the previous article for details
# On each slave node, the following command is executed concurrently over ssh:
# sbin/start-slave.sh $(( $i + 1 )) spark://$SPARK_MASTER_IP:$SPARK_MASTER_PORT \
#   --webui-port $(( $SPARK_WORKER_WEBUI_PORT + $i ))
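The conf/slaves file itself is just a list of worker host names, one per line; comments and blank lines are stripped by the sed expression above. A hypothetical example:
# conf/slaves
worker-node-01
worker-node-02
worker-node-03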
*) Analysis of sbin/start-slave.sh
"$sbin"/spark-daemon.sh start org.apache.spark.deploy.worker.Worker "[email protected]"
Comments:
# Start org.apache.spark.deploy.worker.Worker via spark-daemon.sh, forwarding all arguments ("$@": instance number, master URL, web UI port)
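Putting start-slaves.sh, slaves.sh, and start-slave.sh together, each worker host ends up running something equivalent to the following (host and ports are illustrative):
sbin/spark-daemon.sh start org.apache.spark.deploy.worker.Worker 1 \
  spark://master-host:7077 --webui-port 8081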
*) Analysis of sbin/spark-daemon.sh
Finally, spark-daemon.sh launches the requested class (Master or Worker) through bin/spark-class, which sets up the JVM parameters and starts the corresponding daemon process.
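The heart of spark-daemon.sh is roughly the following (a simplified sketch, not the verbatim 1.0.1 source): it daemonizes the class via nohup, redirects output to a log file, and records the PID so the matching "stop" action can kill it later.
# $command is e.g. org.apache.spark.deploy.worker.Worker, "$@" are its arguments
nohup nice -n $SPARK_NICENESS "$SPARK_PREFIX"/bin/spark-class $command "$@" \
  >> "$log" 2>&1 < /dev/null &
echo $! > $pid   # remember the daemon's PID for the "stop" action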