Why the Spark Cluster Cannot Be Stopped: Analysis and Solution
Today I wanted to stop the Spark cluster, but when stop-all.sh was executed the Spark-related processes would not stop. The script only printed:
no org.apache.spark.deploy.master.Master to stop
no org.apache.spark.deploy.worker.Worker to stop
After searching online and reading through stop-all.sh, stop-master.sh, stop-slaves.sh, spark-daemon.sh, spark-daemons.sh and the other stop scripts, I found that the likely cause is the environment variable $SPARK_PID_DIR.
1. Cause Analysis
My cluster runs Hadoop 2.6.0 + Spark 1.1.0 + YARN. Spark, Hadoop and YARN are all stopped via xxx.pid files. Take stopping the Spark master as an example; the stop statement is shown below.
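Roughly, sbin/stop-master.sh ends with a call like the following (reproduced from memory of Spark 1.x, so the exact form may vary between versions):

"$sbin"/spark-daemon.sh stop org.apache.spark.deploy.master.Master 1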
The corresponding stop operation inside spark-daemon.sh is shown below.
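Paraphrased from the Spark 1.x version of the script (details may differ slightly between versions):

(stop)
    if [ -f $pid ]; then
      if kill -0 `cat $pid` > /dev/null 2>&1; then
        echo stopping $command
        kill `cat $pid`
      else
        echo no $command to stop
      fi
    else
      echo no $command to stop
    fi
    ;;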
$SPARK_PID_DIR is the directory that holds the pid files of the processes to be stopped. By default it points to the system /tmp directory.
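In spark-daemon.sh the default is set roughly as follows (again paraphrased from the Spark 1.x script):

if [ "$SPARK_PID_DIR" = "" ]; then
  SPARK_PID_DIR=/tmp
fi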
The operating system periodically cleans out /tmp. Checking /tmp confirmed that the pid files for the related processes were gone, which is why stop-all.sh could not stop the cluster.
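A quick check on one of the nodes shows the typical symptom (command and output here are illustrative):

ls /tmp/spark-*.pid
# ls: cannot access /tmp/spark-*.pid: No such file or directory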
2. Stop the Spark Cluster
Worried that force-killing the Spark processes with kill might leave the cluster in a bad state, I chose instead to recreate the pid files under /tmp and then stop the cluster with stop-all.sh.
Analyzing spark-daemon.sh shows the following naming rule for the pid file:
pid=$SPARK_PID_DIR/spark-$SPARK_IDENT_STRING-$command-$instance.pid
where:
$SPARK_PID_DIR is /tmp
$SPARK_IDENT_STRING is the login user $USER; in my cluster this is cdahdp
$command is a parameter passed to spark-daemon.sh and takes one of two values:
org.apache.spark.deploy.master.Master
org.apache.spark.deploy.worker.Worker
$instance is also a parameter passed to spark-daemon.sh; in my cluster it is 1
Therefore, the pid file names are as follows:
/tmp/spark-cdahdp-org.apache.spark.deploy.master.Master-1.pid
/tmp/spark-cdahdp-org.apache.spark.deploy.worker.Worker-1.pid
Use jps to look up the pids of the related processes.
Write each pid into its corresponding pid file.
Then call Spark's stop-all.sh to stop the Spark cluster. A sketch of these steps is shown below.
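A minimal sketch of the recovery; the pid values 30256 and 30889 are purely illustrative (use the values jps reports on your own nodes), and $SPARK_HOME is assumed to point at the Spark installation:

jps
# 30256 Master        <- example output only; your pids will differ
# 30889 Worker
echo 30256 > /tmp/spark-cdahdp-org.apache.spark.deploy.master.Master-1.pid   # on the master node
echo 30889 > /tmp/spark-cdahdp-org.apache.spark.deploy.worker.Worker-1.pid   # on each worker node
$SPARK_HOME/sbin/stop-all.sh   # now finds the pid files and stops the daemons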
3. Stop Hadoop and Yarn Clusters
The same thing can happen when stopping the Hadoop and YARN clusters with stop-all.sh: processes such as the NameNode, SecondaryNameNode, DataNode, NodeManager and ResourceManager cannot be stopped because their pid files cannot be found. The analysis is the same as for Spark; only the pid file names differ.
Hadoop pid naming rules:
pid=$HADOOP_PID_DIR/hadoop-$HADOOP_IDENT_STRING-$command.pid
pid file names:
/tmp/hadoop-cdahdp-namenode.pid
/tmp/hadoop-cdahdp-secondarynamenode.pid
/tmp/hadoop-cdahdp-datanode.pid
YARN pid naming rule:
pid=$YARN_PID_DIR/yarn-$YARN_IDENT_STRING-$command.pid
pid file names:
/tmp/yarn-cdahdp-resourcemanager.pid
/tmp/yarn-cdahdp-nodemanager.pid
Recreate these pid files in the same way, and stop-all.sh will again be able to stop the Hadoop and YARN processes. An example is sketched below.
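A sketch with hypothetical pids (take the real values from jps on the corresponding hosts):

echo 27118 > /tmp/hadoop-cdahdp-namenode.pid            # on the NameNode host
echo 27406 > /tmp/hadoop-cdahdp-secondarynamenode.pid   # on the SecondaryNameNode host
echo 27210 > /tmp/hadoop-cdahdp-datanode.pid            # on each DataNode host
echo 28370 > /tmp/yarn-cdahdp-resourcemanager.pid       # on the ResourceManager host
echo 28462 > /tmp/yarn-cdahdp-nodemanager.pid           # on each NodeManager host
# then run the Hadoop/YARN stop scripts (stop-all.sh, or stop-dfs.sh / stop-yarn.sh) as usual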
4. Permanent Solution
To solve this problem once and for all, set $SPARK_PID_DIR, $HADOOP_PID_DIR and $YARN_PID_DIR to a directory that is not cleaned automatically, on every node in the cluster.
Modify hadoop-env.sh and add:
export HADOOP_PID_DIR=/home/ap/cdahdp/app/pids
Modify yarn-env.sh and add:
export YARN_PID_DIR=/home/ap/cdahdp/app/pids
Modify spark-env.sh and add:
export SPARK_PID_DIR=/home/ap/cdahdp/app/pids
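The directory must exist and be writable by the cluster user on every node before the next start. A minimal sketch, assuming passwordless ssh and a hypothetical all_hosts file listing one hostname per line:

mkdir -p /home/ap/cdahdp/app/pids          # run on every node, e.g. via ssh:
for host in $(cat all_hosts); do           # all_hosts is a hypothetical node list
  ssh "$host" "mkdir -p /home/ap/cdahdp/app/pids"
done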
After the cluster is started, the /home/ap/cdahdp/app/pids directory looks like this:
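Based on the naming rules above, on a node that runs the master daemons as well as a worker/DataNode the listing is roughly:

ls /home/ap/cdahdp/app/pids
# hadoop-cdahdp-namenode.pid
# hadoop-cdahdp-secondarynamenode.pid
# hadoop-cdahdp-datanode.pid
# yarn-cdahdp-resourcemanager.pid
# yarn-cdahdp-nodemanager.pid
# spark-cdahdp-org.apache.spark.deploy.master.Master-1.pid
# spark-cdahdp-org.apache.spark.deploy.worker.Worker-1.pid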