1. How to submit a Spark task
1) Spark on YARN:
$ ./bin/spark-submit --class org.apache.spark.examples.SparkPi \
--master yarn-cluster \
--num-executors 3 \
--driver-memory 4g \
--executor-memory 2g \
--executor-cores 1 \
--queue thequeue \
lib/spark-examples*.jar \
10
2) Notes on submitting a task with Spark on YARN: in yarn-cluster mode, the driver runs on a different machine than the client, so SparkContext.addJar will not work out of the box with files that are local to the client. To make client-local files available to SparkContext.addJar, include them with the --jars option in the launch command (a driver-side sketch follows the command):
$ ./bin/spark-submit --class my.main.Class \
--master yarn-cluster \
--jars my-other-jar.jar,my-other-other-jar.jar \
my-main-jar.jar \
app_arg1 app_arg2
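For context, a minimal Scala sketch of what the driver side might look like (the class and jar names are the hypothetical ones from the command above). SparkContext.addJar resolves its path on the machine running the driver; in yarn-cluster mode that is a cluster node rather than the client, which is why client-local jars must be shipped with --jars:

import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical driver code packaged in my-main-jar.jar,
// standing in for my.main.Class above.
object MyMainClass {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("my-app"))
    // This path is resolved where the driver runs; in yarn-cluster mode a
    // client-only path would not be found unless it was shipped via --jars.
    sc.addJar("my-other-jar.jar")
    // ... job code using args(0), args(1) (app_arg1, app_arg2) ...
    sc.stop()
  }
}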
Test the Pi example that ships with Spark:
./bin/spark-submit --class org.apache.spark.examples.SparkPi \
--master yarn-cluster \
--num-executors 1 \
--driver-memory 1g \
--executor-memory 1g \
--executor-cores 1 \
lib/spark-examples*.jar
3) spark-submit:
Testing pi with spark-submit:
The spark-submit script in Spark's bin subdirectory is the tool for submitting programs to run on a cluster; here we use it to compute an approximation of pi (a sketch of what the example actually computes follows the command). The command is as follows:
./bin/spark-submit --master spark://spark113:7077 \
--class org.apache.spark.examples.SparkPi \
--name spark-pi \
--executor-memory 400M \
--driver-memory 512M \
/home/hadoop/spark-1.0.0/examples/target/scala-2.10/spark-examples-1.0.0-hadoop2.0.0-cdh4.5.0.jar
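For reference, SparkPi estimates pi by Monte Carlo sampling: it scatters random points over a 2x2 square and counts the fraction that land inside the unit circle, which approaches pi/4. A condensed Scala sketch of the idea (modeled on the bundled example, not a verbatim copy):

import scala.math.random
import org.apache.spark.{SparkConf, SparkContext}

object SparkPiSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("spark-pi"))
    val slices = if (args.length > 0) args(0).toInt else 2
    val n = 100000 * slices
    val count = sc.parallelize(1 to n, slices).map { _ =>
      val x = random * 2 - 1           // random point in [-1, 1] x [-1, 1]
      val y = random * 2 - 1
      if (x * x + y * y < 1) 1 else 0  // 1 if the point falls inside the unit circle
    }.reduce(_ + _)
    println("Pi is roughly " + 4.0 * count / n)
    sc.stop()
  }
}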
Another spark-submit test:
/home/hadoop/spark/spark-1.3.0-bin-hadoop2.4/bin/spark-submit \
--class org.apache.spark.examples.SparkPi \
--master spark://192.168.6.71:7077 \
--executor-memory 100m \
--executor-cores 1 \
lib/spark-examples*.jar \
1000
4) Start spark-shell in cluster mode:
./spark-shell --master spark://hadoop1:7077 --executor-memory 500m
2. Spark startup modes:
1) Start Spark in local mode: ./spark-shell --master local[2]. Note: the number in brackets sets how many worker threads Spark uses; a quick check is sketched below.
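Once the shell is up, a quick sanity check from the Scala prompt (a sketch; the default partition count follows the number of local threads):

sc.master                        // res: local[2]
val nums = sc.parallelize(1 to 1000)
nums.partitions.length           // typically 2: one partition per local thread
nums.map(_ * 2).reduce(_ + _)    // job runs across both threads; res: 1001000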
2) Start Spark in cluster mode:
[hadoop@hadoop1 spark-1.3.0-bin-hadoop2.4]$ ./bin/spark-shell --master spark://hadoop1:7077 --executor-memory 500m
Note: this startup mode gives spark-shell 500m of executor memory on each machine it runs on (you can read the setting back from the shell, as sketched after these commands).
spark-shell --master yarn-client --driver-memory 10g --num-executors --executor-memory 20g --executor-cores 3 --queue Spark
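To confirm the settings took effect, you can read them back from the running shell (a sketch using standard Spark configuration keys):

sc.getConf.get("spark.executor.memory")  // "500m" for the standalone example above
sc.master                                // e.g. spark://hadoop1:7077 or yarn-client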
3) Start Spark in the Python interpreter: bin/pyspark --master local[3]
4) Start Spark in the R interpreter: bin/sparkR --master local[2]
5) Start Spark on YARN:
yarn-cluster mode: $ ./bin/spark-shell --master yarn-cluster
yarn-client mode: $ ./bin/spark-shell --master yarn-client
spark-sql --master yarn-client --driver-memory 10g --num-executors --executor-memory 20g --executor-cores 3 --queue Spark
spark-sql --master spark://master:7077 --driver-memory 10g --executor-memory 20g --driver-cores 3