Run Spark 1.6.0 on YARN
Table of Contents
1. Conventions
2. Install Scala
2.1. Download
2.2. Installation
2.3. Setting Environment Variables
3. Install Spark
3.1. Download
3.2. Installation
3.3. Configuration
3.3.1. Modifying conf/spark-env.sh
4. Start Spark
4.1. Run the Built-in Example
4.2. Spark SQL CLI
5. Integration with Hive
6. Common Errors
6.1. Error 1: unknown queue: thequeue
6.2. SPARK_CLASSPATH was detected
7. Related Documents
1. Conventions
This document assumes Hadoop 2.7.1 is installed in /data/hadoop/current, while Spark 1.6.0 is installed in /data/hadoop/spark, where /data/hadoop/spark is a soft link pointing to the actual Spark installation directory (/data/hadoop/spark-1.6.0-bin-hadoop2.6).
Spark's official website is http://spark.apache.org/ (Shark's official website was http://shark.cs.berkeley.edu/; Shark has since become a module of Spark and no longer needs to be installed separately).
This document describes running Spark in cluster mode; client mode is not covered.
2. Install Scala
Martin Odersky of the École Polytechnique Fédérale de Lausanne (EPFL) began designing Scala in 2001, based on his earlier work on Funnel.
Scala is a multi-paradigm programming language designed to integrate the features of pure object-oriented programming and functional programming. It runs on the Java Virtual Machine (JVM), is compatible with existing Java programs, and can call Java class libraries. Scala includes a compiler and class libraries, released under the BSD license.
2.1. Download
Spark is developed in Scala, so Scala must be installed on each node before installing Spark. Scala's official website is http://www.scala-lang.org/ and the download page is http://www.scala-lang.org/download/. This article uses the binary package scala-2.11.7.tgz.
2.2. Installation
This document installs Scala in /data/scala, which is a soft link to /data/scala-2.11.7, as the root user (a non-root user also works; either way it is recommended to plan the layout beforehand).
The installation method is very simple: upload scala-2.11.7.tgz to the /data directory, then unpack it there.
Next, create a soft link: ln -s /data/scala-2.11.7 /data/scala.
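Put together, a minimal sketch of the whole installation (it assumes scala-2.11.7.tgz has already been uploaded to /data):
cd /data
tar xzf scala-2.11.7.tgz
ln -s /data/scala-2.11.7 /data/scala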
2.3. Setting Environment Variables
Once Scala is installed, its bin directory needs to be added to the PATH environment variable. You can directly modify the /etc/profile file to include the following:
export SCALA_HOME=/data/scala
export PATH=$SCALA_HOME/bin:$PATH
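To make the change take effect in the current shell and check the result, you can source the profile and print the installed version:
source /etc/profile
scala -version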
3. Install Spark
The installation of Spark is performed as a non-root user; this article installs it as the hadoop user.
3.1. Download
This article uses the binary package, which is the recommended approach, since building from source is a hassle. Download URL: http://spark.apache.org/downloads.html. The package downloaded here is spark-1.6.0-bin-hadoop2.6.tgz, which can run directly on YARN.
3.2. Installation
1) Upload spark-1.6.0-bin-hadoop2.6.tgz to the directory /data/hadoop
2) Unpack it: tar xzf spark-1.6.0-bin-hadoop2.6.tgz
3) Create a soft link: ln -s spark-1.6.0-bin-hadoop2.6 spark (the consolidated commands are sketched below)
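Put together, a minimal sketch of the installation, run as the hadoop user (it assumes the tarball has already been uploaded to /data/hadoop):
cd /data/hadoop
tar xzf spark-1.6.0-bin-hadoop2.6.tgz
ln -s spark-1.6.0-bin-hadoop2.6 spark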
Running Spark on YARN does not require Spark to be installed on every machine; installing it on a single machine is enough. However, Spark jobs can only be submitted from a machine where it is installed, for a simple reason: submitting a job needs Spark's files.
3.3. Configuration
3.3.1. Modifying conf/spark-env.sh
You can make a copy of spark-env.sh.template as spark-env.sh, and then add the following to it:
HADOOP_CONF_DIR=/data/hadoop/current/etc/hadoop
YARN_CONF_DIR=/data/hadoop/current/etc/hadoop
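A sketch of the full step, using the paths from the conventions in section 1:
cd /data/hadoop/spark
cp conf/spark-env.sh.template conf/spark-env.sh
echo 'HADOOP_CONF_DIR=/data/hadoop/current/etc/hadoop' >> conf/spark-env.sh
echo 'YARN_CONF_DIR=/data/hadoop/current/etc/hadoop' >> conf/spark-env.sh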
4. Start Spark
Because Spark runs on YARN, there is no Spark process to start. Instead, YARN schedules Spark to run when the spark-submit command is executed.
4.1. Run the Built-in Example
./bin/spark-submit --class org.apache.spark.examples.SparkPi \
    --master yarn --deploy-mode cluster \
    --driver-memory 4g \
    --executor-memory 2g \
    --executor-cores 1 \
    --queue default \
    lib/spark-examples*.jar 10
Run output:
16/02/03 16:08:33 INFO yarn.Client: Application report for application_1454466109748_0007 (state: RUNNING)
16/02/03 16:08:34 INFO yarn.Client: Application report for application_1454466109748_0007 (state: RUNNING)
16/02/03 16:08:35 INFO yarn.Client: Application report for application_1454466109748_0007 (state: RUNNING)
16/02/03 16:08:36 INFO yarn.Client: Application report for application_1454466109748_0007 (state: RUNNING)
16/02/03 16:08:37 INFO yarn.Client: Application report for application_1454466109748_0007 (state: RUNNING)
16/02/03 16:08:38 INFO yarn.Client: Application report for application_1454466109748_0007 (state: RUNNING)
16/02/03 16:08:39 INFO yarn.Client: Application report for application_1454466109748_0007 (state: RUNNING)
16/02/03 16:08:40 INFO yarn.Client: Application report for application_1454466109748_0007 (state: FINISHED)
16/02/03 16:08:40 INFO yarn.Client:
     client token: N/A
     diagnostics: N/A
     ApplicationMaster host: 10.225.168.251
     ApplicationMaster RPC port: 0
     queue: default
     start time: 1454486904755
     final status: SUCCEEDED
     tracking URL: http://hadoop-168-254:8088/proxy/application_1454466109748_0007/
     user: hadoop
16/02/03 16:08:40 INFO util.ShutdownHookManager: Shutdown hook called
16/02/03 16:08:40 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-7fc8538c-8f4c-4d8d-8731-64f5c54c5eac
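Besides reading spark-submit's own log, you can also check the application from the YARN side, for example with the yarn CLI (assuming the Hadoop layout from section 1, so the command lives under /data/hadoop/current/bin):
/data/hadoop/current/bin/yarn application -list
The tracking URL printed in the output can likewise be opened in a browser to watch the job on the ResourceManager web UI.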
4.2. Spark SQL CLI
Running spark-sql enters the Spark SQL CLI. To run it on YARN, you need to pass the parameter --master with the value yarn (note that --deploy-mode does not support the value cluster here; that is, on YARN the Spark SQL CLI can only run in client mode):
./bin/spark-sql --master yarn
Why can the Spark SQL CLI only run in client mode? It is easy to understand: since the CLI is interactive and the user needs to see its output, cluster mode cannot work, because in cluster mode which machine the ApplicationMaster runs on is decided dynamically by YARN.
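As a usage sketch, statements can be typed at the interactive prompt, or a single statement can be passed on the command line with -e (an option spark-sql accepts in the same way as the Hive CLI):
./bin/spark-sql --master yarn -e "show tables;"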
5. Integration with Hive
Integrating Spark with Hive is very simple; it only takes the following steps (a consolidated sketch follows the list):
1) Add HIVE_HOME to spark-env.sh, for example: export HIVE_HOME=/data/hadoop/hive
2) Copy the two files hive-site.xml and hive-log4j.properties from Hive to Spark's conf directory.
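A minimal sketch of these two steps, assuming the two files live in Hive's conf directory (their usual location):
echo 'export HIVE_HOME=/data/hadoop/hive' >> /data/hadoop/spark/conf/spark-env.sh
cp /data/hadoop/hive/conf/hive-site.xml /data/hadoop/spark/conf/
cp /data/hadoop/hive/conf/hive-log4j.properties /data/hadoop/spark/conf/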
Once done, run spark-sql again to enter Spark's SQL CLI, and run the command show tables to see the tables created in Hive.
Example:
./spark-sql --master yarn --driver-class-path /data/hadoop/hive/lib/mysql-connector-java-5.1.38-bin.jar
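The --driver-class-path argument here puts the MySQL JDBC driver on the driver's classpath; this is presumably needed because the hive-site.xml copied above points the Hive metastore at a MySQL database, which the driver must be able to connect to.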
6. Common Errors
6.1. Error 1: unknown queue: thequeue
If you run:
./bin/spark-submit --class org.apache.spark.examples.SparkPi --master yarn --deploy-mode cluster --driver-memory 4g --executor-memory 2g --executor-cores 1 --queue thequeue lib/spark-examples*.jar 10
it reports the error below; you only need to change "--queue thequeue" to "--queue default" to fix it.
16/02/03 15:57:36 INFO yarn.Client: Application report for application_1454466109748_0004 (state: FAILED)
16/02/03 15:57:36 INFO yarn.Client:
     client token: N/A
     diagnostics: Application application_1454466109748_0004 submitted by user hadoop to unknown queue: thequeue
     ApplicationMaster host: N/A
     ApplicationMaster RPC port: -1
     queue: thequeue
     start time: 1454486255907
     final status: FAILED
     tracking URL: http://hadoop-168-254:8088/proxy/application_1454466109748_0004/
     user: hadoop
16/02/03 15:57:36 INFO yarn.Client: Deleting staging directory .sparkStaging/application_1454466109748_0004
Exception in thread "main" org.apache.spark.SparkException: Application application_1454466109748_0004 finished with failed status
    at org.apache.spark.deploy.yarn.Client.run(Client.scala:1029)
    at org.apache.spark.deploy.yarn.Client$.main(Client.scala:1076)
    at org.apache.spark.deploy.yarn.Client.main(Client.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
    at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
16/02/03 15:57:36 INFO util.ShutdownHookManager: Shutdown hook called
16/02/03 15:57:36 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-54531ae3-4d02-41be-8b9e-92f4b0f05807
6.2. SPARK_CLASSPATH was detected
SPARK_CLASSPATH was detected (set to '/data/hadoop/hive/lib/mysql-connector-java-5.1.38-bin.jar:').
This is deprecated in Spark 1.0+.
Please instead use:
 - ./spark-submit with --driver-class-path to augment the driver classpath
 - spark.executor.extraClassPath to augment the executor classpath
This warning means it is not recommended to set the SPARK_CLASSPATH environment variable in spark-env.sh; change it to the recommended way below:
./spark-sql --master yarn --driver-class-path /data/hadoop/hive/lib/mysql-connector-java-5.1.38-bin.jar
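If the executors also need the jar, the warning's second suggestion applies; a sketch combining both, assuming the jar exists at the same path on every node (--conf is the standard way to set a Spark property on the command line):
./spark-sql --master yarn \
    --driver-class-path /data/hadoop/hive/lib/mysql-connector-java-5.1.38-bin.jar \
    --conf spark.executor.extraClassPath=/data/hadoop/hive/lib/mysql-connector-java-5.1.38-bin.jar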
7. Related Documents
"HBase-0.98.0 Distributed Installation Guide"
"Hive0.12.0 Installation Guide"
"ZooKeeper-3.4.6 Distributed Installation Guide"
"HADOOP2.3.0 Source Reverse engineering"
"Compiling Hadoop-2.4.0 on Linux"
"Accumulo-1.5.1 Installation Guide"
"Drill1.0.0 Installation Guide"
"Shark0.9.1 Installation Guide"
For more information, please follow the technical blog: http://aquester.cublog.cn.