Run Spark 1.6.0 on YARN



Contents

1. Conventions
2. Install Scala
2.1. Download
2.2. Installation
2.3. Setting Environment Variables
3. Install Spark
3.1. Download
3.2. Installation
3.3. Configuration
3.3.1. Modifying conf/spark-env.sh
4. Start Spark
4.1. Run the built-in example
4.2. Spark SQL CLI
5. Integration with Hive
6. Common Errors
6.1. Error 1: unknown queue: thequeue
6.2. SPARK_CLASSPATH was detected
7. Related Documents

1. Conventions

This document assumes Hadoop 2.7.1 is installed in /data/hadoop/current and Spark 1.6.0 in /data/hadoop/spark, where /data/hadoop/spark is a soft link pointing to /data/hadoop/spark-1.6.0-bin-hadoop2.6.

Spark's official website is http://spark.apache.org/ (Shark's official website was http://shark.cs.berkeley.edu/; Shark has become a module of Spark and no longer needs to be installed separately).

This document covers running Spark on YARN in cluster mode; client mode is not introduced in detail.

2. Install Scala

Martin Odersky of the École Polytechnique Fédérale de Lausanne (EPFL) began designing Scala in 2001, based on his earlier work on Funnel.

Scala is a multi-paradigm programming language designed to integrate the features of pure object-oriented programming and functional programming. It runs on the Java Virtual Machine (JVM), is compatible with existing Java programs, and can invoke Java class libraries. Scala includes a compiler and class libraries, released under a BSD license.

2.1. Download

Spark is developed in Scala, so Scala must be installed on each node before installing Spark. Scala's official website is http://www.scala-lang.org/ and the download page is http://www.scala-lang.org/download/. This article downloads the binary installation package scala-2.11.7.tgz.

2.2. Installation

This document installs Scala in /data/scala, which is a soft link to /data/scala-2.11.7. The installation is performed as the root user (a non-root user also works; either way, it is recommended to plan this beforehand).

The installation method is very simple: upload scala-2.11.7.tgz to the /data directory, then unpack it in /data.

Next, create a soft link: ln -s /data/scala-2.11.7 /data/scala.
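Put together, the steps above look like the following (a minimal sketch, assuming the archive has already been uploaded to /data):

cd /data
tar xzf scala-2.11.7.tgz              # unpacks to /data/scala-2.11.7
ln -s /data/scala-2.11.7 /data/scala  # version-independent path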

2.3. Setting Environment Variables

Once Scala is installed, it needs to be added to the PATH environment variable. You can directly modify the /etc/profile file to include the following:

export SCALA_HOME=/data/scala
export PATH=$SCALA_HOME/bin:$PATH
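To apply the change in the current shell and verify the installation (a quick check, assuming /etc/profile was edited as above):

source /etc/profile
scala -version    # should report Scala 2.11.7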

3. Install Spark

The installation of Spark is performed by a non-root user; this article installs it as the hadoop user.

3.1. Download

This article downloads the binary installation package, which is the recommended way; otherwise you will have to wrestle with compilation. Download URL: http://spark.apache.org/downloads.html. This article downloads spark-1.6.0-bin-hadoop2.6.tgz, which can run directly on YARN.

3.2. Installation

1) Upload spark-1.6.0-bin-hadoop2.6.tgz to the directory /data/hadoop

2) Unpack it: tar xzf spark-1.6.0-bin-hadoop2.6.tgz

3) Create a soft link: ln -s spark-1.6.0-bin-hadoop2.6 spark
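The three steps as one sequence, with a quick sanity check at the end (a sketch, run as the hadoop user with the archive already uploaded):

cd /data/hadoop
tar xzf spark-1.6.0-bin-hadoop2.6.tgz
ln -s spark-1.6.0-bin-hadoop2.6 spark   # /data/hadoop/spark now points at the release
./spark/bin/spark-submit --version      # prints the Spark version banner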

Running Spark on YARN does not require Spark to be installed on every machine; it is enough to install it on one machine. However, Spark jobs can only be submitted from a machine where Spark is installed, for the simple reason that submission needs Spark's files.

3.3. Configuration

3.3.1. Modifying conf/spark-env.sh

You can make a copy of spark-env.sh.template as spark-env.sh, and then add the following:

HADOOP_CONF_DIR=/data/hadoop/current/etc/hadoop
YARN_CONF_DIR=/data/hadoop/current/etc/hadoop
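For example (a sketch, run from the Spark installation directory):

cd /data/hadoop/spark
cp conf/spark-env.sh.template conf/spark-env.sh
# tell Spark where to find the Hadoop and YARN client configuration
echo 'export HADOOP_CONF_DIR=/data/hadoop/current/etc/hadoop' >> conf/spark-env.sh
echo 'export YARN_CONF_DIR=/data/hadoop/current/etc/hadoop' >> conf/spark-env.sh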

4. Start Spark

Because Spark runs on YARN, there is no Spark process to start in advance. Instead, YARN schedules Spark to run when the spark-submit command is executed.

4.1. Run the built-in example

./bin/spark-submit --class org.apache.spark.examples.SparkPi \
    --master yarn --deploy-mode cluster \
    --driver-memory 4g \
    --executor-memory 2g \
    --executor-cores 1 \
    --queue default \
    lib/spark-examples*.jar 10

Run output:

16/02/03 16:08:33 INFO yarn.Client: Application report for application_1454466109748_0007 (state: RUNNING)
16/02/03 16:08:34 INFO yarn.Client: Application report for application_1454466109748_0007 (state: RUNNING)
16/02/03 16:08:35 INFO yarn.Client: Application report for application_1454466109748_0007 (state: RUNNING)
16/02/03 16:08:36 INFO yarn.Client: Application report for application_1454466109748_0007 (state: RUNNING)
16/02/03 16:08:37 INFO yarn.Client: Application report for application_1454466109748_0007 (state: RUNNING)
16/02/03 16:08:38 INFO yarn.Client: Application report for application_1454466109748_0007 (state: RUNNING)
16/02/03 16:08:39 INFO yarn.Client: Application report for application_1454466109748_0007 (state: RUNNING)
16/02/03 16:08:40 INFO yarn.Client: Application report for application_1454466109748_0007 (state: FINISHED)
16/02/03 16:08:40 INFO yarn.Client:
     client token: N/A
     diagnostics: N/A
     ApplicationMaster host: 10.225.168.251
     ApplicationMaster RPC port: 0
     queue: default
     start time: 1454486904755
     final status: SUCCEEDED
     tracking URL: http://hadoop-168-254:8088/proxy/application_1454466109748_0007/
     user: hadoop
16/02/03 16:08:40 INFO util.ShutdownHookManager: Shutdown hook called
16/02/03 16:08:40 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-7fc8538c-8f4c-4d8d-8731-64f5c54c5eac

4.2. Spark SQL CLI

Running ./bin/spark-sql enters the Spark SQL CLI. To run it on YARN, you need to pass the parameter --master with the value yarn (note that --deploy-mode does not support the value cluster; that is, on YARN the Spark SQL CLI can only run in client mode):

./bin/spark-sql --master yarn

Why can the Spark SQL CLI only run in client mode? This is actually easy to understand: since it is interactive, you need to see its output, which cluster mode cannot provide, because in cluster mode the machine on which the ApplicationMaster runs is decided dynamically by YARN.
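For a quick non-interactive check, a query can also be passed on the command line; spark-sql accepts the usual Hive CLI options, including -e (a small example):

./bin/spark-sql --master yarn -e "show tables;"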

5. Integration with Hive

Integrating Spark with Hive is very simple; only the following steps are needed:

1) Add HIVE_HOME to spark-env.sh, for example: export HIVE_HOME=/data/hadoop/hive

2) Copy the two files hive-site.xml and hive-log4j.properties from Hive to Spark's conf directory (both steps are shown as commands below).
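A sketch of both steps (assuming Hive's configuration files live in /data/hadoop/hive/conf):

echo 'export HIVE_HOME=/data/hadoop/hive' >> /data/hadoop/spark/conf/spark-env.sh
cp /data/hadoop/hive/conf/hive-site.xml /data/hadoop/spark/conf/
cp /data/hadoop/hive/conf/hive-log4j.properties /data/hadoop/spark/conf/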

Once done, execute spark-sql again to enter Spark's SQL CLI, and run the command show tables to see the tables created in Hive.

Example:

./spark-sql --master yarn --driver-class-path /data/hadoop/hive/lib/mysql-connector-java-5.1.38-bin.jar

6. Common Errors

6.1. Error 1: unknown queue: thequeue

Run:

./bin/spark-submit --class org.apache.spark.examples.SparkPi --master yarn --deploy-mode cluster --driver-memory 4g --executor-memory 2g --executor-cores 1 --queue thequeue lib/spark-examples*.jar 10

It reports the error below; you only need to change "--queue thequeue" to "--queue default" (or to another queue that actually exists).

16/02/03 15:57:36 INFO yarn.Client: Application report for application_1454466109748_0004 (state: FAILED)
16/02/03 15:57:36 INFO yarn.Client:
     client token: N/A
     diagnostics: Application application_1454466109748_0004 submitted by user hadoop to unknown queue: thequeue
     ApplicationMaster host: N/A
     ApplicationMaster RPC port: -1
     queue: thequeue
     start time: 1454486255907
     final status: FAILED
     tracking URL: http://hadoop-168-254:8088/proxy/application_1454466109748_0004/
     user: hadoop
16/02/03 15:57:36 INFO yarn.Client: Deleting staging directory .sparkStaging/application_1454466109748_0004
Exception in thread "main" org.apache.spark.SparkException: Application application_1454466109748_0004 finished with failed status
    at org.apache.spark.deploy.yarn.Client.run(Client.scala:1029)
    at org.apache.spark.deploy.yarn.Client$.main(Client.scala:1076)
    at org.apache.spark.deploy.yarn.Client.main(Client.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
    at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
16/02/03 15:57:36 INFO util.ShutdownHookManager: Shutdown hook called
16/02/03 15:57:36 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-54531ae3-4d02-41be-8b9e-92f4b0f05807
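If you are not sure which queues the cluster defines, you can list them before submitting, for example with the standard Hadoop client command:

mapred queue -list    # shows queue names, states, and capacity information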

6.2. SPARK_CLASSPATH was detected

SPARK_CLASSPATH was detected (set to '/data/hadoop/hive/lib/mysql-connector-java-5.1.38-bin.jar:').
This is deprecated in Spark 1.0+.
Please instead use:
 - ./spark-submit with --driver-class-path to augment the driver classpath
 - spark.executor.extraClassPath to augment the executor classpath

Setting the environment variable SPARK_CLASSPATH in spark-env.sh is not recommended; it can be replaced with the following recommended way:

./spark-sql --master yarn --driver-class-path /data/hadoop/hive/lib/mysql-connector-java-5.1.38-bin.jar
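Alternatively, instead of passing the flag on every invocation, the same classpath can be set once in conf/spark-defaults.conf (a sketch; spark.driver.extraClassPath and spark.executor.extraClassPath are the standard configuration keys):

# conf/spark-defaults.conf
spark.driver.extraClassPath   /data/hadoop/hive/lib/mysql-connector-java-5.1.38-bin.jar
spark.executor.extraClassPath /data/hadoop/hive/lib/mysql-connector-java-5.1.38-bin.jar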

7. Related Documents

"HBase-0.98.0 Distributed Installation Guide"

"Hive0.12.0 Installation Guide"

"ZooKeeper-3.4.6 Distributed Installation Guide"

"HADOOP2.3.0 Source Reverse engineering"

"Compiling Hadoop-2.4.0 on Linux"

"Accumulo-1.5.1 Installation Guide"

"Drill1.0.0 Installation Guide"

"Shark0.9.1 Installation Guide"

For more information, please follow the technical blog: http://aquester.cublog.cn.

