7. Yarn-based Spark cluster setup

Constructing a distributed Spark 1.0.2 cluster

Download Scala 2.10.4 from:

http://www.scala-lang.org/download/2.10.4.html

On an Ubuntu machine, the download page will automatically select "scala-2.10.4.tgz" for download;

Installing and configuring Scala

We need to install Scala separately on master, slave1, and slave2.

Install Scala

Copy the Scala installation package to each machine

Extract

Create the directory /usr/lib/scala

Copy the extracted folder scala-2.10.4 to /usr/lib/scala
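
As a rough sketch, these steps can be carried out with commands like the following (the archive location is an assumption; adjust paths as needed):

    # assumes scala-2.10.4.tgz has been copied to the current directory
    sudo mkdir -p /usr/lib/scala
    tar -zxvf scala-2.10.4.tgz
    sudo cp -r scala-2.10.4 /usr/lib/scala/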

Modify the configuration: vim ~/.bashrc

Modify the configuration in /etc/environment, updating PATH, CLASSPATH, and JAVA_HOME
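
A minimal sketch of the variables involved, assuming Scala was placed under /usr/lib/scala/scala-2.10.4 (the JAVA_HOME path below is only a placeholder for your actual JDK location):

    # additions to ~/.bashrc (keep the values in /etc/environment consistent with these)
    export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64   # placeholder JDK path
    export SCALA_HOME=/usr/lib/scala/scala-2.10.4
    export CLASSPATH=.:$JAVA_HOME/lib:$CLASSPATH
    export PATH=$SCALA_HOME/bin:$JAVA_HOME/bin:$PATH

Run source ~/.bashrc afterwards so the changes take effect in the current shell.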

Once the installation is complete on each machine, it can be verified:
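
For example, the following should print the installed version on each node:

    scala -version
    # expected to report something like: Scala code runner version 2.10.4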

Download Spark 1.0.2 from:

http://www.apache.org/dyn/closer.cgi/spark/spark-1.0.2/spark-1.0.2-bin-hadoop2.tgz

Install and configure the Spark 1.0.2 cluster on master

Extract the downloaded "spark-1.0.2-bin-hadoop2.tgz" into the "/usr/local/spark" directory:

Create the directory /usr/local/spark

Copy the installation package to /usr/local/spark and extract it
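
A sketch of these steps, assuming the archive was downloaded to the current directory:

    sudo mkdir -p /usr/local/spark
    sudo cp spark-1.0.2-bin-hadoop2.tgz /usr/local/spark/
    cd /usr/local/spark
    sudo tar -zxvf spark-1.0.2-bin-hadoop2.tgz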

Configure "~/.BASHRC", set "Spark_home" and add SPARK's Bin directory to path (modify environment file), and use the source command to make the configuration work after configuration is complete.

Modify the PATH in /etc/environment
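
Sketched out, the additions (with the extraction path assumed from the step above) would look roughly like this:

    # additions to ~/.bashrc; keep /etc/environment's PATH consistent with this
    export SPARK_HOME=/usr/local/spark/spark-1.0.2-bin-hadoop2
    export PATH=$SPARK_HOME/bin:$PATH
    # reload the configuration in the current shell
    source ~/.bashrc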

Enter Spark's conf directory:

Step one: modify the slaves file. Open the file first:

Modify the contents of the slaves file so that it reads:
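
Assuming the two worker hostnames used throughout this walkthrough, the slaves file simply lists them, one per line:

    slave1
    slave2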

Step Two: Configure spark-env.sh

First copy spark-env.sh.template to spark-env.sh:
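
For example (the conf path follows the installation directory assumed above):

    cd /usr/local/spark/spark-1.0.2-bin-hadoop2/conf
    cp spark-env.sh.template spark-env.sh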

Open the "spark-env.sh" file

Add the following to the end of the file
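
The original text does not reproduce these entries; a typical spark-env.sh for a standalone cluster of this kind might look like the following sketch (every path and memory size here is an assumption to adapt to your machines):

    export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64    # placeholder JDK path
    export SCALA_HOME=/usr/lib/scala/scala-2.10.4
    export SPARK_MASTER_IP=master
    export SPARK_WORKER_MEMORY=2g
    export HADOOP_CONF_DIR=/usr/local/hadoop/etc/hadoop    # placeholder Hadoop conf dir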

slave1 and slave2 use the same Spark installation and configuration as master.
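
One way to replicate the installation is to copy the configured directory from master to each slave, for example with scp (a sketch that assumes passwordless SSH and write access to /usr/local/spark on the slaves):

    scp -r /usr/local/spark/spark-1.0.2-bin-hadoop2 slave1:/usr/local/spark/
    scp -r /usr/local/spark/spark-1.0.2-bin-hadoop2 slave2:/usr/local/spark/

Remember to apply the same ~/.bashrc and /etc/environment changes on each slave as well.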

Start the Spark distributed cluster and view its information.

Step one: start the Hadoop cluster, then use the jps command on master, and likewise on slave1 and slave2, to confirm its daemons are running.
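
A sketch of this step, assuming a Hadoop 2.x installation whose sbin scripts are on the PATH:

    start-dfs.sh
    start-yarn.sh
    jps   # run on master, slave1, and slave2 to confirm the Hadoop daemons are up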

Step two: start the Spark cluster

With the Hadoop cluster running successfully, start the Spark cluster by running "start-all.sh" from Spark's sbin directory:
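
With the installation path assumed above, that is:

    /usr/local/spark/spark-1.0.2-bin-hadoop2/sbin/start-all.sh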

Use jps to view the cluster processes
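
Roughly, jps should now show the Spark standalone daemons alongside the Hadoop ones (the exact set depends on your Hadoop layout):

    jps
    # expect, in addition to the Hadoop daemons:
    #   Master   on the master node
    #   Worker   on slave1 and slave2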

Access the Spark cluster's web UI at http://master:8080

The page shows the worker nodes and their information.

At this point, go to Spark's bin directory and start the spark-shell console.
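
A sketch of launching the shell against the standalone master (spark://master:7077 assumes the default standalone port):

    cd /usr/local/spark/spark-1.0.2-bin-hadoop2/bin
    ./spark-shell --master spark://master:7077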

At this point we have entered Spark's shell environment. According to the startup output, we can view the Spark UI at "http://master:4040" in a browser.

Of course, you can also look at other information there, such as the Environment tab.

Likewise, we can also look at the Executors tab.

At this point, our Spark cluster has been built successfully.
