Spark Learning Note 6: Spark Distributed Build (5) -- Ubuntu Spark Distributed Build

Source: Internet
Author: User
Tags: ssh
0. Preface

The cluster I'm building has one master and three workers. First, complete the configuration from the previous posts:
Spark Distributed Build (1) -- Ubuntu 14.04: set root to log in automatically
http://blog.csdn.net/xummgg/article/details/50630583
Spark Distributed Build (2) -- Ubuntu 14.04: modify hostname and hosts
http://blog.csdn.net/xummgg/article/details/50634327
Spark Distributed Build (3) -- Ubuntu SSH login without a password (SSH public-key authentication)
http://blog.csdn.net/xummgg/article/details/50634730
Spark Distributed Build (4) -- Ubuntu Hadoop pseudo-distributed setup
http://blog.csdn.net/xummgg/article/details/50641096
1. Install Scala
1.1 Download Scala

Download Address:
http://www.scala-lang.org/download/
From the full list of downloads I chose version 2.10.4:
1.2 Decompression

The download is saved in the Downloads directory, so extract it there with the tar command; these steps are performed on the master.

Copy it to the /usr/local/scala directory:
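The commands from the original screenshots are not reproduced here; a sketch of the two steps, assuming the scala-2.10.4.tgz archive name from the version chosen above, might be:

```shell
# On master: extract the downloaded archive in the Downloads directory
cd ~/Downloads
tar -zxvf scala-2.10.4.tgz

# Copy the extracted folder under /usr/local/scala
sudo mkdir -p /usr/local/scala
sudo cp -r scala-2.10.4 /usr/local/scala/
```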
1.3 Configuring the Scala environment variable

Open the system's bashrc file with the vim command:

Add the following content, which appends the bin directory to PATH.

Make the change take effect with source.
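The screenshot of the bashrc additions is missing; the lines would look something like this (the install path follows the copy step above):

```shell
# Appended to bashrc
export SCALA_HOME=/usr/local/scala/scala-2.10.4
export PATH=$PATH:$SCALA_HOME/bin
```

After saving, run `source ~/.bashrc` (or `source /etc/bash.bashrc`, whichever file was edited) so the current shell picks up the new variables.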
1.4 Verification

Entering the scala command displays the version, as follows:

You can also program directly in the Scala REPL:
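Assuming the PATH change above took effect, the verification might look like:

```shell
scala -version   # prints the version line, e.g. "Scala code runner version 2.10.4 ..."
scala            # with no arguments, starts the interactive REPL for direct programming
```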
2. Install Spark
2.1 Download Spark

Download Address:
http://spark.apache.org/downloads.html
For learning purposes, I downloaded the pre-built version 1.6.
2.2 Decompression

The download is also in the Downloads directory; extract it there with tar:

Copy it to the /usr/local/spark directory:
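As with Scala, the screenshots are missing; a sketch of the steps, where the exact archive name (spark-1.6.0-bin-hadoop2.6.tgz here) is an assumption depending on which pre-built package was chosen, might be:

```shell
cd ~/Downloads
tar -zxvf spark-1.6.0-bin-hadoop2.6.tgz

sudo mkdir -p /usr/local/spark
sudo cp -r spark-1.6.0-bin-hadoop2.6 /usr/local/spark/
```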
2.3 Configuring the Spark environment variables

The configuration is again done on the master first; open bashrc with the vim command:

Add the following content, which appends the bin directory to PATH.
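The screenshot is missing; assuming the install path from the copy step above, the bashrc additions would be along these lines:

```shell
# Appended to bashrc
export SPARK_HOME=/usr/local/spark/spark-1.6.0-bin-hadoop2.6
export PATH=$PATH:$SPARK_HOME/bin    # putting sbin on PATH as well is optional

# Apply to the current shell
source ~/.bashrc
```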
2.4 Single-node Spark file configuration

We configure three files here. First, configure spark-env.sh.
Since only the template file spark-env.sh.template exists, first copy it to spark-env.sh.

Add the following content:
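The screenshot with the spark-env.sh contents is missing; a typical configuration for this one-master/three-worker layout might be the following, where the Java path, Hadoop path, and worker memory size are assumptions:

```shell
# In $SPARK_HOME/conf: create spark-env.sh from the template
cp spark-env.sh.template spark-env.sh

# Lines appended to spark-env.sh (paths and memory size are assumptions)
cat >> spark-env.sh <<'EOF'
export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64
export SCALA_HOME=/usr/local/scala/scala-2.10.4
export HADOOP_CONF_DIR=/usr/local/hadoop/etc/hadoop
export SPARK_MASTER_IP=master
export SPARK_WORKER_MEMORY=1g
EOF
```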
Configure slaves.
Since only the template file slaves.template exists, first copy it to slaves.

Modify the contents as follows:
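The slaves file lists one worker hostname per line; for the three workers named in this series it would be:

```shell
# In $SPARK_HOME/conf: create slaves from the template
cp slaves.template slaves

# slaves lists one worker hostname per line
cat > slaves <<'EOF'
worker1
worker2
worker3
EOF
```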
Configure spark-defaults.conf.
Since only the template file spark-defaults.conf.template exists, first copy it to spark-defaults.conf.

Add the following content:
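The original contents are in a missing screenshot. A minimal configuration that enables event logging for the history server might look like this; the master port and HDFS port are assumptions, and the log directory matches the historyserverforspark folder created in section 3:

```shell
cp spark-defaults.conf.template spark-defaults.conf

cat >> spark-defaults.conf <<'EOF'
spark.master                     spark://master:7077
spark.eventLog.enabled           true
spark.eventLog.dir               hdfs://master:9000/historyserverforspark
spark.history.fs.logDirectory    hdfs://master:9000/historyserverforspark
EOF
```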

The single-node Spark setup is now complete.
3. Running the Spark cluster

Distribute Scala, Spark, and the bashrc file to the other workers.
Copy the Scala folder over (the screenshot shows worker1; transfer to worker2 and worker3 yourself):

Copy the Spark folder over (the screenshot shows worker1; transfer to worker2 and worker3 yourself):

Copy the bashrc file over and make it take effect (the screenshot shows worker1; transfer to worker2 and worker3 yourself):
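The three transfers can be sketched as follows; root login over SSH was enabled in posts (1) and (3) of this series, and the paths follow the earlier steps:

```shell
# From master: copy Scala, Spark, and bashrc to every worker
for host in worker1 worker2 worker3; do
  scp -r /usr/local/scala root@"$host":/usr/local/
  scp -r /usr/local/spark root@"$host":/usr/local/
  scp ~/.bashrc           root@"$host":~/
done
# Then, on each worker, run `source ~/.bashrc` to apply the variables.
```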

Run start-all.sh under Spark's sbin directory, then check with the jps command; output like the following indicates a successful start:

Log in to a worker node and check with jps:
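The start and the two checks might look like this:

```shell
# On master: start the Spark cluster and list the Java daemons
$SPARK_HOME/sbin/start-all.sh
jps                  # should show a Master process (plus the Hadoop daemons)

# On a worker (or via ssh): a Worker process should be running
ssh root@worker1 jps
```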

Open master:8080 (the Spark master web UI; the original's "80080" is a typo) in a browser to view the cluster:

Create a historyserverforspark folder on HDFS to hold the history logs.

This folder can be seen in the HDFS web UI at master:50070:
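Assuming the Hadoop setup from post (4) is running, the folder can be created and verified like this:

```shell
# Create the history-log directory on HDFS and list the root to confirm
hadoop fs -mkdir /historyserverforspark
hadoop fs -ls /
```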

Next start the history server, and use jps to check whether it started successfully.

You can now open master:18080 to view the history server.
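The history server is started with the script shipped in Spark's sbin directory; it reads the log directory configured in spark-defaults.conf:

```shell
$SPARK_HOME/sbin/start-history-server.sh
jps   # a HistoryServer process should now appear in the list
```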

You can also run spark-shell; if it shows the version banner, everything works.

The entire build process is now complete. Readers should follow these five posts carefully.
Here is a simple test with the shell (this content was added later, so the screenshot colors differ). In spark-shell, write the following code:
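The code in the original screenshot is not reproduced; an assumed word-count smoke test of the same kind, using the README.md that ships in the Spark directory, might be:

```shell
# Run a small word count inside spark-shell (Scala code fed via a heredoc)
cd $SPARK_HOME
spark-shell <<'EOF'
val lines  = sc.textFile("README.md")
val counts = lines.flatMap(_.split(" ")).map(word => (word, 1)).reduceByKey(_ + _)
counts.take(5).foreach(println)
EOF
```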

You can see the jobs running from spark-shell at master:4040/jobs:

You can also view historical jobs at master:18080:

Xianming
