Constructing a distributed Spark 1.0.2 cluster
Download Scala 2.10.4 from:
http://www.scala-lang.org/download/2.10.4.html
On an Ubuntu machine, the download page automatically selects "scala-2.10.4.tgz" for download.
Installing and configuring Scala
We need to install Scala on each of the three machines: master, slave1, and slave2.
Install Scala
Copy the Scala installation package to each machine
Extract the package
Create the directory /usr/lib/scala
Copy the extracted folder scala-2.10.4 to /usr/lib/scala
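A minimal sketch of these steps, assuming the archive sits in the current directory and that root permissions are needed for /usr/lib:

    tar -zxvf scala-2.10.4.tgz          # extract the downloaded package
    sudo mkdir -p /usr/lib/scala        # create the target directory
    sudo cp -r scala-2.10.4 /usr/lib/scala   # copy the extracted folder into place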
Modify the configuration: vim ~/.bashrc
Also modify /etc/environment, updating PATH, CLASSPATH, and JAVA_HOME
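A sketch of the lines to add to ~/.bashrc (the SCALA_HOME value assumes Scala was copied to /usr/lib/scala/scala-2.10.4 as above; run source ~/.bashrc afterwards so the change takes effect):

    # Scala environment (path assumes the layout created above)
    export SCALA_HOME=/usr/lib/scala/scala-2.10.4
    export PATH=${SCALA_HOME}/bin:$PATH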
Once the installation is complete on each machine, it can be verified:
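For example, checking the installed version from a terminal should report version 2.10.4:

    scala -version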
Download Spark 1.0.2 from:
http://www.apache.org/dyn/closer.cgi/spark/spark-1.0.2/spark-1.0.2-bin-hadoop2.tgz
Install and configure the Spark 1.0.2 cluster on master
Extract the downloaded "spark-1.0.2-bin-hadoop2.tgz" into the /usr/local/spark directory:
Create the directory /usr/local/spark
Copy the installation package to /usr/local/spark and extract it
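Roughly, assuming the archive was downloaded to the current directory and root permissions are needed for /usr/local:

    sudo mkdir -p /usr/local/spark                          # create the target directory
    sudo cp spark-1.0.2-bin-hadoop2.tgz /usr/local/spark    # copy the package
    cd /usr/local/spark
    sudo tar -zxvf spark-1.0.2-bin-hadoop2.tgz              # extract in place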
Configure "~/.BASHRC", set "Spark_home" and add SPARK's Bin directory to path (modify environment file), and use the source command to make the configuration work after configuration is complete.
Modifying the path in/etc/environment
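A sketch of the ~/.bashrc additions (the SPARK_HOME value assumes the extraction location used above):

    # Spark environment (path assumes the layout created above)
    export SPARK_HOME=/usr/local/spark/spark-1.0.2-bin-hadoop2
    export PATH=${SPARK_HOME}/bin:$PATH

    source ~/.bashrc    # make the new settings take effect in the current shell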
Enter Spark's conf directory:
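For example (the path assumes the extraction location used above):

    cd /usr/local/spark/spark-1.0.2-bin-hadoop2/conf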
The first step is to modify the slaves file. Open the file first:
Then change the contents of the slaves file to:
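Assuming the two worker hostnames used throughout this guide, the slaves file simply lists one worker per line:

    slave1
    slave2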
Step Two: Configure spark-env.sh
First copy spark-env.sh.template to spark-env.sh:
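For example, inside the conf directory:

    cp spark-env.sh.template spark-env.sh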
Open the "spark-env.sh" file
Add the following to the end of the file:
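The exact values depend on your machines; a typical spark-env.sh for this layout would look roughly like the following sketch (the JAVA_HOME, HADOOP_CONF_DIR, and worker-memory values are assumptions and must match your own installation):

    # paths and memory size below are assumptions -- adjust to your own setup
    export JAVA_HOME=/usr/lib/java/jdk1.7.0_60
    export SCALA_HOME=/usr/lib/scala/scala-2.10.4
    export SPARK_MASTER_IP=master
    export SPARK_WORKER_MEMORY=2g
    export HADOOP_CONF_DIR=/usr/local/hadoop/hadoop-2.2.0/etc/hadoop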
slave1 and slave2 use the same Spark installation and configuration as master.
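One way to replicate the configured installation to the workers (assuming password-less SSH between the nodes, which the cluster scripts require anyway, and suitable permissions on /usr/local/spark):

    scp -r /usr/local/spark/spark-1.0.2-bin-hadoop2 slave1:/usr/local/spark/
    scp -r /usr/local/spark/spark-1.0.2-bin-hadoop2 slave2:/usr/local/spark/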
Start the Spark distributed cluster and view its information.
First step: start the Hadoop cluster, then use the jps command on master, slave1, and slave2 to check the running processes.
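A sketch, assuming Hadoop's sbin directory is on the PATH (the exact start scripts depend on your Hadoop installation):

    start-dfs.sh     # start HDFS daemons
    start-yarn.sh    # start YARN daemons, if YARN is used
    jps              # run on master, slave1, and slave2 to list the Java processes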
Step two: start the Spark cluster
With the Hadoop cluster running successfully, start the Spark cluster by running "start-all.sh" from Spark's sbin directory:
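For example, on master (using the absolute path avoids clashing with Hadoop's own start-all.sh; the path assumes the extraction location used above):

    /usr/local/spark/spark-1.0.2-bin-hadoop2/sbin/start-all.sh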
Use jps to view the cluster processes
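After a successful start, jps on master should additionally list a Master process, and jps on slave1 and slave2 should each list a Worker process alongside the Hadoop daemons:

    jps    # run on each node; look for Master (on master) and Worker (on the slaves)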
Access the Spark cluster web UI at http://master:8080
The page shows the worker nodes and their information.
Next, go to Spark's bin directory and start the spark-shell console.
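A sketch of launching the shell against the standalone master (the master URL uses the hostname from this guide and Spark's default port 7077):

    cd /usr/local/spark/spark-1.0.2-bin-hadoop2/bin
    ./spark-shell --master spark://master:7077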
We are now in Spark's shell environment. According to the startup output, we can view the Spark UI at http://master:4040 in a browser, as shown below:
Of course, you can also look at other information, such as the Environment tab:
Likewise, we can look at the Executors tab:
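As a quick sanity check, a trivial job typed at the spark-shell prompt should complete and show up in the UI (this example is just an illustration, counting the elements of a small parallelized collection):

    scala> sc.parallelize(1 to 1000).count()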
At this point, our Spark cluster has been built successfully.
7. YARN-based Spark cluster setup