Step 1: Software required by the Spark cluster
We will build a Spark cluster on top of the Hadoop cluster built from scratch in Articles 1 and 2, using Spark 1.0.0, released on May 30, 2014 and the latest version of Spark at the time of writing. The required software is as follows:
1. Spark 1.0.0. I use spark-1.0.0-bin-hadoop1.tgz here; the download URL is http://d3kbcqa49mib13.cloudfront.net/spark-1.0.0-bin-hadoop1.tgz
The author saves the downloaded package on the master node.
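For reference, the package can be fetched directly on the master node with wget (assuming the node has Internet access):

    wget http://d3kbcqa49mib13.cloudfront.net/spark-1.0.0-bin-hadoop1.tgz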
2. Scala, in the version corresponding to Spark 1.0.0. The official requirement is that Scala must be 2.10.x.
The author downloaded Scala 2.10.4 from the official download page, http://www.scala-lang.org/download/2.10.4.html, and saved it on the master node.
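A sketch of the download step, assuming the direct tarball link behind that page (the exact URL is an assumption here; verify it on the download page itself):

    wget http://www.scala-lang.org/files/archive/scala-2.10.4.tgz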
Step 2: Install the software
Install Scala
1. Open a terminal and create a new directory "/usr/lib/scala":
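A minimal sketch of this step (assuming a user with sudo privileges):

    sudo mkdir -p /usr/lib/scala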
2. Decompress the Scala package, then move the extracted directory into the newly created "/usr/lib/scala":
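For reference, the two operations might look like this (assuming the package was saved in the current directory under its default name):

    tar -zxvf scala-2.10.4.tgz
    sudo mv scala-2.10.4 /usr/lib/scala/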
3. Modify environment variables:
Open the configuration file, "~/.bashrc", in an editor.
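For example, with vim (the editor is an assumption, implied by the insert-mode keystrokes below; any editor works):

    vim ~/.bashrc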
Press "I" to enter the insert mode and add the scala environment compiling information, as shown in:
From the configuration file, we can see that we have set "SCALA_HOME" and added Scala's bin directory to "PATH".
Press the "ESC" key to return to normal mode, save and exit the configuration file:
Run the following command to make the modified configuration take effect:
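Assuming the configuration file edited above is "~/.bashrc":

    source ~/.bashrc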
4. Display the installed Scala version in the terminal:
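For reference, the check and the output Scala 2.10.4 typically prints:

    scala -version
    Scala code runner version 2.10.4 -- Copyright 2002-2013, LAMP/EPFL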
We found that the version is "2.10.4", which is what we expect.
When we enter the "scala" command, we drop directly into Scala's interactive command-line interface (the REPL).
Here, enter the expression "9*9":
Scala correctly calculates the result for us.
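For reference, the interaction looks like this ("res0" is the name the REPL automatically assigns to the result):

    scala> 9*9
    res0: Int = 81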
Now we have installed Scala on the master node.
Because Spark runs on the master, slave1, and slave2 machines, we need to install the same Scala on slave1 and slave2. Use the scp command to copy the Scala directory and "~/.bashrc" to the same locations on slave1 and slave2, as sketched below; of course, you can also repeat the manual installation on slave1 and slave2.
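A sketch of the copy, assuming root is used on all three machines and passwordless ssh is already set up as in the earlier Hadoop articles (replace root with your own user as appropriate):

    scp -r /usr/lib/scala root@slave1:/usr/lib/
    scp -r /usr/lib/scala root@slave2:/usr/lib/
    scp ~/.bashrc root@slave1:~/
    scp ~/.bashrc root@slave2:~/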
After Scala is installed on slave1 and slave2, run the same version test on each of them.
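A quick way to run that check from the master (sourcing "~/.bashrc" explicitly, since non-interactive ssh shells may not load it):

    ssh slave1 'source ~/.bashrc; scala -version'
    ssh slave2 'source ~/.bashrc; scala -version'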
So far, Scala has been successfully deployed on the master, slave1, and slave2 machines.