Install Spark
Spark must be installed on the master, slave1, and slave2 machines.
First, install Spark on the master. The specific steps are as follows:
Step 1: Decompress Spark on the master:
Decompress the package directly to the current directory:
In this case, create the Spark directory "/usr/local/spark":
Copy the extracted spark-1.0.0-bin-hadoop1 to /usr/local/spark:
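The exact commands depend on where the downloaded archive was placed; a minimal sketch, assuming the archive is named spark-1.0.0-bin-hadoop1.tgz and sits in the current directory:

    # extract the archive into the current directory (archive name assumed)
    tar -zxvf spark-1.0.0-bin-hadoop1.tgz
    # create the target directory and copy the extracted folder into it
    mkdir -p /usr/local/spark
    cp -r spark-1.0.0-bin-hadoop1 /usr/local/spark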
Step 2: Configure Environment Variables
Open the environment configuration file for editing:
Add SPARK_HOME to the configuration file and append the Spark bin directory to PATH:
Save and exit, then make the configuration take effect:
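The original does not name the configuration file; a minimal sketch, assuming the environment variables are kept in ~/.bashrc and that Spark was copied to /usr/local/spark/spark-1.0.0-bin-hadoop1 in the previous step:

    vim ~/.bashrc        # open the configuration file (assumed to be ~/.bashrc)

    # lines added to ~/.bashrc
    export SPARK_HOME=/usr/local/spark/spark-1.0.0-bin-hadoop1
    export PATH=$PATH:$SPARK_HOME/bin

    source ~/.bashrc     # make the configuration take effect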
Step 3: Configure Spark
Go to the Spark conf directory:
Add "spark_home" to the configuration file and add the spark bin directory to the path:
Copy spark-env.sh.template to spark-env.sh:
Open spark-env.sh with vim:
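Assuming the SPARK_HOME path used above, these three sub-steps might look like this:

    cd /usr/local/spark/spark-1.0.0-bin-hadoop1/conf
    cp spark-env.sh.template spark-env.sh
    vim spark-env.sh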
Add the following configuration information to the configuration file:
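A sketch of the lines to add; the Java, Scala, and Hadoop paths below are assumptions and must be adjusted to the actual installation directories on your machines, and "master" assumes the master's hostname resolves via /etc/hosts:

    # in spark-env.sh (paths are examples; adjust to your environment)
    export JAVA_HOME=/usr/lib/java/jdk1.7.0_60
    export SCALA_HOME=/usr/lib/scala/scala-2.10.4
    export SPARK_MASTER_IP=master
    export SPARK_WORKER_MEMORY=2g
    export HADOOP_CONF_DIR=/usr/local/hadoop/hadoop-1.2.1/conf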
Where:
JAVA_HOME: Specifies the Java installation directory;
SCALA_HOME: Specifies the Scala installation directory;
SPARK_MASTER_IP: Specifies the IP address of the master node of the Spark cluster;
SPARK_WORKER_MEMORY: Specifies the maximum amount of memory a worker node may allocate to executors. Because each of the three servers has 2 GB of memory, this parameter is set to 2g to make full use of the available memory;
HADOOP_CONF_DIR: Specifies the configuration directory of the existing Hadoop cluster.
Save and exit.
Next, configure the slaves file under the Spark conf directory and add all worker nodes to it:
Open the file and modify its content to list the worker nodes:
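Assuming the three machines are reachable by the hostnames used throughout this series, the modified slaves file simply lists them, one per line:

    master
    slave1
    slave2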
All three machines are set as worker nodes; that is, the master machine serves as both the master node and a worker node.
Save and exit.
This completes the Spark installation on the master.
Step 4: Install and configure Spark on slave1 and slave2 in the same way as on the master.
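One way to do this, assuming passwordless SSH between the nodes is already set up for the Hadoop cluster and that the same account (root here, only as an example) exists on all machines, is to copy the Spark directory and the environment file with scp and then reload the configuration on each slave:

    scp -r /usr/local/spark root@slave1:/usr/local/
    scp -r /usr/local/spark root@slave2:/usr/local/
    scp ~/.bashrc root@slave1:~/
    scp ~/.bashrc root@slave2:~/
    # then run "source ~/.bashrc" on slave1 and slave2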