Install Scala and Spark in CentOS
I. Install Scala
Scala runs on the Java Virtual Machine (JVM), so Java must be installed on your Linux system before installing Scala. If you have not installed the JDK yet, you can refer to my article http://blog.csdn.net/xqclll/article/details/54256713 and then continue here.
Download the Scala release for your operating system from the official Scala website, decompress it to the installation path, and change the ownership of the scala directory so that the hadoop user has permission on it.
chown -R hadoop ./scala-2.11.8
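For reference, the download and extraction that precede the chown above might look like the following (this assumes the scala-2.11.8.tgz archive was saved to the home directory; adjust the paths as needed):
cd /home/hadoop/hadoop
tar -zxvf ~/scala-2.11.8.tgz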
Configure environment variables:
sudo gedit ~/.bashrc
export SCALA_HOME=/home/hadoop/hadoop/scala-2.11.8
export PATH=$PATH:$SCALA_HOME/bin
Make environment variables take effect:
source ~/.bashrc
To check whether the settings are correct, enter the scala command:
scala
Result:
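If the installation succeeded, the scala command drops you into the Scala REPL, where you can evaluate a simple expression to confirm it works (use :quit to leave):
scala> 1 + 1
res0: Int = 2
scala> :quit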
II. Install Spark
Hadoop must already be installed before installing Spark. Download Spark from the official Spark website and choose the build for Hadoop 2.6: spark-2.0.2-bin-hadoop2.6. Spark can be installed on a single machine or in distributed mode; since a Hadoop cluster has already been configured, Spark is configured in distributed mode here as well.
Step 1: Decompress Spark and set the permissions for hadoop users.
sudo chown -R hadoop:hadoop /home/hadoop/hadoop/spark-2.0.2-bin-hadoop2.6
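For reference, the decompression step that precedes this chown might look like the following (assuming the downloaded archive sits in the home directory):
tar -zxvf ~/spark-2.0.2-bin-hadoop2.6.tgz -C /home/hadoop/hadoop/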
2. Configure environment variables:
gedit ~/.bashrc
export SPARK_HOME=/home/hadoop/hadoop/spark-2.0.2-bin-hadoop2.6
export PATH=$PATH:$SPARK_HOME/bin
source ~/.bashrc
3. Modify the configuration files. In Spark's conf directory, copy spark-env.sh.template to spark-env.sh:
cp spark-env.sh.template spark-env.sh
Add the environment variables of java, Scala, hadoop, and spark to this file.
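A minimal sketch of what those additions to spark-env.sh could look like; the JAVA_HOME and HADOOP_HOME paths below are assumptions and must be replaced with the actual locations on your cluster:
export JAVA_HOME=/usr/lib/jvm/java-1.8.0                 # assumption: adjust to your JDK path
export SCALA_HOME=/home/hadoop/hadoop/scala-2.11.8
export HADOOP_HOME=/home/hadoop/hadoop/hadoop-2.6.0      # assumption: adjust to your Hadoop path
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export SPARK_MASTER_HOST=hadoop-master1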
Modify slaves
cp slaves.template slaves
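The conf/slaves file should then list one worker hostname per line; for the cluster used in this article that is:
hadoop-slave1
hadoop-slave2
hadoop-slave3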
Upload the configured scala directory to the other three slave hosts:
scp -r /home/hadoop/hadoop/scala-2.11.8 hadoop-slave1:/home/hadoop/hadoop/
scp -r /home/hadoop/hadoop/scala-2.11.8 hadoop-slave2:/home/hadoop/hadoop/
scp -r /home/hadoop/hadoop/scala-2.11.8 hadoop-slave3:/home/hadoop/hadoop/
Upload the configured spark directory to the other three slave hosts:
scp -r /home/hadoop/hadoop/spark-2.0.2-bin-hadoop2.6 hadoop-slave1:/home/hadoop/hadoop/
scp -r /home/hadoop/hadoop/spark-2.0.2-bin-hadoop2.6 hadoop-slave2:/home/hadoop/hadoop/
scp -r /home/hadoop/hadoop/spark-2.0.2-bin-hadoop2.6 hadoop-slave3:/home/hadoop/hadoop/
Then configure the scala and spark environment variables in ~/.bashrc on each of the slave hosts as well.
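One way to do this, assuming the same installation paths as on the master, is to append the exports to ~/.bashrc on each slave and re-source it:
echo 'export SCALA_HOME=/home/hadoop/hadoop/scala-2.11.8' >> ~/.bashrc
echo 'export PATH=$PATH:$SCALA_HOME/bin' >> ~/.bashrc
echo 'export SPARK_HOME=/home/hadoop/hadoop/spark-2.0.2-bin-hadoop2.6' >> ~/.bashrc
echo 'export PATH=$PATH:$SPARK_HOME/bin' >> ~/.bashrc
source ~/.bashrc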
To test Spark, start Hadoop first, then go to Spark's sbin directory and run the following command to start Spark:
./start-all.sh
If, besides the Hadoop processes, a Master process appears on hadoop-master1 and a Worker process appears on hadoop-slave1, hadoop-slave2, and hadoop-slave3, the Spark installation and configuration are successful.
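A quick way to verify this is to run jps on each host:
jps
# on hadoop-master1 you should see a Master process alongside the Hadoop daemons
# on each slave you should see a Worker process alongside the Hadoop daemons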
You can also check the Spark web UI in a browser at hadoop-master1:8080.
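As a final check, you can attach a Spark shell to the cluster and run a trivial job; 7077 is the standalone master's default port (an assumption if you changed it in spark-env.sh):
spark-shell --master spark://hadoop-master1:7077
# inside the shell:
# scala> sc.parallelize(1 to 100).sum()
# res0: Double = 5050.0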