Installation environment:
Virtual machine: VMware Workstation 8.0.1 (bridged networking)
OS: CentOS 7
JDK: jdk-7u79-linux-x64.tar
Scala: scala-2.11.7
Spark: spark-1.4.0-bin-hadoop2.4
User: hadoop, created when installing CentOS, member of the administrators group
First step: Configure SSH
Log in as the hadoop user and run in a terminal:
yum install openssh-server
If you are prompted that yum is locked, it is because the yum service is occupied by another process and the lock must be removed by force:
rm -rf /var/run/yum.pid
The terminal downloads the installation package over the network and installs it on its own. After the installation completes, enter the following command to verify that port 22 is open:
netstat -nat
Make sure port 22 is listening, then check that SSH was installed correctly:
ssh localhost
Enter the current user's password and press Enter. If the login succeeds, the installation is fine; note that at this point an SSH login still requires a password.
One point deserves a little emphasis:
What we configure next is password-free access: identity credentials (key pairs) replace password authentication, so a login only has to present a credential and no password is typed. Each user holds a unique credential; to grant someone access to an account, you hand your public key over (that is, copy it into the target account's directory).
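As a minimal local sketch of that credential model (the /tmp paths are hypothetical and used only for illustration; the real files live in ~/.ssh):

```shell
# Illustration only: generate a key pair in a scratch directory and
# "hand out" the public key by appending it to an authorized_keys file.
demo=/tmp/ssh-key-demo
mkdir -p "$demo"
ssh-keygen -t rsa -N "" -q -f "$demo/id_rsa"       # creates id_rsa (private) and id_rsa.pub (public)
cat "$demo/id_rsa.pub" >> "$demo/authorized_keys"  # grants access to whoever holds id_rsa
chmod 600 "$demo/authorized_keys"                  # sshd refuses loosely-permissioned key files
ls "$demo"
```

Anyone holding the private key id_rsa can now log in to any account whose authorized_keys lists the matching public key, with no password typed.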
Next, enter the command in the terminal to go to the hadoop account's home directory:
cd /home/hadoop
Then run:
ssh-keygen -t rsa
and press Enter at each prompt to accept the defaults.
Then enter the .ssh folder and append id_rsa.pub to the authorized_keys file with the following commands:
cd .ssh
While you are here, look at the files in this directory: id_rsa is the account's private key, and id_rsa.pub is its public key, the part that gets handed out.
One more thing needs saying here: what if several accounts need password-free access to one master server?
The master server keeps a file named authorized_keys; every account that needs password-free access appends its public key to that file.
cp id_rsa.pub authorized_keys
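Note that cp is fine for the very first key but would overwrite any keys already granted; when several accounts need access, append with >> instead. A small sketch with placeholder keys (not real keys, file kept under /tmp):

```shell
# Each line of authorized_keys is one public key; appending adds a grantee
# without revoking the ones already listed.
f=/tmp/authorized_keys.demo
echo "ssh-rsa AAAAB3...key1 hadoop@master" >  "$f"   # first key creates the file
echo "ssh-rsa AAAAB3...key2 alice@node1"  >> "$f"    # later keys are appended
wc -l < "$f"
```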
Test the password-free login again:
ssh localhost
It is best to open a few more terminals and test SSH logins, or restart the service and test again:
service sshd restart (restart the service)
service sshd start (start the service)
service sshd stop (stop the service)
netstat -antp | grep sshd (check whether port 22 is listening)
Any time a command is refused with a permission error, prefix it with sudo.
chkconfig sshd on (start sshd at boot)
chkconfig sshd off (do not start sshd at boot)
On CentOS 7, service and chkconfig are compatibility wrappers around systemd; the native equivalents are systemctl restart sshd and systemctl enable sshd.
Step Two: Configure Java, Scala, and Spark
We copy all the required software into the /home/data directory in one go. Log out and log back in as root, since other users lack sufficient permissions.
The root user can work directly in the UI and unpack everything there.
Then switch back to the hadoop user (important!).
To configure the Java path:
sudo gedit /etc/profile
Add at the end of the file:
#JAVA VARIABLES START
export JAVA_HOME=/home/data/jdk1.7.0_79
export PATH=$PATH:$JAVA_HOME/bin
#JAVA VARIABLES END
Then reload the configuration and check the Java installation:
source /etc/profile
java -version
If the version information is printed, the Java environment was installed successfully.
Next, configure the Scala environment:
sudo gedit /etc/profile
In the last line add:
#SCALA VARIABLES START
export SCALA_HOME=/home/data/scala-2.11.7
export PATH=$PATH:$SCALA_HOME/bin
#SCALA VARIABLES END
Then reload the configuration and check the Scala installation:
source /etc/profile
scala -version
If the version is printed, Scala is working.
Next, configure the spark environment:
sudo gedit /etc/profile
In the last line add:
#SPARK VARIABLES START
export SPARK_HOME=/home/data/spark-1.4.0-bin-hadoop2.4
export PATH=$PATH:$SPARK_HOME/bin
#SPARK VARIABLES END
The completed profile additions should look as follows:
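For reference, with all three sections added, the tail of /etc/profile should read like this (paths follow the /home/data layout used above):

```shell
#JAVA VARIABLES START
export JAVA_HOME=/home/data/jdk1.7.0_79
export PATH=$PATH:$JAVA_HOME/bin
#JAVA VARIABLES END

#SCALA VARIABLES START
export SCALA_HOME=/home/data/scala-2.11.7
export PATH=$PATH:$SCALA_HOME/bin
#SCALA VARIABLES END

#SPARK VARIABLES START
export SPARK_HOME=/home/data/spark-1.4.0-bin-hadoop2.4
export PATH=$PATH:$SPARK_HOME/bin
#SPARK VARIABLES END
```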
Then reload the system configuration:
source /etc/profile
Enter Spark's conf directory:
cd /home/data/spark-1.4.0-bin-hadoop2.4/conf
Create the real config file from the template:
sudo mv spark-env.sh.template spark-env.sh
Then edit the newly created file:
sudo gedit spark-env.sh
At the bottom of the file, add:
export SCALA_HOME=/home/data/scala-2.11.7
export JAVA_HOME=/home/data/jdk1.7.0_79
export SPARK_MASTER_IP=localhost
export SPARK_WORKER_MEMORY=1024m
export MASTER=spark://localhost:7077
Finally, you would also edit the machine names in the conf/slaves file; since my machine name is localhost, I leave it unchanged.
Step Three: Run Spark
Start the Spark cluster. Enter the sbin directory and run the startup script:
cd /home/data/spark-1.4.0-bin-hadoop2.4/sbin
./start-all.sh
Then visit localhost:8080 in a browser; the page shows information for one worker node.
Next we enter Spark's bin directory and launch the "spark-shell" console:
./spark-shell
If no errors occur, the spark-shell prompt appears.
Now for a test. While spark-shell is running, its web console is available at http://localhost:4040.
Prepare a test file /home/file/test1.txt with the following content:
Hello World
Hello Hadoop
Pls say hello
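Those three lines can be written from a terminal like this (assuming /home/file is writable by the current user; keep the path in sync with the textFile() call below):

```shell
# Write the sample input that spark-shell will read back.
mkdir -p /home/file
cat > /home/file/test1.txt <<'EOF'
Hello World
Hello Hadoop
Pls say hello
EOF
wc -l < /home/file/test1.txt
```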
Then enter the read program at the Scala command line:
val readFile = sc.textFile("file:///home/file/test1.txt")
Then execute:
readFile.collect
Then check the spark-shell web console again.