Building a Hadoop Cluster: Complete Process Notes
Part One: Virtual machines and operating system
Environment: Ubuntu 14 + Hadoop 2.6 + JDK 1.8
Virtual machine software: VMware 12
Part Two: Installation steps
First configure the JDK and Hadoop on a single machine:
1. Create a new Hadoop user
Use the command (as root): adduser hadoop
2. Give the hadoop user sudo privileges:
As the root user, open the sudoers file and add an entry for the hadoop user (the original screenshots of opening the file and of the added line are not reproduced here).
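A minimal sketch of the line that is typically added below the root entry in /etc/sudoers (edit the file with sudo visudo to avoid syntax mistakes):
hadoop  ALL=(ALL:ALL) ALL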
3. Configure the JDK. I put the JDK tarball in the hadoop user's home directory and extracted it in the current directory.
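A sketch of the extraction, using a hypothetical tarball name (substitute the file you actually downloaded):
$ cd /home/hadoop
$ tar -zxvf jdk-8u101-linux-x64.tar.gz    # hypothetical file name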
Modify the shell configuration file (configure the environment variables): add the JDK entries shown in the original screenshot (not reproduced here), adjusting the underlined path to your own JDK installation path.
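A sketch of the entries, assuming the configuration file is ~/.bashrc and the hypothetical JDK path from the previous step:
export JAVA_HOME=/home/hadoop/jdk1.8.0_101    # adjust to your own JDK directory
export PATH=$JAVA_HOME/bin:$PATH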
After modifying the configuration file, reload it so the changes take effect, for example with: source ~/.bashrc
Then enter the command java -version; if the JDK version information appears, the installation is successful.
At this point, the JDK is configured successfully; next, configure Hadoop *********************
4. Likewise, place the Hadoop tarball in the hadoop user's home directory (/home/hadoop) and extract it in the current directory:
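A sketch of the extraction, assuming the tarball is named hadoop-2.6.0.tar.gz (the extracted hadoop-2.6.0 folder is referenced again later when distributing to the cluster):
$ cd /home/hadoop
$ tar -zxvf hadoop-2.6.0.tar.gz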
5. Modify the shell configuration file again (configure the Hadoop environment variables), adding entries below the JDK environment variables you just configured:
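A sketch of the entries, assuming Hadoop was extracted to /home/hadoop/hadoop-2.6.0:
export HADOOP_HOME=/home/hadoop/hadoop-2.6.0
export PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH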
Reload the configuration file again after modifying it (source ~/.bashrc).
Then go into the bin directory of the Hadoop installation directory and enter the following command to view the Hadoop version; if the version information appears, the configuration was successful:
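A sketch of the check, assuming the installation directory from above:
$ cd /home/hadoop/hadoop-2.6.0/bin
$ ./hadoop version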
This completes the standalone Hadoop environment ***********************************
Next, clone the configured machine twice. In VMware: Virtual Machine > Manage > Clone. (It is suggested to name the two newly cloned machines slave1 and slave2 respectively.)
Click Next through the wizard to complete the clone; for the clone type, choose "Create a full clone".
1. Modify the hostname of each virtual machine to master, slave1, and slave2 respectively.
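For reference, on Ubuntu the hostname is stored in /etc/hostname; a sketch for the master node (repeat on the clones with slave1 and slave2):
$ sudo gedit /etc/hostname    # replace the contents with a single line: master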
2. Modify the hosts file on all three virtual machines so that you do not have to remember the IP addresses and can use hostnames instead.
(The IP addresses are those of the three machines; they can be checked with the ifconfig command on each machine.)
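A sketch of the /etc/hosts entries; the master address matches the static IP configured below, while the two slave addresses are placeholders to replace with your actual IPs:
192.168.140.128  master
192.168.140.129  slave1    # placeholder, use the real slave1 IP
192.168.140.130  slave2    # placeholder, use the real slave2 IP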
Once this is done, it is best to reboot the system for the changes to take effect. You can then try ping master (or slave1, slave2); if everything is normal, the ping should succeed.
NOTE: do not use hostnames like "xxx.01" or "xxx.02", or any hostname ending in ".<number>"; otherwise the NameNode service will fail to start later.
3. Set the static IP
Set a static IP on the master host; configure the slaves the same way, changing only the specific IP address.
Execute the command
sudo gedit /etc/network/interfaces
Open the file and change its content to:
auto lo
iface lo inet loopback
auto eth0
iface eth0 inet static
address 192.168.140.128      # this machine's IP
netmask 255.255.255.0        # no modification needed
network 192.168.140.0        # network segment, derived from the IP
broadcast 192.168.140.255    # derived from the IP
gateway 192.168.140.2        # gateway: change the last octet of the IP to 2
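To apply the change, reboot or restart networking; on Ubuntu 14 this is typically:
$ sudo /etc/init.d/networking restart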
4. Configure SSH passwordless login
Install SSH online on Ubuntu.
Execute the command
sudo apt-get install ssh
**********************************************
The idea behind the SSH configuration:
Use ssh-keygen to generate a public/private key pair on each machine.
Copy all of the public keys to one machine, such as master.
On master, generate an authorization file, authorized_keys.
Finally, copy authorized_keys to every machine in the cluster; this guarantees passwordless login.
***************************************************
Implementation steps:
1. First, on master, generate a public/private key pair in the current user's home directory.
Execute the commands
$ cd /home/hadoop
$ ssh-keygen -t rsa -P ""
That is: generate a public/private key pair with the RSA algorithm; -P "" indicates an empty passphrase.
After the command runs, a .ssh directory is created in the home directory containing two files: id_rsa (the private key) and id_rsa.pub (the public key).
2. Import the public key
Execute the command
cat .ssh/id_rsa.pub >> .ssh/authorized_keys
After that, test an SSH connection from this machine to itself.
Execute the command
$ ssh master
If you are still prompted for a password, it has not taken effect yet; there is one key step left:
check the permissions of the file, and if they are wrong (for example, owned by or writable by another user), fix them.
Execute the command
chmod 644 .ssh/authorized_keys
Then test again with ssh master; if no password is required, the connection succeeds and this machine is done.
If there is a problem, try the following:
Check whether the SSH service is running; if not, start it.
If there is no .ssh directory, create one:
Execute the commands
$ cd /home/hadoop
$ mkdir .ssh
If you do not have permission, change the owner of the folder you want to operate on to the current user:
Execute the command
sudo chown -R hadoop /home/hadoop
3. Generate a public/private key pair on the other machines, and copy the public key files to master.
Log in to the other two machines, slave1 and slave2, as the hadoop user and execute ssh-keygen -t rsa -P "" to generate the key pairs.
Then use the scp command to send the public key files to master (the machine that has just been set up).
Execute the commands
On slave1:
scp .ssh/id_rsa.pub hadoop@master:/home/hadoop/id_rsa_1.pub
On slave2:
scp .ssh/id_rsa.pub hadoop@master:/home/hadoop/id_rsa_2.pub
After those two lines have been executed, go back to master and look in the /home/hadoop directory; there should be two new files, id_rsa_1.pub and id_rsa_2.pub.
Then, on master, import the two public keys.
Execute the commands
$ cat id_rsa_1.pub >> .ssh/authorized_keys
$ cat id_rsa_2.pub >> .ssh/authorized_keys
At this point, master holds the public keys of all 3 machines on this one machine.
4. Copy the "most complete" authorized_keys file from master to the other machines.
Still on master,
Execute the commands
$ scp .ssh/authorized_keys hadoop@slave1:/home/hadoop/.ssh/authorized_keys
$ scp .ssh/authorized_keys hadoop@slave2:/home/hadoop/.ssh/authorized_keys
Then modify the permissions of the authorized_keys file on the other machines.
Execute the command on both slave1 and slave2:
chmod 644 .ssh/authorized_keys
5. Verification
On each virtual machine, use ssh <other machine's hostname> to verify; if the connection succeeds without a password, everything is OK.
For example, on slave1,
Execute the commands
ssh slave1
ssh master
ssh slave2
Run each of the above commands and make sure every one logs in successfully without a password.
5. Modify the Hadoop configuration files
Configure HDFS first, so modify these 4 configuration files first: core-site.xml, hdfs-site.xml, hadoop-env.sh, slaves.
They are located in the configuration directory under the Hadoop installation (here /home/hadoop/hadoop-2.6.0/etc/hadoop):
1. Modify core-site.xml
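The original screenshot is not reproduced here; a minimal sketch of a typical core-site.xml for this setup, assuming the NameNode runs on master on the commonly used port 9000 and /home/hadoop/tmp as the temporary directory:
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://master:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/hadoop/tmp</value>
  </property>
</configuration>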
The path configured above, /home/hadoop/tmp: if the tmp folder does not exist, you need to create it yourself.
2. Modify hdfs-site.xml
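The original screenshot is not reproduced here; a minimal sketch, assuming a replication factor of 2 for the two DataNodes:
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
</configuration>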
3. Modify hadoop-env.sh. (Some tutorials also configure the HADOOP_HOME environment variable here; I did not, and it caused no problem, because it was already configured earlier.)
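A sketch of the usual change in hadoop-env.sh, using the hypothetical JDK path assumed earlier:
export JAVA_HOME=/home/hadoop/jdk1.8.0_101    # adjust to your own JDK directory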
4. Modify slaves: delete the original content and add the hostnames of the other two nodes.
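With the hostnames used in this setup, the slaves file contains just:
slave1
slave2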
5. Distribute to the other machines in the cluster
Copy the hadoop-2.6.0 folder, along with the modified configuration files, to the other 2 machines via scp.
Execute the commands
$ scp -r hadoop-2.6.0/ hadoop@slave1:hadoop-2.6.0
$ scp -r hadoop-2.6.0/ hadoop@slave2:hadoop-2.6.0
After modifying these four files, HDFS is configured. Start the HDFS service by running start-dfs.sh to check whether the configuration was successful.
After startup completes, enter jps; if the NameNode and Jps processes are displayed, the configuration is successful.
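For reference, a sketch of the startup check, assuming the environment variables set earlier; on the very first start, the NameNode typically needs to be formatted first:
$ hdfs namenode -format    # first start only; this erases any existing HDFS metadata
$ start-dfs.sh
$ jps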
6. Next, configure MapReduce by modifying the yarn-site.xml and mapred-site.xml files.
Modify the yarn-site.xml file.
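The original screenshot is not reproduced here; a minimal sketch of yarn-site.xml, assuming the ResourceManager runs on master:
<configuration>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>master</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>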
7. Modify mapred-site.xml. (In Hadoop 2.6 the distribution ships mapred-site.xml.template; copy it to mapred-site.xml if the file does not exist yet.)
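A minimal sketch of mapred-site.xml, telling MapReduce to run on YARN:
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>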
8. Distribute to the other machines in the cluster again
Copy the hadoop-2.6.0 folder, along with the newly modified configuration files, to the other 2 machines via scp.
Execute the commands
$ scp -r hadoop-2.6.0/ hadoop@slave1:hadoop-2.6.0
$ scp -r hadoop-2.6.0/ hadoop@slave2:hadoop-2.6.0
Run the start-yarn.sh script to start the MapReduce service. If the expected processes appear in the jps output (the original screenshot highlighted three of them), the configuration is successful.