Building a Hadoop Cluster: Complete Process Notes
Part One: Virtual machines and operating system
Environment: Ubuntu 14 + Hadoop 2.6 + JDK 1.8
Virtual machine software: VMware 12
Part Two: Installation steps
First configure the JDK and Hadoop on a single machine:
1. Create a new Hadoop user
Use the command (as root): adduser hadoop
2. Give the hadoop user sudo privileges:
As the root user, open the sudoers file and add an entry for the hadoop user (the original screenshots of opening the file and of the added line are not reproduced here).
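A minimal sketch of the line that is typically added below the root entry in /etc/sudoers (edit the file with sudo visudo to avoid syntax mistakes):
hadoop  ALL=(ALL:ALL) ALL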
3. Configure the JDK. I put the JDK tarball in the hadoop user's home directory and extracted it in the current directory.
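A sketch of the extraction, using a hypothetical tarball name (substitute the file you actually downloaded):
$ cd /home/hadoop
$ tar -zxvf jdk-8u101-linux-x64.tar.gz    # hypothetical file name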
Modify the shell configuration file (configure the environment variables): add the JDK entries shown in the original screenshot (not reproduced here), adjusting the underlined path to your own JDK installation path.
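A sketch of the entries, assuming the configuration file is ~/.bashrc and the hypothetical JDK path from the previous step:
export JAVA_HOME=/home/hadoop/jdk1.8.0_101    # adjust to your own JDK directory
export PATH=$JAVA_HOME/bin:$PATH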
After modifying the configuration file, reload it so the changes take effect, for example with: source ~/.bashrc
Then enter the command java -version; if the JDK version information appears, the installation is successful.
At this point, the JDK is configured successfully; next, configure Hadoop *********************
4. Likewise, place the Hadoop tarball in the hadoop user's home directory (/home/hadoop) and extract it in the current directory:
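A sketch of the extraction, assuming the tarball is named hadoop-2.6.0.tar.gz (the extracted hadoop-2.6.0 folder is referenced again later when distributing to the cluster):
$ cd /home/hadoop
$ tar -zxvf hadoop-2.6.0.tar.gz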
5. Modify the shell configuration file again (configure the Hadoop environment variables), adding entries below the JDK environment variables you just configured:
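A sketch of the entries, assuming Hadoop was extracted to /home/hadoop/hadoop-2.6.0:
export HADOOP_HOME=/home/hadoop/hadoop-2.6.0
export PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH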
Reload the configuration file again after modifying it (source ~/.bashrc).
Then go into the bin directory of the Hadoop installation directory and enter the following command to view the Hadoop version; if the version information appears, the configuration was successful:
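A sketch of the check, assuming the installation directory from above:
$ cd /home/hadoop/hadoop-2.6.0/bin
$ ./hadoop version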
This completes the standalone Hadoop environment ***********************************
Next, clone the configured machine twice. In VMware: Virtual Machine > Manage > Clone. (It is suggested to name the two newly cloned machines slave1 and slave2 respectively.)
Click Next through the wizard to complete the clone; for the clone type, choose "Create a full clone".
1. Modify the hostname of each virtual machine to master, slave1, and slave2 respectively.
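For reference, on Ubuntu the hostname is stored in /etc/hostname; a sketch for the master node (repeat on the clones with slave1 and slave2):
$ sudo gedit /etc/hostname    # replace the contents with a single line: master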
2. Modify the hosts file on all three virtual machines so that you do not have to remember the IP addresses and can use hostnames instead.
(The IP addresses are those of the three machines; they can be checked with the ifconfig command on each machine.)
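A sketch of the /etc/hosts entries; the master address matches the static IP configured below, while the two slave addresses are placeholders to replace with your actual IPs:
192.168.140.128  master
192.168.140.129  slave1    # placeholder, use the real slave1 IP
192.168.140.130  slave2    # placeholder, use the real slave2 IP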
Once this is done, it is best to reboot the system for the changes to take effect. You can then try ping master (or slave1, slave2); if everything is normal, the ping should succeed.
NOTE: do not use hostnames like "xxx.01" or "xxx.02", or any hostname ending in ".<number>"; otherwise the NameNode service will fail to start later.
3. Set the static IP
Set a static IP on the master host; configure the slaves the same way, changing only the specific IP address.
Execute the command
sudo gedit /etc/network/interfaces
Open the file and change its content to:
auto lo
iface lo inet loopback
auto eth0
iface eth0 inet static
address 192.168.140.128      # this machine's IP
netmask 255.255.255.0        # no modification needed
network 192.168.140.0        # network segment, derived from the IP
broadcast 192.168.140.255    # derived from the IP
gateway 192.168.140.2        # gateway: change the last octet of the IP to 2
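To apply the change, reboot or restart networking; on Ubuntu 14 this is typically:
$ sudo /etc/init.d/networking restart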
4. Configure SSH passwordless login
Install SSH online on Ubuntu.
Execute the command
sudo apt-get install ssh
**********************************************
The idea behind the SSH configuration:
Use ssh-keygen to generate a public/private key pair on each machine.
Copy all of the public keys to one machine, such as master.
On master, generate an authorization file, authorized_keys.
Finally, copy authorized_keys to every machine in the cluster; this guarantees passwordless login.
***************************************************
Implementation steps:
1. First, on master, generate a public/private key pair in the current user's home directory.
Execute the commands
$ cd /home/hadoop
$ ssh-keygen -t rsa -P ""
That is: generate a public/private key pair with the RSA algorithm; -P "" indicates an empty passphrase.
After the command runs, a .ssh directory is created in the home directory containing two files: id_rsa (the private key) and id_rsa.pub (the public key).
2. Import the public key
Execute the command
cat .ssh/id_rsa.pub >> .ssh/authorized_keys
After that, test an SSH connection from this machine to itself.
Execute the command
$ ssh master
If you are still prompted for a password, it has not taken effect yet; there is one key step left:
check the permissions of the file, and if they are wrong (for example, owned by or writable by another user), fix them.
Execute the command
chmod 644 .ssh/authorized_keys
Then test again with ssh master; if no password is required, the connection succeeds and this machine is done.
If there is a problem, try the following:
Check whether the SSH service is running; if not, start it.
If there is no .ssh directory, create one:
Execute the commands
$ cd /home/hadoop
$ mkdir .ssh
If you do not have permission, change the owner of the folder you want to operate on to the current user:
Execute the command
sudo chown -R hadoop /home/hadoop
3. Generate a public/private key pair on the other machines, and copy the public key files to master.
Log in to the other two machines, slave1 and slave2, as the hadoop user and execute ssh-keygen -t rsa -P "" to generate the key pairs.
Then use the scp command to send the public key files to master (the machine that has just been set up).
Execute the commands
On slave1:
scp .ssh/id_rsa.pub hadoop@master:/home/hadoop/id_rsa_1.pub
On slave2:
scp .ssh/id_rsa.pub hadoop@master:/home/hadoop/id_rsa_2.pub
After those two lines have been executed, go back to master and look in the /home/hadoop directory; there should be two new files, id_rsa_1.pub and id_rsa_2.pub.
Then, on master, import the two public keys.
Execute the commands
$ cat id_rsa_1.pub >> .ssh/authorized_keys
$ cat id_rsa_2.pub >> .ssh/authorized_keys
At this point, master holds the public keys of all 3 machines on this one machine.
4. Copy the "most complete" authorized_keys file from master to the other machines.
Still on master,
Execute the commands
$ scp .ssh/authorized_keys hadoop@slave1:/home/hadoop/.ssh/authorized_keys
$ scp .ssh/authorized_keys hadoop@slave2:/home/hadoop/.ssh/authorized_keys
Then modify the permissions of the authorized_keys file on the other machines.
Execute the command on both slave1 and slave2:
chmod 644 .ssh/authorized_keys
5. Verification
On each virtual machine, use ssh <other machine's hostname> to verify; if the connection succeeds without a password, everything is OK.
For example, on slave1,
Execute the commands
ssh slave1
ssh master
ssh slave2
Run each of the above commands and make sure every one logs in successfully without a password.
5. Modify the Hadoop configuration files
Configure HDFS first, so modify these 4 configuration files first: core-site.xml, hdfs-site.xml, hadoop-env.sh, slaves.
They are located in the configuration directory under the Hadoop installation (here /home/hadoop/hadoop-2.6.0/etc/hadoop):
1. Modify core-site.xml
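The original screenshot is not reproduced here; a minimal sketch of a typical core-site.xml for this setup, assuming the NameNode runs on master on the commonly used port 9000 and /home/hadoop/tmp as the temporary directory:
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://master:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/hadoop/tmp</value>
  </property>
</configuration>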
The path configured above, /home/hadoop/tmp: if the tmp folder does not exist, you need to create it yourself.
2. Modify hdfs-site.xml
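The original screenshot is not reproduced here; a minimal sketch, assuming a replication factor of 2 for the two DataNodes:
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
</configuration>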
3. Modify hadoop-env.sh. (Some tutorials also configure the HADOOP_HOME environment variable here; I did not, and it caused no problem, because it was already configured earlier.)
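A sketch of the usual change in hadoop-env.sh, using the hypothetical JDK path assumed earlier:
export JAVA_HOME=/home/hadoop/jdk1.8.0_101    # adjust to your own JDK directory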
4. Modify slaves: delete the original content and add the hostnames of the other two nodes.
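With the hostnames used in this setup, the slaves file contains just:
slave1
slave2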
5. Distribute to the other machines in the cluster
Copy the hadoop-2.6.0 folder, along with the modified configuration files, to the other 2 machines via scp.
Execute the commands
$ scp -r hadoop-2.6.0/ hadoop@slave1:hadoop-2.6.0
$ scp -r hadoop-2.6.0/ hadoop@slave2:hadoop-2.6.0
After modifying these four files, HDFS is configured. Start the HDFS service by running start-dfs.sh to check whether the configuration was successful.
After startup completes, enter jps; if the NameNode and Jps processes are displayed, the configuration is successful.
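For reference, a sketch of the startup check, assuming the environment variables set earlier; on the very first start, the NameNode typically needs to be formatted first:
$ hdfs namenode -format    # first start only; this erases any existing HDFS metadata
$ start-dfs.sh
$ jps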
6. Next, configure MapReduce by modifying the yarn-site.xml and mapred-site.xml files.
Modify the yarn-site.xml file.
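The original screenshot is not reproduced here; a minimal sketch of yarn-site.xml, assuming the ResourceManager runs on master:
<configuration>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>master</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>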
7. Modify mapred-site.xml. (In Hadoop 2.6 the distribution ships mapred-site.xml.template; copy it to mapred-site.xml if the file does not exist yet.)
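A minimal sketch of mapred-site.xml, telling MapReduce to run on YARN:
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>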
8. Distribute to the other machines in the cluster again
Copy the hadoop-2.6.0 folder, along with the newly modified configuration files, to the other 2 machines via scp.
Execute the commands
$ scp -r hadoop-2.6.0/ hadoop@slave1:hadoop-2.6.0
$ scp -r hadoop-2.6.0/ hadoop@slave2:hadoop-2.6.0
Run the start-yarn.sh script to start the MapReduce service. If the expected processes appear in the jps output (the original screenshot highlighted three of them), the configuration is successful.