The entire process of installing Hadoop with VMware

Source: Internet
Author: User
Tags: cloud computing platforms, free, ssh, scp command

Preface:

Although, looking back, building a hadoop learning platform since mid-July did not involve that many implementation problems, for someone who had never touched Linux, Java, or cloud computing platforms before, it took quite a while. The biggest lesson is that the versions of the various tools matter enormously: a bad choice of VMware, Ubuntu, JDK, hadoop, HBase, or ZooKeeper version can be fatal.

At the beginning I followed the experiments in the second edition of Liu Peng's cloud computing book. What looks like a single simple sentence in the book could take me a whole day to implement. For example:

1. Virtual machine usage problems. With the latest version of VMware, after installing vmware-tools the /mnt/hgfs/share directory did not exist and there was no shared folder, so I could not get the various software packages into the guest (I knew the virtual machine could access the Internet directly, but at first I thought the software could only be obtained through a folder shared with the host). I tried every method I could find and went through practically every page Baidu returned about installing vmware-tools. In the end I gave up on the latest release, reinstalled with vmware-workstation-full-7.1.1-282343.exe, and redid the tools installation.
After the change, when cd /mnt/hgfs finally worked at the command line, the excitement was hard to describe. But I did not yet realize that even more problems were waiting for me.

2. Installing the JDK and configuring the Java environment. It now seems very simple: just write the paths of the installed tools into the /etc/profile file. But at the beginning I was confused and kept getting a path wrong, so I had to go back and modify the file over and over again.

3. Linux file permissions. When I first worked as an ordinary user, I was constantly told that my permissions were insufficient. In the end I was too lazy to keep typing sudo and simply switched into the root user and stayed there to solve everything. This habit caused a huge problem later when HBase came into play.

Enough of the preamble; on to the experiment process.

Various versions used:

1) vmware-workstation-full-7.1.1-282343.exe (the Chinese-language edition is not recommended, and the latest version has problems)

2) ubuntu-10.04.1-desktop-i386.iso

3) hadoop-0.20.2.tar.gz

Experiment Process

1. Change the root user password:

sudo passwd root (changes the root password)

su root (switches to the root user)

2. Install the virtual machine tools (VMware Tools):

1. mount -o loop /dev/cdrom /mnt mounts the optical drive to the /mnt directory.

2. cd /mnt to enter the mounted directory.

3. tar zxvf VmwareTools-8.4.2-261024.tar.gz -C ~

This extracts VmwareTools-8.4.2-261024.tar.gz into the /root directory; note that the -C option is uppercase.

4. ./vmware-install.pl runs the installer in the /root/vmware-tools-distrib directory.

5. Press Enter or answer yes at each prompt.

6. Restart the system, detach the CD image in the VM settings, and set up the shared folder.

7. /usr/bin/vmware-config-tools.pl is the path of the configuration script. Run it as root to configure the shared folder, again answering the prompts with Enter, yes, or no.

8. If cd /mnt/hgfs/share succeeds, the shared folder exists and VMware Tools is installed successfully.

3. Install SSH

sudo apt-get install ssh

4. Install and configure Java

1. In the /usr directory, create a Java folder (requires the root user): mkdir Java

2. In the /usr/Java directory, run: /mnt/hgfs/share/jdk-6u26-linux-i586.bin

Then run java, javac, and java -version; if they print usage and version information, the JDK works.

3. Install vim for editing files later: apt-get install vim. (Strongly recommended, because the plain vi tool is inconvenient to use.)

4. Configure the Java environment:

1) vim /etc/profile to edit the profile file

2) Add the following information to the end of the file:

JAVA_HOME=/usr/Java/jdk1.6.0_26
JRE_HOME=/usr/Java/jdk1.6.0_26/jre
CLASSPATH=.:$JAVA_HOME/lib:$JRE_HOME/lib:$CLASSPATH
PATH=$JAVA_HOME/bin:$JRE_HOME/bin:$PATH
export JAVA_HOME
export JRE_HOME
export CLASSPATH
export PATH

3) When done editing, :wq saves the file and exits.

5. Installing hadoop

1. Copy the installation package hadoop-0.20.2.tar.gz to /usr:

cp /mnt/hgfs/share/hadoop-0.20.2.tar.gz /usr

2. Unpack the archive in the /usr directory: tar -zvxf hadoop-0.20.2.tar.gz

After decompression, the folder hadoop-0.20.2 appears.

6. Configure hadoop

1. Configure hadoop environment parameters:

vim /etc/profile

Add the following information to the end of the file, then :wq to save and exit (a sketch is given below).
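A minimal sketch of the hadoop entries, assuming hadoop is unpacked to /usr/hadoop-0.20.2 as in section 5:

HADOOP_HOME=/usr/hadoop-0.20.2
PATH=$HADOOP_HOME/bin:$PATH
export HADOOP_HOME
export PATH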

2. Reboot

Then run hadoop version; if version information is printed, the installation is in place.

3. Edit the /usr/hadoop-0.20.2/conf/hadoop-env.sh file

vim conf/hadoop-env.sh
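The edit typically needed here is to point JAVA_HOME at the JDK installed in section 4; a minimal sketch, assuming that path:

# conf/hadoop-env.sh
export JAVA_HOME=/usr/Java/jdk1.6.0_26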

 

7. Standalone mode

1. As the root user, in the /usr/hadoop-0.20.2 directory, run the bundled example job (see the sketch below).

2. View the result: cat output/*
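A minimal sketch of steps 1 and 2, assuming the examples jar bundled with hadoop-0.20.2; the choice of the conf XML files as sample input is only illustrative:

cd /usr/hadoop-0.20.2
mkdir input
cp conf/*.xml input                                          # some text files to count
bin/hadoop jar hadoop-0.20.2-examples.jar wordcount input output
cat output/*                                                 # print the word counts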

8. Pseudo-distributed mode

1. hadoop Configuration:

1) core-site.xml contents (the file is at /usr/hadoop-0.20.2/conf/core-site.xml, edited with vim; see the sketch after this list)

2) hdfs-site.xml contents

3) mapred-site.xml contents
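A minimal sketch of a typical pseudo-distributed configuration for hadoop-0.20.2, assuming HDFS at hdfs://localhost:9000 and the JobTracker at localhost:9001; the exact values in the original configuration may differ:

<!-- conf/core-site.xml -->
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>

<!-- conf/hdfs-site.xml -->
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>

<!-- conf/mapred-site.xml -->
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
  </property>
</configuration>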

2. Password-free SSH settings:

1) Generate a key pair: ssh-keygen -t rsa

Press Enter at each prompt to save the files in /root/.ssh.

2) Enter the .ssh directory and run the following commands:

cp id_rsa.pub authorized_keys

ssh localhost

3. Running hadoop

1) Format the distributed file system, in the /usr/hadoop-0.20.2 directory:

bin/hadoop namenode -format

2) Start the hadoop daemons; five processes should start:

bin/start-all.sh

3) Run the wordcount example:

Copy the local input directory to the root of HDFS, renaming it to in, and run the wordcount example that ships with hadoop. out is the output directory for the processed data; by default it is created under the hadoop root directory in HDFS. You must clear or delete the out directory before running, otherwise an error is returned. (The commands are sketched after step 4.)

4) after the task is executed, view the data processing result:

You can also copy the output file from the hadoop Distributed File System to the local file system for viewing.
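A minimal sketch of steps 3) and 4), assuming the bundled examples jar; the local destination path /tmp/wordcount-out is only illustrative:

bin/hadoop dfs -put input in                               # copy local input/ into HDFS as in
bin/hadoop jar hadoop-0.20.2-examples.jar wordcount in out
bin/hadoop dfs -cat out/*                                  # view the result in HDFS
bin/hadoop dfs -get out /tmp/wordcount-out                 # optional: copy the output to the local file system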

5) Stop the hadoop daemon.

bin/stop-all.sh

9. Fully distributed mode

1. Configure the IP address of each host:

Three hosts are set up here: unbuntunamenode, unbuntu1, and unbuntu2.

unbuntunamenode: 192.168.122.136

unbuntu1: 192.168.122.140

unbuntu2: 192.168.122.141

Their subnet mask is 255.255.255.0 and the gateway is 192.168.122.255. (Note: once the IP address of eth0 is modified this way, the machine can no longer reach the Internet.)

2. Configure the /etc/hosts files of the namenode and datanodes (preferably identical on all machines):

unbuntunamenode is used as the namenode; edit its /etc/hosts and add a line for each machine in the cluster mapping its IP address to its host name, as sketched below.
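A minimal sketch of the /etc/hosts entries, assuming the slave hostnames ub1-desktop and ub2-desktop used in the scp commands below; adjust to the actual host names:

192.168.122.136   unbuntunamenode
192.168.122.140   ub1-desktop
192.168.122.141   ub2-desktop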

Then use scp to copy /etc/hosts to the other nodes:

scp /etc/hosts ub1-desktop:/etc

scp /etc/hosts ub2-desktop:/etc

3. SSH configuration, so that commands can be executed between the machines without entering a password:

1) Create the .ssh directory on all machines: mkdir .ssh

2) Generate a key pair on unbuntunamenode: ssh-keygen -t rsa

Press Enter all the way through; with the default options the key pair is saved in .ssh/id_rsa.

3) Run the following commands on unbuntunamenode:

cd ~/.ssh

cp id_rsa.pub authorized_keys

scp authorized_keys ub1-desktop:/home/grid/.ssh

scp authorized_keys ub2-desktop:/home/grid/.ssh

4) On every machine, go into the .ssh directory and change the permissions of the authorized_keys file (sketched below).
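A minimal sketch; 644 is a commonly used permission for authorized_keys:

cd ~/.ssh
chmod 644 authorized_keys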

4. Configure hadoop:

On the namenode host, make sure that hadoop has been installed.

1) Edit core-site.xml, hdfs-site.xml, and mapred-site.xml (see the sketch at the end of this list).

2) Edit conf/masters and set it to the master's host name; add 192.168.122.136 (or unbuntunamenode).

3) Edit conf/slaves and add all of the slave host names, that is, ub1 and ub2:

192.168.122.140

192.168.122.141

4) Copy the configured hadoop directory from the namenode to the other machines:

scp -r hadoop-0.20.2 ub1-desktop:/usr/

scp -r hadoop-0.20.2 ub2-desktop:/usr/
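For step 1), a minimal sketch of what typically changes relative to the pseudo-distributed configuration, assuming the same ports and unbuntunamenode as the master; hdfs-site.xml can keep dfs.replication, raised to the number of datanodes if desired:

<!-- conf/core-site.xml -->
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://unbuntunamenode:9000</value>
  </property>
</configuration>

<!-- conf/mapred-site.xml -->
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>unbuntunamenode:9001</value>
  </property>
</configuration>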

5. Running hadoop

Format the distributed file system: bin/hadoop namenode -format

Start the hadoop daemons: bin/start-all.sh

Check the running processes with jps: /usr/Java/jdk1.6.0_26/bin/jps

6. wordcount test case

1) copy the local input folder to the root directory of HDFS and rename it in:

bin/hadoop dfs -put input in

2) Run the example:

bin/hadoop jar hadoop-0.20.2-examples.jar wordcount in out

3) view the processing result:

bin/hadoop dfs -cat out/*

4) Stop the hadoop daemon.

bin/stop-all.sh

7. Some Common commands in HDFS:

1) Delete the in directory from the root of HDFS: bin/hadoop dfs -rmr in

2) bin/hadoop dfsadmin -help lists all currently supported commands

3) View a status report of the file system: bin/hadoop dfsadmin -report

4) Leave safe mode:

bin/hadoop dfsadmin -safemode leave

5) Enter safe mode:

bin/hadoop dfsadmin -safemode enter

6) Rebalance data across the datanodes:

bin/start-balancer.sh

7) Benchmark Test:

bin/hadoop jar hadoop-0.20.2-test.jar TestDFSIO -write -nrFiles 20 -fileSize 200
