Preface:
Although it may not seem like there are many implementation problems in building a large Hadoop learning platform, which I have been working on since mid-July, for someone who had never touched Linux, Java, or cloud computing platforms before, it took quite a while. My biggest takeaway is that the versions of the various tools matter enormously: a wrong version of VMware, Ubuntu, the JDK, Hadoop, HBase, or ZooKeeper can be fatal.
At the beginning, I worked from the experiments in the second edition of Liu Peng's cloud computing book. What looks like one simple sentence there could take me a whole day to implement. For example:
1. Virtual machine problems. In the latest version of VMware, after installing vmware-tools the /mnt/hgfs/share directory did not exist and there was no shared folder, so I could not bring in the various software packages. (Although I now know the virtual machine can reach the Internet directly, at first I thought software could only be obtained through a folder shared with the host.) I tried every method I could think of and went through almost every page Baidu turned up on installing vmware-tools against the latest kernel. In the end I switched to vmware-workstation-full-7.1.1-282343.exe and set the virtual machine up again.
After the change, when cd /mnt/hgfs finally worked at the command line, the excitement was hard to describe. But I did not yet realize that even more problems were waiting for me.
2. Installing the JDK and configuring the Java environment. It now seems very simple: write the paths of the tools we installed into the /etc/profile file. But at the beginning I was confused, kept getting a path wrong by accident, and had to fix the file over and over again.
3. Linux file permissions. When I first worked as an ordinary user, commands kept reporting insufficient permissions. In the end I was too lazy to keep using sudo and simply switched straight into the root user and stayed there. This habit caused a huge problem later when HBase came into play.
Enough preamble; on to the experiment process.
Various versions used:
1) vmware-workstation-full-7.1.1-282343.exe (recommended; avoid Chinese in the installation, and note that the latest version has problems)
2) ubuntu-10.04.1-desktop-i386.iso
3) hadoop-0.20.2.tar.gz
Experiment Process
I. Change the root user password:
sudo passwd root (changes the root password)
su root (switches to the root user)
II. Install the VMware Tools:
1. mount -o loop /dev/cdrom /mnt mounts the optical drive at the /mnt directory.
2. cd /mnt to enter the directory.
3. tar zxvf VMwareTools-8.4.2-261024.tar.gz -C ~
This extracts VMwareTools-8.4.2-261024.tar.gz into the /root directory; note that the C must be uppercase.
4. ./vmware-install.pl runs the installer, located in the /root/vmware-tools-distrib directory.
5. Keep pressing Enter or answering yes at the prompts.
6. Restart the system, detach the CD in the VM settings, and set up the shared folder.
7. /usr/bin/vmware-config-tools.pl is the path and name of the configuration script. Configuring the shared folder requires the root user; again just keep answering Enter, yes, or no.
8. If cd /mnt/hgfs/share succeeds, the folder exists and VMware Tools was installed successfully.
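Put together, the sequence looks roughly like this (a sketch only; it assumes the VMware Tools CD image is attached as /dev/cdrom and the tarball version matches the one above):
mount -o loop /dev/cdrom /mnt                     # mount the VMware Tools CD
cd /mnt
tar zxvf VMwareTools-8.4.2-261024.tar.gz -C ~     # extract into /root
cd ~/vmware-tools-distrib
./vmware-install.pl                               # answer the prompts with Enter/yes
/usr/bin/vmware-config-tools.pl                   # run as root after setting the shared folder
ls /mnt/hgfs/share                                # the share should now be visible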
III. Install SSH
sudo apt-get install ssh
IV. Install and configure Java
1. In the /usr directory, create a Java folder: mkdir Java (requires the root user).
2. In the /usr/Java directory, run: /mnt/hgfs/share/jdk-6u26-linux-i586.bin
Entering java, javac, and java -version should now print information.
3. Install vim for editing files later: apt-get install vim. (Strongly recommended, because plain vi is inconvenient to use.)
4. Configure the Java environment:
1) vim /etc/profile to edit the profile file.
2) Add the following information to the end of the file:
JAVA_HOME=/usr/Java/jdk1.6.0_26
JRE_HOME=/usr/Java/jdk1.6.0_26/jre
CLASSPATH=.:$JAVA_HOME/lib:$JRE_HOME/lib:$CLASSPATH
PATH=$JAVA_HOME/bin:$JRE_HOME/bin:$PATH
export JAVA_HOME
export JRE_HOME
export CLASSPATH
export PATH
3) After editing, :wq saves and exits.
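To make the new variables take effect in the current shell without rebooting, source the file and verify (a quick check; the expected values assume the paths above):
source /etc/profile
echo $JAVA_HOME      # should print /usr/Java/jdk1.6.0_26
java -version        # should report version 1.6.0_26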
V. Install Hadoop
1. Copy the installation package hadoop-0.20.2.tar.gz to /usr:
cp /mnt/hgfs/share/hadoop-0.20.2.tar.gz /usr
2. In the /usr directory, unpack the package: tar -zxvf hadoop-0.20.2.tar.gz
After extraction, the folder hadoop-0.20.2 appears.
VI. Configure Hadoop
1. Configure the Hadoop environment variables:
vim /etc/profile
Add the Hadoop variables to the end of the file (a typical addition is sketched below).
:wq to save and exit.
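The exact lines are not reproduced here; a typical addition to /etc/profile, assuming Hadoop was unpacked to /usr/hadoop-0.20.2, would be:
HADOOP_HOME=/usr/hadoop-0.20.2
PATH=$HADOOP_HOME/bin:$PATH
export HADOOP_HOME
export PATH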
2. Reboot.
Entering hadoop version should now print the version information, confirming the installation.
3. Edit the /usr/hadoop-0.20.2/conf/hadoop-env.sh file:
vim conf/hadoop-env.sh
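The original does not show the edit itself; the usual change in hadoop-env.sh is to set the JDK path, for example:
export JAVA_HOME=/usr/Java/jdk1.6.0_26    # point Hadoop's scripts at the JDK installed above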
VII. Standalone mode
1. As the root user, run the example job in the /usr/hadoop-0.20.2 directory (a typical run is sketched after step 2).
2. View the result: cat output/*
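The exact command is not given above; a typical standalone smoke test with the bundled examples jar looks like this (the grep pattern and directory names are illustrative):
cd /usr/hadoop-0.20.2
mkdir input
cp conf/*.xml input                                            # sample text to process
bin/hadoop jar hadoop-0.20.2-examples.jar grep input output 'dfs[a-z.]+'
cat output/*                                                   # view the result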
VIII. Pseudo-distributed mode
1. Hadoop configuration:
1) core-site.xml content (the file is at /usr/hadoop-0.20.2/conf/core-site.xml; open it with vim)
2) hdfs-site.xml content
3) mapred-site.xml content (typical contents of all three files are sketched after this list)
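The file contents are not reproduced in the original; a typical pseudo-distributed setup for Hadoop 0.20.2 (HDFS and the JobTracker on localhost, replication factor 1) can be written into conf/ like this:
cd /usr/hadoop-0.20.2/conf
cat > core-site.xml <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
EOF
cat > hdfs-site.xml <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
EOF
cat > mapred-site.xml <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
  </property>
</configuration>
EOF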
2. Password-free SSH settings:
1) Generate a key pair: ssh-keygen -t rsa
Press Enter all the way through; the keys are saved under /root/.ssh.
2) Enter the .ssh directory and run:
cp id_rsa.pub authorized_keys
ssh localhost
3. Running Hadoop
1) Format the distributed file system, in the /usr/hadoop-0.20.2 directory:
bin/hadoop namenode -format
2) Start the Hadoop daemons (five processes in total):
bin/start-all.sh
3) Run the wordcount example:
Copy the local input directory into the root of HDFS, renaming it in, and run the wordcount example that ships with Hadoop. out is the output directory after processing and is created under the Hadoop root directory in HDFS by default; you must clear or delete out before running, otherwise an error is returned. (The commands are sketched after this list.)
4) After the job finishes, view the processing result.
You can also copy the output files from HDFS to the local file system for viewing.
5) Stop the Hadoop daemons:
bin/stop-all.sh
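The commands behind steps 3) to 5) are roughly the following (the in and out names follow the description above; the -get destination is just an example):
bin/hadoop dfs -put input in                           # copy the local input directory into HDFS as in
bin/hadoop jar hadoop-0.20.2-examples.jar wordcount in out
bin/hadoop dfs -cat out/*                              # view the result inside HDFS
bin/hadoop dfs -get out /tmp/out                       # or copy it back to the local file system
bin/stop-all.sh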
IX. Fully distributed mode
1. Configure the IP address of each host:
Three hosts are set up here: unbuntunamenode, unbuntu1, and unbuntu2.
unbuntunamenode: 192.168.122.136
unbuntu1: 192.168.122.140
unbuntu2: 192.168.122.141
Their subnet mask is 255.255.255.0 and the gateway is 192.168.122.255. (Note: once the IP address of eth0 is changed this way, the machine can no longer reach the Internet.)
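On Ubuntu 10.04 a static address is usually set in /etc/network/interfaces; a sketch for unbuntunamenode, using the address, netmask, and gateway listed above (the other two machines differ only in the address line):
# /etc/network/interfaces on unbuntunamenode
auto eth0
iface eth0 inet static
    address 192.168.122.136
    netmask 255.255.255.0
    gateway 192.168.122.255
# apply with: /etc/init.d/networking restart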
2. Configure /etc/hosts on the namenode and the datanodes (preferably identical on all machines):
unbuntunamenode serves as the namenode; in its /etc/hosts, add the IP address and host name of every machine in the cluster.
Use scp to copy /etc/hosts to the other nodes:
scp /etc/hosts ub1-desktop:/etc
scp /etc/hosts ub2-desktop:/etc
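A matching /etc/hosts, identical on all three machines, might look like the following; the original uses both the unbuntu* names and the ub*-desktop names from the scp commands, so both are listed here as aliases (an assumption, not taken from the source):
127.0.0.1        localhost
192.168.122.136  unbuntunamenode
192.168.122.140  unbuntu1   ub1-desktop
192.168.122.141  unbuntu2   ub2-desktop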
3. SSH configuration, so that commands can run between machines without entering a password:
1) On all machines, create the .ssh directory: mkdir .ssh
2) Generate a key pair on unbuntunamenode: ssh-keygen -t rsa
Press Enter all the way through; with the default options the key pair is saved in .ssh/id_rsa.
3) On unbuntunamenode, run:
cd ~/.ssh
cp id_rsa.pub authorized_keys
scp authorized_keys ub1-desktop:/home/grid/.ssh
scp authorized_keys ub2-desktop:/home/grid/.ssh
4) Go to the .ssh directory on every machine and change the permissions on the authorized_keys file.
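The permission change itself is not spelled out; the usual command is:
chmod 644 authorized_keys    # sshd ignores the file if it is group- or world-writable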
4. Configure Hadoop:
First make sure Hadoop has been installed on the namenode host.
1) Edit core-site.xml, hdfs-site.xml, and mapred-site.xml (typical contents are sketched after step 4).
2) Edit conf/masters, setting it to the host name of the master, i.e. add 192.168.122.136 (or the unbuntunamenode host name).
3) Edit conf/slaves and add the host names of all the slave nodes, i.e. ub1 and ub2:
192.168.122.140
192.168.122.141
4) Copy the configured Hadoop directory from the namenode to the other machines:
scp -r hadoop-0.20.2 ub1-desktop:/usr/
scp -r hadoop-0.20.2 ub2-desktop:/usr/
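A sketch of the three configuration files for this cluster, pointing HDFS and the JobTracker at the namenode host (the host name, ports, and replication factor are typical choices, not taken from the original):
cd /usr/hadoop-0.20.2/conf
cat > core-site.xml <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://unbuntunamenode:9000</value>
  </property>
</configuration>
EOF
cat > hdfs-site.xml <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
</configuration>
EOF
cat > mapred-site.xml <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>unbuntunamenode:9001</value>
  </property>
</configuration>
EOF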
5. Running Hadoop
Format the distributed file system: bin/hadoop namenode -format
Start the Hadoop daemons: bin/start-all.sh
Run jps to check the startup status: /usr/Java/jdk1.6.0_26/bin/jps
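If everything started correctly, jps typically shows NameNode, SecondaryNameNode, and JobTracker on the master, and DataNode and TaskTracker on each slave; the process IDs below are only illustrative:
# on unbuntunamenode
12341 NameNode
12467 SecondaryNameNode
12598 JobTracker
12703 Jps
# on unbuntu1 and unbuntu2
4213 DataNode
4388 TaskTracker
4500 Jps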
6. wordcount test case
1) Copy the local input folder to the root of HDFS, renaming it to in:
bin/hadoop dfs -put input in
2) Run the example:
bin/hadoop jar hadoop-0.20.2-examples.jar wordcount in out
3) View the processing result:
bin/hadoop dfs -cat out/*
4) Stop the Hadoop daemons:
bin/stop-all.sh
7. Some common HDFS commands:
1) Delete the in directory from the root of HDFS: bin/hadoop dfs -rmr in
2) bin/hadoop dfsadmin -help lists all currently supported commands.
3) bin/hadoop dfsadmin -report prints a status report of HDFS (capacity and datanode information).
4) Leave safe mode:
bin/hadoop dfsadmin -safemode leave
5) Enter safe mode:
bin/hadoop dfsadmin -safemode enter
6) Run the HDFS balancer:
bin/start-balancer.sh
7) Benchmark test:
bin/hadoop jar hadoop-0.20.2-test.jar TestDFSIO -write -nrFiles 20 -fileSize 200