Summary: This article describes how to install three Ubuntu virtual machines in VirtualBox, build a Hadoop environment on them, and finally run the wordcount program from Hadoop's bundled examples.
1. Lab Environment
VirtualBox version: 4.3.2 r90405
Ubuntu virtual machine version: Ubuntu 11.04
JDK version on the Ubuntu virtual machines: jdk-1.6.0_45
Hadoop version on the Ubuntu virtual machines: hadoop-0.20.2
2. Overview
To run Hadoop multi-node distributed computing on a single computer, you need to create several hosts as virtual machines. In this article, we use VirtualBox to build the multi-node platform. The steps are: create the virtual machines, install SSH and configure keys for password-less access, install the JDK, install and configure Hadoop, and run the wordcount program in Hadoop to verify the environment configuration.
3. Detailed steps
3.1 Virtual Machine Installation
Because several virtual machines must run at the same time, this experiment uses an early Ubuntu release, 10.04, to keep the system load down. After downloading the system image, open VirtualBox and create a new virtual machine with a simple configuration. Start the virtual machine, select the image file when prompted, and install the system. Create the other two virtual machines the same way. The three virtual machines are named ub01, ub02, and ub03; on each, the username is vbox and the login password is also vbox. Then run ifconfig on each virtual machine to view its IP address: the three addresses are different, and the machines can ping each other. This completes the virtual machine installation.
Once the machines can ping each other, configure an alias for each one so that they do not have to be addressed by IP. Open /etc/hosts and add the following content:
223.3.77.207 ub01
223.3.73.102 ub02
223.3.85.84 ub03
The entries above are written on ub01; write the same entries on ub02 and ub03. (Use the IP addresses that ifconfig reports on your machines; they vary from machine to machine.)
After the aliases are set, ping each virtual machine by its alias to confirm that the names resolve.
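The alias setup can also be scripted. The following is a minimal sketch, not part of the original procedure: the helper name add_host_alias and the scratch file are hypothetical, and the IP addresses are the ones listed above, which will differ on your machines. It appends an entry only when the alias is not already present, so it can be run more than once.

```shell
#!/bin/sh
# Sketch: append "IP alias" lines to a hosts file only when the alias is missing.
# add_host_alias is a hypothetical helper, not a system command.
add_host_alias() {
    hosts_file=$1; ip=$2; alias_name=$3
    # Match the alias as a whole word: preceded by whitespace, followed by
    # whitespace or the end of the line.
    if ! grep -Eq "[[:space:]]${alias_name}([[:space:]]|\$)" "$hosts_file" 2>/dev/null; then
        printf '%s %s\n' "$ip" "$alias_name" >> "$hosts_file"
    fi
}

# Demo against a scratch file; on the real machines the target would be
# /etc/hosts, edited as root.
HOSTS_DEMO=$(mktemp)
add_host_alias "$HOSTS_DEMO" 223.3.77.207 ub01
add_host_alias "$HOSTS_DEMO" 223.3.73.102 ub02
add_host_alias "$HOSTS_DEMO" 223.3.85.84 ub03
add_host_alias "$HOSTS_DEMO" 223.3.77.207 ub01   # second call adds nothing
cat "$HOSTS_DEMO"
```

Working on a scratch file first makes it easy to inspect the result before touching /etc/hosts.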
3.2 SSH installation and configuration
Being able to ping is not enough. To complete the distributed computing system, the three machines must be able to access each other without a password (or at least the master must be able to access the slaves without a password). First install the OpenSSH server on all three VMs:
sudo apt-get install ssh rsync
After that, create a .ssh folder under the /home/vbox/ directory and run the following inside .ssh:
ssh-keygen -t rsa
The command asks for a few settings. For a first experiment you do not need to change them; just press Enter to accept the defaults. The id_rsa and id_rsa.pub files are generated under .ssh/. Do the same on all three machines.
After the keys are generated, the three VMs need to exchange their public keys. For example, on ub01:
scp ~/.ssh/id_rsa.pub ub02:/home/vbox/.ssh/id_rsa.pub.ub01
scp ~/.ssh/id_rsa.pub ub03:/home/vbox/.ssh/id_rsa.pub.ub01
These two commands copy the local id_rsa.pub file (the public key of ub01) to the same location on ub02 and ub03 and rename it id_rsa.pub.ub01.
Do the same on ub02 and ub03. Each machine's .ssh/ directory should then hold three public keys: its own and those of the other two machines. Append your own key and the other two keys to the authorized keys file (shown here for ub01):
cat id_rsa.pub >> authorized_keys
cat id_rsa.pub.ub02 >> authorized_keys
cat id_rsa.pub.ub03 >> authorized_keys
This adds all three keys to the same authorized_keys file.
Perform the same operations on ub02 and ub03. Each of the three machines can then access the other two.
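The append step can be wrapped in a small script. This is a sketch under stated assumptions rather than the article's own procedure: build_authorized_keys is a hypothetical helper name, the demo uses placeholder text instead of real keys, and the permission settings are an addition that reflects the usual OpenSSH requirement that key files not be writable by other users.

```shell
#!/bin/sh
# Sketch: build authorized_keys from every public key in an .ssh-style directory.
# build_authorized_keys is a hypothetical helper name.
build_authorized_keys() {
    ssh_dir=$1
    for key in "$ssh_dir"/id_rsa.pub "$ssh_dir"/id_rsa.pub.*; do
        [ -f "$key" ] || continue            # skip a glob pattern that matched nothing
        # Append only keys that are not already listed, so reruns add no duplicates.
        grep -qxF -e "$(cat "$key")" "$ssh_dir/authorized_keys" 2>/dev/null \
            || cat "$key" >> "$ssh_dir/authorized_keys"
    done
    # Restrict permissions: sshd's StrictModes check rejects loosely protected files.
    chmod 700 "$ssh_dir"
    chmod 600 "$ssh_dir/authorized_keys"
}

# Demo with placeholder key text in a scratch directory (these are not real keys).
SSH_DEMO=$(mktemp -d)
echo "ssh-rsa AAAA-placeholder-1 vbox@ub01" > "$SSH_DEMO/id_rsa.pub"
echo "ssh-rsa AAAA-placeholder-2 vbox@ub02" > "$SSH_DEMO/id_rsa.pub.ub02"
echo "ssh-rsa AAAA-placeholder-3 vbox@ub03" > "$SSH_DEMO/id_rsa.pub.ub03"
build_authorized_keys "$SSH_DEMO"
build_authorized_keys "$SSH_DEMO"   # rerun: no duplicate lines are added
cat "$SSH_DEMO/authorized_keys"
```

On a real machine the argument would be /home/vbox/.ssh. Appending rather than overwriting matters here: a single > would leave only the last key in the file.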
Next, check whether password-less access through SSH works. Enter the following in the terminal (shown here for ub01):
ssh ub02
If the access succeeds, the welcome message of ub02 is displayed. On the first connection you are asked to confirm with yes; after that you are logged in directly.
3.3 JDK installation and configuration
The JDK must be installed on all three virtual machines. You only need to install and configure it on one machine and then copy the JDK folder to the other two.
The JDK file we selected is jdk-6u45-linux-i586.bin. After the download is complete, move the .bin package to the home directory /home/vbox/ and execute:
chmod u+x jdk-6u45-linux-i586.bin
sudo -s ./jdk-6u45-linux-i586.bin
After the installation is complete, the JDK directory is generated in the current path. Next, set the environment variables by adding the JAVA_HOME, CLASSPATH, and PATH values to /etc/environment. After a reboot, enter java -version in the terminal; the version information is displayed, which shows that the installation succeeded.
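The article does not list the exact values, so the following is only a sketch of what the /etc/environment entries might look like. It assumes the JDK was unpacked to /home/vbox/jdk1.6.0_45 (the path used later for hadoop-env.sh); the CLASSPATH value and the helper name jdk_environment_lines are assumptions. Because /etc/environment holds plain KEY=value pairs and does not expand $VARIABLES, the helper prints fully expanded paths.

```shell
#!/bin/sh
# Sketch: print /etc/environment lines for a given JDK directory.
# jdk_environment_lines is a hypothetical helper; the CLASSPATH value is an assumption.
jdk_environment_lines() {
    jdk_dir=$1       # where the JDK was unpacked
    base_path=$2     # the PATH value already present in /etc/environment
    printf 'JAVA_HOME="%s"\n' "$jdk_dir"
    printf 'CLASSPATH=".:%s/lib"\n' "$jdk_dir"
    printf 'PATH="%s:%s/bin"\n' "$base_path" "$jdk_dir"
}

# Demo: print the lines for the JDK path used in this article.
DEFAULT_PATH="/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"
jdk_environment_lines /home/vbox/jdk1.6.0_45 "$DEFAULT_PATH"
```

The printed PATH line is meant to replace the existing PATH entry in /etc/environment, not to be added alongside it.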
3.4 Hadoop installation and configuration
The same operations are performed on all three machines. You can do them on one machine and then copy the result to the others.
Move the downloaded hadoop-0.20.2.tar.gz file to the /home/vbox/ directory and install it as follows:
tar -xzvf hadoop-0.20.2.tar.gz    # extract the archive
The Hadoop folder is generated in the current path. Change the owner of the folder:
chown vbox:vbox hadoop-0.20.2
Then add the Hadoop environment variables to the /etc/environment file.
Next, modify the configuration files in the hadoop/conf/ directory. Six files need to be changed: masters, slaves, core-site.xml, mapred-site.xml, hdfs-site.xml, and hadoop-env.sh.
In hadoop-env.sh, set the JAVA_HOME variable to /home/vbox/jdk1.6.0_45. The configuration above is exactly the same on ub01, ub02, and ub03.
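The contents of the configuration files are not reproduced in this article, so the following is a hedged sketch of a minimal Hadoop 0.20.x setup, not the author's actual files. It assumes that ub01 is the master and ub02 and ub03 are the slaves, that the NameNode and JobTracker listen on ports 9000 and 9001, and that the replication factor is 2; none of these values come from the article. The helper names write_site_xml and write_hadoop_conf are hypothetical. The property names fs.default.name, mapred.job.tracker, and dfs.replication are the ones used by Hadoop 0.20.

```shell
#!/bin/sh
# Sketch: write a minimal Hadoop 0.20.x cluster configuration into a conf directory.
# Assumptions (not from the article): ub01 is the master, ub02/ub03 are the slaves,
# ports 9000/9001, and a replication factor of 2.
write_site_xml() {
    # $1 = target file, $2 = property name, $3 = property value
    cat > "$1" <<EOF
<?xml version="1.0"?>
<configuration>
  <property>
    <name>$2</name>
    <value>$3</value>
  </property>
</configuration>
EOF
}

write_hadoop_conf() {
    conf_dir=$1; master=$2; shift 2      # remaining arguments are slave hostnames
    mkdir -p "$conf_dir"
    write_site_xml "$conf_dir/core-site.xml"   fs.default.name    "hdfs://$master:9000"
    write_site_xml "$conf_dir/mapred-site.xml" mapred.job.tracker "$master:9001"
    write_site_xml "$conf_dir/hdfs-site.xml"   dfs.replication    2
    echo "$master" > "$conf_dir/masters"     # host that runs the SecondaryNameNode
    printf '%s\n' "$@" > "$conf_dir/slaves"  # hosts that run DataNode and TaskTracker
}

# Demo: generate the files in a scratch directory and show two of them.
CONF_DEMO=$(mktemp -d)
write_hadoop_conf "$CONF_DEMO" ub01 ub02 ub03
cat "$CONF_DEMO/core-site.xml" "$CONF_DEMO/slaves"
```

Generating the files from one script keeps the three machines identical, which is what the setup requires. The sixth file, hadoop-env.sh, is left out of the sketch because it only needs the JAVA_HOME line described above.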
3.5 wordcount program test
In this test, the input file is 128 MB. Change to the /home/vbox/hadoop/ directory, format the file system, and start all services:
hadoop namenode -format
start-all.sh
After the services are started, you can check the status of the Hadoop system with the jps command and through the web pages.
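A quick way to read the jps output is to compare it against the daemons you expect on each machine. The sketch below is illustrative only: missing_daemons is a hypothetical helper that reads jps output from standard input, and the sample input is made up for the demo rather than captured from a real run. In Hadoop 0.20 the master normally runs NameNode, SecondaryNameNode, and JobTracker, and each slave runs DataNode and TaskTracker.

```shell
#!/bin/sh
# Sketch: report which expected Hadoop daemons are absent from jps output.
# missing_daemons is a hypothetical helper; it reads jps output on stdin.
missing_daemons() {
    jps_output=$(cat)
    for daemon in "$@"; do
        # jps prints "<pid> <class name>"; compare the second field exactly.
        printf '%s\n' "$jps_output" \
            | awk -v d="$daemon" '$2 == d { found = 1 } END { exit !found }' \
            || echo "$daemon"
    done
}

# Demo with made-up sample input (illustrative PIDs, not captured output).
SAMPLE='2101 NameNode
2345 JobTracker
2502 Jps'
printf '%s\n' "$SAMPLE" | missing_daemons NameNode SecondaryNameNode JobTracker
```

On the master the call would be jps | missing_daemons NameNode SecondaryNameNode JobTracker, and on a slave jps | missing_daemons DataNode TaskTracker. An empty result means every listed daemon is running.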
Create the input directory and upload the file to it:
hadoop fs -mkdir input
hadoop fs -put file input    # file is the path of the local text file to upload to HDFS
Execute wordcount and view the counting result:
hadoop jar hadoop-0.20.2-examples.jar wordcount input output
When the job finishes, the word counts are written to the output directory in HDFS.
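To sanity-check the job on a small input, the same count can be computed locally and compared with what Hadoop produced. This is a sketch, not part of the original experiment: local_wordcount is a hypothetical helper, and it assumes words are separated by whitespace, which is how the bundled wordcount example tokenizes its input.

```shell
#!/bin/sh
# Sketch: count whitespace-separated words locally, for comparison with the job output.
# local_wordcount is a hypothetical helper; it prints "word<TAB>count" sorted by word.
local_wordcount() {
    tr -s '[:space:]' '\n' < "$1" \
        | grep -v '^$' \
        | LC_ALL=C sort \
        | uniq -c \
        | awk '{ printf "%s\t%s\n", $2, $1 }'
}

# Demo on a tiny scratch file.
SAMPLE_TEXT=$(mktemp)
printf 'hello hadoop\nhello world\n' > "$SAMPLE_TEXT"
local_wordcount "$SAMPLE_TEXT"
```

The job's own result can be printed with hadoop fs -cat on the part files in the output directory and compared line by line. For a 128 MB input the local pipeline is slower to read through, so a small sample file is the more practical check.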