Earlier, we were already running Hadoop on a single machine, but Hadoop's real strength is that it is distributed, so let's set up a distributed environment.
Here we simulate that environment with three Linux machines (virtual machines work fine): one master and two slaves. For the master we reuse the environment we built in Chapter 1.
We proceed with steps similar to those in Chapter 1:
1. Setting up the environment
As before, Hadoop runs on Linux. The standalone setup ran on Ubuntu, and the two slave machines likewise need a Linux system. To save resources, I used two CentOS test systems, command-line only, with no graphical desktop.
As for software, in Chapter 1 we prepared Subversion, SSH, Ant, and the JDK. The slaves do not need all of that: there is no need to download and compile the source there, since we can copy it over from the master. So on the slaves we only need to install SSH and the JDK:
First install SSH: sudo apt-get install ssh.
Note: on CentOS, use yum instead, e.g. yum install openssh-server openssh-clients.
For the Java environment, download a JDK installation package, for example jdk-6u24-linux-i586.bin.
To install it, simply run ./jdk-6u24-linux-i586.bin in the directory where it was downloaded.
Then configure the JDK path:
First enter the installation directory: cd jdk-6u24- ...
Then run pwd to see the full Java installation path, and copy it down:
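Putting these steps together (a rough sketch; the exact directory name depends on the JDK build the installer creates):
$ chmod +x jdk-6u24-linux-i586.bin
$ ./jdk-6u24-linux-i586.bin
$ cd jdk1.6.0_27   # or whatever directory the installer created
$ pwd              # prints the path to use as JAVA_HOME below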
At the command line, run: sudo gedit /etc/profile (on a machine without a desktop, use vi /etc/profile instead).
In the opened file, append:
export JAVA_HOME=/home/administrator/hadoop/jdk1.6.0_27   # write your own installation directory here
export PATH=${JAVA_HOME}/bin:$PATH
Then run source /etc/profile to make it take effect immediately.
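You can verify the setup (the version printed will be whatever JDK you installed):
$ java -version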
2. Network configuration
To run a distributed environment, the three computers (virtual machines) must be on the network, and they must also be able to reach each other.
If you use virtual machines, this is easier: just set them all to NAT networking:
Log into each of the three systems and run ifconfig to find the currently assigned IP address:
As can be seen in the figure, it is 10.0.0.11.
If eth0 does not show up, the network card has not been brought up or assigned an address, and you can assign one manually:
ifconfig eth0 10.0.0.12 netmask 255.255.255.0   # set eth0's IP address
route add default gw 10.0.0.2                   # set the default gateway
In VMware, you can find the gateway in the menu under Edit -> Virtual Network Editor:
The gateway must be configured correctly; with only an IP address, the machines will not be able to ping each other.
After configuring the IPs, try pinging the gateway and the other machines to check connectivity.
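For example, from the machine configured above:
$ ping 10.0.0.2    # the gateway
$ ping 10.0.0.11   # another machine on the same network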
Here, the IP addresses of our three machines are:
Master: 10.0.0.10
Slave 1 (node1): 10.0.0.11
Slave 2 (node2): 10.0.0.12
Now that we have the three IP addresses, we know they will be needed in the configuration later; but to make future IP changes easier, we refer to the machines by name instead. On Windows, there is a hosts file in C:\Windows\System32\drivers\etc that maps IP addresses to aliases.
On Linux, the same file lives at /etc/hosts. So edit it: $ vi /etc/hosts:
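With the addresses above, the file gets three extra lines:
10.0.0.10 master
10.0.0.11 node1
10.0.0.12 node2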
After saving the file, you can try ping master or ping node1 instead of using the IP address.
This needs to be done on all three machines.
Now that the network is set up, the remaining operations require the same account on every machine where Hadoop is deployed. So we create the same account, with a password, on the two slave machines:
For example, using the zjf account: create it with $ useradd zjf, set its password with $ passwd zjf, and switch to it with $ su zjf.
The machines may have a firewall running, which would interfere with the remote access later, so you can turn it off:
$ service iptables stop
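Note that service iptables stop only lasts until the next reboot; on CentOS 5/6 (an assumption about the distribution used here) you can also keep it off permanently:
$ chkconfig iptables off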
3. Configuring SSH
In Chapter 1 we learned what SSH can do; here is where it really comes into play.
On the master machine, try connecting to node1 with ssh:
You can see that a password is required to log in. When the master remotely starts all the slaves, having to type a password for each one is not workable, so we configure password-free login:
1) On the slave node1, first set up password-free login to itself.
This was already described in Chapter 1, so there is not much to say here. The result is:
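As a reminder, the Chapter 1 steps were roughly the following (a sketch assuming DSA keys, which matches the id_dsa.pub file used below):
$ ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
$ cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
$ ssh localhost   # should now log in without a password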
2) Allow the master to log in to the two slave nodes over SSH without a password.
To achieve this, the authorized_keys file on each of the two slave nodes must contain the master node's public key, so that the master can access them securely. On node1, the steps are:
$ cd ~/.ssh
$ scp zjf@master:~/.ssh/id_dsa.pub ./master_dsa.pub
$ cat master_dsa.pub >> authorized_keys
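If password-free login still fails after this, it is usually a permissions problem: OpenSSH ignores key files that are too permissive. A quick fix:
$ chmod 700 ~/.ssh
$ chmod 600 ~/.ssh/authorized_keys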
After this configuration, go back to the master machine and try ssh node1 again:
OK, it logs in successfully without asking for a password.
Do the same for node2.
4. Configuring Hadoop
On top of the Chapter 1 configuration, we need to add a few settings:
In the conf folder, find the masters file, edit it, enter master, and save:
In the same folder, find the slaves file, edit it, enter node1 and node2, and save:
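After editing, the two files simply list hostnames, one per line:
conf/masters:
master
conf/slaves:
node1
node2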
Open core-site.xml under conf:
Change the localhost inside it to master.
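For reference, the relevant property ends up looking roughly like this (the port is whatever Chapter 1 used; 9000 here is only an assumed example):
<property>
  <name>fs.default.name</name>
  <value>hdfs://master:9000</value>
</property>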
Open mapred-site.xml under conf:
Similarly, replace localhost with master.
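Again roughly (with 9001 as an assumed port):
<property>
  <name>mapred.job.tracker</name>
  <value>master:9001</value>
</property>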
5. Copying the Hadoop package
Earlier, for the single-machine deployment, we downloaded the Hadoop source through SVN and then compiled it with Ant. The slaves do not need to go through that trouble: we can just copy the package over from the master. How? For remote login we use SSH; for remote copying we use SCP. Note that before copying, the path where Hadoop is stored on the master must also exist on the slaves.
For example, on the master we stored it in test, so create a test folder on both slave nodes.
Then execute on the master: scp -r hadoop-0.20.2/ node1:~/test and you will see the screen scroll by as the files are copied.
Also execute: scp -r hadoop-0.20.2/ node2:~/test
Now both slaves have the Hadoop package.
6. Running
On the master, enter the hadoop-0.20.2 directory and run bin/start-all.sh to start the whole distributed system.
Then run jps on the master:
And run jps on the slaves:
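If the cluster came up correctly, jps should show roughly these daemons (process IDs omitted; this assumes only node1 and node2 are listed in conf/slaves): on the master, NameNode, SecondaryNameNode, and JobTracker; on each slave, DataNode and TaskTracker.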
Open http://localhost:50070 on the master (or http://master:50070 from another machine), and you can see:
There are two live nodes; click in, and you can see:
Click a node below to view its details. If the page cannot be opened, it may be blocked by that machine's firewall.
You can log into the machine in question and run
$ service iptables stop
to turn off the firewall.
We can try uploading a file:
$ bin/hadoop fs -put ~/Tool/eclipse-SDK-3.7.1-linux-gtk.tar.gz test1.tar.gz
You can see:
Then upload another file:
$ bin/hadoop fs -put ~/Tool/eclipse-SDK-3.7.1-linux-gtk.tar.gz test2.tar.gz
You can see:
But I noticed an imbalance: the data all ended up on one node. You can fix this by running the balancer:
$ bin/hadoop balancer -threshold 1
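As I understand it, -threshold is the allowed deviation, in percentage points of disk usage, of each datanode from the cluster average; a value of 1 forces a very even spread, so the balancer moves more blocks.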
Then look again:
Balanced.
Original link: http://www.kwstu.com/ArticleView/hadoop_201408181042089382