Setting up truly distributed Hadoop, as opposed to the pseudo-distributed mode.
I. System and configuration
A total of two machines were prepared to build the Hadoop cluster, based on Ubuntu 14.04, JDK 1.6.0_45, and Hadoop 1.0.3; the virtual machines run on VMware 10.0.
192.168.1.10 NameNode, hostname master (Master)
192.168.1.20 DataNode, hostname slave1 (Slave)
My user name is hadoop.
The next step is to install some common software: vim and ssh.

sudo apt-get update
sudo apt-get install vim
sudo apt-get install ssh
First modify the machine's IP address, then enter the following command to edit the hosts file:

sudo vim /etc/hosts
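The article does not show the resulting hosts file; based on the address table above, the entries would look something like this (the hostnames master and slave1 are assumptions):

```
127.0.0.1    localhost
192.168.1.10 master
192.168.1.20 slave1
```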
Then set up SSH login without a password. Enter the following command to generate a key pair:

ssh-keygen -t rsa -P ""

Keep pressing ENTER to accept the defaults; two files, id_rsa and id_rsa.pub, are then generated in the ~/.ssh directory: the private key and the public key of SSH, respectively. Build the authorized_keys file to enable passwordless SSH login to localhost:

cat id_rsa.pub >> authorized_keys
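The steps above can be sketched as follows. The sketch runs in a scratch directory so it can be tried safely; on the real machines the target directory is ~/.ssh:

```shell
# Demonstration of the key setup in a temporary directory (assumption: OpenSSH's
# ssh-keygen is installed; on the cluster you would operate on ~/.ssh instead).
DIR=$(mktemp -d)
ssh-keygen -t rsa -P "" -f "$DIR/id_rsa" -q    # key pair with an empty passphrase
cat "$DIR/id_rsa.pub" >> "$DIR/authorized_keys"
chmod 600 "$DIR/authorized_keys"               # sshd rejects looser permissions
```

On a real cluster the master's public key also has to be appended to the slave's ~/.ssh/authorized_keys (for example with ssh-copy-id hadoop@slave1) so that the start scripts can reach the slave without a password.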
II. Installing Hadoop and the JDK

Unzip the Hadoop tarball; my installation directory is /home/hadoop (this is the user name)/hadoop (this is the folder)/.
tar -zxvf hadoop-1.0.3.tar.gz
Unzip the JDK's tarball; the command is similar to the above, apart from the file name, so it is not listed here. The next step is to modify the environment variables:
sudo vim /etc/profile

export JAVA_HOME=/home/hadoop/hadoop/jdk1.6.0_45
export HADOOP_HOME=/home/hadoop/hadoop/hadoop-1.0.3
export PATH=$JAVA_HOME/bin:$HADOOP_HOME/bin:$PATH
Don't forget to run source /etc/profile so that the new PATH takes effect immediately.
Finally, configure the files in Hadoop's conf folder. Modify hadoop-env.sh:
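The article does not show the edit itself; in Hadoop 1.x the usual change in conf/hadoop-env.sh is to uncomment and set JAVA_HOME. The path below matches the JDK installed above and is an assumption:

```shell
# conf/hadoop-env.sh
export JAVA_HOME=/home/hadoop/hadoop/jdk1.6.0_45
```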
Modify core-site.xml:
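The contents are not shown in the article; a typical minimal core-site.xml for this layout would be the following (the port 9000 and the tmp directory are assumptions, with fs.default.name pointing at the master from the hosts table above):

```xml
<!-- conf/core-site.xml -->
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://master:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/hadoop/hadoop/tmp</value>
  </property>
</configuration>
```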
Modify hdfs-site.xml:
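Again the contents are not given; since the slaves file below lists two DataNodes, a plausible minimal hdfs-site.xml would set the replication factor to 2 (an assumption, the Hadoop 1.x default is 3):

```xml
<!-- conf/hdfs-site.xml -->
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
</configuration>
```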
Modify mapred-site.xml:
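For Hadoop 1.x the key property in mapred-site.xml is the JobTracker address; a minimal version for this cluster might look like this (the port 9001 is an assumption):

```xml
<!-- conf/mapred-site.xml -->
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>master:9001</value>
  </property>
</configuration>
```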
Modify the masters and slaves files: in masters write only master (that is, the 192.168.1.10 machine mentioned above), and in slaves list both master and slave1.
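With the hostnames from the hosts table, the two files would contain (a sketch, since the article only describes them):

```
# conf/masters
master

# conf/slaves
master
slave1
```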
Then format the NameNode by running the following command in the hadoop-1.0.3 directory:

bin/hadoop namenode -format
So far nothing has been said about configuring the slave. It is actually very simple: shut down the current virtual machine, copy its files, give the copy a new name, boot it, and change the user name and IP. My Ubuntu host name stays the same on both machines, which is fine as long as the two VMs do not share the same disk file.
Finally, on the master (the Ubuntu machine acting as the main node), run the following command, again from the hadoop-1.0.3 directory:
bin/start-all.sh
Then run jps to view the Java processes. The setup is successful if five processes are present besides Jps itself; on the master (which also appears in slaves) these should be NameNode, SecondaryNameNode, JobTracker, DataNode, and TaskTracker.
You can also check the web interfaces (in Hadoop 1.x, the NameNode UI on port 50070 and the JobTracker UI on port 50030).
Both nodes already show up there, and the entire distributed Hadoop deployment is complete.