After a full day, I finally got hadoop-0.21.0 installed on CentOS. I am recording the process here for later reference.
Operating System: CentOS 5.5
Hadoop: hadoop-0.21.0
JDK: 1.6.0_17
Namenode host name: master, namenode IP: 192.168.90.91
Datanode host name: slave, datanode IP: 192.168.90.94
Step 1: Install and start the SSH service
CentOS 5.5 starts the sshd service by default after installation; you can check whether it is running under "System" > "Administration" > "Services". If SSH is not installed on the machine, install it with sudo yum install openssh-server. Also install rsync, a remote data synchronization tool that can quickly synchronize files between multiple hosts over a LAN/WAN, with sudo yum install rsync. Then modify the /etc/hosts file on each node and add the IP information of the namenode and datanode at the end of the file:
192.168.90.91 master
192.168.90.94 slave
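If sshd or rsync turns out to be missing, this step boils down to the following commands (a sketch; the package names assume the stock CentOS 5.5 repositories):
sudo yum install openssh-server openssh-clients rsync
sudo service sshd start
service sshd status
Then append the two host entries above to /etc/hosts on every node.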
Step 2: Configure the SSH service
Steps (1) and (2) must be performed on each machine.
(1) Create a hadoop user and user group
Run su - root (note: not su root; the latter does not load root's environment, and the commands to create the user group or user may then fail). Run groupadd hadoop and then useradd -g hadoop hadoop. Do not pre-create a hadoop directory under /home, otherwise creating the hadoop user fails. After creating the user, it is best to reboot and log in to the system as the hadoop user; that way you do not need to su to the hadoop user in later steps and will not get tangled up in file-ownership problems.
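Put together, the user-creation commands look like this (the final passwd step is my addition, not in the original write-up, so that the new account has a password to log in with):
su - root
groupadd hadoop
useradd -g hadoop hadoop
passwd hadoop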
(2) Generate an SSH key
If you are logged in as another user, switch to the hadoop user with su - hadoop. In the /home/hadoop directory, run ssh-keygen -t rsa (press Enter at every prompt to accept the default storage path). After the key has been generated, cd into the .ssh directory and run cp id_rsa.pub authorized_keys. Then run ssh localhost once so that the host is remembered; after that, ssh localhost no longer prompts for a password.
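As a single sequence run as the hadoop user, the key setup is roughly as follows (the chmod is my addition; sshd's StrictModes setting often requires it, even though the original write-up did not mention it):
cd /home/hadoop
ssh-keygen -t rsa
cd .ssh
cp id_rsa.pub authorized_keys
chmod 600 authorized_keys
ssh localhost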
(3) Exchange public keys
Copy the public key from the namenode to the datanode: in the hadoop user's home directory (/home/hadoop), run ssh-copy-id -i $HOME/.ssh/id_rsa.pub hadoop@slave. Likewise you can copy the public key from the datanode back to the namenode, but that is not strictly necessary. After this, the two machines can SSH to each other as the hadoop user without a password.
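For example, run as the hadoop user on the namenode (the reverse direction would use hadoop@master instead):
ssh-copy-id -i $HOME/.ssh/id_rsa.pub hadoop@slave
Afterwards, ssh slave from the namenode should log in without prompting for a password.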
Step 3: Install JDK 1.6 or above (on each machine)
(1) Run the command yum install jdk.
(2) If yum cannot find the package, download it from the official site: https://cds.sun.com/is-bin/INTERSHOP.enfinity/WFS/CDS-CDS_Developer-Site/en_US/-/USD/ViewProductDetail-Start?Productref=jdk-6u22-oth-JPR@CDS-CDS_Developer
(3) Create the directory /usr/java and copy the package jdk-6u22-linux-i586.bin into it. Run chmod a+x jdk-6u22-linux-i586.bin to give the current user execute permission on the installer, then run sudo ./jdk-6u22-linux-i586.bin to install.
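The manual installation in (3), end to end, looks roughly like this (assuming the downloaded jdk-6u22-linux-i586.bin sits in the current directory):
sudo mkdir -p /usr/java
sudo cp jdk-6u22-linux-i586.bin /usr/java/
cd /usr/java
sudo chmod a+x jdk-6u22-linux-i586.bin
sudo ./jdk-6u22-linux-i586.bin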
(4) Modify /etc/profile to add environment variables. Variables set in /etc/profile work like system-wide environment variables on Windows: they are available to every user.
Open /etc/profile in a text editor:
# vi /etc/profile
Add the following lines at the end:
export JAVA_HOME=/usr/java/jdk1.6.0_22
export CLASSPATH=.:$JAVA_HOME/jre/lib/rt.jar:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
export PATH=$PATH:$JAVA_HOME/bin
This completes the JDK setup. Run source /etc/profile to make the changes take effect, then run java -version to check whether the installation succeeded.
Step 4: Install hadoop
Only now do we actually get to install hadoop; the preparation work really is a lot.
(1) Create the directory /usr/local/hadoop and extract hadoop-0.21.0.tar.gz into it with sudo tar -xvzf hadoop-0.21.0.tar.gz. Then modify /etc/profile and append the hadoop installation directory at the end of the file:
export HADOOP_HOME=/usr/local/hadoop/hadoop-0.21.0
export PATH=$HADOOP_HOME/bin:$PATH
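As a command sequence, step (1) looks roughly like this (assuming the tarball has already been copied into /usr/local/hadoop; the chown at the end is my addition so that the hadoop user can later create the tmp and logs directories inside the installation):
sudo mkdir -p /usr/local/hadoop
cd /usr/local/hadoop
sudo tar -xvzf hadoop-0.21.0.tar.gz
sudo chown -R hadoop:hadoop hadoop-0.21.0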
(2) Configure the conf/hadoop-env.sh file and set the JAVA_HOME environment variable:
export JAVA_HOME=/usr/java/jdk1.6.0_22
(3) Configure the core-site.xml file:
<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/usr/local/hadoop/hadoop-0.21.0/tmp</value>
    <!-- Note: create the tmp folder under the hadoop installation directory first -->
    <description>A base for other temporary directories.</description>
  </property>
  <!-- file system properties -->
  <property>
    <name>fs.default.name</name>
    <value>hdfs://master:54310</value>
  </property>
</configuration>
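As the note in the configuration says, create the tmp directory before formatting, for example (run as the hadoop user, assuming the installation directory is writable by that user):
mkdir -p /usr/local/hadoop/hadoop-0.21.0/tmp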
(4) Configure the hdfs-site.xml file:
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
    <!-- there are only two machines in total; if the master is also configured as a datanode, set this to 2 -->
  </property>
</configuration>
(5) Configure the mapred-site.xml file:
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>master:54311</value>
  </property>
</configuration>
(6) Configure the conf/masters file and add the namenode's host name (or IP address):
master
(7) Configure the conf/slaves file and add the host names (or IP addresses) of all datanodes:
slave
(If the replication factor in hdfs-site.xml above was set to 2, you also need to add master to the slaves file.)
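If you prefer the command line to an editor, the two files from (6) and (7) can be written like this (run from the hadoop-0.21.0 directory; add a master line to conf/slaves if you set the replication factor to 2):
echo master > conf/masters
echo slave > conf/slaves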
(8) Copy the configured hadoop-0.21.0 folder from the namenode into the /usr/local/hadoop/ directory on the datanode. (Strictly speaking, the masters and slaves files do not need to be copied, but copying them along does no harm.)
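One way to do the copy (a sketch, assuming /usr/local/hadoop already exists on the datanode and is writable by the hadoop user):
scp -r /usr/local/hadoop/hadoop-0.21.0 hadoop@slave:/usr/local/hadoop/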
(9) Configure the /etc/profile file of the datanode and append the following at the end of the file:
export HADOOP_HOME=/usr/local/hadoop/hadoop-0.21.0
export PATH=$HADOOP_HOME/bin:$PATH
Step 5: Start hadoop
First, disable the system firewall: run /etc/init.d/iptables stop, and check the firewall status with /etc/init.d/iptables status. As the hadoop user, open a terminal in the /usr/local/hadoop/hadoop-0.21.0/bin directory on the namenode and run hadoop namenode -format to format the namenode. Note that the /usr/local/hadoop/hadoop-0.21.0/tmp directory must be writable by the hadoop user, otherwise an exception occurs during formatting. Run start-all.sh to start the hadoop cluster, run jps to view the processes, and run hadoop dfsadmin -report to view the cluster status. Enter http://master:50070 in a browser to view the HDFS status on the web; the JobTracker status page is served at http://master:50030.
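Condensed, Step 5 comes down to the following commands (the iptables commands as root, the rest as the hadoop user on the namenode):
/etc/init.d/iptables stop
/etc/init.d/iptables status
hadoop namenode -format
start-all.sh
jps
hadoop dfsadmin -report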
PS: when re-formatting the namenode, it is best to first clear each node's tmp directory and delete the files in the logs directory.
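Concretely, that means something like the following on each node before re-formatting (a sketch; double-check the paths before deleting anything):
rm -rf /usr/local/hadoop/hadoop-0.21.0/tmp/*
rm -rf /usr/local/hadoop/hadoop-0.21.0/logs/*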
With that, the hadoop cluster on CentOS 5.5 is up and running!
References: http://www.ibm.com/developerworks/cn/linux/l-hadoop-2/index.html