Hardware environment
Three machines in total, all running FC5, with Java JDK 1.6.0. The IP addresses are configured as follows:
dbrg-1:202.197.18.72
dbrg-2:202.197.18.73
dbrg-3:202.197.18.74
One thing to emphasize here: it is important to make sure that each machine's hostname and IP address resolve correctly.
A very simple test is to ping the hostname, for example, run ping dbrg-2 on dbrg-1; if the ping succeeds, resolution is OK. If resolution is not working, you can edit the /etc/hosts file. If the machine is used as the NameNode, you need to add the IP addresses and hostnames of all machines in the cluster to its hosts file; if the machine is a DataNode, you only need to add its own IP address and the NameNode's IP address.
For example, the /etc/hosts file on dbrg-1 should look like this:
127.0.0.1 localhost localhost
202.197.18.72 dbrg-1 dbrg-1
202.197.18.73 dbrg-2 dbrg-2
202.197.18.74 dbrg-3 dbrg-3
The /etc/hosts file on dbrg-2 should look like this:
127.0.0.1 localhost localhost
202.197.18.72 dbrg-1 dbrg-1
202.197.18.73 dbrg-2 dbrg-2
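The resolution check described above can be sketched as a small script. This is an illustrative sketch, not part of the original setup: it writes a sample hosts file under /tmp (so the sketch runs standalone) and checks that each cluster hostname appears in it; on a real machine you would grep /etc/hosts directly or simply ping each host.

```shell
# Sample hosts file, mirroring the dbrg-1 example above (placeholder path).
cat > /tmp/sample-hosts <<'EOF'
127.0.0.1 localhost localhost
202.197.18.72 dbrg-1 dbrg-1
202.197.18.73 dbrg-2 dbrg-2
202.197.18.74 dbrg-3 dbrg-3
EOF

# Check that every cluster hostname has an entry.
for host in dbrg-1 dbrg-2 dbrg-3; do
  if grep -qw "$host" /tmp/sample-hosts; then
    echo "$host: resolved"
  else
    echo "$host: MISSING"
  fi
done
```

On the real cluster, replace /tmp/sample-hosts with /etc/hosts; a missing entry here is exactly the case where ping on the hostname fails.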
As mentioned in the previous study note, from HDFS's point of view, Hadoop nodes are divided into a NameNode and DataNodes: there is only one NameNode, but there can be many DataNodes. From MapReduce's point of view, nodes are divided into a JobTracker and TaskTrackers: there is only one JobTracker, but there can be many TaskTrackers.
I deployed the NameNode and JobTracker on dbrg-1, with dbrg-2 and dbrg-3 as DataNodes and TaskTrackers. Of course, you can also deploy the NameNode, DataNode, JobTracker, and TaskTracker all on a single machine.
Directory structure
Hadoop requires the deployment directory structure to be identical on all machines, and all machines must have an account with the same user name.
On my three machines there is a dbrg account whose home directory is /home/dbrg.
The Hadoop deployment directory is /home/dbrg/hadoopinstall; all versions of Hadoop are placed in this directory.
Unzip the hadoop-0.12.0 archive into hadoopinstall. To make later upgrades easier, it is recommended to create a symbolic link named hadoop that points to the Hadoop version you want to use:
[dbrg@dbrg-1:hadoopinstall]$ ln -s hadoop-0.12.0 hadoop
As a result, all of the configuration files are in the hadoop/conf/ directory, and all the executables are in the hadoop/bin directory.
However, because the Hadoop configuration files in the above layout sit inside the Hadoop installation directory, they will all be overwritten when the Hadoop version is later upgraded, so it is recommended to separate the configuration files from the installation directory. A better approach is to create a configuration directory, /home/dbrg/hadoopinstall/hadoop-config/, and copy the hadoop-site.xml, slaves, and hadoop-env.sh files from hadoop/conf/ into hadoop-config/. (Strangely, the official Hadoop getting-started guide says it is only necessary to copy those three files into the directory you created, but in my actual configuration I found it necessary to also copy the masters file into hadoop-config/; otherwise, when you start Hadoop, you get an error saying the masters file cannot be found.) Then set the environment variable $HADOOP_CONF_DIR to point to that directory. Environment variables are set in /home/dbrg/.bashrc and /etc/profile.
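The steps above can be sketched as a script. This is only an illustrative sketch: it uses /tmp/hadoopinstall as a placeholder for /home/dbrg/hadoopinstall and creates empty stand-in config files so the sketch runs on its own; on the real cluster the files already exist under hadoop/conf/.

```shell
# Placeholder for /home/dbrg/hadoopinstall on the real cluster.
HADOOP_INSTALL="/tmp/hadoopinstall"
mkdir -p "$HADOOP_INSTALL/hadoop/conf" "$HADOOP_INSTALL/hadoop-config"

# Stand-in config files, only so this sketch is self-contained.
for f in hadoop-site.xml slaves hadoop-env.sh masters; do
  touch "$HADOOP_INSTALL/hadoop/conf/$f"
done

# Copy the config files out of the install tree. Note that masters is
# included: in practice it must be copied too, even though the official
# getting-started guide only mentions the other three files.
for f in hadoop-site.xml slaves hadoop-env.sh masters; do
  cp "$HADOOP_INSTALL/hadoop/conf/$f" "$HADOOP_INSTALL/hadoop-config/"
done

# Point Hadoop at the separated config directory. Add this line to
# /home/dbrg/.bashrc and /etc/profile so it persists across logins.
export HADOOP_CONF_DIR="$HADOOP_INSTALL/hadoop-config"
echo "HADOOP_CONF_DIR=$HADOOP_CONF_DIR"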
To sum up: to make later upgrades easier, we separate the configuration files from the installation directory, and we create a link pointing to the version of Hadoop we want to use, which reduces the work of maintaining configuration files. In the following sections, you will experience the benefits of this separation and linking.