First, the purpose of the experiment
1. The existing Hadoop cluster has only one NameNode; a second NameNode is now being added.
2. The two NameNodes will together form an HDFS Federation.
3. This is done without restarting the existing cluster and without affecting data access.
Second, the experimental environment
Four CentOS release 6.4 virtual machines with the following IP addresses:
192.168.56.101 master
192.168.56.102 slave1
192.168.56.103 slave2
192.168.56.104 kettle
Among them, kettle is a new "clean" machine that has already been configured for password-free SSH; it will be added as the second NameNode.
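For reference, password-free SSH to a new machine is typically set up as follows (already done for kettle here; the grid user is assumed from the paths used later in this post):
# Run as the grid user on master; generate a key pair if one does not exist yet
ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
# Copy the public key to the new machine so master can log in without a password
ssh-copy-id grid@kettle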
Software versions:
Hadoop 2.7.2
HBase 1.1.4
Hive 2.0.0
Spark 1.5.0
Zookeeper 3.4.8
Kylin 1.5.1
Existing configuration:
master serves as Hadoop's NameNode, SecondaryNameNode, and ResourceManager, and as HBase's HMaster.
slave1 and slave2 serve as Hadoop's DataNodes and NodeManagers, and as HBase's HRegionServers.
In addition, master, slave1, and slave2 together run a three-node ZooKeeper ensemble.
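This role layout can be confirmed with jps on each host; the expected process names below follow from the roles listed above (QuorumPeerMain is the ZooKeeper server process):
ssh master "jps"   # NameNode, SecondaryNameNode, ResourceManager, HMaster, QuorumPeerMain
ssh slave1 "jps"   # DataNode, NodeManager, HRegionServer, QuorumPeerMain
ssh slave2 "jps"   # DataNode, NodeManager, HRegionServer, QuorumPeerMain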
Third, the configuration steps
1. Edit the hdfs-site.xml file on master. The contents of the modified file are as follows.
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:/home/grid/hadoop-2.7.2/hdfs/name</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:/home/grid/hadoop-2.7.2/hdfs/data</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.webhdfs.enabled</name>
    <value>true</value>
  </property>
  <!-- New properties -->
  <property>
    <name>dfs.nameservices</name>
    <value>ns1,ns2</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.ns1</name>
    <value>master:9000</value>
  </property>
  <property>
    <name>dfs.namenode.http-address.ns1</name>
    <value>master:50070</value>
  </property>
  <property>
    <name>dfs.namenode.secondary.http-address.ns1</name>
    <value>master:9001</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.ns2</name>
    <value>kettle:9000</value>
  </property>
  <property>
    <name>dfs.namenode.http-address.ns2</name>
    <value>kettle:50070</value>
  </property>
  <property>
    <name>dfs.namenode.secondary.http-address.ns2</name>
    <value>kettle:9001</value>
  </property>
</configuration>
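With dfs.nameservices set as above, each NameNode manages its own independent namespace, and clients can address either one explicitly by URI. As a quick sanity check once both NameNodes are running (a minimal sketch; the hostnames and ports are the ones configured above):
# List the root of each namespace by its own URI
$HADOOP_HOME/bin/hdfs dfs -ls hdfs://master:9000/
$HADOOP_HOME/bin/hdfs dfs -ls hdfs://kettle:9000/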
2. Copy the hdfs-site.xml file from master to the other nodes in the cluster.
scp hdfs-site.xml slave1:/home/grid/hadoop-2.7.2/etc/hadoop/
scp hdfs-site.xml slave2:/home/grid/hadoop-2.7.2/etc/hadoop/
3. Copy the Java directory, the Hadoop directory, and the environment variable files from master to kettle.
scp -rp /home/grid/hadoop-2.7.2 kettle:/home/grid/
scp -rp /home/grid/jdk1.7.0_75 kettle:/home/grid/
# Execute as root
scp -p /etc/profile.d/* kettle:/etc/profile.d/
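A quick way to verify the copies landed (a sketch using the paths above):
# Confirm the Hadoop and JDK directories exist on kettle and that Java runs
ssh kettle "ls /home/grid"
ssh kettle "/home/grid/jdk1.7.0_75/bin/java -version"
Note that the recursive copy also brings over master's hdfs/name metadata directory, which is presumably why no separate namenode -format step appears in this walkthrough: the new NameNode starts from a copy of master's formatted metadata, carrying the same cluster ID.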
4. Start the new NameNode and SecondaryNameNode on kettle.
# Execute on kettle
source /etc/profile
ln -s hadoop-2.7.2 hadoop
$HADOOP_HOME/sbin/hadoop-daemon.sh start namenode
$HADOOP_HOME/sbin/hadoop-daemon.sh start secondarynamenode
After execution, the NameNode and SecondaryNameNode processes are running, as shown in Figure 1.
Figure 1
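The same check can be done from the command line with jps (the PIDs below are illustrative):
# On kettle
jps
# Expected output includes something like:
# 2345 NameNode
# 2468 SecondaryNameNode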
5. Refresh the DataNodes so they pick up the newly added NameNode.
# Can be executed on any machine in the cluster
$HADOOP_HOME/bin/hdfs dfsadmin -refreshNamenodes slave1:50020
$HADOOP_HOME/bin/hdfs dfsadmin -refreshNamenodes slave2:50020
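As an optional check, confirm that both DataNodes have registered with the new NameNode; a sketch using the generic -fs option to point the command at a specific NameNode:
# Should report two live datanodes for the ns2 namespace
$HADOOP_HOME/bin/hdfs dfsadmin -fs hdfs://kettle:9000 -report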
At this point, the HDFS Federation configuration is complete. The status of the two NameNodes, viewed from the web UI, is shown in Figures 2 and 3.
Figure 2
Figure 3
Fourth, testing
# Upload a text file to HDFS
hadoop dfs -put /home/grid/hadoop/notice.txt /
# Run wordcount on both NameNode machines
# Execute on master
hadoop jar /home/grid/hadoop-2.7.2/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.2.jar wordcount /notice.txt /output
# Execute on kettle
hadoop jar /home/grid/hadoop-2.7.2/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.2.jar wordcount /notice.txt /output1
Use the following commands to view the two output results, as shown in Figures 4 and 5.
hadoop dfs -cat /output/part-r-00000
hadoop dfs -cat /output1/part-r-00000
Figure 4
Figure 5
References:
http://hadoop.apache.org/docs/r2.7.2/hadoop-project-dist/hadoop-hdfs/Federation.html
Configuring HDFS Federation for a Hadoop cluster that already exists