Reposted from http://www.linuxidc.com/Linux/2012-04/58182p3.htm
Objective
Ensuring HDFS high availability has been a concern for many engineers ever since Hadoop became popular, and plenty of schemes can be found through a search engine. Taking the arrival of HDFS Federation as an opportunity, this article summarizes what the NameNode, SecondaryNameNode, and BackupNode are, how they differ, and how they relate to HDFS HA. It concludes with an introduction to configuring the NameNode, SecondaryNameNode, and BackupNode in Hadoop 0.23.0.
1. How does the HDFS metadata server work?
The NameNode must be formatted before it is started for the first time: $ bin/hdfs namenode -format -clusterid yourid generates the required VERSION file. For example, in my Hadoop 0.23.0 setup, using gb17 as a NameNode, formatting produces the following VERSION file under the dfs.namenode.name.dir directory:
$ cat /opt/jiangbing/hdfs23/current/VERSION
#Thu Dec 16:27:30 CST 2011
namespaceID=1787450988
clusterID=klose
cTime=0
storageType=NAME_NODE
blockpoolID=BP-950324238-10.10.102.17-1322483515854
layoutVersion=-38
Compared with HDFS in Hadoop 0.21.0 you can now see a blockpoolID and a clusterID; the VERSION file acts a bit like the NameNode's identity card, and every -format in effect gives the NameNode a new identity.
At the same time as -format, two kinds of files are generated, fsimage_* and edits_*. Running $ bin/hdfs namenode then starts the NameNode in the normal way.
fsimage: a snapshot of the metadata
edits log: the log of operations applied to the metadata
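For a concrete picture, the name directory ends up looking roughly like this once the NameNode has been formatted and started (the path comes from the setup above; the exact file names and transaction-id suffixes are illustrative and will differ on your cluster):
$ ls /opt/jiangbing/hdfs23/current
VERSION  seen_txid  fsimage_0000000000000000000  fsimage_0000000000000000000.md5  edits_inprogress_0000000000000000001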
When the NameNode starts, it reads the HDFS state from the fsimage file and then applies the operations recorded in the edits file, i.e. merges them into fsimage. After the merge it writes out the updated fsimage and starts an empty edits file to record subsequent file-system changes. During this startup process the NameNode is in safe mode: it serves only read operations and rejects writes, and it leaves safe mode once startup finishes. Under normal operation, however, the NameNode never merges edits into fsimage on its own, i.e. it does not checkpoint, so the edits file keeps growing, and when you eventually need to restart the NameNode you will find that startup takes a very long time.
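A quick way to observe this from the command line (assuming the NameNode is up and bin/hdfs is available on the machine you run this from) is to query safe mode while the NameNode is still replaying its edits:
$ bin/hdfs dfsadmin -safemode get
Safe mode is ON
Once the replay finishes and the DataNodes have reported enough blocks, the same command reports that safe mode is OFF.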
2. The SecondaryNameNode is not a hot standby for the NameNode
The SNN mainly backs up the NN's file metadata and does not back up block information. It fetches the fsimage and edits files from the NN, merges the edits log into fsimage to produce a new fsimage, and then sends the new fsimage back to the NN, so that the edits file on the NN does not grow without bound and restarting the NN stays fast.
The main work of the SNN happens in its doCheckpoint() method. A checkpoint is started at intervals, controlled by two configuration parameters:
fs.checkpoint.period specifies the maximum interval between two consecutive checkpoints; the default is 1 hour.
fs.checkpoint.size defines the maximum size of the edits log; once this size is exceeded a checkpoint is forced even if the checkpoint interval has not yet elapsed. The default is 64 MB.
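If you want to exercise the checkpoint path by hand instead of waiting for the timer, the secondarynamenode command in the 0.23-era scripts accepts a couple of options for this; treat the exact flags below as something to confirm against bin/hdfs secondarynamenode -help on your build.
# print the current size of the NN's edits log as seen by the SNN
$ bin/hdfs secondarynamenode -geteditsize
# run one checkpoint immediately, regardless of fs.checkpoint.period and fs.checkpoint.size
$ bin/hdfs secondarynamenode -checkpoint force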
3. The BackupNode emphasizes real-time synchronization
This node works a bit like master-slave replication in MySQL: the NN can send its log to the BN in real time. Whereas the SNN only downloads the fsimage and edits files at fixed intervals, the BN receives the operation log as it is produced and merges the operations into its fsimage.
The NN provides two log stream interfaces, EditLogOutputStream and EditLogInputStream. When the NN writes a log entry, it not only appends a copy to its local edits file but also writes a copy to the network stream towards the BN; when the stream buffer reaches its threshold it is flushed to the BN node, which applies the received operations. This is how low-latency log replication is achieved.
4. Differences between the BN and the SNN
1) They obtain the NN's metadata in different ways.
The NN pushes fsimage and edits updates to the BN machine with good real-time behaviour, so after the NN goes down the BN holds essentially consistent metadata. The SNN, by contrast, fetches fsimage and edits through the NN's HTTP server, so how complete its copy is depends on how frequently it fetches the files.
2) The checkpoint results are handled differently.
The SNN actively sends the fsimage produced by a checkpoint back to the NN, whereas the BN only checkpoints locally, into its own dfs.name.dir. In effect the BN is a hot standby of the NameNode, except that the DataNodes are not aware of its existence; if you are a programmer, you should know what to do about that. :-)
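One way to convince yourself of this (the paths are the ones used later in this article for gb17 and gb22) is to compare the BN's local name directory with the NN's after a checkpoint:
# on the NameNode gb17
$ ls /opt/jiangbing/hdfs23/current
# on the BackupNode gb22
$ ls /opt/jiangbing/hdfs23/backup/current
# both directories contain a VERSION file and fsimage_*/edits_* files; the BN keeps its own
# up-to-date fsimage_* checkpoints even though it never uploads them back to the NN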
5. Some HDFS HA schemes
1) Put the dfs.name.dir directory on RAID.
2) Configure multiple dfs.name.dir directories, one of them being a directory from another node that is mounted on this machine via NFS, so that the same data is written to two places at the same time (see the sketch after this list). You then bind in ZooKeeper to track the NameNode's state: after the primary NameNode goes down, the standby NameNode is switched in as the normal NameNode. To keep things simple, the standby NameNode only has to listen for ZooKeeper events at startup, so after a failure you can quickly start a normal NameNode. Once their connection to the old NameNode expires, the DataNodes fetch the updated NameNode URL from ZooKeeper; this part needs a little coding. :-) If we never had to touch the code at all, we would not be able to find a job... right?
3) There are also some more elaborate approaches; see:
http://www.cloudera.com/blog/2009/07/hadoop-ha-configuration/
http://hadoopblog.blogspot.com/2010/02/hadoop-namenode-high-availability.html
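As a rough sketch of scheme 2 (the host name, export path, and mount point below are illustrative assumptions, not part of the original setup): mount the remote directory over NFS, then list both a local and the NFS-mounted directory in dfs.namenode.name.dir so the NameNode writes its metadata to both.
# on the NameNode host: mount a directory exported by another machine
$ mount -t nfs backuphost:/export/namedir /mnt/remote-namedir
# then, in hdfs-site.xml, give dfs.namenode.name.dir both directories, e.g.
#   <value>file:///opt/jiangbing/hdfs23,file:///mnt/remote-namedir</value>
# the NameNode writes identical metadata into every directory listed there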
6. Configuring the SNN and BN in Hadoop-0.23.0 HDFS
1) SNN configuration. This is very simple: modify the conf/hdfs-site.xml file on every machine:
${HDFS_CONF_DIR}/hdfs-site.xml
<configuration>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:///opt/jiangbing/hdfs23</value>
</property>
<property>
<name>dfs.federation.nameservices</name>
<value>ns1,ns2</value>
</property>
<property>
<name>dfs.namenode.rpc-address.ns1</name>
<value>gb17:9000</value>
</property>
<property>
<name>dfs.namenode.http-address.ns1</name>
<value>gb17:23001</value>
</property>
<property>
<name>dfs.namenode.secondary.http-address.ns1</name>
<value>gb17:23002</value>
</property>
<property>
<name>dfs.namenode.rpc-address.ns2</name>
<value>gb18:9000</value>
</property>
<property>
<name>dfs.namenode.http-address.ns2</name>
<value>gb18:23001</value>
</property>
<property>
<name>dfs.namenode.secondary.http-address.ns2</name>
<value>gb18:23002</value>
</property>
<property>
<name>fs.checkpoint.period</name>
<value>600</value>
<description>The number of seconds between two periodic checkpoints.</description>
</property>
<property>
<name>fs.checkpoint.size</name>
<value>67108864</value>
<description>The size of the current edit log (in bytes) that triggers
a periodic checkpoint even if the fs.checkpoint.period hasn't expired.
</description>
</property>
</configuration>
If necessary you can also set the dfs.namenode.checkpoint.dir path; by default the corresponding checkpoint directory is created under hadoop.tmp.dir.
The rest of the configuration is the same as for a normal Hadoop 0.23.0 cluster; start it with sbin/start-dfs.sh.
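A minimal sanity check after starting the daemons (assuming passwordless SSH and the slaves file are already set up as for any 0.23 cluster) is to look at the Java processes on each node with jps:
$ sbin/start-dfs.sh
$ jps    # on gb17 and gb18 you should see a NameNode and a SecondaryNameNode process
$ jps    # on each DataNode machine you should see a DataNode process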
2) sbin/start-dfs.sh does not read BackupNode information from the configuration file and start it for you, so on the node where the BackupNode is to run a separate configuration is required. Note that the BackupNode's conf/hdfs-site.xml is slightly different; in fact it does no harm to put the BackupNode settings into every node's configuration file, because the scripts never start a BackupNode on their own.
<configuration>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:///opt/jiangbing/hdfs23/backup</value>
</property>
<property>
<name>dfs.federation.nameservice.id</name>
<value>ns1</value>
</property>
<property>
<name>dfs.namenode.backup.address.ns1</name>
<value>gb22:50100</value>
<description>
The Backup node server address and port.
If the port is 0 then the server will start on a free port.
</description>
</property>
<property>
<name>dfs.namenode.backup.http-address.ns1</name>
<value>gb22:50105</value>
<description>
The Backup node HTTP server address and port.
If the port is 0 then the server will start on a free port.
</description>
</property>
<property>
<name>dfs.federation.nameservices</name>
<value>ns1,ns2</value>
</property>
<property>
<name>dfs.namenode.rpc-address.ns1</name>
<value>gb17:9000</value>
</property>
<property>
<name>dfs.namenode.http-address.ns1</name>
<value>gb17:23001</value>
</property>
<property>
<name>dfs.namenode.secondary.http-address.ns1</name>
<value>gb18:23002</value>
</property>
<property>
<name>dfs.namenode.rpc-address.ns2</name>
<value>gb18:9000</value>
</property>
<property>
<name>dfs.namenode.http-address.ns2</name>
<value>gb18:23001</value>
</property>
<property>
<name>fs.checkpoint.period</name>
<value>60</value>
<description>The number of seconds between two periodic checkpoints.</description>
</property>
<property>
<name>fs.checkpoint.size</name>
<value>67108864</value>
<description>The size of the current edit log (in bytes) that triggers
a periodic checkpoint even if the fs.checkpoint.period hasn't expired.
</description>
</property>
<property>
<name>dfs.namenode.secondary.http-address.ns2</name>
<value>gb17:23002</value>
</property>
</configuration>
Here gb22 is set up as the BackupNode for ns1's NameNode. Note that the BackupNode keeps the same namenode.dir contents as the NameNode, so you can think of the BackupNode as the NameNode's twin brother, one that simply does not deal with anything outside the house (the DataNode RPC connections).
Start the BackupNode on gb22: $ bin/hdfs namenode -backup
Running $ bin/hadoop fs -ls hdfs://gb22:50100 returns the same content as $ bin/hadoop fs -ls hdfs://gb17:9000; the metadata is complete on both gb17 and gb22. So after gb17 goes down, you only have to stop the BackupNode on gb22 and reconfigure it as ns1's NameNode for the service to continue. This is a scheme that needs no code development at all, but it does require pausing the service. Can you accept this kind of HA? :-)
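For completeness, here is a rough sketch of that manual failover; it assumes (this is not spelled out above) that the updated hdfs-site.xml pointing ns1 at gb22 is distributed to every node, and the commands are illustrative rather than a tested recipe.
# on gb22: stop the BackupNode process; it already holds a complete copy of ns1's metadata
# (if it was started in the foreground with bin/hdfs namenode -backup, just terminate that process)
# edit hdfs-site.xml on all nodes so that dfs.namenode.rpc-address.ns1 and
# dfs.namenode.http-address.ns1 point at gb22 instead of gb17, then push the file out
# on gb22: start a normal NameNode; it reuses the BackupNode's existing name directory
$ bin/hdfs namenode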
HDFS NN, SNN, BN, and HA