Separating the namenode and secondarynamenode: an experiment based on Hadoop 0.20.2
When configuring a Hadoop cluster, we often put the namenode and the secondarynamenode on the same node. This is actually very dangerous: if that node goes down, the fsimage backup kept by the secondarynamenode is lost together with the namenode's metadata, and the entire cluster cannot be recovered. The following describes how to separate the secondarynamenode from the namenode. Of course there are still many shortcomings and things to improve, and suggestions are welcome.
Note: I originally thought that the host name in the masters configuration file referred to the namenode, but it actually refers to the secondarynamenode. The slaves configuration file lists all nodes that run a datanode and a tasktracker (generally the same nodes). These two files are only read by the node that runs the namenode and the jobtracker (generally one and the same node): the namenode is the host specified by fs.default.name in core-site.xml and the jobtracker is the host specified by mapred.job.tracker in mapred-site.xml, so the other nodes do not need these two files configured.
Do not forget to modify the masters file on the namenode node.
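To double-check which host plays which role, a quick sketch (assuming HADOOP_HOME points at the Hadoop 0.20.2 installation):
# the namenode is the host named in fs.default.name
grep -A 1 "fs.default.name" $HADOOP_HOME/conf/core-site.xml
# the jobtracker is the host named in mapred.job.tracker
grep -A 1 "mapred.job.tracker" $HADOOP_HOME/conf/mapred-site.xml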
Back to the main topic (this experiment is based on the cluster environment built in the earlier article).
1. Clone the node on which the namenode runs, that is, create a new node whose files, directory structure, environment variables, and configuration files under the conf directory are all identical to the namenode's. You can refer to the article on adding a new node to the cluster. The relevant settings are as follows:
Host name: secondary
IP address: 192.168.5.16
Hosts file:
192.168.5.13 namenode
192.168.5.16 secondary
SSH: password-free login
Concerning the hosts file and SSH: since the secondarynamenode communicates only with the namenode, I think it only needs a password-free connection established with the namenode node, and its hosts file only needs the entries for the namenode node and for itself. Note that the hosts file on the namenode node must also have the secondarynamenode entry added.
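A minimal sketch of the password-free SSH setup; the key is generated on the namenode node so that start-dfs.sh can reach the secondary node listed in masters, and the user name zhang is an assumption taken from the checkpoint path used later (adjust to your environment):
# on the namenode node, as the hadoop user (assumed to be zhang)
ssh-keygen -t rsa              # accept the defaults, empty passphrase
ssh-copy-id zhang@secondary    # appends the public key to ~/.ssh/authorized_keys on secondary
ssh secondary hostname         # should print "secondary" without asking for a password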
2. File configuration
(1) Modify the hdfs-site.xml file on the namenode node:
<property>
<name>dfs.secondary.http.address</name>
<value>192.168.5.16:50090</value>
<description>The NameNode gets the newest fsimage via dfs.secondary.http.address</description>
</property>
In the masters file, change the content to secondary.
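If you prefer the command line, a minimal sketch of updating the masters file on the namenode node (assuming HADOOP_HOME points at the Hadoop 0.20.2 installation; the host name secondary is the one chosen above):
# on the namenode node: make "secondary" the only entry in conf/masters
echo secondary > $HADOOP_HOME/conf/masters
cat $HADOOP_HOME/conf/masters    # should print: secondary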
(2) Modify the hdfs-site.xml file on the secondarynamenode node:
<property>
<name>dfs.http.address</name>
<value>192.168.5.13:50070</value>
<description>The SecondaryNameNode gets the fsimage and edits via dfs.http.address</description>
</property>
Modify the core-site.xml file:
<property>
<name>fs.checkpoint.period</name>
<value>3600</value>
<description>The number of seconds between two periodic checkpoints.</description>
</property>
<property>
<name>fs.checkpoint.size</name>
<value>67108864</value>
</property>
<property>
<name>fs.checkpoint.dir</name>
<value>/home/zhang/hadoop0202/secondaryname</value>
</property>
fs.checkpoint.period and fs.checkpoint.size are the two conditions under which the SecondaryNameNode node starts a backup; a checkpoint is triggered as soon as either condition is met. The first is the time interval in seconds set by fs.checkpoint.period (one hour by default); the second is reached when the edits log grows to the size set by fs.checkpoint.size (67108864 bytes, i.e. 64 MB, as configured above).
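It also does no harm to create the checkpoint directory on the secondary node up front; a small sketch using the path configured above (the daemon will normally create it by itself):
# on the secondary node
mkdir -p /home/zhang/hadoop0202/secondaryname
# 67108864 bytes = 64 MB: once the edits log reaches this size, a checkpoint
# is triggered even if the 3600-second period has not yet elapsed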
3. Restart Hadoop, or start the secondarynamenode directly on the secondary node with the command:
hadoop-daemon.sh start secondarynamenode
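A minimal sketch of starting the daemon and checking the result with jps, assuming HADOOP_HOME points at the Hadoop 0.20.2 installation on both nodes:
# on the secondary node
$HADOOP_HOME/bin/hadoop-daemon.sh start secondarynamenode
jps    # a SecondaryNameNode process should now be listed
# on the namenode node
jps    # NameNode (and JobTracker) should be listed, but no SecondaryNameNode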
After restarting, we can see:
On the namenode node, no SecondaryNameNode Java process is left (before the separation there was indeed a SecondaryNameNode Java process on the namenode node).
On the secondary node, a SecondaryNameNode Java process now appears.
Verify that an image file appears in the secondaryname directory on the secondary node. (Since the fs.checkpoint.period parameter in core-site.xml is set to 3600, i.e. one hour, you need to lower it to see the effect of the experiment quickly; for how to do this, refer to the article "How to control the frequency of namenode checkpoints".)
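A quick check, run on the secondary node once a checkpoint has fired (after the first checkpoint something like a current subdirectory containing an fsimage file should appear; the exact layout may differ):
# on the secondary node
ls -lR /home/zhang/hadoop0202/secondaryname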