Introduction:
I have been working with Hadoop and Lucene recently, and here I summarize the solutions I used for Hadoop problems encountered in operation. Please advise!
Emergency solutions for HDFS (0.20.2) operation
1 Namenode goes down (the secondarynamenode is intact)
If the namenode fails but the server can be brought back up immediately, simply run bin/start-dfs.sh again. Otherwise, follow the steps below. All of the following operations assume an intact secondarynamenode.
1) Use a datanode on a server other than the secondarynamenode as the new namenode. (This method is not found in the official documentation; the second method is recommended, although no problems were found with this one in testing.) A command sketch of steps E) through G) follows this list.
A) Kill all services.
B) Modify the configuration files on the new namenode server: core-site.xml, masters, slaves, and other related files.
C) Modify the hosts files.
D) Reconfigure SSH on each node so that the new namenode can log in to every datanode without a password.
E) Copy ${hadoop.tmp.dir}/dfs/namesecondary from the machine running the secondarynamenode to the ${hadoop.tmp.dir}/dfs directory on the new namenode server.
F) Rename namesecondary to name.
G) Run bin/start-dfs.sh to start HDFS.
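A minimal shell sketch of steps E) through G), assuming the secondarynamenode runs on a host called snn-host, hadoop.tmp.dir is /home/hadoop-data, and Hadoop is installed under /usr/local/hadoop (all of these names are assumptions for illustration):
# Run on the new namenode server; hostnames and paths below are assumptions.
# E) Pull the latest checkpoint from the secondarynamenode host.
scp -r hadoop@snn-host:/home/hadoop-data/dfs/namesecondary /home/hadoop-data/dfs/
# F) Rename the checkpoint directory so the namenode uses it as its image directory.
mv /home/hadoop-data/dfs/namesecondary /home/hadoop-data/dfs/name
# G) Start HDFS.
cd /usr/local/hadoop
bin/start-dfs.sh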
2) Use a datanode on a server other than the secondarynamenode as the new namenode, and restore the namenode by importing the previous checkpoint. A command sketch of steps F) through H) follows this list.
A) Kill all services.
B) Modify the configuration files on the new namenode server: core-site.xml, masters, slaves, and other related files.
C) Modify the hosts files.
D) Reconfigure SSH on each node so that the new namenode can log in to every datanode without a password.
E) Configure fs.checkpoint.dir in the new namenode server's core-site.xml (the default is ${hadoop.tmp.dir}/dfs/namesecondary):
<property>
<name>fs.checkpoint.dir</name>
<value>/home/hadoop-data/dfs/namesecondary</value>
</property>
F) Copy ${hadoop.tmp.dir}/dfs/namesecondary from the machine running the secondarynamenode to the fs.checkpoint.dir directory on the new namenode server.
G) Run bin/hadoop namenode -importCheckpoint to import the checkpoint.
H) Run bin/start-dfs.sh to start HDFS.
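A minimal sketch of steps F) through H), using the same assumed host and path names as above:
# Run on the new namenode server.
# F) Copy the checkpoint from the secondarynamenode host into fs.checkpoint.dir.
scp -r hadoop@snn-host:/home/hadoop-data/dfs/namesecondary /home/hadoop-data/dfs/
# G) Import the checkpoint; dfs.name.dir must not already contain a valid image, or the import fails.
bin/hadoop namenode -importCheckpoint
# H) Start HDFS.
bin/start-dfs.sh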
2 Datanode goes down (the node does not run the secondarynamenode)
1) The original server is completely damaged and cannot be restarted; a new datanode must be brought in. A command sketch follows this list.
I. Copy the Hadoop configuration from another datanode to the new server.
II. Update the hosts file on the new server and on all datanodes and the namenode.
III. Set up password-less SSH login and test it.
IV. Add the new datanode to the slaves file in the namenode's conf directory.
V. On the new datanode, run bin/hadoop-daemon.sh start datanode to start it.
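A minimal sketch of these steps, assuming an existing datanode called dn1, a new server called dn-new, the user hadoop, and Hadoop installed under /usr/local/hadoop (all hypothetical names):
# I. On dn-new: copy the Hadoop configuration from an existing datanode.
scp -r hadoop@dn1:/usr/local/hadoop/conf /usr/local/hadoop/
# II./III. Update /etc/hosts on every node, then verify password-less SSH from the namenode.
ssh dn-new hostname
# IV. On the namenode: append the new host to conf/slaves.
echo dn-new >> /usr/local/hadoop/conf/slaves
# V. On dn-new: start the datanode daemon.
bin/hadoop-daemon.sh start datanode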
2) The original server can be brought back up immediately.
I. Because the namenode's slaves file already lists this datanode, you can simply run bin/start-dfs.sh on the namenode.
II. Alternatively, run bin/hadoop-daemon.sh start datanode on the datanode itself.
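Either way, a quick check that the datanode has rejoined the cluster (run on the namenode; a standard 0.20 command, shown here only as a suggested verification):
# Lists live datanodes together with their capacity and usage.
bin/hadoop dfsadmin -report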
3 Datanode goes down (the node also runs the secondarynamenode)
1) If the namenode is running normally and this datanode can be put back into service immediately, simply run bin/start-dfs.sh on the namenode.
2) If the namenode is running normally but the datanode cannot be recovered, consider adding a new datanode and configuring the secondarynamenode on it. A command sketch follows at the end of this section.
On the new node, configure the following in hdfs-site.xml:
<property>
<name>dfs.http.address</name>
<value>Netease-namenode-test:50070</value>
</property>
The namenode itself keeps the default configuration. Note that if Netease-namenode-test:50070 is accessed from a different network segment (for example over the external network), the access may fail.
This setting is what allows the secondarynamenode to post requests to the namenode.
Add the new secondarynamenode to the namenode's masters file and configure the hosts files.
Start with bin/start-dfs.sh.
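A minimal sketch of registering the new secondarynamenode, assuming the new host is called dn-new and Hadoop is installed under /usr/local/hadoop (hypothetical names; the dfs.http.address value is the one shown above):
# On dn-new: set dfs.http.address in conf/hdfs-site.xml to Netease-namenode-test:50070
# (see the property snippet above) and update /etc/hosts on all nodes.
# On the namenode: register the new secondarynamenode host.
echo dn-new >> /usr/local/hadoop/conf/masters
# On the namenode: restart HDFS (run bin/stop-dfs.sh first if it is still running)
# so that the secondarynamenode daemon is started on dn-new.
bin/start-dfs.sh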