Hadoop operation emergency solution

Source: Internet
Author: User

 

Introduction:
I have been working with Hadoop and Lucene recently, and have summarized emergency solutions for problems encountered while operating Hadoop. Feedback is welcome!
Emergency solutions for HDFS (0.20.2) operation

1
Namenode failure (the secondarynamenode is unaffected)

If the namenode fails but can be brought back up immediately, just run bin/start-dfs.sh again. Otherwise, follow one of the procedures below. All of the following operations require an intact secondarynamenode.

 

1)
Promote a datanode (one not running the secondarynamenode) to be the new namenode. (This procedure does not appear in the official documentation; procedure 2 is the recommended one, but no problems were found with this one in testing.)

 

A)
Kill all services.

B)
Modify the new namenode server's configuration files: core-site.xml, masters, slaves, and other related files.

C)
Update the hosts file on every node.

D)
Reconfigure SSH on each node so that the new namenode can log in to every datanode without a password.

E)
Copy ${hadoop.tmp.dir}/dfs/namesecondary from the machine running the secondarynamenode to the ${hadoop.tmp.dir}/dfs directory on the new namenode server.

F)
Rename namesecondary to name

G)
Run bin/start-dfs.sh to start HDFS.
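Steps E) and F) can be simulated locally to see the resulting directory layout. The scratch directories below are stand-ins for hadoop.tmp.dir on the two machines; in production, step E) would be a copy between hosts (e.g. with scp):

```shell
# Local simulation of steps E) and F). $TMP_DIR/snn plays the
# secondarynamenode machine, $TMP_DIR/nn the new namenode; all paths
# are scratch stand-ins for hadoop.tmp.dir.
set -e
TMP_DIR=$(mktemp -d)

# Mock the secondarynamenode's checkpoint directory.
mkdir -p "$TMP_DIR/snn/dfs/namesecondary/current"
touch "$TMP_DIR/snn/dfs/namesecondary/current/fsimage"

# E) Copy namesecondary into the new namenode's dfs directory.
mkdir -p "$TMP_DIR/nn/dfs"
cp -r "$TMP_DIR/snn/dfs/namesecondary" "$TMP_DIR/nn/dfs/"

# F) Rename namesecondary to name so the namenode reads it on startup.
mv "$TMP_DIR/nn/dfs/namesecondary" "$TMP_DIR/nn/dfs/name"

# The dfs directory now contains only "name", holding the checkpoint.
ls "$TMP_DIR/nn/dfs"
```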

2)
Promote a datanode (one not running the secondarynamenode) to be the new namenode, and restore it by importing the previous checkpoint.

 

A)
Kill all services.

B)
Modify the new namenode server's configuration files: core-site.xml, masters, slaves, and other related files.

C)
Update the hosts file on every node.

D)
Reconfigure SSH on each node so that the new namenode can log in to every datanode without a password.

E)
Configure fs.checkpoint.dir in the new namenode server's core-site.xml (the default is ${hadoop.tmp.dir}/dfs/namesecondary):

<property>
  <name>fs.checkpoint.dir</name>
  <value>/home/hadoop-data/dfs/namesecondary</value>
</property>

F)
Copy ${hadoop.tmp.dir}/dfs/namesecondary from the machine running the secondarynamenode to the fs.checkpoint.dir directory on the new namenode server.

G)
Run bin/hadoop namenode -importCheckpoint to import the checkpoint.

H)
Run bin/start-dfs.sh to start HDFS.
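The command sequence for steps F) through H) can be sketched as a dry run. The run() helper only echoes each command, since the Hadoop binaries and the snn-host machine (a hypothetical secondarynamenode hostname) are not assumed to exist here:

```shell
# Dry-run sketch of steps F)-H); run() echoes commands instead of
# executing them. snn-host and the paths are placeholders.
run() { echo "+ $*"; }

CKPT_DIR=/home/hadoop-data/dfs/namesecondary   # must match fs.checkpoint.dir

# F) Copy the checkpoint from the secondarynamenode machine.
run scp -r "snn-host:/home/hadoop-data/dfs/namesecondary" "$CKPT_DIR"

# G) Rebuild the namenode's metadata from the checkpoint.
run bin/hadoop namenode -importCheckpoint

# H) Start HDFS.
run bin/start-dfs.sh
```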

 

 

 

 

2
Datanode failure (the failed node was not running the secondarynamenode)

1)
The original server is completely damaged and cannot be started; a new datanode must be introduced.


I.
Copy the Hadoop configuration files from another datanode to the new server.


II.
Update the hosts file on the new node and on all datanodes and the namenode.


III.
Set up passwordless SSH login and test it.


IV.
Add the newly added datanode to the slaves file in the namenode's conf directory.


V.
On the newly added datanode, run bin/hadoop-daemon.sh start datanode to start it.
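Steps I through V can be sketched as a dry run. The run() helper only echoes each command; old-dn, new-dn, and $HADOOP_HOME are hypothetical placeholders:

```shell
# Dry-run sketch of adding a replacement datanode; run() echoes each
# command. old-dn, new-dn, and HADOOP_HOME are placeholders.
run() { echo "+ $*"; }
HADOOP_HOME=/usr/local/hadoop

# I.  Copy the Hadoop configuration from a surviving datanode.
run scp -r "old-dn:$HADOOP_HOME/conf" "$HADOOP_HOME/"

# II-III. Edit /etc/hosts on every node and set up passwordless SSH
# (manual steps), then verify the login works:
run ssh new-dn hostname

# IV. On the namenode, append the new node to conf/slaves.
run sh -c "echo new-dn >> $HADOOP_HOME/conf/slaves"

# V.  On the new datanode, start the daemon.
run bin/hadoop-daemon.sh start datanode
```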

2)
The original server can be started immediately.


I.
Because the namenode's slaves file already lists this datanode, you can simply run bin/start-dfs.sh on the namenode.


II.
Alternatively, run bin/hadoop-daemon.sh start datanode on the datanode itself.
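The two restart options can be compared side by side as a dry run (run() only echoes each command, since no Hadoop installation is assumed here):

```shell
# The two ways to bring the recovered datanode back into the cluster.
run() { echo "+ $*"; }

# I.  On the namenode: slaves already lists the node, so a cluster-wide
#     start is enough.
run bin/start-dfs.sh

# II. Or on the datanode itself, start just its own daemon.
run bin/hadoop-daemon.sh start datanode
```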

3
Datanode failure (the failed node was also running the secondarynamenode)

1)
If the namenode is running normally and the datanode can be brought back into service immediately, just run bin/start-dfs.sh on the namenode.

2)
If the namenode is running normally but the datanode cannot be recovered, add a new datanode and configure it as the secondarynamenode.

On the new node, add the following to hdfs-site.xml:

<property>
  <name>dfs.http.address</name>
  <value>Netease-namenode-test:50070</value>
</property>

Keep the default configuration on the namenode itself. Note that accessing Netease-namenode-test:50070 from a different network segment may fail.

 

This lets the secondarynamenode post its requests to the namenode's HTTP interface.

Add the new secondarynamenode to the masters file on the namenode and update the hosts files.

Start it with bin/start-dfs.sh.
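Relocating the secondarynamenode can likewise be sketched as a dry run; run() only echoes commands, and new-snn and $HADOOP_HOME are hypothetical placeholders:

```shell
# Dry-run sketch of moving the secondarynamenode to a new node; run()
# echoes commands, and new-snn / HADOOP_HOME are placeholders.
run() { echo "+ $*"; }
HADOOP_HOME=/usr/local/hadoop

# Point the new node's hdfs-site.xml dfs.http.address at the namenode's
# HTTP interface (manual edit), then on the namenode record the new
# secondarynamenode in the masters file:
run sh -c "echo new-snn > $HADOOP_HOME/conf/masters"

# Update /etc/hosts on all nodes (manual), then start the cluster.
run bin/start-dfs.sh
```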

 
