HDFS and HBase: dynamically adding and removing nodes

Source: Internet
Author: User
Tags hdfs dfs

A question worth settling first: can an HBase RegionServer and a Hadoop DataNode be deployed on the same server? If so, is the relationship one-to-one?
Deploying them on the same server reduces the amount of data that travels across the network. The relationship is not one-to-one, however. HDFS stores N replicas of each block (three by default), so the data is spread across three DataNodes; even a RegionServer that holds a single region may interact with three DataNodes. In addition, a single RegionServer can hold multiple regions.

Reference: https://www.zhihu.com/question/20376001/answer/15602027

Next, let's look at how to add and remove nodes dynamically.

1. HDFS: Add a DataNode
1> Prepare the operating system on the new node, install the required software, and set up passwordless SSH login.
2> Update the following configuration files on every node:
$HBASE_HOME/conf/regionservers
$HADOOP_HOME/etc/hadoop/slaves
/etc/hosts
3> Run the following commands on the new node:

hadoop-daemon.sh start datanode
yarn-daemon.sh start nodemanager

4> Refresh

yarn rmadmin -refreshNodes
hdfs dfsadmin -report

5> Set the balancer bandwidth and run the balancer. To avoid affecting production traffic, the balancer is generally not run on the master node; a dedicated balancer node can be used instead.

hdfs dfsadmin -setBalancerBandwidth 1048576
# If a DataNode's disk usage is more than 5% above the average, blocks are moved to DataNodes that are below the average.
start-balancer.sh -threshold 5
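
Step 2 above lends itself to a small helper. The following is a minimal sketch, assuming the list files hold one hostname per line and end with a newline; `register_host` and the hostname `data4` are hypothetical names used only for illustration.

```shell
#!/bin/sh
# register_host FILE HOST
# Appends HOST to FILE unless an identical line is already present.
# Hypothetical helper for illustration; not part of Hadoop or HBase.
register_host() {
    file=$1
    host=$2
    # -x matches whole lines only, -F treats the hostname as a literal string
    if ! grep -qxF -- "$host" "$file" 2>/dev/null; then
        printf '%s\n' "$host" >> "$file"
    fi
}

# Example usage (paths from step 2; "data4" is a made-up hostname):
# register_host "$HADOOP_HOME/etc/hadoop/slaves" data4
# register_host "$HBASE_HOME/conf/regionservers" data4
```

Because the helper checks for an existing line first, running it again on the same host does not create duplicates. /etc/hosts is left out because it also needs the IP address.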

Description
Over time, the block distribution across DataNodes becomes increasingly unbalanced, which reduces MapReduce data locality and leaves some DataNodes busier than others.
The balancer is a Hadoop daemon that moves blocks from heavily used DataNodes to relatively idle ones while still honoring the block replica placement policy, which spreads replicas across different machines and racks.
The balancer runs until the usage of every DataNode is within a threshold of the overall cluster usage. The threshold is set with the -threshold parameter and defaults to 10%.
The bandwidth for copying data between nodes is limited, 1 MB/s by default. It can be set, in bytes per second, with the dfs.balance.bandwidthPerSec property in hdfs-site.xml (Hadoop 2.x and later name this property dfs.datanode.balance.bandwidthPerSec).
Run the balancer regularly, for example daily or weekly.
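
The threshold rule can be illustrated with a little arithmetic. The sketch below is not the balancer's own code: it compares each node's usage with the cluster-wide usage (total used divided by total capacity) and labels the node accordingly. The function name `classify_datanodes` and the input format "host used capacity" are assumptions made for this example.

```shell
#!/bin/sh
# classify_datanodes THRESHOLD
# Reads lines of "host used capacity" on stdin and prints "host over",
# "host under" or "host balanced", comparing each node's usage percentage
# with the cluster-wide usage plus or minus THRESHOLD percentage points.
# Hypothetical helper for illustration; not part of Hadoop.
classify_datanodes() {
    awk -v t="$1" '
        { host[NR] = $1; used[NR] = $2; cap[NR] = $3; su += $2; sc += $3 }
        END {
            if (sc == 0) exit              # no input, nothing to classify
            cluster = 100 * su / sc        # cluster-wide usage in percent
            for (i = 1; i <= NR; i++) {
                u = 100 * used[i] / cap[i] # usage of this node in percent
                if (u > cluster + t)      s = "over"
                else if (u < cluster - t) s = "under"
                else                      s = "balanced"
                print host[i], s
            }
        }'
}

# Example usage with made-up numbers (cluster usage is 60%):
# printf 'data1 90 100\ndata2 60 100\ndata3 30 100\n' | classify_datanodes 10
```

With a threshold of 10, a node at 90% is over-utilized and a node at 30% is under-utilized when the cluster as a whole sits at 60%; a smaller threshold yields a more even cluster at the cost of moving more blocks.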

2. HDFS: Remove a DataNode

1> Add the hostname of the node to be removed to $HADOOP_HOME/etc/hadoop/excludes. This is usually done on the master node, where the commands are run.
2> Refresh

hdfs dfsadmin -refreshNodes
yarn rmadmin -refreshNodes

3> After the node has been removed, delete it from the following configuration files:
$HBASE_HOME/conf/regionservers
$HADOOP_HOME/etc/hadoop/slaves
/etc/hosts
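
Before editing the configuration files in step 3, it helps to confirm that decommissioning has actually finished. The following is a minimal sketch, assuming that `hdfs dfsadmin -report` prints a `Hostname:` line and a `Decommission Status :` line for each DataNode (the layout in Hadoop 2.x; check it against your version). `decommission_status` is a hypothetical helper, not a Hadoop command.

```shell
#!/bin/sh
# decommission_status HOST
# Reads `hdfs dfsadmin -report` output on stdin and prints the
# "Decommission Status" value of the DataNode whose Hostname is HOST.
# Prints nothing if HOST does not appear in the report.
# Assumes the report layout described above; hypothetical helper.
decommission_status() {
    awk -v want="$1" '
        /^Hostname:/ { current = $2 }      # remember which node we are in
        /^Decommission Status/ && current == want {
            sub(/^[^:]*:[ \t]*/, "")       # drop the label up to the colon
            print
            exit
        }'
}

# Example usage on the master node ("data1" is the node being removed):
# hdfs dfsadmin -report | decommission_status data1
```

A node is safe to shut down and remove from the configuration files once the helper reports "Decommissioned" rather than "Decommission in progress".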

Note: DataNodes that stay in the "Decommission in progress" state
In a small cluster (for example, 3 machines), if the number of DataNodes is no greater than the replication factor of the files (3 by default), a DataNode being removed may stay in the "Decommission in progress" state indefinitely, because the remaining nodes cannot hold the required number of replicas.
Hadoop does not handle this case, because in large clusters the number of DataNodes is rarely this close to the replication factor.
The solution is to lower the replication factor to 1 or 2 and then exclude one of the three DataNodes again.
The command below changes the replication factor of existing files. Doing this after the fact is generally not recommended; set the replication factor in the configuration file in advance to avoid the problem.
hdfs dfs -setrep -R -w 2 /file
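
The condition described in this note can be checked before excluding a node. The sketch below only does the arithmetic; the function name `can_finish_decommission` is hypothetical, and the live-node count and replication factor are supplied by the caller.

```shell
#!/bin/sh
# can_finish_decommission LIVE_NODES REPLICATION [REMOVING]
# Succeeds (exit status 0) if the DataNodes left after removing REMOVING
# nodes (default 1) can still hold REPLICATION replicas of every block.
# Hypothetical helper for illustration; not part of Hadoop.
can_finish_decommission() {
    live=$1
    repl=$2
    removing=${3:-1}
    [ $((live - removing)) -ge "$repl" ]
}

# Example usage: 3 DataNodes with replication factor 3 would get stuck.
# if can_finish_decommission 3 3; then echo "ok"; else echo "would hang"; fi
```

With 3 DataNodes and a replication factor of 3 the check fails, which matches the stuck state described above; with a replication factor of 2 it passes.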

3. HBase: Add a RegionServer

1> Run the following command to start the RegionServer:

hbase-daemon.sh start regionserver

2> On the newly started node, open the HBase shell and enable the balancer:

balance_switch true

4. HBase: Remove a RegionServer

1> Run the following command, where data1 is the hostname of the node to remove:

graceful_stop.sh data1

2> graceful_stop.sh turns the HBase balancer off, so on one of the other RegionServer nodes, open the HBase shell, check the HBase status, and re-enable the balancer:

balance_switch true

Also note the order of operations. If a node is both a DataNode and a RegionServer, remove the RegionServer first and then the DataNode. Conversely, when adding a node, set it up as a DataNode first and then as a RegionServer.
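
The ordering rule is easy to get backwards, so it can be worth encoding in any script that automates these steps. The following is a minimal sketch; `role_order` is a hypothetical helper that only prints the order in which the two roles should be handled.

```shell
#!/bin/sh
# role_order ACTION
# Prints the roles, one per line, in the order they should be handled
# when a node runs both a DataNode and a RegionServer.
# Hypothetical helper for illustration; not part of Hadoop or HBase.
role_order() {
    case $1 in
        add)    printf 'datanode\nregionserver\n' ;;  # storage first
        remove) printf 'regionserver\ndatanode\n' ;;  # HBase first
        *)      echo "usage: role_order add|remove" >&2
                return 1 ;;
    esac
}

# Example usage:
# role_order remove
```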


