A question worth settling first: can an HBase RegionServer and a Hadoop DataNode be deployed on the same server? If so, is the relationship one-to-one?
Deploying them on the same server reduces the amount of data that has to travel across the network. The relationship is not one-to-one, however. First, HDFS stores N replicas of the data (3 by default), so the data is spread over three DataNodes; even a RegionServer that serves a single region can therefore interact with three DataNodes. Second, a single RegionServer can serve multiple regions.
Reference: https://www.zhihu.com/question/20376001/answer/15602027
Next, let's look at adding and removing nodes dynamically.
1. Adding a DataNode to HDFS
1> Prepare the operating system of the new node, install the required software, and set up passwordless SSH login
2> Update the following configuration files on every node
$HBASE_HOME/conf/regionservers
$HADOOP_HOME/etc/hadoop/slaves
/etc/hosts
3> Run the following commands on the new node
hadoop-daemon.sh start datanode
yarn-daemon.sh start nodemanager
4> Refresh the node list and check the cluster report
yarn rmadmin -refreshNodes
hdfs dfsadmin -report
5> Set the balancer bandwidth and run the balancer. The balancer is generally not run on the master node, to avoid affecting the business workload; a dedicated balancer node can be used.
hdfs dfsadmin -setBalancerBandwidth 1048576
# If a DataNode's disk usage is more than 5% above the average, blocks are moved to DataNodes below the average
start-balancer.sh -threshold 5
Description
Over time, the block distribution across DataNodes becomes increasingly unbalanced, which reduces MapReduce data locality and makes some DataNodes busier than others.
The balancer is a Hadoop daemon that moves blocks from busy DataNodes to relatively idle ones while still honoring the block replica placement policy, which spreads replicas over different machines and racks.
The balancer runs until the usage of each DataNode is within a given margin of the overall cluster usage. The margin is set with the -threshold parameter and is 10% by default.
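The threshold rule can be sketched as a small shell function. This only illustrates the comparison and is not the balancer's actual code; the function name classify_datanode and the use of whole-number percentages are assumptions made for the example.

```shell
# classify_datanode NODE_PCT CLUSTER_PCT THRESHOLD   (hypothetical helper)
# Compares one DataNode's usage with the cluster average, in whole percent.
classify_datanode() (
  node_pct=$1
  cluster_pct=$2
  threshold=$3
  if [ "$node_pct" -gt $(( $cluster_pct + $threshold )) ]; then
    echo "over-utilized"     # blocks would be moved away from this node
  elif [ "$node_pct" -lt $(( $cluster_pct - $threshold )) ]; then
    echo "under-utilized"    # blocks would be moved onto this node
  else
    echo "balanced"
  fi
)

classify_datanode 66 60 10   # prints: balanced
classify_datanode 66 60 5    # prints: over-utilized
```

With the default threshold of 10 the node at 66% is left alone; lowering the threshold to 5, as in step 5> above, makes the same node a source of block moves.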
The bandwidth for copying data between nodes is limited, by default to 1 MB/s. It can be changed with the dfs.balance.bandwidthPerSec property (in bytes per second) in hdfs-site.xml; newer Hadoop releases name this property dfs.datanode.balance.bandwidthPerSec.
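To get a feeling for what the bandwidth limit means, here is a rough back-of-the-envelope sketch. It assumes a single stream running at exactly the configured limit, which real clusters will not match; the function name estimate_balance_seconds is hypothetical.

```shell
# estimate_balance_seconds BYTES_TO_MOVE BANDWIDTH_BYTES_PER_SEC   (hypothetical helper)
estimate_balance_seconds() (
  bytes=$1
  bandwidth=$2
  # Integer division rounded up, so a partial second still counts
  echo $(( ($bytes + $bandwidth - 1) / $bandwidth ))
)

# Moving 10 GiB at the default limit of 1048576 bytes/s
estimate_balance_seconds $(( 10 * 1024 * 1024 * 1024 )) 1048576   # prints: 10240
```

10240 seconds is a little under three hours, which is why the bandwidth is often raised before a large rebalance.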
It is recommended to run the balancer regularly, for example daily or weekly.
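The configuration change in step 2> amounts to adding the new hostname to each node-list file. A minimal sketch of doing that without creating duplicates is shown below; the helper name add_host and the hostname data4 are made up for the example, and the demonstration writes to a temporary file rather than the real slaves or regionservers file.

```shell
# add_host FILE HOSTNAME   (hypothetical helper)
# Appends HOSTNAME to FILE unless an identical line is already present.
add_host() (
  file=$1
  host=$2
  touch "$file"
  # -q quiet, -x match the whole line, -F treat the hostname as a fixed string
  if ! grep -qxF "$host" "$file"; then
    echo "$host" >> "$file"
  fi
)

# Demonstration on a temporary file
demo_file=$(mktemp)
add_host "$demo_file" data4
add_host "$demo_file" data4   # second call changes nothing
cat "$demo_file"              # prints: data4
```

The same call would be repeated for $HBASE_HOME/conf/regionservers and $HADOOP_HOME/etc/hadoop/slaves on every node.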
2. Removing a DataNode from HDFS
1> Add the hostname of the node to be removed to the $HADOOP_HOME/etc/hadoop/excludes file, usually on the master node, where the commands are run.
2> Refresh the node lists
hdfs dfsadmin -refreshNodes
yarn rmadmin -refreshNodes
3> After the node has been removed, delete it from the following configuration files
$HBASE_HOME/conf/regionservers
$HADOOP_HOME/etc/hadoop/slaves
/etc/hosts
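The cleanup in step 3> can be sketched as a helper that drops one hostname from a node-list file. The name remove_host is hypothetical, and the demonstration runs against a temporary file with made-up hostnames rather than the real configuration files.

```shell
# remove_host FILE HOSTNAME   (hypothetical helper)
# Deletes every line of FILE that is exactly HOSTNAME.
remove_host() (
  file=$1
  host=$2
  tmp=$(mktemp)
  # grep -v exits with status 1 when no lines remain, so its status is ignored
  grep -vxF "$host" "$file" > "$tmp" || true
  cat "$tmp" > "$file"   # overwrite in place so the file keeps its permissions
  rm -f "$tmp"
)

# Demonstration on a temporary file
demo_list=$(mktemp)
printf 'data1\ndata2\ndata3\n' > "$demo_list"
remove_host "$demo_list" data2
cat "$demo_list"   # prints data1 and data3, one per line
```

Matching whole lines with -x matters here: removing data1 must not also remove data10.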
Note: a DataNode that stays in the Decommission In Progress state when it is removed from a Hadoop cluster
In a small cluster (for example, 3 machines), if the number of DataNodes is smaller than the replication factor of the files (3 by default), the node being removed can stay in the Decommission In Progress state indefinitely.
Hadoop does not handle this case, because in large clusters the number of DataNodes is rarely smaller than the replication factor.
The workaround is to lower the replication factor to 1 or 2 and then exclude one of the three DataNodes again.
The command below changes the replication factor of existing files. Doing this after the fact is generally not recommended; the replication factor should be tuned in the configuration file in advance to avoid the problem.
hdfs dfs -setrep -R -w 2 /file
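The condition described in this note can be checked before starting a decommission. The sketch below assumes the only rule is that the DataNodes left over must be at least as many as the replication factor; the function name can_decommission is hypothetical and the node counts are passed in by hand.

```shell
# can_decommission LIVE_DATANODES REPLICATION   (hypothetical helper)
# Succeeds when removing one DataNode still leaves enough nodes for every replica.
can_decommission() (
  live=$1
  replication=$2
  remaining=$(( $live - 1 ))
  [ "$remaining" -ge "$replication" ]
)

if can_decommission 3 3; then
  echo "safe to decommission"
else
  echo "would stall: lower the replication factor first"
fi
# prints: would stall: lower the replication factor first
```

With 3 DataNodes and a replication factor of 2 the same check passes, which matches the workaround above.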
3. Adding a RegionServer node to HBase
1> Run the following command to start the RegionServer
hbase-daemon.sh start regionserver
2> On the newly started node,
open the HBase shell and apply the following setting:
balance_switch true
4. Removing a RegionServer node from HBase
1> Run the command
graceful_stop.sh data1
2> The HBase balancer is now turned off, so on one of the other RegionServer nodes
open the HBase shell and check the HBase status,
then turn the balancer back on:
balance_switch true
Also note the order of operations. If a node is both a DataNode and a RegionServer, remove the RegionServer first and the DataNode second. Conversely, when adding a node, set it up as a DataNode first and as a RegionServer second.
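The removal order can be written down as a dry-run plan that only prints the commands instead of running them. The helper name plan_node_removal is hypothetical, and the hdfs dfsadmin -refreshNodes line is an assumption about how the NameNode is told to re-read the excludes file.

```shell
# plan_node_removal HOSTNAME   (hypothetical helper; prints commands, runs nothing)
plan_node_removal() (
  host=$1
  # RegionServer first ...
  echo "graceful_stop.sh $host"
  # ... then the DataNode
  echo "echo $host >> \$HADOOP_HOME/etc/hadoop/excludes"
  echo "hdfs dfsadmin -refreshNodes"
  echo "yarn rmadmin -refreshNodes"
)

plan_node_removal data1
```

Printing the plan first makes it easy to review the order before anything touches the cluster; adding a node would follow the reverse order, DataNode first and RegionServer second.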
HDFS and HBase: dynamically adding and removing nodes