Alibabacloud.com offers a wide variety of articles about Hadoop cluster capacity planning; you can easily find Hadoop cluster capacity planning information here online.
During online Hadoop cluster operations and maintenance, Hadoop's balancer tool is usually used to even out the distribution of file blocks across the DataNodes in the cluster, to avoid high disk usage on some DataNodes (a problem that can also drive those nodes' CPU usage higher than that of other servers).
1) Standard configuration specifications for DataNode/TaskTracker (MR1) worker nodes:
12-24 1-4 TB hard disks (JBOD)
2 quad-/hex-/octo-core CPUs, running at least 2-2.5 GHz
64-512 GB of RAM
Bonded Gigabit Ethernet (higher storage density requires more network throughput)
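As a rough capacity-planning sanity check, worker specs like those above can be translated into usable HDFS space. The sketch below uses assumed example values (12 disks of 4 TB across 10 nodes, a 25% reserve for temporary/intermediate MapReduce output, and the HDFS default replication factor of 3); they are illustrations, not recommendations:

```shell
# Assumed example values for a back-of-the-envelope sizing calculation.
disks=12          # disks per worker node (assumed)
disk_tb=4         # TB per disk (assumed)
nodes=10          # worker nodes in the cluster (assumed)
replication=3     # HDFS default replication factor

raw_tb=$(( disks * disk_tb * nodes ))        # total raw capacity
after_temp_tb=$(( raw_tb * 75 / 100 ))       # reserve ~25% for temp/intermediate data
usable_tb=$(( after_temp_tb / replication )) # divide by replication for user data
echo "raw=${raw_tb}TB usable=${usable_tb}TB"
```

With these assumptions, 480 TB of raw disk yields only about 120 TB of user-visible HDFS space, which is why capacity plans should always work backward from replicated, post-reserve capacity rather than raw disk totals.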
Standard configuration specifications for NameNode/JobTracker (MR1) in a Hadoop cluster:
3-6 1 TB hard disks (JBOD)
2 quad-/hex-/octo-core CPUs, running at least 2-2.5 GHz
performance will be better. This is why the configuration proposed in the previous article favors many small disks: X 1 TB disks perform better than X 3 TB disks. The space constraints inside a blade server tend to limit how many hard drives can be added. From this it is easier to see why Hadoop is said to run on commodity standalone servers, with its deliberately shared-nothing architecture: tasks are independent, and I/O is independent.
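The many-small-disks argument is simple spindle arithmetic. Assuming, purely for illustration, about 100 MB/s of sequential throughput per spindle, and comparing 12 x 1 TB against 4 x 3 TB (both 12 TB raw):

```shell
# Same 12 TB of raw capacity, very different aggregate bandwidth.
# Per-disk throughput and disk counts are illustrative assumptions.
per_disk_mb_s=100
many=$(( 12 * per_disk_mb_s ))   # 12 x 1 TB drives
few=$(( 4 * per_disk_mb_s ))     # 4 x 3 TB drives
echo "12x1TB: ${many} MB/s aggregate; 4x3TB: ${few} MB/s aggregate"
```

Three times the spindles means roughly three times the aggregate sequential bandwidth for the same capacity, which matters far more to MapReduce scan workloads than per-disk size.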
reaches equilibrium. What is equilibrium? Each DataNode has a usage (the percentage of that node's capacity that is in use), and the cluster has a utilization (the percentage of the cluster's total capacity that is in use); if each node's usage is close to the cluster's utilization, the cluster is balanced.
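The equilibrium test reduces to simple arithmetic: a node counts as balanced when the absolute difference between its usage percentage and the cluster's utilization percentage is within the balancer's threshold. The usage figures below are assumed examples; 10 is the balancer's default threshold:

```shell
threshold=10            # balancer threshold, in percentage points (default 10)
cluster_used_pct=55     # cluster-wide used space / total capacity (assumed example)
node_used_pct=62        # one DataNode's used space / its capacity (assumed example)

diff=$(( node_used_pct - cluster_used_pct ))
if [ "$diff" -lt 0 ]; then diff=$(( -diff )); fi   # absolute value

if [ "$diff" -le "$threshold" ]; then
  echo "node is balanced"
else
  echo "node needs rebalancing"   # the balancer would move blocks to/from it
fi
# The real tool is invoked on the cluster as: hadoop balancer -threshold 10
```

Here the node is 7 points above the cluster average, inside the 10-point threshold, so the balancer would leave it alone.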
Add hard disks to the Hadoop cluster.
Expanding hard disk space on Hadoop worker nodes
After receiving the task from the boss: the hard disk space in the Hadoop cluster is insufficient, and a machine needs to be added to the Hadoop cluster.
job pool, while capacity scheduling allocates TaskTrackers (nodes in the cluster) among queues: multiple queues are configured, and each queue is assigned a minimum number of TaskTrackers. As in a fair scheduling policy, when a queue has idle TaskTrackers, the scheduler lends them to other queues; when TaskTrackers become idle again, since there may be multiple queues that do not yet have their minimum, the scheduler must choose which queue to return them to.
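A minimal sketch of how queues and their guaranteed shares are declared for the MR1 capacity scheduler. The queue names ("production", "research") and percentages are assumed examples, and the file is written locally here purely for illustration; on a real cluster this content belongs in conf/capacity-scheduler.xml:

```shell
# Write an example capacity-scheduler config to a local file (illustration only).
cat > capacity-scheduler-example.xml <<'EOF'
<configuration>
  <!-- "production" queue is guaranteed 70% of the cluster's task slots -->
  <property>
    <name>mapred.capacity-scheduler.queue.production.capacity</name>
    <value>70</value>
  </property>
  <!-- "research" queue is guaranteed 30%, but may borrow idle slots -->
  <property>
    <name>mapred.capacity-scheduler.queue.research.capacity</name>
    <value>30</value>
  </property>
</configuration>
EOF
echo "wrote capacity-scheduler-example.xml"
```

The guarantees are minimums, not caps: idle slots from one queue are lent to the other, and reclaimed as the lending queue's own demand returns, which is exactly the behavior described above.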
Sometimes, because of a temporary adjustment, it may be necessary to remove a DataNode from the Hadoop cluster. The steps are as follows:
First, add the machine name of the node you want to delete to /etc/hadoop/conf/dfs.exclude.
On the console page, the node will then appear among the dead DataNodes.
Then refresh the node information with the command:
[HDFS@HMC ~]$ Hadoop
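The decommission steps above can be sketched as follows. The hostname is an assumed example, and the exclude list is written to a local stand-in file here so the sketch is harmless to run; on a real cluster the path is /etc/hadoop/conf/dfs.exclude and the dfsadmin command is run as the HDFS user:

```shell
# Local stand-in for /etc/hadoop/conf/dfs.exclude (illustration only).
EXCLUDE=./dfs.exclude
# Hostname below is an assumed example, not from the article.
echo "worker03.example.com" >> "$EXCLUDE"
cat "$EXCLUDE"
# On a real cluster, tell the NameNode to re-read its include/exclude lists:
#   hadoop dfsadmin -refreshNodes
# The node then shows as decommissioning, and appears among the dead
# DataNodes on the console once its blocks have been re-replicated elsewhere.
```

Waiting for re-replication to finish before powering the node off is what keeps the operation safe: the NameNode copies the node's blocks to other DataNodes before it is truly removed.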