A Deep Dive into Hadoop Node Deployment Strategies
Each Hadoop cluster includes more than one node. These nodes can be grouped into several groups, such as the master group, the worker group, and the client group, and the nodes in each group have different resource requirements for CPU, memory, and storage.
A virtual data center (DataCenter) contains one or more virtual clusters (Cluster), and each virtual cluster contains one or more hosts (Host).
Node deployment, then, is the task of placing every node of the Hadoop cluster onto the physical hosts according to some policy, while guaranteeing that each node's resource requirements are met.
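To make this hierarchy concrete, here is a minimal sketch in Python. The class names (DataCenter, Cluster, Host) mirror the terms above, but the fields and structure are illustrative assumptions for this article, not an actual vSphere or Serengeti API.

# A minimal sketch of the virtualization hierarchy described above; the
# class names mirror the article's terms, but the fields are illustrative
# assumptions, not a real vSphere or Serengeti API.
from dataclasses import dataclass, field
from typing import List

@dataclass
class Host:
    name: str
    cpus: int        # available virtual CPUs
    mem_mb: int      # available memory in MB
    disk_gb: int     # remaining disk space in GB

@dataclass
class Cluster:       # a virtual cluster groups physical hosts
    name: str
    hosts: List[Host] = field(default_factory=list)

@dataclass
class DataCenter:    # a virtual data center groups virtual clusters
    name: str
    clusters: List[Cluster] = field(default_factory=list)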
Defining the Hadoop cluster
A Hadoop cluster can be defined through the UI or through a file in JSON format. The following JSON example illustrates the available parameters.
{
  "nodeGroups": [
    {
      "name": "master",
      "roles": [
        "hadoop_namenode",
        "hadoop_jobtracker"
      ],
      "instanceNum": 1,
      "instanceType": "SCM",
      "cpuNum": 2,
      "memCapacityMB": 4096,
      "storage": {
        "type": "SHARED",
        "sizeGB": 20
      },
      "haFlag": "on",
      "rpNames": [
        "rp1"
      ]
    },
The file fragment above defines the master node group ("name"), which contains one node ("instanceNum") and requires 2 virtual CPUs per node ("cpuNum"), 4096 MB of memory ("memCapacityMB"), and 20 GB of shared storage space ("storage").
A simple node deployment strategy
There are a number of strategies for deploying nodes; the simplest is introduced first. In general, disk space is the most constrained resource for a Hadoop node, so the most basic deployment strategy is driven by disk space requirements. First, all Hadoop nodes are sorted from largest to smallest by required disk space, and all available hosts are sorted from largest to smallest by remaining disk space. The node with the largest disk space requirement is then deployed to the host with the most remaining disk space, the node with the second-largest requirement to the host with the second-most remaining disk space, and so on. If a host's remaining disk space is not enough for its paired node, an earlier host in the list, which has more remaining disk space, is considered instead. The goal of this strategy is to spread the Hadoop nodes across all available hosts, so that each node has more resources at its disposal.
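Below is a runnable sketch of this disk-space-first strategy in Python. The names (NodeRequest, HostState, place_nodes) are illustrative assumptions for this article, not part of any real Hadoop management API. It pairs the i-th largest node with the i-th largest host and falls back to an earlier, larger host when the paired one cannot fit the node.

# A sketch of the disk-space-first deployment strategy described above.
# All class and function names here are illustrative assumptions.
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class NodeRequest:
    name: str
    disk_gb: int         # required disk space (the dominant resource)

@dataclass
class HostState:
    name: str
    free_disk_gb: int    # remaining disk space on the physical host

def place_nodes(nodes: List[NodeRequest], hosts: List[HostState]) -> Dict[str, str]:
    """Map each node to a host: the i-th largest node goes onto the i-th
    largest host, falling back to an earlier (larger) host when needed."""
    placement: Dict[str, str] = {}
    nodes = sorted(nodes, key=lambda n: n.disk_gb, reverse=True)
    hosts = sorted(hosts, key=lambda h: h.free_disk_gb, reverse=True)
    for i, node in enumerate(nodes):
        idx = i % len(hosts)             # pair node i with host i, wrapping
        while idx >= 0 and hosts[idx].free_disk_gb < node.disk_gb:
            idx -= 1                     # fall back toward larger hosts
        if idx < 0:
            raise RuntimeError(f"no host has {node.disk_gb} GB free for {node.name}")
        hosts[idx].free_disk_gb -= node.disk_gb
        placement[node.name] = hosts[idx].name
    return placement

if __name__ == "__main__":
    nodes = [NodeRequest("worker-1", 200), NodeRequest("worker-2", 150),
             NodeRequest("master", 20)]
    hosts = [HostState("host-a", 500), HostState("host-b", 180)]
    print(place_nodes(nodes, hosts))
    # {'worker-1': 'host-a', 'worker-2': 'host-b', 'master': 'host-a'}

Running the example places worker-1 and the master on host-a and worker-2 on host-b, illustrating how the strategy spreads nodes across all available hosts rather than filling one host first.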