A closer look at Hadoop node deployment strategies

Source: Internet
Author: User
Tags json

Brief introduction

A Hadoop cluster consists of multiple nodes. These nodes can be organized into groups, such as the master group, the worker group, the client group, and so on. Each group has its own resource requirements for CPU, memory, and storage.

A virtual data center (DataCenter) contains one or more virtual clusters (Cluster), and each virtual cluster contains one or more hosts (Host).

Node deployment means placing all nodes of a Hadoop cluster onto physical hosts according to a given policy, while guaranteeing that the resource requirements of every node are met.

Defining Hadoop clusters

A Hadoop cluster can be defined through the UI or in a JSON file. The following fragment of a JSON file illustrates the parameters that are available.

 1  {
 2    "nodeGroups": [
 3      {
 4        "name": "master",
 5        "roles": [
 6          "hadoop_namenode",
 7          "hadoop_jobtracker"
 8        ],
 9        "instanceNum": 1,
10        "instanceType": "LARGE",
11        "cpuNum": 2,
12        "memCapacityMB": 4096,
13        "storage": {
14          "type": "SHARED",
15          "sizeGB": 20
16        },
17        "haFlag": "on",
18        "rpNames": [
19          "rp1"
20        ]
21      },

The file fragment above defines the master node group (line 4, "name"). The group contains one node (line 9, "instanceNum"), and each node has 2 virtual CPUs (line 11, "cpuNum"), 4096 MB of memory (line 12, "memCapacityMB"), and 20 GB of shared storage (lines 13-16, "storage").
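As an illustration of how such a definition might be consumed, the following minimal Python sketch expands each node group into per-node resource requirements. The fragment is closed off here so that it parses as valid JSON. The camelCase key spelling, the function name node_requirements, and the "group-index" naming of nodes are assumptions made for this example, not part of any documented tool.

```python
import json

# A completed version of the fragment above (array and object closed so it parses).
SPEC = """
{
  "nodeGroups": [
    {
      "name": "master",
      "roles": ["hadoop_namenode", "hadoop_jobtracker"],
      "instanceNum": 1,
      "cpuNum": 2,
      "memCapacityMB": 4096,
      "storage": {"type": "SHARED", "sizeGB": 20}
    }
  ]
}
"""


def node_requirements(spec_text):
    """Expand every node group into one requirement record per node."""
    spec = json.loads(spec_text)
    nodes = []
    for group in spec["nodeGroups"]:
        for index in range(group["instanceNum"]):
            nodes.append({
                # Hypothetical naming scheme: "<group name>-<index>".
                "name": f'{group["name"]}-{index}',
                "cpu": group["cpuNum"],
                "mem_mb": group["memCapacityMB"],
                "disk_gb": group["storage"]["sizeGB"],
            })
    return nodes


requirements = node_requirements(SPEC)
print(requirements)
```

For the master group above this yields a single record with 2 CPUs, 4096 MB of memory, and 20 GB of disk.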

Simple node deployment strategy

There are several strategies for node deployment; the simplest one is introduced first. Because disk space is usually the scarcest resource for a Hadoop node, the most basic deployment strategy is driven by disk space. First, all Hadoop nodes are sorted in descending order of required disk space, and the available hosts are sorted in descending order of remaining disk space. In the first step, the Hadoop node with the largest disk requirement is deployed to the host with the most remaining disk space. In the second step, the node with the second-largest disk requirement is deployed to the host with the second-most remaining disk space, and so on. If the remaining disk space of the current host is not enough to hold the node, the earlier hosts in the ordering, which started with more remaining disk space, are considered instead. The goal of this strategy is to spread the Hadoop nodes across as many of the available hosts as possible, so that each node has more resources at its disposal.
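The steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the actual implementation of any deployment tool: the Host class, the place_by_disk function, the wrap-around when there are more nodes than hosts, and the final fallback to later hosts are all choices made for this example.

```python
from dataclasses import dataclass


@dataclass
class Host:
    name: str
    free_gb: int


def place_by_disk(nodes, hosts):
    """Place (node_name, size_gb) pairs on hosts, largest requirement first.

    Note: the free_gb field of the given Host objects is updated in place.
    """
    if not hosts:
        raise ValueError("no hosts available")
    ordered_nodes = sorted(nodes, key=lambda n: n[1], reverse=True)
    # Host order is fixed once, by the initial remaining disk space.
    ordered_hosts = sorted(hosts, key=lambda h: h.free_gb, reverse=True)
    count = len(ordered_hosts)
    placement = {}
    for step, (node_name, size_gb) in enumerate(ordered_nodes):
        # Step k targets the k-th host; wrapping around is an assumption.
        target = step % count
        # Try the target first, then earlier hosts, then (assumption) later ones.
        candidates = list(range(target, -1, -1)) + list(range(target + 1, count))
        for index in candidates:
            host = ordered_hosts[index]
            if host.free_gb >= size_gb:
                host.free_gb -= size_gb
                placement[node_name] = host.name
                break
        else:
            raise ValueError(f"no host has {size_gb} GB free for {node_name}")
    return placement


hosts = [Host("A", 100), Host("B", 80), Host("C", 30)]
nodes = [("dn1", 50), ("dn2", 50), ("dn3", 40)]
placement = place_by_disk(nodes, hosts)
print(placement)
```

In this example dn1 goes to host A and dn2 to host B. The third node, dn3, targets host C, which has only 30 GB free, so the sketch falls back through B (30 GB left) to A (50 GB left) and places it there.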

Node deployment strategy for data and compute separation

As Hadoop has evolved, it has become increasingly popular to deploy Hadoop data nodes (DataNode) and compute nodes (ComputeNode) separately, in order to gain flexibility and achieve true multi-tenancy. In this mode, for best performance, data nodes and compute nodes should be placed on the same host wherever possible. The simple node deployment strategy described earlier does not take this requirement into account, so a poor placement such as the following can occur.

Two data nodes (DN) are placed on one host, while two compute nodes (CN) are placed on hosts that have no data node.
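A placement like this can be detected with a simple check. The following Python sketch lists the compute nodes that share no host with a data node. The function name, the "DN"/"CN" role labels, and the host names are hypothetical and chosen only for this illustration.

```python
def compute_nodes_without_local_data(placement, roles):
    """Return the compute nodes placed on hosts that hold no data node.

    placement maps node name -> host name; roles maps node name -> "DN" or "CN".
    """
    hosts_with_data = {
        host for node, host in placement.items() if roles[node] == "DN"
    }
    return sorted(
        node
        for node, host in placement.items()
        if roles[node] == "CN" and host not in hosts_with_data
    )


# The scenario described above: both data nodes share one host,
# and each compute node sits on a host without any data node.
placement = {"dn1": "host1", "dn2": "host1", "cn1": "host2", "cn2": "host3"}
roles = {"dn1": "DN", "dn2": "DN", "cn1": "CN", "cn2": "CN"}
misplaced = compute_nodes_without_local_data(placement, roles)
print(misplaced)
```

Here both compute nodes are reported, because neither shares a host with a data node.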
