An overview of Hadoop cluster deployment models

vSphere Big Data Extensions (BDE) supports several deployment models for building Hadoop clusters:


Storage/compute combined model: storage nodes (DataNode) and compute nodes (TaskTracker) are deployed in the same virtual machine. This is the most straightforward deployment model and is well suited to proof-of-concept work and data processing in small clusters.


Compute-only model: a cluster type that deploys only compute nodes (JobTracker and TaskTracker).


Storage/compute separated model: storage nodes (DataNode) and compute nodes (TaskTracker) are deployed in different virtual machines, and the cluster topology is laid out across the vSphere ESX physical hosts by a placement algorithm chosen according to specific business requirements.


Customized cluster: users can tailor the cluster's deployment structure, resource model, and configuration parameters to their specific business requirements.


This article focuses on the first two deployment models: the storage/compute combined model and the compute-only model.


Storage/compute combined model (data-compute combined deployment)


Typically, the storage (DataNode) and compute (TaskTracker) services are deployed on the same node to satisfy the Hadoop cluster's need for data locality. Big Data Extensions provides this deployment pattern as well.


You can use the BDE command-line tool to create a basic default Hadoop cluster by running the cluster create --name cluster_name command.
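For example, a minimal Serengeti CLI session might look like the sketch below; the management server address serengeti.example.com and the cluster name mycluster are placeholders, not values from this article:

  connect --host serengeti.example.com:8443
  cluster create --name mycluster
  cluster list --name mycluster

The connect command authenticates against the BDE management server, and cluster list lets you verify the nodes that were provisioned.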


Such a cluster includes one master node running the NameNode and JobTracker of Apache Hadoop 1.2.1 (the default Hadoop distribution built into BDE 1.0), three worker nodes running the DataNode and TaskTracker, and one client node running the Hadoop client, Pig, and Hive.


Here the DataNode and TaskTracker run within the same virtual machine node; this is the storage/compute combined model.


Compute-only model (compute-only deployment)


If your production or development environment already has an HDFS holding terabytes of analytical data, and the business analytics team needs to explore new patterns as new business requirements arrive, you can build a compute-only cluster against that data.


A compute-only cluster deploys only the MapReduce services, namely the JobTracker and TaskTracker, and links them to an existing HDFS. This has several advantages. First, it avoids the large-scale data copying or migration that building a complete cluster would require, reducing the time spent waiting for a development environment: the cluster is usable as soon as it is deployed, which makes it a good fit for temporary development and test environments. Second, it provides performance, security, and fault isolation between different compute clusters. Where compatibility allows, you can also use a third-party commercial HDFS implementation such as Isilon.


For a compute-only cluster, you can also use automatic elasticity (auto-elasticity) to scale compute resources dynamically.
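Elasticity is enabled per cluster from the Serengeti CLI; the sketch below assumes the cluster setParam syntax, and the cluster name and node count are placeholders (the exact flags vary between BDE releases):

  cluster setParam --name compute_only_cluster --elasticityMode auto --minComputeNodeNum 1

In automatic mode, BDE powers compute worker nodes on and off in response to load while keeping at least the configured minimum number of nodes running.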


The cluster definition file for this deployment is listed below; you can use it to create a compute-only cluster. The externalHDFS field specifies the existing HDFS service to use. Assign the hadoop_jobtracker role to the master node group and the hadoop_tasktracker role to the worker node group. For the HDFS cluster specified by externalHDFS, port_num defaults to 8020; for Hadoop 2.0 clusters such as CDH4 or Pivotal HD, port_num defaults to 9000. In a cluster definition file, the externalHDFS field cannot coexist with the hadoop_namenode or hadoop_datanode roles; otherwise, cluster creation may fail or the resulting cluster may not function correctly.


  {
    "externalHDFS": "hdfs://<hostname-of-namenode>:<port_num>",
    "nodeGroups": [
      {
        "name": "master",
        "roles": [
          "hadoop_jobtracker"
        ],
        "instanceNum": 1,
        "cpuNum": 2,
        "memCapacityMB": 7500
      },
      {
        "name": "worker",
        "roles": [
          "hadoop_tasktracker"
        ],
        "instanceNum": 4,
        "cpuNum": 2,
        "memCapacityMB": 7500,
        "storage": {
          "type": "LOCAL",
          "sizeGB": 20
        }
      },
      {
        "name": "client",
        "roles": [
          "hadoop_client",
          "hive",
          "pig"
        ],
        "instanceNum": 1,
        "cpuNum": 1,
        "storage": {
          "type": "LOCAL",
          "sizeGB": 50
        }
      }
    ]
  }
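Once the definition above is saved to a file, you can create the compute-only cluster by passing the file to the CLI; a sketch, assuming the spec is saved as compute_only_cluster.json (the path and cluster name are placeholders):

  cluster create --name compute_only_cluster --specFile /path/to/compute_only_cluster.json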