A closer look at the Hadoop disk deployment algorithm

A Hadoop cluster contains different types of nodes, and they have different disk requirements. The master node prioritizes storage reliability, while data nodes need better read/write performance and larger capacity.


In a virtual cluster, storage (a datastore) falls into two types: local and shared. Local storage can be accessed only by virtual machines on the host where it resides, while shared storage can also be accessed by virtual machines on other hosts. Local storage offers better read/write performance, and shared storage is more reliable.

The disk deployment algorithm provides the optimal storage scheme for each type of Hadoop node based on user input.
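The selection rule described above can be sketched roughly as follows. This is a minimal illustration, not Serengeti's actual implementation: the function name `preferred_storage`, the `StorageType` enum, and the `MASTER_ROLES` set are hypothetical, and the fallback to shared storage for data nodes is an assumption.

```python
from enum import Enum


class StorageType(Enum):
    LOCAL = "LOCAL"
    SHARED = "SHARED"


# Hypothetical set of roles treated as "master" roles in this sketch.
MASTER_ROLES = {"hadoop_namenode", "hadoop_jobtracker"}


def preferred_storage(roles, available):
    """Pick a storage type for a node group.

    roles     -- iterable of role names in the node group
    available -- set of StorageType values the cluster can offer
    """
    is_master = any(role in MASTER_ROLES for role in roles)
    if is_master:
        # Master nodes favor reliability, so only shared storage qualifies.
        if StorageType.SHARED not in available:
            raise ValueError("master nodes need shared storage")
        return StorageType.SHARED
    # Data nodes favor throughput, so local storage is tried first.
    if StorageType.LOCAL in available:
        return StorageType.LOCAL
    # Assumption: fall back to shared storage when no local datastore exists.
    if StorageType.SHARED in available:
        return StorageType.SHARED
    raise ValueError("no datastore available")
```

For example, `preferred_storage(["hadoop_namenode"], {StorageType.LOCAL, StorageType.SHARED})` returns `StorageType.SHARED`, while a group containing only a data node role would get `StorageType.LOCAL` from the same cluster.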


First, consider the master node in the Hadoop cluster. Shared storage is necessary here because the master node requires higher reliability and is typically configured with the vSphere High Availability (HA) and Fault Tolerance (FT) features. The following fragment of a JSON-formatted configuration file shows how to specify the storage for the master node group.


 1  {
 2    "nodeGroups": [
 3      {
 4        "name": "Master",
 5        "roles": [
 6          "hadoop_namenode",
 7          "hadoop_jobtracker"
 8        ],
 9        "instanceNum": 1,
10        "instanceType": "SCM",
11        "cpuNum": 2,
12        "memCapacityMB": 4096,
13        "storage": {
14          "type": "SHARED",
15          "sizeGB": 20
16        },
17        "haFlag": "on",
18        "rpNames": [
19          "RP1"
20        ]
21      },


The storage configuration starts at line 13. It specifies shared storage (the type setting on line 14) with a size of 20 GB. The HA flag on line 17 is switched on, which tells the system to use vSphere High Availability (HA) for this node group. When Serengeti allocates disks, it therefore assigns shared storage to the master node.
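The relationship between the HA flag and the storage type can be checked mechanically. The sketch below is only an illustration under stated assumptions: `check_node_group` and `lower_keys` are hypothetical helpers rather than part of Serengeti, the rule "HA requires shared storage" is taken from the explanation above, and key names are compared case-insensitively because their exact casing is assumed here.

```python
import json


def lower_keys(value):
    """Return a copy of a parsed JSON value with all dict keys lowercased."""
    if isinstance(value, dict):
        return {key.lower(): lower_keys(item) for key, item in value.items()}
    if isinstance(value, list):
        return [lower_keys(item) for item in value]
    return value


def check_node_group(group):
    """Return a list of problems found in one node group definition."""
    normalized = lower_keys(group)
    storage = normalized.get("storage", {})
    problems = []

    # HA relies on other hosts reaching the disks, so it needs shared storage.
    ha_on = str(normalized.get("haflag", "off")).lower() != "off"
    storage_type = str(storage.get("type", "")).upper()
    if ha_on and storage_type != "SHARED":
        problems.append("HA flag is set but storage type is not SHARED")

    size = storage.get("sizegb", 0)
    if not isinstance(size, (int, float)) or size <= 0:
        problems.append("storage size must be a positive number of GB")
    return problems


# The master node group from the fragment above, without line numbers.
spec = json.loads("""
{
  "nodeGroups": [
    {
      "name": "Master",
      "roles": ["hadoop_namenode", "hadoop_jobtracker"],
      "instanceNum": 1,
      "cpuNum": 2,
      "memCapacityMB": 4096,
      "storage": {"type": "SHARED", "sizeGB": 20},
      "haFlag": "on",
      "rpNames": ["RP1"]
    }
  ]
}
""")

for node_group in spec["nodeGroups"]:
    print(node_group["name"], check_node_group(node_group))  # prints: Master []
```

Changing the storage type to `LOCAL` while leaving the HA flag on would make the check report one problem, which mirrors why the master group is given shared storage.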


Next, consider the data nodes in the Hadoop cluster. These nodes perform a large number of disk reads and writes, and the system uses a different disk deployment algorithm depending on the type of storage available to the user. The discussion continues on the next page: http://www.linuxidc.com/Linux/2013-10/92032p2.htm