An in-depth look at the Hadoop disk deployment algorithm
A Hadoop cluster contains different types of nodes, and they place different demands on disk. The master node prioritizes storage reliability, while data nodes need better read and write performance and larger capacity.
In a virtual cluster, storage (datastores) falls into two types: local and shared. Local storage can be accessed only by virtual machines on the host where it resides, while shared storage can also be accessed by virtual machines on other hosts. Local storage offers better read and write performance, and shared storage is more reliable.
The disk deployment algorithm provides the optimal storage scheme for each type of Hadoop node based on user input.
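The idea can be illustrated with a minimal Python sketch. It is not Serengeti's actual implementation: the `Datastore` class, the role names `"master"` and `"data"`, and the fallback from local to shared storage for data nodes are all assumptions made for illustration. The sketch only encodes what the text states, namely that local datastores are visible to a single host, that master nodes need shared storage, and that data nodes benefit from local storage.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class StorageType(Enum):
    LOCAL = "LOCAL"
    SHARED = "SHARED"


@dataclass(frozen=True)
class Datastore:
    """Hypothetical model of a datastore; not a Serengeti type."""
    name: str
    type: StorageType
    host: Optional[str] = None  # owning host for LOCAL, None for SHARED


def accessible_from(ds: Datastore, host: str) -> bool:
    # Shared storage is reachable from any host; local storage only
    # from the host on which it resides.
    return ds.type is StorageType.SHARED or ds.host == host


def preferred_types(role: str) -> List[StorageType]:
    # Assumed policy: the master insists on shared storage for
    # reliability; data nodes prefer local storage for throughput and
    # fall back to shared storage if no local datastore is available.
    if role == "master":
        return [StorageType.SHARED]
    return [StorageType.LOCAL, StorageType.SHARED]


def pick_datastore(role: str, host: str,
                   datastores: List[Datastore]) -> Optional[Datastore]:
    """Return the first reachable datastore matching the role's preference."""
    candidates = [d for d in datastores if accessible_from(d, host)]
    for wanted in preferred_types(role):
        for d in candidates:
            if d.type is wanted:
                return d
    return None  # nothing suitable is reachable from this host
```

With one local datastore on each of two hosts plus one shared datastore, `pick_datastore("master", ...)` returns the shared one, while a data node on a host with its own local datastore gets that local datastore.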
First, consider the master node in the Hadoop cluster. Shared storage is required here because the master node needs higher reliability and is typically configured with the vSphere High Availability (HA) and Fault Tolerance (FT) features. The following fragment of a JSON-formatted configuration file shows how to specify storage for the master node group.
 1 {
 2   "nodegroups": [
 3     {
 4       "name": "Master",
 5       "Roles": [
 6         "Hadoop_namenode",
 7         "Hadoop_jobtracker"
 8       ],
 9       "Instancenum": 1,
10       "Instancetype": "SCM",
11       "Cpunum": 2,
12       "MEMCAPACITYMB": 4096,
13       "Storage": {
14         "type": "SHARED",
15         "SIZEGB": 20
16       },
17       "Haflag": "On",
18       "Rpnames": [
19         "RP1"
20       ]
21     },
The storage configuration starts at line 13. It specifies shared storage (line 14, "type": "SHARED") with a size of 20 GB. Line 17, "Haflag": "On", enables the vSphere High Availability (HA) feature. When allocating disks, Serengeti assigns shared storage to the master node.
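As a rough illustration of how such a fragment might be checked before deployment, the Python sketch below flags node groups that enable HA without requesting shared storage. It is an assumption-laden example rather than part of Serengeti: the function name `ha_storage_problems`, the second node group "Worker", and the "LOCAL" type value are hypothetical, and the fragment is trimmed and closed off so that it parses as complete JSON. The key names follow the spelling used in the fragment above.

```python
import json

# Trimmed spec, completed so that it is valid JSON. The "Worker" group
# is a hypothetical addition used to trigger the check.
SPEC = """
{
  "nodegroups": [
    {
      "name": "Master",
      "Storage": {"type": "SHARED", "SIZEGB": 20},
      "Haflag": "On"
    },
    {
      "name": "Worker",
      "Storage": {"type": "LOCAL", "SIZEGB": 50},
      "Haflag": "On"
    }
  ]
}
"""


def ha_storage_problems(spec: dict) -> list:
    """Return the names of node groups that turn HA on without SHARED storage."""
    problems = []
    for group in spec.get("nodegroups", []):
        ha_on = str(group.get("Haflag", "off")).lower() == "on"
        storage_type = str(group.get("Storage", {}).get("type", "")).upper()
        if ha_on and storage_type != "SHARED":
            problems.append(group.get("name", "<unnamed>"))
    return problems


problems = ha_storage_problems(json.loads(SPEC))
print(problems)  # prints ['Worker']
```

The master group passes because it pairs "Haflag": "On" with shared storage; the hypothetical worker group is reported because HA relies on storage that other hosts can reach.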
Next, consider the data nodes in the Hadoop cluster. These nodes perform a large volume of disk reads and writes, and the system uses a different disk deployment algorithm depending on the types of storage the user makes available. The article continues on the next page: http://www.linuxidc.com/Linux/2013-10/92032p2.htm