Hadoop clusters across machine rooms

Last Update:2014-12-22 Source: Internet

Author: User

This is a write-up of a talk given at the Alibaba Technology Carnival. I had considered similar problems while at Baidu, so the talk resonated with me, and I have organized the relevant content here.
First, out of respect for copyright, here are the original link and the author:
http://adc.alibabatech.org/carnival/history/schedule/2013/detail/main/286?video=0
The talk was given by the Alibaba engineer Wuwei.
First of all, it should be said that cross-room Hadoop probably does not have many applications. Giants in China such as BAT (Baidu, Alibaba, Tencent) may need it, but most small and medium-sized companies probably do not, so for them this may be of little practical use.
The topic is divided into three parts: the first is the background of the problem, the second is the technical challenges to be solved, and the third is the final solution.
(i) Background:
Why build one large cluster that spans machine rooms?
The advantage of one large cluster is that data is easy to manage and authorize (an important concern in a large company with many departments) and easy to use across departments without repeatedly pulling copies of it.
Once the cluster reaches a certain scale, a single machine room (whose capacity is limited) can no longer meet the cluster's needs. To solve the problem once and for all, a cross-room Hadoop cluster has to be built.
(ii) Technical challenges:
2.1 NameNode performance problem:
When managing a huge Hadoop cluster, the original NameNode is a single node and becomes a performance bottleneck. The problems fall into two areas: storage capacity (holding the metadata) and compute pressure (processing RPC requests; changes to the in-memory namespace tree require a global lock).
The storage capacity problem can be addressed by scaling memory vertically, but the compute pressure is hard to relieve with better hardware, because vendors are mainly moving toward more cores rather than higher clock frequencies, and extra cores do little for work that is serialized behind a global lock.
2.2 Network restrictions between rooms:
The network between machine rooms is always a hard physical limit. Cross-room network transmission brings latency and bandwidth constraints:
1, Latency is generally within 10 ms. Since Hadoop mostly runs offline jobs, this is basically acceptable.
2, Bandwidth is the bigger problem: point-to-point bandwidth within a single room is generally 1 Gbps, while bandwidth between rooms is only around 20 Mbps, which is very limited.
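To make the bandwidth gap concrete, the following is a small back-of-the-envelope sketch using the two figures above. The 100 GB dataset size and the function name `transfer_hours` are assumptions chosen only for illustration, and protocol overhead is ignored.

```python
def transfer_hours(size_gb: float, link_mbps: float) -> float:
    """Hours needed to move size_gb gigabytes over a link of link_mbps megabits/s."""
    # Decimal units: 1 GB = 1000 MB, 1 byte = 8 bits.
    size_megabits = size_gb * 1000 * 8
    return size_megabits / link_mbps / 3600


DATASET_GB = 100  # hypothetical dataset size, not a figure from the talk

in_room_hours = transfer_hours(DATASET_GB, 1000)    # 1 Gbps inside one room
cross_room_hours = transfer_hours(DATASET_GB, 20)   # 20 Mbps between rooms

print(f"in-room:    {in_room_hours:.2f} h")
print(f"cross-room: {cross_room_hours:.2f} h")
print(f"slowdown:   {cross_room_hours / in_room_hours:.0f}x")
```

Under these assumptions the same copy takes about 50 times longer across rooms, which is why the rest of the design tries to avoid unplanned cross-room reads.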
2.3 Management between resource groups:
Each department can be viewed as a resource group, and groups may use each other's data. It is therefore important to plan where computation and storage are placed; otherwise large amounts of data will be copied between rooms.
(iii) Solution:
First, look at the overall architecture of the cross-room Hadoop cluster:

[Figure: architecture of the cross-room Hadoop cluster]

The architecture focuses on three points, which correspond to the three problems above:
1, There are two NNs (NameNodes), which actually belong to one Hadoop cluster. This is an existing industry solution, HDFS Federation, and it addresses the performance problem of the metadata node.
2, There is a CrossNode, which synchronizes data between the two rooms. It is designed with the network constraints between rooms in mind.
3, Finally, there are GroupA and GroupB, which address the relationship between data producers and data consumers.
3.1 Federation
For information on Federation, see:
http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/Federation.html#HDFS_Federation

Federation uses multiple independent NameNodes to scale the NameNode horizontally. The NameNodes do not need to coordinate with each other, and each DataNode registers with, and reports to, all of the NameNodes.
A block pool is the set of blocks that belongs to one NameNode's namespace, and each block pool is managed independently of the others.
In Federation, one issue needs attention: how can multiple NameNode addresses be made transparent to users? The solution used is directory mounting (the community has ViewFs to solve this problem). Anyone familiar with Linux or NFS will know the concept of mount, and that is what directory mounting means here.
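The mount idea can be sketched as a small lookup table that maps directory prefixes to NameNodes. This is only a simplified model of client-side mount resolution, not ViewFs's actual implementation; the host names, the mount points, and the `resolve` function are hypothetical.

```python
# Hypothetical mount table: directory prefix -> NameNode that owns it.
MOUNT_TABLE = {
    "/user": "hdfs://nn1.example.com:8020",
    "/data": "hdfs://nn2.example.com:8020",
}


def resolve(path: str, mounts: dict = MOUNT_TABLE) -> str:
    """Return the full URI for a logical path, based on its mount point."""
    for mount_point, namenode in mounts.items():
        # Match the mount point itself or anything below it,
        # so "/username" is not mistaken for a child of "/user".
        if path == mount_point or path.startswith(mount_point + "/"):
            return namenode + path
    raise KeyError(f"no mount point covers {path}")


print(resolve("/user/alice/part-00000"))
print(resolve("/data/logs/2013-07-01"))
```

The user only ever sees paths such as `/user/alice`; which NameNode serves them is decided by the table, much as a Linux mount table decides which file system serves a directory.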
However, directory mounting has a problem of its own: the storage resources under each subdirectory need to be managed manually so that they do not become seriously unbalanced.
3.2 CrossNode
Because of the network restrictions between rooms, large, long-running data copies across rooms must be avoided, so a dedicated process, called CrossNode, manages the data copies between machine rooms. It is deployed as an independent node, separate from the metadata nodes.
The functionality it provides generally covers the following three points:
A) Copying data according to the preset list of cross-room files
B) Processing real-time data copy requests
C) Flow control for data crossing between rooms
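Point C, flow control, could be sketched with a token bucket. The talk does not say which algorithm CrossNode uses, so the token bucket, the class name, and the numbers below are assumptions for illustration only. Time is passed in explicitly to keep the example deterministic.

```python
class TokenBucket:
    """Allow at most rate_bytes_per_s on average, with bursts up to capacity_bytes."""

    def __init__(self, rate_bytes_per_s: float, capacity_bytes: float):
        self.rate = rate_bytes_per_s
        self.capacity = capacity_bytes
        self.tokens = capacity_bytes  # start with a full bucket
        self.last = 0.0               # time of the last refill, in seconds

    def try_send(self, nbytes: int, now: float) -> bool:
        """Return True and spend tokens if nbytes may be sent at time now."""
        # Refill for the elapsed time, never beyond the bucket capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if nbytes <= self.tokens:
            self.tokens -= nbytes
            return True
        return False


# 20 Mbps is 2.5 MB/s; allow bursts of up to 5 MB (hypothetical setting).
bucket = TokenBucket(rate_bytes_per_s=2_500_000, capacity_bytes=5_000_000)
first = bucket.try_send(4_000_000, now=0.0)   # fits into the initial burst
second = bucket.try_send(4_000_000, now=0.5)  # too soon, must be retried later
third = bucket.try_send(4_000_000, now=2.0)   # bucket has refilled
print(first, second, third)
```

A copy worker would call `try_send` before each chunk and back off when it returns False, which keeps cross-room copies from saturating the link shared with other traffic.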
How is the list of cross-room files obtained?
Because offline jobs are essentially triggered on a schedule, a list of cross-room files can be derived by analyzing the job history.
3.3 Management of resource groups
There are data dependencies between resource groups. By managing placement at the level of resource groups, the goal is that most jobs write their output within their own room and only a small amount of data is written across rooms, and that most jobs read a copy of the data in their own room and only a few read across rooms.
To identify the data dependencies between resource groups, define a distance between resource groups: the more data one resource group accesses from another, the closer the two are, and the closer two resource groups are, the more they should be placed in the same room.
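One possible way to turn this notion of distance into numbers is sketched below. The talk gives no formula, so the inverse-volume formula, the group names, and the access volumes are all assumptions made for illustration.

```python
from itertools import combinations

# Hypothetical daily read volumes in TB: ACCESS_TB[(reader, owner)].
ACCESS_TB = {
    ("search", "ads"): 40,
    ("ads", "search"): 10,
    ("search", "logistics"): 1,
    ("logistics", "ads"): 2,
}


def affinity(a: str, b: str, access: dict = ACCESS_TB) -> float:
    """Total data exchanged between two groups, in both directions."""
    return access.get((a, b), 0) + access.get((b, a), 0)


def distance(a: str, b: str, access: dict = ACCESS_TB) -> float:
    """Assumed formula: more shared data means a smaller distance."""
    return 1.0 / (1.0 + affinity(a, b, access))


def pairs_by_distance(groups, access: dict = ACCESS_TB):
    """All pairs of groups, closest first."""
    pairs = combinations(sorted(groups), 2)
    return sorted(pairs, key=lambda p: distance(p[0], p[1], access))


for a, b in pairs_by_distance(["search", "ads", "logistics"]):
    print(f"{a:10s} {b:10s} distance={distance(a, b):.3f}")
```

With these made-up volumes, `ads` and `search` come out closest, so they would be the first candidates to share a room, while `logistics` could live in the other room at little cross-room cost.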
To keep computation and output as close together as possible, an MRProxy handles different types of jobs differently:
A) Offline computing: while data on the cross-room list is still being transferred (DC1->DC2), the DC2 job is suspended and waits for the transfer to complete.
B) Ad hoc query: a DC2 job needs to read data on DC1. Scheduling of the job is suspended, CrossNode is notified, and scheduling continues after the data has been transferred.
C) Special case: a cross-room join with a large table in DC1 and a small table in DC2. The job is dispatched to DC1 and reads the DC2 data directly across rooms, without waiting for a copy.
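The three cases above can be sketched as a small decision function. This is a rough model under stated assumptions, not the actual MRProxy: the `Job` fields, the `Action` names, and the fallback for remote inputs that fit none of the three cases are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    RUN = "schedule now"
    WAIT_FOR_TRANSFER = "suspend until the planned copy finishes"
    REQUEST_COPY = "suspend, notify CrossNode, resume after the copy"
    READ_REMOTE = "schedule now and read the small table across rooms"


@dataclass
class Job:
    kind: str                        # "offline", "adhoc" or "join"
    room: str                        # room the job was submitted to
    input_room: str                  # room holding the main (largest) input
    on_cross_room_list: bool = False
    transfer_done: bool = False


def decide(job: Job):
    """Return (action, room in which the job should run)."""
    if job.kind == "join":
        # Case C: run next to the large table, read the small one remotely.
        return Action.READ_REMOTE, job.input_room
    if job.input_room == job.room:
        return Action.RUN, job.room
    if job.kind == "offline" and job.on_cross_room_list:
        # Case A: the copy is planned; wait only while it is in flight.
        if job.transfer_done:
            return Action.RUN, job.room
        return Action.WAIT_FOR_TRANSFER, job.room
    # Case B, and (as an assumption) any other job with remote input.
    return Action.REQUEST_COPY, job.room


print(decide(Job("offline", "DC2", "DC1", on_cross_room_list=True)))
print(decide(Job("adhoc", "DC2", "DC1")))
print(decide(Job("join", "DC2", "DC1")))
```

The design choice visible here is that only the join case moves the job to the data; in the other cases the data is moved (or awaited) so that the job reads a copy in its own room.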

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
