How HDFS rack awareness works

Last Update:2016-05-18 Source: Internet

Author: User

Transferred from: http://www.jianshu.com/p/372d25352d3a

The HDFS NameNode handles everything related to block replication. It periodically receives heartbeats and block reports from each DataNode. Where block replicas are placed is critical to the overall reliability and performance of the system.
A simple but non-optimal placement strategy is to put every replica on a different rack, or even in a different data center (IDC). This protects against the failure of an entire rack or an entire IDC, but every write must then cross racks or even IDCs, which increases the cost of writing replicas.
With the default replication factor of 3, the usual strategy is as follows. The first replica is placed on the node where the client is running (if the client is outside the cluster, a node that is neither too full nor too busy is chosen at random). The second replica is placed on a node in a different rack from the first. The third replica is placed on a different node in the same rack as the second.
Hadoop's replica placement strategy strikes a good balance between reliability (replicas live on different racks) and bandwidth (a write crosses only one rack boundary).
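The default strategy described above can be sketched in a few lines of Python. This is only an illustration, not Hadoop's implementation: the function name, the node-to-rack dictionary, and the fallback used when the preferred rack has no free node are all assumptions made for the example, and the "not too full, not too busy" check is left out.

```python
import random

def choose_replica_nodes(nodes, writer=None, rng=random):
    """Pick three distinct datanodes following the default policy.

    nodes:  dict mapping node name -> rack ID (hypothetical layout)
    writer: name of the client's node if it is a datanode, else None
    Assumes the cluster has at least three nodes.
    """
    names = sorted(nodes)

    def pick(preferred, chosen):
        # Fall back to any unused node if the preferred set is empty
        # (an assumption of this sketch, e.g. for a single-rack cluster).
        fallback = [n for n in names if n not in chosen]
        return rng.choice(preferred if preferred else fallback)

    # Replica 1: the writer's own node if it is in the cluster, else random.
    first = writer if writer in nodes else rng.choice(names)
    # Replica 2: a node on a different rack from replica 1.
    second = pick([n for n in names if nodes[n] != nodes[first]], [first])
    # Replica 3: a different node on the same rack as replica 2.
    third = pick(
        [n for n in names
         if nodes[n] == nodes[second] and n not in (first, second)],
        [first, second],
    )
    return [first, second, third]

if __name__ == "__main__":
    cluster = {"a1": "/rack1", "a2": "/rack1", "b1": "/rack2", "b2": "/rack2"}
    print(choose_replica_nodes(cluster, writer="a1"))
```

With the four-node layout in the usage example, the first replica stays on `a1` and the other two land on the two `/rack2` nodes, so the write crosses a rack boundary only once.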
But how does HDFS know the network topology of each DataNode? Rack awareness relies on an executable file (or script) named by the topology.script.file.name property, which translates a node's IP address into a rack ID. If topology.script.file.name is not set, every IP address is mapped to /default-rack.

Rack awareness is disabled by default. To enable it, set the following option in hadoop-site.xml on the NameNode machine (in Hadoop 2 and later the property is named net.topology.script.file.name and is set in core-site.xml), for example:

<property>
  <name>topology.script.file.name</name>
  <value>/path/to/script</value>
</property>

The value of this option names an executable program, typically a script, that accepts an argument and prints a value. The argument is normally the IP address of a DataNode machine, and the output is the ID of the rack that DataNode belongs to, such as "/rack1". When the NameNode starts, it checks whether the option is set. If it is, rack awareness is enabled: the NameNode locates the script from the configuration and, whenever it receives a heartbeat from a DataNode, passes that DataNode's IP address to the script as an argument and stores the output in an in-memory map as the DataNode's rack.
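A topology script of this kind might look like the following minimal Python sketch. The subnet prefixes and rack names are hypothetical and would have to be replaced with the real layout of the cluster. Falling back to "/default-rack" for unknown addresses is a choice made in this sketch, and the loop over several arguments is there because Hadoop may pass more than one address per invocation.

```python
#!/usr/bin/env python3
import sys

# Hypothetical mapping from subnet prefix to rack ID; adjust to the real network.
RACK_BY_PREFIX = {
    "10.1.1.": "/rack1",
    "10.1.2.": "/rack2",
}
DEFAULT_RACK = "/default-rack"

def resolve_rack(address):
    """Return the rack ID for one IP address, or the default rack if unknown."""
    for prefix, rack in RACK_BY_PREFIX.items():
        if address.startswith(prefix):
            return rack
    return DEFAULT_RACK

def main(addresses):
    # Print one rack ID per address, in the same order as the arguments.
    for address in addresses:
        print(resolve_rack(address))

if __name__ == "__main__":
    main(sys.argv[1:])
```

Saved as an executable file and referenced from the configuration option, the script would print `/rack1` when called with `10.1.1.15`.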

Writing the script requires knowing the real network topology and rack layout, so that each machine's IP address is mapped to the correct rack. Example scripts from the Hadoop wiki: http://wiki.apache.org/hadoop/topology_rack_awareness_scripts

The following compares how Hadoop HDFS behaves when data is uploaded without rack information configured and with it configured.

When no rack information is configured, Hadoop places all machines in a single default rack named "/default-rack". Every DataNode is then treated as being in the same rack, regardless of which rack it physically belongs to, which easily leads to the extra inter-rack network load mentioned earlier. Because the NameNode assigns all slave machines to /default-rack, the three DataNodes chosen when a block is written are completely random.

When rack information is configured, Hadoop selects the three DataNodes as follows:
1. If the uploading machine is itself a DataNode, it becomes the target for the first replica (datanode1). If the uploading machine is only a client and not a DataNode, one DataNode is chosen at random from all the slave machines as datanode1.
2. Next, a DataNode is chosen at random from a rack other than datanode1's rack as the target for the second replica (datanode2).
3. Before choosing the target for the third replica, the NameNode checks whether the first two DataNodes are on the same rack. If they are, it tries to pick a DataNode on a different rack as datanode3. If they are not, it picks another DataNode on datanode2's rack as datanode3.
4. Once it has the list of three DataNodes, and before returning the list to the DFSClient, the NameNode sorts it from nearest to farthest by the "distance" between the writing client and each DataNode. The client writes the block in this order, nearest first.
5. When the distance-sorted list of DataNodes is returned, the DFSClient creates a block output stream and writes the block data to the first (nearest) node in the pipeline.
6. After the first replica is written, the remaining replicas are written to the next-nearest nodes in list order. When the last replica is written successfully, the DFSClient returns success and the block write is complete.
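The near-to-far ordering in step 4 can be illustrated with a short Python sketch. It assumes that "distance" is the number of tree edges between two nodes through their closest common ancestor in the topology tree, which gives 0 for the same node, 2 for the same rack, and 4 for different racks. The location strings such as "/rack1/dn1" and both function names are hypothetical.

```python
def distance(loc_a, loc_b):
    """Count tree edges between two locations such as "/rack1/dn1"."""
    a = loc_a.strip("/").split("/")
    b = loc_b.strip("/").split("/")
    # Length of the shared path prefix, i.e. depth of the common ancestor.
    common = 0
    for x, y in zip(a, b):
        if x != y:
            break
        common += 1
    # Edges up from each node to the common ancestor.
    return (len(a) - common) + (len(b) - common)

def sort_by_distance(client, targets):
    """Order the pipeline targets from nearest to farthest from the client."""
    return sorted(targets, key=lambda target: distance(client, target))

if __name__ == "__main__":
    pipeline = ["/rack2/dn3", "/rack1/dn2", "/rack1/dn1"]
    print(sort_by_distance("/rack1/dn1", pipeline))
```

Because `sorted` is stable, targets at the same distance keep the order in which they were selected.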

With this strategy, when the NameNode selects the list of DataNodes for a block write, it spreads the block's replicas across different racks while keeping the network overhead described earlier as low as possible.

By godhehe (Jianshu author)
Original link: http://www.jianshu.com/p/372d25352d3a
Copyright belongs to the author. For reprints, please contact the author for authorization and credit "Jianshu author".

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
