Transferred from: http://www.jianshu.com/p/372d25352d3a
The HDFS NameNode is responsible for everything related to file block replication. It periodically receives heartbeat and block report messages from each DataNode, and the placement of HDFS block replicas is critical to the overall reliability and performance of the system.
A simple but non-optimal replica placement strategy is to put every replica in a different rack, or even a different IDC (data center). This protects against the failure of an entire rack, or even an entire IDC, but the data must then be transferred across racks or even across IDCs, increasing the cost of writing replicas.
In the default configuration the replication factor is 3, and the usual strategy is: the first replica is placed on the same node as the client (if the client is not inside the cluster, a node that is neither too full nor too busy is chosen at random); the second replica is placed on a node in a different rack from the first; and the third replica is placed on a different node in the same rack as the second.
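The replication factor itself defaults to 3 and can be changed per cluster with the dfs.replication property in hdfs-site.xml, for example:

```xml
<property>
  <name>dfs.replication</name>
  <value>3</value>
</property>
```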
Hadoop's replica placement strategy thus strikes a good balance between reliability (replicas live on different racks) and write bandwidth (the data crosses at most one rack boundary).
But how does HDFS learn the network topology of each DataNode? Its rack-awareness feature relies on an executable file (or script), specified by the topology.script.file.name property, that translates a node's IP address into the corresponding rack ID. If topology.script.file.name is not set, every IP is mapped to /default-rack.
Rack awareness is disabled by default; to enable it, an option must be configured in hadoop-site.xml on the NameNode machine, for example:
<property>
  <name>topology.script.file.name</name>
  <value>/path/to/script</value>
</property>
The value of this option names an executable program, typically a script, that accepts a parameter and produces an output value. The parameter is typically the IP address of a DataNode machine, and the output is the rack ID corresponding to that IP address, such as "/rack1". When the NameNode starts, it checks whether this option is set; if it is, rack awareness is enabled and the NameNode locates the script. Then, whenever it receives a heartbeat from a DataNode, it passes that DataNode's IP address to the script as an argument and stores the script's output, i.e. the rack the DataNode belongs to, in an in-memory map.
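A minimal topology script following this contract might look like the sketch below. The IP-to-rack table is purely illustrative (these addresses and rack names are assumptions, not real cluster data); a real script must reflect your actual network layout:

```python
#!/usr/bin/env python3
"""Hypothetical rack-awareness topology script: prints one rack ID
per IP address passed on the command line."""
import sys

# Illustrative mapping only -- replace with your cluster's real topology.
RACK_MAP = {
    "10.0.1.11": "/rack1",
    "10.0.1.12": "/rack1",
    "10.0.2.21": "/rack2",
}

DEFAULT_RACK = "/default-rack"


def resolve(ip):
    """Return the rack ID for an IP, falling back to the default rack."""
    return RACK_MAP.get(ip, DEFAULT_RACK)


if __name__ == "__main__":
    # Hadoop may pass several addresses at once; emit one rack per line.
    for arg in sys.argv[1:]:
        print(resolve(arg))
```

Note the fallback: any address the script does not recognize is reported as /default-rack, mirroring Hadoop's own behavior when no script is configured.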
The script itself must understand the real network topology and rack layout so that machine IP addresses are mapped to the correct racks. Sample scripts are available on the official Hadoop wiki: http://wiki.apache.org/hadoop/topology_rack_awareness_scripts
The following describes how HDFS behaves when data is uploaded, both without and with rack-awareness information configured.
When no rack information is configured, Hadoop places all machines in a single default rack named "/default-rack". In this case any two DataNodes are considered to be in the same rack regardless of whether they physically are, which easily leads to the extra inter-rack network load mentioned earlier. Because the NameNode maps every slave machine to /default-rack, the three DataNodes chosen when a block is written are selected completely at random.
When rack-awareness information is configured, Hadoop chooses the three DataNodes as follows:
1. If the uploading machine is not a DataNode but a client, a DataNode is chosen at random from all the slave machines as the writer of the first block replica (datanode1). If the uploading machine is itself a DataNode, that DataNode becomes datanode1.
2. Next, on a rack other than the one datanode1 belongs to, one node is chosen at random as the writer of the second replica (datanode2).
3. Before writing the third replica, the NameNode checks whether the first two DataNodes are in the same rack. If they are, it tries to select a DataNode on another rack as the third writer (datanode3). If datanode1 and datanode2 are on different racks, datanode3 is chosen from the rack where datanode2 is located.
4. Having obtained the list of three DataNodes, the NameNode sorts it from nearest to farthest according to the "distance" between the writing client and each DataNode, then returns the sorted list to the DFSClient. The client writes the block in this near-to-far order.
5. When the distance-sorted DataNode list is returned, the DFSClient creates a block output stream and writes the block data to the first (nearest) node in the pipeline.
6. After the first node is written, the data is written to each successive node in the DataNode list in near-to-far order; once the last node succeeds, the DFSClient returns success and the block write is complete.
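Steps 1 through 3 above can be sketched as follows. This is an illustration of the described policy only, not the actual NameNode code (the real logic lives in the Java class BlockPlacementPolicyDefault), and the node/rack names used are assumptions:

```python
import random


def pick_replicas(datanodes, client=None):
    """Sketch of the three-replica selection described in steps 1-3.

    `datanodes` maps node name -> rack ID. Returns [datanode1,
    datanode2, datanode3] in selection order.
    """
    # Step 1: first replica on the client if it is a DataNode, else random.
    d1 = client if client in datanodes else random.choice(list(datanodes))

    # Step 2: second replica on a rack different from datanode1's rack.
    remote = [n for n in datanodes
              if n != d1 and datanodes[n] != datanodes[d1]]
    d2 = random.choice(remote) if remote else random.choice(
        [n for n in datanodes if n != d1])

    # Step 3: if d1 and d2 are on different racks, the third replica goes
    # on d2's rack (different node); otherwise, try yet another rack.
    if datanodes[d1] != datanodes[d2]:
        cands = [n for n in datanodes
                 if n not in (d1, d2) and datanodes[n] == datanodes[d2]]
    else:
        cands = [n for n in datanodes if datanodes[n] != datanodes[d1]]
    d3 = random.choice(cands) if cands else random.choice(
        [n for n in datanodes if n not in (d1, d2)])

    return [d1, d2, d3]
```

With a toy cluster of node "a" on /rack1 and nodes "b", "c" on /rack2, a client running on "a" always gets "a" as datanode1, one of "b"/"c" as datanode2, and the other as datanode3, i.e. replicas end up on exactly two racks.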
With this strategy, when the NameNode selects the DataNode list for a block, it both scatters the replicas across different racks and minimizes the network overhead described earlier.
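The "distance" used for the sorting in step 4 can be illustrated with the simplified two-level metric HDFS uses for a flat rack/node hierarchy: 0 for the same node, 2 for different nodes in the same rack, 4 for nodes in different racks. The sketch below is a simplification of HDFS's NetworkTopology.getDistance, with illustrative node and rack names:

```python
def distance(node_a, rack_a, node_b, rack_b):
    """Simplified HDFS network distance for a two-level topology:
    0 = same node, 2 = same rack, 4 = different racks."""
    if node_a == node_b:
        return 0
    if rack_a == rack_b:
        return 2
    return 4


def sort_by_distance(client, client_rack, replicas):
    """Sort (node, rack) pairs from nearest to farthest from the client,
    as the NameNode does before returning the list to the DFSClient."""
    return sorted(replicas,
                  key=lambda nr: distance(client, client_rack, nr[0], nr[1]))
```

For a client on node "n1" in /rack1, a replica on "n1" sorts before a replica on another /rack1 node, which in turn sorts before any off-rack replica.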
By godhehe (Jianshu author)
Original link: http://www.jianshu.com/p/372d25352d3a
Copyright belongs to the author. Please contact the author for authorization before reprinting, and credit the Jianshu author.
HDFS Rack Awareness: How It Works