HDFS File Access Mechanism

HDFS and HBase are two of the main storage systems in Hadoop, and they suit different scenarios: HDFS is suitable for storing large files, while HBase suits storing large numbers of small files. This article explains how a client of the HDFS file system reads data from and writes data to a Hadoop cluster; in other words, it describes HDFS's block placement policy.

Body

One: Writing Data

When no rack information is configured, Hadoop places all machines in the same default rack, named "/default-rack". In this case, any two DataNode machines, whether or not they physically belong to the same rack, are considered to be on the same rack, which easily leads to the extra inter-rack network load mentioned earlier. In the absence of rack information, the NameNode simply maps all slave machines to /default-rack.

When rack-awareness information is configured in a Hadoop cluster, Hadoop chooses the three DataNodes for a block as follows:

1. If the machine uploading the data is not a DataNode but a client, then one DataNode is selected at random from all the slave machines as the machine that stores the first replica of the block (DataNode1).

Note: If the uploading machine is itself a DataNode (for example, when a task in a MapReduce job writes data to HDFS through DFSClient), then that DataNode is used as the machine that stores the first replica (DataNode1).

2. Next, on a rack other than the one DataNode1 belongs to, one DataNode is selected at random as the machine that stores the second replica (DataNode2).

3. Before writing the third replica, check whether the first two DataNodes are on the same rack. If they are, try to select a DataNode on a different rack as the write target for the third replica (DataNode3). If DataNode1 and DataNode2 are not on the same rack, select a DataNode on DataNode2's rack as DataNode3.

4. After obtaining the list of three DataNodes, and before returning the list to DFSClient, the NameNode sorts the DataNodes in the list by their "distance" from the writing client, from nearest to farthest. If the writer is not itself a DataNode, the first DataNode in the sorted list is chosen as the first node to write to. The client then writes the data block to the nodes in this near-to-far order. The algorithm for judging the "distance" between two DataNodes is therefore critical; Hadoop's current implementation works as follows, taking the two DatanodeInfo objects representing the DataNodes (node1, node2) as an example:

a) First, based on the node1 and node2 objects, determine the level of each DataNode in the HDFS cluster. The concept of level needs some explanation: each DataNode's position in the cluster is described by a hierarchy string. Suppose the topology of HDFS is as follows:

[Topology diagram: a root "/" containing racks such as /rack1 and /rack2, each with its DataNodes, e.g. /rack1/datanode1 and /rack2/datanode2]

Each DataNode corresponds to a position and level in this hierarchy. For example, node1's location string is "/rack1/datanode1", so its level is 2; the other nodes follow by analogy. Once the levels of the two nodes are known, each node is walked up the topology tree from its position; for instance, one level above "/rack1/datanode1" is "/rack1", and each such step counts as a distance of 1. Both nodes are walked up in this way until a common ancestor is found, and the accumulated number of steps is taken as the distance between the two nodes. So, in the topology shown above, the distance between node1 and node2 is 4 (two steps up from each node to the common root "/"). A minimal sketch of this calculation appears after this list.

5. After the DataNode list, sorted by "distance", is returned to DFSClient, DFSClient creates a BlockOutputStream and writes the block to the first node in the pipeline (the nearest node).

6. After the nearest node has been written, the block is written to the remaining DataNodes in the list in order of increasing distance. When the last replica has been written successfully, DFSClient returns success and the block write operation ends.
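
To make the distance calculation in step 4 a) concrete, here is a minimal stand-alone sketch of the common-ancestor counting described above. It is not Hadoop's actual NetworkTopology code, and the example node names are hypothetical; only the hierarchy-string format ("/rack1/datanode1") is taken from this article.

def distance(node1, node2):
    # node1, node2: hierarchy strings such as "/rack1/datanode1".
    parts1 = node1.strip("/").split("/")
    parts2 = node2.strip("/").split("/")
    # Length of the ancestor chain shared by both paths.
    common = 0
    while (common < len(parts1) and common < len(parts2)
           and parts1[common] == parts2[common]):
        common += 1
    # Steps from each node up to that common ancestor, added together.
    return (len(parts1) - common) + (len(parts2) - common)

print(distance("/rack1/datanode1", "/rack1/datanode3"))  # 2: same rack
print(distance("/rack1/datanode1", "/rack2/datanode2"))  # 4: different racks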

Through the above strategy, when the NameNode selects the list of DataNodes to receive a data block, it ensures that the block's replicas are scattered across different racks while also avoiding the excessive inter-rack network overhead described earlier.
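
Putting steps 1 through 6 together, the selection logic can be sketched roughly as follows. This is only an illustration of the policy described above, not the NameNode's real block placement code; the helper names (rack_of, choose_targets) and the simplified two-level distance function are assumptions made for the example.

import random

def rack_of(node):
    # Hypothetical helper: "/rack1/datanode1" -> "/rack1".
    return "/" + node.strip("/").split("/")[0]

def distance(a, b):
    # Simplified two-level form of the calculation sketched earlier:
    # 0 for the same node, 2 within a rack, 4 across racks.
    if a == b:
        return 0
    return 2 if rack_of(a) == rack_of(b) else 4

def choose_targets(datanodes, writer=None):
    # 1. First replica: the writer itself if it is a DataNode,
    #    otherwise a random DataNode.
    dn1 = writer if writer in datanodes else random.choice(datanodes)
    # 2. Second replica: a random DataNode on a different rack.
    dn2 = random.choice([d for d in datanodes if rack_of(d) != rack_of(dn1)])
    # 3. Third replica: dn1 and dn2 are already on different racks here,
    #    so pick another DataNode on dn2's rack; fall back to any other node.
    same_rack = [d for d in datanodes if rack_of(d) == rack_of(dn2) and d != dn2]
    others = [d for d in datanodes if d not in (dn1, dn2)]
    dn3 = random.choice(same_rack or others)
    # 4. Return the list sorted by distance from the writer, nearest first.
    targets = [dn1, dn2, dn3]
    if writer is not None:
        targets.sort(key=lambda d: distance(writer, d))
    return targets

nodes = ["/rack1/datanode1", "/rack1/datanode3",
         "/rack2/datanode2", "/rack2/datanode4"]
print(choose_targets(nodes, writer="/rack1/datanode1"))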

Addendum: Hadoop's rack-awareness strategy

By default, rack awareness is not enabled in Hadoop, so HDFS chooses machines at random. It is therefore quite possible that, when writing data, Hadoop writes the first copy of the data, block1, to rack1, then randomly writes block2 to rack2, generating data-transfer traffic between the two racks, and then, again at random, writes block3 back to rack1, generating yet more inter-rack traffic. When the amount of data handled by a job, or pushed into Hadoop, is very large, this multiplies the network traffic between racks, becomes a performance bottleneck, and in turn affects the performance of the job and the service of the entire cluster.

Enabling Hadoop's rack-awareness feature is very simple: configure one option in the hadoop-site.xml configuration file on the machine where the NameNode runs:

<property>
  <name>topology.script.file.name</name>
  <value>/path/to/RackAware.py</value>
</property>

The value of this option specifies an executable program, usually a script, which accepts one parameter and prints one value. The parameter is usually a DataNode machine's IP address, and the printed value is usually the rack corresponding to that IP address, for example "/rack1". When the NameNode starts, it checks whether this option is empty; if it is not, rack awareness is considered configured. The NameNode then locates the script named in the configuration and, whenever it receives a DataNode's heartbeat, passes that DataNode's IP address to the script as a parameter. The script's output is stored in an in-memory map as the rack to which that DataNode belongs.
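
The contract between the NameNode and the topology script can be pictured with a small sketch, assuming only the behaviour described above; the script path and the resolve_rack helper are hypothetical, and this is not the NameNode's real implementation.

import subprocess

# In-memory map of address -> rack, mirroring the map the NameNode keeps.
rack_cache = {}

def resolve_rack(address, script="/path/to/RackAware.py"):
    # Invoke the configured topology script once per unknown address and
    # remember the rack string it prints (e.g. "/rack1").
    if address not in rack_cache:
        output = subprocess.check_output([script, address])
        rack_cache[address] = output.decode().strip()
    return rack_cache[address]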

As for the script itself, it needs to know the real network topology and rack information so that it can correctly map a machine's IP address to the corresponding rack. A simple implementation is as follows:

#!/usr/bin/python
# -*- coding: utf-8 -*-
import sys

# Static map from host name or IP address to rack.
rack = {"hadoopnode-176.tj": "rack1",
        "hadoopnode-178.tj": "rack1",
        "hadoopnode-179.tj": "rack1",
        "hadoopnode-180.tj": "rack1",
        "hadoopnode-186.tj": "rack2",
        "hadoopnode-187.tj": "rack2",
        "hadoopnode-188.tj": "rack2",
        "hadoopnode-190.tj": "rack2",
        "192.168.1.15": "rack1",
        "192.168.1.17": "rack1",
        "192.168.1.18": "rack1",
        "192.168.1.19": "rack1",
        "192.168.1.25": "rack2",
        "192.168.1.26": "rack2",
        "192.168.1.27": "rack2",
        "192.168.1.29": "rack2",
        }

if __name__ == "__main__":
    # Unknown hosts fall back to the default rack "rack0".
    print("/" + rack.get(sys.argv[1], "rack0"))

Because no exact documentation was found stating whether the host name or the IP address is passed to the script, it is best for the script to handle both host names and IP addresses, as the map above does. For example, invoking the script as "RackAware.py 192.168.1.15" prints "/rack1", while an unknown address falls back to "/rack0". If the machine-room architecture is more complex, the script can also return strings such as "/dc1/rack1".

Two: Reading Data

Next, let's look at how data is read in a Hadoop cluster with this configuration. When reading a block of a file, Hadoop adopts the same kind of strategy:

1. First, obtain the list of DataNodes on which the block resides; the list contains as many DataNodes as the block has replicas.

2. Sort the DataNodes in the list by their distance from the reading client, from nearest to farthest:

a) First look for a copy of the block on the local machine; if one exists, that local DataNode is used as the first DataNode from which to read the block.

b) Then check whether another DataNode on the same local rack holds a copy of the block.

c) Finally, if no such copy is found, or if the node reading the data is not itself a DataNode, the DataNode list is returned in random order.
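
In the same spirit as the write-side sketches above, the read-side ordering might be illustrated as follows. Again this is only a sketch of the described policy, not the actual DFSClient code; rack_of and order_replicas_for_read are hypothetical helpers.

import random

def rack_of(node):
    # Hypothetical helper: "/rack1/datanode1" -> "/rack1".
    return "/" + node.strip("/").split("/")[0]

def order_replicas_for_read(replicas, reader=None):
    # replicas: node paths holding copies of the block.
    # reader: the reading node's path, or None if the reader is not a DataNode.
    if reader is None:
        shuffled = list(replicas)   # c) non-DataNode reader: random order
        random.shuffle(shuffled)
        return shuffled
    local = [d for d in replicas if d == reader]        # a) a local copy first
    same_rack = [d for d in replicas                    # b) then the local rack
                 if d != reader and rack_of(d) == rack_of(reader)]
    rest = [d for d in replicas
            if d != reader and rack_of(d) != rack_of(reader)]
    random.shuffle(rest)
    return local + same_rack + rest

A reader running on a DataNode that holds a copy of the block thus always prefers its local copy, then a copy on its own rack, which is exactly the order described above.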
