Hadoop Rack Awareness
1. Background
Hadoop is designed with both data safety and efficiency in mind. By default, HDFS stores three replicas of each data file. The placement policy is: one replica on the local node, one on another node in the same rack, and one on a node in a different rack.
This way, if the local replica is corrupted, the node can fetch the data from a neighboring node in the same rack, which is faster than fetching it from a node in another rack.
At the same time, if the network of the entire rack fails, the data can still be found on nodes in other racks.
To reduce overall bandwidth consumption and read latency, HDFS tries to serve each read from the replica closest to the reader.
If a replica exists on the same rack as the reader, that replica is read.
If an HDFS cluster spans multiple data centers, the client first tries to read a replica in its local data center.
So how does Hadoop determine whether two nodes are in the same rack or in different racks? The answer is rack awareness.
By default, Hadoop's rack awareness is not enabled, so HDFS picks machines at random when choosing where to write.
It is therefore quite likely that, when writing data, Hadoop writes the first block, Block1, to Rack1 and then randomly chooses Rack2 for Block2, which generates data traffic between the two racks. If it then, again at random, writes Block3 back to Rack1, another data flow is generated between the two racks.
When the amount of data processed by a job, or pushed into Hadoop, is very large, this multiplies the network traffic between racks and becomes a performance bottleneck, which in turn degrades the job and the services of the whole cluster.
2. Configuration
By default, the NameNode log at startup looks like this:
2013-09-22 17:27:26,423 INFO org.apache.hadoop.net.NetworkTopology: Adding a new node: /default-rack/192.168.147.92:50010
The rack ID of every IP is /default-rack, which means that Hadoop's rack awareness is not enabled.
Enabling rack awareness is simple. Configure one option in core-site.xml, located in /home/bigdata/apps/hadoop/etc/hadoop on the NameNode:
<property>
<name>topology.script.file.name</name>
<value>/home/bigdata/apps/hadoop/etc/hadoop/topology.sh</value>
</property>
The value of this option is the path of an executable program, typically a script, that accepts one or more arguments and prints a value for each. (In Hadoop 2.x the same option is also known as net.topology.script.file.name; topology.script.file.name is the older name.)
The arguments are usually the IP addresses of DataNode machines, and the output is the rack that the DataNode with that IP address belongs to, such as "/rack1".
When the NameNode starts, it checks whether this option is set. If it is not empty, rack awareness is enabled and the NameNode locates the script from the configuration.
Whenever it receives a heartbeat from a DataNode, it passes that DataNode's IP address to the script as an argument and takes the output as the rack ID of that DataNode, which it saves in an in-memory map.
Writing the script requires knowing the real network topology and rack layout, so that each machine's IP address and hostname can be mapped to the correct rack.
A simple implementation is as follows:
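#!/bin/bash
# Map each argument (an IP address or a hostname) to its rack,
# using the lookup table in topology.data.
HADOOP_CONF=/home/bigdata/apps/hadoop/etc/hadoop
while [ $# -gt 0 ]; do
  nodeArg=$1
  # Re-open topology.data on stdin for every argument.
  exec < "${HADOOP_CONF}/topology.data"
  result=""
  while read line; do
    # Columns: IP address, hostname, rack.
    ar=( $line )
    if [ "${ar[0]}" = "$nodeArg" ] || [ "${ar[1]}" = "$nodeArg" ]; then
      result="${ar[2]}"
    fi
  done
  shift
  # The trailing space separates the results when several arguments are passed.
  if [ -z "$result" ]; then
    echo -n "/default-rack "
  else
    echo -n "$result "
  fi
done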
topology.data lists one node per line, in the format: IP address, hostname, /switch/rack
192.168.147.91 tbe192168147091 /dc1/rack1
192.168.147.92 tbe192168147092 /dc1/rack1
192.168.147.93 tbe192168147093 /dc1/rack2
192.168.147.94 tbe192168147094 /dc1/rack3
192.168.147.95 tbe192168147095 /dc1/rack3
192.168.147.96 tbe192168147096 /dc1/rack3
Note that on the NameNode, nodes in this file are matched by IP address and hostnames do not work,
whereas on the JobTracker nodes are matched by hostname and IP addresses do not work. It is therefore best to provide both the IP address and the hostname.
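Before pointing the NameNode at a new mapping file, it can help to check by hand that both the IP address and the hostname of a node resolve to the expected rack. The following is a minimal sketch of such a check, not part of Hadoop: lookup_rack is a hypothetical helper that mirrors the lookup in the script above using awk, and the temporary file stands in for topology.data.

```shell
#!/bin/sh
# lookup_rack FILE NODE  (hypothetical helper, not a Hadoop command)
# Prints the rack of NODE, matched against the IP or hostname column,
# or /default-rack when NODE is not listed in FILE.
lookup_rack() {
    awk -v node="$2" '
        $1 == node || $2 == node { rack = $3 }
        END { print (rack != "" ? rack : "/default-rack") }
    ' "$1"
}

# Sample mapping in the three-column layout: IP, hostname, rack.
sample=$(mktemp)
cat > "$sample" <<'EOF'
192.168.147.93 tbe192168147093 /dc1/rack2
192.168.147.94 tbe192168147094 /dc1/rack3
EOF

lookup_rack "$sample" 192.168.147.93     # prints /dc1/rack2
lookup_rack "$sample" tbe192168147094    # prints /dc1/rack3
lookup_rack "$sample" 10.0.0.1           # prints /default-rack

rm -f "$sample"
```

Running the same lookups against the real file, once by IP and once by hostname, is a quick way to confirm that both columns are filled in for every node.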
With this configuration in place, the NameNode startup log looks like this:
2013-09-23 17:16:27,272 INFO org.apache.hadoop.net.NetworkTopology: Adding a new node: /dc1/rack3/192.168.147.94:50010
This indicates that Hadoop's rack awareness is enabled.
To view Hadoop's rack information, run:
./hadoop dfsadmin -printTopology
Rack:/dc1/rack1
192.168.147.91:50010 (tbe192168147091)
192.168.147.92:50010 (tbe192168147092)
Rack:/dc1/rack2
192.168.147.93:50010 (tbe192168147093)
Rack:/dc1/rack3
192.168.147.94:50010 (tbe192168147094)
192.168.147.95:50010 (tbe192168147095)
192.168.147.96:50010 (tbe192168147096)
3. Adding a data node without restarting the NameNode
Suppose a Hadoop cluster runs the NameNode and a DataNode on 192.168.147.68 with rack awareness enabled, and bin/hadoop dfsadmin -printTopology shows:
Rack:/dc1/rack1
192.168.147.68:50010 (dbj68)
Now we want to add the data node 192.168.147.69, physically located in rack2, to the cluster without restarting the NameNode.
First, edit topology.data on the NameNode, add the line "192.168.147.69 dbj69 /dc1/rack2", and save:
192.168.147.68 dbj68 /dc1/rack1
192.168.147.69 dbj69 /dc1/rack2
Then run sbin/hadoop-daemons.sh start datanode to start the data node dbj69. Running bin/hadoop dfsadmin -printTopology on any node now shows:
Rack:/dc1/rack1
192.168.147.68:50010 (dbj68)
Rack:/dc1/rack2
192.168.147.69:50010 (dbj69)
This indicates that Hadoop has detected the newly added node dbj69.
Note: if you run sbin/hadoop-daemons.sh start datanode to start dbj69 without first adding dbj69 to topology.data, the DataNode log reports an exception and dbj69 fails to start:
2013-11-21 10:51:33,502 FATAL org.apache.hadoop.hdfs.server.datanode.DataNode: Initialization failed for block pool Block pool BP-1732631201-192.168.147.68-1385000665316 (storage id DS-878525145-192.168.147.69-50010-1385002292231) service to dbj68/192.168.147.68:9000
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.net.NetworkTopology$InvalidTopologyException): Invalid network topology. You cannot have a rack and a non-rack node at the same level of the network topology.
    at org.apache.hadoop.net.NetworkTopology.add(NetworkTopology.java:382)
    at org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager.registerDatanode(DatanodeManager.java:746)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.registerDatanode(FSNamesystem.java:3498)
    at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.registerDatanode(NameNodeRpcServer.java:876)
    at org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolServerSideTranslatorPB.registerDatanode(DatanodeProtocolServerSideTranslatorPB.java:91)
    at org.apache.hadoop.hdfs.protocol.proto.DatanodeProtocolProtos$DatanodeProtocolService$2.callBlockingMethod(DatanodeProtocolProtos.java:20018)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1002)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1701)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1697)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1695)
    at org.apache.hadoop.ipc.Client.call(Client.java:1231)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
    at $Proxy10.registerDatanode(Unknown Source)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:601)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:164)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:83)
    at $Proxy10.registerDatanode(Unknown Source)
    at org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolClientSideTranslatorPB.registerDatanode(DatanodeProtocolClientSideTranslatorPB.java:149)
    at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.register(BPServiceActor.java:619)
    at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:221)
    at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:660)
    at java.lang.Thread.run(Thread.java:722)
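The exception message hints at the cause. A node missing from topology.data falls back to /default-rack, which has one level, while the configured racks such as /dc1/rack1 have two, so the new node would sit at a different depth of the topology tree than the existing ones. Assuming that reading is right, a small pre-flight check can catch the mismatch before a DataNode is started. The sketch below is illustrative only: check_rack_depth is a hypothetical helper, not a Hadoop command, and it assumes the three-column layout of topology.data (IP, hostname, rack).

```shell
#!/bin/sh
# check_rack_depth FILE DEFAULT_RACK  (hypothetical helper)
# Reports every entry in FILE whose rack path has a different number of
# levels than DEFAULT_RACK, the value the topology script prints for
# unknown nodes. Returns non-zero when at least one mismatch is found.
check_rack_depth() {
    awk -v default_rack="$2" '
        # Number of levels in a path: /dc1/rack1 has 2, /default-rack has 1.
        function depth(path,    parts) { return split(path, parts, "/") - 1 }
        BEGIN { bad = 0; want = depth(default_rack) }
        NF >= 3 && depth($3) != want {
            printf "%s: rack %s has depth %d, expected %d\n", $1, $3, depth($3), want
            bad = 1
        }
        END { exit bad }
    ' "$1"
}

# Usage (paths are examples):
#   check_rack_depth /home/bigdata/apps/hadoop/etc/hadoop/topology.data /default-rack
```

With the two-level racks used in this article, the check fails for /default-rack and passes for a two-level fallback such as /dc1/default-rack, which suggests giving the script's fallback value the same number of levels as the real racks.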
4. Calculating the distance between nodes
With rack awareness, the NameNode can build a tree of the DataNode network topology. In this tree D1 and R1 are switches, and the lowest level consists of the DataNodes.
The rack ID of H1 is then /D1/R1/H1: the parent of H1 is R1, and the parent of R1 is D1. These rack IDs are supplied through topology.script.file.name.
With this information, the distance between any two DataNodes can be calculated as the number of hops from each node up to their closest common ancestor, added together. This yields the optimal placement strategy and helps balance network bandwidth and data distribution across the whole cluster.
distance(/D1/R1/H1, /D1/R1/H1) = 0 (the same DataNode)
distance(/D1/R1/H1, /D1/R1/H2) = 2 (different DataNodes in the same rack)
distance(/D1/R1/H1, /D1/R2/H4) = 4 (different DataNodes in the same IDC)
distance(/D1/R1/H1, /D2/R3/H7) = 6 (DataNodes in different IDCs)
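The four cases above follow one rule: count the hops from each node up to their closest common ancestor and add the two counts. A minimal sketch of that rule in shell follows, assuming every node is given as a full slash-separated path without wildcard characters. The names distance and depth are chosen here for illustration and are not Hadoop commands.

```shell
#!/bin/sh
# depth PATH: number of slash-separated components in PATH
# (PATH has no leading slash; an empty PATH has depth 0).
depth() {
    [ -z "$1" ] && { echo 0; return; }
    old_ifs=$IFS
    IFS=/
    set -- $1
    IFS=$old_ifs
    echo $#
}

# distance A B: hops from A up to the closest common ancestor of A and B,
# plus hops from that ancestor down to B.
distance() {
    a=${1#/}
    b=${2#/}
    # Drop leading components for as long as both paths share them.
    while [ -n "$a" ] && [ -n "$b" ]; do
        [ "${a%%/*}" = "${b%%/*}" ] || break
        case $a in */*) a=${a#*/} ;; *) a= ;; esac
        case $b in */*) b=${b#*/} ;; *) b= ;; esac
    done
    # Whatever remains on each side is the path below the common ancestor.
    echo $(( $(depth "$a") + $(depth "$b") ))
}

distance /D1/R1/H1 /D1/R1/H1    # prints 0
distance /D1/R1/H1 /D1/R1/H2    # prints 2
distance /D1/R1/H1 /D1/R2/H4    # prints 4
distance /D1/R1/H1 /D2/R3/H7    # prints 6
```

Because only the part of each path below the common ancestor is counted, the same function works for topologies with more or fewer levels, as long as both paths are rooted at the same top level.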