1. Topological distance
Here is a simple way to think about Hadoop's network topology distance.
In many scenarios, bandwidth is the scarce resource. Making full use of it while balancing cost and other constraints is hard, so Hadoop takes the following approach:
compute the distance between two nodes and prefer the nearest one for an operation. If you are familiar with data structures, you will recognize this as a distance-measurement problem.
Represented as a data structure, the network is a tree, and the distance between two nodes is computed from their lowest common ancestor.
In practice, a typical topology looks like this:
the levels of the tree are the data center (here D1, D2), the rack (here R1, R2, R3), and the server node (here N1, N2, N3, N4).
1. distance(D1/R1/N1, D1/R1/N1) = 0 (same node)
2. distance(D1/R1/N1, D1/R1/N2) = 2 (different nodes on the same rack)
3. distance(D1/R1/N1, D1/R2/N3) = 4 (different racks in the same data center)
4. distance(D1/R1/N1, D2/R3/N4) = 6 (different data centers)
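The distances above can be computed from the topology paths alone: walk up from each node to the lowest common ancestor, counting one hop per level. A minimal sketch (the path format mirrors the `D1/R1/N1` examples above; the function name is my own, not Hadoop's API):

```python
def topology_distance(a: str, b: str) -> int:
    """Distance between two nodes given topology paths like "D1/R1/N1".

    Each hop up or down the tree costs 1, so the distance is the number
    of levels from each node up to their lowest common ancestor.
    """
    pa = a.strip("/").split("/")
    pb = b.strip("/").split("/")
    common = 0  # depth of the lowest common ancestor
    for x, y in zip(pa, pb):
        if x != y:
            break
        common += 1
    return (len(pa) - common) + (len(pb) - common)

print(topology_distance("D1/R1/N1", "D1/R1/N1"))  # 0: same node
print(topology_distance("D1/R1/N1", "D1/R1/N2"))  # 2: same rack
print(topology_distance("D1/R1/N1", "D1/R2/N3"))  # 4: same data center
print(topology_distance("D1/R1/N1", "D2/R3/N4"))  # 6: different data centers
```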
2. Replica placement
The process by which the NameNode chooses DataNodes to store a block's replicas is called replica placement, and its strategy is a trade-off between reliability and read/write bandwidth. Consider the two extremes:
1. Keep all replicas on the same node: write bandwidth is maximized, but reliability is nil; once that node dies, all the data is gone, and reads from other racks are slow.
2. Scatter all replicas across different nodes: reliability improves, but write bandwidth suffers.
Even within a single data center there are many possible placement schemes. Hadoop 0.17.0 introduced a reasonably balanced default, and since 1.x the placement policy has been pluggable.
Hadoop's default policy is:
1. Place the first replica on the same node as the client; if the client is not in the cluster, a node is chosen for it.
2. Place the second replica on a randomly chosen node on a different rack from the first.
3. Place the third replica on a different node on the same rack as the second.
4. Place any remaining replicas on randomly chosen nodes.