Most of us have heard of Hadoop's rack awareness policy. The balancer, the JobTracker, and the data replica placement policy all make use of rack awareness. So what is rack awareness?

First, rack awareness is, quite literally, awareness of racks. Who is aware? The Hadoop system. More precisely, Hadoop can build a topology of servers and their rack locations inside the system and identify where each node sits in that topology. This underpins higher-level designs such as the replica placement policy and job locality.
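To make "replica placement policy" concrete: with a replication factor of 3, HDFS's default policy puts the first replica on the writer's node, the second on a node in a different rack, and the third on a different node in that same remote rack. Below is a simplified, deterministic sketch of that idea. The class and method names are hypothetical and the rack map is hard-coded; the real policy also randomizes its choices and skips unsuitable nodes.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified model of rack-aware replica placement (not Hadoop's actual class).
public class ReplicaPlacementSketch {
    // rack path -> hosts on that rack, kept in insertion order so the output is deterministic
    private final Map<String, List<String>> racks = new LinkedHashMap<>();

    void addHost(String rack, String host) {
        racks.computeIfAbsent(rack, r -> new ArrayList<>()).add(host);
    }

    String rackOf(String host) {
        for (Map.Entry<String, List<String>> e : racks.entrySet()) {
            if (e.getValue().contains(host)) return e.getKey();
        }
        throw new IllegalArgumentException("unknown host: " + host);
    }

    // Three targets: the writer's node, then two different nodes on one remote rack.
    List<String> chooseTargets(String writer) {
        List<String> targets = new ArrayList<>();
        targets.add(writer);
        String localRack = rackOf(writer);
        for (Map.Entry<String, List<String>> e : racks.entrySet()) {
            // Skip the writer's own rack and any rack too small to hold two replicas.
            if (e.getKey().equals(localRack) || e.getValue().size() < 2) continue;
            targets.add(e.getValue().get(0));
            targets.add(e.getValue().get(1));
            return targets;
        }
        throw new IllegalStateException("need a remote rack with at least two hosts");
    }

    public static void main(String[] args) {
        ReplicaPlacementSketch p = new ReplicaPlacementSketch();
        p.addHost("/rack1", "host1");
        p.addHost("/rack1", "host2");
        p.addHost("/rack2", "host3");
        p.addHost("/rack2", "host4");
        System.out.println(p.chooseTargets("host1")); // prints [host1, host3, host4]
    }
}
```

Losing a whole rack then costs at most two of the three replicas, while two of the three writes still stay within a single rack.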

Can Hadoop sense the network topology of a cluster or data center automatically? Think about it: every company's data center topology and network structure is different, and so are the devices they use. Could Hadoop really work all of that out on its own? Obviously not. Hadoop needs the system administrator's help to obtain the network topology.
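In practice the administrator usually supplies this knowledge as a topology script (configured with `topology.script.file.name`, or `net.topology.script.file.name` in Hadoop 2.x) that maps host names or IPs to rack paths. The sketch below models the same idea as a static lookup table. `StaticRackMapping` and its two-column line format are hypothetical; `resolve` only mirrors the shape of Hadoop's `DNSToSwitchMapping.resolve(List<String>)`, returning one rack path per input name, and unknown hosts fall back to `/default-rack`.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical administrator-supplied host -> rack table (not a real Hadoop class).
public class StaticRackMapping {
    // Same value as Hadoop's NetworkTopology.DEFAULT_RACK
    static final String DEFAULT_RACK = "/default-rack";
    private final Map<String, String> table = new HashMap<>();

    // Each non-blank, non-comment line is "<host-or-ip> <rack-path>".
    public StaticRackMapping(List<String> lines) {
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty() || trimmed.startsWith("#")) continue;
            String[] parts = trimmed.split("\\s+");
            if (parts.length != 2) throw new IllegalArgumentException("bad line: " + line);
            table.put(parts[0], parts[1]);
        }
    }

    // One rack path per input name, in the same order; unknown names get the default rack.
    public List<String> resolve(List<String> names) {
        List<String> racks = new ArrayList<>(names.size());
        for (String name : names) {
            racks.add(table.getOrDefault(name, DEFAULT_RACK));
        }
        return racks;
    }

    public static void main(String[] args) {
        StaticRackMapping m = new StaticRackMapping(List.of(
                "# host        rack",
                "10.1.1.11     /dc1/rack1",
                "10.1.2.21     /dc1/rack2"));
        // prints [/dc1/rack1, /dc1/rack2, /default-rack]
        System.out.println(m.resolve(List.of("10.1.1.11", "10.1.2.21", "10.9.9.9")));
    }
}
```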

Even if Hadoop could build a network topology internally, real network topologies vary endlessly, so what should the administrator do? Hadoop therefore needs to define a standard topology structure, and the administrator maps the actual network onto it as closely as possible.
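Hadoop's standard structure is a tree: data centers and racks are inner nodes, and each server is a leaf with a path such as `/dc1/rack1/host1`. Once every node has such a path, "how far apart are two nodes" reduces to counting hops up to their closest common ancestor, which is how Hadoop's `NetworkTopology` measures distance. The sketch below is a hypothetical stand-alone version that assumes all leaves sit at the same depth.

```java
// Hypothetical tree-distance helper over topology paths like "/dc1/rack1/host1".
public class TopologyDistance {
    // Split "/dc1/rack1/host1" into ["dc1", "rack1", "host1"].
    static String[] levels(String path) {
        if (!path.startsWith("/")) throw new IllegalArgumentException("path must start with '/': " + path);
        return path.substring(1).split("/");
    }

    // Hops from a up to the closest common ancestor, plus hops from there down to b.
    static int distance(String a, String b) {
        String[] x = levels(a), y = levels(b);
        int common = 0;
        while (common < x.length && common < y.length && x[common].equals(y[common])) {
            common++;
        }
        return (x.length - common) + (y.length - common);
    }

    public static void main(String[] args) {
        System.out.println(distance("/dc1/rack1/host1", "/dc1/rack1/host1")); // 0: same node
        System.out.println(distance("/dc1/rack1/host1", "/dc1/rack1/host2")); // 2: same rack
        System.out.println(distance("/dc1/rack1/host1", "/dc1/rack2/host3")); // 4: same data center
        System.out.println(distance("/dc1/rack1/host1", "/dc2/rack3/host4")); // 6: different data centers
    }
}
```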

With these basic ideas in place, we can dig in. I spent some time reading the DataNode code earlier. As we all know, at startup a DataNode registers with the NameNode to establish a master/subordinate relationship with it, a bit like reporting in to the boss. Let's follow this path to see how rack awareness works. DatanodeProtocol defines the registration interface:

public DatanodeRegistration register(DatanodeRegistration registration) throws IOException;
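On the NameNode side, handling this call is where rack awareness kicks in: as far as I can tell from the Hadoop 1.x code, the NameNode resolves the registering DataNode's rack through the configured mapping, records that network location on the node, and adds the node to its cluster topology. The sketch below is a simplified, hypothetical model of that step only. `RegistrationSketch`, its nested `Registration` class, and the flat rack-to-hosts map are stand-ins, not Hadoop's real classes.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;
import java.util.function.Function;

// Hypothetical model of "register -> resolve rack -> add to topology" (not Hadoop code).
public class RegistrationSketch {
    // Stand-in for DatanodeRegistration: a host name plus the rack the "NameNode" assigns.
    static final class Registration {
        final String host;
        String networkLocation;
        Registration(String host) { this.host = host; }
    }

    // rack path -> hosts registered on it (a flat stand-in for the NameNode's topology tree)
    private final Map<String, TreeSet<String>> clusterMap = new TreeMap<>();
    // Plays the role of the administrator-configured host -> rack mapping.
    private final Function<String, String> rackResolver;

    RegistrationSketch(Function<String, String> rackResolver) { this.rackResolver = rackResolver; }

    // Resolve the rack, record it on the registration, then add the node to the topology.
    Registration register(Registration reg) {
        reg.networkLocation = rackResolver.apply(reg.host);
        clusterMap.computeIfAbsent(reg.networkLocation, r -> new TreeSet<>()).add(reg.host);
        return reg;
    }

    int numRacks() { return clusterMap.size(); }

    public static void main(String[] args) {
        Map<String, String> table = Map.of("dn1", "/rack1", "dn2", "/rack1", "dn3", "/rack2");
        RegistrationSketch nn = new RegistrationSketch(h -> table.getOrDefault(h, "/default-rack"));
        for (String h : new String[] {"dn1", "dn2", "dn3", "dn4"}) {
            Registration r = nn.register(new Registration(h));
            System.out.println(r.host + " -> " + r.networkLocation);
        }
        System.out.println("racks: " + nn.numRacks()); // prints racks: 3
    }
}
```

Because the rack is resolved at registration time, a DataNode missing from the administrator's mapping simply lands in `/default-rack`, which is why an incomplete mapping quietly weakens replica placement rather than causing an error.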

