Principle
Rack awareness in Hadoop is a feature that can improve Hadoop performance. The Hadoop cluster we use has never actually used this feature.
Rack awareness in Hadoop is implemented as follows:
- When Hadoop starts, it checks one configuration item in hadoop-default.xml and hadoop-site.xml: topology.script.file.name. The value of this item is the path to a script. When a slave connects to the jobtracker, the slave's IP address is passed to this script as a parameter, and the script is expected to return the name of the rack that the slave belongs to. How slaves map to racks is not decided by Hadoop; which machine belongs to which rack is determined by whoever writes the script.
- A second configuration item accompanies topology.script.file.name: topology.script.number.args. It sets the maximum number of parameters the script can be given in one call, because the script may be called with more than one parameter. Each parameter is the IP address of a machine.
Steps
- Add the following configuration items to the jobtracker's hadoop-site.xml configuration file:
<property>
<name>topology.script.file.name</name>
<value>/path/to/rackmap.sh</value>
<description> The script name that should be invoked to resolve DNS names to
NetworkTopology names. Example: the script would take host.foo.bar as an
argument, and return /rack1 as the output.
</description>
</property>
<property>
<name>topology.script.number.args</name>
<value>1000</value>
<description> The max number of args that the script configured with
topology.script.file.name should be run with. Each arg is an
IP address.
</description>
</property>
- Write the rackmap.sh script so that it outputs the rack for each address it receives.
- Restart the jobtracker.
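
As an illustration of the second step, a rackmap.sh script might look roughly like the sketch below. It is only a minimal example under stated assumptions: the subnets 10.1.1.* and 10.1.2.*, the rack names /rack1 and /rack2, and the helper function resolve_rack are all hypothetical and need to be replaced with the real layout of your cluster. The fallback name /default-rack is used here for addresses that match no rule.

```shell
#!/bin/sh
# rackmap.sh: hypothetical sketch of a topology script.
# It receives one or more IP addresses as arguments and prints
# one rack name per argument, in the same order.

# Map a single address to a rack name.
# The subnets and rack names are invented for illustration only.
resolve_rack() {
    case "$1" in
        10.1.1.*) echo "/rack1" ;;
        10.1.2.*) echo "/rack2" ;;
        *)        echo "/default-rack" ;;  # fallback for unknown addresses
    esac
}

# Print the rack for every address passed on the command line.
for host in "$@"; do
    resolve_rack "$host"
done
```

With these assumed rules, running `sh /path/to/rackmap.sh 10.1.1.5 10.1.2.9` prints /rack1 and /rack2 on separate lines. The script will most likely also need to be readable and executable by the user that runs the jobtracker. For larger clusters, a lookup file that lists each address with its rack may be easier to maintain than a case statement.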