Hadoop configuration file load order

Source: Internet
Author: User
Tags: deprecated

After being away from Hadoop for a while, I came back to look at the source code and found it no longer felt familiar. Reviewing the old to learn the new really is true.

  We need to configure a few files before using Hadoop: hadoop-env.sh, core-site.xml, hdfs-site.xml, and mapred-site.xml. So when does Hadoop actually use these files?

  Most of the time Hadoop is started with start-all.sh, so what does this script do?

start-all.sh
# Start all Hadoop daemons.  Run this on master node.
# Note: all of Hadoop's daemons are started from the master node.

bin=`dirname "$0"`
bin=`cd "$bin"; pwd`          # bin = $HADOOP_HOME/bin

if [ -e "$bin/../libexec/hadoop-config.sh" ]; then
  . "$bin"/../libexec/hadoop-config.sh
else
  . "$bin/hadoop-config.sh"
fi

# start dfs daemons
"$bin"/start-dfs.sh --config $HADOOP_CONF_DIR

# start mapred daemons
"$bin"/start-mapred.sh --config $HADOOP_CONF_DIR

Load hadoop-env.sh   The script first locates Hadoop's bin directory; if the HADOOP_HOME environment variable is configured, this can be written directly as $HADOOP_HOME/bin. The next step is to execute hadoop-config.sh, which may live in either the $HADOOP_HOME/libexec directory or the $HADOOP_HOME/bin directory; in the Hadoop version I am using it is in $HADOOP_HOME/libexec. The hadoop-config.sh file contains these few lines:
hadoop-config.sh
if " ${hadoop_conf_dir}/hadoop-env.sh "  Then  "${hadoop_conf_dir}/hadoop-env.sh"fi

After testing that $HADOOP_HOME/conf/hadoop-env.sh is a regular file, the script sources it with . "${HADOOP_CONF_DIR}/hadoop-env.sh", which executes hadoop-env.sh. This is the point at which the JAVA_HOME we configured in hadoop-env.sh takes effect. Frankly, I feel this setting could be left out entirely. Why? Installing Hadoop on Linux means Java is already installed and JAVA_HOME has already been configured, and an environment variable set in /etc/profile takes effect in any shell process.
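To illustrate that last point, here is a minimal, hypothetical Java sketch (EnvCheck is not part of Hadoop): a JVM launched from a shell that exported JAVA_HOME, for example via /etc/profile, sees the variable in its process environment.

// Hypothetical sketch, not Hadoop code: child processes inherit exported
// environment variables, so a JVM can read JAVA_HOME set in /etc/profile.
public class EnvCheck {
    public static void main(String[] args) {
        System.out.println("JAVA_HOME = " + System.getenv("JAVA_HOME"));
    }
}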

  Load the core-*.xml and hdfs-*.xml files   After hadoop-config.sh has been executed, $HADOOP_HOME/bin/start-dfs.sh is executed. The purpose of this script is to start the three HDFS-related processes: namenode, datanode, and secondarynamenode.
start-dfs.sh
# Run this on master node.

usage="Usage: start-dfs.sh [-upgrade|-rollback]"

bin=`dirname "$0"`
bin=`cd "$bin"; pwd`

if [ -e "$bin/../libexec/hadoop-config.sh" ]; then
  . "$bin"/../libexec/hadoop-config.sh
else
  . "$bin/hadoop-config.sh"
fi

# get arguments
if [ $# -ge 1 ]; then
  nameStartOpt=$1
  shift
  case $nameStartOpt in
    (-upgrade)
      ;;
    (-rollback)
      dataStartOpt=$nameStartOpt
      ;;
    (*)
      echo $usage
      exit 1
      ;;
  esac
fi

# start dfs daemons
# start namenode after datanodes, to minimize time namenode is up w/o data
# note: datanodes will log connection errors until namenode starts
"$bin"/hadoop-daemon.sh --config $HADOOP_CONF_DIR start namenode $nameStartOpt
"$bin"/hadoop-daemons.sh --config $HADOOP_CONF_DIR start datanode $dataStartOpt
"$bin"/hadoop-daemons.sh --config $HADOOP_CONF_DIR --hosts masters start secondarynamenode

Look closely and you will notice that hadoop-config.sh is executed in start-dfs.sh as well. That is because we do not always use start-all.sh to launch every Hadoop process; sometimes we only need HDFS and not MapReduce, in which case we run start-dfs.sh directly. The variables defined in hadoop-config.sh are also needed by the file-system-related processes, so hadoop-config.sh must be executed (and with it hadoop-env.sh) before namenode, datanode, and secondarynamenode are started. Now look at the last three lines of the script: they are the commands that start namenode, datanode, and secondarynamenode. Starting Hadoop brings up five processes in total, three of which are namenode, datanode, and secondarynamenode. Since these processes can be started, the corresponding classes must have a main method, which the source code confirms, but that is not the point. The point is how each class loads its configuration files. Whether it is NameNode, DataNode, or SecondaryNameNode, each loads the core-*.xml and hdfs-*.xml files at startup. Take org.apache.hadoop.hdfs.server.namenode.NameNode as the example; the other two classes, org.apache.hadoop.hdfs.server.datanode.DataNode and org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode, are similar.

org.apache.hadoop.hdfs.server.namenode.NameNode
public class NameNode implements ClientProtocol, DatanodeProtocol,
                                 NamenodeProtocol, FSConstants,
                                 RefreshAuthorizationPolicyProtocol,
                                 RefreshUserMappingsProtocol {
  static {
    Configuration.addDefaultResource("hdfs-default.xml");
    Configuration.addDefaultResource("hdfs-site.xml");
  }
  ...
}

Look at the contents of the static block and things get interesting: there are hdfs-default.xml and hdfs-site.xml. The key point is that a static block executes when the class is loaded into the JVM, during class initialization, not during object initialization. And before the line Configuration.addDefaultResource("hdfs-default.xml") can execute, the Configuration class itself must be loaded into the JVM.
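To make that class-loading timing concrete, here is a minimal standalone sketch (plain Java with made-up class names, nothing Hadoop-specific): the static block runs exactly once, the first time the class is used, before any constructor.

// Hypothetical demo of static-initializer timing, not Hadoop code.
class ConfigHolder {
    static { System.out.println("ConfigHolder loaded: static block runs once"); }
    ConfigHolder() { System.out.println("ConfigHolder instance created"); }
}

public class StaticInitDemo {
    public static void main(String[] args) {
        System.out.println("before first use of ConfigHolder");
        new ConfigHolder();  // class is loaded here: static block prints first, then the constructor
        new ConfigHolder();  // static block does not run again
    }
}

With that timing in mind, look at what the static block in the org.apache.hadoop.conf.Configuration class does.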

org.apache.hadoop.conf.Configuration
static {
  // print deprecation warning if hadoop-site.xml is found in classpath
  ClassLoader cl = Thread.currentThread().getContextClassLoader();
  if (cl == null) {
    cl = Configuration.class.getClassLoader();
  }
  if (cl.getResource("hadoop-site.xml") != null) {
    LOG.warn("DEPRECATED: hadoop-site.xml found in the classpath. " +
        "Usage of hadoop-site.xml is deprecated. Instead use core-site.xml, " +
        "mapred-site.xml and hdfs-site.xml to override properties of " +
        "core-default.xml, mapred-default.xml and hdfs-default.xml " +
        "respectively");
  }
  addDefaultResource("core-default.xml");
  addDefaultResource("core-site.xml");
}

So the Configuration class loads core-default.xml and core-site.xml when the class is initialized. This is how NameNode loads both core-*.xml and hdfs-*.xml at startup: the core-*.xml pair is loaded by the Configuration class, and the hdfs-*.xml pair by NameNode's own static block.
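As a rough illustration (a sketch that assumes a Hadoop 1.x-style classpath with the default resources on it, not code taken from Hadoop itself), once the hdfs resources have been registered the way NameNode's static block registers them, any new Configuration() sees all four files, with each *-site.xml overriding its *-default.xml:

// Hypothetical sketch mimicking what NameNode's static block achieves.
// Assumes core-default.xml / hdfs-default.xml (and optionally the site files)
// are on the classpath, as they are inside a Hadoop installation.
import org.apache.hadoop.conf.Configuration;

public class LoadOrderDemo {
    public static void main(String[] args) {
        // Configuration's own static block has already registered
        // core-default.xml and core-site.xml; add the hdfs pair like NameNode does.
        Configuration.addDefaultResource("hdfs-default.xml");
        Configuration.addDefaultResource("hdfs-site.xml");

        Configuration conf = new Configuration();
        // dfs.replication is defined in hdfs-default.xml; if hdfs-site.xml also
        // defines it, the later-loaded site value wins.
        System.out.println("dfs.replication = " + conf.get("dfs.replication"));
        // fs.default.name comes from core-default.xml unless core-site.xml overrides it.
        System.out.println("fs.default.name = " + conf.get("fs.default.name"));
    }
}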

  Load the core-*.xml and mapred-*.xml files   After start-dfs.sh has been executed, start-mapred.sh is executed; this script works much the same way as start-dfs.sh.
 
start-mapred.sh
# Start Hadoop map reduce daemons.  Run this on master node.

bin=`dirname "$0"`
bin=`cd "$bin"; pwd`

if [ -e "$bin/../libexec/hadoop-config.sh" ]; then
  . "$bin"/../libexec/hadoop-config.sh
else
  . "$bin/hadoop-config.sh"
fi

# start mapred daemons
# start jobtracker first to minimize connection errors at startup
"$bin"/hadoop-daemon.sh --config $HADOOP_CONF_DIR start jobtracker
"$bin"/hadoop-daemons.sh --config $HADOOP_CONF_DIR start tasktracker

This script also executes hadoop-config.sh, and hadoop-env.sh is executed along with it, just as in start-dfs.sh. The last two lines of code start the jobtracker and tasktracker processes, which correspond to the classes org.apache.hadoop.mapred.JobTracker and org.apache.hadoop.mapred.TaskTracker.

Take org.apache.hadoop.mapred.JobTracker as the example; org.apache.hadoop.mapred.TaskTracker is similar.
org.apache.hadoop.mapred.JobTracker
public class JobTracker implements ... {
  static {
    Configuration.addDefaultResource("mapred-default.xml");
    Configuration.addDefaultResource("mapred-site.xml");
  }
  ...
}

OK, with the above explanation, it should now be clear. JobTracker loads core-*.xml and mapred-*.xml at startup, and again the core-*.xml pair is loaded by the Configuration class.

To summarize: when start-all.sh is used to start all of Hadoop's processes, the configuration files are loaded in this order:

    HDFS:      hadoop-env.sh -> core-default.xml -> core-site.xml -> hdfs-default.xml -> hdfs-site.xml
    MapReduce: hadoop-env.sh -> core-default.xml -> core-site.xml -> mapred-default.xml -> mapred-site.xml

Note that the core-*.xml system files are always loaded first, and all five Hadoop processes load them; in other words, core-*.xml is a common base shared by all of them. Configuration files are loaded at process startup, which also means that if you modify a Hadoop configuration file, whether a system (default) file or an administrator (site) file, the process must be restarted for the change to take effect.
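As a quick check of this order on the MapReduce side (again only a hedged sketch mirroring JobTracker's static block, not Hadoop source), printing a Configuration lists the resources it has loaded, and they should appear in the order summarized above:

// Hypothetical sketch mirroring JobTracker's static block; assumes the
// mapred-default.xml / mapred-site.xml resources are on the classpath.
import org.apache.hadoop.conf.Configuration;

public class MapredLoadOrderCheck {
    public static void main(String[] args) {
        Configuration.addDefaultResource("mapred-default.xml");
        Configuration.addDefaultResource("mapred-site.xml");

        Configuration conf = new Configuration();
        // Configuration.toString() lists the loaded resources, which should read
        // roughly: core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml
        System.out.println(conf);
        // mapred.job.tracker is defined in mapred-default.xml and is typically
        // overridden in mapred-site.xml.
        System.out.println("mapred.job.tracker = " + conf.get("mapred.job.tracker"));
    }
}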
