Hadoop Startup Process Analysis

A detailed walkthrough of how start-dfs.sh starts the HDFS processes.


The scripts involved are:


Under bin/:
hadoop-config.sh
start-dfs.sh
hadoop-daemons.sh
slaves.sh
hadoop-daemon.sh
hadoop


Under conf/:
hadoop-env.sh


Both hadoop-config.sh and hadoop-env.sh are scripts that set up the Hadoop-related environment variables; each of the scripts below pulls hadoop-config.sh in with the same preamble, sketched below.
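A rough sketch of that common preamble, paraphrased from the 0.20.x scripts rather than copied verbatim:

    # resolve the absolute path of the bin/ directory holding the current script,
    # then source hadoop-config.sh, which sets HADOOP_HOME and parses generic
    # options such as --config (which becomes HADOOP_CONF_DIR)
    bin=`dirname "$0"`
    bin=`cd "$bin"; pwd`
    . "$bin"/hadoop-config.sh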


start-dfs.sh calls hadoop-daemon.sh to start the NameNode, calls hadoop-daemons.sh to start the SecondaryNameNode, and calls hadoop-daemons.sh to start the DataNodes (the relevant calls are sketched below). Because the DataNode processes must be started on more than one node, only that path is analyzed here; starting the other two processes is comparatively simple and can be understood by reference to it.
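For reference, the corresponding calls inside bin/start-dfs.sh in hadoop-0.20.1 look approximately like this:

    # NameNode on the local node, DataNodes on the hosts in conf/slaves,
    # SecondaryNameNode on the hosts in conf/masters
    "$bin"/hadoop-daemon.sh --config $HADOOP_CONF_DIR start namenode $nameStartOpt
    "$bin"/hadoop-daemons.sh --config $HADOOP_CONF_DIR start datanode $dataStartOpt
    "$bin"/hadoop-daemons.sh --config $HADOOP_CONF_DIR --hosts masters start secondarynamenode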


start-dfs.sh: loads hadoop-config.sh, i.e., it executes source hadoop-config.sh
|
|
|
hadoop-daemons.sh: loads hadoop-config.sh; this script runs on the master node.
| Calls slaves.sh and passes it parameters: exec "$bin/slaves.sh" --config $HADOOP_CONF_DIR cd "$HADOOP_HOME" \; "$bin/hadoop-daemon.sh" --config $HADOOP_CONF_DIR "$@"
| The form of the exec command is: exec <script> <parameters> (a short illustration follows below)
| For example, it executes the following command to start the DataNode:
| exec ***/test_slaves.sh --config ***/../conf cd ***/.. ';' ***/hadoop-daemon.sh --config ***/../conf start datanode
| where *** = /root/hadoop/hadoop-0.20.1/hadoop-0.20.1/bin
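| A quick illustration of exec with a hypothetical command (not from the Hadoop source): exec replaces the current shell process with the given command instead of forking a child, so here hadoop-daemons.sh itself is replaced by slaves.sh:
|   exec echo "the shell that called exec has been replaced by this echo"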
|
slaves.sh: loads hadoop-config.sh; this script runs on the master node. It starts hadoop-daemon.sh on every slave node and lets those invocations run in parallel.
| It traverses each host in the slaves file and executes an ssh command for each one in turn; the ssh command takes the script and parameters passed in from hadoop-daemons.sh and starts the related process on each remote node.
|
| for slave in `cat "$HOSTLIST" | sed "s/#.*$//;/^$/d"`
| do
|   ssh $HADOOP_SSH_OPTS $slave $"${@// /\\ }" 2>&1 | sed "s/^/$slave: /" &
|   For example, the command takes the following form:
|   ssh hdfs05 cd ***/.. ';' ***/hadoop-daemon.sh --config ***/../conf start datanode 2>&1 | sed 's/^/hdfs05: /' &
|   The command executed on node hdfs05 is: cd ***/.. ';' ***/hadoop-daemon.sh --config ***/../conf start datanode (the sed part runs back on the master and prefixes each line of remote output with the host name).
|   Because the ssh command is put in the background with &, the master node can launch the script on all nodes, and the per-node scripts execute in parallel.
|   The script invoked on each remote node is hadoop-daemon.sh.
| done
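| A minimal demo (not from the Hadoop source) of the quoting trick "${@// /\\ }": it backslash-escapes every space inside each argument, so word boundaries survive the second round of word splitting performed by the remote shell:
|   set -- cd "/opt/hadoop home" \; ./hadoop-daemon.sh start datanode    # hypothetical arguments
|   echo ssh somehost "${@// /\\ }"
| which prints: ssh somehost cd /opt/hadoop\ home ; ./hadoop-daemon.sh start datanode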
|
hadoop-daemon.sh: loads hadoop-config.sh and hadoop-env.sh, setting the Hadoop-related variables and the current shell's Java environment variables.
| This script runs on the slave node; in this case it is launched by the master node but executes on the slave itself. To test the script, it must be tested independently on a node.
| Both starting and stopping a process are handled here.
| Stopping simply kills the related process, such as the DataNode, and is relatively simple.
| Here, according to the parameters passed in from slaves.sh, the action is start datanode.
| To start the process, it executes the following command, which runs in the background on the current node:
| nohup nice -n $HADOOP_NICENESS "$HADOOP_HOME"/bin/hadoop --config $HADOOP_CONF_DIR $command "$@" > "$log" 2>&1 < /dev/null &
| For example, the command executed here may take the following form; it starts the hadoop script on this node, runs it in the background, and sends the standard output to $log:
| nohup nice -n 0 ***/../bin/hadoop --config ***/../conf datanode > "$log" 2>&1 < /dev/null &
| When testing, users can run nohup nice -n 0 ***/../bin/hadoop --config ***/../conf datanode directly,
| or simply run hadoop datanode; nohup and & only put the command in the background, keep it running after hangup, and leave the current shell free (see the nohup command and the illustration below).
| where *** = /root/hadoop/hadoop-0.20.1/hadoop-0.20.1/bin,
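| A short illustration of the same backgrounding pattern with a hypothetical command:
|   nohup nice -n 0 sleep 60 > /tmp/test.log 2>&1 < /dev/null &
| nohup keeps the process running after the terminal hangs up, nice -n 0 sets its scheduling priority, > "$log" 2>&1 redirects both stdout and stderr to the log file, < /dev/null detaches stdin, and the trailing & frees the current shell immediately.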
|
hadoop: loads hadoop-config.sh and hadoop-env.sh, setting the Hadoop-related variables and the current shell's Java environment variables; using the datanode parameter passed in earlier, it finds the related Java class and starts the DataNode process.
| Because of the command used in hadoop-daemon.sh, this script runs in the background on the node, and all standard output and standard error is written to $log.
| Therefore, when users need to test this script on its own, it must be tested separately, for example by running hadoop datanode to test starting the DataNode.
| Through the datanode parameter it finds the related Java class, CLASS='org.apache.hadoop.hdfs.server.datanode.DataNode', sets the Java CLASSPATH environment variable and the maximum memory the JVM may occupy, and executes the java command to run the DataNode class.
| The script finally starts the process, here the DataNode, by executing the following command:
exec "$JAVA" $JAVA_HEAP_MAX $HADOOP_OPTS -classpath "$CLASSPATH" $CLASS "$@"
The command may take the following form:


exec


/home/hadoop/jdk1.6.0_07/bin/java    the java command ($JAVA)


-Xmx1000m    the maximum memory space the JVM may occupy ($JAVA_HEAP_MAX)


-Dcom.sun.management.jmxremote -Dhadoop.log.dir=/root/hadoop/hadoop-0.20.1/hadoop-0.20.1/bin/../logs -Dhadoop.log.file=hadoop.log -Dhadoop.home.dir=/root/hadoop/hadoop-0.20.1/hadoop-0.20.1/bin/.. -Dhadoop.id.str= -Dhadoop.root.logger=INFO,console -Djava.library.path=/root/hadoop/hadoop-0.20.1/hadoop-0.20.1/bin/../lib/native/Linux-amd64-64 -Dhadoop.policy.file=hadoop-policy.xml


These are the Hadoop options, $HADOOP_OPTS


-classpath /root/hadoop/hadoop-0.20.1/hadoop-0.20.1/bin/../conf:/home/hadoop/jdk1.6.0_07/lib/tools.jar:/root/hadoop/hadoop-0.20.1/hadoop-0.20.1/bin/..:/root/hadoop/hadoop-0.20.1/hadoop-0.20.1/bin/../hadoop-0.20.1-core.jar:/root/hadoop/hadoop-0.20.1/hadoop-0.20.1/bin/../lib/commons-cli-1.2.jar:/root/hadoop/hadoop-0.20.1/hadoop-0.20.1/bin/../lib/commons-codec-1.3.jar:/root/hadoop/hadoop-0.20.1/hadoop-0.20.1/bin/../lib/commons-el-1.0.jar:/root/hadoop/hadoop-0.20.1/hadoop-0.20.1/bin/../lib/commons-httpclient-3.0.1.jar:/root/hadoop/hadoop-0.20.1/hadoop-0.20.1/bin/../lib/commons-logging-1.0.4.jar:/root/hadoop/hadoop-0.20.1/hadoop-0.20.1/bin/../lib/commons-logging-api-1.0.4.jar:/root/hadoop/hadoop-0.20.1/hadoop-0.20.1/bin/../lib/commons-net-1.4.1.jar:/root/hadoop/hadoop-0.20.1/hadoop-0.20.1/bin/../lib/core-3.1.1.jar:/root/hadoop/hadoop-0.20.1/hadoop-0.20.1/bin/../lib/hsqldb-1.8.0.10.jar:/root/hadoop/hadoop-0.20.1/hadoop-0.20.1/bin/../lib/jasper-compiler-5.5.12.jar:/root/hadoop/hadoop-0.20.1/hadoop-0.20.1/bin/../lib/jasper-runtime-5.5.12.jar:/root/hadoop/hadoop-0.20.1/hadoop-0.20.1/bin/../lib/jets3t-0.6.1.jar:/root/hadoop/hadoop-0.20.1/hadoop-0.20.1/bin/../lib/jetty-6.1.14.jar:/root/hadoop/hadoop-0.20.1/hadoop-0.20.1/bin/../lib/jetty-util-6.1.14.jar:/root/hadoop/hadoop-0.20.1/hadoop-0.20.1/bin/../lib/junit-3.8.1.jar:/root/hadoop/hadoop-0.20.1/hadoop-0.20.1/bin/../lib/kfs-0.2.2.jar:/root/hadoop/hadoop-0.20.1/hadoop-0.20.1/bin/../lib/log4j-1.2.15.jar:/root/hadoop/hadoop-0.20.1/hadoop-0.20.1/bin/../lib/oro-2.0.8.jar:/root/hadoop/hadoop-0.20.1/hadoop-0.20.1/bin/../lib/servlet-api-2.5-6.1.14.jar:/root/hadoop/hadoop-0.20.1/hadoop-0.20.1/bin/../lib/slf4j-api-1.4.3.jar:/root/hadoop/hadoop-0.20.1/hadoop-0.20.1/bin/../lib/slf4j-log4j12-1.4.3.jar:/root/hadoop/hadoop-0.20.1/hadoop-0.20.1/bin/../lib/xmlenc-0.52.jar:/root/hadoop/hadoop-0.20.1/hadoop-0.20.1/bin/../lib/jsp-2.1/jsp-2.1.jar:/root/hadoop/hadoop-0.20.1/hadoop-0.20.1/bin/../lib/jsp-2.1/jsp-api-2.1.jar


These are the jar packages on the Java CLASSPATH environment variable, the runtime dependencies of the DataNode class ($CLASSPATH)


Org.apache.hadoop.hdfs.server.datanode.DataNode


This is the Java class that finally starts the DataNode process ($CLASS)
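Putting the last step together, the relevant fragment of the bin/hadoop script in hadoop-0.20.1 looks approximately like this (abridged):

    # map the "datanode" command to its Java class, then replace the shell with the JVM
    elif [ "$COMMAND" = "datanode" ] ; then
      CLASS='org.apache.hadoop.hdfs.server.datanode.DataNode'
      HADOOP_OPTS="$HADOOP_OPTS $HADOOP_DATANODE_OPTS"
    # ... other commands elided ...
    exec "$JAVA" $JAVA_HEAP_MAX $HADOOP_OPTS -classpath "$CLASSPATH" $CLASS "$@"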