6. Modify the Hadoop configuration files
File location: /home/hadoop/hadoop/etc/hadoop
File names: hadoop-env.sh, yarn-env.sh, slaves, core-site.xml, hdfs-site.xml, mapred-site.xml, yarn-site.xml
(1) Configure the hadoop-env.sh file
# In the Hadoop installation path, go to the hadoop/etc/hadoop/ directory, edit hadoop-env.sh, and set JAVA_HOME to the Java installation path.
[hadoop@linux-node1 home/hadoop]$ cd hadoop/etc/hadoop/
[hadoop@linux-node1 hadoop]$ egrep JAVA_HOME hadoop-env.sh
# The only required environment variable is JAVA_HOME. All others are
# set JAVA_HOME in this file, so that it is correctly defined on
#export JAVA_HOME=${JAVA_HOME}
export JAVA_HOME=/usr/java/jdk1.8.0_101/
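If you prefer to script this change, a sed one-liner works. The sketch below runs against a throwaway copy in /tmp so it is safe to try; on a real node, point HADOOP_ENV at /home/hadoop/hadoop/etc/hadoop/hadoop-env.sh instead:

```shell
# Demo copy of hadoop-env.sh (assumption: the real file sets JAVA_HOME
# with an "export JAVA_HOME=..." line, as shown above).
HADOOP_ENV=/tmp/hadoop-env.sh
printf 'export JAVA_HOME=${JAVA_HOME}\n' > "$HADOOP_ENV"

# Point JAVA_HOME at the actual JDK install path:
sed -i 's|^export JAVA_HOME=.*|export JAVA_HOME=/usr/java/jdk1.8.0_101/|' "$HADOOP_ENV"
grep JAVA_HOME "$HADOOP_ENV"
```

The same command works on every node once hadoop is copied over in step 7.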
(2) Configure the yarn-env.sh file
Specify the Java runtime environment for the yarn framework. This file configures the runtime environment of the yarn framework; you need to set JAVA_HOME here as well.
[hadoop@linux-node1 hadoop]$ grep JAVA_HOME yarn-env.sh
# export JAVA_HOME=/home/y/libexec/jdk1.6.0/
export JAVA_HOME=/usr/java/jdk1.8.0_101/
(3) Configure the slaves file
Specify the DataNode storage servers by writing the host names of all DataNode machines into this file, as shown below:
[hadoop@linux-node1 hadoop]$ cat slaves
linux-node2
linux-node3
linux-node4
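The slaves file can also be generated in one command. Shown here against a temp path so it can be run anywhere; the real file is /home/hadoop/hadoop/etc/hadoop/slaves:

```shell
# Write one DataNode host name per line (host names from this tutorial).
SLAVES=/tmp/slaves   # real path: /home/hadoop/hadoop/etc/hadoop/slaves
printf '%s\n' linux-node2 linux-node3 linux-node4 > "$SLAVES"
cat "$SLAVES"
```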
The three Hadoop operating modes
Local (standalone) mode: all Hadoop components, such as the NameNode, DataNode, JobTracker, and TaskTracker, run in a single Java process.
Pseudo-distributed mode: each Hadoop component runs in its own Java virtual machine, and the components communicate with each other over network sockets.
Fully distributed mode: Hadoop is spread across multiple hosts, and different components are installed on different hosts according to the nature of their work.
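As a rough rule of thumb, the mode can often be read off the fs.default.name value in core-site.xml. The helper below is only a sketch encoding that heuristic; it is not an official Hadoop tool:

```shell
# Heuristic only: file:/// (or unset) suggests local mode, hdfs://localhost
# suggests pseudo-distributed, any other hdfs:// URI suggests fully distributed.
classify_mode() {
  case "$1" in
    ''|file:*)         echo "local" ;;
    hdfs://localhost*) echo "pseudo-distributed" ;;
    hdfs://*)          echo "fully-distributed" ;;
    *)                 echo "unknown" ;;
  esac
}
classify_mode "hdfs://linux-node1:9000" > /tmp/mode.txt
cat /tmp/mode.txt
```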
# Configure the fully distributed mode
(4) Modify the core-site.xml file and add the properties below; pay particular attention to the NameNode host and port in the fs.default.name value.
<configuration>
    <property>
        <name>fs.default.name</name>
        <value>hdfs://linux-node1:9000</value>
    </property>
    <property>
        <name>io.file.buffer.size</name>
        <value>131072</value>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>file:/home/hadoop/tmp</value>
        <description>Abase for other temporary directories.</description>
    </property>
</configuration>
(5) Modify the hdfs-site.xml file
<configuration>
    <property>
        <name>dfs.namenode.secondary.http-address</name>
        <value>linux-node1:9001</value>
        <description># view HDFS status on the web interface</description>
    </property>
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>file:/home/hadoop/dfs/name</value>
    </property>
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>file:/home/hadoop/dfs/data</value>
    </property>
    <property>
        <name>dfs.replication</name>
        <value>2</value>
        <description># each Block has 2 backups</description>
    </property>
    <property>
        <name>dfs.webhdfs.enabled</name>
        <value>true</value>
    </property>
</configuration>
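The directories referenced by hadoop.tmp.dir, dfs.namenode.name.dir, and dfs.datanode.data.dir must exist and be writable by the hadoop user before the NameNode is formatted. The sketch below uses a /tmp prefix so it is safe to run as-is; on the cluster the real paths are /home/hadoop/tmp, /home/hadoop/dfs/name, and /home/hadoop/dfs/data:

```shell
# BASE is a stand-in prefix for testing; use /home/hadoop on a real node.
BASE=${BASE:-/tmp/hadoop-demo}
mkdir -p "$BASE/tmp" "$BASE/dfs/name" "$BASE/dfs/data"
ls -ld "$BASE/tmp" "$BASE/dfs/name" "$BASE/dfs/data"
```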
(6) Modify mapred-site.xml
This file configures MapReduce tasks. Because hadoop 2.x uses the yarn framework, for a distributed deployment the mapreduce.framework.name property must be set to yarn. The mapred.map.tasks and mapred.reduce.tasks properties set the number of map and reduce tasks, respectively.
[hadoop@linux-node1 hadoop]$ cp mapred-site.xml.template mapred-site.xml
<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.address</name>
        <value>linux-node1:10020</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.webapp.address</name>
        <value>linux-node1:19888</value>
    </property>
</configuration>
(7) Configure the yarn-site.xml file
# This file is related to the yarn architecture configuration
<?xml version="1.0"?>
<!-- mapred-site.xml -->
<configuration>
    <property>
        <name>mapred.child.java.opts</name>
        <value>-Xmx400m</value>
        <!-- Not marked as final so jobs can include JVM debugging options -->
    </property>
</configuration>

<?xml version="1.0"?>
<!-- yarn-site.xml -->
<configuration>
    <!-- Site specific YARN configuration properties -->
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
        <value>org.apache.hadoop.mapred.ShuffleHandler</value>
    </property>
    <property>
        <name>yarn.resourcemanager.address</name>
        <value>linux-node1:8032</value>
    </property>
    <property>
        <name>yarn.resourcemanager.scheduler.address</name>
        <value>linux-node1:8030</value>
    </property>
    <property>
        <name>yarn.resourcemanager.resource-tracker.address</name>
        <value>linux-node1:8031</value>
    </property>
    <property>
        <name>yarn.resourcemanager.admin.address</name>
        <value>linux-node1:8033</value>
    </property>
    <property>
        <name>yarn.resourcemanager.webapp.address</name>
        <value>linux-node1:8088</value>
    </property>
    <property>
        <name>yarn.nodemanager.resource.memory-mb</name>
        <value>8192</value>
    </property>
</configuration>
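Hand-edited XML is easy to break. A cheap pure-shell sanity check is to confirm that every <property> has a matching </property>. Demonstrated on a throwaway sample file; on the cluster, loop over /home/hadoop/hadoop/etc/hadoop/*-site.xml instead:

```shell
# Create a small sample file so the check can be run anywhere.
f=/tmp/sample-site.xml
printf '<configuration>\n<property><name>x</name><value>1</value></property>\n</configuration>\n' > "$f"

# Count opening and closing property tags; they should match.
open=$(grep -o '<property>' "$f" | wc -l)
close=$(grep -o '</property>' "$f" | wc -l)
[ "$open" -eq "$close" ] && echo "$f: tags balanced ($open properties)"
```

This catches only missing tags, not misspelled property names; a full well-formedness check needs an XML parser such as xmllint.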
7. Copy hadoop to the other nodes
scp -r /home/hadoop/hadoop/ 192.168.0.90:/home/hadoop/
scp -r /home/hadoop/hadoop/ 192.168.0.91:/home/hadoop/
scp -r /home/hadoop/hadoop/ 192.168.0.92:/home/hadoop/
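The three scp commands can be folded into a loop. This sketch prints the commands as a dry run (remove the echo to actually execute; the IPs are the ones used in this tutorial):

```shell
# Dry run: print one scp command per DataNode host.
for node in 192.168.0.90 192.168.0.91 192.168.0.92; do
  echo scp -r /home/hadoop/hadoop/ "$node":/home/hadoop/
done > /tmp/scp-plan.txt
cat /tmp/scp-plan.txt
```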
8. Initialize the NameNode as the hadoop user on linux-node1
/home/hadoop/hadoop/bin/hdfs namenode -format
# echo $?
# sudo yum -y install tree
# tree /home/hadoop/dfs
9. Start hadoop
/home/hadoop/hadoop/sbin/start-dfs.sh
/home/hadoop/hadoop/sbin/stop-dfs.sh
# View the process on the namenode Node
ps aux | grep --color namenode
# View processes on DataNode
ps aux | grep --color datanode
10. Start the yarn distributed computing framework
[hadoop@linux-node1 .ssh]$ /home/hadoop/hadoop/sbin/start-yarn.sh
starting yarn daemons
# View processes on the NameNode Node
ps aux | grep --color resourcemanager
# View processes on DataNode nodes
ps aux | grep --color nodemanager
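With the configuration in this tutorial, each role should show a predictable set of Java daemons (jps on each node is the quickest way to list them). The helper below is only a mnemonic for that expectation, not a Hadoop command; note that JobHistoryServer joins the master's list after step 11:

```shell
# Hypothetical helper: expected daemons per node role in this deployment.
expected_daemons() {
  case "$1" in
    master) echo "NameNode SecondaryNameNode ResourceManager" ;;
    slave)  echo "DataNode NodeManager" ;;
  esac
}
expected_daemons master > /tmp/daemons-master.txt
expected_daemons slave  > /tmp/daemons-slave.txt
cat /tmp/daemons-master.txt /tmp/daemons-slave.txt
```

Compare this list against the jps output on each node; a missing daemon usually means a misconfigured *-site.xml on that host.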
Note: start-dfs.sh and start-yarn.sh can be replaced by start-all.sh
/home/hadoop/hadoop/sbin/stop-all.sh
/home/hadoop/hadoop/sbin/start-all.sh
11. Start the jobhistory service and check the mapreduce status.
# On the NameNode node
[hadoop@linux-node1 ~]$ /home/hadoop/hadoop/sbin/mr-jobhistory-daemon.sh start historyserver
starting historyserver, logging to /home/hadoop/hadoop/logs/mapred-hadoop-historyserver-linux-node1.out
12. View the HDFS distributed file system status
/home/hadoop/hadoop/bin/hdfs dfsadmin -report
# View File blocks. A file consists of these blocks.
/home/hadoop/hadoop/bin/hdfs fsck / -files -blocks
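A simple end-to-end smoke test is to put a small file into HDFS and run fsck on it. Shown as a dry run that only prints the commands (drop the echo to run them on the cluster; /user/hadoop as the user's HDFS home directory is an assumption):

```shell
HDFS=/home/hadoop/hadoop/bin/hdfs
{
  echo "$HDFS dfs -mkdir -p /user/hadoop"
  echo "$HDFS dfs -put /etc/hosts /user/hadoop/"
  echo "$HDFS fsck /user/hadoop/hosts -files -blocks"
} > /tmp/hdfs-smoke.txt
cat /tmp/hdfs-smoke.txt
```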
13. View the hadoop cluster status on the web page
View HDFS status: http://192.168.0.89:50070/
View Hadoop cluster status: http://192.168.0.89:8088/
Original address: http://www.linuxprobe.com/centos-deploy-hadoop-cluster.html