First, system and software environment
1. Operating system
CentOS Release 6.5 (Final)
Kernel version: 2.6.32-431.el6.x86_64
master.fansik.com:192.168.83.118
node1.fansik.com:192.168.83.119
node2.fansik.com:192.168.83.120
2. JDK version: 1.7.0_75
3. Hadoop version: 2.7.2
Second, pre-installation preparation
1. Turn off firewall and SELinux
# setenforce 0
# service iptables stop
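These two commands only affect the running system; to keep iptables and SELinux off after a reboot (a sketch, assuming the stock CentOS 6 layout), something like the following can also be done:
# chkconfig iptables off
# vim /etc/selinux/config and set: SELINUX=disabled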
2. Configure the hosts file
192.168.83.118 master.fansik.com
192.168.83.119 node1.fansik.com
192.168.83.120 node2.fansik.com
3. Generate the SSH key pair
On master.fansik.com execute # ssh-keygen and press Enter at every prompt
# scp ~/.ssh/id_rsa.pub node1.fansik.com:/root/.ssh/authorized_keys
# scp ~/.ssh/id_rsa.pub node2.fansik.com:/root/.ssh/authorized_keys
# chmod 600 /root/.ssh/authorized_keys (on node1 and node2)
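To verify that passwordless login works, the following should print the node hostnames without asking for a password:
# ssh node1.fansik.com hostname
# ssh node2.fansik.com hostname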
4. Install the JDK
# tar xf jdk-7u75-linux-x64.tar.gz
# mv jdk1.7.0_75 /usr/local/jdk1.7
# vim /etc/profile.d/java.sh and add the following:
export JAVA_HOME=/usr/local/jdk1.7
export JRE_HOME=/usr/local/jdk1.7/jre
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
export PATH=$PATH:$JAVA_HOME/bin
# source /etc/profile
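To confirm the JDK is on the PATH (it should report version 1.7.0_75):
# java -version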
5. Synchronize the time (otherwise there may be problems when analyzing files later)
# ntpdate 202.120.2.101 (an NTP server at Shanghai Jiao Tong University)
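To keep the clocks in sync automatically, a cron entry such as the following can be added on each machine (the interval is only an example):
# crontab -e
*/30 * * * * /usr/sbin/ntpdate 202.120.2.101 > /dev/null 2>&1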
Third, install Hadoop
Hadoop official download site, where you can choose the appropriate version: http://hadoop.apache.org/releases.html
Perform the following operations on all three machines:
# tar xf hadoop-2.7.2.tar.gz
# mv hadoop-2.7.2 /usr/local/hadoop
# cd /usr/local/hadoop/
# mkdir tmp dfs dfs/data dfs/name
Fourth, configuring Hadoop
Perform the configuration on master.fansik.com
# vim /usr/local/hadoop/etc/hadoop/core-site.xml
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://192.168.83.118:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>file:/usr/local/hadoop/tmp</value>
  </property>
  <property>
    <name>io.file.buffer.size</name>
    <value>121702</value>
  </property>
</configuration>
# vim /usr/local/hadoop/etc/hadoop/hdfs-site.xml
<configuration>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:/usr/local/hadoop/dfs/name</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:/usr/local/hadoop/dfs/data</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
  <property>
    <name>dfs.namenode.secondary.http-address</name>
    <value>192.168.83.118:9001</value>
  </property>
  <property>
    <name>dfs.webhdfs.enabled</name>
    <value>true</value>
  </property>
</configuration>
# cp /usr/local/hadoop/etc/hadoop/mapred-site.xml.template /usr/local/hadoop/etc/hadoop/mapred-site.xml
# vim /usr/local/hadoop/etc/hadoop/mapred-site.xml
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.address</name>
    <value>192.168.83.118:10020</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>192.168.83.118:19888</value>
  </property>
</configuration>
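The jobhistory addresses configured above are only served if the JobHistory server is running; start-all.sh (used later) does not start it, so it can be launched separately if needed (assuming the default sbin layout):
# /usr/local/hadoop/sbin/mr-jobhistory-daemon.sh start historyserver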
# vim /usr/local/hadoop/etc/hadoop/yarn-site.xml
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
  <property>
    <name>yarn.resourcemanager.address</name>
    <value>192.168.83.118:8032</value>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>192.168.83.118:8030</value>
  </property>
  <property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>192.168.83.118:8031</value>
  </property>
  <property>
    <name>yarn.resourcemanager.admin.address</name>
    <value>192.168.83.118:8033</value>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.address</name>
    <value>192.168.83.118:8088</value>
  </property>
  <property>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>2048</value>
  </property>
</configuration>
# vim /usr/local/hadoop/etc/hadoop/slaves
192.168.83.119
192.168.83.120
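Since /etc/hosts already maps the hostnames, the slaves file could equally list the hostnames instead of the IP addresses:
node1.fansik.com
node2.fansik.com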
Synchronize the etc directory on the master to node1 and node2:
# rsync -av /usr/local/hadoop/etc/ node1.fansik.com:/usr/local/hadoop/etc/
# rsync -av /usr/local/hadoop/etc/ node2.fansik.com:/usr/local/hadoop/etc/
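To confirm the sync worked, the copied configuration can be spot-checked from the master, for example:
# ssh node1.fansik.com ls /usr/local/hadoop/etc/hadoop/core-site.xml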
Perform the following operations on master.fansik.com; the two nodes will be started automatically.
Configure environment variables for Hadoop
# vim /etc/profile.d/hadoop.sh
export PATH=/usr/local/hadoop/bin:/usr/local/hadoop/sbin:$PATH
# source /etc/profile
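To confirm the environment variable took effect (it should report Hadoop 2.7.2):
# hadoop version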
Initialization (format the NameNode)
# hdfs namenode -format
Check the exit status to see whether an error occurred (0 means success):
# echo $?
Start the service
# start-all.sh
Stop the service
# stop-all.sh
After the services are started, they can be accessed at the following addresses:
http://192.168.83.118:8088
http://192.168.83.118:50070
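To confirm the daemons actually started, jps can be run on each machine; with this configuration the master should roughly show NameNode, SecondaryNameNode and ResourceManager, and each node should show DataNode and NodeManager (process IDs will differ):
# jps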
Fifth, testing Hadoop
Operate on master.fansik.com
# hdfs dfs -mkdir /fansik
If the following warning appears when creating the directory, it can be ignored:
16/07/29 17:38:27 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Workaround:
Go to the following site and download the appropriate version:
http://dl.bintray.com/sequenceiq/sequenceiq-bin/
# tar -xvf hadoop-native-64-2.7.0.tar -C /usr/local/hadoop/lib/native/
If prompted: copyFromLocal: Cannot create directory /123/. Name node is in safe mode
This means HDFS safe mode is enabled; the workaround is:
# hdfs dfsadmin -safemode leave
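The current safe mode status can be checked before and after with:
# hdfs dfsadmin -safemode get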
Copy myservicce.sh to the /fansik directory
# hdfs dfs -copyFromLocal ./myservicce.sh /fansik
Check that myservicce.sh is now in the /fansik directory
# hdfs dfs -ls /fansik
Analyze the file using wordcount
# hadoop jar /usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.2.jar wordcount /fansik/myservicce.sh /zhangshan/
View the output files:
# hdfs dfs -ls /zhangshan/
Found 2 items
-rw-r--r--   2 root supergroup          0 2016-08-02 15:19 /zhangshan/_SUCCESS
-rw-r--r--   2 root supergroup        415 2016-08-02 15:19 /zhangshan/part-r-00000
View the analysis results:
# hdfs dfs -cat /zhangshan/part-r-00000
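Each line of part-r-00000 is a word followed by the number of times it appears in myservicce.sh, separated by a tab; the exact contents depend on the input file.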