Setup: a MacBook running VirtualBox 4.3.6, with the virtual machines running Ubuntu 13.10. For the multi-node distributed environment, configure one machine first, then clone it into two more, for three machines in total.
1. Configure the Environment
Bash:
sudo apt-get install -y openjdk-7-jdk openssh-server
sudo addgroup hadoop
sudo adduser --ingroup hadoop hadoop   # creates the user and sets a password
sudo visudo
hadoop ALL=(ALL) ALL                   # lets the hadoop user use sudo
su - hadoop                            # needs the password set above
ssh-keygen -t rsa -P ""                # Enter file (/home/hadoop/.ssh/id_rsa)
cat /home/hadoop/.ssh/id_rsa.pub > /home/hadoop/.ssh/authorized_keys
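The key steps above can be dry-run in a scratch directory before touching the real /home/hadoop/.ssh; this sketch assumes ssh-keygen from openssh-client is installed, and the temp-dir path is only illustrative:

```shell
# Generate a throwaway passwordless RSA keypair the same way the tutorial does,
# but in a temp directory instead of /home/hadoop/.ssh
d=$(mktemp -d)
ssh-keygen -t rsa -P "" -f "$d/id_rsa" -q
# Seed authorized_keys from the public key, as in the tutorial
cat "$d/id_rsa.pub" > "$d/authorized_keys"
chmod 600 "$d/authorized_keys"
test -s "$d/authorized_keys" && echo "authorized_keys populated"
rm -rf "$d"
```

On the real machine, `ssh localhost` should then log in without prompting for a password.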
wget http://apache.fayea.com/apache-mirror/hadoop/common/hadoop-2.3.0/hadoop-2.3.0.tar.gz
tar zxvf hadoop-2.3.0.tar.gz
sudo cp -r hadoop-2.3.0 /opt
cd /opt
sudo ln -s hadoop-2.3.0 hadoop
sudo chown -R hadoop:hadoop hadoop-2.3.0
sed -i '$ a \export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64' hadoop/etc/hadoop/hadoop-env.sh
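The `sed '$ a'` append can be checked on a scratch file first; this sketch uses a stand-in temp file rather than the real hadoop-env.sh:

```shell
# Reproduce the JAVA_HOME append on a scratch copy (stand-in for hadoop-env.sh)
tmp=$(mktemp)
printf '# The java implementation to use.\nexport JAVA_HOME=${JAVA_HOME}\n' > "$tmp"
# '$ a' appends the export after the last line of the file
sed -i '$ a \export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64' "$tmp"
tail -n 1 "$tmp"   # prints the appended export line
rm -f "$tmp"
```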
2. Configure a Hadoop Single-Node Environment
cp mapred-site.xml.template mapred-site.xml
vi mapred-site.xml

<property>
  <name>mapreduce.cluster.temp.dir</name>
  <value></value>
  <description>No description</description>
  <final>true</final>
</property>
<property>
  <name>mapreduce.cluster.local.dir</name>
  <value></value>
  <description>No description</description>
  <final>true</final>
</property>
vi yarn-site.xml

<property>
  <name>yarn.resourcemanager.resource-tracker.address</name>
  <value>127.0.0.1:8021</value>
  <description>host is the hostname of the resource manager and port is the port
  on which the NodeManagers contact the Resource Manager.</description>
</property>
<property>
  <name>yarn.resourcemanager.scheduler.address</name>
  <value>127.0.0.1:8022</value>
  <description>host is the hostname of the resourcemanager and port is the port
  on which the Applications in the cluster talk to the Resource Manager.</description>
</property>
<property>
  <name>yarn.resourcemanager.scheduler.class</name>
  <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler</value>
  <description>In case you do not want to use the default scheduler</description>
</property>
<property>
  <name>yarn.resourcemanager.address</name>
  <value>127.0.0.1:8023</value>
  <description>the host is the hostname of the ResourceManager and the port is the port
  on which the clients can talk to the Resource Manager.</description>
</property>
<property>
  <name>yarn.nodemanager.local-dirs</name>
  <value></value>
  <description>the local directories used by the nodemanager</description>
</property>
<property>
  <name>yarn.nodemanager.address</name>
  <value>0.0.0.0:8041</value>
  <description>the nodemanagers bind to this port</description>
</property>
<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>10240</value>
  <description>the amount of memory on the NodeManager in MB</description>
</property>
<property>
  <name>yarn.nodemanager.remote-app-log-dir</name>
  <value>/app-logs</value>
  <description>directory on hdfs where the application logs are moved</description>
</property>
<property>
  <name>yarn.nodemanager.log-dirs</name>
  <value></value>
  <description>the directories used by the nodemanagers as log directories</description>
</property>
<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle</value>
  <description>shuffle service that needs to be set for Map Reduce to run</description>
</property>
Supplemental configuration:
mapred-site.xml
<property>
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
</property>
core-site.xml
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://127.0.0.1:9000</value>
</property>
hdfs-site.xml
<property>
  <name>dfs.replication</name>
  <value>1</value>
</property>
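To sanity-check edits to these files, a property value can be pulled back out with grep/sed; this is a sketch in which a heredoc stands in for the real /opt/hadoop/etc/hadoop/core-site.xml, and the /tmp demo path is illustrative:

```shell
# Heredoc stand-in for core-site.xml
cat > /tmp/core-site-demo.xml <<'EOF'
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://127.0.0.1:9000</value>
  </property>
</configuration>
EOF
# Grab the <value> line following the <name> we care about
grep -A1 '<name>fs.defaultFS</name>' /tmp/core-site-demo.xml \
  | sed -n 's:.*<value>\(.*\)</value>.*:\1:p'
```

On a real install the same pipeline run against the live core-site.xml shows the URI the NameNode will bind to.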
Bash:
cd /opt/hadoop
bin/hdfs namenode -format
sbin/hadoop-daemon.sh start namenode
sbin/hadoop-daemon.sh start datanode
sbin/yarn-daemon.sh start resourcemanager
sbin/yarn-daemon.sh start nodemanager
jps
# Run a job on this node (the pi example takes <nMaps> <nSamples>)
bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.3.0.jar pi 5 10
3. Runtime Problems
14/01/04 05:38:22 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:8023. Already tried 9 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
netstat -atnp                                   # shows the listeners are bound on tcp6
Solution: disable IPv6.
cat /proc/sys/net/ipv6/conf/all/disable_ipv6    # 0 means ipv6 is on, 1 means off
cat /proc/sys/net/ipv6/conf/lo/disable_ipv6
cat /proc/sys/net/ipv6/conf/default/disable_ipv6
ip a | grep inet6                               # any output means ipv6 is on
vi /etc/sysctl.conf
net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.default.disable_ipv6 = 1
net.ipv6.conf.lo.disable_ipv6 = 1
sudo sysctl -p                                  # same effect as a reboot
sudo /etc/init.d/networking restart
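After editing sysctl.conf, it is worth confirming all three lines actually landed; this sketch uses a heredoc as a stand-in for /etc/sysctl.conf, with an illustrative /tmp path:

```shell
# Heredoc stand-in for the IPv6 section added to /etc/sysctl.conf
cat > /tmp/sysctl-demo.conf <<'EOF'
net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.default.disable_ipv6 = 1
net.ipv6.conf.lo.disable_ipv6 = 1
EOF
# Count the disable lines; should print 3 if all interfaces are covered
grep -c 'disable_ipv6 = 1' /tmp/sysctl-demo.conf
```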
4. Cluster Setup
Configure /opt/hadoop/etc/hadoop/{hadoop-env.sh,yarn-env.sh}:
export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64
cd /opt/hadoop
mkdir -p tmp/{data,name}   # on every node; name is used on the namenode, data on the datanodes
vi /etc/hosts              # the hostname must also be changed on each node
192.168.1.110 cloud1
192.168.1.112 cloud2
192.168.1.114 cloud3
vi /opt/hadoop/etc/hadoop/slaves
cloud2
cloud3
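Since all three machines are clones, the finished config can be pushed to each host named in the slaves file; in this sketch a heredoc stands in for /opt/hadoop/etc/hadoop/slaves and the scp commands are only echoed, not executed:

```shell
# Heredoc stand-in for /opt/hadoop/etc/hadoop/slaves
cat > /tmp/slaves-demo <<'EOF'
cloud2
cloud3
EOF
# Print the scp command that would sync the config to each slave
while read -r host; do
  echo "scp -r /opt/hadoop/etc/hadoop ${host}:/opt/hadoop/etc/"
done < /tmp/slaves-demo
```

Dropping the `echo` (and relying on the passwordless ssh set up earlier) turns this into a real sync loop.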
core-site.xml
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://cloud1:9000</value>
</property>
<property>
  <name>io.file.buffer.size</name>
  <value>131072</value>
</property>
<property>
  <name>hadoop.tmp.dir</name>
  <value>/opt/hadoop/tmp</value>
  <description>A base for other temporary directories.</description>
</property>
It is said that dfs.datanode.data.dir needs to be cleared first; otherwise the DataNode cannot start.
hdfs-site.xml
<property>
  <name>dfs.namenode.name.dir</name>
  <value>/opt/hadoop/name</value>
</property>
<property>
  <name>dfs.datanode.data.dir</name>
  <value>/opt/hadoop/data</value>
</property>
<property>
  <name>dfs.replication</name>
  <value>2</value>
</property>
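The name and data directories referenced above should exist before formatting; this sketch creates them under a scratch prefix instead of the real /opt/hadoop:

```shell
# Pre-create the dirs named in hdfs-site.xml (scratch prefix, not /opt/hadoop)
base=$(mktemp -d)
mkdir -p "$base/name" "$base/data"
ls "$base"   # lists: data, name
rm -rf "$base"
```

On the real nodes, the directories must also be owned by the hadoop user (`chown -R hadoop:hadoop`), or the daemons will fail to write to them.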
yarn-site.xml
<property>
  <name>yarn.resourcemanager.address</name>
  <value>cloud1:8032</value>
  <description>ResourceManager host:port for clients to submit jobs.</description>
</property>
<property>
  <name>yarn.resourcemanager.scheduler.address</name>
  <value>cloud1:8030</value>
  <description>ResourceManager host:port for ApplicationMasters to talk to the Scheduler to obtain resources.</description>
</property>
<property>
  <name>yarn.resourcemanager.resource-tracker.address</name>
  <value>cloud1:8031</value>
  <description>ResourceManager host:port for NodeManagers.</description>
</property>
<property>
  <name>yarn.resourcemanager.admin.address</name>
  <value>cloud1:8033</value>
  <description>ResourceManager host:port for administrative commands.</description>
</property>
<property>
  <name>yarn.resourcemanager.webapp.address</name>
  <value>cloud1:8088</value>
  <description>ResourceManager web-ui host:port.</description>
</property>
<property>
  <name>yarn.resourcemanager.scheduler.class</name>
  <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler</value>
  <description>In case you do not want to use the default scheduler</description>
</property>
<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>10240</value>
  <description>the amount of memory on the NodeManager in MB</description>
</property>
<property>
  <name>yarn.nodemanager.local-dirs</name>
  <value></value>
  <description>the local directories used by the nodemanager</description>
</property>
<property>
  <name>yarn.nodemanager.log-dirs</name>
  <value></value>
  <description>the directories used by the nodemanagers as log directories</description>
</property>
<property>
  <name>yarn.nodemanager.remote-app-log-dir</name>
  <value>/app-logs</value>
  <description>directory on hdfs where the application logs are moved</description>
</property>
<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle</value>
  <description>shuffle service that needs to be set for Map Reduce to run</description>
</property>

mapred-site.xml
<property>
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
</property>
<property>
  <name>mapreduce.jobhistory.address</name>
  <value>cloud1:10020</value>
</property>
<property>
  <name>mapreduce.jobhistory.webapp.address</name>
  <value>cloud1:19888</value>
</property>
cd /opt/hadoop/
bin/hdfs namenode -format
sbin/start-dfs.sh    # cloud1: NameNode, SecondaryNameNode; cloud2 and cloud3: DataNode
sbin/start-yarn.sh   # cloud1: ResourceManager; cloud2 and cloud3: NodeManager
jps
View cluster status:         bin/hdfs dfsadmin -report
View file block composition: bin/hdfs fsck / -files -blocks
NameNode HDFS web UI:        http://192.168.1.110:50070
ResourceManager web UI:      http://192.168.1.110:8088
bin/hdfs dfs -mkdir /input
bin/hadoop jar ./share/hadoop/mapreduce/hadoop-mapreduce-examples-2.3.0.jar randomwriter input
5. Questions
Q: 14/01/05 23:59:05 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
A: The dynamic libraries under /opt/hadoop/lib/native/ are 32-bit; replace them with 64-bit builds.
Q: How to avoid the "Are you sure you want to continue connecting (yes/no)?" prompt during ssh login?
A: Edit /etc/ssh/ssh_config and change "# StrictHostKeyChecking ask" to "StrictHostKeyChecking no".
Q: The DataNodes on the two slaves cannot join the cluster.
A: Delete the lines containing 127.0.1.1 or localhost mappings for the cluster hostnames in /etc/hosts.
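The hosts-file cleanup can be rehearsed safely; in this sketch a heredoc stands in for a slave's /etc/hosts, and the /tmp path is illustrative:

```shell
# Heredoc stand-in for /etc/hosts on a slave node
cat > /tmp/hosts-demo <<'EOF'
127.0.0.1 localhost
127.0.1.1 cloud2
192.168.1.112 cloud2
EOF
# Delete the 127.0.1.1 line that makes the DataNode register with a loopback address
sed -i '/^127\.0\.1\.1/d' /tmp/hosts-demo
grep -c '^127\.0\.1\.1' /tmp/hosts-demo || true   # prints 0 once the line is gone
```

Run the same sed against the real /etc/hosts (with sudo) on each slave, then restart the DataNodes.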