Setup: a MacBook running VirtualBox 4.3.6, with the virtual machines running Ubuntu 13.10. For the multi-node distributed environment, configure one machine first, then clone it into two more, for three machines in total.
1. Configure the Environment
Bash:
sudo apt-get install -y openjdk-7-jdk openssh-server
sudo addgroup hadoop
sudo adduser --ingroup hadoop hadoop   # creates the user and sets a password
sudo visudo
hadoop ALL=(ALL) ALL                   # lets the hadoop user use sudo
su - hadoop                            # needs the password set above
ssh-keygen -t rsa -P ""                # Enter file (/home/hadoop/.ssh/id_rsa)
cat /home/hadoop/.ssh/id_rsa.pub > /home/hadoop/.ssh/authorized_keys
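The key steps above can be dry-run in a scratch directory before touching the real /home/hadoop/.ssh; this sketch assumes ssh-keygen from openssh-client is installed, and the temp-dir path is only illustrative:

```shell
# Generate a throwaway passwordless RSA keypair the same way the tutorial does,
# but in a temp directory instead of /home/hadoop/.ssh
d=$(mktemp -d)
ssh-keygen -t rsa -P "" -f "$d/id_rsa" -q
# Seed authorized_keys from the public key, as in the tutorial
cat "$d/id_rsa.pub" > "$d/authorized_keys"
chmod 600 "$d/authorized_keys"
test -s "$d/authorized_keys" && echo "authorized_keys populated"
rm -rf "$d"
```

On the real machine, `ssh localhost` should then log in without prompting for a password.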
wget http://apache.fayea.com/apache-mirror/hadoop/common/hadoop-2.3.0/hadoop-2.3.0.tar.gz
tar zxvf hadoop-2.3.0.tar.gz
sudo cp -r hadoop-2.3.0 /opt
cd /opt
sudo ln -s hadoop-2.3.0 hadoop
sudo chown -R hadoop:hadoop hadoop-2.3.0
sed -i '$ a \export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64' hadoop/etc/hadoop/hadoop-env.sh
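The `sed '$ a'` append can be checked on a scratch file first; this sketch uses a stand-in temp file rather than the real hadoop-env.sh:

```shell
# Reproduce the JAVA_HOME append on a scratch copy (stand-in for hadoop-env.sh)
tmp=$(mktemp)
printf '# The java implementation to use.\nexport JAVA_HOME=${JAVA_HOME}\n' > "$tmp"
# '$ a' appends the export after the last line of the file
sed -i '$ a \export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64' "$tmp"
tail -n 1 "$tmp"   # prints the appended export line
rm -f "$tmp"
```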
2. Configure a Hadoop Single-Node Environment
cp mapred-site.xml.template mapred-site.xml
vi mapred-site.xml

<property>
  <name>mapreduce.cluster.temp.dir</name>
  <value></value>
  <description>No description</description>
  <final>true</final>
</property>
<property>
  <name>mapreduce.cluster.local.dir</name>
  <value></value>
  <description>No description</description>
  <final>true</final>
</property>
vi yarn-site.xml

<property>
  <name>yarn.resourcemanager.resource-tracker.address</name>
  <value>127.0.0.1:8021</value>
  <description>host is the hostname of the resource manager and port is the port
  on which the NodeManagers contact the Resource Manager.</description>
</property>
<property>
  <name>yarn.resourcemanager.scheduler.address</name>
  <value>127.0.0.1:8022</value>
  <description>host is the hostname of the resourcemanager and port is the port
  on which the Applications in the cluster talk to the Resource Manager.</description>
</property>
<property>
  <name>yarn.resourcemanager.scheduler.class</name>
  <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler</value>
  <description>In case you do not want to use the default scheduler</description>
</property>
<property>
  <name>yarn.resourcemanager.address</name>
  <value>127.0.0.1:8023</value>
  <description>the host is the hostname of the ResourceManager and the port is the port
  on which the clients can talk to the Resource Manager.</description>
</property>
<property>
  <name>yarn.nodemanager.local-dirs</name>
  <value></value>
  <description>the local directories used by the nodemanager</description>
</property>
<property>
  <name>yarn.nodemanager.address</name>
  <value>0.0.0.0:8041</value>
  <description>the nodemanagers bind to this port</description>
</property>
<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>10240</value>
  <description>the amount of memory on the NodeManager in MB</description>
</property>
<property>
  <name>yarn.nodemanager.remote-app-log-dir</name>
  <value>/app-logs</value>
  <description>directory on hdfs where the application logs are moved</description>
</property>
<property>
  <name>yarn.nodemanager.log-dirs</name>
  <value></value>
  <description>the directories used by the nodemanagers as log directories</description>
</property>
<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle</value>
  <description>shuffle service that needs to be set for Map Reduce to run</description>
</property>
Supplemental configuration:
mapred-site.xml
<property>
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
</property>
core-site.xml
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://127.0.0.1:9000</value>
</property>
hdfs-site.xml
<property>
  <name>dfs.replication</name>
  <value>1</value>
</property>
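To sanity-check edits to these files, a property value can be pulled back out with grep/sed; this is a sketch in which a heredoc stands in for the real /opt/hadoop/etc/hadoop/core-site.xml, and the /tmp demo path is illustrative:

```shell
# Heredoc stand-in for core-site.xml
cat > /tmp/core-site-demo.xml <<'EOF'
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://127.0.0.1:9000</value>
  </property>
</configuration>
EOF
# Grab the <value> line following the <name> we care about
grep -A1 '<name>fs.defaultFS</name>' /tmp/core-site-demo.xml \
  | sed -n 's:.*<value>\(.*\)</value>.*:\1:p'
```

On a real install the same pipeline run against the live core-site.xml shows the URI the NameNode will bind to.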
Bash:
cd /opt/hadoop
bin/hdfs namenode -format
sbin/hadoop-daemon.sh start namenode
sbin/hadoop-daemon.sh start datanode
sbin/yarn-daemon.sh start resourcemanager
sbin/yarn-daemon.sh start nodemanager
jps
# Run a job on this node (the pi example takes <nMaps> <nSamples>)
bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.3.0.jar pi 5 10
3. Runtime Problems
14/01/04 05:38:22 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:8023. Already tried 9 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
netstat -atnp                                   # shows the listeners are bound on tcp6
Solution: disable IPv6.
cat /proc/sys/net/ipv6/conf/all/disable_ipv6    # 0 means ipv6 is on, 1 means off
cat /proc/sys/net/ipv6/conf/lo/disable_ipv6
cat /proc/sys/net/ipv6/conf/default/disable_ipv6
ip a | grep inet6                               # any output means ipv6 is on
vi /etc/sysctl.conf
net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.default.disable_ipv6 = 1
net.ipv6.conf.lo.disable_ipv6 = 1
sudo sysctl -p                                  # same effect as a reboot
sudo /etc/init.d/networking restart
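After editing sysctl.conf, it is worth confirming all three lines actually landed; this sketch uses a heredoc as a stand-in for /etc/sysctl.conf, with an illustrative /tmp path:

```shell
# Heredoc stand-in for the IPv6 section added to /etc/sysctl.conf
cat > /tmp/sysctl-demo.conf <<'EOF'
net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.default.disable_ipv6 = 1
net.ipv6.conf.lo.disable_ipv6 = 1
EOF
# Count the disable lines; should print 3 if all interfaces are covered
grep -c 'disable_ipv6 = 1' /tmp/sysctl-demo.conf
```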
4. Cluster Setup
Configure /opt/hadoop/etc/hadoop/{hadoop-env.sh,yarn-env.sh}:
export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64
cd /opt/hadoop
mkdir -p tmp/{data,name}   # on every node; name is used on the namenode, data on the datanodes
vi /etc/hosts              # the hostname must also be changed on each node
192.168.1.110 cloud1
192.168.1.112 cloud2
192.168.1.114 cloud3
vi /opt/hadoop/etc/hadoop/slaves
cloud2
cloud3
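Since all three machines are clones, the finished config can be pushed to each host named in the slaves file; in this sketch a heredoc stands in for /opt/hadoop/etc/hadoop/slaves and the scp commands are only echoed, not executed:

```shell
# Heredoc stand-in for /opt/hadoop/etc/hadoop/slaves
cat > /tmp/slaves-demo <<'EOF'
cloud2
cloud3
EOF
# Print the scp command that would sync the config to each slave
while read -r host; do
  echo "scp -r /opt/hadoop/etc/hadoop ${host}:/opt/hadoop/etc/"
done < /tmp/slaves-demo
```

Dropping the `echo` (and relying on the passwordless ssh set up earlier) turns this into a real sync loop.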
core-site.xml
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://cloud1:9000</value>
</property>
<property>
  <name>io.file.buffer.size</name>
  <value>131072</value>
</property>
<property>
  <name>hadoop.tmp.dir</name>
  <value>/opt/hadoop/tmp</value>
  <description>A base for other temporary directories.</description>
</property>
It is said that dfs.datanode.data.dir needs to be cleared first; otherwise the DataNode cannot start.
hdfs-site.xml
<property>
  <name>dfs.namenode.name.dir</name>
  <value>/opt/hadoop/name</value>
</property>
<property>
  <name>dfs.datanode.data.dir</name>
  <value>/opt/hadoop/data</value>
</property>
<property>
  <name>dfs.replication</name>
  <value>2</value>
</property>
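The name and data directories referenced above should exist before formatting; this sketch creates them under a scratch prefix instead of the real /opt/hadoop:

```shell
# Pre-create the dirs named in hdfs-site.xml (scratch prefix, not /opt/hadoop)
base=$(mktemp -d)
mkdir -p "$base/name" "$base/data"
ls "$base"   # lists: data, name
rm -rf "$base"
```

On the real nodes, the directories must also be owned by the hadoop user (`chown -R hadoop:hadoop`), or the daemons will fail to write to them.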
yarn-site.xml
<property>
  <name>yarn.resourcemanager.address</name>
  <value>cloud1:8032</value>
  <description>ResourceManager host:port for clients to submit jobs.</description>
</property>
<property>
  <name>yarn.resourcemanager.scheduler.address</name>
  <value>cloud1:8030</value>
  <description>ResourceManager host:port for ApplicationMasters to talk to the Scheduler to obtain resources.</description>
</property>
<property>
  <name>yarn.resourcemanager.resource-tracker.address</name>
  <value>cloud1:8031</value>
  <description>ResourceManager host:port for NodeManagers.</description>
</property>
<property>
  <name>yarn.resourcemanager.admin.address</name>
  <value>cloud1:8033</value>
  <description>ResourceManager host:port for administrative commands.</description>
</property>
<property>
  <name>yarn.resourcemanager.webapp.address</name>
  <value>cloud1:8088</value>
  <description>ResourceManager web-ui host:port.</description>
</property>
<property>
  <name>yarn.resourcemanager.scheduler.class</name>
  <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler</value>
  <description>In case you do not want to use the default scheduler</description>
</property>
<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>10240</value>
  <description>the amount of memory on the NodeManager in MB</description>
</property>
<property>
  <name>yarn.nodemanager.local-dirs</name>
  <value></value>
  <description>the local directories used by the nodemanager</description>
</property>
<property>
  <name>yarn.nodemanager.log-dirs</name>
  <value></value>
  <description>the directories used by the nodemanagers as log directories</description>
</property>
<property>
  <name>yarn.nodemanager.remote-app-log-dir</name>
  <value>/app-logs</value>
  <description>directory on hdfs where the application logs are moved</description>
</property>
<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle</value>
  <description>shuffle service that needs to be set for Map Reduce to run</description>
</property>

mapred-site.xml
<property>
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
</property>
<property>
  <name>mapreduce.jobhistory.address</name>
  <value>cloud1:10020</value>
</property>
<property>
  <name>mapreduce.jobhistory.webapp.address</name>
  <value>cloud1:19888</value>
</property>
cd /opt/hadoop/
bin/hdfs namenode -format
sbin/start-dfs.sh    # cloud1: NameNode, SecondaryNameNode; cloud2 and cloud3: DataNode
sbin/start-yarn.sh   # cloud1: ResourceManager; cloud2 and cloud3: NodeManager
jps
View cluster status:         bin/hdfs dfsadmin -report
View file block composition: bin/hdfs fsck / -files -blocks
NameNode HDFS web UI:        http://192.168.1.110:50070
ResourceManager web UI:      http://192.168.1.110:8088
bin/hdfs dfs -mkdir /input
bin/hadoop jar ./share/hadoop/mapreduce/hadoop-mapreduce-examples-2.3.0.jar randomwriter input
5. Questions
Q: 14/01/05 23:59:05 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
A: The dynamic libraries under /opt/hadoop/lib/native/ are 32-bit; replace them with 64-bit builds.
Q: How to avoid the "Are you sure you want to continue connecting (yes/no)?" prompt during ssh login?
A: Edit /etc/ssh/ssh_config and change "# StrictHostKeyChecking ask" to "StrictHostKeyChecking no".
Q: The DataNodes on the two slaves cannot join the cluster.
A: Delete the lines containing 127.0.1.1 or localhost mappings for the cluster hostnames in /etc/hosts.
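The hosts-file cleanup can be rehearsed safely; in this sketch a heredoc stands in for a slave's /etc/hosts, and the /tmp path is illustrative:

```shell
# Heredoc stand-in for /etc/hosts on a slave node
cat > /tmp/hosts-demo <<'EOF'
127.0.0.1 localhost
127.0.1.1 cloud2
192.168.1.112 cloud2
EOF
# Delete the 127.0.1.1 line that makes the DataNode register with a loopback address
sed -i '/^127\.0\.1\.1/d' /tmp/hosts-demo
grep -c '^127\.0\.1\.1' /tmp/hosts-demo || true   # prints 0 once the line is gone
```

Run the same sed against the real /etc/hosts (with sudo) on each slave, then restart the DataNodes.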