1. Hadoop 2.x Distributed Installation and Deployment


1.1 Clone the virtual machine and complete the related configuration

1.1.1 Clone the virtual machine

In VMware, right-click the original virtual machine –> Manage –> Clone –> Next –> Create a full clone –> enter the name hadoop-senior02 –> select the target directory.

1.1.2 Configuration Modifications

1) Start the cloned virtual machines (memory: 01: 2 GB; 02: 1.5 GB; 03: 1.5 GB)
2) Change the hostname; it must be changed in two places (the live hostname via the hostname command, and the HOSTNAME entry in /etc/sysconfig/network)
3) Modify the NIC name:
Edit /etc/udev/rules.d/70-persistent-net.rules
Comment out the line containing eth0
In the line containing eth1, change eth1 to eth0

4) Edit /etc/sysconfig/network-scripts/ifcfg-eth0 and change the MAC address (HWADDR) to the one shown in the virtual machine's settings (see the sketch after this list)

5) Reboot after the configuration changes
6) Set a static IP

7) Connect to all three machines with SecureCRT
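As a sketch of steps 3) and 4), the edits typically look like the following; the MAC addresses are placeholders for the ones your VM actually shows:

# /etc/udev/rules.d/70-persistent-net.rules
# SUBSYSTEM=="net", ..., ATTR{address}=="00:0c:29:11:22:33", ..., NAME="eth0"   <- old eth0 line, commented out
SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*", ATTR{address}=="00:0c:29:aa:bb:cc", ATTR{type}=="1", KERNEL=="eth*", NAME="eth0"   # was eth1

# /etc/sysconfig/network-scripts/ifcfg-eth0
DEVICE=eth0
HWADDR=00:0c:29:aa:bb:cc   # must match the new udev line
BOOTPROTO=static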

1.2 Basic configuration preparation for virtual machines in a cluster

1) First, delete everything in the /tmp directory:

$cd /tmp

$sudo rm -rf ./*

2) Remove the old hadoop-2.5.0, Maven, and .m2 directories:

$cd /opt/modules/

$rm -rf ./hadoop-2.5.0/
$rm -rf ./apache-maven-3.0.5/
$cd ~/.m2/
$rm -rf ./*

3) Map the hostnames and IPs of all machines
Edit /etc/hosts
Add the hostname-to-IP mappings of all machines on each machine; that is, open the file on every virtual machine and add the mappings.
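For example, assuming the three machines use the addresses below (the IPs are placeholders; substitute your own):

192.168.1.101 hadoop-senior.ibeifeng.com hadoop-senior
192.168.1.102 hadoop-senior02.ibeifeng.com hadoop-senior02
192.168.1.103 hadoop-senior03.ibeifeng.com hadoop-senior03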

Also configure the same mappings in the Windows hosts file, so that any machine can reach the other machines in the cluster by hostname.

4) In the /opt directory of all machines, create a directory /opt/app/ and change its ownership; everything for the cluster is done under it (the cluster installation directory must be the same on every machine!):

$sudo mkdir /opt/app

$sudo chown -R beifeng:beifeng /opt/app/

5) Unzip hadoop-2.5.0 into the app directory (set it up completely on one machine, then send it to the others):

$tar -zxf /opt/softwares/hadoop-2.5.0.tar.gz -C /opt/app/

1.3 Reasonable planning of the Hadoop service component deployment

The distributed architecture is master-slave. In pseudo-distributed mode, master and slaves run on a single machine; in fully distributed mode, the master node runs on one machine and the slave nodes run on several others. DataNode and NodeManager are generally co-located on each machine: the former uses the machine's disk to store data, while the latter uses its memory and CPU to analyze that data.
If you use 3 virtual machines, you can configure them as follows:

3 machines          hadoop-senior        hadoop-senior02      hadoop-senior03
Memory              2G                   1.5G                 1.5G
CPU                 1 core               1 core               1 core
Hard disk           20G                  20G                  20G
Service components  NameNode             ResourceManager      DataNode
                    DataNode             DataNode             NodeManager
                    NodeManager          NodeManager
                    MRHistoryServer      SecondaryNameNode

If you use 2 virtual machines, you can configure them as follows (only two machines are used in this exercise):

2 machines          hadoop-senior        hadoop-senior02
Memory              2G                   1.5G
CPU                 1 core               1 core
Hard disk           20G                  20G
Service components  ResourceManager      NameNode
                    DataNode             DataNode
                    NodeManager          NodeManager
                    SecondaryNameNode    MRHistoryServer
1.4 Configure the slave nodes of each service component, using "Hadoop 2.x pseudo-distributed deployment" as a template

1.4.1 Configure ${JAVA_HOME}

Open hadoop-env.sh, mapred-env.sh, and yarn-env.sh, and point JAVA_HOME at the JDK installation directory.
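For example (the JDK path is a placeholder; use the directory where your JDK is actually installed):

export JAVA_HOME=/opt/modules/jdk1.7.0_67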

1.4.2 Configure HDFS

Create the tmp directory:

$mkdir -p /opt/app/hadoop-2.5.0/data/tmp

Open core-site.xml and add the following configuration:
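A minimal sketch, assuming the NameNode runs on hadoop-senior02.ibeifeng.com as in the two-machine plan above (8020 is the conventional NameNode RPC port):

<configuration>
    <!-- HDFS entry point; points at the NameNode machine -->
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://hadoop-senior02.ibeifeng.com:8020</value>
    </property>
    <!-- base directory for HDFS data; created above -->
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/opt/app/hadoop-2.5.0/data/tmp</value>
    </property>
</configuration>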

Open slaves and configure it as follows:
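For the two-machine plan, slaves simply lists every machine that runs a DataNode/NodeManager:

hadoop-senior.ibeifeng.com
hadoop-senior02.ibeifeng.com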

Open hdfs-site.xml and configure it as follows:
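A sketch, assuming the SecondaryNameNode runs on hadoop-senior.ibeifeng.com as in the two-machine plan, with replication matching the two DataNodes:

<configuration>
    <property>
        <name>dfs.replication</name>
        <value>2</value>
    </property>
    <!-- SecondaryNameNode HTTP address -->
    <property>
        <name>dfs.namenode.secondary.http-address</name>
        <value>hadoop-senior.ibeifeng.com:50090</value>
    </property>
</configuration>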

1.4.3 Configure YARN

Open yarn-site.xml and configure it as follows:
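A sketch, assuming the ResourceManager runs on hadoop-senior.ibeifeng.com as in the two-machine plan:

<configuration>
    <!-- which machine hosts the ResourceManager -->
    <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>hadoop-senior.ibeifeng.com</value>
    </property>
    <!-- required so MapReduce jobs can shuffle on YARN -->
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
</configuration>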

1.4.4 Configure the HistoryServer

Open mapred-site.xml and configure it as follows:
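A sketch, assuming the JobHistoryServer runs on hadoop-senior02.ibeifeng.com, where it is started in section 1.5.2 (10020 and 19888 are the default ports):

<configuration>
    <!-- run MapReduce on YARN -->
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.address</name>
        <value>hadoop-senior02.ibeifeng.com:10020</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.webapp.address</name>
        <value>hadoop-senior02.ibeifeng.com:19888</value>
    </property>
</configuration>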

1.5 Distribute Hadoop to the other machine and start HDFS and YARN

1.5.1 Distribution

$scp -r hadoop-2.5.0/ beifeng@hadoop-senior02.ibeifeng.com:/opt/app/
1.5.2 Start HDFS and YARN

1) First, format HDFS (see the command sketch after this list)
2) Start the NameNode on senior02, then the two DataNodes
3) Start the ResourceManager on senior, then the two NodeManagers
4) Start the JobHistoryServer on senior02
5) Start the SecondaryNameNode on senior
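A sketch of the corresponding commands, run from /opt/app/hadoop-2.5.0 on the machine indicated in each comment (these are the standard Hadoop 2.x daemon scripts):

$bin/hdfs namenode -format                          # on senior02, first time only
$sbin/hadoop-daemon.sh start namenode               # on senior02
$sbin/hadoop-daemon.sh start datanode               # on senior and senior02
$sbin/yarn-daemon.sh start resourcemanager          # on senior
$sbin/yarn-daemon.sh start nodemanager              # on senior and senior02
$sbin/mr-jobhistory-daemon.sh start historyserver   # on senior02
$sbin/hadoop-daemon.sh start secondarynamenode      # on senior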

In the NameNode web UI (port 50070), check that two DataNodes are live.

In the ResourceManager web UI (port 8088), check that two NodeManagers are registered.

1.6 Testing

1.6.1 Uploading files

Create a directory:

$bin/hdfs dfs -mkdir -p tmp/conf

Upload files:

$bin/hdfs dfs -put etc/hadoop/*-site.xml tmp/conf

Read a file:

$bin/hdfs dfs -text tmp/conf/core-site.xml

1.6.2 WordCount Program Test

1) Create a directory:

$bin/hdfs dfs -mkdir -p mapreduce/wordcount/input

2) Upload a file to the directory:

$bin/hdfs dfs -put /opt/datas/wc.input mapreduce/wordcount/input

3) Run the WordCount program:

$bin/yarn jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.5.0.jar wordcount mapreduce/wordcount/input mapreduce/wordcount/output

4) Read the output:

$bin/hdfs dfs -text mapreduce/wordcount/output/par*

1.6.3 Benchmark test (guide, p. 315)

Benchmarks exercise the disks, memory, and so on.
1) Running the examples jar with no arguments lists the available example programs:

$bin/yarn jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.5.0.jar

2) The benchmark programs themselves live in the job client test jar; running it with no arguments lists them:

$bin/yarn jar share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.5.0-tests.jar
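For example, a sketch of a TestDFSIO disk benchmark (the file count and size are illustrative; check the jar's own usage message for the exact flags in your version):

$bin/yarn jar share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.5.0-tests.jar TestDFSIO -write -nrFiles 10 -size 1GB
$bin/yarn jar share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.5.0-tests.jar TestDFSIO -read -nrFiles 10 -size 1GB
$bin/yarn jar share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.5.0-tests.jar TestDFSIO -clean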
1.7 Configure passwordless SSH login

First delete all the existing keys:

$cd ~/.ssh/
$rm -rf ./*

Do the following separately on the two master nodes (the machine running the NameNode and the machine running the ResourceManager):
1) Generate a public/private key pair:

$ssh-keygen -t rsa

2) Copy the public key to each machine

$ssh-copy-id bigdata-senior.ibeifeng.com
$ssh-copy-id bigdata-senior02.ibeifeng.com

3) Test the SSH login:

$ssh bigdata-senior.ibeifeng.com
$ssh hadoop-senior02.ibeifeng.com


1.8 Cluster Time synchronization


1.8.1 Pick one machine as the time server; all other machines synchronize with it

For example, on the 01 machine:
1) Check whether the NTP package is installed:

$sudo rpm -qa | grep ntp

2) Check the time server's run status:

$sudo service ntpd status

3) Start the time server:

$sudo service ntpd start

4) Enable it at boot:

$sudo chkconfig ntpd on

5) View boot status

sudo chkconfig --list|grep ntpd

6) Edit the configuration file:

sudo vi /etc/ntp.conf

Make changes in three places:
(1) Uncomment the restrict line and change the network segment to your own (the first three octets of your IP address, followed by .0)

(2) Comment out the upstream time server lines

(3) Uncomment the following two lines:

server 127.127.1.0
fudge 127.127.1.0 stratum 10
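Put together, a sketch of the edited /etc/ntp.conf, assuming the cluster sits on the 192.168.1.x segment (substitute your own segment; the pool hostnames are CentOS defaults):

restrict 192.168.1.0 mask 255.255.255.0 nomodify notrap    # (1) uncommented; segment changed
#server 0.centos.pool.ntp.org iburst                       # (2) upstream servers commented out
#server 1.centos.pool.ntp.org iburst
server 127.127.1.0                                         # (3) serve the local clock instead
fudge 127.127.1.0 stratum 10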

7) Restart the server

sudo service ntpd restart

8) Synchronize the hardware (BIOS) clock with the system time

$sudo vi /etc/sysconfig/ntpd

Add the line:
SYNC_HWCLOCK=yes

Wait about 5 minutes for the time to synchronize.

1.8.2 Configure all other machines to synchronize with this machine

1) Configure every machine to synchronize with the hadoop-senior machine:

$sudo /usr/sbin/ntpdate hadoop-senior.ibeifeng.com

2) Write a cron job that synchronizes with the time server periodically.
First switch to root, then set it to synchronize every 10 minutes:

$sudo crontab -e

Add: 0-59/10 * * * * /usr/sbin/ntpdate hadoop-senior.ibeifeng.com

3) Set the date and time manually if needed:

$sudo date -s 2015-11-17
$sudo date -s 17:54:00
