1. Hadoop 2.x Distributed Installation and Deployment


1.1 Clone the virtual machine and complete the related configuration

1.1.1 Clone the virtual machine

In VMware, right-click the original virtual machine –> Manage –> Clone –> Next –> Create a full clone –> enter the name hadoop-senior02 –> select the target directory.

1.1.2 Configuration Modifications

1) Start the cloned virtual machines (memory: 01: 2 GB; 02: 1.5 GB; 03: 1.5 GB)
2) Change the hostname; it must be changed in two places (the live hostname via the hostname command, and the HOSTNAME entry in /etc/sysconfig/network)
3) Modify the NIC name:
Edit /etc/udev/rules.d/70-persistent-net.rules
Comment out the line containing eth0
In the line containing eth1, change eth1 to eth0

4) Edit /etc/sysconfig/network-scripts/ifcfg-eth0 and change the MAC address (HWADDR) to the one shown in the virtual machine's settings (see the sketch after this list)

5) Reboot after the configuration changes
6) Set a static IP

7) Connect to all three machines with SecureCRT
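As a sketch of steps 3) and 4), the edits typically look like the following; the MAC addresses are placeholders for the ones your VM actually shows:

# /etc/udev/rules.d/70-persistent-net.rules
# SUBSYSTEM=="net", ..., ATTR{address}=="00:0c:29:11:22:33", ..., NAME="eth0"   <- old eth0 line, commented out
SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*", ATTR{address}=="00:0c:29:aa:bb:cc", ATTR{type}=="1", KERNEL=="eth*", NAME="eth0"   # was eth1

# /etc/sysconfig/network-scripts/ifcfg-eth0
DEVICE=eth0
HWADDR=00:0c:29:aa:bb:cc   # must match the new udev line
BOOTPROTO=static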

1.2 Basic configuration preparation for virtual machines in a cluster

1) First, delete everything in the /tmp directory:

$cd /tmp

$sudo rm -rf ./*

2) Remove the old hadoop-2.5.0, Maven, and .m2 directories:

$cd /opt/modules/

$rm -rf ./hadoop-2.5.0/
$rm -rf ./apache-maven-3.0.5/
$cd ~/.m2/
$rm -rf ./*

3) Map the hostnames and IPs of all machines
Edit /etc/hosts
Add the hostname-to-IP mappings of all machines on each machine; that is, open the file on every virtual machine and add the mappings.
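For example, assuming the three machines use the addresses below (the IPs are placeholders; substitute your own):

192.168.1.101 hadoop-senior.ibeifeng.com hadoop-senior
192.168.1.102 hadoop-senior02.ibeifeng.com hadoop-senior02
192.168.1.103 hadoop-senior03.ibeifeng.com hadoop-senior03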

Also configure the same mappings in the Windows hosts file, so that any machine can reach the other machines in the cluster by hostname.

4) In the /opt directory of all machines, create a directory /opt/app/ and change its ownership; everything for the cluster is done under it (the cluster installation directory must be the same on every machine!):

$sudo mkdir /opt/app

$sudo chown -R beifeng:beifeng /opt/app/

5) Unzip hadoop-2.5.0 into the app directory (set it up completely on one machine, then send it to the others):

$tar -zxf /opt/softwares/hadoop-2.5.0.tar.gz -C /opt/app/

1.3 Reasonable planning of the Hadoop service component deployment

The distributed architecture is master-slave. In pseudo-distributed mode, master and slaves run on a single machine; in fully distributed mode, the master node runs on one machine and the slave nodes run on several others. DataNode and NodeManager are generally co-located on each machine: the former uses the machine's disk to store data, while the latter uses its memory and CPU to analyze that data.
If you use 3 virtual machines, you can configure them as follows:

3 machines          hadoop-senior        hadoop-senior02      hadoop-senior03
Memory              2G                   1.5G                 1.5G
CPU                 1 core               1 core               1 core
Hard disk           20G                  20G                  20G
Service components  NameNode             ResourceManager      DataNode
                    DataNode             DataNode             NodeManager
                    NodeManager          NodeManager
                    MRHistoryServer      SecondaryNameNode

If you use 2 virtual machines, you can configure them as follows (only two machines are used in this exercise):

2 machines          hadoop-senior        hadoop-senior02
Memory              2G                   1.5G
CPU                 1 core               1 core
Hard disk           20G                  20G
Service components  ResourceManager      NameNode
                    DataNode             DataNode
                    NodeManager          NodeManager
                    SecondaryNameNode    MRHistoryServer
1.4 Configure the slave nodes of each service component, using "Hadoop 2.x pseudo-distributed deployment" as a template

1.4.1 Configure ${JAVA_HOME}

Open hadoop-env.sh, mapred-env.sh, and yarn-env.sh, and point JAVA_HOME at the JDK installation directory.
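For example (the JDK path is a placeholder; use the directory where your JDK is actually installed):

export JAVA_HOME=/opt/modules/jdk1.7.0_67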

1.4.2 Configure HDFS

Create the tmp directory:

$mkdir -p /opt/app/hadoop-2.5.0/data/tmp

Open core-site.xml and add the following configuration:
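A minimal sketch, assuming the NameNode runs on hadoop-senior02.ibeifeng.com as in the two-machine plan above (8020 is the conventional NameNode RPC port):

<configuration>
    <!-- HDFS entry point; points at the NameNode machine -->
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://hadoop-senior02.ibeifeng.com:8020</value>
    </property>
    <!-- base directory for HDFS data; created above -->
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/opt/app/hadoop-2.5.0/data/tmp</value>
    </property>
</configuration>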

Open slaves and configure it as follows:
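For the two-machine plan, slaves simply lists every machine that runs a DataNode/NodeManager:

hadoop-senior.ibeifeng.com
hadoop-senior02.ibeifeng.com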

Open hdfs-site.xml and configure it as follows:
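A sketch, assuming the SecondaryNameNode runs on hadoop-senior.ibeifeng.com as in the two-machine plan, with replication matching the two DataNodes:

<configuration>
    <property>
        <name>dfs.replication</name>
        <value>2</value>
    </property>
    <!-- SecondaryNameNode HTTP address -->
    <property>
        <name>dfs.namenode.secondary.http-address</name>
        <value>hadoop-senior.ibeifeng.com:50090</value>
    </property>
</configuration>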

1.4.3 Configure YARN

Open yarn-site.xml and configure it as follows:
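A sketch, assuming the ResourceManager runs on hadoop-senior.ibeifeng.com as in the two-machine plan:

<configuration>
    <!-- which machine hosts the ResourceManager -->
    <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>hadoop-senior.ibeifeng.com</value>
    </property>
    <!-- required so MapReduce jobs can shuffle on YARN -->
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
</configuration>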

1.4.4 Configure the HistoryServer

Open mapred-site.xml and configure it as follows:
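A sketch, assuming the JobHistoryServer runs on hadoop-senior02.ibeifeng.com, where it is started in section 1.5.2 (10020 and 19888 are the default ports):

<configuration>
    <!-- run MapReduce on YARN -->
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.address</name>
        <value>hadoop-senior02.ibeifeng.com:10020</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.webapp.address</name>
        <value>hadoop-senior02.ibeifeng.com:19888</value>
    </property>
</configuration>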

1.5 Distribute Hadoop to the other machine and start HDFS and YARN

1.5.1 Distribution

$scp -r hadoop-2.5.0/ beifeng@hadoop-senior02.ibeifeng.com:/opt/app/
1.5.2 Start HDFS and YARN

1) First, format HDFS (see the command sketch after this list)
2) Start the NameNode on senior02, then the two DataNodes
3) Start the ResourceManager on senior, then the two NodeManagers
4) Start the JobHistoryServer on senior02
5) Start the SecondaryNameNode on senior
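A sketch of the corresponding commands, run from /opt/app/hadoop-2.5.0 on the machine indicated in each comment (these are the standard Hadoop 2.x daemon scripts):

$bin/hdfs namenode -format                          # on senior02, first time only
$sbin/hadoop-daemon.sh start namenode               # on senior02
$sbin/hadoop-daemon.sh start datanode               # on senior and senior02
$sbin/yarn-daemon.sh start resourcemanager          # on senior
$sbin/yarn-daemon.sh start nodemanager              # on senior and senior02
$sbin/mr-jobhistory-daemon.sh start historyserver   # on senior02
$sbin/hadoop-daemon.sh start secondarynamenode      # on senior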

In the NameNode web UI (port 50070), check that two DataNodes are live.

In the ResourceManager web UI (port 8088), check that two NodeManagers are registered.

1.6 Testing

1.6.1 Uploading files

Create a directory:

$bin/hdfs dfs -mkdir -p tmp/conf

Upload files:

$bin/hdfs dfs -put etc/hadoop/*-site.xml tmp/conf

Read a file:

$bin/hdfs dfs -text tmp/conf/core-site.xml

1.6.2 WordCount Program Test

1) Create a directory:

$bin/hdfs dfs -mkdir -p mapreduce/wordcount/input

2) Upload a file to the directory:

$bin/hdfs dfs -put /opt/datas/wc.input mapreduce/wordcount/input

3) Run the WordCount program:

$bin/yarn jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.5.0.jar wordcount mapreduce/wordcount/input mapreduce/wordcount/output

4) Read the output:

$bin/hdfs dfs -text mapreduce/wordcount/output/par*

1.6.3 Benchmark test (guide, p. 315)

Benchmarks exercise the disks, memory, and so on.
1) Running the examples jar with no arguments lists the available example programs:

$bin/yarn jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.5.0.jar

2) The benchmark programs themselves live in the job client test jar; running it with no arguments lists them:

$bin/yarn jar share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.5.0-tests.jar
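For example, a sketch of a TestDFSIO disk benchmark (the file count and size are illustrative; check the jar's own usage message for the exact flags in your version):

$bin/yarn jar share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.5.0-tests.jar TestDFSIO -write -nrFiles 10 -size 1GB
$bin/yarn jar share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.5.0-tests.jar TestDFSIO -read -nrFiles 10 -size 1GB
$bin/yarn jar share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.5.0-tests.jar TestDFSIO -clean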
1.7 Configure passwordless SSH login

First delete all the existing keys:

$cd ~/.ssh/
$rm -rf ./*

Do the following separately on the two master nodes (the machine running the NameNode and the machine running the ResourceManager):
1) Generate a public/private key pair:

$ssh-keygen -t rsa

2) Copy the public key to each machine

$ssh-copy-id bigdata-senior.ibeifeng.com
$ssh-copy-id bigdata-senior02.ibeifeng.com

3) Test the SSH login:

$ssh bigdata-senior.ibeifeng.com
$ssh hadoop-senior02.ibeifeng.com


1.8 Cluster Time synchronization


1.8.1 Pick one machine as the time server; all other machines synchronize with it

For example, on the 01 machine:
1) Check whether the NTP package is installed:

$sudo rpm -qa | grep ntp

2) Check the time server's run status:

$sudo service ntpd status

3) Start the time server:

$sudo service ntpd start

4) Enable it at boot:

$sudo chkconfig ntpd on

5) View boot status

sudo chkconfig --list|grep ntpd

6) Edit the configuration file:

sudo vi /etc/ntp.conf

Make changes in three places:
(1) Uncomment the restrict line and change the network segment to your own (the first three octets of your IP address, followed by .0)

(2) Comment out the upstream time server lines

(3) Uncomment the following two lines:

server 127.127.1.0
fudge 127.127.1.0 stratum 10
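Put together, a sketch of the edited /etc/ntp.conf, assuming the cluster sits on the 192.168.1.x segment (substitute your own segment; the pool hostnames are CentOS defaults):

restrict 192.168.1.0 mask 255.255.255.0 nomodify notrap    # (1) uncommented; segment changed
#server 0.centos.pool.ntp.org iburst                       # (2) upstream servers commented out
#server 1.centos.pool.ntp.org iburst
server 127.127.1.0                                         # (3) serve the local clock instead
fudge 127.127.1.0 stratum 10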

7) Restart the server

sudo service ntpd restart

8) Synchronize the hardware (BIOS) clock with the system time

$sudo vi /etc/sysconfig/ntpd

Add the line:
SYNC_HWCLOCK=yes

Wait about 5 minutes for the time to synchronize.

1.8.2 Configure all other machines to synchronize with this machine

1) Configure every machine to synchronize with the hadoop-senior machine:

$sudo /usr/sbin/ntpdate hadoop-senior.ibeifeng.com

2) Write a cron job that synchronizes with the time server periodically.
First switch to root, then set it to synchronize every 10 minutes:

$sudo crontab -e

Add: 0-59/10 * * * * /usr/sbin/ntpdate hadoop-senior.ibeifeng.com

3) Set the date and time manually if needed:

$sudo date -s 2015-11-17
$sudo date -s 17:54:00
