One, Hadoop 2.x Distributed Installation Deployment
1. Distributed Deployment of Hadoop 2.x
1.1 Clone the virtual machine and complete the related configuration
1.1.1 Clone the virtual machine
In VMware, right-click the source virtual machine –> Manage –> Clone –> Next –> Create a full clone –> enter the name hadoop-senior02 –> select a directory
1.1.2 Configuration Modifications
1) Start the cloned virtual machines (memory allocation: 01: 2G; 02: 1.5G; 03: 1.5G)
2) Change the hostname (it must be changed in two places)
3) Modify the NIC name
Edit /etc/udev/rules.d/70-persistent-net.rules:
Comment out the line containing eth0
In the line containing eth1, change eth1 to eth0
4) Edit /etc/sysconfig/network-scripts/ifcfg-eth0
and change the MAC address to the one assigned in the virtual machine settings
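The two udev-rules edits can also be scripted. The sketch below works on a local copy of the rules file with made-up MAC addresses (the addresses and file contents are illustrative only; the real file is /etc/udev/rules.d/70-persistent-net.rules):

```shell
# Create a local stand-in for the rules file (hypothetical MAC addresses)
cat > 70-persistent-net.rules <<'EOF'
SUBSYSTEM=="net", ATTR{address}=="00:0c:29:aa:bb:cc", NAME="eth0"
SUBSYSTEM=="net", ATTR{address}=="00:0c:29:dd:ee:ff", NAME="eth1"
EOF
# Comment out the old eth0 line first...
sed -i '/NAME="eth0"/s/^/#/' 70-persistent-net.rules
# ...then rename the eth1 entry to eth0
sed -i 's/NAME="eth1"/NAME="eth0"/' 70-persistent-net.rules
cat 70-persistent-net.rules
```

Note the order matters: commenting the eth0 line must happen before the rename, otherwise both lines would be commented out.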
5) Reboot after configuration
6) Set the fixed IP
7) Connect 3 machines to the CRT
1.2 Basic configuration preparation for virtual machines in a cluster
1) First delete all the contents of the /tmp directory
$cd /tmp
$sudo rm -rf ./*
2) Remove hadoop-2.5.0, maven, and the .m2 repository
$cd /opt/modules/
$rm -rf ./hadoop-2.5.0/
$rm -rf ./apache-maven-3.0.5/
$cd ~/.m2/
$rm -rf ./*
3) Map the hostnames and IPs of all machines
Edit /etc/hosts
Open the file on each virtual machine and add an IP-to-hostname mapping line for every machine in the cluster.
Also configure the same mappings in the Windows hosts file,
so that any machine can reach the other machines in the cluster by hostname.
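The entries themselves are not reproduced in the text; with the hostnames used in this guide they would look roughly like the following (the IP addresses are assumed examples, so use your own):

```
192.168.1.101 hadoop-senior.ibeifeng.com hadoop-senior
192.168.1.102 hadoop-senior02.ibeifeng.com hadoop-senior02
192.168.1.103 hadoop-senior03.ibeifeng.com hadoop-senior03
```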
4) In the /opt directory of all machines, create a directory app/ and change its ownership; everything in the cluster is installed under it (the installation directory must be identical on every machine!)
$sudo mkdir /opt/app
$sudo chown -R beifeng:beifeng /opt/app/
5) Unzip hadoop-2.5.0 into the app directory (set it up on one machine first, then distribute it to the others)
$tar -zxf /opt/softwares/hadoop-2.5.0.tar.gz -C /opt/app/
1.3 Reasonable planning for Hadoop service component deployment
A distributed deployment uses a master-slave architecture. In pseudo-distributed mode the master and slaves run on a single machine; in fully distributed mode the master node runs on one machine and the slave nodes run on several others. DataNode and NodeManager are generally placed on the same machine: the former uses the machine's disk to store data, while the latter uses its memory and CPU to analyze that data.
If you use 3 virtual machines, you can configure them as follows:

| 3 machines | hadoop-senior | hadoop-senior02 | hadoop-senior03 |
| --- | --- | --- | --- |
| Memory | 2G | 1.5G | 1.5G |
| CPU | 1 core | 1 core | 1 core |
| Hard disk | 20G | 20G | 20G |
| Service components | NameNode | ResourceManager | |
| | DataNode | DataNode | DataNode |
| | NodeManager | NodeManager | NodeManager |
| | MRHistoryServer | | SecondaryNameNode |
If you use 2 virtual machines, you can configure them as follows (only two machines are used in this exercise):

| 2 machines | hadoop-senior | hadoop-senior02 |
| --- | --- | --- |
| Memory | 2G | 1.5G |
| CPU | 1 core | 1 core |
| Hard disk | 20G | 20G |
| Service components | ResourceManager | NameNode |
| | DataNode | DataNode |
| | NodeManager | NodeManager |
| | SecondaryNameNode | MRHistoryServer |
1.4 Configure the nodes for each service component (using "Hadoop 2.x pseudo-distributed Deployment" as a template)
1.4.1 Configure ${JAVA_HOME}
Open hadoop-env.sh, mapred-env.sh, and yarn-env.sh
Set JAVA_HOME to the JDK installation directory in each of them
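The line to set in each of the three files would look like this (the JDK path below is an assumption; use the directory where your JDK is actually installed):

```shell
# Assumed JDK location; adjust to your machine
export JAVA_HOME=/opt/modules/jdk1.7.0_67
```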
1.4.2 Configure HDFS
Create the tmp directory:
$mkdir -p /opt/app/hadoop-2.5.0/data/tmp
Open core-site.xml and add the following configuration:
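The configuration itself is not reproduced in the original text; a typical core-site.xml for the two-machine plan above (NameNode on hadoop-senior02) would be roughly the following, where the hostname, port, and path are assumptions:

```xml
<configuration>
  <!-- NameNode address; hadoop-senior02 runs the NameNode in the two-machine plan -->
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://hadoop-senior02.ibeifeng.com:8020</value>
  </property>
  <!-- Base directory for HDFS data, created above -->
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/opt/app/hadoop-2.5.0/data/tmp</value>
  </property>
</configuration>
```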
Open slaves and configure it as follows:
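The slaves file lists the hosts that run DataNode and NodeManager; its contents are not shown in the original, but for the two-machine setup it would presumably be:

```
hadoop-senior.ibeifeng.com
hadoop-senior02.ibeifeng.com
```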
Open hdfs-site.xml and configure it as follows:
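The content is again not shown; a plausible hdfs-site.xml matching the plan (SecondaryNameNode on hadoop-senior) would be:

```xml
<configuration>
  <!-- SecondaryNameNode runs on hadoop-senior in the two-machine plan -->
  <property>
    <name>dfs.namenode.secondary.http-address</name>
    <value>hadoop-senior.ibeifeng.com:50090</value>
  </property>
</configuration>
```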
1.4.3 Configure YARN
Open yarn-site.xml and configure it as follows:
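A plausible yarn-site.xml for this plan (ResourceManager on hadoop-senior; the hostname is an assumption based on the table above) would be:

```xml
<configuration>
  <!-- ResourceManager runs on hadoop-senior in the two-machine plan -->
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>hadoop-senior.ibeifeng.com</value>
  </property>
  <!-- Required so NodeManagers can serve MapReduce shuffle data -->
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>
```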
1.4.4 Configure the HistoryServer
Open mapred-site.xml and configure it as follows:
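A plausible mapred-site.xml for this plan (JobHistoryServer on hadoop-senior02; hostnames and ports are assumptions using the Hadoop 2.x defaults) would be:

```xml
<configuration>
  <!-- Run MapReduce on YARN -->
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <!-- JobHistoryServer runs on hadoop-senior02 in the two-machine plan -->
  <property>
    <name>mapreduce.jobhistory.address</name>
    <value>hadoop-senior02.ibeifeng.com:10020</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>hadoop-senior02.ibeifeng.com:19888</value>
  </property>
</configuration>
```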
1.5 Distribute Hadoop to the other machine and start HDFS and YARN
1.5.1 Distribution
$scp -r hadoop-2.5.0/ beifeng@hadoop-senior02.ibeifeng.com:/opt/app/
1.5.2 Start HDFS and YARN
1) First format HDFS
2) Start the NameNode on senior02 and the two DataNodes
3) Start the ResourceManager on senior and the two NodeManagers
4) Start the JobHistoryServer on senior02
5) Start the SecondaryNameNode on senior
In the HDFS web UI, check that 2 DataNodes are shown
In the YARN web UI, check that there are also two NodeManagers
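Steps 1)–5) correspond to the standard Hadoop 2.x daemon scripts. A sketch, run from /opt/app/hadoop-2.5.0 on the host indicated in each comment (host placement follows the two-machine plan above):

```shell
bin/hdfs namenode -format                         # on senior02, first start only
sbin/hadoop-daemon.sh start namenode              # on senior02
sbin/hadoop-daemon.sh start datanode              # on each machine
sbin/yarn-daemon.sh start resourcemanager         # on senior
sbin/yarn-daemon.sh start nodemanager             # on each machine
sbin/mr-jobhistory-daemon.sh start historyserver  # on senior02
sbin/hadoop-daemon.sh start secondarynamenode     # on senior
```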
1.6 Testing
1.6.1 Upload files
Create a directory
$bin/hdfs dfs -mkdir -p tmp/conf
Upload files
$bin/hdfs dfs -put etc/hadoop/*-site.xml tmp/conf
Read a file
$bin/hdfs dfs -text tmp/conf/core-site.xml
1.6.2 WordCount Program Test
1) Create a directory
$bin/hdfs dfs -mkdir -p mapreduce/wordcount/input
2) Upload a file to the directory
$bin/hdfs dfs -put /opt/datas/wc.input mapreduce/wordcount/input
3) Run WordCount program
$bin/yarn jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.5.0.jar wordcount mapreduce/wordcount/input mapreduce/wordcount/output
4) Read the file
$bin/hdfs dfs -text mapreduce/wordcount/output/par*
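What the wordcount job computes can be reproduced locally with standard tools. This sketch uses a made-up two-line input file purely to illustrate the word-count logic:

```shell
# Hypothetical sample input
printf 'hadoop yarn\nhadoop hdfs\n' > wc.input
# Split into one word per line, then count occurrences of each word
tr ' ' '\n' < wc.input | sort | uniq -c | sort -k2
# prints counts of 2 hadoop, 1 hdfs, 1 yarn
```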
1.6.3 Benchmark Test (guide p315)
Tests disk and memory performance. Run without arguments, each jar lists the available example programs and benchmarks:
1)
$bin/yarn jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.5.0.jar
2)
$bin/yarn jar share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.5.0.jar
1.7 Configure passwordless SSH
First delete any existing keys:
$cd ~/.ssh/
$rm -rf ./*
Do the following separately on the two master nodes, the NameNode machine and the ResourceManager machine:
1) Generate a public/private key pair
$ssh-keygen -t rsa
2) Copy the public key to each machine
$ssh-copy-id hadoop-senior.ibeifeng.com
$ssh-copy-id hadoop-senior02.ibeifeng.com
3) Test the SSH login
$ssh hadoop-senior.ibeifeng.com
$ssh hadoop-senior02.ibeifeng.com
1.8 Cluster Time synchronization
1.8.1 Choose one machine as the time server; all other machines synchronize their time with it
For example, on the 01 machine:
1) Check whether the NTP server is installed:
sudo rpm -qa | grep ntp
2) Check the NTP service status
sudo service ntpd status
3) Start the NTP service
sudo service ntpd start
4) Enable the service at boot
sudo chkconfig ntpd on
5) Check the boot-time status
sudo chkconfig --list | grep ntpd
6) Edit the configuration file
sudo vi /etc/ntp.conf
Make three changes:
(1) Uncomment the restrict line and change the network segment to your own (the first 3 octets of the IP address followed by .0)
(2) Comment out the upstream time server lines
(3) Uncomment the following two lines:
server 127.127.1.0
fudge 127.127.1.0 stratum 10
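Taken together, the three changes in /etc/ntp.conf might look like this (the 192.168.1.0 segment and the pool hostnames are assumed examples; match them to your own network and distribution defaults):

```
restrict 192.168.1.0 mask 255.255.255.0 nomodify notrap
#server 0.centos.pool.ntp.org iburst
#server 1.centos.pool.ntp.org iburst
#server 2.centos.pool.ntp.org iburst
#server 3.centos.pool.ntp.org iburst
server 127.127.1.0
fudge 127.127.1.0 stratum 10
```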
7) Restart the server
sudo service ntpd restart
8) Synchronize the hardware clock (BIOS) with the system time
$sudo vi /etc/sysconfig/ntpd
Add Content:
SYNC_HWCLOCK=yes
Then wait about 5 minutes for the time server to become usable
1.8.2 Configure all other machines to synchronize with this machine
1) Synchronize each machine with the hadoop-senior time server:
sudo /usr/sbin/ntpdate hadoop-senior.ibeifeng.com
2) Write a scheduled task that synchronizes with the time server periodically
First switch to root
Set it to synchronize every 10 minutes:
$sudo crontab -e
Add the line: 0-59/10 * * * * /usr/sbin/ntpdate hadoop-senior.ibeifeng.com
3) Set the date and time manually (if needed)
sudo date -s 2015-11-17
sudo date -s 17:54:00