1. Cluster building strategy
Analysis:
I have only three computers: two ASUS laptops (an i7 and an i3) and a desktop with a Pentium 4 processor. To properly test the ZooKeeper functionality, a total of 6 Ubuntu (Ubuntu 14.04.3 LTS) hosts are needed. Here is my host distribution plan:
i7: runs 4 Ubuntu virtual machines:
VM name    Memory   Disk   Network connection
master     1G       20G    Bridged
master2    1G       20G    Bridged
rm         512M     20G    Bridged
slave3     1G       20G    Bridged
i3: Ubuntu installed directly, serving as DataNode slave1
P4: Ubuntu installed directly, serving as DataNode slave2
2. Modify the hosts file
All machines use the same ubuntu account; create a hadoop group and add ubuntu to it.
Add the user:
su                                    # switch to root
useradd -d /home/ubuntu -m ubuntu     # add the user
passwd ubuntu                         # set the password
To add the group:
groupadd hadoop
Add ubuntu to the hadoop group:
usermod -g hadoop ubuntu
To check which group the current user is in:
exit        # leave root, back to ubuntu
groups
This displays:
hadoop
Edit the hosts file:
sudo nano /etc/hosts
Modify it to the following:
127.0.0.1 localhost
127.0.1.1 ubuntu
# modify according to the actual IPs
192.168.0.7 master
192.168.0.10 master2
192.168.0.4 slave1
192.168.0.9 slave2
192.168.0.3 slave3
192.168.0.8 rm
# The following lines are desirable for IPv6 capable hosts
::1 ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
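Once the hosts file is in place on every machine, name resolution can be sanity-checked from any node. A minimal sketch, assuming the hostnames above:
for h in master master2 rm slave1 slave2 slave3; do
  ping -c 1 $h      # each host should answer by name
done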
3. Passwordless SSH login
(1) master
Needs passwordless login to all machines.
(2) master2
Needs passwordless login to master.
(3) rm
Needs passwordless login to slave1, slave2, slave3.
(4) Implementing passwordless login
Take master logging in to master2 as an example.
On master, execute the following. If SSH is not installed, first run:
sudo apt install openssh-server
After that succeeds, run:
ssh-keygen -t rsa
Press Enter all the way through.
cd ~
cd .ssh
ls
authorized_keys  id_rsa  id_rsa.pub
Here id_rsa is the private key, which stays on this machine; id_rsa.pub is the public key that needs to be given to master2.
Make a copy of id_rsa.pub:
cp id_rsa.pub id_rsa_m.pub
First, authorize yourself:
cat id_rsa_m.pub >> authorized_keys
Then send it to master2:
scp id_rsa_m.pub ubuntu@master2:~/.ssh
Switch to master2 and execute:
cd ~
cd .ssh
cat id_rsa_m.pub >> authorized_keys
Switch back to master and run:
ssh master2
On a successful login the prompt becomes:
ubuntu@master2:
exit
ubuntu@master:
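Since master ultimately needs passwordless login to every machine, the key distribution can also be done with ssh-copy-id instead of the manual cp/scp/cat steps. A minimal sketch (it appends ~/.ssh/id_rsa.pub to each remote authorized_keys and prompts once for each password):
for h in master master2 rm slave1 slave2 slave3; do
  ssh-copy-id ubuntu@$h
done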
4. Install the JDK
JDK version: jdk-8u91-linux-x64.tar.gz (117.118 MB)
Copy jdk-8u91-linux-x64.tar.gz to the /tmp directory on master and extract it to /home/ubuntu/solf.
All Hadoop-related software will be installed in this directory.
Start extraction:
tar -zxvf jdk-8u91-linux-x64.tar.gz -C /home/ubuntu/solf
cd ~/solf
ls
jdk1.8.0_91
Add environment variables: edit /etc/profile and append the following at the end of the file:
export JAVA_HOME=/home/ubuntu/solf/jdk1.8.0_91
export PATH=$PATH:$JAVA_HOME
export PATH=$PATH:$JAVA_HOME/bin
export PATH=$PATH:$JAVA_HOME/jre
Then reload it:
source /etc/profile
ubuntu@master:~/solf$ java -version
java version "1.8.0_91"
Java(TM) SE Runtime Environment (build 1.8.0_91-b14)
Java HotSpot(TM) 64-Bit Server VM (build 25.91-b14, mixed mode)
This indicates the JDK was installed successfully.
5. Install Hadoop
Hadoop version: hadoop-2.7.2.tar.gz (207.077 MB)
Extract hadoop-2.7.2.tar.gz to /home/ubuntu/solf:
tar -zxvf hadoop-2.7.2.tar.gz -C /home/ubuntu/solf
ls
hadoop-2.7.2  jdk1.8.0_91
Add environment variables: edit /etc/profile and append the following at the end of the file:
export HADOOP_INSTALL=/home/ubuntu/solf/hadoop-2.7.2
export PATH=$PATH:$HADOOP_INSTALL
export PATH=$PATH:$HADOOP_INSTALL/bin
export PATH=$PATH:$HADOOP_INSTALL/sbin
Then reload it:
source /etc/profile
ubuntu@master:~/solf$ hadoop version
Hadoop 2.7.2
Subversion https://git-wip-us.apache.org/repos/asf/hadoop.git -r b165c4fe8a74265c792ce23f546c64604acf0e41
Compiled by jenkins on 2016-01-26T00:08Z
Compiled with protoc 2.5.0
From source with checksum d0fda26633fa762bff87ec759ebe689c
This command was run using /home/ubuntu/solf/hadoop-2.7.2/share/hadoop/common/hadoop-common-2.7.2.jar
This indicates Hadoop was installed successfully.
6. Configure the Hadoop environment
Hadoop can be configured in three modes: standalone, pseudo-distributed, and fully distributed (cluster).
To make it easy to switch between these modes, a symbolic link is used here.
cd ~
cd solf
For convenience later, make hadoop a link to hadoop-2.7.2:
ln -s hadoop-2.7.2 hadoop
cd ~/solf/hadoop/etc/
ls
hadoop
At this point there is only one hadoop directory, which holds the default Hadoop configuration files. A single directory is not enough if you want to switch between modes.
cp -r hadoop hadoop-full      # cluster (fully distributed) mode
cp -r hadoop hadoop-local     # standalone mode
cp -r hadoop hadoop-presudo   # pseudo-distributed mode
rm -r hadoop                  # delete the original directory
If I want to use cluster mode now (the other modes work the same way):
ln -s hadoop-full hadoop
ls -l
lrwxrwxrwx 1 ubuntu hadoop   11 Aug  3 07:21 hadoop -> hadoop-full
drwxr-xr-x 2 ubuntu hadoop 4096 Aug  1 09:55 hadoop-full
drwxr-xr-x 2 ubuntu hadoop 4096 Aug  1 09:55 hadoop-local
drwxr-xr-x 2 ubuntu hadoop 4096 Aug  1 09:55 hadoop-presudo
At this point the default configuration directory points to hadoop-full.
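Switching modes later only means repointing the link in ~/solf/hadoop/etc/. A sketch using the directories created above:
ln -sfn hadoop-local hadoop     # switch to standalone mode
ln -sfn hadoop-full hadoop      # switch back to cluster mode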
The following is the detailed configuration of the files in hadoop-full:
(1) hadoop-env.sh
Only one line needs to be modified: set JAVA_HOME to the same path as JAVA_HOME in /etc/profile, preferably as an absolute path.
export JAVA_HOME=/home/ubuntu/solf/jdk1.8.0_91
(2) core-site.xml
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://ns1</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/ubuntu/solf/hadoop/tmp</value>
  </property>
  <property>
    <name>ha.zookeeper.quorum</name>
    <value>slave1:2181,slave2:2181,slave3:2181</value>
  </property>
</configuration>
(3) hdfs-site.xml
<configuration>
  <property>
    <name>dfs.nameservices</name>
    <value>ns1</value>
  </property>
  <property>
    <name>dfs.ha.namenodes.ns1</name>
    <value>nn1,nn2</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.ns1.nn1</name>
    <value>master:9000</value>
  </property>
  <property>
    <name>dfs.namenode.http-address.ns1.nn1</name>
    <value>master:50070</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.ns1.nn2</name>
    <value>master2:9000</value>
  </property>
  <property>
    <name>dfs.namenode.http-address.ns1.nn2</name>
    <value>master2:50070</value>
  </property>
  <property>
    <name>dfs.namenode.shared.edits.dir</name>
    <value>qjournal://slave1:8485;slave2:8485;slave3:8485/ns1</value>
  </property>
  <property>
    <name>dfs.journalnode.edits.dir</name>
    <value>/home/ubuntu/solf/hadoop-2.7.2/journal</value>
  </property>
  <property>
    <name>dfs.ha.automatic-failover.enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>dfs.client.failover.proxy.provider.ns1</name>
    <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
  </property>
  <property>
    <name>dfs.ha.fencing.methods</name>
    <value>sshfence
shell(/bin/true)</value>
  </property>
  <property>
    <name>dfs.ha.fencing.ssh.private-key-files</name>
    <value>/root/.ssh/id_rsa</value>
  </property>
  <property>
    <name>dfs.ha.fencing.ssh.connect-timeout</name>
    <value>30000</value>
  </property>
  <property>
    <name>dfs.permissions</name>
    <value>false</value>
  </property>
  <property>
    <name>dfs.heartbeat.interval</name>
    <value>3</value>
  </property>
  <property>
    <name>dfs.namenode.heartbeat.recheck-interval</name>
    <value>35000</value>
  </property>
</configuration>
(4) mapred-site.xml
There is no mapred-site.xml in the original directory, so copy it from the template:
cp mapred-site.xml.template mapred-site.xml
Modify it as follows:
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
(5) yarn-site.xml
<configuration>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>rm</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>
(6) slaves: specifies the DataNode hostnames
slave1
slave2
slave3
7. Send the JDK and Hadoop environment to all machines
cd ~
scp -r solf ubuntu@master2:/home/ubuntu
scp -r solf ubuntu@rm:/home/ubuntu
scp -r solf ubuntu@slave1:/home/ubuntu
scp -r solf ubuntu@slave2:/home/ubuntu
scp -r solf ubuntu@slave3:/home/ubuntu
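The five scp commands above can also be written as a single loop (equivalent, assuming the same hostnames):
for h in master2 rm slave1 slave2 slave3; do
  scp -r ~/solf ubuntu@$h:/home/ubuntu
done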
cd /etc
su          # switch to the root user
root@master:/etc#
scp profile root@master2:/etc
scp profile root@rm:/etc
scp profile root@slave1:/etc
scp profile root@slave2:/etc
scp profile root@slave3:/etc
Log in to each machine and execute:
source /etc/profile
java -version
hadoop version
If both display correctly, the JDK and Hadoop environments on each machine are set up.
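The same check can also be run from master in one loop, since master has passwordless login to all machines (a sketch, assuming /etc/profile has already been copied as above):
for h in master2 rm slave1 slave2 slave3; do
  echo "=== $h ==="
  ssh ubuntu@$h 'source /etc/profile; java -version; hadoop version'
done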
8. Install and configure ZooKeeper
ZooKeeper version: zookeeper-3.4.5.tar.gz (16.018 MB)
Extract zookeeper-3.4.5.tar.gz to /home/ubuntu/solf:
tar -zxvf zookeeper-3.4.5.tar.gz -C /home/ubuntu/solf
ls
hadoop-2.7.2  jdk1.8.0_91  zookeeper-3.4.5
Log in to slave1.
(1) Modify the configuration
cd ~/solf/zookeeper-3.4.5/conf
cp zoo_sample.cfg zoo.cfg
Edit it:
nano zoo.cfg
Change:
dataDir=/home/ubuntu/solf/zookeeper-3.4.5/tmp
Add at the end of the file:
server.1=slave1:2888:3888
server.2=slave2:2888:3888
server.3=slave3:2888:3888
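For reference, the resulting zoo.cfg should look roughly like this (the remaining values are the defaults carried over from zoo_sample.cfg):
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/home/ubuntu/solf/zookeeper-3.4.5/tmp
clientPort=2181
server.1=slave1:2888:3888
server.2=slave2:2888:3888
server.3=slave3:2888:3888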
(2) Create the tmp directory and the myid file
cd ~/solf/zookeeper-3.4.5
mkdir tmp
cd tmp
touch myid
echo 1 > myid
Check it:
cat myid
1
This indicates the configuration succeeded.
(3) Send the configured zookeeper-3.4.5 to slave2 and slave3:
cd ~/solf
scp -r zookeeper-3.4.5 ubuntu@slave2:~/solf
scp -r zookeeper-3.4.5 ubuntu@slave3:~/solf
Then modify myid (in ~/solf/zookeeper-3.4.5/tmp) on each machine.
On slave2:
echo 2 > myid
On slave3:
echo 3 > myid
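The two edits can also be done remotely from slave1, as a convenience sketch (it will prompt for the ubuntu password, since passwordless login was not set up between the slaves):
ssh ubuntu@slave2 'echo 2 > ~/solf/zookeeper-3.4.5/tmp/myid'
ssh ubuntu@slave3 'echo 3 > ~/solf/zookeeper-3.4.5/tmp/myid'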
At this point, the six-machine Hadoop cluster configuration is complete.
Start the cluster:
Strictly follow the boot sequence below.
(1) Start ZooKeeper
Execute the following on each of slave1, slave2, and slave3:
cd ~/solf/zookeeper-3.4.5/bin
./zkServer.sh start
Check the status:
./zkServer.sh status
JMX enabled by default
Using config: /home/ubuntu/solf/zookeeper-3.4.5/bin/../conf/zoo.cfg
Mode: follower
Check the processes:
jps
51858 Jps
51791 QuorumPeerMain
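Optionally, all three ZooKeeper instances can be started from rm in one loop, since rm has passwordless login to the slaves (a sketch, assuming the paths used above):
for h in slave1 slave2 slave3; do
  ssh ubuntu@$h 'source /etc/profile; ~/solf/zookeeper-3.4.5/bin/zkServer.sh start'
done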
(2) Start the JournalNodes
On master, execute:
hadoop-daemons.sh start journalnode
slave2: starting journalnode, logging to /home/ubuntu/solf/hadoop-2.7.2/logs/hadoop-ubuntu-journalnode-slave2.out
slave1: starting journalnode, logging to /home/ubuntu/solf/hadoop-2.7.2/logs/hadoop-ubuntu-journalnode-slave1.out
slave3: starting journalnode, logging to /home/ubuntu/solf/hadoop-2.7.2/logs/hadoop-ubuntu-journalnode-slave3.out
Check the processes on slave1, slave2, and slave3 separately:
jps
51958 JournalNode
52007 Jps
51791 QuorumPeerMain
(3) Format HDFS
If HDFS has never been formatted, execute on master:
hdfs namenode -format
Then send the tmp directory under hadoop to the hadoop directory on master2:
cd ~/solf/hadoop
scp -r tmp ubuntu@master2:~/solf/hadoop
(4) Format ZooKeeper (ZKFC state)
If ZooKeeper has never been formatted, execute on master:
hdfs zkfc -formatZK
(5) Start HDFS
On master, execute:
start-dfs.sh
master2: starting namenode, logging to /home/ubuntu/solf/hadoop-2.7.2/logs/hadoop-ubuntu-namenode-master2.out
master: starting namenode, logging to /home/ubuntu/solf/hadoop-2.7.2/logs/hadoop-ubuntu-namenode-master.out
slave2: starting datanode, logging to /home/ubuntu/solf/hadoop-2.7.2/logs/hadoop-ubuntu-datanode-slave2.out
slave1: starting datanode, logging to /home/ubuntu/solf/hadoop-2.7.2/logs/hadoop-ubuntu-datanode-slave1.out
slave3: starting datanode, logging to /home/ubuntu/solf/hadoop-2.7.2/logs/hadoop-ubuntu-datanode-slave3.out
Starting journal nodes [slave1 slave2 slave3]
slave2: journalnode running as process 2299. Stop it first.
slave1: journalnode running as process 2459. Stop it first.
slave3: journalnode running as process 51958. Stop it first.
Starting ZK Failover Controllers on NN hosts [master master2]
master: starting zkfc, logging to /home/ubuntu/solf/hadoop-2.7.2/logs/hadoop-ubuntu-zkfc-master.out
master2: starting zkfc, logging to /home/ubuntu/solf/hadoop-2.7.2/logs/hadoop-ubuntu-zkfc-master2.out
Check the local processes on master:
jps
14481 NameNode
14885 Jps
14780 DFSZKFailoverController
12764 FsShell
Check the Java processes on slave1/2/3:
jps
51958 JournalNode
52214 Jps
52078 DataNode
51791 QuorumPeerMain
(6) Start YARN
On rm, execute:
start-yarn.sh
jps
6965 ResourceManager
7036 Jps
Check the Java processes on slave1/2/3:
jps
52290 NodeManager
51958 JournalNode
52410 Jps
52078 DataNode
51791 QuorumPeerMain
If all the processes look as shown above, the Hadoop cluster is up and running, and you can view the cluster in a browser at:
192.168.0.7:50070
Test the cluster:
1. Test NameNode failover
(1) Visit 192.168.0.7:50070
It shows 'master:9000' (active).
Visit 192.168.0.10:50070
It shows 'master2:9000' (standby).
This indicates that master is currently the active NameNode, with master2 as standby.
(2) Kill the NameNode process on master:
kill -9 14481
jps
14944 Jps
14780 DFSZKFailoverController
12764 FsShell
Start the NameNode process again:
hadoop-daemon.sh start namenode
Repeating the checks in (1) now displays:
'master:9000' (standby)
'master2:9000' (active)
This proves that ZooKeeper can fail over to the standby NameNode when the active NameNode goes down.
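The NameNode states can also be checked from the command line on master, using the NameNode IDs configured in hdfs-site.xml:
hdfs haadmin -getServiceState nn1
hdfs haadmin -getServiceState nn2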
2. Verify the HDFS file storage system
View the HDFS file structure:
hadoop fs -ls -R /
Create the /solf folder:
hadoop fs -mkdir /solf
Upload a file: cd to the directory containing the file and run:
hadoop fs -put spark-2.0.0-bin-without-hadoop.tgz /solf
View the HDFS file structure again:
hadoop fs -ls -R /
drwxr-xr-x   - ubuntu supergroup         0 2017-08-04 07:16 /solf
-rw-r--r--   3 ubuntu supergroup 114274242 2017-08-04 07:16 /solf/spark-2.0.0-bin-without-hadoop.tgz
Upload successful!
3. Run the WordCount program
Use Eclipse to build wordcount.jar, or use the example that ships with Hadoop (see the command sketched below).
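If the bundled example is used instead of a custom jar, the command would look roughly like this (jar path assumed for Hadoop 2.7.2; run it after uploading the input as shown below):
hadoop jar $HADOOP_INSTALL/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.2.jar wordcount /input /wcout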
Upload the input files to HDFS: cd to the directory containing the files and run:
hadoop fs -mkdir /input      # create the target directory if it does not already exist
hadoop fs -put file*.txt /input
Then cd to the directory containing wordcount.jar and run:
hadoop jar wordcount.jar com.will.hadoop.WordCount /input /wcout
17/08/04 07:42:01 INFO client.RMProxy: Connecting to ResourceManager at rm/192.168.0.8:8032
17/08/04 07:42:02 WARN mapreduce.JobResourceUploader: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
17/08/04 07:42:02 INFO input.FileInputFormat: Total input paths to process : 3
17/08/04 07:42:03 INFO mapreduce.JobSubmitter: number of splits:3
17/08/04 07:42:03 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1501854022188_0002
17/08/04 07:42:10 INFO impl.YarnClientImpl: Submitted application application_1501854022188_0002
17/08/04 07:42:11 INFO mapreduce.Job: The url to track the job: http://rm:8088/proxy/application_1501854022188_0002/
17/08/04 07:42:11 INFO mapreduce.Job: Running job: job_1501854022188_0002
17/08/04 07:42:26 INFO mapreduce.Job: Job job_1501854022188_0002 running in uber mode : false
17/08/04 07:42:26 INFO mapreduce.Job:  map 0% reduce 0%
17/08/04 07:42:48 INFO mapreduce.Job:  map 100% reduce 0%
17/08/04 07:43:10 INFO mapreduce.Job:  map 100% reduce 100%
17/08/04 07:43:11 INFO mapreduce.Job: Job job_1501854022188_0002 completed successfully
View the output file /wcout/part-r-00000:
hadoop fs -cat /wcout/part-r-00000
I 2
Apple 4
Car 4
Cat 4
Exit 4
Feel 2
3
4
Good,so 2
Gula
Hadoop 3
Happy 4
happy! 2
Hello 1
is 3
My 3
Pande 4
Peer 4
Quit 4
Test 1
TESTXX 2
This 3
Test successful!
This shows that the Hadoop cluster was built successfully!
Next, HBase and Hive will be introduced to this cluster...