Building a Hadoop Cluster in Detail


1. Cluster building strategy

Analysis:

I have only three computers: two ASUS laptops (with i7 and i3 processors) and a desktop with a Pentium 4 processor. To properly test the ZooKeeper functionality, we need a total of six Ubuntu (Ubuntu 14.04.3 LTS) hosts. Here is my host distribution policy:

i7: runs four Ubuntu virtual machines:

Virtual machine name   Memory   Hard disk   Network connection
master                 1G       20G         Bridged
master2                1G       20G         Bridged
rm                     512M     20G         Bridged
slave3                 1G       20G         Bridged

i3: installs an Ubuntu system as the DataNode node slave1

P4: installs an Ubuntu system as the DataNode node slave2

2. Modify the hosts file

All machines uniformly use the ubuntu account; create a hadoop group and add ubuntu to it.

Add the user:

su                                 # switch to root
useradd -d /home/ubuntu -m ubuntu  # add the user
passwd ubuntu                      # set the password

Add the group:

groupadd hadoop

Add ubuntu to hadoop:

usermod -g hadoop ubuntu

To view the groups of the current user, exit back to the ubuntu account and run:

exit
groups

which displays:

hadoop
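The membership can also be double-checked with the standard id tool; a minimal sketch:

id ubuntu    # prints the uid, primary gid and supplementary groups of the ubuntu user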

Edit hosts:

sudo nano /etc/hosts

Modify it to the following:

127.0.0.1 localhost
127.0.1.1 ubuntu

# modify according to your actual IPs
192.168.0.7  master
192.168.0.10 master2
192.168.0.4  slave1
192.168.0.9  slave2
192.168.0.3  slave3
192.168.0.8  rm

# The following lines are desirable for IPv6 capable hosts
::1     ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
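With the hosts file distributed, name resolution can be sanity-checked with a short loop; a minimal sketch using the hostnames above:

for h in master master2 slave1 slave2 slave3 rm; do
    ping -c 1 "$h" > /dev/null && echo "$h OK" || echo "$h FAILED"
done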

3. Passwordless SSH login

(1) master

needs to log in to all machines without a password.

(2) master2

needs to log in to master without a password.

(3) rm

needs to log in to slave1, slave2, and slave3 without a password.

(4) Implementing passwordless login

Take master logging in to master2 as an example.

Currently on master, execute the following.

If SSH is not installed, first run:

sudo apt install openssh-server

After that succeeds, run:

ssh-keygen -t rsa

pressing Enter at every prompt.

cd ~/.ssh
ls

authorized_keys id_rsa id_rsa.pub

Here id_rsa is the private key, which stays on this computer; id_rsa.pub is the public key, which needs to be given to master2.

Make a backup copy of id_rsa.pub:

cp id_rsa.pub id_rsa_m.pub

First authorize yourself:

cat id_rsa_m.pub >> authorized_keys

Then send it to master2:

scp id_rsa_m.pub ubuntu@master2:~/.ssh

Switch to master2 and execute:

cd ~/.ssh
cat id_rsa_m.pub >> authorized_keys

Switch back to master and execute:

ssh master2

A successful login to master2 shows:

ubuntu@master2:

exit

ubuntu@master:
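Where the OpenSSH client ships the ssh-copy-id helper, the copy-and-append steps above can be automated; a minimal sketch covering all of master's targets:

for h in master2 rm slave1 slave2 slave3; do
    ssh-copy-id ubuntu@$h    # appends ~/.ssh/id_rsa.pub to the remote authorized_keys
done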

4. Install JDK

JDK version: jdk-8u91-linux-x64.tar.gz (117.118 MB)

Copy jdk-8u91-linux-x64.tar.gz to master's /tmp directory and extract it to /home/ubuntu/solf.

All Hadoop-related software will be installed in this directory.

Start extracting:

tar -zxvf jdk-8u91-linux-x64.tar.gz -C /home/ubuntu/solf

cd ~/solf
ls

jdk1.8.0_91

Add the environment variables: edit /etc/profile and append the following at the end of the file:

export JAVA_HOME=/home/ubuntu/solf/jdk1.8.0_91
export PATH=$PATH:$JAVA_HOME
export PATH=$PATH:$JAVA_HOME/bin
export PATH=$PATH:$JAVA_HOME/jre

source /etc/profile

ubuntu@master:~/solf$ java -version
java version "1.8.0_91"
Java(TM) SE Runtime Environment (build 1.8.0_91-b14)
Java HotSpot(TM) 64-Bit Server VM (build 25.91-b14, mixed mode)

This indicates a successful JDK installation.

5. Install Hadoop

Hadoop version: hadoop-2.7.2.tar.gz (207.077 MB)

Extract hadoop-2.7.2.tar.gz to /home/ubuntu/solf:

tar -zxvf hadoop-2.7.2.tar.gz -C /home/ubuntu/solf

ls

hadoop-2.7.2 jdk1.8.0_91

Add the environment variables: edit /etc/profile and append the following at the end of the file:

export HADOOP_INSTALL=/home/ubuntu/solf/hadoop-2.7.2
export PATH=$PATH:$HADOOP_INSTALL
export PATH=$PATH:$HADOOP_INSTALL/bin
export PATH=$PATH:$HADOOP_INSTALL/sbin

source /etc/profile

ubuntu@master:~/solf$ hadoop version
Hadoop 2.7.2
Subversion https://git-wip-us.apache.org/repos/asf/hadoop.git -r b165c4fe8a74265c792ce23f546c64604acf0e41
Compiled by jenkins on 2016-01-26T00:08Z
Compiled with protoc 2.5.0
From source with checksum d0fda26633fa762bff87ec759ebe689c
This command was run using /home/ubuntu/solf/hadoop-2.7.2/share/hadoop/common/hadoop-common-2.7.2.jar

This indicates that the Hadoop installation was successful.

6. Configure the Hadoop environment

Hadoop can be configured in three modes: standalone, pseudo-distributed, and fully distributed (cluster).

To let Hadoop switch between these modes, we use symbolic links here.

cd ~/solf

For convenience later, create a link to hadoop-2.7.2:

ln -s hadoop-2.7.2 hadoop

cd ~/solf/hadoop/etc/
ls

hadoop

At this point there is only one hadoop directory, which holds Hadoop's default configuration files. A single directory is not enough if you want to switch modes:

cp -r hadoop hadoop-full     # cluster mode
cp -r hadoop hadoop-local    # standalone mode
cp -r hadoop hadoop-presudo  # pseudo-distributed mode

rm -r hadoop                 # delete the original directory

If I want to use cluster mode now (the other modes work the same way):

ln -s hadoop-full hadoop

ls

lrwxrwxrwx 1 ubuntu hadoop   11 Aug  3 07:21 hadoop -> hadoop-full
drwxr-xr-x 2 ubuntu hadoop 4096 Aug  1 09:55 hadoop-full
drwxr-xr-x 2 ubuntu hadoop 4096 Aug  1 09:55 hadoop-local
drwxr-xr-x 2 ubuntu hadoop 4096 Aug  1 09:55 hadoop-presudo

At this point the default configuration directory points to hadoop-full.
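Switching to another mode later is then just a matter of repointing the link; a minimal sketch for moving to the pseudo-distributed profile:

cd ~/solf/hadoop/etc
rm hadoop                     # remove the old link only, not the profile directories
ln -s hadoop-presudo hadoop   # point the active configuration at hadoop-presudo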

The following is the detailed configuration of the files in hadoop-full:

(1) hadoop-env.sh

Only one line needs modifying: set JAVA_HOME to the same path as JAVA_HOME in /etc/profile, preferably as an absolute path:

export JAVA_HOME=/home/ubuntu/solf/jdk1.8.0_91

(2) core-site.xml

<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://ns1</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/ubuntu/solf/hadoop/tmp</value>
  </property>
  <property>
    <name>ha.zookeeper.quorum</name>
    <value>slave1:2181,slave2:2181,slave3:2181</value>
  </property>
</configuration>

(3) hdfs-site.xml

<configuration>
  <property>
    <name>dfs.nameservices</name>
    <value>ns1</value>
  </property>
  <property>
    <name>dfs.ha.namenodes.ns1</name>
    <value>nn1,nn2</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.ns1.nn1</name>
    <value>master:9000</value>
  </property>
  <property>
    <name>dfs.namenode.http-address.ns1.nn1</name>
    <value>master:50070</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.ns1.nn2</name>
    <value>master2:9000</value>
  </property>
  <property>
    <name>dfs.namenode.http-address.ns1.nn2</name>
    <value>master2:50070</value>
  </property>
  <property>
    <name>dfs.namenode.shared.edits.dir</name>
    <value>qjournal://slave1:8485;slave2:8485;slave3:8485/ns1</value>
  </property>
  <property>
    <name>dfs.journalnode.edits.dir</name>
    <value>/home/ubuntu/solf/hadoop-2.7.2/journal</value>
  </property>
  <property>
    <name>dfs.ha.automatic-failover.enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>dfs.client.failover.proxy.provider.ns1</name>
    <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
  </property>
  <property>
    <name>dfs.ha.fencing.methods</name>
    <value>
      sshfence
      shell(/bin/true)
    </value>
  </property>
  <property>
    <name>dfs.ha.fencing.ssh.private-key-files</name>
    <value>/root/.ssh/id_rsa</value>
  </property>
  <property>
    <name>dfs.ha.fencing.ssh.connect-timeout</name>
    <value>30000</value>
  </property>
  <property>
    <name>dfs.permissions</name>
    <value>false</value>
  </property>
  <property>
    <name>dfs.heartbeat.interval</name>
    <value>3</value>
  </property>
  <property>
    <name>dfs.namenode.heartbeat.recheck-interval</name>
    <value>35000</value>
  </property>
</configuration>

(4) mapred-site.xml

There is no mapred-site.xml in the directory by default, so copy it from the template:

cp mapred-site.xml.template mapred-site.xml

Then modify it as follows:

<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>

(5) yarn-site.xml

<configuration>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>rm</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>

(6) slaves, which specifies the DataNode hostnames:

slave1
slave2
slave3

7. Send the JDK and Hadoop environments to all machines

cd ~

scp -r solf ubuntu@master2:/home/ubuntu
scp -r solf ubuntu@rm:/home/ubuntu
scp -r solf ubuntu@slave1:/home/ubuntu
scp -r solf ubuntu@slave2:/home/ubuntu
scp -r solf ubuntu@slave3:/home/ubuntu
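The five copies can equally be written as one loop; a minimal sketch using the hostnames from /etc/hosts:

for h in master2 rm slave1 slave2 slave3; do
    scp -r ~/solf ubuntu@$h:/home/ubuntu
done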

cd /etc
su    # switch to the root user

root@master:/etc

scp profile root@master2:/etc
scp profile root@rm:/etc
scp profile root@slave1:/etc
scp profile root@slave2:/etc
scp profile root@slave3:/etc

Then go into each machine and execute:

source /etc/profile
java -version
hadoop version

If the versions display correctly, the JDK and Hadoop environments on each machine are set up.
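Since master can already log in to every machine without a password, the same check can be driven from master in one loop; a minimal sketch:

for h in master2 rm slave1 slave2 slave3; do
    echo "== $h =="
    ssh ubuntu@$h 'source /etc/profile; java -version; hadoop version'
done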

8. Install and configure ZooKeeper

ZooKeeper version: zookeeper-3.4.5.tar.gz (16.018 MB)

Extract zookeeper-3.4.5.tar.gz to /home/ubuntu/solf:

tar -zxvf zookeeper-3.4.5.tar.gz -C /home/ubuntu/solf

ls

hadoop-2.7.2 jdk1.8.0_91 zookeeper-3.4.5

Log in to slave1.

(1) Modify the configuration

cd ~/solf/zookeeper-3.4.5/conf
cp zoo_sample.cfg zoo.cfg

Edit it:

nano zoo.cfg

Modify:

dataDir=/home/ubuntu/solf/zookeeper-3.4.5/tmp

Add at the end of the file:

server.1=slave1:2888:3888
server.2=slave2:2888:3888
server.3=slave3:2888:3888

(2) Create the tmp directory and the myid file

cd ~/solf/zookeeper-3.4.5
mkdir tmp
cd tmp
touch myid
echo 1 > myid

Check it:

cat myid

1

This indicates a successful configuration.

(3) Send the configured zookeeper-3.4.5 to slave2 and slave3

cd ~/solf
scp -r zookeeper-3.4.5 ubuntu@slave2:~/solf
scp -r zookeeper-3.4.5 ubuntu@slave3:~/solf

Then modify myid on each:

On slave2:

echo 2 > myid

On slave3:

echo 3 > myid
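If passwordless SSH from slave1 to the other slaves has been set up (this guide only configures it from master and rm, so treat this as an extra assumption), the copy and the per-host myid values can be handled in one pass; a minimal sketch:

id=2
for h in slave2 slave3; do
    scp -r ~/solf/zookeeper-3.4.5 ubuntu@$h:~/solf
    ssh ubuntu@$h "echo $id > ~/solf/zookeeper-3.4.5/tmp/myid"
    id=$((id + 1))
done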

At this point, configuration of the six-machine Hadoop cluster is complete.

Start the cluster:

Strictly follow the boot sequence below.

(1) Start ZK

Execute in turn on slave1, slave2, and slave3:

cd ~/solf/zookeeper-3.4.5/bin
./zkServer.sh start

Check the status:

./zkServer.sh status

JMX enabled by default
Using config: /home/ubuntu/solf/zookeeper-3.4.5/bin/../conf/zoo.cfg
Mode: follower

Check the processes:

jps

51858 Jps
51791 QuorumPeerMain
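Because rm was given passwordless login to the slaves, the three starts can also be issued from rm in one loop; a minimal sketch (sourcing /etc/profile so the remote shell sees JAVA_HOME):

for h in slave1 slave2 slave3; do
    ssh ubuntu@$h 'source /etc/profile; ~/solf/zookeeper-3.4.5/bin/zkServer.sh start'
done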

(2) Start the JournalNodes

On master, execute:

hadoop-daemons.sh start journalnode

slave2: starting journalnode, logging to /home/ubuntu/solf/hadoop-2.7.2/logs/hadoop-ubuntu-journalnode-slave2.out
slave1: starting journalnode, logging to /home/ubuntu/solf/hadoop-2.7.2/logs/hadoop-ubuntu-journalnode-slave1.out
slave3: starting journalnode, logging to /home/ubuntu/solf/hadoop-2.7.2/logs/hadoop-ubuntu-journalnode-slave3.out

Check the processes on slave1, slave2, and slave3 separately:

jps

51958 JournalNode
52007 Jps
51791 QuorumPeerMain

(3) Format HDFS

If HDFS has never been formatted, execute on master:

hdfs namenode -format

Then send the tmp directory under hadoop to master2's hadoop directory:

cd ~/solf/hadoop
scp -r tmp ubuntu@master2:~/solf/hadoop

(4) Format ZK

If ZooKeeper has never been formatted, execute on master:

hdfs zkfc -formatZK

(5) Start HDFS

On master, execute:

start-dfs.sh

master2: starting namenode, logging to /home/ubuntu/solf/hadoop-2.7.2/logs/hadoop-ubuntu-namenode-master2.out
master: starting namenode, logging to /home/ubuntu/solf/hadoop-2.7.2/logs/hadoop-ubuntu-namenode-master.out
slave2: starting datanode, logging to /home/ubuntu/solf/hadoop-2.7.2/logs/hadoop-ubuntu-datanode-slave2.out
slave1: starting datanode, logging to /home/ubuntu/solf/hadoop-2.7.2/logs/hadoop-ubuntu-datanode-slave1.out
slave3: starting datanode, logging to /home/ubuntu/solf/hadoop-2.7.2/logs/hadoop-ubuntu-datanode-slave3.out
Starting journal nodes [slave1 slave2 slave3]
slave2: journalnode running as process 2299. Stop it first.
slave1: journalnode running as process 2459. Stop it first.
slave3: journalnode running as process 51958. Stop it first.
Starting ZK Failover Controllers on NN hosts [master master2]
master: starting zkfc, logging to /home/ubuntu/solf/hadoop-2.7.2/logs/hadoop-ubuntu-zkfc-master.out
master2: starting zkfc, logging to /home/ubuntu/solf/hadoop-2.7.2/logs/hadoop-ubuntu-zkfc-master2.out

Check the local processes:

jps

14481 NameNode
14885 Jps
14780 DFSZKFailoverController
12764 FsShell

Check the Java processes on slave1/2/3:

jps

51958 JournalNode
52214 Jps
52078 DataNode
51791 QuorumPeerMain

(6) Start YARN

On rm, execute:

start-yarn.sh

jps

6965 ResourceManager
7036 Jps

Check the Java processes on slave1/2/3:

jps

52290 NodeManager
51958 JournalNode
52410 Jps
52078 DataNode
51791 QuorumPeerMain

If all the processes appear as shown above, the Hadoop cluster is really running, and you can enter

192.168.0.7:50070

in a browser to view the cluster.

Test the cluster:

1. Test NameNode failover

(1) Visit 192.168.0.7:50070

It shows 'master:9000' (active).

Visit 192.168.0.10:50070

It shows 'master2:9000' (standby).

This indicates that the active NameNode is currently master, with master2 as the standby.

(2) Kill the NameNode process on master:

kill -9 14481

jps

14944 Jps
14780 DFSZKFailoverController
12764 FsShell

Start the NameNode process again:

hadoop-daemon.sh start namenode

Repeating the checks in (1) now displays:

'master:9000' (standby)
'master2:9000' (active)

This proves that ZooKeeper can switch over to the standby NameNode when the active NameNode goes down.
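The two NameNode states can also be read from the command line with HDFS's HA admin tool; a minimal sketch (nn1 and nn2 as defined in hdfs-site.xml):

hdfs haadmin -getServiceState nn1    # prints active or standby
hdfs haadmin -getServiceState nn2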

2. Verify the HDFS file storage system

View the HDFS file tree:

hadoop fs -ls -R /

Create the solf folder:

hadoop fs -mkdir /solf

Upload a file (cd to the directory containing the file first):

hadoop fs -put spark-2.0.0-bin-without-hadoop.tgz /solf

View the HDFS file tree again:

hadoop fs -ls -R /

drwxr-xr-x   - ubuntu supergroup         0 2017-08-04 07:16 /solf
-rw-r--r--   3 ubuntu supergroup 114274242 2017-08-04 07:16 /solf/spark-2.0.0-bin-without-hadoop.tgz

Upload successful!
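As an extra round-trip check, the uploaded file can be fetched back and compared with the original; a minimal sketch (run from the directory holding the original archive):

hadoop fs -get /solf/spark-2.0.0-bin-without-hadoop.tgz /tmp/spark-check.tgz
md5sum spark-2.0.0-bin-without-hadoop.tgz /tmp/spark-check.tgz    # the two checksums should match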

3. Run the WordCount program

Use Eclipse to generate wordcount.jar, or use the example bundled with Hadoop.

Upload the input files to HDFS (cd to the directory containing the files first):

hadoop fs -put file*.txt /input

cd to the directory containing wordcount.jar, then run:

hadoop jar wordcount.jar com.will.hadoop.wordcount /input /wcout

17/08/04 07:42:01 INFO client.RMProxy: Connecting to ResourceManager at rm/192.168.0.8:8032
17/08/04 07:42:02 WARN mapreduce.JobResourceUploader: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
17/08/04 07:42:02 INFO input.FileInputFormat: Total input paths to process : 3
17/08/04 07:42:03 INFO mapreduce.JobSubmitter: number of splits:3
17/08/04 07:42:03 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1501854022188_0002
17/08/04 07:42:10 INFO impl.YarnClientImpl: Submitted application application_1501854022188_0002
17/08/04 07:42:11 INFO mapreduce.Job: The url to track the job: http://rm:8088/proxy/application_1501854022188_0002/
17/08/04 07:42:11 INFO mapreduce.Job: Running job: job_1501854022188_0002
17/08/04 07:42:26 INFO mapreduce.Job: Job job_1501854022188_0002 running in uber mode : false
17/08/04 07:42:26 INFO mapreduce.Job:  map 0% reduce 0%
17/08/04 07:42:48 INFO mapreduce.Job:  map 100% reduce 0%
17/08/04 07:43:10 INFO mapreduce.Job:  map 100% reduce 100%
17/08/04 07:43:11 INFO mapreduce.Job: Job job_1501854022188_0002 completed successfully

Open /wcout/part-r-00000:

I 2

Apple 4

Car 4

Cat 4

Exit 4

Feel 2

3

4

Good,so 2

Gula

Hadoop 3

Happy 4

happy! 2

Hello 1

is 3

My 3

Pande 4

Peer 4

Quit 4

Test 1

TESTXX 2

This 3

The test succeeded!

This shows that the Hadoop cluster was built successfully!

Next, HBase and Hive will be brought into this cluster ...
