Hadoop Learning Notes - 22. Hadoop 2.x Environment Setup and Configuration

In 2015 I spent two months working through a Hadoop 1.x tutorial, gained a preliminary understanding of this magical little elephant, and summarized what I learned at each step, which grew into my blog series "Hadoop Learning Notes". In fact, Hadoop 2.x had already appeared back in 2014 and has since become the mainstream version. There are also frameworks beyond offline batch computing, such as the real-time computing framework Storm and the near-real-time computing framework Spark. Readers already familiar with Hadoop 2.x will know that it brings far more than a minor update over 1.x; the two most significant changes are:

(1) The HDFS NameNode can now be deployed in a clustered manner, improving the NameNode's horizontal scalability and availability through HDFS Federation and HA respectively;

(2) MapReduce's JobTracker has been split into two separate components, one for resource management and one for task lifecycle management (including scheduling and monitoring), and the resource-management layer has been renamed YARN (Yet Another Resource Negotiator).

Therefore, I decided to take advantage of my time as a newly single dog (cue the sentimentality) to study Hadoop 2.x, and to share some study notes with friends along the way.

If you are not sure what has changed in Hadoop 2.x compared to 1.x, you can read the article "An Introduction to the Improvements in Hadoop 2" for an overview. This post will not cover those changes and goes straight to environment setup and configuration.

First, the preparatory work

(1) A reasonably well-equipped desktop or laptop (mainly memory, memory, memory: important things are worth saying three times)

(2) Virtualization software you are familiar with (VMware, VirtualBox, or others; I use VMware Workstation)

(3) An SSH client you are familiar with (Xshell, Xftp, WinSCP, and so on; I use Xshell plus Xftp)

(4) The Linux installation packages for Hadoop 2.4.1 and JDK 1.7 (you can also download them directly online)

Of course, thoughtful as I am, I have prepared the Hadoop 2.4.1 and JDK 1.7 packages for you; you can download them through this link: click me to download

Second, pseudo-distributed setup

2.1 Basic network configuration

After installing VMware Workstation, two additional network adapters appear on the host. All you have to do is set a static IP address for the second extra adapter (on my machine it shows up as Ethernet 3). Here we set the gateway to 192.168.22.1 and the host-side IP to 192.168.22.2; in other words, our virtual machine must sit in the 192.168.22.x network segment.

Then set the virtual machine's network connection mode to VMnet8 (NAT mode); if you do not know what NAT means, please search Baidu for it first.

2.2 Installing a Linux image for a virtual machine

Here we choose CentOS, and you can choose other Linux distributions as well.

2.3 Setting a static IP address

Run the command setup, go into Device Configuration to set the static IP address, and then run service network restart to restart the network interface.
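If you prefer editing the configuration file directly rather than using the setup tool, a minimal sketch of /etc/sysconfig/network-scripts/ifcfg-eth0 for this tutorial's addressing plan might look as follows (the device name, and whether a HWADDR line is present, depend on your own system):

DEVICE=eth0
TYPE=Ethernet
ONBOOT=yes              # bring the interface up at boot
BOOTPROTO=static        # use a static address instead of DHCP
IPADDR=192.168.22.100   # the VM's address in the 192.168.22.x segment
NETMASK=255.255.255.0
GATEWAY=192.168.22.1    # the VMnet8 gateway configured earlier

After saving the file, service network restart applies the change.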

In addition, to access the virtual machine from the host you would need to open the relevant ports; for convenience we simply shut down the virtual machine's firewall: sudo service iptables stop

Also disable it at boot: sudo chkconfig iptables off (you can check its status with sudo service iptables status)
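Putting the firewall-related commands together (this assumes a CentOS 6-style system where the firewall runs as the iptables service):

sudo service iptables stop       # stop the firewall immediately
sudo chkconfig iptables off      # keep it from starting again on reboot
sudo service iptables status     # verify that iptables is no longer running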

2.4 Setting the system to boot into command-line mode

Run sudo vi /etc/inittab, change the default runlevel to 3, and then run reboot to restart the virtual machine.
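Concretely, the change is to the initdefault line at the end of /etc/inittab, which switches the default runlevel from graphical (5) to multi-user text mode (3):

# before: boot into the graphical desktop
id:5:initdefault:
# after: boot into text-only, multi-user mode
id:3:initdefault: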

2.5 Using Xshell instead of operating directly in VMware

From this point on you can use Xshell instead of typing commands directly in the VMware console; you will find Xshell much more comfortable to work in!

2.6 Adding the hadoop user to the sudoers

Because the root user's privileges are very broad, working as root all the time is insecure, so we normally operate as an ordinary user and use the sudo command only when elevated privileges are needed. Therefore, we need to grant the hadoop user sudo privileges here.

Switch to root with su, then run vi /etc/sudoers and find this line: root ALL=(ALL) ALL

Then add a line directly below it: hadoop ALL=(ALL) ALL

Finally, save and exit.

2.7 Setting the hostname and the hostname-to-IP-address mapping

(1) Run sudo vi /etc/sysconfig/network and set HOSTNAME=hadoop-master.manulife, then reboot for the new hostname to take effect (see the example after these steps).
(2) Run sudo vi /etc/hosts and add a line: 192.168.22.100 hadoop-master.manulife
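For reference, after step (1) the /etc/sysconfig/network file should contain something like:

NETWORKING=yes
HOSTNAME=hadoop-master.manulife

Once the machine has rebooted, you can confirm with:

hostname   # should print hadoop-master.manulife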

2.8 Creating dedicated folders for software and installation packages (optional)

(1) Remove the default home directories that are not needed, e.g. rm -rf p* d* music/ videos/ templates/
(2) mkdir app (where the software will be installed)
(3) mkdir local (where the uploaded installation packages will be placed)

2.9 Installing the JDK

(1) Upload the JDK package to the virtual machine, here using an SFTP client (e.g. Xftp)

(2) Extract the JDK into the app directory: tar -zxvf jdk-xxx.tar.gz -C ~/app/

(3) Setting environment variables:

sudo vi /etc/profile
export JAVA_HOME=/home/hadoop/app/jdk1.7.0_65
export PATH=$PATH:$JAVA_HOME/bin
source /etc/profile
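A quick check that the JDK is now on the PATH (the exact version string depends on the package you installed; 1.7.0_65 is assumed here to match the JAVA_HOME above):

java -version      # should report java version "1.7.0_65"
echo $JAVA_HOME    # should print /home/hadoop/app/jdk1.7.0_65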

2.10 Installing Hadoop

(1) Upload the Hadoop package to the virtual machine, again using the SFTP client

(2) Extract Hadoop into the app directory: tar -zxvf hadoop-2.4.1.tar.gz -C ~/app/ (the paths later in this note assume the extracted directory is available as /home/hadoop/app/hadoop, so you may want to rename or symlink hadoop-2.4.1 accordingly)

(3) Optionally delete the redundant documentation in Hadoop's share folder: rm -rf doc (run from inside the share directory)

(4) Set up the important configuration files in Hadoop's etc/hadoop folder: cd etc/hadoop, then edit hadoop-env.sh, core-site.xml, hdfs-site.xml, yarn-site.xml and mapred-site.xml

hadoop-env.sh

vim hadoop-env.sh
# line 27
export JAVA_HOME=/home/hadoop/app/jdk1.7.0_65

core-site.xml

<!-- Specify the file system schema (URI) used by Hadoop, i.e. the address of HDFS's boss, the NameNode -->
<property>
<name>fs.defaultFS</name>
<value>hdfs://hadoop-master.manulife:9000</value>
</property>
<!-- Specify the storage directory for files generated while Hadoop is running -->
<property>
<name>hadoop.tmp.dir</name>
<value>/home/hadoop/app/hadoop/tmp</value>
</property>

hdfs-site.xml

<!-- Specify the number of HDFS replicas -->
<property>
<name>dfs.replication</name>
<value>1</value>
</property>

yarn-site.xml

<!-- Specify the address of YARN's boss, the ResourceManager -->
<property>
<name>yarn.resourcemanager.hostname</name>
<value>hadoop-master.manulife</value>
</property>
<!-- Specify how reducers fetch data -->
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>

mapred-site.xml (rename it first: mv mapred-site.xml.template mapred-site.xml)

<!-- Specify that MapReduce runs on YARN -->
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>

(5) Set the environment variables (JAVA_HOME and HADOOP_HOME must match where you actually placed the software; in this setup that is under /home/hadoop/app)

sudo vi /etc/profile

export JAVA_HOME=/home/hadoop/app/jdk1.7.0_65
export HADOOP_HOME=/home/hadoop/app/hadoop
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin

source /etc/profile
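Similarly, a quick check that the Hadoop variables took effect (re-open the shell or re-run source /etc/profile first):

hadoop version        # should report Hadoop 2.4.1
echo $HADOOP_HOME     # should print the Hadoop installation directory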

(6) Format the NameNode

hadoop namenode -format (in 2.x you can equivalently use hdfs namenode -format)
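One way to confirm that the format succeeded is to look at the NameNode metadata directory; with the hadoop.tmp.dir configured above and the default dfs.namenode.name.dir of ${hadoop.tmp.dir}/dfs/name, it should now exist and contain an fsimage:

ls /home/hadoop/app/hadoop/tmp/dfs/name/current
# expect files such as fsimage_0000000000000000000, seen_txid and VERSION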

(7) Start Hadoop's two core components: HDFS and YARN

First edit the slaves configuration file (in this pseudo-distributed setup the master node is both NameNode and DataNode): vi slaves, and add hadoop-master.manulife

Start HDFS: sbin/start-dfs.sh

Start YARN: sbin/start-yarn.sh

Verify that everything is started: jps
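If everything came up correctly, jps should list roughly the following daemons for this pseudo-distributed setup (process IDs omitted, order may vary):

NameNode
DataNode
SecondaryNameNode
ResourceManager
NodeManager
Jps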

(8) Accessing the Hadoop web UI from the host

First, add the virtual machine's IP address and host name to the Windows hosts file (C:\Windows\System32\drivers\etc\hosts) by adding the line: 192.168.22.100 hadoop-master.manulife

Then open a browser and visit: http://hadoop-master.manulife:50070
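Port 50070 is the NameNode web UI. The YARN ResourceManager also serves a web UI, by default on port 8088, which is handy later for watching MapReduce jobs:

http://hadoop-master.manulife:8088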

2.11 A simple HDFS test

Upload a file to HDFS: hadoop fs -put xxxx.tar.gz hdfs://hadoop-master.manulife:9000/
Download a file from HDFS: hadoop fs -get hdfs://hadoop-master.manulife:9000/xxxx.tar.gz
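To confirm the upload you can list the root of the file system; since fs.defaultFS is configured in core-site.xml, the hdfs:// prefix may also be omitted:

hadoop fs -ls hdfs://hadoop-master.manulife:9000/
hadoop fs -ls /     # equivalent, using the configured default file system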

2.12 A simple MapReduce test

Here we run the pi example that ships with Hadoop (the two arguments are the number of map tasks and the number of samples per map):

(1) cd /home/hadoop/app/hadoop/share/hadoop/mapreduce/
(2) hadoop jar hadoop-mapreduce-examples-2.4.1.jar pi 5 5

2.13 SSH password-free login

Password-free SSH login is commonly configured in Linux distributed clusters; here we first set up password-free SSH login for the primary node to itself:

(1) ssh-keygen -t rsa
(2) cd ~/.ssh, then cp id_rsa.pub authorized_keys
(3) ssh localhost
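If ssh localhost still prompts for a password after these steps, the usual cause is file permissions on the key material; tightening them as below (not part of the original note, but a common fix) usually resolves it:

chmod 700 ~/.ssh
chmod 600 ~/.ssh/authorized_keys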

Third, building a Java development environment

3.1 Preparation
3.2 Operating HDFS with the Java API
3.3 A simple test

Fourth, fully distributed setup

Zhou Xurong

Source: http://www.cnblogs.com/edisonchou/

The copyright of this article belongs to the author and Cnblogs. Reprinting is welcome, but if you reprint without the author's consent you must retain this statement and provide a link to the original in a prominent position on the article page.

