Environment and Objectives:
- system: VMware / Ubuntu 12.04
- Hadoop version: 0.20.2
- My node configuration (fully distributed cluster):
  Master (JobTracker): 192.168.221.130, hostname H1
  Slave (TaskTracker/DataNode): 192.168.221.141, hostname H2
  Slave (TaskTracker/DataNode): 192.168.221.142, hostname H3
- user: hadoop_admin
- target: successfully start Hadoop, with http://localhost:50030 and http://localhost:50070 displaying the processes correctly
Because I have zero Linux background, the installation was difficult; it took several days with help from group members to get it installed (heroes, don't laugh). Some of the IT-operations details below are very basic, there may be gaps in my knowledge or even outright misunderstandings, and some steps I carried out without fully understanding them, so guidance is welcome. (Apart from Mr. Huang's course, this process is covered all over the web, for example the Apache Hadoop wiki's Running Hadoop On Ubuntu Linux (Multi-Node Cluster); here I only record the problems and steps from my own installation.)
I divided the basic process into the following steps:
Installing Ubuntu using VMware
I'm using Ubuntu 12, and here are the software and tools I used (the links were on Sina micro-disk).
· VMware Workstation (get it from the official website for free)
· ubuntu-12.04.1-desktop-i386.iso
· jdk-7u7-windows-i586.rar
· Because the teacher repeatedly stressed the differences between Hadoop versions, a novice is best off using the same version as the teacher, namely hadoop-0.20.2.tar.gz
· WinSCP (what I use), PuTTY, or SecureCRT to transfer the JDK and Hadoop to Ubuntu
Install Ubuntu
Basically nothing worth noting here. After the installation completed, my machine booted into command-line mode by default; startx enters the GUI.
In Ubuntu you can adjust the display resolution to make the GUI comfortable; searching for "terminal" opens the command-line tool; Ctrl+Alt+F1~F6 switches to the virtual consoles, and in command-line mode Alt+Left/Right switches between them.
Configure the network (not a required step for installing Hadoop)
Because some group members using bridged networking had to be on the same network segment, I took the opportunity to play with network settings a bit (note: I don't think this is a necessary step for a Hadoop install). Thanks to network-manager, Ubuntu needs no settings at all to get online out of the box; open Settings > Network to see the configuration, which is DHCP-based. I set a static IP via sudo vi /etc/network/interfaces, but after a reboot network-manager changed it back; the article I followed mentions that the two methods conflict and discusses how to handle it. I simply took the rough approach of uninstalling it with sudo apt-get autoremove network-manager --purge.
autoremove: "autoremove" removes all packages that were automatically installed to satisfy dependencies; the --purge option makes apt-get also remove the config files.
Steps: configure static IP > DNS > hostname > hosts
Configure static IP
As you can see in VM > Settings > Network, I'm using VMware's default NAT mode (explained here: with NAT, the virtual machine and the host can ping each other, while other hosts cannot ping the virtual machine). With NAT the host and VM do not need IPs on the same network segment yet can still ping each other.
If you are interested in the differences between the three modes, search for "VMware bridged, NAT, host-only differences". In the VMware Workstation menu > Edit > Virtual Network Editor you can see that NAT uses VMnet8, one of the two NICs that VMware virtualized automatically at install time.
Click NAT Settings to see:
Get the following information:
Gateway: 192.168.221.2
IP network segment: 192.168.221.128~254
Subnet Mask: 255.255.255.0
: sudo vi /etc/network/interfaces
(about vi/vim, see the vim editor chapter of "Bird Brother's Linux Private Dishes")
auto lo
iface lo inet loopback
# the section above configures localhost/127.0.0.1 and can be kept as-is
# add eth0, the configuration for NIC 0
auto eth0
iface eth0 inet static
# static IP
address 192.168.221.130
netmask 255.255.255.0
gateway 192.168.221.2
dns-nameservers 192.168.221.2 8.8.8.8
# dns-search test.com
# (newly learned: dns-search would make lookups automatically append test.com to the hostname)
Restart Network
: sudo /etc/init.d/networking restart  # restart so that eth0 is established
: whereis ifup  # locate the ifup/ifdown tools
: sudo /sbin/ifup eth0  # after manually editing eth0 you must bring it up for the change to take effect, as the article explains
: sudo /sbin/ifdown eth0
: sudo /etc/init.d/networking restart  # restart again
: ifconfig  # check the IP; eth0's information should now be displayed
# Configure DNS
: sudo vi /etc/resolv.conf
Add the following (the second entry is Google's public DNS):
nameserver 192.168.221.2
nameserver 8.8.8.8
This file gets overwritten by network-manager, so the latter has to go:
: sudo apt-get autoremove network-manager --purge
# Configure hosts
: sudo vi /etc/hosts
Add:
192.168.221.130 H1
192.168.221.141 H2
192.168.221.142 H3
# Configure hostname
: whereis hostname
: sudo vi /etc/hostname
Write in: h1
Run:
: sudo hostname h1
The network is now configured. If you are not cloning, repeat all of the above on each of the three servers (hand-aching); /etc/hosts is best copied over with scp.
Create a specific action user for Hadoop
Create a dedicated operating user for Hadoop; every node server in the cluster needs this user, so that the node servers can communicate over SSH as this user via their RSA public keys.
(Here I hit a bigger snag: useradd and adduser are two different commands with different usage; this article explains it clearly.)
I used:
: sudo useradd hadoop_admin
: sudo passwd hadoop_admin
After logging in with this user I found there was no home directory; the prompt showed only
$:
Then I switched back to the root user and created the /home/hadoop_admin directory (so the directory was owned by root with root-only permissions).
The problem surfaced when generating the RSA SSH key: the directory had no write permission.
Checking around, I listed the user's permissions on home and found the owner was root.
Going further, I found the permission was 0, showing the user had been created incorrectly. Group members suggested setting permissions manually with chmod and chown (sudo chown -R hadoop_admin /home/hadoop_admin, which is also needed when using useradd). That felt like too much trouble, so after checking a bit I decided to recreate the user (this would surely never fly in real IT ops =o=).
: sudo deluser hadoop_admin
: sudo adduser hadoop_admin --home /home/hadoop_admin -u 545
Now everything is normal.
1. Create a user
: sudo adduser hadoop_admin --home /home/hadoop_admin -u 545
2. Add the user to the list of users who can execute sudo
: sudo vi /etc/sudoers
Add the entry for hadoop_admin to the file.
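The exact line isn't reproduced in the text; a typical sudoers entry granting hadoop_admin full sudo rights (modeled on the default root line) looks like the following. Edit with visudo rather than plain vi if possible, since visudo syntax-checks the file before saving:

```text
# /etc/sudoers: give hadoop_admin the same sudo rights as root
hadoop_admin ALL=(ALL:ALL) ALL
```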
3. Generate an SSH key for the user (see below)
Install SSH and generate an RSA key
1. Installing OpenSSH
Knowledge point: about Debian packages and apt-get, see here.
: sudo apt-get install openssh-server
Once this completes, sshd should in theory be running; you can now use WinSCP's Explorer mode for file transfer and copy the JDK and Hadoop over.
It is worth looking at SSH's configuration, which helps in understanding how the node servers connect to each other password-free through SSH public keys; as a zero-based beginner I find the whereis command extremely convenient.
This line is interesting because when installing Hadoop you frequently need hosts added to known_hosts.
Ubuntu/Debian enables HashKnownHosts yes by default in /etc/ssh/ssh_config, so each ssh hostname asks whether to add the host to the known_hosts file. See the extended reading on OpenSSH.
2. Generate the private key and public key file for Hadoop_admin
# log in as hadoop_admin and change to the home directory
: cd ~/
: ssh-keygen -t rsa  # generate SSH keys with the RSA algorithm; -t sets the algorithm type
This automatically creates a .ssh folder in the home directory containing the two files id_rsa (private key) and id_rsa.pub (public key).
: cd ~/.ssh
: cp id_rsa.pub authorized_keys  # per the SSH notes above, authorized_keys holds the public keys SSH accepts for automatic authentication; in my experiment the key string ends with hadoop_admin@h1
(Other users' public keys can also be added here.)
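The key-generation steps above can be sketched as a non-interactive script. It uses a scratch directory in place of the real ~/.ssh so it is safe to run anywhere; the directory name is illustrative:

```shell
# Generate an RSA key pair non-interactively and seed authorized_keys,
# using a scratch directory as a stand-in for ~/.ssh.
SSH_DIR="$(mktemp -d)/.ssh"
mkdir -p "$SSH_DIR"
chmod 700 "$SSH_DIR"                 # sshd ignores keys in overly open directories

# -t rsa: algorithm type; -N "": empty passphrase; -f: key file; -q: quiet
ssh-keygen -q -t rsa -N "" -f "$SSH_DIR/id_rsa"

# authorized_keys lists the public keys sshd accepts for password-free login
cp "$SSH_DIR/id_rsa.pub" "$SSH_DIR/authorized_keys"
chmod 600 "$SSH_DIR/authorized_keys"
```

On a real node you would run ssh-keygen without -N "" (or with a passphrase) directly in ~/.ssh, as in the steps above.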
Installing the JDK
There are several ways to install: search for "jdk" in the Ubuntu Software Center to install OpenJDK; modify the Debian source list and use sudo apt-get install sun-java6-jdk (not great to use); or, simplest of all, download Sun's JDK, unzip it, and set the JAVA_HOME information.
1. Prepare the JDK file
As above; copy the file to the VM system via SSH.
2. Installing the JDK
I installed under /usr/lib/jvm/jdk1.7.0_21 (it is best to keep this directory identical on all servers, otherwise you are in for pain later).
: sudo tar xvf ~/Downloads/[jdk].tar.gz -C /usr/lib/jvm
: cd /usr/lib/jvm
: ls
to check the result.
3. Set up information such as Java_path
: sudo vi /etc/profile
# add the following environment variables
export JAVA_HOME=/usr/lib/jvm/jdk1.7.0_21
export JRE_HOME=$JAVA_HOME/jre
export CLASSPATH=.:$JAVA_HOME/lib:$JRE_HOME/lib:$CLASSPATH
export PATH=$JAVA_HOME/bin:$PATH:$JRE_HOME/lib
# run this to make it take effect
: source /etc/profile
# verify
: cd $JAVA_HOME
# if the path resolves correctly, setup is complete
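A quick way to verify the exports is to source an equivalent snippet and inspect the variables. The sketch below writes to a temporary file instead of the real /etc/profile (the JDK path is the one assumed above; adjust to your install):

```shell
# Write the same exports to a scratch file, source it, and check the result.
PROFILE_SNIPPET="$(mktemp)"
cat > "$PROFILE_SNIPPET" <<'EOF'
export JAVA_HOME=/usr/lib/jvm/jdk1.7.0_21
export JRE_HOME=$JAVA_HOME/jre
export CLASSPATH=.:$JAVA_HOME/lib:$JRE_HOME/lib:$CLASSPATH
export PATH=$JAVA_HOME/bin:$PATH:$JRE_HOME/lib
EOF
. "$PROFILE_SNIPPET"
echo "JAVA_HOME=$JAVA_HOME"   # should print the JDK directory
```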
Install HADOOP1. Preparing Hadoop files
As mentioned above, transfer hadoop-0.20.2 to the target machine via SSH.
2. Installing Hadoop
Unzip it into the hadoop_admin directory (Q: must it be this directory?)
: sudo tar xvf [hadoop.tar.gz path] -C /home/hadoop_admin/hadoop/
3. Configure Hadoop
Configuration involves a lot of knowledge; what follows is the simplest setup... I will only really understand it after next week's lessons, I think... Below are explanations of some basic properties; I typed them in by hand to reinforce memory and understanding.
A. Set environment variable Hadoop_home for easy use
: sudo vi /etc/profile
export HADOOP_HOME=/home/hadoop_admin/hadoop-0.20.2
export JAVA_HOME=/usr/lib/jvm/jdk1.7.0_21
export JRE_HOME=$JAVA_HOME/jre
export CLASSPATH=.:$JAVA_HOME/lib:$JRE_HOME/lib:$CLASSPATH
export PATH=$JAVA_HOME/bin:$PATH:$JRE_HOME/lib:$HADOOP_HOME/bin
: source /etc/profile  # run to make it take effect
: cd $HADOOP_HOME
: cd conf/
: ls
B. Set the JDK path by adding JAVA_HOME to the environment configuration
: sudo vi $HADOOP_HOME/conf/hadoop-env.sh  # add the line: export JAVA_HOME=/usr/lib/jvm/jdk1.7.0_21
If you don't remember the JDK path:
: echo $JAVA_HOME
C. core-site.xml
Sets the HDFS path for the name node. fs.default.name: the URI of the cluster's name node (protocol hdfs, hostname/IP, port number); every machine in the cluster needs to know the name node's address. Note that this port must differ from the JobTracker port set in mapred-site.xml below; 9000 is the conventional choice for HDFS.
<configuration>
<property><name>fs.default.name</name><value>hdfs://h1:9000</value></property>
</configuration>
D. hdfs-site.xml
Set the storage path for the name node's file system metadata and the number of copies (replication). To tell the truth, since I have not actually used Hadoop, I have no practical understanding of the namenode and datanode directory settings or of replication; I just copied the pattern, and will update this part later.
<property><name>dfs.name.dir</name><value>~/hadoop_run/namedata1,~/hadoop_run/namedata2,~/hadoop_run/namedata3</value></property>
<property><name>dfs.data.dir</name><value>~/hadoop-0.20.2/data</value></property>
<property><name>dfs.replication</name><value>3</value></property>
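A caveat worth noting: Hadoop generally does not expand ~ in these values, so absolute paths are safer. A sketch of the same properties with absolute paths (the directories are illustrative, assuming everything lives under /home/hadoop_admin):

```xml
<configuration>
  <property><name>dfs.name.dir</name><value>/home/hadoop_admin/hadoop_run/namedata1,/home/hadoop_admin/hadoop_run/namedata2,/home/hadoop_admin/hadoop_run/namedata3</value></property>
  <property><name>dfs.data.dir</name><value>/home/hadoop_admin/hadoop-0.20.2/data</value></property>
  <property><name>dfs.replication</name><value>3</value></property>
</configuration>
```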
E. Mapred-site.xml
mapred: the Map-Reduce JobTracker's address
<property><name>mapred.job.tracker</name><value>h1:9001</value></property>
F. Masters
Add master node information, here is H1
G. Slaves
Add the Slave node information, here is H2, H3
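Both files simply list one hostname per line; with the node layout above they would contain the following (matching the hostnames configured earlier; strictly speaking, in 0.20 the masters file designates where the secondary name node runs):

```text
# conf/masters
h1

# conf/slaves
h2
h3
```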
4. Configure H2, H3 node server
It's been a long journey. I installed H2 and H3 in VMware and repeated the entire environment above by hand (for the sake of consolidation) rather than copying the image in clone mode. This exposed a lot of problems, for example the JDK and Hadoop install directories not matching (sheer spelling mistakes and the like), which made changing the files later exhausting. So beginners like me should keep everything uniform, including user names such as hadoop_admin.
4.1 Installing and configuring the H2,H3 node server
Repeat: create the hadoop_admin user, install SSH, and generate the key; stop at that point.
4.2 Import the H2 and H3 public keys into H1's authorized_keys to enable password-free SSH file transfer
Method: first scp (secure copy) the public key files from H2 and H3 into H1's home directory.
On H2: : sudo scp ~/.ssh/id_rsa.pub hadoop_admin@h1:~/h2pub
On H3: : sudo scp ~/.ssh/id_rsa.pub hadoop_admin@h1:~/h3pub
On H1:
: sudo cat ~/.ssh/id_rsa.pub ~/h2pub ~/h3pub > ~/.ssh/authorized_keys  # concatenate my own key with H2's and H3's public keys
: sudo scp ~/.ssh/authorized_keys hadoop_admin@h2:~/.ssh/authorized_keys  # OK, then copy it back out (Q: do the slaves need this?)
: sudo scp ~/.ssh/authorized_keys hadoop_admin@h3:~/.ssh/authorized_keys
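The concatenation step on H1 can be sketched with dummy key files (the key strings and directory are placeholders, not real keys) to show what authorized_keys ends up holding:

```shell
# Simulate aggregating three public keys into authorized_keys.
WORK="$(mktemp -d)"
echo "ssh-rsa AAAA...h1 hadoop_admin@h1" > "$WORK/id_rsa.pub"
echo "ssh-rsa AAAA...h2 hadoop_admin@h2" > "$WORK/h2pub"
echo "ssh-rsa AAAA...h3 hadoop_admin@h3" > "$WORK/h3pub"

# Same cat as on H1: my own key plus the two slave keys
cat "$WORK/id_rsa.pub" "$WORK/h2pub" "$WORK/h3pub" > "$WORK/authorized_keys"
chmod 600 "$WORK/authorized_keys"   # sshd ignores group/world-writable key files
wc -l < "$WORK/authorized_keys"     # one line per key
```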
4.3 Installing jdk,hadoop directly from H1 to H2,h3
A. Installing the JDK
: sudo scp -r $JAVA_HOME hadoop_admin@h2:/usr/lib/jvm
: sudo scp -r $JAVA_HOME hadoop_admin@h3:/usr/lib/jvm
If /etc/profile is identical, just send it over as well:
: sudo scp /etc/profile h2:/etc/profile
: sudo scp /etc/profile h3:/etc/profile
B. Installing Hadoop
: sudo scp -r $HADOOP_HOME hadoop_admin@h2:~/hadoop-0.20.2
: sudo scp -r $HADOOP_HOME hadoop_admin@h3:~/hadoop-0.20.2
C. If /etc/hosts is the same, send it over too:
: sudo scp /etc/hosts h2:/etc/hosts
: sudo scp /etc/hosts h3:/etc/hosts
Check the steps above: the machines should be able to ping each other, and ssh [hostname] should connect without a password. If so, the three servers are configured; Hadoop itself needs no additional per-node configuration.
5. Format the name node
: hadoop namenode -format  # the standard command in 0.20.x; run it on H1 as hadoop_admin
Arr... what exactly is this thing doing? Very interested; a quick search shows that people have actually dug into the source code. TBD; I'll study it in depth later.
6. Start Hadoop
Theoretically, if JAVA_HOME, users and permissions, hosts, IPs, and passwordless SSH are all configured correctly, you can sit back and relax at this point (in practice there were plenty of problems... all sorts of careless configuration).
: sudo $HADOOP_HOME/bin/start-all.sh
If this step produces no permission denied, file or directory not exists, or other errors, and you see "started successfully" flash by, the cluster is up and reachable.
7. Whether the test is successful
A. Processes are normal
: sudo $JAVA_HOME/bin/jps
Name node: 4 processes (typically NameNode, SecondaryNameNode, JobTracker, and Jps itself)
Data nodes: 3 processes (typically DataNode, TaskTracker, and Jps)
B. http://localhost:50030 (JobTracker web UI)
C. http://localhost:50070 (NameNode web UI)
Oh yeah! At least on the surface everything looks good. If you've made it this far, you have successfully installed a fully distributed Hadoop cluster! The follow-up work will be more complicated; looking forward to it!