Configuration of a Hadoop Cluster Based on Ubuntu 12.10 Server in VMware Workstation 9

In fact, there is not much information on the Internet about Hadoop cluster configuration, and most of what exists assumes a graphical desktop environment. To really run a Hadoop cluster, we want as many resources as possible devoted to data processing and stability. The best choice is the server edition of a UNIX-like system, which puts the machine to work on the core problems instead of wasting resources on visualization and display. Working with a GUI-free operating system is not easy, however. Fortunately there are plenty of scattered resources online, and by piecing them together from here and there the task was finally completed.


Let's talk about the core issues.

I. General steps for building a Hadoop cluster

1. Choose a virtualization product, download it, and install it. There are many to pick from: VirtualBox, KVM, and VMware's line, including ESXi and vSphere (which I still want to figure out but have not yet) and Workstation. Here I chose VMware Workstation 9.0, which is much lighter-weight, especially for people with limited hardware resources.

2. Download the OS for the guest and use the virtual machine software to load it and create a VM. Here I chose the latest Ubuntu 12.10 server edition (64-bit); see http://www.linuxidc.com/Linux/2012-10/72581.htm for the download. Because the memory on my PC is relatively large, the host OS is also a 64-bit version.

3. Download and install matching JDK and Hadoop versions in the guest OS of the installed virtual machine. A JDK is needed because Hadoop is written in Java and requires Java to run. Sun's JDK is the most widely used, but according to Appendix A (page 656) of Hadoop: The Definitive Guide, JDKs from other vendors also work, so here we will simply use the OpenJDK provided in the Ubuntu repositories.

4. Configure SSH and JAVA_HOME (the JDK path, set in conf/hadoop-env.sh), then edit Hadoop's main configuration files: the core configuration (core-site.xml under the conf directory), the HDFS configuration (hdfs-site.xml under the conf directory), and the MapReduce configuration (mapred-site.xml under the conf directory).
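Step 4 can be sketched as follows, assuming the Hadoop 1.x file layout this article's era uses and the "master" hostname planned below; the port numbers (9000 for HDFS, 9001 for the JobTracker) are common conventions, not requirements. For illustration the files are written to a scratch directory rather than the real conf directory:

```shell
# Sketch of step 4, assuming Hadoop 1.x conf-file names and the "master"
# hostname used in this article. Files go to a scratch dir for illustration;
# on a real node write them into $HADOOP_HOME/conf instead.
SCRATCH=$(mktemp -d)

# (a) Passwordless SSH (shown as a comment; run on the real master and
#     append id_rsa.pub to ~/.ssh/authorized_keys on every node):
# ssh-keygen -t rsa -N "" -f ~/.ssh/id_rsa

# (b) Core configuration: where the HDFS namenode lives.
cat > "$SCRATCH/core-site.xml" <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://master:9000</value>
  </property>
</configuration>
EOF

# (c) HDFS configuration: replicate each block to all 3 datanodes.
cat > "$SCRATCH/hdfs-site.xml" <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
</configuration>
EOF

# (d) MapReduce configuration: where the JobTracker lives.
cat > "$SCRATCH/mapred-site.xml" <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>master:9001</value>
  </property>
</configuration>
EOF

ls "$SCRATCH"
```

The values here are only a minimal working baseline; real deployments add data directories, heap sizes, and so on.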

5. Once this guest is fully configured, copy (clone) it to create the other guests, then adjust the IP address, /etc/hosts, and /etc/hostname of each server accordingly.
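The per-clone adjustments of step 5 can be sketched like this, using the hostnames and IPs planned below. The edits are shown against scratch copies of the files; on a real clone you would edit /etc/hostname and /etc/hosts in place as root and then reboot (or restart networking):

```shell
# Sketch of step 5: after cloning the master, each clone needs its own
# hostname while /etc/hosts stays identical on every node.
# Shown against scratch copies; edit the real files as root on each clone.
TMP=$(mktemp -d)

echo "master" > "$TMP/hostname"
cat > "$TMP/hosts" <<'EOF'
127.0.0.1 localhost
192.168.200.104 master
192.168.200.105 son-1
192.168.200.106 son-2
192.168.200.107 son-3
EOF

# Turn this clone into son-1 by rewriting its hostname:
sed -i 's/^master$/son-1/' "$TMP/hostname"
cat "$TMP/hostname"
```

Repeat with son-2 and son-3 on the other clones; each clone's static IP must also match its /etc/hosts entry.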

6. Test whether the Hadoop cluster runs normally.
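For step 6, assuming the Hadoop 1.x scripts of this article's era, a typical smoke test run from the master looks like the following. These commands need the configured cluster, so they are shown for reference only:

```shell
# Run from $HADOOP_HOME on the master; requires the configured cluster.
bin/hadoop namenode -format   # format HDFS once, before the first start only
bin/start-all.sh              # starts NameNode/JobTracker here and the DataNodes/TaskTrackers on the sons
jps                           # master should list NameNode, SecondaryNameNode, JobTracker
bin/hadoop dfsadmin -report   # should show the live DataNodes (expect 3: son-1..son-3)
```

If `jps` on a son shows DataNode and TaskTracker and the report counts all three datanodes, the cluster is up.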

II. Detailed steps

1. Preparations and plans before building the environment:

Four machines are virtualized in VMware and named as follows:

Master (Ubuntu 12.10 64-bit, memory: 2 GB, hard disk: 80 GB),

Son-1 (Ubuntu 12.10 64-bit, memory: 1 GB, hard disk: 80 GB),

Son-2 (Ubuntu 12.10 64-bit, memory: 1 GB, hard disk: 80 GB),

Son-3 (Ubuntu 12.10 64-bit, memory: 1 GB, hard disk: 80 GB).

Modify the hosts file of the local machine:

sudo vi /etc/hosts

Add the following content:

192.168.200.104 master

192.168.200.105 son-1

192.168.200.106 son-2

192.168.200.107 son-3

Of course, the hostname of the local machine, that is, the content of the /etc/hostname file, should be:

master

2. Select a virtual machine, download and install it.

The installation itself is a straightforward next-next-finish affair. Here, I chose VMware Workstation 9.0.

3. Download the OS used by the guest and load it with the virtual machine.

Here, again, I chose the latest Ubuntu 12.10 server edition (64-bit). In VMware, set the NIC to NAT or Bridged so the guest OS can reach the Internet; that way the server OS in each guest can directly download resources such as SSH, OpenJDK, and Hadoop.

4. Create the hadoop user and user group on the master and on each child node (son-1/2/3).

In fact, user creation differs slightly between Ubuntu and CentOS.

On Ubuntu:

Create the hadoop user group first:

sudo addgroup hadoop

Then create the hadoop user inside that group:

sudo adduser --ingroup hadoop hadoop

On CentOS and Red Hat:

sudo adduser hadoop

Note: on CentOS and Red Hat you can create the user directly; the related user group and files are generated automatically. On Ubuntu, use adduser as above, since the lower-level useradd would create the user without a home directory.

Grant the hadoop user privileges by opening the /etc/sudoers file:

sudo visudo

This opens /etc/sudoers for editing. To give the hadoop user the same rights as root, add the following line below root ALL=(ALL:ALL) ALL:

hadoop ALL=(ALL:ALL) ALL

5. Download and install JDK, SSH, and Hadoop in the guest OS.

(1) Install the JDK environment on the local machine (master) and the child nodes (son-1/2/3).

On Ubuntu, run:

sudo apt-get install openjdk-7-jre
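After installing, it helps to record the JDK path for the hadoop-env.sh edit in step 4. The path below is the usual location of OpenJDK 7 on 64-bit Ubuntu; that is an assumption worth verifying on your own machine, e.g. with `readlink -f $(which java)`:

```shell
# Usual OpenJDK 7 location on 64-bit Ubuntu (an assumption -- verify locally):
export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64
echo "$JAVA_HOME"
```

The same value goes into the `export JAVA_HOME=...` line of conf/hadoop-env.sh.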

For CentOS and Red Hat, we recommend downloading the source package and installing from it.

See: http://www.linuxidc.com/Linux/2012-11/74760.htm
