A Detailed Illustrated Tutorial for Setting Up a Single-Node Hadoop Environment


Preface:

Some time ago, at the boss's call, we pulled together a group of people to work on Hadoop, and even coined a loud slogan for it: "Cloud in hand, follow me." Almost everyone started from scratch, and we ran into countless problems along the way, but before going home for the holidays we finally set up a cluster of 12 servers and ran some simple MapReduce programs on it from the command line. Here I would like to summarize our working process.

Installation process:

I. Install the Linux operating system
II. Create Hadoop user groups and users under Ubuntu
III. Install the JDK under Ubuntu
IV. Modify the machine name
V. Install the SSH service
VI. Set up passwordless SSH login to this machine
VII. Install Hadoop
VIII. Run Hadoop on a single machine

I. Install the Linux operating system

We install Linux from within Windows; the choice is Ubuntu 11.10. Since some friends are installing a dual-boot system for the first time, below I introduce a simple installation method:

1. Download the ubuntu-11.10-desktop-i386.iso image file, open it with a virtual optical drive, and run the Wubi.exe program inside it, as shown in figure (1).

2. Choose to install inside Windows, as shown in figure (2).

3. Set a few specific parameters in the pop-up window; the installer then runs automatically and asks you to restart when it finishes. After the restart there will be a boot menu offering both systems; it usually boots into Windows by default, so remember to select Ubuntu manually. Once you enter Ubuntu, the system will automatically download, update, and finish the installation.

(Note: the installation may get stuck for a very long time (mine was stuck for half an hour). At that point I chose to force a shutdown, restarted, and selected Ubuntu again; usually it does not get stuck the second time. I am not sure of the exact reason; it may be related to the Wubi.exe program. Some people online think installing Ubuntu with Wubi.exe is not a very good idea, and perhaps this is why. But it is a very simple method, so we chose this way of installing.)


II. Create Hadoop user groups and users under Ubuntu

This is done with the later use of Hadoop in mind, specifically so that a dedicated user performs the operations. Both the user group name and the user name are set to hadoop; in other words, the hadoop user belongs to a user group that is also called hadoop. This is basic Linux knowledge; if it is unclear, consult a Linux book.

1. Create the hadoop user group, as shown in figure (3).

2. Create the hadoop user, as shown in figure (4).

3. Give the hadoop user sudo permissions by opening the /etc/sudoers file, as shown in figure (5).

Pressing Enter opens the /etc/sudoers file. To give the hadoop user the same permissions as the root user, add the line hadoop ALL=(ALL:ALL) ALL below the line root ALL=(ALL:ALL) ALL, as shown in figure (6). A sketch of the commands is given below.
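Since the figures are not reproduced here, the commands for this section typically look like the following sketch (gedit is assumed as the editor; sudo visudo is the safer way to edit the sudoers file):

sudo addgroup hadoop                    # figure (3): create the hadoop user group
sudo adduser --ingroup hadoop hadoop    # figure (4): create the hadoop user inside that group
sudo gedit /etc/sudoers                 # figure (5): open the sudoers file for editing
# figure (6): below the line "root ALL=(ALL:ALL) ALL", add:
# hadoop  ALL=(ALL:ALL) ALL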

III. Install the JDK under Ubuntu (see http://www.linuxidc.com/Linux/2012-06/62078.htm)

IV. Modify the machine name

Whenever Ubuntu is installed, the machine name defaults to "ubuntu", but to distinguish the servers in the cluster easily, we need to give each machine a different name. The machine name is determined by the /etc/hostname file.

1. Open the /etc/hostname file, as shown in figure (7).


2. After pressing Enter, the /etc/hostname file opens; replace the "ubuntu" in it with the machine name you want. Here I use "s15". The change does not take effect until the system is restarted. The commands are sketched below.
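A sketch of this step, again assuming gedit as the editor (any text editor works):

sudo gedit /etc/hostname   # figure (7): replace the existing name with the new one, e.g. s15
sudo reboot                # the new hostname takes effect only after a restart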

V. Install the SSH service

The SSH here has nothing to do with the three-tier framework Spring/Struts/Hibernate (also abbreviated SSH). SSH enables remote login and management; refer to other materials for details.

1. Install openssh-server, as shown in figure (8).

(Note: the openssh-server installation may stall and fail to proceed; in that case, perform the operations below first.)

2. The speed of the update depends on your network speed. If you interrupt the update halfway (Ctrl+Z) because it takes too long, the next attempt may fail with an error like "could not lock the administration directory (/var/lib/dpkg/), is another process using it?". In that case the following actions are required, as shown in figure (10).

After this operation completes, go back to step 1. The commands for this section are sketched below.
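A rough sketch of these steps (the lock-removal commands are one common fix for the dpkg lock error and are an assumption here, since the original figure is not reproduced):

sudo apt-get install openssh-server      # figure (8): install the SSH server
sudo apt-get update                      # refresh the package lists if the install stalls
# figure (10): if apt reports that /var/lib/dpkg/ is locked after an interrupted update,
# removing the stale lock files is one common fix:
sudo rm /var/lib/dpkg/lock
sudo rm /var/cache/apt/archives/lock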

If you have already installed SSH, you can proceed directly to step VI.

VI. Set up passwordless SSH login to this machine

SSH keys can be generated in two ways, RSA and DSA; RSA is the default.

1. Create an SSH key. Here we use the RSA method, as shown in figure (11).

(Note: after pressing Enter, two files are generated under ~/.ssh/: id_rsa and id_rsa.pub. These two files appear in pairs.)

2. Enter the ~/.ssh/ directory and append id_rsa.pub to the authorized_keys authorization file; at the beginning there is no authorized_keys file, as shown in figure (12).

(After this is done, you can log in to this machine without a password.)

3. Log in to localhost, as shown in figure (13).

(Note: when you SSH into another machine, you are controlling that remote machine; you need to execute the exit command to return control to the local host.)

4. Execute the exit command, as shown in figure (14). The commands for this section are sketched below.
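A sketch of these steps (an empty passphrase is assumed, which is what passwordless login requires):

ssh-keygen -t rsa -P ""             # figure (11): generates ~/.ssh/id_rsa and ~/.ssh/id_rsa.pub
cd ~/.ssh
cat id_rsa.pub >> authorized_keys   # figure (12): creates authorized_keys if it does not exist yet
ssh localhost                       # figure (13): should now log in without asking for a password
exit                                # figure (14): return to the local shell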

VII. Install Hadoop

The version of Hadoop we use is hadoop-0.20.203 (http://apache.etoak.com/hadoop/common/hadoop-0.20.203.0/), because this version is relatively stable.

1. Assuming hadoop-0.20.203.tar.gz is on the desktop, copy it to the installation directory /usr/local/, as shown in figure (15).

2. Extract hadoop-0.20.203.tar.gz, as shown in figure (16).

3. Rename the extracted folder to hadoop, as shown in figure (17).

4. Set the owner of the hadoop folder to the hadoop user, as shown in figure (18). The commands are sketched below.
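A sketch of these steps (the exact archive and extracted directory names may differ slightly for your download; adjust them accordingly):

sudo cp ~/Desktop/hadoop-0.20.203.tar.gz /usr/local/   # figure (15): copy the archive to /usr/local/
cd /usr/local
sudo tar -zxvf hadoop-0.20.203.tar.gz                  # figure (16): extract the archive
sudo mv hadoop-0.20.203.0 hadoop                       # figure (17): rename the extracted folder
sudo chown -R hadoop:hadoop hadoop                     # figure (18): make the hadoop user the owner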

5. Open the hadoop/conf/hadoop-env.sh file, as shown in figure (19).

6. Configure conf/hadoop-env.sh (find the line # export JAVA_HOME=..., remove the #, then set it to the path of the local JDK). An example is shown below.
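For example, the uncommented line might look like the following (the JDK path here is only an illustration; use the path where your JDK is actually installed):

export JAVA_HOME=/usr/lib/jvm/java-6-openjdk   # example path, adjust to your system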

7. Open the conf/core-site.xml file and edit it as follows:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>

8. Open the conf/mapred-site.xml file and edit it as follows:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
  </property>
</configuration>

9. Open the conf/hdfs-site.xml file and edit it as follows:

<configuration>
  <property>
    <name>dfs.name.dir</name>
    <value>/usr/local/hadoop/datalog1,/usr/local/hadoop/datalog2</value>
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value>/usr/local/hadoop/data1,/usr/local/hadoop/data2</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
</configuration>

10. Open the conf/masters file and add the host name of the SecondaryNameNode. For a single-node environment, just fill in localhost.

11. Open the conf/slaves file and add the host names of the slave nodes, one per line. For a single-node environment, here too just fill in localhost (see below).
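In this single-node setup, both conf/masters and conf/slaves therefore contain only a single line:

localhost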

VIII. Run Hadoop on a single machine

1. Go into the hadoop directory and format the HDFS file system; this operation is required the first time you run Hadoop, as shown in figure (21).

2. When you see the output shown in the corresponding figure, it means your HDFS file system has been formatted successfully.

3. Run bin/start-all.sh to start Hadoop, as shown in figure (23).

4. Check whether Hadoop started successfully, as shown in figure (24).

If the five processes NameNode, SecondaryNameNode, TaskTracker, DataNode, and JobTracker are all present, it means your single-node Hadoop environment is configured correctly. What a magnificent project! The commands for this section are sketched below.
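A sketch of these steps, run as the hadoop user from the installation directory:

cd /usr/local/hadoop
bin/hadoop namenode -format   # figure (21): format HDFS (needed only on the first run)
bin/start-all.sh              # figure (23): start the NameNode, DataNode, JobTracker, etc.
jps                           # figure (24): list the Java processes; all five daemons should appear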

IX. Linux shortcut keys:
Ctrl+Alt+T: open a terminal
Ctrl+Space: switch between Chinese and English input methods

X. For running the Hadoop WordCount example program, see http://www.linuxidc.com/Linux/2012-02/54529.htm.
