Ultra-detailed illustrated guide to building a standalone Hadoop environment

Preface:


Some time ago, at the call of our boss, we gathered a group of people to start working on Hadoop, under the rousing slogan "cloud in hand, come with me". We started almost from scratch and ran into countless problems along the way, but before going home we had built a cluster of 12 servers and run some simple MapReduce programs on it from the command line. Here I would like to summarize our work.

Installation Process:

1. Install the Linux operating system
2. Create a hadoop user group and user in Ubuntu
3. Install JDK in Ubuntu
4. Modify the machine name
5. Install the SSH service
6. Set up passwordless SSH login to the local machine
7. Install hadoop
8. Run hadoop on a single machine

1. Install the Linux operating system

We installed Linux from within Windows and chose Ubuntu 11.10. Some of us were setting up a dual system for the first time. Below I will introduce a simple installation method:

1. Download the ubuntu-11.10-desktop-i386.iso image file, open it with a virtual optical drive, and run the wubi.exe program inside it. (1)

 

 

2. Choose to install inside Windows. (2)

 

 

3. In the pop-up window, set the required parameters; the machine restarts once the automatic steps complete. During the restart, select Ubuntu in the boot menu. The system boots Windows by default, so you have to choose manually here. After entering Ubuntu, the system will download what it needs and finish the installation automatically.

(Note: the installation may get stuck at one stage for a long time (mine was stuck for half an hour). I chose to force a shutdown, and on restart I again selected Ubuntu; it usually does not get stuck the second time. This may be related to the wubi.exe program. You can find people online who think installing Ubuntu with wubi.exe is a bad idea, and perhaps this is why. But it is a very simple method, which is why we chose it.)

2. Create a hadoop user group and user in Ubuntu

We will use this user to run the Hadoop applications later. Both the user group name and the user name are set to hadoop; in other words, the hadoop user belongs to a user group that is also named hadoop. This is basic Linux knowledge; if it is unclear, consult a Linux reference.

1. Create a hadoop user group, (3)
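
The usual command for creating the group is:

    sudo addgroup hadoop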

 

 

2. Create a hadoop user (4)
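
A typical command, creating the hadoop user and placing it in the hadoop group, is:

    sudo adduser -ingroup hadoop hadoop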

 

 

3. Add permissions for the hadoop user by opening the /etc/sudoers file. (5)
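
Opening the file with a graphical editor would look like this (gedit is only an example; any editor works):

    sudo gedit /etc/sudoers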

 

 

Press Enter to open the /etc/sudoers file and grant the hadoop user the same permissions as root: add the line hadoop ALL=(ALL:ALL) ALL under the line root ALL=(ALL:ALL) ALL. (6)
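
After the edit, the relevant part of /etc/sudoers reads:

    root    ALL=(ALL:ALL) ALL
    hadoop  ALL=(ALL:ALL) ALL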

 

 

3. Install the JDK under Ubuntu (http://weixiaolu.iteye.com/blog/1401786)
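
The linked post covers the details. As a rough sketch, installing OpenJDK 6 from the Ubuntu repositories would look like this (the package name and the resulting JDK path may differ on your system):

    sudo apt-get install openjdk-6-jdk
    # the JDK then typically lives under /usr/lib/jvm/java-6-openjdk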

4. Modify the machine name

After Ubuntu is installed, the default machine name is always ubuntu. To make it easy to tell the servers in a cluster apart, however, each machine needs its own name. The machine name is determined by the /etc/hostname file.

1. Open the /etc/hostname file. (7)
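
For example (again using gedit):

    sudo gedit /etc/hostname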

 


2. Press Enter to open the /etc/hostname file and change ubuntu to whatever machine name you want; here I use "s15". The change takes effect only after the system is restarted.

5. Install the ssh service

This ssh has nothing to do with the three Java frameworks Spring, Struts, and Hibernate; it is the secure shell used for remote login and administration. For details, refer to other materials.

1. Install openssh-server. (8)
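
The install command is:

    sudo apt-get install openssh-server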

 

 

(Note: the automatic installation of openssh-server may fail, in which case you can perform the following operation first:)
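
That preparatory operation is refreshing the package lists:

    sudo apt-get update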

 

 

2. How long the update takes depends on your network speed. If you interrupt it midway (Ctrl + Z) because it is taking too long, the next update will fail with an error like "Unable to lock the administration directory (/var/lib/dpkg/), is another process using it?". In that case, perform the following operations: (10)
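
A common fix is to remove the stale lock files left behind by the interrupted update (assuming no other apt/dpkg process is actually running):

    sudo rm /var/cache/apt/archives/lock
    sudo rm /var/lib/dpkg/lock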

 

 

After the operation is complete, continue to step 1.

If ssh is already installed, you can proceed directly to step 6.

6. Set up passwordless SSH login to the local machine

Ssh key generation methods include rsa and dsa. By default, rsa is used.
1. Create an ssh key. Here we use the rsa method. (11)
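
A typical invocation, generating an rsa key with an empty passphrase, is:

    ssh-keygen -t rsa -P ""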

 

(Note: after you press Enter, two files are generated under ~/.ssh/: id_rsa and id_rsa.pub.)

2. Enter the ~/.ssh/ directory and append id_rsa.pub to the authorized_keys authorization file. The authorized_keys file does not exist at first. (12)
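
For example:

    cd ~/.ssh
    cat id_rsa.pub >> authorized_keys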

 

 

(After this, you can log on to the local machine without a password.)

3. Log on to localhost. (13)
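
Simply:

    ssh localhost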

 

 

(Note: once you log on to another machine through ssh, you are controlling that remote machine; you must run the exit command to return control to the local host.)

4. Run the exit command (14)

 

 

7. Install hadoop

We use hadoop-0.20.203 (http://apache.etoak.com/hadoop/common/hadoop-0.20.203.0/) because it is relatively stable.

1. Assuming hadoop-0.20.203.tar.gz is on the desktop, copy it to the installation directory /usr/local/. (15)
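
Assuming the archive is named hadoop-0.20.203.tar.gz and sits in ~/Desktop, the copy would be:

    sudo cp ~/Desktop/hadoop-0.20.203.tar.gz /usr/local/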

 

 

2. Decompress hadoop-0.20.203.tar.gz. (16)
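
For example:

    cd /usr/local
    sudo tar -xzf hadoop-0.20.203.tar.gz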

 

 

3. Rename the decompressed folder to hadoop, (17)
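
The extracted directory name depends on the archive; with the version above it would be something like:

    sudo mv hadoop-0.20.203.0 hadoop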

 

 

4. Set the owner of the hadoop folder to hadoop, (18)
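
For example:

    sudo chown -R hadoop:hadoop /usr/local/hadoop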

 

 

5. Open the hadoop/conf/hadoop-env.sh file. (19)
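
For example (run from /usr/local; gedit is only an example editor):

    sudo gedit hadoop/conf/hadoop-env.sh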

 

 

6. Configure conf/hadoop-env.sh: find the line # export JAVA_HOME=..., remove the #, and set it to the local JDK path.
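
The resulting line would look roughly like this (the path must match wherever your JDK is actually installed):

    export JAVA_HOME=/usr/lib/jvm/java-6-openjdk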

 

 

7. Open the conf/core-site.xml file and edit it as follows:

 

XML code

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>

 

8. Open the conf/mapred-site.xml file and edit it as follows:

 

XML code

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
  </property>
</configuration>

 

9. Open the conf/hdfs-site.xml file and edit it as follows:

 

XML code

<configuration>
  <property>
    <name>dfs.name.dir</name>
    <value>/usr/local/hadoop/datalog1,/usr/local/hadoop/datalog2</value>
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value>/usr/local/hadoop/data1,/usr/local/hadoop/data2</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
</configuration>

 

10. Open the conf/masters file and add the host name of the secondary namenode. For a standalone environment you only need to enter localhost.

11. Open the conf/slaves file and add the host name of each slave node, one per line. For a standalone version you only need to enter localhost.
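
In this standalone setup both conf/masters and conf/slaves therefore contain a single line:

    localhost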

8. Run hadoop on a single machine

1. Enter the hadoop directory and format the hdfs file system. This operation is required when you first run hadoop. (21)
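
That is:

    cd /usr/local/hadoop
    bin/hadoop namenode -format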

 

 

When you see the success message in the output, it means your hdfs file system has been formatted successfully.

 

 

3. Start hadoop with bin/start-all.sh. (23)
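
From the hadoop directory:

    bin/start-all.sh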

 

 

4. Check whether hadoop started successfully. (24)
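
The simplest check is to list the running Java processes:

    jps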

 

 

If the five processes NameNode, SecondaryNameNode, TaskTracker, DataNode, and JobTracker are all running, your standalone hadoop environment is fully configured. What a magnificent project!

9. Shortcut keys in Linux:
Ctrl + Alt + T: open a terminal
Ctrl + Space: switch between Chinese and English input methods

10. Run the WordCount program on hadoop (http://weixiaolu.iteye.com/blog/1402919).

 

Reposted from: Detailed description of the hadoop standalone version

 
