Hadoop learning notes (1): installing Hadoop without a Linux background

Environment and objectives:

-System: VMware / Ubuntu 12.04

-Hadoop version: 0.20.2

-My node configuration (fully distributed cluster):

Master (JobTracker): 192.168.221.130, hostname H1
Slave (TaskTracker/DataNode): 192.168.221.141, hostname H2
Slave (TaskTracker/DataNode): 192.168.221.142, hostname H3

-User: hadoop_admin

-Target: Hadoop starts successfully, http://localhost:50030 and http://localhost:50070 are reachable, and the expected processes are running.

Installing Linux without any basics is difficult. With the help of friends in the group it took me several days (don't laugh). The details below will look far too basic to IT ops people; there are certainly gaps and even misunderstandings, and some steps I followed without fully understanding them, so corrections are welcome. (Besides the process described in teacher Huang's article, there are many other references, such as Running Hadoop On Ubuntu Linux (Multi-Node Cluster) on the Apache Hadoop wiki. Here I only record the problems and steps from my own installation.)

The basic process is divided into the following parts:

Install Ubuntu with VMware

I use Ubuntu 12.04. First, prepare the following software/tools (the original links were all Sina micro-disk shares).

· VMware Workstation (get it from the official website)

· ubuntu-12.04.1-desktop-i386.iso

· jdk-7u7-windows-i586.rar

· Because the teacher repeatedly stressed that differences between Hadoop versions are large, a novice is better off using the same version as the teacher, namely hadoop-0.20.2.tar.gz

· WinSCP (what I used), PuTTY, or SecureCRT, for transferring the JDK and Hadoop archives to Ubuntu

Install Ubuntu

There is basically nothing to note here. After the installation is complete, the system boots into command-line mode by default; startx enters GUI mode.

You can adjust the display resolution in Ubuntu to make the GUI more comfortable. Search for "terminal" to open the command-line tool; Ctrl+Alt+F1~F6 switches to the virtual consoles, and in command-line mode Alt+Left/Right switches between them.

Configure the network (not a required step for the Hadoop installation)

Some friends in the group had to use bridged networking in the same subnet, so I took the opportunity to look at network settings (note: this is not a required step for the Hadoop installation). Ubuntu ships with network-manager, so Internet access works without any configuration. Open Settings > Network to view the configuration, but it is DHCP-based: the static IP I set via sudo vi /etc/network/interfaces was changed back by network-manager after a restart. Since the two approaches conflict, I removed network-manager with sudo apt-get autoremove network-manager --purge.

autoremove: removes packages that were automatically installed to satisfy dependencies and are no longer needed; the --purge option makes apt-get also remove their configuration files.

Procedure: configure static IP > DNS > hostname > hosts

 

Configure static IP

From VM > Settings > Network you can see that I use VMware's default NAT mode (with NAT the virtual machine and the host can ping each other, but other hosts cannot reach the virtual machine). Note that the host and the VM do not need to be in the same IP segment to ping each other.

If you are interested in the differences, search for "differences between VMware bridged, NAT, and host-only networking". In VMware Workstation, under Edit > Virtual Network Editor, you can see that NAT uses VMnet8, one of the two virtual NICs created automatically when VMware is installed.

Click NAT Settings.

The following information is displayed:

Gateway: 192.168.221.2

IP address segment: 192.168.221.128 ~ 254

Subnet Mask: 255.255.255.0

: sudo vi /etc/network/interfaces

(For more information about vi/vim, see the vim chapter of the "Linux Private Food" book.)

auto lo # localhost
iface lo inet loopback # this block configures localhost/127.0.0.1 and can be kept

# configure eth0, the first NIC
auto eth0
iface eth0 inet static # static IP address
address 192.168.221.130
netmask 255.255.255.0
gateway 192.168.221.2
dns-nameservers 192.168.221.2 8.8.8.8
# dns-search test.com (newly learned: this would automatically append .test.com to bare hostnames)

Restart the network

: sudo /etc/init.d/networking restart # a newly added eth0 only comes up after restarting networking
: whereis ifup # ...
: sudo /sbin/ifup eth0 # bring eth0 up after manually editing its configuration
: sudo /sbin/ifdown eth0 # take eth0 down
: sudo /etc/init.d/networking restart # restart networking again
: ifconfig # check the IP address; eth0 information should be displayed

# Configure DNS

: sudo vi /etc/resolv.conf

Add the following nameservers (the NAT gateway and Google's public DNS):

nameserver 192.168.221.2
nameserver 8.8.8.8

This file gets overwritten by network-manager, which is why network-manager was removed:

: sudo apt-get autoremove network-manager --purge

# Configure hosts

: sudo vi /etc/hosts

Add:

192.168.221.130 H1
192.168.221.141 H2
192.168.221.142 H3

# Configure the hostname

: whereis hostname
: sudo vi /etc/hostname

Write H1 in the file, then run:

: sudo hostname H1

Now the network is configured. Since I did not clone the VM, the three servers had to be set up one by one by hand (sore hands). Copying /etc/hosts around with scp is recommended.

Create a specific user for hadoop

Create a dedicated user for Hadoop on every cluster node, so that the node servers can connect to each other over SSH using this user and its RSA public key.

(I was bitten fairly badly here: useradd and adduser are two different commands and are used differently; there is an article that explains this clearly. A short comparison is sketched below.)
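
A minimal sketch of the difference on Debian/Ubuntu (hadoop_admin is this cluster's user; the flags shown are standard useradd/adduser options):

: sudo useradd -m -d /home/hadoop_admin hadoop_admin # low-level tool: creates no home directory unless -m is given
: sudo adduser hadoop_admin # Debian-friendly wrapper: interactively sets the password and creates /home/hadoop_admin with skeleton files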

Originally I used:

: sudo useradd hadoop_admin
: sudo passwd hadoop_admin

After logging in with this user I found there was no home directory, and the prompt was just:

$:

Then I switched back to the root user and created the /home/hadoop_admin directory (so the directory was owned by root).

The problem is that when generating the RSA SSH key, the system complained that the directory was not writable.

I looked it up, listed the user's permissions on the home directory, and found that the owner was root.

Checking further:

If the permission is 0, the user was created incorrectly. Friends in the group suggested fixing it manually (sudo chown -R hadoop_admin /home/hadoop_admin, which is also needed when using useradd this way). I found that too troublesome, read up on it, and decided to recreate the user instead (definitely not acceptable in real IT ops =_=).

: sudo deluser hadoop_admin
: sudo adduser hadoop_admin --home /home/hadoop_admin --uid 545

Now everything is normal.

1. Create the user

: sudo adduser hadoop_admin --home /home/hadoop_admin --uid 545

2. Add the user to the list of users allowed to run sudo

: sudo vi /etc/sudoers

Add the following entry to the file (the original showed it as a screenshot; a sketch follows):
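
A minimal sketch of the entry, assuming the stock sudoers format; it grants hadoop_admin full sudo rights:

hadoop_admin ALL=(ALL:ALL) ALL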

3. Generate an SSH key for the user (next section)

Install SSH and generate an RSA key

1. Install OpenSSH

Knowledge point: for background on Debian packages and apt-get, see the Debian/Ubuntu documentation.

: sudo apt-get install openssh-server

After this completes, SSH should in theory be running. Now WinSCP (SCP/SFTP mode) can be used for file transfer, and the JDK and Hadoop archives can be copied over.
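
A quick way to confirm the SSH daemon is actually up (a sketch; "ssh" is the service name on Ubuntu 12.04):

: sudo service ssh status # should report that ssh is running
: sudo netstat -tlnp | grep :22 # or check that sshd is listening on port 22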

Looking at the SSH configuration helps in understanding how the node servers will later connect to each other via SSH public keys without a password. (As a beginner I find the whereis command very handy for locating these files.)

One line is interesting, because hosts keep getting added to known_hosts while installing Hadoop.

Ubuntu/Debian enables HashKnownHosts yes by default in the SSH client configuration, so each time you ssh to a new hostname you are asked whether to add it to the known_hosts file, and the stored entries are hashed. For more on the OpenSSH options, see the ssh_config man page.
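
The relevant excerpt usually looks like this in the Ubuntu/Debian client configuration (shown only to illustrate where the option lives; your file may differ):

Host *
    HashKnownHosts yes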

2. Generate the private key and public key file for hadoop_admin

# Log in as hadoop_admin and change to the ~ home directory
: cd ~/
: ssh-keygen -t rsa # generate an SSH key pair; -t selects the RSA algorithm

The .ssh folder containing id_rsa (private key) and id_rsa.pub (public key) is generated automatically in the user's home directory.

: cd ~/.ssh
: cp id_rsa.pub authorized_keys # from the SSH notes above: authorized_keys holds the public keys that SSH accepts for passwordless login; in my tests each entry ends with login_name@hostname

(Public keys from other users/hosts can also be appended to this file.)
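
A quick sanity check, assuming the key was generated without a passphrase; permissions matter, since sshd ignores a group- or world-writable authorized_keys:

: chmod 700 ~/.ssh
: chmod 600 ~/.ssh/authorized_keys
: ssh localhost # should log in without a password prompt (only the first-time known_hosts question)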

Install JDK

I tried several installation methods. Installing a JDK from the Ubuntu Software Center gives OpenJDK, and installing the Sun Java 6 package via apt-get after modifying the Debian source list did not work well for me either. The simplest way is to download Sun's JDK, decompress it, and set JAVA_HOME.

1. Prepare the JDK file.

Copying files into the VM over SSH (WinSCP) was described above.

2. Install JDK

I installed it under /usr/lib/jvm/jdk1.7.0_21 (this directory should be identical on all servers, otherwise it becomes very painful later ~).

: sudo tar xvf ~/Downloads/[jdk].tar.gz -C /usr/lib/jvm
: cd /usr/lib/jvm
: ls

Go in and check the name of the extracted directory.

3. Set JAVA_HOME and related environment variables

: sudo vi /etc/profile

# Add the following lines to set the environment variables
export JAVA_HOME=/usr/lib/jvm/jdk1.7.0_21
export JRE_HOME=$JAVA_HOME/jre
export CLASSPATH=.:$JAVA_HOME/lib:$JRE_HOME/lib:$CLASSPATH
export PATH=$JAVA_HOME/bin:$PATH:$JRE_HOME/lib

# Run the following to make the changes take effect
: source /etc/profile

# Verify
: cd $JAVA_HOME

# If this lands in the right directory, the setup is complete.
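
Another quick check that the shell now picks up the JDK from the new PATH (a sketch; the exact version banner depends on the JDK that was unpacked):

: which java # should point to /usr/lib/jvm/jdk1.7.0_21/bin/java
: java -version # should print the 1.7.0 version banner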

Install Hadoop

1. Prepare the Hadoop file

As mentioned above, transfer hadoop-0.20.2.tar.gz to the target machine over SSH.

2. Install hadoop

Decompress the package into hadoop_admin's directory. (Question to self: does it have to be this directory?)

: sudo tar xvf [hadoop.tar.gz path] -C /home/hadoop_admin/hadoop/

3. Configure hadoop

There is a lot to learn about configuration; what follows is only the simplest setup... I still have to study it properly next week. Some basic properties are explained below; I typed them in by hand to reinforce memory and understanding.

A. Set the HADOOP_HOME environment variable for convenience

: sudo vi /etc/profile

export HADOOP_HOME=/home/hadoop_admin/hadoop-0.20.2
export JAVA_HOME=/usr/lib/jvm/jdk1.7.0_21
export JRE_HOME=$JAVA_HOME/jre
export CLASSPATH=.:$JAVA_HOME/lib:$JRE_HOME/lib:$CLASSPATH
export PATH=$JAVA_HOME/bin:$PATH:$JRE_HOME/lib:$HADOOP_HOME/bin

: source /etc/profile # run to make it take effect
: cd $HADOOP_HOME
: cd conf/
: ls

B. Set the JDK path: add JAVA_HOME to Hadoop's environment configuration.

: sudo vi hadoop-env.sh # in $HADOOP_HOME/conf; add the line: export JAVA_HOME=/usr/lib/jvm/jdk1.7.0_21

If you cannot remember the JDK path:

: echo $JAVA_HOME

C. core-site.xml

Set the HDFS address of the name node. fs.default.name is the URI of the cluster's name node (protocol hdfs, hostname/IP, and port); every machine in the cluster needs to know this name node information.

<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://H1:9001</value>
  </property>
</configuration>

D. hdfs-site.xml

Set the name node's storage paths for the file system metadata and the number of replicas (replication). To be honest, since I have not actually used Hadoop yet, I have no practical understanding of the namenode/datanode directory settings or of replication; I am just copying by example and will update this part later.

These <property> elements go inside the <configuration> element, as in core-site.xml:

<property>
  <name>dfs.name.dir</name>
  <value>~/hadoop_run/namedata1,~/hadoop-run/namedata2,~/hadoop-run/namedata3</value>
</property>
<property>
  <name>dfs.data.dir</name>
  <value>~/hadoop-0.20.2/data</value>
</property>
<property>
  <name>dfs.replication</name>
  <value>3</value>
</property>

E. mapred-site.xml

mapred.job.tracker: the JobTracker address for MapReduce.

<property>
  <name>mapred.job.tracker</name>
  <value>H1:9001</value>
</property>

F. Masters

Add the master node information. This is H1.

G. Slaves

Add the slave node information; here they are H2 and H3 (one hostname per line; see the sketch below).
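
A minimal sketch of the two files under $HADOOP_HOME/conf for this cluster (contents assumed from the hostnames above):

# conf/masters
H1

# conf/slaves
H2
H3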

4. Configure H2 and H3 node servers

To take the long road on purpose, I installed H2 and H3 from scratch in VMware and repeated all of the steps above as reinforcement, instead of cloning the image; this exposed many problems, for example JDK and Hadoop installation directories that differed between machines (simple spelling mistakes), which led to endless edits of the configuration files ~ So for beginners like me it is better to keep everything consistent, including the operation user name hadoop_admin.

4.1 install and configure H2 and H3 node servers

Repeat the steps above: create the hadoop_admin user, install SSH, and generate the key pair; stop at that point.

4.2 Import the public keys of H2 and H3 into H1's authorized_keys, so that files can later be transferred over SSH without a password

The method is to first transfer the public key files from H2 and H3 into H1's home directory with scp (secure copy).

On H2: sudo scp ~/.ssh/id_rsa.pub hadoop_admin@H1:~/h2pub

On H3: sudo scp ~/.ssh/id_rsa.pub hadoop_admin@H1:~/h3pub

On H1:

: sudo cat ~/.ssh/id_rsa.pub ~/h2pub ~/h3pub > ~/.ssh/authorized_keys # concatenate the public keys of H1, H2, and H3
: sudo scp ~/.ssh/authorized_keys hadoop_admin@H2:~/.ssh/authorized_keys # then copy the combined file back out (question to self: do the slaves really need it?)
: sudo scp ~/.ssh/authorized_keys hadoop_admin@H3:~/.ssh/authorized_keys

4.3 Copy the JDK and Hadoop from H1 directly to H2 and H3

A. Install JDK

: sudo scp -r $JAVA_HOME hadoop_admin@H2:/usr/lib/jvm
: sudo scp -r $JAVA_HOME hadoop_admin@H3:/usr/lib/jvm

If /etc/profile is the same everywhere, just send it over as well:

: sudo scp /etc/profile H2:/etc/profile
: sudo scp /etc/profile H3:/etc/profile

B. Install hadoop

: sudo scp -r $HADOOP_HOME hadoop_admin@H2:~/hadoop-0.20.2
: sudo scp -r $HADOOP_HOME hadoop_admin@H3:~/hadoop-0.20.2

C. If /etc/hosts is the same, copy it over too:

: sudo scp /etc/hosts H2:/etc/hosts
: sudo scp /etc/hosts H3:/etc/hosts

Check that the machines can ping each other and that ssh [hostname] connects without a password in every direction. If so, the configuration of the three servers should be complete; Hadoop itself needs no additional per-node configuration. (A small check loop is sketched below.)
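
A minimal sketch of the passwordless-SSH check, run as hadoop_admin on each of the three machines (hostnames as configured in /etc/hosts above):

: for h in H1 H2 H3; do ssh $h hostname; done # should print H1, H2, H3 with no password prompt (the very first connection may still ask to confirm the host key)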

5. Format Name Node

Hmm... what does formatting actually do? I am quite curious; I found the relevant source code. TBD, to be read later.
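
The command itself was shown as a screenshot in the original; for Hadoop 0.20 the name node is formatted once, on the master, before the first start:

: $HADOOP_HOME/bin/hadoop namenode -format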

6. Start hadoop

In theory, if JAVA_HOME, users and permissions, hosts, IP addresses, and passwordless SSH are all configured correctly, you can just run the start script and wait for the result (in practice there were plenty of problems... all kinds of careless configuration mistakes).

: sudo $HADOOP_HOME/bin/start-all.sh

If this step shows no permission denied, file or directory not exists, or similar errors, and each daemon reports started successfully, the cluster should be up.

7. Check whether the operation is successful

A. The processes are running normally.

: sudo $JAVA_HOME/bin/jps

The name node shows 4 processes.

Each data node shows 3 processes.
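
For reference, with this layout jps would typically list the following (a hedged expectation for Hadoop 0.20, not the original author's screenshot; Jps itself is included in the counts above):

# on the master H1
NameNode, SecondaryNameNode, JobTracker (plus Jps)

# on each slave H2/H3
DataNode, TaskTracker (plus Jps)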

B. http://localhost:50030 (the JobTracker web UI)

C. http://localhost:50070 (the NameNode web UI)

Oh yeah! At least everything looks good on the surface. If you have got this far, you have successfully installed a fully distributed Hadoop cluster! The work ahead will be more complicated; look forward to it!
