Hadoop Learning (5): Fully Distributed Installation of Hadoop 2.2.0 (1)

This post records the various problems we encountered while building a Hadoop cluster.
Preface

Some time before the winter vacation, I began investigating the setup process for Hadoop 2.2.0. At the time we had no dedicated machines and could only run small jobs on three laptops. One or two months later, some of the details had been forgotten. Now the school has set up a lab and allocated 10 machines (4 GB RAM + 500 GB disk each), which is enough for us. We started building a Hadoop 2.2.0 distributed cluster and took the opportunity to document the entire process.

The installation process for Hadoop 2.2.0 is covered in many blogs, but you can still get stuck on certain problems, and sometimes you need to combine several documents to get the platform working. In this post we summarize the problems we ran into and what happened during the build. We will provide the detailed Hadoop installation steps and configuration files in a later post.

If you decide to spend time on this article, please read it carefully: we lost time at each of these points, and if you hit the same problems, the solutions are here.

1. System environment: configure a static IP

We use the 32-bit version of Ubuntu 12.04.2. We originally built on the older 10.04 release and hit a problem when installing ssh, so we later upgraded every machine to 12.04 (a clean reinstall of Ubuntu) to keep cluster management consistent.

To briefly describe the Ubuntu installation: using Wubi from Windows is the easiest; just run the installer and follow the steps. Afterwards we hit a tricky problem: the newly installed Ubuntu could not access the Internet. Internet access is a prerequisite for building the Hadoop environment.

Solution: configure the static IP address.

In Ubuntu 12.04, the network icon appears in the upper-right corner. Click "Edit Connections" and manually set the static IP, gateway, subnet mask, and DNS. This is the first step to getting Ubuntu online.



The above is the graphical way to configure a static IP; we can also configure it manually with the following steps.

Run:

sudo gedit /etc/network/interfaces

Input:

    auto eth0
    iface eth0 inet static
    address 172.16.128.136
    netmask 255.255.255.0
    gateway 172.16.128.1

Save the file, then restart networking:

sudo /etc/init.d/networking restart
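Before copying the stanza into the real file, it can help to write it to a scratch file and check the fields. This is a minimal sketch: the addresses are the article's example values, and /tmp/interfaces.test is just a temporary path.

```shell
# Write the static-IP stanza to a scratch file first (addresses are the
# article's example values; adjust them to your subnet before installing).
cat > /tmp/interfaces.test <<'EOF'
auto eth0
iface eth0 inet static
    address 172.16.128.136
    netmask 255.255.255.0
    gateway 172.16.128.1
EOF
# Confirm the stanza header is present before copying the file
# over /etc/network/interfaces.
grep -c 'iface eth0 inet static' /tmp/interfaces.test   # prints 1
```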

2. Install the JDK

We ran into some problems here. On a freshly installed Ubuntu you may not hit them and can simply configure the environment variables following the steps below. (You can check the JDK version with java -version.) If you are working on someone else's machine, the existing JDK version may not match what you need; you then have to install your own JDK without affecting the other users' JDK.

The solution is to extract the JDK you want into your own user directory, e.g. /home/zz/jvm/jdk1.7.0_45, and configure the environment variables in .bashrc. Save the file, run source .bashrc, then check the version with java -version.

If the extracted JDK is on the desktop, first run cd Desktop.

Run:

sudo cp -r jvm /usr/lib

Problem: when we copied the JDK between machines, file permissions and ownership were lost. This caused the following steps to fail even though the environment variables were configured correctly.

Run (the JDK sits in the jvm folder):

sudo chmod -R 777 jvm (grants full permissions on jvm)

sudo chown -R zz:zz jvm (gives ownership of jvm to the current user; zz is the current user)
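The same ownership/permission fix can be demonstrated locally on a scratch directory, without touching the real jvm folder. Note this sketch uses 755 instead of the article's 777, which is broader than the JDK actually needs.

```shell
# Demo of the permission fix on a scratch directory (stand-in for the jvm folder).
mkdir -p /tmp/jvm-demo/jdk1.7.0_45/bin
chmod -R 755 /tmp/jvm-demo                        # owner rwx, group/others r-x
chown -R "$(id -un)":"$(id -gn)" /tmp/jvm-demo    # hand ownership to the current user
stat -c '%a' /tmp/jvm-demo                        # prints 755
```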

2.1 JDK installation: Download path

http://www.oracle.com/technetwork/java/javase/downloads/jdk7-downloads-1880260.html

Select: jdk-7u45-linux-i586.tar.gz

Unzip: tar -zxvf jdk-7u45-linux-i586.tar.gz

This produces the directory jdk1.7.0_45. You can either extract directly to the target path, or extract and then copy the directory there.

Configure environment variables:

Run cd to return to your home directory.

gedit ~/.bashrc (sudo is not needed for a file in your own home directory)

Add:

    export JAVA_HOME=/home/zz/jvm/jdk1.7.0_45
    export JRE_HOME=/home/zz/jvm/jdk1.7.0_45/jre
    export CLASSPATH=.:$CLASSPATH:$JAVA_HOME/lib:$JRE_HOME/lib
    export PATH=$PATH:$JAVA_HOME/bin:$JRE_HOME/bin

Save, close, and run

source .bashrc

java -version
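A quick way to confirm the variables took effect in the current shell is shown below. The paths are the article's example locations (java -version will only succeed once the JDK is actually extracted there), so this sketch only inspects the variables themselves.

```shell
# Re-create the .bashrc settings in the current shell and inspect them.
export JAVA_HOME=/home/zz/jvm/jdk1.7.0_45     # article's example path; adjust to yours
export JRE_HOME=$JAVA_HOME/jre
export PATH=$PATH:$JAVA_HOME/bin:$JRE_HOME/bin
echo "$JAVA_HOME"                             # prints /home/zz/jvm/jdk1.7.0_45
echo "$PATH" | grep -q "$JAVA_HOME/bin" && echo "PATH ok"
```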

2.2 If the version is still not displayed properly

sudo gedit /etc/profile

Add the same lines used in .bashrc above. Save and run source /etc/profile.

This approach applies generally. When a company assigns you an account on a shared virtual machine, none of your operations should affect other users. For example, if you want JDK 1.7 but the server's JDK is 1.6, extract JDK 1.7 into your own user directory and point your environment variables at that directory; different users can then see different JDK versions. On a freshly installed Ubuntu you may never hit this problem, though you would also miss a learning opportunity.

2.3 Overwrite the original JDK (this is what I did)

For easier cluster management, use the same installation path on every machine. Instead of a per-user directory, extract to /usr/lib/jvm/jdk1.7.0_45.

Configure the environment variables. If you configure them only in /etc/profile, java -version may still not show the new JDK after source /etc/profile. Also add the same export lines to the current user's .bashrc, then close the file and run source .bashrc; java -version will then show the JDK version.

Why this path: the Hadoop cluster we previously set up on the laptops used the same JDK path, so reusing it keeps the configurations compatible.

Add:

    export JAVA_HOME=/usr/lib/jvm/jdk1.7.0_45
    export JRE_HOME=/usr/lib/jvm/jdk1.7.0_45/jre
    export CLASSPATH=.:$CLASSPATH:$JAVA_HOME/lib:$JRE_HOME/lib
    export PATH=$PATH:$JAVA_HOME/bin:$JRE_HOME/bin
3. Change the ubuntu Host Name

sudo gedit /etc/hostname

Add: cluster1

Perform this step on every host; the only difference is the name, cluster1 through cluster10. Here we have 10 nodes.

Restart. The terminal prompt changes from zz@ubuntu:~$ to zz@cluster1:~$ (through cluster10, depending on the host).

This prepares for the ssh installation later, so that the hosts can connect to each other by name over ssh.

4. Configure the hosts file

sudo gedit /etc/hosts

Add the following content; the IP addresses and names depend on your hosts.

    127.0.0.1 localhost
    # The following lines are desirable for IPv6 capable hosts
    ::1     ip6-localhost ip6-loopback
    fe00::0 ip6-localnet
    ff00::0 ip6-mcastprefix
    ff02::1 ip6-allnodes
    ff02::2 ip6-allrouters
    172.16.128.135  cluster1
    172.16.128.136  cluster2
    172.16.128.123  cluster3
    172.16.128.124  cluster4
    172.16.128.134  cluster5
    172.16.128.133  cluster6
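The cluster lines above follow a simple name/address pattern, so they can be generated rather than typed. A sketch, using 172.16.128.13X as placeholder addresses (the article's real IPs are not contiguous) and a scratch file instead of /etc/hosts:

```shell
# Generate a scratch hosts file with three placeholder cluster entries.
{
  echo '127.0.0.1 localhost'
  for i in 1 2 3; do
    printf '172.16.128.13%d\tcluster%d\n' "$i" "$i"
  done
} > /tmp/hosts.test
grep -c '^172\.16\.128\.' /tmp/hosts.test   # prints 3
```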
5. Install ssh

After installing Ubuntu, run sudo apt-get update to refresh the package lists.

Then install ssh: sudo apt-get install openssh-server

Next, generate keys with ssh-keygen so the hosts can log in to each other without passwords.

From your home directory, run:

ssh-keygen -t rsa

cd ~/.ssh

cp id_rsa.pub authorized_keys

Perform the steps above on every machine. Then collect the contents of each machine's authorized_keys into a single file (open it with gedit authorized_keys to copy the content) and distribute the merged file to every machine with scp. For example, with 6 machines, the merged authorized_keys contains the public keys of all 6.

For example, the keys of cluster1, cluster2, and cluster3 are all copied into authorized_keys so that each host can reach the others.


ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDnTV1H/ldg5njT3+jJlS6SGcidiS9tQ0cesLcN0LONZno/NVaVNW79MKNj0LWUoDv/OZz7AQ0dDsbos9We8in9WQvVO2t2eoAuWExU5pqcv1tsRjXj43rKFCBJJedlXt+4sirgQrlrwOCMloSOakncISLxSQ2a7MXUq+NJyVynyjfyykjC+p7Nl0rrnHllzfy28Etf3JzYGKoOhdiDqidA8O6xF8VsJOUTaqIc/g0RlHuHPzgaPEmRo+HWJHYda4uERmNSAlhuhBrq2PCNz0WDeHJtF2psDXVIhZeNms+yJGh501mJCEnKwyediQHeFWc9J3JEGk0UaZdkzbYZ+VoR zz@cluster2
ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQCqqbQXmsAIccKCY6VWKhujvyGB88UGfi/v7i407VT9MndCeP2yRUyn+HlZuZPxmCvqXSYDQUswUID8FYXZi3A6uKu2b7k+7juwZFj8tO5l3R4nAWxn1zqBk8sg0ubfBwcxphoa/KrZq3h4TdfvhDivTdpG5chtWNlu3/JchmLDNYPcOcNYfndI6d/iDArP/cI4RDGbV4xDDOr65eX47KG7i4zXlYeAJqOQ9IbbsIGkXRve1cfBp79dCNCPElmdWkCnRI3xa0rh3o5a7MLiIDuLHQCN8KPKORy55farme35K1bLV7rDmLdZVIY5GKdR7GgR/56wGZXw3CZPVlfDBFDZ zz@cluster1

In the end, the authorized_keys on each host contains the keys of all hosts.

Go to the directory that holds the merged file and run:

scp authorized_keys cluster1:/home/zz/.ssh

Then copy the file to the other hosts by changing the number after "cluster".
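The merge step itself is just concatenation: each host's id_rsa.pub is one line, and joining the lines yields an authorized_keys that works on every node. A local simulation (the key bodies are placeholders; no real ssh is involved):

```shell
# Simulate merging per-host public keys into one authorized_keys file.
mkdir -p /tmp/ssh-demo
echo 'ssh-rsa AAAAplaceholderkey1 zz@cluster1' > /tmp/ssh-demo/cluster1.pub
echo 'ssh-rsa AAAAplaceholderkey2 zz@cluster2' > /tmp/ssh-demo/cluster2.pub
cat /tmp/ssh-demo/cluster1.pub /tmp/ssh-demo/cluster2.pub > /tmp/ssh-demo/authorized_keys
wc -l < /tmp/ssh-demo/authorized_keys   # prints 2 (one line per host)
```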

Run:

ssh cluster1 (or cluster2, cluster3, ...). You should be able to log in without a password.

Summary:

If passwordless ssh login is configured correctly, ssh cluster1 and the other hosts connect without a password, which shows the whole setup is going smoothly. Next we can copy the previously built Hadoop directory to the current user and, after configuration, start Hadoop on the master and slave nodes. That is the subject of the next post.

Copyright © BUAA
