Problems encountered while building a Hadoop cluster with classmates:
Preface
Some time before the winter vacation, I began investigating the setup process for Hadoop 2.2.0. At the time we had no real machines, so we just ran some data on three laptops. A month or two later, some of it had been forgotten. Now the school has approved a lab for us and allocated 10 machines (4 GB RAM + 500 GB disk each), which is enough. We set out to build a Hadoop 2.2.0 distributed cluster and took the opportunity to write up the whole process.
The installation process for Hadoop 2.2.0 is covered comprehensively in many blogs, but you can still get stuck on certain problems, and sometimes you need to combine several documents to get the platform up. In this post we summarize the problems we ran into and the things that came up during the build. We will provide the detailed installation steps and configuration files for Hadoop in a later post.
If you decide to spend some time on this article, please read it carefully: we lost time at each of these points, and if you hit the same problems, this post offers solutions.
1. System environment: configure a static IP
We use 32-bit Ubuntu 12.04.2. We originally built on the older 10.04 and hit a problem when installing ssh; we later upgraded every machine to 12.04 (by reinstalling Ubuntu) to keep cluster management uniform.
To briefly describe the Ubuntu installation: using Wubi from within Windows is the easiest way; just click through the installer and follow the steps. Afterwards I hit a tricky problem: the newly installed Ubuntu could not access the Internet. Internet access is a prerequisite for building the Hadoop environment.
Solution: configure the static IP address.
In Ubuntu 12.04, click the network icon in the upper-right corner, choose Edit Connections, and manually set the static IP, gateway, subnet mask, and DNS. This is the first step to getting Ubuntu online.
The above is the graphical way to configure a static IP; we can also configure it manually with the following steps.
Run:
sudo gedit /etc/network/interfaces
Input:
auto eth0
iface eth0 inet static
address 172.16.128.136
netmask 255.255.255.0
gateway 172.16.128.1
Save, then restart networking:
sudo /etc/init.d/networking restart
2. Install jdk
We hit some problems here. If you installed Ubuntu fresh yourself, you can simply configure the environment variables following the steps below (you can check the JDK version with java -version). But if you are working on someone else's machine and the existing JDK version is not the one you need, you have to install your own JDK without affecting the other user's JDK.
The solution is to unpack the JDK you want into your own user's directory, for example /home/zz/jvm/jdk1.7.0_45, and configure the environment variables in .bashrc. Save, run source .bashrc, and then check the version number with java -version.
If the unpacked JDK sits on the desktop, first run: cd Desktop.
Run:
sudo cp -r jvm /usr/lib
Problem: when we copied the JDK between machines, the file permissions came out wrong, so even after configuring the environment variables in the normal way, we still had to run the following statements.
Run (assuming the JDK is installed in the jvm folder):
sudo chmod -R 777 jvm (grants full permissions on jvm)
sudo chown -R zz:zz jvm (gives ownership of jvm to the current user; zz is the current user)
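As a side note, the permission fix can be tried safely on a scratch directory first. This is only a sketch (the scratch path is made up; on the real copied jvm directory you would also need sudo for chown):

```shell
# Demo of the permission fix on a scratch directory instead of the real jvm copy.
# 755 (rwxr-xr-x) is usually enough; 777 as used above is the blunt fix.
mkdir -p /tmp/jvm-demo/bin
chmod -R 755 /tmp/jvm-demo
ls -ld /tmp/jvm-demo
# On the real directory you would additionally run:
#   sudo chown -R zz:zz jvm    # zz = current user
```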
2.1 JDK installation: Download path
http://www.oracle.com/technetwork/java/javase/downloads/jdk7-downloads-1880260.html
Select: jdk-7u45-linux-i586.tar.gz
Unzip: tar -zxvf jdk-7u45-linux-i586.tar.gz
This unpacks to jdk1.7.0_45. You can extract directly into the target path, or extract first and then copy the folder to the target path.
Configure environment variables:
Run cd to return to the home directory.
gedit .bashrc (sudo is not needed for a file your own user owns)
Add:
export JAVA_HOME=/home/zz/jvm/jdk1.7.0_45
export JRE_HOME=/home/zz/jvm/jdk1.7.0_45/jre
export CLASSPATH=.:$CLASSPATH:$JAVA_HOME/lib:$JRE_HOME/lib
export PATH=$PATH:$JAVA_HOME/bin:$JRE_HOME/bin
Save, close, and run:
source .bashrc
java -version
2.2 If the version is not displayed properly
sudo gedit /etc/profile
Add the same content as in .bashrc above. Save and run source /etc/profile.
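Whichever file you edit, the exports plus a quick sanity check can be sketched as follows (the per-user path from above is assumed; adjust JAVA_HOME to wherever you unpacked the JDK):

```shell
# The same exports as in .bashrc, plus a check that the JDK's bin
# directory actually landed on the PATH. The path is the per-user one above.
export JAVA_HOME=/home/zz/jvm/jdk1.7.0_45
export JRE_HOME=$JAVA_HOME/jre
export CLASSPATH=.:$CLASSPATH:$JAVA_HOME/lib:$JRE_HOME/lib
export PATH=$PATH:$JAVA_HOME/bin:$JRE_HOME/bin
case ":$PATH:" in
  *":$JAVA_HOME/bin:"*) echo "JDK bin is on PATH" ;;
  *)                    echo "JDK bin is missing from PATH" ;;
esac
# With a real JDK unpacked at that path, java -version now reports 1.7.0_45.
```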
The process above applies generally. At a company you may be assigned an account on a shared machine, and none of your operations may affect other users. For example, if you want JDK 1.7 but the server's JDK is 1.6, you must unpack the JDK into your own user's directory and point the environment variables at that directory; then different users can see different JDK versions. If you installed Ubuntu fresh yourself, you may never hit this problem, but you would also miss a learning opportunity.
2.3 Overwrite the original JDK (this is what I did)
For easier cluster management, use the same installation path on every machine. Instead of a per-user directory, unpack the JDK to /usr/lib/jvm/jdk1.7.0_45.
Configure the environment variables. We found that even with everything configured in /etc/profile, java -version still did not show the new JDK after running source /etc/profile. Add the same export lines to the current user's .bashrc, close it, and run source .bashrc; java -version will then show the JDK version.
Why this path: the Hadoop cluster we previously set up on the laptops used the same JDK path, so reusing it keeps the configurations compatible.
Add:
export JAVA_HOME=/usr/lib/jvm/jdk1.7.0_45
export JRE_HOME=/usr/lib/jvm/jdk1.7.0_45/jre
export CLASSPATH=.:$CLASSPATH:$JAVA_HOME/lib:$JRE_HOME/lib
export PATH=$PATH:$JAVA_HOME/bin:$JRE_HOME/bin
3. Change the Ubuntu hostname
sudo gedit /etc/hostname
Add: cluster1
Each host needs this step; the only difference is the name, cluster1 through cluster10. We have 10 nodes here.
Restart, and the terminal prompt changes from the original zz@ubuntu:~$ to zz@cluster1:~$ (and so on through cluster10 on the other nodes).
This is preparation for installing ssh later, so the various hosts can reach each other by name over ssh.
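Since the naming scheme is mechanical, the per-node value can be sketched like this (NODE_INDEX is a made-up variable for illustration; on a real node you would set it once per machine and write the result into /etc/hostname with sudo):

```shell
# Derive the hostname from the node index; this cluster uses indices 1..10.
NODE_INDEX=3                        # hypothetical: node 3 of 10
NEW_HOSTNAME="cluster${NODE_INDEX}"
echo "$NEW_HOSTNAME"                # this one line is the content of /etc/hostname
# To apply for real: echo "$NEW_HOSTNAME" | sudo tee /etc/hostname
```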
4. Configure the hosts file
sudo gedit /etc/hosts
Add the following content; the IP addresses and names depend on how many hosts you have.
127.0.0.1 localhost
# The following lines are desirable for IPv6 capable hosts
::1 ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
172.16.128.135 cluster1
172.16.128.136 cluster2
172.16.128.123 cluster3
172.16.128.124 cluster4
172.16.128.134 cluster5
172.16.128.133 cluster6
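To avoid typos across ten machines, the cluster block of /etc/hosts can be assembled in a scratch file first and checked before appending. A sketch with three of the addresses above (the temp path is arbitrary):

```shell
# Build the cluster name->IP block in a temp file, then append it to
# /etc/hosts with sudo once it looks right. IPs are the ones from this cluster.
cat > /tmp/hosts.cluster <<'EOF'
172.16.128.135 cluster1
172.16.128.136 cluster2
172.16.128.123 cluster3
EOF
awk '{print $2, "->", $1}' /tmp/hosts.cluster   # quick visual check
# Apply for real: sudo sh -c 'cat /tmp/hosts.cluster >> /etc/hosts'
```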
5. Install ssh
After installing Ubuntu:
Run sudo apt-get update to fetch the latest package lists.
Then install ssh: sudo apt-get install openssh-server
Next, generate keys with ssh so the hosts can log in to each other.
Go to the home directory and run:
ssh-keygen -t rsa
cd .ssh
cp id_rsa.pub authorized_keys
Perform the above steps on all machines. Then open authorized_keys with gedit, gather the authorized_keys content from every machine into one file, and copy that file back to every machine with scp. For example, with 6 machines, the merged authorized_keys contains the key lines from all 6.
For example, the keys from cluster1, cluster2, and cluster3 are all copied into authorized_keys so that each host can reach the others.
ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDnTV1H/ldg5njT3+jJlS6SGcidiS9tQ0cesLcN0LONZno/NVaVNW79MKNj0LWUoDv/OZz7AQ0dDsbos9We8in9WQvVO2t2eoAuWExU5pqcv1tsRjXj43rKFCBJJedlXt+4sirgQrlrwOCMloSOakncISLxSQ2a7MXUq+NJyVynyjfyykjC+p7Nl0rrnHllzfy28Etf3JzYGKoOhdiDqidA8O6xF8VsJOUTaqIc/g0RlHuHPzgaPEmRo+HWJHYda4uERmNSAlhuhBrq2PCNz0WDeHJtF2psDXVIhZeNms+yJGh501mJCEnKwyediQHeFWc9J3JEGk0UaZdkzbYZ+VoR zz@cluster2
ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQCqqbQXmsAIccKCY6VWKhujvyGB88UGfi/v7i407VT9MndCeP2yRUyn+HlZuZPxmCvqXSYDQUswUID8FYXZi3A6uKu2b7k+7juwZFj8tO5l3R4nAWxn1zqBk8sg0ubfBwcxphoa/KrZq3h4TdfvhDivTdpG5chtWNlu3/JchmLDNYPcOcNYfndI6d/iDArP/cI4RDGbV4xDDOr65eX47KG7i4zXlYeAJqOQ9IbbsIGkXRve1cfBp79dCNCPElmdWkCnRI3xa0rh3o5a7MLiIDuLHQCN8KPKORy55farme35K1bLV7rDmLdZVIY5GKdR7GgR/56wGZXw3CZPVlfDBFDZ zz@cluster1
In the end, each host's authorized_keys contains the key lines from all hosts.
From the .ssh directory, run:
scp authorized_keys cluster1:/home/zz/.ssh
Then copy the file to the other hosts by changing the number after cluster.
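The merge-and-distribute step can also be scripted. Here is a sketch that simulates the merge locally with placeholder key files (real key lines look like the ssh-rsa lines above); the scp distribution is kept as a comment since it needs the live cluster:

```shell
# Simulate merging per-node public keys into one authorized_keys.
# In the real cluster these .pub files come from each node's ~/.ssh.
mkdir -p /tmp/ssh-merge-demo && cd /tmp/ssh-merge-demo
for i in 1 2 3; do
  echo "ssh-rsa PLACEHOLDERKEY$i zz@cluster$i" > "cluster$i.pub"
done
cat cluster*.pub > authorized_keys   # one key line per node
wc -l authorized_keys
# Real distribution (run from ~/.ssh on the machine holding the merged file):
#   for i in 1 2 3 4 5 6 7 8 9 10; do
#     scp authorized_keys cluster$i:/home/zz/.ssh/
#   done
```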
Run:
ssh cluster1 (or cluster2, cluster3, and so on). You should now be able to log in without a password.
Summary:
Once passwordless ssh login is configured, you can connect with ssh cluster1 and the like without a password, which shows the whole setup is on track. Next we can copy the previously built Hadoop directory over to the current user and then, through configuration, start Hadoop on the master and slave nodes. That is the next piece of work.
Copyright BUAA