Environment and objectives:
-System: VMware / Ubuntu 12.04
-Hadoop version: 0.20.2
-My node configuration (fully distributed cluster):
Master (JobTracker) | 192.168.221.130 | H1
Slave (TaskTracker/DataNode) | 192.168.221.141 | H2
Slave (TaskTracker/DataNode) | 192.168.221.142 | H3
-User: hadoop_admin
-Target: Hadoop starts successfully; http://localhost:50030 and http://localhost:50070 are reachable and the processes are displayed correctly.
It is hard to install Linux without any basic knowledge; with the help of friends in the group, it took me several days to get it installed (don't laugh). The steps below will look far too basic to anyone doing IT operations, and there are surely gaps or even misunderstandings; some steps I may not fully understand and simply executed, so advice is welcome. (Besides the process described in Mr. Huang's article, there are many other write-ups, such as Running Hadoop on Ubuntu Linux (Multi-Node Cluster) in the Apache Hadoop wiki. Here I only record the problems and steps of my own installation.)
The basic process is divided into the following parts:
Install Ubuntu with VMware
I use Ubuntu 12.04. First, prepare some software/tools (the links were all on Sina Vdisk):
· VMware Workstation (get it from the official website)
· ubuntu-12.04.1-desktop-i386.iso
· jdk-7u7-windows-i586.rar
· Because the teacher repeatedly stressed that Hadoop versions differ a lot, a beginner had better use the same version as the teacher, i.e. hadoop-0.20.2.tar.gz
· WinSCP (what I used), PuTTY or SecureCRT, to transfer the JDK and Hadoop to Ubuntu
Install Ubuntu
There is basically nothing special to note. After installation my system boots into command-line mode by default; startx brings up the GUI.
In Ubuntu you can adjust the display resolution to make the GUI more comfortable. Searching for "terminal" opens the command-line tool; Ctrl+Alt+F1~F6 switch to the text consoles, and in command-line mode Alt+Left/Right switches between them.
Configure the network (not a required step for the Hadoop installation)
Some friends in the group have to use bridged networking on the same subnet, so I take the opportunity to look at the network settings (note: this is not a required step for installing Hadoop). Ubuntu ships with network-manager, so you can get online without any configuration. Open Settings > Network to view the network configuration, but it is DHCP-based: the static IP I set via sudo vi /etc/network/interfaces was changed back by network-manager after a reboot. The two approaches conflict with each other, so I removed network-manager with sudo apt-get autoremove network-manager --purge.
Autoremove: 'autoremove' removes all packages that were installed automatically to satisfy dependencies; the --purge option makes apt-get remove the config files as well.
Procedure: configure static IP > DNS > hostname > hosts
Configure static IP
From VM > Settings > Network you can see that I use VMware's default NAT mode (a side note: with NAT the virtual machine and the host can ping each other, but other hosts cannot ping the virtual machine). You also do not need the host and the VM to be on the same IP segment for them to ping each other.
If you are interested in the differences, search for "differences between VMware bridged, NAT and host-only networking". In the VMware Workstation menu > Edit > Virtual Network Editor you can see that NAT uses VMnet8, one of the two NICs that VMware virtualizes automatically when it is installed.
Click NAT Settings.
The following information is displayed:
Gateway: 192.168.221.2
IP address segment: 192.168.221.128~254
Subnet mask: 255.255.255.0
: sudo vi /etc/network/interfaces
(For more about vi/vim, see the vim chapter of the "Linux Private Kitchen" book)
auto lo
iface lo inet loopback
# this block configures localhost/127.0.0.1 and can be kept as-is

# configure eth0, the first NIC
auto eth0
iface eth0 inet static          # static IP
address 192.168.221.130
netmask 255.255.255.0
gateway 192.168.221.2
dns-nameservers 192.168.221.2 8.8.8.8
# dns-search test.com  (something I just learned: hosts would get .test.com appended by default)
Restart the network
: sudo /etc/init.d/networking restart    # a newly added eth0 only takes effect after a restart
: whereis ifup                           # ...
: sudo /sbin/ifup eth0                   # bring eth0 up after modifying it manually
: sudo /sbin/ifdown eth0
: sudo /etc/init.d/networking restart    # restart again
: ifconfig                               # check the IP address; eth0 information is displayed
# Configure DNS
: sudo vi /etc/resolv.conf
Add the following name servers (the NAT gateway and Google's public DNS):
nameserver 192.168.221.2
nameserver 8.8.8.8
This file gets overwritten by network-manager, so network-manager has to go:
: sudo apt-get autoremove --purge network-manager
# Configure hosts
: sudo vi /etc/hosts
Add
192.168.221.130 H1
192.168.221.141 H2
192.168.221.142 H3
# Configure the hostname
: whereis hostname
: sudo vi /etc/hostname
Write H1 in it
Run
: sudo hostname H1
The network is now configured. If you don't clone the VM, you have to repeat this on each of the three servers by hand (sore hands). For /etc/hosts, scp is recommended (covered below).
Create a dedicated user for Hadoop
Create a dedicated user for Hadoop on every cluster node server, so that the node servers can connect to each other over SSH as this user, using its RSA public key.
(Here I hit a fairly big snag: useradd and adduser are two different commands and are used differently. This article explains it clearly.)
What I used:
: sudo useradd hadoop_admin
: sudo passwd hadoop_admin
After logging in with this user, I found there was no home directory; the prompt was just:
$:
Then I switched back to root and created the /home/hadoop_admin directory myself (so the directory was owned by root).
The problem showed up when generating the RSA SSH key: the system reported that the directory had no write permission.
I looked it up, listed the user's permissions on home, and found the owner was root.
Continuing:
If the permissions show 0, the user was created incorrectly. Friends in the group told me to fix the ownership manually (sudo chown -R hadoop_admin /home/hadoop_admin; this is also needed when using useradd). That felt too troublesome, so after checking I decided to delete and recreate the user (definitely not something you could do in real IT operations =O=).
: sudo deluser hadoop_admin
: sudo adduser hadoop_admin --home /home/hadoop_admin --uid 545
Now it works. To recap:
1. Create the user
: sudo adduser hadoop_admin --home /home/hadoop_admin --uid 545
2. Add the user to the list of users allowed to run sudo
: sudo vi /etc/sudoers
Add the following line to the file:
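(The line itself, assuming the usual Ubuntu sudoers syntax, would look like the entry below, which grants hadoop_admin full sudo rights:)
# user    hosts=(run-as user:group)    commands
hadoop_admin ALL=(ALL:ALL) ALL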
3. Generate an SSH key for the user (next section)
Install SSH and generate an RSA key
1. Install OpenSSH
Knowledge point: for Debian packages and apt-get, see here.
: sudo apt-get install openssh-server
Once it completes, SSH is in theory already running. From now on WinSCP (SCP/SFTP mode) can be used for file transfer; copy the JDK and Hadoop over.
Let's take a look at the SSH configuration; it helps in understanding how the node servers later connect to each other without a password via SSH public keys. As a beginner I find the whereis command very handy..
One line here is interesting, because hosts keep getting added to known_hosts while installing Hadoop.
Ubuntu/Debian enables HashKnownHosts yes in the default ssh config, so every time you ssh to a hostname you are asked whether to add it to the known_hosts file. For more about the OpenSSH options, see here.
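A related note: if a node is later reinstalled (as I do with H2 and H3 below), its old entry in known_hosts goes stale and ssh will complain about a changed host key. Assuming the hostnames used here, the stale entry can be dropped like this:
: ssh-keygen -R H2    # remove the cached host key for H2 from known_hosts
: ssh H2              # the next connection will ask again whether to add the new key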
2. Generate the private key and public key files for hadoop_admin
# Log in as hadoop_admin and switch to the ~/ home directory
: cd ~/
: ssh-keygen -t rsa    # generate the SSH key; -t selects the RSA algorithm
The .ssh folder and the id_rsa (private key) and id_rsa.pub (public key) files are generated automatically in the user's home directory.
: cd ~/.ssh
: cp id_rsa.pub authorized_keys    # from the SSH notes above, authorized_keys stores the public keys that SSH accepts for automatic authentication; in my test each entry ends with login_name@hostname
(The public keys of other users can be appended here too)
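One more detail worth checking here (a frequent cause of permission denied errors later on): sshd only honors authorized_keys when the .ssh directory and the key file have strict permissions. A quick check, assuming the hadoop_admin user created above:
: chmod 700 ~/.ssh
: chmod 600 ~/.ssh/authorized_keys
: ssh localhost    # should log in without asking for a password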
Install JDK
I tried several installation methods. Installing a JDK from the Ubuntu Software Center got me OpenJDK, and installing the Sun Java 6 packages via a modified Debian source list (sudo apt-get install ...) was not convenient either. The simplest way: download Sun's JDK -> extract it -> set the JAVA_HOME information.
1. Prepare the JDK file
As described above, copy the file to the VM over SSH (WinSCP).
2. Install the JDK
I installed it under /usr/lib/jvm/jdk1.7.0_21 (this directory should be the same on all servers, otherwise maintenance becomes a nightmare~)
: sudo tar xvf ~/Downloads/[jdk].tar.gz -C /usr/lib/jvm
: cd /usr/lib/jvm
: ls
Go in and have a look.
3. Set JAVA_HOME and related variables
: sudo vi /etc/profile
# Add the following to set the environment variables
export JAVA_HOME=/usr/lib/jvm/jdk1.7.0_21
export JRE_HOME=$JAVA_HOME/jre
export CLASSPATH=.:$JAVA_HOME/lib:$JRE_HOME/lib:$CLASSPATH
export PATH=$JAVA_HOME/bin:$PATH:$JRE_HOME/lib
# Run the following to make it take effect
: source /etc/profile
# Verify
: cd $JAVA_HOME
# If you land in the right place, the setup is done.
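Another quick sanity check, assuming $JAVA_HOME/bin was added to PATH as above:
: java -version    # should report the 1.7.0_21 JDK just extracted
: which java       # should resolve to /usr/lib/jvm/jdk1.7.0_21/bin/java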
Install Hadoop
1. Prepare the Hadoop file
As mentioned above, transfer hadoop-0.20.2 to the target machine over SSH.
2. Install Hadoop
Extract the package into hadoop_admin's directory. (Question: does it have to be this directory?) ->
: sudo tar xvf [hadoop.tar.gz path] -C /home/hadoop_admin/hadoop/
3. Configure Hadoop
There is a lot to learn about the configuration; what follows is the simplest possible setup... I still have to study it properly next week. A few basic properties are explained here; I type them in by hand to reinforce memory and understanding.
a. Set the HADOOP_HOME environment variable for convenience
: sudo vi /etc/profile
export HADOOP_HOME=/home/hadoop_admin/hadoop-0.20.2
export JAVA_HOME=/usr/lib/jvm/jdk1.7.0_21
export JRE_HOME=$JAVA_HOME/jre
export CLASSPATH=.:$JAVA_HOME/lib:$JRE_HOME/lib:$CLASSPATH
export PATH=$JAVA_HOME/bin:$PATH:$JRE_HOME/lib:$HADOOP_HOME/bin
: source /etc/profile     # run to make it take effect
: cd $HADOOP_HOME
: cd conf/
: ls
b. Set the JDK path: add JAVA_HOME to Hadoop's environment configuration
: sudo vi hadoop-env.sh    # add the JAVA_HOME export to this file
If you can't remember the JDK path:
: echo $JAVA_HOME
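For reference, the line added to conf/hadoop-env.sh is just a JAVA_HOME export with the same value as in /etc/profile:
# in $HADOOP_HOME/conf/hadoop-env.sh
export JAVA_HOME=/usr/lib/jvm/jdk1.7.0_21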
c. core-site.xml
Set the HDFS path of the name node. fs.default.name: the URI of the cluster's name node (protocol hdfs, hostname/IP, port number). Every machine in the cluster needs to know this name node information.
<configuration>
  <property><name>fs.default.name</name><value>hdfs://H1:9001</value></property>
</configuration>
d. hdfs-site.xml
Set the storage paths of the name node's file system and the number of copies (replication). To be honest, since I haven't used Hadoop for real work yet, I don't have a practical understanding of the namenode/datanode directory settings or of replication; for now I'm just copying by example and will update this part later. These properties go inside the same <configuration> element as above:
<configuration>
  <property><name>dfs.name.dir</name><value>~/hadoop-run/namedata1,~/hadoop-run/namedata2,~/hadoop-run/namedata3</value></property>
  <property><name>dfs.data.dir</name><value>~/hadoop-0.20.2/data</value></property>
  <property><name>dfs.replication</name><value>3</value></property>
</configuration>
e. mapred-site.xml
mapred: the JobTracker information for MapReduce, again inside a <configuration> element:
<configuration>
  <property><name>mapred.job.tracker</name><value>H1:9001</value></property>
</configuration>
f. masters
Add the master node information; here it is H1.
g. slaves
Add the slave node information; here they are H2 and H3.
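For reference, with the hostnames used in this cluster the two files simply list one hostname per line:
conf/masters:
H1
conf/slaves:
H2
H3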
4. Configure the H2 and H3 node servers
Thinking of the long run, I reinstalled H2 and H3 from scratch in VMware and repeated all of the steps above as reinforcement, instead of cloning the image. That exposed plenty of problems, for example JDK and Hadoop installation directories that didn't match (due to spelling mistakes), which meant endlessly editing files~ So for beginners like me it is better to keep everything uniform, including the name of the operating user, hadoop_admin.
4.1 Install and configure the H2 and H3 node servers
Repeat the earlier steps: create the hadoop_admin user, install SSH, and generate the key; stop there.
4.2 Import the public keys of H2 and H3 into H1's authorized_keys, so that files can later be transferred over SSH without a password
The method: first copy the key files from H2 and H3 to H1's home directory with scp (secure copy).
On H2: sudo scp ~/.ssh/id_rsa.pub hadoop_admin@H1:~/h2pub
On H3: sudo scp ~/.ssh/id_rsa.pub hadoop_admin@H1:~/h3pub
On H1:
: sudo cat ~/.ssh/id_rsa.pub ~/h2pub ~/h3pub > ~/.ssh/authorized_keys    # concatenate the public keys of H1, H2 and H3 into one file
: sudo scp ~/.ssh/authorized_keys hadoop_admin@H2:~/.ssh/authorized_keys  # okay, now copy it back (question: do the slaves actually need it?)
: sudo scp ~/.ssh/authorized_keys hadoop_admin@H3:~/.ssh/authorized_keys
4.3 Copy the JDK and Hadoop from H1 directly to H2 and H3
a. Install the JDK
: sudo scp -r $JAVA_HOME hadoop_admin@H2:/usr/lib/jvm
: sudo scp -r $JAVA_HOME hadoop_admin@H3:/usr/lib/jvm
If /etc/profile is the same, just push it over as well..
: sudo scp /etc/profile H2:/etc/profile
: sudo scp /etc/profile H3:/etc/profile
b. Install Hadoop
: sudo scp -r $HADOOP_HOME hadoop_admin@H2:~/hadoop-0.20.2
: sudo scp -r $HADOOP_HOME hadoop_admin@H3:~/hadoop-0.20.2
c. If /etc/hosts is the same, push it over too..
: sudo scp /etc/hosts H2:/etc/hosts
: sudo scp /etc/hosts H3:/etc/hosts
Check: after the steps above, the machines should be able to ping each other and ssh [hostname] should work without a password. If so, the configuration of the three servers is basically complete, and Hadoop itself needs no additional configuration.
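A minimal round of checks, run from H1 and assuming the hostnames above:
: ping H2              # the hosts entries work and the node is reachable
: ping H3
: ssh H2               # should drop into a shell on H2 with no password prompt
: exit
: ssh H3
: exit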
5. Format the name node
Arr.. what does this actually do? I'm quite curious; I went searching through the source code. TBD, to be studied later.
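The command for this step in 0.20.x is the standard one, run on the master (H1) as hadoop_admin:
: $HADOOP_HOME/bin/hadoop namenode -format    # initializes the dfs.name.dir directories configured above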
6. Start Hadoop
In theory, if JAVA_HOME, the users and permissions, the hosts file, the IP addresses, and password-free SSH are all configured correctly, you just wait for the result (but in practice there are plenty of problems... all kinds of careless configuration mistakes).
: sudo $HADOOP_HOME/bin/start-all.sh
In this step you should not see errors such as permission denied or no such file or directory; when "started successfully" messages are displayed, the daemons are up.
7. Check whether it is running successfully
a. The processes are normal (roughly what to expect is sketched after this checklist)
: sudo $JAVA_HOME/bin/jps
4 processes on the name node
3 processes on each data node
b. http://localhost:50030
c. http://localhost:50070
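For reference, here is roughly what jps lists on each machine in a fully distributed 0.20.x setup (as noted in a. above; the process names are Hadoop's own):
# on H1 (master)
NameNode
SecondaryNameNode
JobTracker
Jps
# on H2 / H3 (slaves)
DataNode
TaskTracker
Jps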
Oh yeah! At least on the surface everything looks good. If you've made it this far, you have successfully installed a fully distributed Hadoop cluster! The work ahead will be more complicated; look forward to it!