Environment and Objectives:
- system: VMware / Ubuntu 12.04
- Hadoop version: 0.20.2
- My node configuration (fully distributed cluster):
  Master (JobTracker): 192.168.221.130, hostname H1
  Slave (TaskTracker/DataNode): 192.168.221.141, hostname H2
  Slave (TaskTracker/DataNode): 192.168.221.142, hostname H3
- user: hadoop_admin
- target: successfully start Hadoop, with http://localhost:50030 and http://localhost:50070 displaying the processes correctly
Because I have zero Linux background, the installation was difficult; it took several days with help from group members to get it installed (heroes, don't laugh). Some of the IT-operations details below are very basic, there may be gaps in my knowledge or even outright misunderstandings, and some steps I carried out without fully understanding them, so guidance is welcome. (Apart from Mr. Huang's course, this process is covered all over the web, for example the Apache Hadoop wiki's Running Hadoop On Ubuntu Linux (Multi-Node Cluster); here I only record the problems and steps from my own installation.)
I divided the basic process into the following steps:
Installing Ubuntu using VMware
I'm using Ubuntu 12, and here are the software and tools I used (the links were on Sina micro-disk).
· VMware Workstation (get it from the official website for free)
· ubuntu-12.04.1-desktop-i386.iso
· jdk-7u7-windows-i586.rar
· Because the teacher repeatedly stressed the differences between Hadoop versions, a novice is best off using the same version as the teacher, namely hadoop-0.20.2.tar.gz
· WinSCP (what I use), PuTTY, or SecureCRT to transfer the JDK and Hadoop to Ubuntu
Install Ubuntu
Basically nothing worth noting here. After the installation completed, my machine booted into command-line mode by default; startx enters the GUI.
In Ubuntu you can adjust the display resolution to make the GUI comfortable; searching for "terminal" opens the command-line tool; Ctrl+Alt+F1~F6 switches to the virtual consoles, and in command-line mode Alt+Left/Right switches between them.
Configure the network (not a required step for installing Hadoop)
Because some group members using bridged networking had to be on the same network segment, I took the opportunity to play with network settings a bit (note: I don't think this is a necessary step for a Hadoop install). Thanks to network-manager, Ubuntu needs no settings at all to get online out of the box; open Settings > Network to see the configuration, which is DHCP-based. I set a static IP via sudo vi /etc/network/interfaces, but after a reboot network-manager changed it back; the article I followed mentions that the two methods conflict and discusses how to handle it. I simply took the rough approach of uninstalling it with sudo apt-get autoremove network-manager --purge.
autoremove: "autoremove" removes all packages that were automatically installed to satisfy dependencies; the --purge option makes apt-get also remove the config files.
Steps: configure static IP > DNS > hostname > hosts
Configure static IP
As you can see in VM > Settings > Network, I'm using VMware's default NAT mode (explained here: with NAT, the virtual machine and the host can ping each other, while other hosts cannot ping the virtual machine). With NAT the host and VM do not need IPs on the same network segment yet can still ping each other.
If you are interested in the differences between the three modes, search for "VMware bridged, NAT, host-only differences". In the VMware Workstation menu > Edit > Virtual Network Editor you can see that NAT uses VMnet8, one of the two NICs that VMware virtualized automatically at install time.
Click NAT Settings to see:
Get the following information:
Gateway: 192.168.221.2
IP network segment: 192.168.221.128~254
Subnet Mask: 255.255.255.0
: sudo vi /etc/network/interfaces
(about vi/vim, see the vim editor chapter of "Bird Brother's Linux Private Dishes")
auto lo
iface lo inet loopback
# the section above configures localhost/127.0.0.1 and can be kept as-is
# add eth0, the configuration for NIC 0
auto eth0
iface eth0 inet static
# static IP
address 192.168.221.130
netmask 255.255.255.0
gateway 192.168.221.2
dns-nameservers 192.168.221.2 8.8.8.8
# dns-search test.com
# (newly learned: dns-search would make lookups automatically append test.com to the hostname)
Restart Network
: sudo /etc/init.d/networking restart  # restart so that eth0 is established
: whereis ifup  # locate the ifup/ifdown tools
: sudo /sbin/ifup eth0  # after manually editing eth0 you must bring it up for the change to take effect, as the article explains
: sudo /sbin/ifdown eth0
: sudo /etc/init.d/networking restart  # restart again
: ifconfig  # check the IP; eth0's information should now be displayed
# Configure DNS
: sudo vi /etc/resolv.conf
Add the following (the second entry is Google's public DNS):
nameserver 192.168.221.2
nameserver 8.8.8.8
This file gets overwritten by network-manager, so the latter has to go:
: sudo apt-get autoremove network-manager --purge
# Configure hosts
: sudo vi /etc/hosts
Add:
192.168.221.130 H1
192.168.221.141 H2
192.168.221.142 H3
# Configure hostname
: whereis hostname
: sudo vi /etc/hostname
Write in: h1
Run:
: sudo hostname h1
The network is now configured. If you are not cloning, repeat all of the above on each of the three servers (hand-aching); /etc/hosts is best copied over with scp.
Create a specific action user for Hadoop
Create a dedicated operating user for Hadoop; every node server in the cluster needs this user, so that the node servers can communicate over SSH as this user via their RSA public keys.
(Here I hit a bigger snag: useradd and adduser are two different commands with different usage; this article explains it clearly.)
I used:
: sudo useradd hadoop_admin
: sudo passwd hadoop_admin
After logging in with this user I found there was no home directory; the prompt showed only
$:
Then I switched back to the root user and created the /home/hadoop_admin directory (so the directory was owned by root with root-only permissions).
The problem surfaced when generating the RSA SSH key: the directory had no write permission.
Checking around, I listed the user's permissions on home and found the owner was root.
Going further, I found the permission was 0, showing the user had been created incorrectly. Group members suggested setting permissions manually with chmod and chown (sudo chown -R hadoop_admin /home/hadoop_admin, which is also needed when using useradd). That felt like too much trouble, so after checking a bit I decided to recreate the user (this would surely never fly in real IT ops =o=).
: sudo deluser hadoop_admin
: sudo adduser hadoop_admin --home /home/hadoop_admin -u 545
Now everything is normal.
1. Create a user
: sudo adduser hadoop_admin --home /home/hadoop_admin -u 545
2. Add the user to the list of users who can execute sudo
: sudo vi /etc/sudoers
Add the entry for hadoop_admin to the file.
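The exact line isn't reproduced in the text; a typical sudoers entry granting hadoop_admin full sudo rights (modeled on the default root line) looks like the following. Edit with visudo rather than plain vi if possible, since visudo syntax-checks the file before saving:

```text
# /etc/sudoers: give hadoop_admin the same sudo rights as root
hadoop_admin ALL=(ALL:ALL) ALL
```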
3. Generate an SSH key for the user (see below)
Install SSH and generate an RSA key
1. Installing OpenSSH
Knowledge point: about Debian packages and apt-get, see here.
: sudo apt-get install openssh-server
Once this completes, sshd should in theory be running; you can now use WinSCP's Explorer mode for file transfer and copy the JDK and Hadoop over.
It is worth looking at SSH's configuration, which helps in understanding how the node servers connect to each other password-free through SSH public keys; as a zero-based beginner I find the whereis command extremely convenient.
This line is interesting because when installing Hadoop you frequently need hosts added to known_hosts.
Ubuntu/Debian enables HashKnownHosts yes by default in /etc/ssh/ssh_config, so each ssh hostname asks whether to add the host to the known_hosts file. See the extended reading on OpenSSH.
2. Generate the private key and public key file for Hadoop_admin
# log in as hadoop_admin and change to the home directory
: cd ~/
: ssh-keygen -t rsa  # generate SSH keys with the RSA algorithm; -t sets the algorithm type
This automatically creates a .ssh folder in the home directory containing the two files id_rsa (private key) and id_rsa.pub (public key).
: cd ~/.ssh
: cp id_rsa.pub authorized_keys  # per the SSH notes above, authorized_keys holds the public keys SSH accepts for automatic authentication; in my experiment the key string ends with hadoop_admin@h1
(Other users' public keys can also be added here.)
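The key-generation steps above can be sketched as a non-interactive script. It uses a scratch directory in place of the real ~/.ssh so it is safe to run anywhere; the directory name is illustrative:

```shell
# Generate an RSA key pair non-interactively and seed authorized_keys,
# using a scratch directory as a stand-in for ~/.ssh.
SSH_DIR="$(mktemp -d)/.ssh"
mkdir -p "$SSH_DIR"
chmod 700 "$SSH_DIR"                 # sshd ignores keys in overly open directories

# -t rsa: algorithm type; -N "": empty passphrase; -f: key file; -q: quiet
ssh-keygen -q -t rsa -N "" -f "$SSH_DIR/id_rsa"

# authorized_keys lists the public keys sshd accepts for password-free login
cp "$SSH_DIR/id_rsa.pub" "$SSH_DIR/authorized_keys"
chmod 600 "$SSH_DIR/authorized_keys"
```

On a real node you would run ssh-keygen without -N "" (or with a passphrase) directly in ~/.ssh, as in the steps above.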
Installing the JDK
There are several ways to install: search for "jdk" in the Ubuntu Software Center to install OpenJDK; modify the Debian source list and use sudo apt-get install sun-java6-jdk (not great to use); or, simplest of all, download Sun's JDK, unzip it, and set the JAVA_HOME information.
1. Prepare the JDK file
As above; copy the file to the VM system via SSH.
2. Installing the JDK
I installed under /usr/lib/jvm/jdk1.7.0_21 (it is best to keep this directory identical on all servers, otherwise you are in for pain later).
: sudo tar xvf ~/Downloads/[jdk].tar.gz -C /usr/lib/jvm
: cd /usr/lib/jvm
: ls
to check the result.
3. Set up information such as Java_path
: sudo vi /etc/profile
# add the following environment variables
export JAVA_HOME=/usr/lib/jvm/jdk1.7.0_21
export JRE_HOME=$JAVA_HOME/jre
export CLASSPATH=.:$JAVA_HOME/lib:$JRE_HOME/lib:$CLASSPATH
export PATH=$JAVA_HOME/bin:$PATH:$JRE_HOME/lib
# run this to make it take effect
: source /etc/profile
# verify
: cd $JAVA_HOME
# if the path resolves correctly, setup is complete
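A quick way to verify the exports is to source an equivalent snippet and inspect the variables. The sketch below writes to a temporary file instead of the real /etc/profile (the JDK path is the one assumed above; adjust to your install):

```shell
# Write the same exports to a scratch file, source it, and check the result.
PROFILE_SNIPPET="$(mktemp)"
cat > "$PROFILE_SNIPPET" <<'EOF'
export JAVA_HOME=/usr/lib/jvm/jdk1.7.0_21
export JRE_HOME=$JAVA_HOME/jre
export CLASSPATH=.:$JAVA_HOME/lib:$JRE_HOME/lib:$CLASSPATH
export PATH=$JAVA_HOME/bin:$PATH:$JRE_HOME/lib
EOF
. "$PROFILE_SNIPPET"
echo "JAVA_HOME=$JAVA_HOME"   # should print the JDK directory
```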
Install HADOOP1. Preparing Hadoop files
As mentioned above, transfer hadoop-0.20.2 to the target machine via SSH.
2. Installing Hadoop
Unzip it into the hadoop_admin directory (Q: must it be this directory?)
: sudo tar xvf [hadoop.tar.gz path] -C /home/hadoop_admin/hadoop/
3. Configure Hadoop
Configuration involves a lot of knowledge; what follows is the simplest setup... I will only really understand it after next week's lessons, I think... Below are explanations of some basic properties; I typed them in by hand to reinforce memory and understanding.
A. Set environment variable Hadoop_home for easy use
: sudo vi /etc/profile
export HADOOP_HOME=/home/hadoop_admin/hadoop-0.20.2
export JAVA_HOME=/usr/lib/jvm/jdk1.7.0_21
export JRE_HOME=$JAVA_HOME/jre
export CLASSPATH=.:$JAVA_HOME/lib:$JRE_HOME/lib:$CLASSPATH
export PATH=$JAVA_HOME/bin:$PATH:$JRE_HOME/lib:$HADOOP_HOME/bin
: source /etc/profile  # run to make it take effect
: cd $HADOOP_HOME
: cd conf/
: ls
B. Set the JDK path by adding JAVA_HOME to the environment configuration
: sudo vi $HADOOP_HOME/conf/hadoop-env.sh  # add the line: export JAVA_HOME=/usr/lib/jvm/jdk1.7.0_21
If you don't remember the JDK path:
: echo $JAVA_HOME
C. core-site.xml
Sets the HDFS path for the name node. fs.default.name: the URI of the cluster's name node (protocol hdfs, hostname/IP, port number); every machine in the cluster needs to know the name node's address. Note that this port must differ from the JobTracker port set in mapred-site.xml below; 9000 is the conventional choice for HDFS.
<configuration>
<property><name>fs.default.name</name><value>hdfs://h1:9000</value></property>
</configuration>
D. hdfs-site.xml
Set the storage path for the name node's file system metadata and the number of copies (replication). To tell the truth, since I have not actually used Hadoop, I have no practical understanding of the namenode and datanode directory settings or of replication; I just copied the pattern, and will update this part later.
<property><name>dfs.name.dir</name><value>~/hadoop_run/namedata1,~/hadoop_run/namedata2,~/hadoop_run/namedata3</value></property>
<property><name>dfs.data.dir</name><value>~/hadoop-0.20.2/data</value></property>
<property><name>dfs.replication</name><value>3</value></property>
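A caveat worth noting: Hadoop generally does not expand ~ in these values, so absolute paths are safer. A sketch of the same properties with absolute paths (the directories are illustrative, assuming everything lives under /home/hadoop_admin):

```xml
<configuration>
  <property><name>dfs.name.dir</name><value>/home/hadoop_admin/hadoop_run/namedata1,/home/hadoop_admin/hadoop_run/namedata2,/home/hadoop_admin/hadoop_run/namedata3</value></property>
  <property><name>dfs.data.dir</name><value>/home/hadoop_admin/hadoop-0.20.2/data</value></property>
  <property><name>dfs.replication</name><value>3</value></property>
</configuration>
```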
E. Mapred-site.xml
mapred: the Map-Reduce JobTracker's address
<property><name>mapred.job.tracker</name><value>h1:9001</value></property>
F. Masters
Add master node information, here is H1
G. Slaves
Add the Slave node information, here is H2, H3
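Both files simply list one hostname per line; with the node layout above they would contain the following (matching the hostnames configured earlier; strictly speaking, in 0.20 the masters file designates where the secondary name node runs):

```text
# conf/masters
h1

# conf/slaves
h2
h3
```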
4. Configure H2, H3 node server
It's been a long journey. I installed H2 and H3 in VMware and repeated the entire environment above by hand (for the sake of consolidation) rather than copying the image in clone mode. This exposed a lot of problems, for example the JDK and Hadoop install directories not matching (sheer spelling mistakes and the like), which made changing the files later exhausting. So beginners like me should keep everything uniform, including user names such as hadoop_admin.
4.1 Installing and configuring the H2,H3 node server
Repeat: create the hadoop_admin user, install SSH, and generate the key; stop at that point.
4.2 Import the H2 and H3 public keys into H1's authorized_keys to enable password-free SSH file transfer
Method: first scp (secure copy) the public key files from H2 and H3 into H1's home directory.
On H2: : sudo scp ~/.ssh/id_rsa.pub hadoop_admin@h1:~/h2pub
On H3: : sudo scp ~/.ssh/id_rsa.pub hadoop_admin@h1:~/h3pub
On H1:
: sudo cat ~/.ssh/id_rsa.pub ~/h2pub ~/h3pub > ~/.ssh/authorized_keys  # concatenate my own key with H2's and H3's public keys
: sudo scp ~/.ssh/authorized_keys hadoop_admin@h2:~/.ssh/authorized_keys  # OK, then copy it back out (Q: do the slaves need this?)
: sudo scp ~/.ssh/authorized_keys hadoop_admin@h3:~/.ssh/authorized_keys
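The concatenation step on H1 can be sketched with dummy key files (the key strings and directory are placeholders, not real keys) to show what authorized_keys ends up holding:

```shell
# Simulate aggregating three public keys into authorized_keys.
WORK="$(mktemp -d)"
echo "ssh-rsa AAAA...h1 hadoop_admin@h1" > "$WORK/id_rsa.pub"
echo "ssh-rsa AAAA...h2 hadoop_admin@h2" > "$WORK/h2pub"
echo "ssh-rsa AAAA...h3 hadoop_admin@h3" > "$WORK/h3pub"

# Same cat as on H1: my own key plus the two slave keys
cat "$WORK/id_rsa.pub" "$WORK/h2pub" "$WORK/h3pub" > "$WORK/authorized_keys"
chmod 600 "$WORK/authorized_keys"   # sshd ignores group/world-writable key files
wc -l < "$WORK/authorized_keys"     # one line per key
```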
4.3 Installing jdk,hadoop directly from H1 to H2,h3
A. Installing the JDK
: sudo scp -r $JAVA_HOME hadoop_admin@h2:/usr/lib/jvm
: sudo scp -r $JAVA_HOME hadoop_admin@h3:/usr/lib/jvm
If /etc/profile is identical, just send it over as well:
: sudo scp /etc/profile h2:/etc/profile
: sudo scp /etc/profile h3:/etc/profile
B. Installing Hadoop
: sudo scp -r $HADOOP_HOME hadoop_admin@h2:~/hadoop-0.20.2
: sudo scp -r $HADOOP_HOME hadoop_admin@h3:~/hadoop-0.20.2
C. If /etc/hosts is the same, send it over too:
: sudo scp /etc/hosts h2:/etc/hosts
: sudo scp /etc/hosts h3:/etc/hosts
Check the steps above: the machines should be able to ping each other, and ssh [hostname] should connect without a password. If so, the three servers are configured; Hadoop itself needs no additional per-node configuration.
5. Format the name node
: hadoop namenode -format  # the standard command in 0.20.x; run it on H1 as hadoop_admin
Arr... what exactly is this thing doing? Very interested; a quick search shows that people have actually dug into the source code. TBD; I'll study it in depth later.
6. Start Hadoop
Theoretically, if JAVA_HOME, users and permissions, hosts, IPs, and passwordless SSH are all configured correctly, you can sit back and relax at this point (in practice there were plenty of problems... all sorts of careless configuration).
: sudo $HADOOP_HOME/bin/start-all.sh
If this step produces no permission denied, file or directory not exists, or other errors, and you see "started successfully" flash by, the cluster is up and reachable.
7. Whether the test is successful
A. Processes are normal
: sudo $JAVA_HOME/bin/jps
Name node: 4 processes (typically NameNode, SecondaryNameNode, JobTracker, and Jps itself)
Data nodes: 3 processes (typically DataNode, TaskTracker, and Jps)
B. http://localhost:50030 (JobTracker web UI)
C. http://localhost:50070 (NameNode web UI)
Oh yeah! At least on the surface everything looks good. If you've made it this far, you have successfully installed a fully distributed Hadoop cluster! The follow-up work will be more complicated; looking forward to it!