We plan to build a Hadoop environment on Friday (we use virtual machines to run two Ubuntu systems in a Windows environment).
Related reading:
Hadoop 0.21.0 source code flow analysis http://www.linuxidc.com/Linux/2011-11/47668.htm
Installing Hadoop with VMware, the full process http://www.linuxidc.com/Linux/2011-08/40153.htm
First, we will introduce the preparations:
1: Hadoop 0.20.2 (download from the official website)
2: VMware 7 (download from the official website)
3: jdk-6u20-linux (download from the official website) // mine was not from the official website at first, which ended in tragedy
4: Ubuntu 10.04 (ISO)
Users: jackydai (192.168.118.128) and jacky (192.168.118.129)
jackydai corresponds to the NameNode/JobTracker; jacky to the DataNode/TaskTracker
Now we officially start. First install VMware 7.01. This part is easy!
Then I installed Ubuntu on the virtual machine; the installation itself was simple and easy.
Attached partition suggestions:
The 15 GB hard disk is partitioned as follows:
1. /boot: 200 MB
2. swap: 2 GB
3. /home: 7 GB
4. / (root): all the remaining space
The system was installed in 20 minutes. Then it wanted to update itself: when Ubuntu's update manager opened, more than 80 files had to be downloaded, and I could only watch helplessly at 10-15 KB/s. After more than two hours it was finally done, and then it turned out we still had to install the second Ubuntu! I endured it again, this time at 190-210 KB/s (who knows why?). Both systems were completely installed at around 10 o'clock.
1: Start installing the JDK. I had never used Ubuntu before, not even to install a package, but fortunately there are many capable people in the class, so I asked for help right away. After almost half an hour the JDK still would not install, and my helper had to leave first. I immediately tried again myself; ten minutes later it was still unsuccessful. So I made a bold guess that the JDK file itself was corrupted, downloaded a fresh copy from the official website, and sure enough it installed. My heart had already gone cold over that broken JDK. The lesson: software packages should come from the official website. It is free software, after all.
Command:
(1): $ chmod 777 /home/jackydai/jdk-6u20-linux-i586-rpm.bin // add execute permission
(2): $ /home/jackydai/jdk-6u20-linux-i586-rpm.bin // install the JDK
(3): Add the following lines to /etc/profile: // set the environment variables
export JAVA_HOME="/usr/java/jdk1.6.0_20"
export PATH="$PATH:$JAVA_HOME/bin:$JAVA_HOME/jre/bin"
export CLASSPATH="$CLASSPATH:$JAVA_HOME/lib:$JAVA_HOME/jre/lib"
(4): $ source /etc/profile // make the environment variables take effect
(5): $ which java // test whether the JDK installed successfully; it should print /usr/java/jdk1.6.0_20/bin/java
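As an extra sanity check (my own addition, not one of the original steps, assuming the PATH change above took effect), you can ask Java for its version directly:
$ java -version // should report something like: java version "1.6.0_20"
$ javac -version // confirms the compiler is on the PATH too, e.g.: javac 1.6.0_20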
2: Install SSH to achieve SSH password-less connection.
Command:
(1): $ sudo apt-get install ssh // install ssh
(2): $ ssh-keygen -t dsa // generate the key pair, i.e. id_dsa and id_dsa.pub
(3): $ cp id_dsa.pub authorized_keys // add the key to the trusted list (run inside ~/.ssh)
(4): $ ssh localhost or $ ssh jackydai // if you get in without a password prompt it succeeded; note this is only a local test
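If ssh still prompts for a password, a common culprit (my addition, not part of the original steps) is file permissions: sshd ignores keys whose files are too open.
$ chmod 700 ~/.ssh // the .ssh directory must be private to its owner
$ chmod 600 ~/.ssh/authorized_keys // so must the trusted-keys file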
It was already the early hours of the morning by then; tragedy, off to sleep.
3: Install Hadoop:
The next day we started right away on password-less access between the two systems. By three o'clock no progress had been made. During this period I posted once on the Ubuntu forum, asked once on Baidu Zhidao, and sent the question to five Ubuntu QQ groups. No answer; tragedy, sleep. Afterwards it took an hour to complete the installation: Hadoop was installed and pseudo-distributed mode tested. Sleep. The day after that, the SSH password-less connection finally worked, but the file directory was wrong. Once the directory problem was solved, the datanode started and every problem was solved. HOHO~~
Command:
(1): Modify /etc/hosts:
192.168.118.128 jackydai
192.168.118.129 jacky
(2): $ tar -xzvf hadoop-0.20.2.tar.gz // unpack hadoop
(3): In hadoop-0.20.2/conf/hadoop-env.sh add: export JAVA_HOME=/usr/java/jdk1.6.0_20
(4): In conf/masters configure jackydai; in conf/slaves configure jacky
(5): Modify core-site.xml (all of the following snippets go inside the <configuration> element of their file):
<property>
  <name>fs.default.name</name>
  <value>hdfs://jackydai:9000</value>
</property>
Modify mapred-site.xml:
<property>
  <name>mapred.job.tracker</name>
  <value>jackydai:9001</value>
</property>
Modify hdfs-site.xml:
<property>
  <name>dfs.replication</name>
  <value>1</value>
</property>
<property>
  <name>hadoop.tmp.dir</name>
  <value>/home/jackydai/hadoop_tmp_dir/</value>
</property>
(6): $ bin/hadoop namenode -format // initialization; this is needed on the namenode only
(7): $ bin/start-all.sh // start everything from the namenode
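One quick way to confirm the daemons actually came up (my own habit, not from the original notes) is the JDK's jps tool on each machine:
$ jps // on jackydai, expect roughly: NameNode, SecondaryNameNode, JobTracker
$ jps // on jacky, expect roughly: DataNode, TaskTracker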
PS: 1: What goes into slaves is the account used for the ssh connection to jacky. Mine happens to be the jacky account, but your ssh account may well be different. I only discovered this this afternoon.
2: There are two ways to move the key between the two ssh hosts. One is over the network with scp from the source machine, e.g. $ scp /home/jacky/.ssh/id_dsa.pub ...; the other is to copy the file with a USB flash drive or a shared folder.
3: In a Hadoop environment only the namenode needs password-less access to the datanode, not the other way around, so you only need to copy the key from the server (namenode) to the client (datanode); see the sketch after these notes.
4: The settings of the three configuration files, the Hadoop installation, and the hosts file should be the same on both computers.
5: The directory where Hadoop lives should also be identical on both systems, e.g. /home/jackydai/hadoop-0.20.2; otherwise startup will fail to find the corresponding directory.
6: Attached are screenshots of a successful initialization and startup (these figures come from my reference material).
[Figure: initialization interface]
[Figure: startup screen]
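As a sketch of the one-way key copy mentioned in note 3 (the account name and IP are the ones from my hosts file; adjust to your own):
$ scp ~/.ssh/id_dsa.pub jacky@192.168.118.129:/home/jacky/ // run on jackydai: push the namenode's public key to the datanode
$ cat /home/jacky/id_dsa.pub >> ~/.ssh/authorized_keys // run on jacky: append it to the trusted list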
After setup, you can view the NameNode status at http://jackydai:50070 and the JobTracker status at http://jackydai:50030.
For the rest, you can run the examples that ship with Hadoop to see whether everything works, as sketched below.
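A minimal run of the bundled wordcount example might look like this (the jar name assumes the 0.20.2 release; it sits in the root of the Hadoop installation):
$ bin/hadoop fs -put conf input // copy the conf directory into HDFS as sample input
$ bin/hadoop jar hadoop-0.20.2-examples.jar wordcount input output // run the example job
$ bin/hadoop fs -cat output/* // print the resulting word counts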
Summary:
I finally finished writing this record; otherwise I would have forgotten it all within two days. All of that time bought experience.
Building the platform itself is nothing much. I wrote down every basic Ubuntu command I used; after typing them at least a hundred times, you can see how painful the building process was.