I. Summary
After several days of debugging, I finally managed to set up a Hadoop test environment on Linux CentOS 5.0. The test builds a pseudo-distributed deployment on a single server. Hadoop's pseudo-distributed mode simulates a distributed Hadoop cluster on one machine; this single-machine "distributed" mode is not a true distributed deployment, but a distributed mode simulated with threads. Hadoop itself does not distinguish between pseudo-distributed and fully distributed mode, and the two configurations are almost identical; the only difference is that pseudo-distributed mode is configured on a single machine, so the DataNode and the NameNode are the same host. Although the Hadoop installation steps are not complex, I still ran into many small problems along the way. Below is a detailed record of the setup process and the problems I encountered.
II. Environment construction
The software required for the test environment: JDK 1.6.0_20 and hadoop-0.20.2.tar.gz. The test server runs Linux CentOS 5.0.
1. SSH password-less authentication configuration
Hadoop requires the SSH protocol: the NameNode uses SSH to start the NameNode and DataNode processes. In pseudo-distributed mode the DataNode and the NameNode are the same machine, so ssh localhost must be configured for password-free authentication. Log in as root and run the following command in the home directory: ssh-keygen -t rsa
[root@master ~]# ssh-keygen -t rsa
Generating public/private RSA key pair.
Enter file in which to save the key (/root/.ssh/id_rsa):    (press Enter to accept the default path)
Created directory '/root/.ssh'.    (the /root/.ssh directory is created)
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /root/.ssh/id_rsa.
Your public key has been saved in /root/.ssh/id_rsa.pub.
The key fingerprint is:
c6:7e:57:59:0a:2d:85:49:23:cc:c4:58:ff:db:5b:38 root@master
The private key id_rsa and the public key id_rsa.pub are generated in the /root/.ssh/ directory. Enter the /root/.ssh directory and configure the following on the NameNode:
[root@master .ssh]# cat id_rsa.pub > authorized_keys
After authorized_keys is in place, you can ssh to the local IP address (or ssh localhost) to verify that login no longer asks for a password.
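As a quick end-to-end check, the whole sequence can be verified in one pass. This is a minimal sketch using the paths above; if login still prompts for a password, the usual culprit is over-permissive file modes, since sshd refuses keys whose files are writable by others:

[root@master ~]# cd /root/.ssh
[root@master .ssh]# cat id_rsa.pub > authorized_keys
[root@master .ssh]# chmod 700 /root/.ssh
[root@master .ssh]# chmod 600 authorized_keys
[root@master .ssh]# ssh localhost date    (answer yes to the host-key prompt the first time; afterwards it should print the date without asking for a password)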
2. JDK installation and Java environment variable configuration
2.1 JDK installation
Log in as root and create the folder /usr/program. Download the JDK installation package jdk-6u20-linux-i586.bin and copy it to /usr/program. On the command line, enter that directory and run "./jdk-6u20-linux-i586.bin". When the command finishes, the folder jdk1.6.0_20 has been created in the directory and the installation is complete.
2.2 Java environment variable configuration
Log in as root and run "vi /etc/profile" on the command line, then add the following content to configure the environment variables (note that the /etc/profile file is very important; it will also be used later in the Hadoop configuration).
# Set java environment
export JAVA_HOME=/usr/program/jdk1.6.0_20
export JRE_HOME=/usr/program/jdk1.6.0_20/jre
export CLASSPATH=.:$JAVA_HOME/lib:$JAVA_HOME/jre/lib
export PATH=$JAVA_HOME/bin:$JAVA_HOME/jre/bin:$PATH
After adding the content above in the vi editor, save and exit, then run the following commands to make the configuration take effect:
chmod +x /etc/profile    (add execute permission)
source /etc/profile    (reload the profile)
Once the configuration is complete, enter java -version on the command line; if the following information is displayed, the Java environment has been installed successfully:
java version "1.6.0_20"
Java(TM) SE Runtime Environment (build 1.6.0_20-b02)
Java HotSpot(TM) Server VM (build 16.3-b01, mixed mode)
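If java -version still reports an old JDK or cannot find java, the profile has usually just not been re-read in the current shell. A minimal sanity check, assuming the paths above:

[root@master ~]# source /etc/profile
[root@master ~]# echo $JAVA_HOME    (should print /usr/program/jdk1.6.0_20)
[root@master ~]# which java    (should point into $JAVA_HOME/bin)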
3. Hadoop configuration
Download hadoop-0.20.2.tar.gz and extract it into the /usr/local/hadoop directory, so that the extracted tree is /usr/local/hadoop/hadoop-0.20.2. Run the following command to decompress the Hadoop archive:
tar zxvf hadoop-0.20.2.tar.gz
3.1 Enter /usr/local/hadoop/hadoop-0.20.2/conf and configure the Hadoop configuration files.
3.1.1 Configure the hadoop-env.sh file: add
# set java environment
export JAVA_HOME=/usr/program/jdk1.6.0_20
then save the edit and exit.
3.1.2 Configure core-site.xml
[root@master conf]# vi core-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://202.173.253.36:9000/</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/usr/local/hadoop/hadooptmp</value>
  </property>
</configuration>
3.1.3 Configure hdfs-site.xml
[root@master conf]# vi hdfs-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>dfs.name.dir</name>
    <value>/usr/local/hadoop/hdfs/name</value>
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value>/usr/local/hadoop/hdfs/data</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
3.1.4 Configure mapred-site.xml
[root@master conf]# vi mapred-site.xml
<? XML version = "1.0"?>
<? Xml-stylesheet type = "text/xsl" href = "configuration. xsl"?> <! -- Put site-specific property overrides in this file. --> <configuration> <property> <name> mapred. job. tracker </name> <value> 202.173.253.36: 9001 </value> </property> <name> mapred. local. dir </name> <value>/usr/local/hadoop/mapred/local </value> </property> <name> mapred. system. dir </name> <value>/tmp/hadoop/mapred/system </value> </property> </configuration> 2.1.5 configure the masters file and slaves file [r Oot @ master conf] # vi masters 202.173.253.36 [root @ master conf] # vi slaves 202.173.253.36 note: In pseudo distribution mode, the namenode serving as the master is the same server as the datanode serving as the slave, so the ip address in the configuration file is the same. 2.1.6 edit the host name [root @ master ~] # Vi/etc/hosts # Do not remove the following line, or varous programs
# that require network functionality will fail.
127.0.0.1 localhost
202.173.253.36 master
202.173.253.36 slave
Note: because this is pseudo-distributed mode, master and slave are the same machine.
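The configuration files above refer to several local directories (hadoop.tmp.dir, dfs.name.dir, dfs.data.dir, mapred.local.dir). Hadoop will normally create these itself on first use, but creating them up front avoids permission surprises; a minimal sketch using the same paths assumed above:

[root@master ~]# mkdir -p /usr/local/hadoop/hadooptmp
[root@master ~]# mkdir -p /usr/local/hadoop/hdfs/name /usr/local/hadoop/hdfs/data
[root@master ~]# mkdir -p /usr/local/hadoop/mapred/local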
3.2 Hadoop startup
3.2.1 Enter the /usr/local/hadoop/hadoop-0.20.2/bin directory and format the NameNode:
[root@master bin]# hadoop namenode -format
10/07/19 10:46:41 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   host = master/202.173.253.36
STARTUP_MSG:   args = [-format]
STARTUP_MSG:   version = 0.20.2
STARTUP_MSG:   build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.20 -r 911707; compiled by 'chrisdo' on Fri Feb 19 08:07:34 UTC 2010
************************************************************/
Re-format filesystem in /usr/local/hadoop/hdfs/name ? (Y or N) Y
10/07/19 10:46:43 INFO namenode.FSNamesystem: fsOwner=root,root,bin,daemon,sys,adm,disk,wheel
10/07/19 10:46:43 INFO namenode.FSNamesystem: supergroup=supergroup
10/07/19 10:46:43 INFO namenode.FSNamesystem: isPermissionEnabled=true
10/07/19 10:46:43 INFO common.Storage: Image file of size 94 saved in 0 seconds.
10/07/19 10:46:43 INFO common.Storage: Storage directory /usr/local/hadoop/hdfs/name has been successfully formatted.
10/07/19 10:46:43 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at master/202.173.253.36
************************************************************/
3.2.2 Start all Hadoop processes
In the /usr/local/hadoop/hadoop-0.20.2/bin directory, run the start-all.sh command, then run the jps command to check whether all Hadoop processes have started. Under normal circumstances the following processes should be listed:
10910 NameNode
11431 Jps
11176 SecondaryNameNode
11053 DataNode
11254 JobTracker
11378 TaskTracker
This is where I ran into the most problems: startup was often incomplete, for example the DataNode would not start, or the NameNode or TaskTracker started abnormally. The approaches that solved it for me were:
1. Disable the Linux firewall: run the service iptables stop command.
2. Format the NameNode again: run the hadoop namenode -format command in the /usr/local/hadoop/hadoop-0.20.2/bin directory.
3. Restart the server.
4. Look at the log file of the failing DataNode or NameNode (the logs are kept in the /usr/local/hadoop/hadoop-0.20.2/logs directory) and check the reported error carefully (I have forgotten the exact error message in the last log I hit). The fix was to go into the /usr/local/hadoop/hdfs/name and /usr/local/hadoop/hdfs/data directories and delete all files in them, as sketched after the cluster report below.
5. Run the start-all.sh command in the bin directory again to start all processes.
These methods should resolve most cases of incomplete process startup.
3.2.3 View the cluster status
Run the following command in the bin directory: hadoop dfsadmin -report
[root@master bin]# hadoop dfsadmin -report
Configured Capacity: 304427253760 (283.52 GB)
Present Capacity: 282767941632 (263.35 GB)
DFS Remaining: 282767904768 (263.35 GB)
DFS Used: 36864 (36 KB)
DFS Used %: 0%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0
-------------------------------------------------
Datanodes available: 1 (1 total, 0 dead)

Name: 202.173.253.36:50010
Decommission Status: Normal
Configured Capacity: 304427253760 (283.52 GB)
DFS Used: 36864 (36 KB)
Non DFS Used: 21659312128 (20.17 GB)
DFS Remaining: 282767904768 (263.35 GB)
DFS Used %: 0%
DFS Remaining %: 92.89%
Last contact: Mon Jul 19 11:07:22 CST 2010
3.3 Viewing Hadoop status in a web browser
Open a browser and enter the address of the server where Hadoop is deployed: http://localhost:50070 (NameNode/HDFS status) and http://localhost:50030 (JobTracker status); replace localhost with the server's IP address when browsing from another machine.
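For step 4 of the troubleshooting list above (clearing the HDFS name and data directories before re-formatting), a minimal sketch using the directory layout from hdfs-site.xml; note that this wipes everything stored in HDFS, so it is only suitable for a throwaway test environment:

[root@master bin]# ./stop-all.sh    (stop all daemons first)
[root@master bin]# rm -rf /usr/local/hadoop/hdfs/name/* /usr/local/hadoop/hdfs/data/*
[root@master bin]# rm -rf /usr/local/hadoop/hadooptmp/*    (optional: also clear hadoop.tmp.dir)
[root@master bin]# hadoop namenode -format
[root@master bin]# ./start-all.sh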
4. Using Hadoop: a WordCount test
WordCount is an example program that counts the number of occurrences of each word in the input text. It is packaged in hadoop-0.20.2-examples.jar under the Hadoop home directory. Perform the following steps in the /usr/local/hadoop/hadoop-0.20.2/bin/ directory:
hadoop fs -mkdir bxy    (creates a new HDFS directory; the name can be anything)
[root@master log]# hadoop fs -copyFromLocal secure.2 bxy    (copies an arbitrary local file into the bxy directory)
Then run the job from /usr/local/hadoop/hadoop-0.20.2:
[root@master hadoop-0.20.2]# hadoop jar hadoop-0.20.2-examples.jar wordcount bxy output
(This submits the job. Note that bxy and output belong to one run; the next time you run the wordcount program you must create new directories, for example bxy1 and output1, which cannot be the same as bxy and output.)
After the job finishes, refresh the web interface to see it under the running and completed jobs.
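To check the result from the command line as well, the job's output directory can be listed and printed. A minimal sketch (the exact part-* file name can differ between Hadoop versions and APIs):

[root@master hadoop-0.20.2]# hadoop fs -ls output    (lists _logs and the part-* result file(s))
[root@master hadoop-0.20.2]# hadoop fs -cat 'output/part-*'    (prints each word followed by its count)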