Hadoop can run in pseudo-distributed mode on a single node, where each Hadoop daemon runs as a separate Java process. This article uses an automated script to configure Hadoop in pseudo-distributed mode. The test environment is CentOS 6.3 in VMware with Hadoop 1.2.1; other versions have not been tested.
Pseudo-distributed configuration script
The script configures core-site.xml, hdfs-site.xml and mapred-site.xml, and sets up passphraseless ssh login. [1]
#!/bin/bash
# Usage: Hadoop pseudo-distributed configuration
# History:
#   20140426 complete basic functions

# Check that the user is root
if [ $(id -u) != "0" ]; then
    printf "Error: You must be root to run this script!\n"
    exit 1
fi

# Synchronize the clock
rm -rf /etc/localtime
ln -s /usr/share/zoneinfo/Asia/Shanghai /etc/localtime
yum install -y ntp
ntpdate -u pool.ntp.org &> /dev/null
echo -e "Time: `date`\n"

# A single NIC is assumed
IP=`ifconfig eth0 | grep "inet addr" | awk '{print $2}' | cut -d ":" -f2`

# Pseudo-distributed configuration
function PseudoDistributed() {
    cd /etc/hadoop/

    # Restore any previous backups
    mv core-site.xml.bak core-site.xml
    mv hdfs-site.xml.bak hdfs-site.xml
    mv mapred-site.xml.bak mapred-site.xml

    # Back up the current configuration
    mv core-site.xml core-site.xml.bak
    mv hdfs-site.xml hdfs-site.xml.bak
    mv mapred-site.xml mapred-site.xml.bak

    # Write core-site.xml
cat > core-site.xml <<eof
<configuration>
    <property>
        <name>fs.default.name</name>
        <value>hdfs://$IP:9000</value>
    </property>
</configuration>
eof

    # Write hdfs-site.xml
cat > hdfs-site.xml <<eof
<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
</configuration>
eof

    # Write mapred-site.xml
cat > mapred-site.xml <<eof
<configuration>
    <property>
        <name>mapred.job.tracker</name>
        <value>$IP:9001</value>
    </property>
</configuration>
eof
}

# Configure passphraseless ssh login
function PassphraselessSSH() {
    # Generate a key pair only if one does not already exist
    [ ! -f ~/.ssh/id_dsa ] && ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
    # Append the public key only if it is not already authorized
    cat ~/.ssh/authorized_keys | grep "`cat ~/.ssh/id_dsa.pub`" &> /dev/null && r=0 || r=1
    [ $r -eq 1 ] && cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
    chmod 644 ~/.ssh/authorized_keys
}

# Format HDFS and start the daemons
function Execute() {
    # Format a new distributed filesystem
    hadoop namenode -format
    # Start the Hadoop daemons
    start-all.sh
    echo -e "\n========================================================================"
    echo "hadoop log dir : $HADOOP_LOG_DIR"
    echo "NameNode - http://$IP:50070/"
    echo "JobTracker - http://$IP:50030/"
    echo -e "========================================================================\n"
}

PseudoDistributed 2>&1 | tee -a pseudo.log
PassphraselessSSH 2>&1 | tee -a pseudo.log
Execute 2>&1 | tee -a pseudo.log
Script test results
[root@hadoop hadoop]# ./pseudo.sh
14/04/26 23:52:30 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   host = hadoop/216.34.94.184
STARTUP_MSG:   args = [-format]
STARTUP_MSG:   version = 1.2.1
STARTUP_MSG:   build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.2 -r 1503152; compiled by 'mattf' on Mon Jul 22 15:27:42 PDT 2013
STARTUP_MSG:   java = 1.7.0_51
************************************************************/
Re-format filesystem in /tmp/hadoop-root/dfs/name ? (Y or N) y
Format aborted in /tmp/hadoop-root/dfs/name
14/04/26 23:52:40 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at hadoop/216.34.94.184
************************************************************/
starting namenode, logging to /var/log/hadoop/root/hadoop-root-namenode-hadoop.out
localhost: starting datanode, logging to /var/log/hadoop/root/hadoop-root-datanode-hadoop.out
localhost: starting secondarynamenode, logging to /var/log/hadoop/root/hadoop-root-secondarynamenode-hadoop.out
starting jobtracker, logging to /var/log/hadoop/root/hadoop-root-jobtracker-hadoop.out
localhost: starting tasktracker, logging to /var/log/hadoop/root/hadoop-root-tasktracker-hadoop.out
========================================================================
hadoop log dir : /var/log/hadoop/root
NameNode - http://192.168.60.128:50070/
JobTracker - http://192.168.60.128:50030/
========================================================================
Access the NameNode and JobTracker web interfaces from a browser on the host.
NameNode web interface
JobTracker web interface
Run the test program
Copy the input file to the Distributed File System:
$ hadoop fs -put input input
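The article does not show how the local input directory was created. The file names below are assumptions for illustration; their contents are chosen to match the word counts shown later in the article (hadoop 1, hello 2, world 1):

```shell
# Hypothetical input files for the wordcount test (names assumed);
# contents chosen to match the counts reported later in the article.
mkdir -p input
echo "hello world"  > input/file1.txt
echo "hello hadoop" > input/file2.txt
```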
View HDFS through the web interface
Browsing the HDFS filesystem through the NameNode web interface
Run the sample program
[root@hadoop hadoop]# hadoop jar /usr/share/hadoop/hadoop-examples-1.2.1.jar wordcount input output
View execution status through JobTracker Network Interface
Wordcount execution status
Execution result
[root@hadoop hadoop]# hadoop jar /usr/share/hadoop/hadoop-examples-1.2.1.jar wordcount input out2
14/04/27 03:34:56 INFO input.FileInputFormat: Total input paths to process : 2
14/04/27 03:34:56 INFO util.NativeCodeLoader: Loaded the native-hadoop library
14/04/27 03:34:56 WARN snappy.LoadSnappy: Snappy native library not loaded
14/04/27 03:34:57 INFO mapred.JobClient: Running job: job_201404270333_0001
14/04/27 03:34:58 INFO mapred.JobClient:  map 0% reduce 0%
14/04/27 03:35:49 INFO mapred.JobClient:  map 100% reduce 0%
14/04/27 03:36:16 INFO mapred.JobClient:  map 100% reduce 100%
14/04/27 03:36:19 INFO mapred.JobClient: Job complete: job_201404270333_0001
14/04/27 03:36:19 INFO mapred.JobClient: Counters: 29
14/04/27 03:36:19 INFO mapred.JobClient:   Job Counters
14/04/27 03:36:19 INFO mapred.JobClient:     Launched reduce tasks=1
14/04/27 03:36:19 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=72895
14/04/27 03:36:19 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
14/04/27 03:36:19 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
14/04/27 03:36:19 INFO mapred.JobClient:     Launched map tasks=2
14/04/27 03:36:19 INFO mapred.JobClient:     Data-local map tasks=2
14/04/27 03:36:19 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=24880
14/04/27 03:36:19 INFO mapred.JobClient:   File Output Format Counters
14/04/27 03:36:19 INFO mapred.JobClient:     Bytes Written=25
14/04/27 03:36:19 INFO mapred.JobClient:   FileSystemCounters
14/04/27 03:36:19 INFO mapred.JobClient:     FILE_BYTES_READ=55
14/04/27 03:36:19 INFO mapred.JobClient:     HDFS_BYTES_READ=260
14/04/27 03:36:19 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=164041
14/04/27 03:36:19 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=25
14/04/27 03:36:19 INFO mapred.JobClient:   File Input Format Counters
14/04/27 03:36:19 INFO mapred.JobClient:     Bytes Read=25
14/04/27 03:36:19 INFO mapred.JobClient:   Map-Reduce Framework
14/04/27 03:36:19 INFO mapred.JobClient:     Map output materialized bytes=61
14/04/27 03:36:19 INFO mapred.JobClient:     Map input records=2
14/04/27 03:36:19 INFO mapred.JobClient:     Reduce shuffle bytes=61
14/04/27 03:36:19 INFO mapred.JobClient:     Spilled Records=8
14/04/27 03:36:19 INFO mapred.JobClient:     Map output bytes=41
14/04/27 03:36:19 INFO mapred.JobClient:     Total committed heap usage (bytes)=414441472
14/04/27 03:36:19 INFO mapred.JobClient:     CPU time spent (ms)=2910
14/04/27 03:36:19 INFO mapred.JobClient:     Combine input records=4
14/04/27 03:36:19 INFO mapred.JobClient:     SPLIT_RAW_BYTES=235
14/04/27 03:36:19 INFO mapred.JobClient:     Reduce input records=4
14/04/27 03:36:19 INFO mapred.JobClient:     Reduce input groups=3
14/04/27 03:36:19 INFO mapred.JobClient:     Combine output records=4
14/04/27 03:36:19 INFO mapred.JobClient:     Physical memory (bytes) snapshot=353439744
14/04/27 03:36:19 INFO mapred.JobClient:     Reduce output records=3
14/04/27 03:36:19 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=2195972096
14/04/27 03:36:19 INFO mapred.JobClient:     Map output records=4
View results
[root@hadoop hadoop]# hadoop fs -cat out2/*
hadoop 1
hello 2
world 1
You can also copy the files on the Distributed File System to a local device to view them.
[root@hadoop hadoop]# hadoop fs -get out2 out4
[root@hadoop hadoop]# cat out4/*
cat: out4/_logs: Is a directory
hadoop 1
hello 2
world 1
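The word counts can be cross-checked locally with plain shell tools. The pipeline below is a sketch that assumes the input consists of the two lines "hello world" and "hello hadoop" (the article does not show the actual input contents):

```shell
# Local sanity check of the wordcount result with plain shell tools.
# The input lines are assumptions chosen to match the reported counts.
printf 'hello world\nhello hadoop\n' \
    | tr ' ' '\n' \
    | sort \
    | uniq -c \
    | awk '{print $2, $1}'
```

This prints one "word count" pair per line, mirroring the format of the `hadoop fs -cat` output above.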
After all the operations are completed, stop the daemon process:
[root@hadoop hadoop]# stop-all.sh
stopping jobtracker
localhost: stopping tasktracker
stopping namenode
localhost: stopping datanode
localhost: stopping secondarynamenode
The host cannot access the web interfaces
Because iptables is enabled, you need to open the corresponding ports. Alternatively, in a test environment you can simply disable iptables.
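The required rules can also be added from the command line instead of editing the configuration file directly. This is a sketch for the CentOS 6 test environment (run as root); the port numbers are the Hadoop 1.x web UI defaults:

```shell
# Open the Hadoop web UI ports in iptables (CentOS 6, test environment).
# To disable the firewall entirely instead: service iptables stop
iptables -I INPUT -m state --state NEW -m tcp -p tcp --dport 50070 -j ACCEPT
iptables -I INPUT -m state --state NEW -m tcp -p tcp --dport 50030 -j ACCEPT
iptables -I INPUT -m state --state NEW -m tcp -p tcp --dport 50075 -j ACCEPT
service iptables save
```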
# Firewall configuration written by system-config-firewall
# Manual customization of this file is not recommended.
*filter
:INPUT ACCEPT [0:0]
:FORWARD ACCEPT [0:0]
:OUTPUT ACCEPT [0:0]
-A INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT
-A INPUT -p icmp -j ACCEPT
-A INPUT -i lo -j ACCEPT
-A INPUT -m state --state NEW -m tcp -p tcp --dport 22 -j ACCEPT
-A INPUT -m state --state NEW -m tcp -p tcp --dport 50070 -j ACCEPT
-A INPUT -m state --state NEW -m tcp -p tcp --dport 50030 -j ACCEPT
-A INPUT -m state --state NEW -m tcp -p tcp --dport 50075 -j ACCEPT
-A INPUT -j REJECT --reject-with icmp-host-prohibited
-A FORWARD -j REJECT --reject-with icmp-host-prohibited
COMMIT
The Browse the filesystem link redirects to the wrong address
On the NameNode web interface, clicking Browse the filesystem redirects to localhost:50075. [2] [3]
Modify core-site.xml, changing hdfs://localhost:9000 to the virtual machine's IP address. (The script above already configures the IP automatically.)
A hostname can also be used instead of the IP, but it must be resolvable on the machine you browse from; on a public network this can be handled by a DNS server.
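The edit can be scripted with sed. The IP address and the demo file below are assumptions for illustration; on a real system the target file would be /etc/hadoop/core-site.xml:

```shell
# Sketch: rewrite fs.default.name from localhost to the VM IP with sed.
# A demo file stands in for /etc/hadoop/core-site.xml; the IP is assumed.
echo '<value>hdfs://localhost:9000</value>' > /tmp/core-site-demo.xml
sed -i 's#hdfs://localhost:9000#hdfs://192.168.60.128:9000#' /tmp/core-site-demo.xml
cat /tmp/core-site-demo.xml
```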
The reduce task gets stuck
Add the IP address corresponding to the hostname in /etc/hosts. [4] [5] (The Hadoop installation script has been updated to configure this automatically.)
127.0.0.1   localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost.localdomain localhost6 localhost6.localdomain6
127.0.0.1   hadoop    # Add this line
References
[1]. Hadoop official documentation. http://hadoop.apache.org/docs/r1.2.1/single_node_setup.html
[2]. Stackoverflow. http://stackoverflow.com/questions/15254492/wrong-redirect-from-hadoop-hdfs-namenode-to-localhost50075
[3]. Iteye. http://yymmiinngg.iteye.com/blog/706909
[4]. Stackoverflow. http://stackoverflow.com/questions/10165549/hadoop-wordcount-example-stuck-at-map-100-reduce-0
[5]. Li Jun's blog. http://www.colorlight.cn/archives/32
This article is released under a CC license. When reposting, please cite the source with a link.
Link: http://www.annhe.net/article-2682.html