Hadoop Pseudo-Distributed Operation

Hadoop can run in pseudo-distributed mode on a single node, where each Hadoop daemon runs as an independent Java process. This article uses an automated script to configure Hadoop in pseudo-distributed mode. The test environment is CentOS 6.3 in VMware with Hadoop 1.2.1; other versions have not been tested.


Pseudo-distributed configuration script

The script configures core-site.xml, hdfs-site.xml, and mapred-site.xml, and sets up passwordless SSH login. [1]

#!/bin/bash
# Usage: Hadoop pseudo-distributed configuration
# History:
#   20140426 complete basic functions

# Check if the user is root
if [ $(id -u) != "0" ]; then
    printf "Error: You must be root to run this script!\n"
    exit 1
fi

# Synchronize the clock
rm -rf /etc/localtime
ln -s /usr/share/zoneinfo/Asia/Shanghai /etc/localtime
#yum install -y ntp
ntpdate -u pool.ntp.org &>/dev/null
echo -e "Time: `date`\n"

# A single NIC is assumed by default
IP=`ifconfig eth0 | grep "inet addr" | awk '{print $2}' | cut -d":" -f2`

# Pseudo-distributed configuration
function PseudoDistributed() {
    cd /etc/hadoop/
    # Restore backups (if any), then back up the current configuration files
    mv core-site.xml.bak core-site.xml
    mv hdfs-site.xml.bak hdfs-site.xml
    mv mapred-site.xml.bak mapred-site.xml
    mv core-site.xml core-site.xml.bak
    mv hdfs-site.xml hdfs-site.xml.bak
    mv mapred-site.xml mapred-site.xml.bak
    # Use the following core-site.xml
    cat > core-site.xml <<eof
<configuration>
    <property>
        <name>fs.default.name</name>
        <value>hdfs://$IP:9000</value>
    </property>
</configuration>
eof
    # Use the following hdfs-site.xml
    cat > hdfs-site.xml <<eof
<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
</configuration>
eof
    # Use the following mapred-site.xml
    cat > mapred-site.xml <<eof
<configuration>
    <property>
        <name>mapred.job.tracker</name>
        <value>$IP:9001</value>
    </property>
</configuration>
eof
}

# Configure passwordless SSH login
function PassphraselessSSH() {
    # Generate a key pair only if one does not already exist
    [ ! -f ~/.ssh/id_dsa ] && ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
    # Append the public key to authorized_keys if it is not already there
    cat ~/.ssh/authorized_keys | grep "`cat ~/.ssh/id_dsa.pub`" &>/dev/null && r=0 || r=1
    [ $r -eq 1 ] && cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
    chmod 644 ~/.ssh/authorized_keys
}

# Format the file system and start the daemons
function Execute() {
    # Format a new distributed file system
    hadoop namenode -format
    # Start the Hadoop daemons
    start-all.sh
    echo -e "\n========================================================================"
    echo "hadoop log dir : $HADOOP_LOG_DIR"
    echo "NameNode - http://$IP:50070/"
    echo "JobTracker - http://$IP:50030/"
    echo -e "========================================================================\n"
}

PseudoDistributed 2>&1 | tee -a pseudo.log
PassphraselessSSH 2>&1 | tee -a pseudo.log
Execute 2>&1 | tee -a pseudo.log
Script test results
[root@hadoop hadoop]# ./pseudo.sh
14/04/26 23:52:30 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   host = hadoop/216.34.94.184
STARTUP_MSG:   args = [-format]
STARTUP_MSG:   version = 1.2.1
STARTUP_MSG:   build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.2 -r 1503152; compiled by 'mattf' on Mon Jul 22 15:27:42 PDT 2013
STARTUP_MSG:   java = 1.7.0_51
************************************************************/
Re-format filesystem in /tmp/hadoop-root/dfs/name ? (Y or N) y
Format aborted in /tmp/hadoop-root/dfs/name
14/04/26 23:52:40 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at hadoop/216.34.94.184
************************************************************/
starting namenode, logging to /var/log/hadoop/root/hadoop-root-namenode-hadoop.out
localhost: starting datanode, logging to /var/log/hadoop/root/hadoop-root-datanode-hadoop.out
localhost: starting secondarynamenode, logging to /var/log/hadoop/root/hadoop-root-secondarynamenode-hadoop.out
starting jobtracker, logging to /var/log/hadoop/root/hadoop-root-jobtracker-hadoop.out
localhost: starting tasktracker, logging to /var/log/hadoop/root/hadoop-root-tasktracker-hadoop.out
========================================================================
hadoop log dir : /var/log/hadoop/root
NameNode - http://192.168.60.128:50070/
JobTracker - http://192.168.60.128:50030/
========================================================================

Access the NameNode and JobTracker web interfaces through a browser on the host.

NameNode web interface in the browser

JobTracker web interface in the browser

Run the test program

Copy the input file to the Distributed File System:

$ hadoop fs -put input input
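
If a local input directory does not exist yet, it can be created with a couple of small test files before running the put command above. The file names and contents below are only an illustrative guess, chosen so that they match the word counts shown in the results later:

mkdir input
echo "hello world" > input/file1.txt
echo "hello hadoop" > input/file2.txt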

View HDFS through the web interface

The HDFS file system viewed through the NameNode web interface
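
The upload can also be checked from the command line instead of the web interface:

$ hadoop fs -ls input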

Run the sample program

[root@hadoop hadoop]# hadoop jar /usr/share/hadoop/hadoop-examples-1.2.1.jar wordcount input output

View the execution status through the JobTracker web interface

Wordcount execution status
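
Job progress can also be followed from the command line; on Hadoop 1.x the running jobs can be listed with:

$ hadoop job -list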

Execution result

[root@hadoop hadoop]# hadoop jar /usr/share/hadoop/hadoop-examples-1.2.1.jar wordcount input out2
14/04/27 03:34:56 INFO input.FileInputFormat: Total input paths to process : 2
14/04/27 03:34:56 INFO util.NativeCodeLoader: Loaded the native-hadoop library
14/04/27 03:34:56 WARN snappy.LoadSnappy: Snappy native library not loaded
14/04/27 03:34:57 INFO mapred.JobClient: Running job: job_201404270333_0001
14/04/27 03:34:58 INFO mapred.JobClient:  map 0% reduce 0%
14/04/27 03:35:49 INFO mapred.JobClient:  map 100% reduce 0%
14/04/27 03:36:16 INFO mapred.JobClient:  map 100% reduce 100%
14/04/27 03:36:19 INFO mapred.JobClient: Job complete: job_201404270333_0001
14/04/27 03:36:19 INFO mapred.JobClient: Counters: 29
14/04/27 03:36:19 INFO mapred.JobClient:   Job Counters
14/04/27 03:36:19 INFO mapred.JobClient:     Launched reduce tasks=1
14/04/27 03:36:19 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=72895
14/04/27 03:36:19 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
14/04/27 03:36:19 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
14/04/27 03:36:19 INFO mapred.JobClient:     Launched map tasks=2
14/04/27 03:36:19 INFO mapred.JobClient:     Data-local map tasks=2
14/04/27 03:36:19 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=24880
14/04/27 03:36:19 INFO mapred.JobClient:   File Output Format Counters
14/04/27 03:36:19 INFO mapred.JobClient:     Bytes Written=25
14/04/27 03:36:19 INFO mapred.JobClient:   FileSystemCounters
14/04/27 03:36:19 INFO mapred.JobClient:     FILE_BYTES_READ=55
14/04/27 03:36:19 INFO mapred.JobClient:     HDFS_BYTES_READ=260
14/04/27 03:36:19 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=164041
14/04/27 03:36:19 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=25
14/04/27 03:36:19 INFO mapred.JobClient:   File Input Format Counters
14/04/27 03:36:19 INFO mapred.JobClient:     Bytes Read=25
14/04/27 03:36:19 INFO mapred.JobClient:   Map-Reduce Framework
14/04/27 03:36:19 INFO mapred.JobClient:     Map output materialized bytes=61
14/04/27 03:36:19 INFO mapred.JobClient:     Map input records=2
14/04/27 03:36:19 INFO mapred.JobClient:     Reduce shuffle bytes=61
14/04/27 03:36:19 INFO mapred.JobClient:     Spilled Records=8
14/04/27 03:36:19 INFO mapred.JobClient:     Map output bytes=41
14/04/27 03:36:19 INFO mapred.JobClient:     Total committed heap usage (bytes)=414441472
14/04/27 03:36:19 INFO mapred.JobClient:     CPU time spent (ms)=2910
14/04/27 03:36:19 INFO mapred.JobClient:     Combine input records=4
14/04/27 03:36:19 INFO mapred.JobClient:     SPLIT_RAW_BYTES=235
14/04/27 03:36:19 INFO mapred.JobClient:     Reduce input records=4
14/04/27 03:36:19 INFO mapred.JobClient:     Reduce input groups=3
14/04/27 03:36:19 INFO mapred.JobClient:     Combine output records=4
14/04/27 03:36:19 INFO mapred.JobClient:     Physical memory (bytes) snapshot=353439744
14/04/27 03:36:19 INFO mapred.JobClient:     Reduce output records=3
14/04/27 03:36:19 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=2195972096
14/04/27 03:36:19 INFO mapred.JobClient:     Map output records=4

View results

[root@hadoop hadoop]# hadoop fs -cat out2/*
hadoop  1
hello   2
world   1

You can also copy the files from the distributed file system to the local machine to view them:

[root@hadoop hadoop]# hadoop fs -get out2 out4
[root@hadoop hadoop]# cat out4/*
cat: out4/_logs: Is a directory
hadoop  1
hello   2
world   1

After all the operations are completed, stop the daemon process:

[root@hadoop hadoop]# stop-all.sh
stopping jobtracker
localhost: stopping tasktracker
stopping namenode
localhost: stopping datanode
localhost: stopping secondarynamenode

The host cannot access the web interfaces

Because iptables is enabled, the corresponding ports need to be opened. In a test environment you can also simply disable iptables, as shown after the rules below.

# Firewall configuration written by system-config-firewall
# Manual customization of this file is not recommended.
*filter
:INPUT ACCEPT [0:0]
:FORWARD ACCEPT [0:0]
:OUTPUT ACCEPT [0:0]
-A INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT
-A INPUT -p icmp -j ACCEPT
-A INPUT -i lo -j ACCEPT
-A INPUT -m state --state NEW -m tcp -p tcp --dport 22 -j ACCEPT
-A INPUT -m state --state NEW -m tcp -p tcp --dport 50070 -j ACCEPT
-A INPUT -m state --state NEW -m tcp -p tcp --dport 50030 -j ACCEPT
-A INPUT -m state --state NEW -m tcp -p tcp --dport 50075 -j ACCEPT
-A INPUT -j REJECT --reject-with icmp-host-prohibited
-A FORWARD -j REJECT --reject-with icmp-host-prohibited
COMMIT
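
If you prefer to switch the firewall off entirely in a throwaway test environment, the usual CentOS 6 commands are:

service iptables stop
chkconfig iptables off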

"Browse the filesystem" redirects to the wrong address

On the NameNode web interface, clicking "Browse the filesystem" redirects to localhost:50075. [2] [3]

Modify core-site.xml and change hdfs://localhost:9000 to the virtual machine's IP address. (The script above already configures the IP automatically.)

A domain name can also be used instead of the IP, but it must be resolvable on the machine that accesses the interface; on a public network this means a name that resolves through DNS.
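
For reference, after the change the relevant property in /etc/hadoop/core-site.xml (as generated by the script above) looks like this; 192.168.60.128 is the VM IP used in this test, substitute your own:

<property>
    <name>fs.default.name</name>
    <value>hdfs://192.168.60.128:9000</value>
</property>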

The reduce task is stuck.

Add the IP address corresponding to the hostname to /etc/hosts. [4] [5] (The Hadoop installation script has been updated to configure this automatically.)

127.0.0.1   localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost.localdomain localhost6 localhost6.localdomain6
127.0.0.1   hadoop    # Add this line
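
A quick way to append the entry and confirm that the hostname now resolves (the hostname "hadoop" is the one used in this setup) is:

echo "127.0.0.1   hadoop" >> /etc/hosts
ping -c 1 hadoop
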
References

[1] Hadoop official documentation. http://hadoop.apache.org/docs/r1.2.1/single_node_setup.html
[2] Stack Overflow. http://stackoverflow.com/questions/15254492/wrong-redirect-from-hadoop-hdfs-namenode-to-localhost50075
[3] ITeye. http://yymmiinngg.iteye.com/blog/706909
[4] Stack Overflow. http://stackoverflow.com/questions/10165549/hadoop-wordcount-example-stuck-at-map-100-reduce-0
[5] Li Jun's blog. http://www.colorlight.cn/archives/32

This article is published under a CC license; when reprinting, please credit the source with a link.
Link: http://www.annhe.net/article-2682.html
