Running a Hadoop-0.19.0 Instance on a Single Linux Machine


The Hadoop-0.19.0 release can be downloaded from Apache at http://archive.apache.org/dist/hadoop/core/hadoop-0.19.0/. The Linux machine I used runs RHEL 5, with Java 1.6.0_16 installed and JAVA_HOME=/usr/java/jdk1.6.0_16.

 

Walkthrough

 

1. Passwordless ssh login to localhost

 

Make sure the ssh service is running on the Linux system and that you can log in to the local machine over ssh without a password. If you cannot, follow these steps:

(1) Open a terminal and run:

$ ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
$ cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys

(2) ssh to localhost:

$ ssh localhost

The first time you log in, ssh will warn that the authenticity of host 127.0.0.1 cannot be established and ask whether to continue connecting; just type yes. A successful passwordless login looks like this:

[root@localhost hadoop-0.19.0]# ssh localhost
Last login: Sun Aug  1 18:35:37 2010 from 192.168.0.104
[root@localhost ~]#
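If ssh still prompts for a password after this, the permissions on the key files are a common cause (with the default StrictModes setting, sshd ignores keys whose files or directories are too permissive). A minimal fix, assuming the default ~/.ssh paths, is:

$ chmod 700 ~/.ssh
$ chmod 600 ~/.ssh/authorized_keys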

 

2. Hadoop-0.19.0 configuration

 

Download hadoop-0.19.0.tar.gz (about 40.3 MB) and extract it to a directory on the Linux system; in my case /root/hadoop-0.19.0.

The configuration steps, in order:

(1) Edit hadoop-env.sh

Uncomment the JAVA_HOME line (remove the leading "#") and point it at your Java installation; after the change the line reads:

  export JAVA_HOME=/usr/java/jdk1.6.0_16

(2) Edit hadoop-site.xml

Add the following three properties between <configuration> and </configuration>: fs.default.name points clients at the NameNode, mapred.job.tracker points at the JobTracker, and dfs.replication is set to 1 because there is only one DataNode. The resulting file is:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>

3. Running the wordcount example

The wordcount example ships with the Hadoop distribution; running it is a good way to get a feel for how Hadoop executes a MapReduce job. It is easy to reproduce by following the official "Hadoop Quick Start" guide; my run is described briefly below.

Change into the Hadoop directory, /root/hadoop-0.19.0 in my case.

(1) Format HDFS

Run the HDFS format command:

[root@localhost hadoop-0.19.0]# bin/hadoop namenode -format

The output of the format command looks like this:

10/08/01 19:04:02 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   host = localhost/127.0.0.1
STARTUP_MSG:   args = [-format]
STARTUP_MSG:   version = 0.19.0
STARTUP_MSG:   build = https://svn.apache.org/repos/asf/hadoop/core/branches/branch-0.19 -r 713890; compiled by 'ndaley' on Fri Nov 14 03:12:29 UTC 2008
************************************************************/
Re-format filesystem in /tmp/hadoop-root/dfs/name ? (Y or N) y
Format aborted in /tmp/hadoop-root/dfs/name
10/08/01 19:04:05 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at localhost/127.0.0.1
************************************************************/

(2) Start the Hadoop daemons

Run:

[root@localhost hadoop-0.19.0]# bin/start-all.sh

The startup output looks like this:

starting namenode, logging to /root/hadoop-0.19.0/bin/../logs/hadoop-root-namenode-localhost.out
localhost: starting datanode, logging to /root/hadoop-0.19.0/bin/../logs/hadoop-root-datanode-localhost.out
localhost: starting secondarynamenode, logging to /root/hadoop-0.19.0/bin/../logs/hadoop-root-secondarynamenode-localhost.out
starting jobtracker, logging to /root/hadoop-0.19.0/bin/../logs/hadoop-root-jobtracker-localhost.out
localhost: starting tasktracker, logging to /root/hadoop-0.19.0/bin/../logs/hadoop-root-tasktracker-localhost.out
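As a quick sanity check (my own addition, not part of the original steps), the JDK's jps tool can confirm that the daemons are actually up, assuming the JDK bin directory is on your PATH:

[root@localhost hadoop-0.19.0]# jps

It should list NameNode, DataNode, SecondaryNameNode, JobTracker and TaskTracker (plus Jps itself); if one is missing, look at the corresponding file under logs/.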

(3) Prepare the data for the wordcount job

First, create a local data directory named input and copy a few files into it:

[root@localhost hadoop-0.19.0]# mkdir input
[root@localhost hadoop-0.19.0]# cp CHANGES.txt LICENSE.txt NOTICE.txt README.txt input/

Then upload the local input directory to HDFS:

[root@localhost hadoop-0.19.0]# bin/hadoop fs -put input/ input
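To verify that the upload landed where the job will look for it (an extra check, not in the original write-up), list the directory in HDFS:

[root@localhost hadoop-0.19.0]# bin/hadoop fs -ls input

Because relative fs paths are resolved against the user's HDFS home directory, this should show the four files under hdfs://localhost:9000/user/root/input.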

(4) Run the wordcount job

Run:

[root@localhost hadoop-0.19.0]# bin/hadoop jar hadoop-0.19.0-examples.jar wordcount input output

The input data directory is input and the output directory is output.

The job output looks like this:

10/08/01 19:06:15 INFO mapred.FileInputFormat: Total input paths to process : 4
10/08/01 19:06:15 INFO mapred.JobClient: Running job: job_201008011904_0002
10/08/01 19:06:16 INFO mapred.JobClient:  map 0% reduce 0%
10/08/01 19:06:22 INFO mapred.JobClient:  map 20% reduce 0%
10/08/01 19:06:24 INFO mapred.JobClient:  map 40% reduce 0%
10/08/01 19:06:25 INFO mapred.JobClient:  map 60% reduce 0%
10/08/01 19:06:27 INFO mapred.JobClient:  map 80% reduce 0%
10/08/01 19:06:28 INFO mapred.JobClient:  map 100% reduce 0%
10/08/01 19:06:38 INFO mapred.JobClient:  map 100% reduce 26%
10/08/01 19:06:40 INFO mapred.JobClient:  map 100% reduce 100%
10/08/01 19:06:41 INFO mapred.JobClient: Job complete: job_201008011904_0002
10/08/01 19:06:41 INFO mapred.JobClient: Counters: 16
10/08/01 19:06:41 INFO mapred.JobClient:   File Systems
10/08/01 19:06:41 INFO mapred.JobClient:     HDFS bytes read=301489
10/08/01 19:06:41 INFO mapred.JobClient:     HDFS bytes written=113098
10/08/01 19:06:41 INFO mapred.JobClient:     Local bytes read=174004
10/08/01 19:06:41 INFO mapred.JobClient:     Local bytes written=348172
10/08/01 19:06:41 INFO mapred.JobClient:   Job Counters
10/08/01 19:06:41 INFO mapred.JobClient:     Launched reduce tasks=1
10/08/01 19:06:41 INFO mapred.JobClient:     Launched map tasks=5
10/08/01 19:06:41 INFO mapred.JobClient:     Data-local map tasks=5
10/08/01 19:06:41 INFO mapred.JobClient:   Map-Reduce Framework
10/08/01 19:06:41 INFO mapred.JobClient:     Reduce input groups=8997
10/08/01 19:06:41 INFO mapred.JobClient:     Combine output records=10860
10/08/01 19:06:41 INFO mapred.JobClient:     Map input records=7363
10/08/01 19:06:41 INFO mapred.JobClient:     Reduce output records=8997
10/08/01 19:06:41 INFO mapred.JobClient:     Map output bytes=434077
10/08/01 19:06:41 INFO mapred.JobClient:     Map input bytes=299871
10/08/01 19:06:41 INFO mapred.JobClient:     Combine input records=39193
10/08/01 19:06:41 INFO mapred.JobClient:     Map output records=39193
10/08/01 19:06:41 INFO mapred.JobClient:     Reduce input records=10860
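You can also watch the job from a browser while it runs; assuming the default web UI ports for this Hadoop release, the consoles should be reachable at:

http://localhost:50030/   (JobTracker: job and task progress)
http://localhost:50070/   (NameNode: HDFS status and file browser)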

(5) View the job results

You can view them with:

bin/hadoop fs -cat output/*

A partial excerpt of the result:

vijayarenu      20
violations.     1
virtual         3
vis-a-vis       1
visible         1
visit           1
volume          1
volume,         1
volumes         2
volumes.        1
w.r.t           2
wait            9
waiting         6
waiting.        1
waits           3
want            1
warning         7
warning,        1
warnings        12
warnings.       3
warranties      1
warranty        1
warranty,       1
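If you would rather inspect the complete result with local tools, copy it out of HDFS first; a small sketch (the local target directory output-local is just my own naming):

[root@localhost hadoop-0.19.0]# bin/hadoop fs -get output output-local
[root@localhost hadoop-0.19.0]# less output-local/part-00000

With a single reduce task, the whole result normally ends up in one part-00000 file.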

(6) Stop the Hadoop daemons

Run:

[root@localhost hadoop-0.19.0]# bin/stop-all.sh

The output looks like this:
stopping jobtracker
localhost: stopping tasktracker
stopping namenode
localhost: stopping datanode
localhost: stopping secondarynamenode 

This stops the five processes listed above: jobtracker, tasktracker, namenode, datanode and secondarynamenode.

 

Exception analysis

 

While following the steps above you may run into exceptions; the common ones are analyzed below.

 

1. "Call to localhost/127.0.0.1:9000 failed on local exception"

(1) Symptom

It may appear when you run:

[root@localhost hadoop-0.19.0]# bin/hadoop jar hadoop-0.19.0-examples.jar wordcount input output

The error output looks like this:

10/08/01 19:50:55 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:9000. Already tried 0 time(s).
10/08/01 19:50:56 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:9000. Already tried 1 time(s).
10/08/01 19:50:57 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:9000. Already tried 2 time(s).
10/08/01 19:50:58 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:9000. Already tried 3 time(s).
10/08/01 19:50:59 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:9000. Already tried 4 time(s).
10/08/01 19:51:00 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:9000. Already tried 5 time(s).
10/08/01 19:51:01 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:9000. Already tried 6 time(s).
10/08/01 19:51:02 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:9000. Already tried 7 time(s).
10/08/01 19:51:03 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:9000. Already tried 8 time(s).
10/08/01 19:51:04 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:9000. Already tried 9 time(s).
java.lang.RuntimeException: java.io.IOException: Call to localhost/127.0.0.1:9000 failed on local exception: Connection refused
        at org.apache.hadoop.mapred.JobConf.getWorkingDirectory(JobConf.java:323)
        at org.apache.hadoop.mapred.FileInputFormat.setInputPaths(FileInputFormat.java:295)
        at org.apache.hadoop.mapred.FileInputFormat.setInputPaths(FileInputFormat.java:268)
        at org.apache.hadoop.examples.WordCount.run(WordCount.java:146)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
        at org.apache.hadoop.examples.WordCount.main(WordCount.java:155)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
        at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:141)
        at org.apache.hadoop.examples.ExampleDriver.main(ExampleDriver.java:61)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:165)
        at org.apache.hadoop.mapred.JobShell.run(JobShell.java:54)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
        at org.apache.hadoop.mapred.JobShell.main(JobShell.java:68)
Caused by: java.io.IOException: Call to localhost/127.0.0.1:9000 failed on local exception: Connection refused
        at org.apache.hadoop.ipc.Client.call(Client.java:699)
        at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:216)
        at $Proxy0.getProtocolVersion(Unknown Source)
        at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:319)
        at org.apache.hadoop.hdfs.DFSClient.createRPCNamenode(DFSClient.java:104)
        at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:177)
        at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:74)
        at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1367)
        at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:56)
        at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1379)
        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:215)
        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:120)
        at org.apache.hadoop.mapred.JobConf.getWorkingDirectory(JobConf.java:319)
        ... 21 more
Caused by: java.net.ConnectException: Connection refused
        at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
        at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:574)
        at sun.nio.ch.SocketAdaptor.connect(SocketAdaptor.java:100)
        at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:299)
        at org.apache.hadoop.ipc.Client$Connection.access$1700(Client.java:176)
        at org.apache.hadoop.ipc.Client.getConnection(Client.java:772)
        at org.apache.hadoop.ipc.Client.call(Client.java:685)
        ... 33 more

(2) Analysis

Looking at the error output, this is the key line:

Retrying connect to server: localhost/127.0.0.1:9000.

It tells us that all ten attempts to connect to the server failed, so the communication path to the server is down. We configured the namenode address in hadoop-site.xml as follows:

<property>
  <name>fs.default.name</name>
  <value>hdfs://localhost:9000</value>
</property>

So the client clearly cannot connect to the server; most likely the namenode process was never started at all, in which case there is no point trying to run a job.

I reproduced this exception as follows:

I formatted HDFS but did not run bin/start-all.sh, and submitted the wordcount job directly; the exception above appeared.

So run bin/start-all.sh first, and only then submit the wordcount job.
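A quick way to confirm that the NameNode is actually reachable before submitting a job (my own habit, not from the original text) is to ask HDFS for a status report:

[root@localhost hadoop-0.19.0]# bin/hadoop dfsadmin -report

If this also fails with Connection refused, the daemons are not running; run bin/start-all.sh and check the files under logs/ before retrying the wordcount job.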

 

2. "Input path does not exist"

 

(1) Symptom

Suppose you create an input directory under the Hadoop directory, cp some files into it, and then run:

[root@localhost hadoop-0.19.0]# bin/hadoop namenode -format

[root@localhost hadoop-0.19.0]# bin/start-all.sh

At this point you assume input already exists, so the wordcount job should run:

[root@localhost hadoop-0.19.0]# bin/hadoop jar hadoop-0.19.0-examples.jar wordcount input output

Instead it throws an exception:

org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: hdfs://localhost:9000/user/root/input
        at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:179)
        at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:190)
        at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:782)
        at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1127)
        at org.apache.hadoop.examples.WordCount.run(WordCount.java:149)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
        at org.apache.hadoop.examples.WordCount.main(WordCount.java:155)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
        at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:141)
        at org.apache.hadoop.examples.ExampleDriver.main(ExampleDriver.java:61)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:165)
        at org.apache.hadoop.mapred.JobShell.run(JobShell.java:54)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
        at org.apache.hadoop.mapred.JobShell.main(JobShell.java:68)

I reproduced this exception as follows:

[root@localhost hadoop-0.19.0]# bin/hadoop fs -rmr input
Deleted hdfs://localhost:9000/user/root/input

[root@localhost hadoop-0.19.0]# bin/hadoop fs -rmr output
Deleted hdfs://localhost:9000/user/root/output 

(The deletes succeeded because I had already run the job successfully once before, so these paths existed in HDFS.)

(2) Analysis

Little explanation is needed: the local input directory was never uploaded to HDFS, which is why org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: hdfs://localhost:9000/user/root/input is thrown.

As I recall, with hadoop-0.16.4 the job could run as long as the local input directory existed, without an explicit upload; later versions no longer work that way.

The fix is simply to run the upload command:

[root@localhost hadoop-0.19.0]# bin/hadoop fs -put input/ input
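After the upload, you can confirm that the exact path the job complained about now exists:

[root@localhost hadoop-0.19.0]# bin/hadoop fs -ls /user/root/input

Then re-run the wordcount command and it should proceed past the input check.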
