Setting up a Hadoop environment on Windows with Cygwin

1. Required software
1.1 Cygwin
Download: http://www.cygwin.com/setup.exe
1.2 JDK 1.6.x
1.3 Hadoop (this article uses hadoop-0.18.2)
Download: http://download.csdn.net/detail/kkdelta/4381822
Hadoop official site: http://hadoop.apache.org/
2. Installation
2.1 For Cygwin installation instructions, see: http://www.zihou.me/2010/02/19/1506/
A tip: you cannot copy and paste in Cygwin's own bash window, which is inconvenient, so you can front it with PuTTY instead. Download puttycyg from: http://download.csdn.net/detail/kkdelta/4381833
Unzip puttycyg.zip and put its three exe files into the bin directory under the Cygwin install directory (HOME_PATH below). Then edit Cygwin.bat under HOME_PATH (Notepad is fine): comment out the line bash --login -i by prefixing it with rem (that is, rem bash --login -i; :: bash --login -i also works), and add the line start putty -load cygwin. Cygwin will then start through PuTTY, and copy and paste will work.
Note that the default directory is still Cygwin's HOME_PATH. To reach a directory on another drive, go through the Cygwin mount root, here /cygdrive; for example, drive C: is /cygdrive/c.
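For example, to work on drive C: from a fresh Cygwin prompt:
$ cd /cygdrive/c
$ pwd
/cygdrive/c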
2.2 JDK installation is omitted here.
2.3 Installing hadoop-0.18.2
Unpack hadoop-0.18.2.tar.gz; this produces a directory such as hadoop-0.18.2. Suppose it is placed on drive E:, i.e.
E:\hadoop-0.18.2. Edit conf/hadoop-env.sh and point the export JAVA_HOME line at the JDK install directory on your machine, for example /cygdrive/d/tools/jdk1.6.0_03 (/cygdrive is the root under which Cygwin mounts the Windows drives).
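The edited line in conf/hadoop-env.sh would then read as follows (the path is just the example from above; substitute your own JDK location):
export JAVA_HOME=/cygdrive/d/tools/jdk1.6.0_03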
3. Installing and configuring ssh
3.1 Installation
From a Cygwin prompt, run the following in turn:
$ chmod +r /etc/group
$ chmod +r /etc/passwd
$ chmod +rwx /var
$ ssh-host-config
*** Info: Generating /etc/ssh_host_key
*** Info: Generating /etc/ssh_host_rsa_key
*** Info: Generating /etc/ssh_host_dsa_key
*** Info: Creating default /etc/ssh_config file
*** Info: Creating default /etc/sshd_config file
*** Info: Privilege separation is set to yes by default since OpenSSH 3.3.
*** Info: However, this requires a non-privileged account called 'sshd'.
*** Info: For more info on privilege separation read /usr/share/doc/openssh/README.privsep.
*** Query: Should privilege separation be used? (yes/no) yes
*** Info: Note that creating a new user requires that the current account have
*** Info: Administrator privileges.  Should this script attempt to create a
*** Query: new local account 'sshd'? (yes/no) yes
*** Info: Updating /etc/sshd_config file
*** Info: Added ssh to C:\WINDOWS\system32\drivers\etc\services
*** Info: Creating default /etc/inetd.d/sshd-inetd file
*** Info: Updated /etc/inetd.d/sshd-inetd
*** Warning: The following functions require administrator privileges!
*** Query: Do you want to install sshd as a service?
*** Query: (Say "no" if it is already installed as a service) (yes/no) yes
*** Query: Enter the value of CYGWIN for the daemon: [] cygwin
(Note: the value cygwin entered here can be anything.)
*** Info: The sshd service has been installed under the LocalSystem
*** Info: account (also known as SYSTEM). To start the service now, call
*** Info: `net start sshd' or `cygrunsrv -S sshd'.  Otherwise, it
*** Info: will start automatically after the next reboot.
*** Info: Host configuration finished. Have fun!
Answer yes to every yes/no question, and sshd is installed.
3.2 Configuration
3.2.1 Start the sshd service
net start sshd
The CYGWIN sshd service is starting.
The CYGWIN sshd service was started successfully.
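If you want to confirm the daemon really is up before connecting, one quick check is to query the service through Cygwin's service wrapper (cygrunsrv is the tool ssh-host-config registers the service with):
$ cygrunsrv -Q sshd
The reported Current State should be Running.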
3.2.2、$ ssh localhost
試著串連本機看看,注意,如果在沒有啟動sshd服務,這個串連肯定是失敗的!關於此錯誤也可參見:

http://www.zihou.me/2010/02/19/1521/

If everything is fine, you will see something like this:
The authenticity of host 'localhost (127.0.0.1)' can't be established.
RSA key fingerprint is 08:03:20:43:48:39:29:66:6e:c5:61:ba:77:b2:2f:55.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'localhost' (RSA) to the list of known hosts.
zihou@localhost's password:
You will be prompted for your Windows login password; after entering it correctly, an ASCII-art welcome banner appears, such as:
The Hippo says: Welcome to
3.2.3 Set up key-based (passwordless) ssh
$ ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
Generating public/private dsa key pair.
Your identification has been saved in /home/zihou/.ssh/id_dsa.
Your public key has been saved in /home/zihou/.ssh/id_dsa.pub.
The key fingerprint is:
6d:64:8e:a6:38:73:ab:c5:ce:71:cd:df:a1:ca:63:54 zihou@PC-04101515
The key's randomart image is:
+--[ DSA 1024]----+
|                 |
|                 |
|          o      |
|         *  E    |
|        S +.     |
|     o o +.      |
|    + * ..o   .  |
|     B + .o. o . |
|    ..+  .ooo .  |
+-----------------+
$ cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
Run $ ssh localhost once more; if it logs in without problems, sshd is fully configured.
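If ssh still asks for a password at this point, a common cause (not covered in the original article) is that sshd ignores key files which other users can read or write; tightening the permissions and retrying usually fixes it:
$ chmod 600 ~/.ssh/authorized_keys
$ ssh localhost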
4. Configuring hadoop
Edit conf/hadoop-site.xml and add the following:
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>localhost:9000</value>
  </property>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
5. Running hadoop
Change into the hadoop install directory (C:\hadoop-0.18.2 in this run; earlier the archive was unpacked to E:, so adjust the path to wherever yours lives). In Cygwin:
$ cd /cygdrive/c/hadoop-0.18.2
Format a new distributed filesystem:
$ bin/hadoop namenode -format
The output should report, among other things:
12/06/19 14:46:17 INFO dfs.Storage: Storage directory \tmp\hadoop-YaoKun\dfs\name has been successfully formatted.
6. Start the hadoop daemons
$ bin/start-all.sh
starting namenode, logging to /cygdrive/c/hadoop-0.18.2/bin/../logs/hadoop-YaoKun-namenode-NBK-DAL-625040.out
localhost: starting datanode, logging to /cygdrive/c/hadoop-0.18.2/bin/../logs/hadoop-YaoKun-datanode-NBK-DAL-625040.out
localhost: starting secondarynamenode, logging to /cygdrive/c/hadoop-0.18.2/bin/../logs/hadoop-YaoKun-secondarynamenode-NBK-DAL-625040.out
starting jobtracker, logging to /cygdrive/c/hadoop-0.18.2/bin/../logs/hadoop-YaoKun-jobtracker-NBK-DAL-625040.out
localhost: starting tasktracker, logging to /cygdrive/c/hadoop-0.18.2/bin/../logs/hadoop-YaoKun-tasktracker-NBK-DAL-625040.out
The hadoop daemons write their logs to the ${HADOOP_LOG_DIR} directory (${HADOOP_HOME}/logs by default).
The NameNode and JobTracker web interfaces are available by default at:
    NameNode - http://localhost:50070/
    JobTracker - http://localhost:50030/
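As a quick sanity check, you can list the running Java processes with the JDK's jps tool (an assumption on my part that the JDK bin directory is on your PATH; jps is part of the JDK, not hadoop):
$ jps
For a pseudo-distributed setup the list should include NameNode, DataNode, SecondaryNameNode, JobTracker and TaskTracker.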
7. Testing
The example below copies the unpacked conf directory into the filesystem as input, then finds and displays every match of the given regular expression; the output is written to the output directory. (Note: run this from the hadoop install directory.)

Run it in pseudo-distributed mode:
$ mkdir input
$ cp conf/*.xml input
$ bin/hadoop fs -put conf input
12/06/19 15:10:33 WARN fs.FileSystem: "localhost:9000" is a deprecated filesystem name. Use "hdfs://localhost:9000/" instead.
12/06/19 15:10:33 WARN fs.FileSystem: "localhost:9000" is a deprecated filesystem name. Use "hdfs://localhost:9000/" instead.
12/06/19 15:10:33 WARN fs.FileSystem: "localhost:9000" is a deprecated filesystem name. Use "hdfs://localhost:9000/" instead.
12/06/19 15:10:33 WARN fs.FileSystem: "localhost:9000" is a deprecated filesystem name. Use "hdfs://localhost:9000/" instead.
put: Target input/conf is a directory
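The put: Target input/conf is a directory message simply means input already exists in HDFS, for example because the command was run a second time. If that happens, remove the HDFS copy and upload again (the same dfs -rmr fix as in the troubleshooting notes at the end of this article):
$ bin/hadoop dfs -rmr input
$ bin/hadoop fs -put conf input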

$ bin/hadoop jar hadoop-*-examples.jar grep input output 'dfs[a-z.]+'
If there are no errors, a stream of progress messages follows, such as:
12/06/19 15:46:55 WARN fs.FileSystem: "localhost:9000" is a deprecated filesystem name. Use "hdfs://localhost:9000/" instead.
12/06/19 15:46:56 WARN fs.FileSystem: "localhost:9000" is a deprecated filesystem name. Use "hdfs://localhost:9000/" instead.
12/06/19 15:46:57 INFO mapred.FileInputFormat: Total input paths to process : 10
12/06/19 15:46:57 INFO mapred.FileInputFormat: Total input paths to process : 10
12/06/19 15:46:58 INFO mapred.JobClient: Running job: job_201206191545_0001
12/06/19 15:46:59 INFO mapred.JobClient:  map 0% reduce 0%
12/06/19 15:47:05 INFO mapred.JobClient:  map 18% reduce 0%
12/06/19 15:47:09 INFO mapred.JobClient:  map 36% reduce 0%
12/06/19 15:47:11 INFO mapred.JobClient:  map 54% reduce 0%
12/06/19 15:47:13 INFO mapred.JobClient:  map 72% reduce 0%
12/06/19 15:47:15 INFO mapred.JobClient:  map 81% reduce 0%
12/06/19 15:47:16 INFO mapred.JobClient:  map 90% reduce 0%
12/06/19 15:47:17 INFO mapred.JobClient:  map 100% reduce 0%
12/06/19 15:47:26 INFO mapred.JobClient:  map 100% reduce 12%
12/06/19 15:47:31 INFO mapred.JobClient:  map 100% reduce 18%
12/06/19 15:47:32 INFO mapred.JobClient:  map 100% reduce 21%
12/06/19 15:47:36 INFO mapred.JobClient:  map 100% reduce 27%
12/06/19 15:47:39 INFO mapred.JobClient: Job complete: job_201206191545_0001
.......
Examine the output files.
Either copy them from the distributed filesystem to the local filesystem and view them there:
$ bin/hadoop fs -get output output
$ cat output/*
or view them directly on the distributed filesystem:
$ bin/hadoop fs -cat output/*
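You can also list the job's output files first; the part-NNNNN naming below is standard MapReduce behaviour, not something specific to this article:
$ bin/hadoop fs -ls output
Each reduce task writes one part-NNNNN file.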
When everything is done, stop the daemons:
$ bin/stop-all.sh
With that, hadoop is configured and running!
Notes:
Hadoop documentation (Chinese): http://hadoop.apache.org/common/docs/r0.18.2/cn/
Quick-start guide: http://hadoop.apache.org/common/docs/r0.18.2/cn/quickstart.html
About Hadoop:
Hadoop is an open-source distributed file system project under Apache. A distributed file system provides remote file access and manages files spread across a network transparently, so clients do not need to know where a file is physically stored. Hadoop originally lived inside Nutch; the NDFS and MapReduce code implemented in Nutch was later split out into a new open-source project, and that project is Hadoop.
Problems encountered along the way:
1. If put fails with java.io.IOException: Not a file: hdfs://localhost:9000/user/icymary/input/test-in, the fix is:
$ bin/hadoop dfs -rmr input
2. java.io.IOException: Incompatible namespaceIDs in C:\tmp\hadoop-SYSTEM\dfs\dfs.data.dir: namenode namespaceID = 898136669; datanode namespaceID = 2127444065. Cause: every namenode -format generates a new namespaceID, but tmp still holds the ID from the previous format; formatting clears the namenode's data without clearing the datanode's, so startup fails. The fix is to empty everything under tmp before each format.
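In Cygwin that cleanup could look like the following (a sketch: the path assumes the default hadoop.tmp.dir under C:\tmp seen in this article's logs, so check that nothing else of yours lives there before deleting):
$ rm -rf /cygdrive/c/tmp/hadoop-*
$ bin/hadoop namenode -format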
Reference links:

http://www.zihou.me/html/2010/02/19/1525.html

http://tdcq.iteye.com/blog/1338777

http://blog.csdn.net/wh62592855/article/details/5752199#

http://hadoop.apache.org/common/docs/r0.19.2/cn/quickstart.html

This article uses hadoop 0.18.2; for installing the 0.20 series on Linux, see http://www.cnblogs.com/reckzhou/archive/2012/03/21/2409765.html
