Hadoop installation and configuration in windows and cygwin


Address: http://www.zihou.me/html/2010/02/19/1525.html

This article uses Cygwin on Windows to install and configure Hadoop in a Unix-like environment.

I (zihou) am just getting started with Hadoop. The first time, I configured it successfully by following instructions found online, but some steps were not very clear to me, so I ran through the entire process again and recorded it here, so that the whole procedure has a clear thread. Corrections are welcome.

1. Required Software

1.1 Cygwin (the latest version at the time of writing is 2.685)

Download: http://www.cygwin.com/setup.exe

1.2 JDK 1.6.x

1.3 Hadoop-0.20.1

Download: http://apache.freelamp.com/hadoop/core/hadoop-0.20.1/hadoop-0.20.1.tar.gz

2. Installation

2.1 For Cygwin installation instructions, see: http://www.zihou.me/2010/02/19/1506/

Supplement: Cygwin's bash window does not support copy and paste, which is inconvenient, so PuTTYcyg can be used instead. Download it from:

http://www.linuxboy.net/linux/rc/puttycyg.zip

Extract the three EXE files from puttycyg.zip into the bin directory under Cygwin's installation directory (home_path), then modify the cygwin.bat file under home_path. It is recommended to open it in Notepad, comment out the line `bash --login -i` by prefixing it with REM (i.e. `REM bash --login -i`), and add the line `start putty -cygterm -` in its place.

This way you can copy and paste. Note that the default directory is then Cygwin's home_path; to switch to another drive, go through the system root, which is /cygdrive. For example, to enter drive E, use /cygdrive/e.
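The drive-letter mapping described above can be sketched as a small shell function. This helper is hypothetical (it is not part of Cygwin; Cygwin itself ships a real `cygpath` utility for such conversions), but it shows how a Windows path turns into a /cygdrive path:

```shell
# Hypothetical helper mimicking how a Windows path such as E:\hadoop-0.20.1
# maps to Cygwin's /cygdrive/e/hadoop-0.20.1 (Cygwin's own tool is cygpath -u).
win2cyg() {
  local p="$1"
  local drive="${p%%:*}"     # drive letter before the colon
  local rest="${p#*:}"       # everything after "X:"
  rest="${rest//\\//}"       # backslashes -> forward slashes
  drive=$(printf '%s' "$drive" | tr 'A-Z' 'a-z')
  printf '/cygdrive/%s%s\n' "$drive" "$rest"
}

win2cyg 'E:\hadoop-0.20.1'   # -> /cygdrive/e/hadoop-0.20.1
```

In a real Cygwin shell, `cygpath -u 'E:\hadoop-0.20.1'` produces the same result.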

2.2 JDK installation omitted

2.3 hadoop-0.20.1 Installation

Decompress hadoop-0.20.1.tar.gz into a directory, for example hadoop-0.20.1 on drive E:

E:\hadoop-0.20.1. Then modify the conf/hadoop-env.sh file: change the value of export JAVA_HOME to your machine's JDK installation directory, e.g. /cygdrive/d/tools/jdk1.6.0_03 (/cygdrive is how Cygwin exposes the Windows drives after installation).
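The JAVA_HOME edit can be sketched with sed. The stub file below only imitates the relevant commented-out line of the shipped conf/hadoop-env.sh, and the JDK path is the example path from above (substitute your own):

```shell
# Stub standing in for conf/hadoop-env.sh (only the line we care about).
cat > hadoop-env.sh <<'EOF'
# The java implementation to use.  Required.
# export JAVA_HOME=/usr/lib/j2sdk1.5-sun
EOF

# Uncomment the line and point JAVA_HOME at the JDK directory
# (example path; use your own machine's JDK location).
sed -i 's|^# export JAVA_HOME=.*|export JAVA_HOME=/cygdrive/d/tools/jdk1.6.0_03|' hadoop-env.sh

grep JAVA_HOME hadoop-env.sh
```

On the real file you would run the sed command against conf/hadoop-env.sh inside the unpacked Hadoop directory.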

3. Install and configure SSH

3.1 Installation

Run the following commands in the root directory of cygwin:

$ chmod +r /etc/group
$ chmod +r /etc/passwd
$ chmod +rwx /var
$ ssh-host-config
*** Info: Generating /etc/ssh_host_key
*** Info: Generating /etc/ssh_host_rsa_key
*** Info: Generating /etc/ssh_host_dsa_key
*** Info: Creating default /etc/ssh_config file
*** Info: Creating default /etc/sshd_config file
*** Info: Privilege separation is set to yes by default since OpenSSH 3.3.
*** Info: However, this requires a non-privileged account called 'sshd'.
*** Info: For more info on privilege separation read /usr/share/doc/openssh/README.privsep.
*** Query: Should privilege separation be used? (yes/no) yes
*** Info: Note that creating a new user requires that the current account have
*** Info: Administrator privileges. Should this script attempt to create a
*** Query: new local account 'sshd'? (yes/no) yes
*** Info: Updating /etc/sshd_config file
*** Info: Added ssh to C:\WINDOWS\system32\drivers\etc\services
*** Info: Creating default /etc/inetd.d/sshd-inetd file
*** Info: Updated /etc/inetd.d/sshd-inetd
*** Warning: The following functions require administrator privileges!
*** Query: Do you want to install sshd as a service?
*** Query: (Say "no" if it is already installed as a service) (yes/no) yes
*** Query: Enter the value of CYGWIN for the daemon: [] cygwin    (Note: the value entered here can be arbitrary.)
*** Info: The sshd service has been installed under the LocalSystem
*** Info: account (also known as SYSTEM). To start the service now, call
*** Info: `net start sshd' or `cygrunsrv -S sshd'. Otherwise, it
*** Info: will start automatically after the next reboot.
*** Info: Host configuration finished. Have fun!

Answer yes to every yes/no prompt, and sshd will be installed.

3.2 Configuration

3.2.1 Start the sshd service

$ net start sshd

Cygwin sshd service is starting

Cygwin sshd service started successfully

3.2.2 $ ssh localhost

Try to connect to the local machine. Note that if the sshd service has not been started, the connection will definitely fail! For details on this error, see:
Http://www.zihou.me/2010/02/19/1521/

If there is no problem, the following content will appear:

The authenticity of host 'localhost (127.0.0.1)' can't be established.
RSA key fingerprint is 08:03:20:43:48:39:29:66:6e:c5:61:ba:77:b2:2f:55.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'localhost' (RSA) to the list of known hosts.
zihou@localhost's password:

You will be prompted for your login password. After entering the correct password, a text banner appears, similar to a welcome message:

The hippo says: Welcome

3.2.3 Create an SSH channel

$ ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
Generating public/private dsa key pair.
Your identification has been saved in /home/zihou/.ssh/id_dsa.
Your public key has been saved in /home/zihou/.ssh/id_dsa.pub.
The key fingerprint is:
6d:64:8e:a6:38:73:ab:c5:ce:71:cd:df:a1:ca:63:54 zihou@PC-04101515
The key's randomart image is:
+--[ DSA 1024]----+
|                 |
|                 |
|          o      |
|         *  E    |
|        S +.     |
|     o o +.      |
|    + * ..o   .  |
|     B + .o. o . |
|    ..+  .ooo .  |
+-----------------+

$ cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys

Run $ ssh localhost again to check whether sshd is now configured correctly (it should no longer ask for a password).

4. Configure hadoop

Edit conf/hadoop-site.xml

Add the following content:

<configuration>
  <property>
    <name>fs.default.name</name>
    <value>localhost:9000</value>
  </property>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>

5. Run hadoop

Go to E:\hadoop-0.20.1, which under Cygwin is /cygdrive/e/hadoop-0.20.1, and run:

$ bin/hadoop namenode -format

This formats a new distributed file system and prints messages like the following:

10/02/19 17:32:26 WARN conf.Configuration: DEPRECATED: hadoop-site.xml found in the classpath.
Usage of hadoop-site.xml is deprecated. Instead use core-site.xml, mapred-site.xml and hdfs-site.xml
to override properties of core-default.xml, mapred-default.xml and hdfs-default.xml respectively

(I am not entirely clear on this warning; I am simply using the latest version.)
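The deprecation warning means that later Hadoop releases split the single hadoop-site.xml into three files. As a sketch based on that warning, the same three properties (with the values from section 4) would be distributed like this:

```xml
<!-- core-site.xml -->
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>localhost:9000</value>
  </property>
</configuration>

<!-- hdfs-site.xml -->
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>

<!-- mapred-site.xml -->
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
  </property>
</configuration>
```

For Hadoop 0.20.1 itself, keeping everything in hadoop-site.xml still works; it merely triggers the warning.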

10/02/19 17:32:26 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   host = PC-04101515/192.168.0.14
STARTUP_MSG:   args = [-format]
STARTUP_MSG:   version = 0.20.1
STARTUP_MSG:   build = http://svn.apache.org/repos/asf/hadoop/common/tags/release-0.20.1-rc1 -r 810220; compiled by 'oom' on Tue Sep  1 20:55:56 UTC 2009
************************************************************/
10/02/19 17:32:27 INFO namenode.FSNamesystem: fsOwner=zihou,None,root,Administrators,Users
10/02/19 17:32:27 INFO namenode.FSNamesystem: supergroup=supergroup
10/02/19 17:32:27 INFO namenode.FSNamesystem: isPermissionEnabled=true
10/02/19 17:32:28 INFO common.Storage: Image file of size 102 saved in 0 seconds.
10/02/19 17:32:28 INFO common.Storage: Storage directory \tmp\hadoop-SYSTEM\dfs\name has been successfully formatted.
10/02/19 17:32:28 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at PC-04101515/192.168.0.14
************************************************************/
6. Start the hadoop daemon

$ bin/start-all.sh
starting namenode, logging to /cygdrive/e/hadoop-0.20.1/bin/../logs/hadoop-zihou-namenode-PC-04101515.out
localhost: datanode running as process 5200. Stop it first.
localhost: secondarynamenode running as process 1664. Stop it first.
starting jobtracker, logging to /cygdrive/e/hadoop-0.20.1/bin/../logs/hadoop-zihou-jobtracker-PC-04101515.out
localhost: starting tasktracker, logging to /cygdrive/e/hadoop-0.20.1/bin/../logs/hadoop-zihou-tasktracker-PC-04101515.out

(Note: on a first start the output may differ from the above; I re-ran the command while writing this article.)

7. Test

Standalone Mode

The following example copies the unpacked conf directory to use as input, then finds and displays every match of the given regular expression. The output is written to the specified output directory. (Note: the working directory is the Hadoop directory.)

$ mkdir input

$ cp conf/*.xml input

$ bin/hadoop jar hadoop-*-examples.jar grep input output 'dfs[a-z.]+'

$ cat output/*
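What the grep example computes can be approximated with ordinary Unix tools: extract every match of the pattern from the input XML files and count the occurrences. A rough local stand-in (not Hadoop itself; the sample file is made up for illustration):

```shell
mkdir -p input
# Made-up sample input standing in for the copied conf/*.xml files.
cat > input/sample.xml <<'EOF'
<property><name>dfs.replication</name><value>1</value></property>
EOF

# Same regular expression as the Hadoop grep example; -o prints only the
# matched text, -h suppresses file names, and uniq -c counts duplicates.
grep -ohE 'dfs[a-z.]+' input/*.xml | sort | uniq -c | sort -rn
```

The pipeline prints each distinct match with its count, which is essentially what the example job writes into the output directory.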

Run $ bin/hadoop dfs -ls to check whether the *.xml files have been copied into input. The result looks like:

Found 1 items

drwxr-xr-x - zihou supergroup 0 2010-02-19 :44 /user/zihou/input

which indicates that the copy succeeded.

Run in pseudo-distributed mode

$ bin/hadoop jar hadoop-*-examples.jar grep input output 'dfs[a-z.]+'

If there are no errors, a stream of messages is printed, for example:

10/02/19 14:56:07 WARN conf.Configuration: DEPRECATED: hadoop-site.xml found in the classpath. Usage of hadoop-site.xml is deprecated. Instead use core-site.xml, mapred-site.xml and hdfs-site.xml to override properties of core-default.xml, mapred-default.xml and hdfs-default.xml respectively

10/02/19 14:56:08 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=

10/02/19 14:56:09 INFO mapred.FileInputFormat: Total input paths to process : 5

10/02/19 14:56:10 INFO mapred.JobClient: Running job: job_local_0001

10/02/19 14:56:10 INFO mapred.FileInputFormat: Total input paths to process : 5

10/02/19 14:56:10 INFO mapred.MapTask: numReduceTasks: 1

10/02/19 14:56:10 INFO mapred.MapTask: io.sort.mb = 100

10/02/19 14:56:10 INFO mapred.MapTask: data buffer = 79691776/99614720

10/02/19 14:56:10 INFO mapred.MapTask: record buffer = 262144/327680

...............
In this way, hadoop is successfully configured!

Description:

Hadoop documentation in Chinese: http://hadoop.apache.org/common/docs/r0.18.2/cn/

Quick start guide: http://hadoop.apache.org/common/docs/r0.18.2/cn/quickstart.html

Introduction to hadoop:

Hadoop is an open-source Apache project that provides a distributed file system. A distributed file system (DFS) supports remote file access and transparently manages files spread across a network: a client does not need to know where a file is physically stored in order to access it. Hadoop was originally part of Nutch; later, the NDFS and MapReduce implementations in Nutch were split out into a new open-source project, which became Hadoop.
