This article describes how to set up a Hadoop environment on Windows using Cygwin.


1. Required Software
1.1 Cygwin
Download: http://www.cygwin.com/setup.exe
1.2 JDK 1.6.x
1.3 Hadoop (this example uses hadoop-0.18.2)
Download: http://download.csdn.net/detail/kkdelta/4381822
Hadoop official website: http://hadoop.apache.org/
2. Installation
2.1 For Cygwin installation instructions, see: http://www.zihou.me/2010/02/19/1506/
Supplement: the Cygwin Bash window does not support copy and paste, which is very inconvenient, so PuTTY (puttycyg) can be used instead: http://download.csdn.net/detail/kkdelta/4381833
Extract the three EXE files from puttycyg.zip into the bin directory under the Cygwin installation directory (home_path), then modify the cygwin.bat file under home_path. It is recommended to open it in Notepad and comment out the line bash --login -i by adding REM (i.e. REM bash --login -i) or :: (i.e. :: bash --login -i), and then add the line start putty -load cygwin. Cygwin will then be started through PuTTY.
This way you can copy and paste, but note that the default directory is Cygwin's home_path. If you want to switch to a directory outside of it, you have to go through the system root, which under Cygwin is /cygdrive. For example, to enter drive C, use /cygdrive/c (a short sketch follows).
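A minimal sketch of moving around the Windows drives from the PuTTY/Cygwin prompt (the drive letters are examples; adjust them to your system):
$ cd /cygdrive/c                  # go to the root of drive C:
$ pwd                             # prints /cygdrive/c
$ cd /cygdrive/e/hadoop-0.18.2    # e.g. the Hadoop directory unpacked in section 2.3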
2.2 JDK installation omitted
2.3 hadoop-0.18.2 Installation
Decompress hadoop-0.18.2.tar.gz into a directory such as hadoop-0.18.2. Suppose it is on drive E:
E:\hadoop-0.18.2. Modify the conf/hadoop-env.sh file and change the value of export JAVA_HOME to the JDK installation directory on your machine, such as /cygdrive/d/tools/jdk1.6.0_03 (/cygdrive is the root directory under which Cygwin exposes the Windows drives).
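A minimal sketch of that edit, assuming the example JDK path above (replace it with your own installation path):
# conf/hadoop-env.sh
export JAVA_HOME=/cygdrive/d/tools/jdk1.6.0_03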
3. install and configure SSH
3.1 Installation
Run the following commands in the root directory of cygwin:
$ chmod +r /etc/group
$ chmod +r /etc/passwd
$ chmod +rwx /var
$ ssh-host-config
*** Info: Generating /etc/ssh_host_key
*** Info: Generating /etc/ssh_host_rsa_key
*** Info: Generating /etc/ssh_host_dsa_key
*** Info: Creating default /etc/ssh_config file
*** Info: Creating default /etc/sshd_config file
*** Info: Privilege separation is set to yes by default since OpenSSH 3.3.
*** Info: However, this requires a non-privileged account called 'sshd'.
*** Info: For more info on privilege separation read /usr/share/doc/openssh/README.privsep.
*** Query: Should privilege separation be used? (yes/no) yes
*** Info: Note that creating a new user requires that the current account have
*** Info: Administrator privileges. Should this script attempt to create a
*** Query: new local account 'sshd'? (yes/no) yes
*** Info: Updating /etc/sshd_config file
*** Info: Added ssh to C:\WINDOWS\system32\drivers\etc\services
*** Info: Creating default /etc/inetd.d/sshd-inetd file
*** Info: Updated /etc/inetd.d/sshd-inetd
*** Warning: The following functions require administrator privileges!
*** Query: Do you want to install sshd as a service?
*** Query: (Say "no" if it is already installed as a service) (yes/no) yes
*** Query: Enter the value of CYGWIN for the daemon: [] cygwin
(Note: the CYGWIN value entered here can be arbitrary.)
*** Info: The sshd service has been installed under the LocalSystem
*** Info: account (also known as SYSTEM). To start the service now, call
*** Info: `net start sshd' or `cygrunsrv -S sshd'. Otherwise, it
*** Info: will start automatically after the next reboot.
*** Info: Host configuration finished. Have fun!
Answer yes to every yes/no prompt to complete the sshd installation.
3.2 Configuration
3.2.1 Start the sshd service:
$ net start sshd
The CYGWIN sshd service is starting.
The CYGWIN sshd service was started successfully.
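To double-check that the service is registered and running, it can be queried with cygrunsrv (a small sketch; the exact output fields may vary between Cygwin versions):
$ cygrunsrv -Q sshd    # shows the service description and its current state (Running/Stopped)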
3.2.2 $ ssh localhost
Try to connect to the local machine. Note that if the sshd service is not started, the connection will definitely fail! For details about this error, see:

http://www.zihou.me/2010/02/19/1521/

If there is no problem, the following content will appear:
The authenticity of host 'localhost (127.0.0.1)' can't be established.
RSA key fingerprint is 08:03:20:43:48:39:29:66:6e:c5:61:ba:77:b2:2f:55.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'localhost' (RSA) to the list of known hosts.
zihou@localhost's password:
You will be prompted for your logon password. After you enter the correct password, a text banner similar to a welcome prompt appears:
The hippo says: Welcome
3.2.3 Create an SSH channel
$ ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
Generating public/private dsa key pair.
Your identification has been saved in /home/zihou/.ssh/id_dsa.
Your public key has been saved in /home/zihou/.ssh/id_dsa.pub.
The key fingerprint is:
6d:64:8e:a6:38:73:ab:c5:ce:71:cd:df:a1:ca:63:54 zihou@PC-04101515
The key's randomart image is:
+--[ DSA 1024]----+
(randomart pattern omitted)
+-----------------+
$ cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
Run $ ssh localhost again to check that you can now log in without a password; if so, sshd is configured correctly.
4. Configure hadoop
Edit conf/hadoop-site.xml and add the following content:
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>localhost:9000</value>
  </property>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
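The deprecation warnings shown later in this article ("localhost:9000" is a deprecated filesystem name, use "hdfs://localhost:9000/" instead) suggest that fs.default.name can also be written as a full URI; a hedged variant of the first property, assuming the same host and port:
<property>
  <name>fs.default.name</name>
  <value>hdfs://localhost:9000/</value>
</property>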
5. Run hadoop
Go to the Hadoop installation directory (c:\hadoop-0.18.2 in this run; adjust the path to where you unpacked Hadoop). Under Cygwin:
$ cd /cygdrive/c/hadoop-0.18.2
$ bin/hadoop namenode -format
This formats a new distributed file system and prints a message such as:
12/06/19 14:46:17 INFO dfs.Storage: Storage directory \tmp\hadoop-yaokun\dfs\name has been successfully formatted.
6. Start the hadoop daemon
$ bin/start-all.sh
starting namenode, logging to /cygdrive/c/hadoop-0.18.2/bin/../logs/hadoop-YaoKun-namenode-NBK-DAL-625040.out
localhost: starting datanode, logging to /cygdrive/c/hadoop-0.18.2/bin/../logs/hadoop-YaoKun-datanode-NBK-DAL-625040.out
localhost: starting secondarynamenode, logging to /cygdrive/c/hadoop-0.18.2/bin/../logs/hadoop-YaoKun-secondarynamenode-NBK-DAL-625040.out
starting jobtracker, logging to /cygdrive/c/hadoop-0.18.2/bin/../logs/hadoop-YaoKun-jobtracker-NBK-DAL-625040.out
localhost: starting tasktracker, logging to /cygdrive/c/hadoop-0.18.2/bin/../logs/hadoop-YaoKun-tasktracker-NBK-DAL-625040.out
The Hadoop daemons write their logs to the ${HADOOP_LOG_DIR} directory (default: ${HADOOP_HOME}/logs).
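If a daemon does not come up, its log file is the first place to look; a minimal sketch (the exact file names contain your user and host names, so adjust the wildcard accordingly):
$ ls logs/                                   # one .log/.out pair per daemon
$ tail -n 50 logs/hadoop-*-namenode-*.log    # last lines of the NameNode log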
Browse the web interfaces of the NameNode and the JobTracker. By default they are available at:
NameNode - http://localhost:50070/
JobTracker - http://localhost:50030/
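The cluster state can also be checked from the command line; a hedged sketch using the standard dfsadmin tool:
$ bin/hadoop dfsadmin -report    # prints configured/used capacity and the list of live datanodes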
7. Test
The following example copies the unpacked conf directory into the distributed file system as input, then finds and displays every entry that matches the given regular expression. The output is written to the specified output directory. (Note: run these commands from the Hadoop root directory.)

Run in pseudo-distributed mode
$ mkdir input
$ cp conf/*.xml input
$ bin/hadoop fs -put conf input
12/06/19 15:10:33 WARN fs.FileSystem: "localhost:9000" is a deprecated filesystem name. Use "hdfs://localhost:9000/" instead.
(the warning above is printed several times)
put: Target input/conf is a directory
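To confirm what actually landed in the distributed file system before running the job, the uploaded directory can be listed (a small sketch; paths are relative to the user's HDFS home directory):
$ bin/hadoop fs -ls input    # list the files copied into HDFS
$ bin/hadoop fs -ls          # list the user's HDFS home directory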

$ bin/hadoop jar hadoop-*-examples.jar grep input output 'dfs[a-z.]+'
If there are no errors, a stream of progress information will be printed, such as:
$ bin/hadoop jar hadoop-*-examples.jar grep input output 'dfs[a-z.]+'
12/06/19 15:46:55 WARN fs.FileSystem: "localhost:9000" is a deprecated filesystem name. Use "hdfs://localhost:9000/" instead.
12/06/19 15:46:56 WARN fs.FileSystem: "localhost:9000" is a deprecated filesystem name. Use "hdfs://localhost:9000/" instead.
12/06/19 15:46:57 INFO mapred.FileInputFormat: Total input paths to process : 10
12/06/19 15:46:57 INFO mapred.FileInputFormat: Total input paths to process : 10
12/06/19 15:46:58 INFO mapred.JobClient: Running job: job_201206191545_0001
12/06/19 15:46:59 INFO mapred.JobClient:  map 0% reduce 0%
12/06/19 15:47:05 INFO mapred.JobClient:  map 18% reduce 0%
12/06/19 15:47:09 INFO mapred.JobClient:  map 36% reduce 0%
12/06/19 15:47:11 INFO mapred.JobClient:  map 54% reduce 0%
12/06/19 15:47:13 INFO mapred.JobClient:  map 72% reduce 0%
12/06/19 15:47:15 INFO mapred.JobClient:  map 81% reduce 0%
12/06/19 15:47:16 INFO mapred.JobClient:  map 90% reduce 0%
12/06/19 15:47:17 INFO mapred.JobClient:  map 100% reduce 0%
12/06/19 15:47:26 INFO mapred.JobClient:  map 100% reduce 12%
12/06/19 15:47:31 INFO mapred.JobClient:  map 100% reduce 18%
12/06/19 15:47:32 INFO mapred.JobClient:  map 100% reduce 21%
12/06/19 15:47:36 INFO mapred.JobClient:  map 100% reduce 27%
12/06/19 15:47:39 INFO mapred.JobClient: Job complete: job_201206191545_0001
.......
View the output files.
Copy the output files from the distributed file system to the local file system and examine them:
$ bin/hadoop fs -get output output
$ cat output/*
or view them directly on the distributed file system:
$ bin/hadoop fs -cat output/*
After all the operations are completed, stop the daemons:
$ bin/stop-all.sh
In this way, hadoop is successfully configured!
Note:
Hadoop documentation in Chinese: http://hadoop.apache.org/common/docs/r0.18.2/cn/
Quick start guide: http://hadoop.apache.org/common/docs/r0.18.2/cn/quickstart.html
Hadoop introduction:
Hadoop is an open-source Apache project that provides a distributed file system and the MapReduce computing framework. A distributed file system lets files spread across machines on the network be accessed and managed transparently, so that a client does not need to know where a file is physically stored. Hadoop originally grew out of Nutch; the NDFS and MapReduce code implemented in Nutch were later split out into a new open-source project, which became Hadoop.
Some problems encountered in the process:
1. If java.io.IOException: Not a file: hdfs://localhost:9000/user/icymary/input/test-in occurs during the put,
Solution: $ bin/hadoop dfs -rmr input
2. java.io.IOException: Incompatible namespaceIDs in the dfs.data.dir directory (here c:\tmp\hadoop-SYSTEM\dfs\data): namenode namespaceID = 898136669; datanode namespaceID = 2127444065. Cause: every namenode format generates a new namespaceID, while the tmp directory still holds the ID from the previous format. Formatting clears the NameNode's data but does not clear the data under the DataNode, so startup fails. The fix is to clear the tmp directories before each format, as sketched below.
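A hedged sketch of that cleanup, assuming the default hadoop.tmp.dir under /tmp, which on this setup resolves to c:\tmp as in the error above (double-check the path before deleting anything):
$ bin/stop-all.sh                   # stop all daemons first
$ rm -rf /cygdrive/c/tmp/hadoop-*   # remove the stale namenode/datanode data
$ bin/hadoop namenode -format       # re-format; a fresh namespaceID is generated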
Reference links:

http://www.zihou.me/html/2010/02/19/1525.html

http://tdcq.iteye.com/blog/1338777

http://blog.csdn.net/wh62592855/article/details/5752199

http://hadoop.apache.org/common/docs/r0.19.2/cn/quickstart.html

This article uses Hadoop 0.18.2; for installing 0.20 on Linux, refer to http://www.cnblogs.com/reckzhou/archive/2012/03/21/2409765.html
