Building Hadoop Under Cygwin

Source: Internet
Author: User

Today I tried to set up Hadoop under Cygwin. There are many guides online (recommended reference: http://space.itpub.net/?uid-26812308-action-viewspace-itemid-748143), but I still ran into quite a few pitfalls in practice, so I am recording them here as a memo:

1. Choose the Hadoop version. Although Hadoop has already reached version 1.0 and later, we recommend 0.20.2 if you want to deploy pseudo-distributed MapReduce under Cygwin, because later versions contain two important bugs:
https://issues.apache.org/jira/browse/HADOOP-7682
https://issues.apache.org/jira/browse/HADOOP-8274

Neither bug had been fixed as of version 1.0.4. Both stem from the tricks Cygwin uses to handle Windows path mapping and symbolic links, which the Java APIs do not take into account. (Some hadoop-core jar packages circulating on the Internet claim to fix these bugs, but they actually fix only HADOOP-7682: HDFS then works normally, but MapReduce programs still fail with errors.) If you must try a newer version of Hadoop before these two problems are fixed, we recommend referring to:
http://en.wikisource.org/wiki/User:Fkorning/Code/Hadoop-on-Cygwin#root_Group

2. When installing sshd on Windows 7 and later, you must use a new privileged user. We recommend using cyg_server, the default user name offered by the sshd setup, rather than an existing user name or LocalSystem, because the account that runs the sshd service needs several special permissions. You can list an account's permissions with the following command and compare the results between accounts:
editrights -l -u <user name>
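
The comparison mentioned above can be sketched as a small helper. This is only an illustration: `missing_rights` is a hypothetical function name, and it assumes you have first saved the output of `editrights -l -u <user name>` for each account to a file, one right per line.

```shell
# Print the rights that the reference account has but the candidate lacks.
# $1: file with the reference account's rights, one per line
# $2: file with the candidate account's rights, one per line
missing_rights() {
    # -F fixed strings, -x whole-line match, -v invert the match:
    # keep the lines of $1 that do not appear in $2.
    grep -Fxv -f "$2" "$1" || true
}

# Hypothetical usage under Cygwin:
#   editrights -l -u cyg_server > /tmp/ref.txt
#   editrights -l -u someuser   > /tmp/cand.txt
#   missing_rights /tmp/ref.txt /tmp/cand.txt
```

An empty result suggests the candidate account already holds every right the reference account has.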

3. If you need to reinstall the sshd service, you can first remove it with
sc delete sshd
or
cygrunsrv -R sshd

NOTE: If services.msc or another service viewer is open in Windows while the command runs, close that window and reopen it before checking that the service is gone. Until then, the service is only marked for deletion.

4. Use ssh-host-config to reconfigure and install the service. Then run the following command to start the service (or start it from the Windows Services console):
net start sshd
or:
cygrunsrv -S sshd
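
Steps 3 and 4 can be combined into one sequence. The sketch below is not a tested installer: `run` and `reinstall_sshd` are hypothetical helper names, the stop step (`cygrunsrv -E`) is an addition not mentioned above, and `ssh-host-config` remains interactive. Setting `DRY_RUN=1` only prints the commands, which is a safe way to review them first.

```shell
# Run a command, or just print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Remove the old sshd service, reconfigure it, and start it again.
reinstall_sshd() {
    run cygrunsrv -E sshd || true   # stop the service; ignore failure if it is not running
    run cygrunsrv -R sshd || true   # remove the old service registration
    run ssh-host-config             # interactive: recreates the service and its account
    run cygrunsrv -S sshd           # start the newly installed service
}

# Hypothetical usage (from an elevated Cygwin shell):
#   DRY_RUN=1; reinstall_sshd    # preview
#   DRY_RUN=0; reinstall_sshd    # execute
```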

5. If the service fails to start with the error "cygrunsrv: Error starting a service: StartService: Win32 error 1069", we recommend trying the following solutions in order:
A. Run mkpasswd > /etc/passwd and mkgroup > /etc/group to refresh Cygwin's user and group tables.
B. If that does not work, use passwd to reset the password of cyg_server (assuming the privileged user name has not been changed).
C. If it still does not work, delete the /var/empty directory and run ssh-host-config again so that it recreates /var/empty with the correct permissions.

6. To start Hadoop in pseudo-distributed mode, you need to configure an SSH trust relationship (key-based login without a password) for the local SSH client. For details, see the online documentation.
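
One common way to set up such a trust relationship is to append your own public key to authorized_keys. The following is a minimal sketch, assuming an OpenSSH-style layout; `authorize_key` is a hypothetical helper name, and the key path in the usage comment is only the usual default.

```shell
# Append a public key to authorized_keys unless it is already there.
# $1: public key file (for example ~/.ssh/id_rsa.pub)
# $2: ssh directory (optional, defaults to ~/.ssh)
authorize_key() {
    pub_key_file=$1
    ssh_dir=${2:-$HOME/.ssh}
    mkdir -p "$ssh_dir" || return 1
    chmod 700 "$ssh_dir"
    touch "$ssh_dir/authorized_keys"
    # Append only if this exact key line is not already present.
    grep -Fxq -f "$pub_key_file" "$ssh_dir/authorized_keys" ||
        cat "$pub_key_file" >> "$ssh_dir/authorized_keys"
    # sshd typically refuses key files that other users can write to.
    chmod 600 "$ssh_dir/authorized_keys"
}

# Hypothetical usage:
#   ssh-keygen -t rsa -N "" -f ~/.ssh/id_rsa   # generate a key with an empty passphrase
#   authorize_key ~/.ssh/id_rsa.pub
#   ssh localhost true                          # should no longer ask for a password
```

The helper is idempotent, so running it again after a failed attempt does not duplicate the key.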

7. Remember to format the namenode before starting HDFS:
hadoop namenode -format
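
Because formatting discards existing namenode metadata, it can help to check first whether the name directory has already been formatted. This is a sketch under two assumptions that are not stated above: that a formatted name directory contains a current/VERSION file, and that the name directory is at the default location shown in the usage comment. `needs_format` is a hypothetical helper name.

```shell
# Succeed (exit status 0) when the given dfs.name.dir has not been formatted yet.
# Assumption: a formatted name directory contains current/VERSION.
needs_format() {
    [ ! -f "$1/current/VERSION" ]
}

# Hypothetical usage; the path is the assumed default location:
#   name_dir="/tmp/hadoop-$USER/dfs/name"
#   if needs_format "$name_dir"; then hadoop namenode -format; fi
```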

8. If dfs.name.dir and dfs.data.dir end up in two different tmp directories (this happens because under Cygwin the namenode is started as $USER, while the datanode is started by Hadoop over SSH as cyg_server), you can set the following in conf/hadoop-env.sh:
export HADOOP_IDENT_STRING=<current Cygwin login user name>

In addition, although the directory we configured for Hadoop is /tmp/hadoop-<user name>, under Cygwin some data is written to the tmp directory at the root of the drive where Cygwin is installed (assuming Cygwin is installed in c:\cygwin). This is also caused by Cygwin and Java handling POSIX path names inconsistently. Some data is written under c:\cygwin\tmp and some under c:\tmp, although it all belongs in one place, and this causes HDFS to fail at startup. To solve this problem, we recommend creating an NTFS symbolic link on the Cygwin installation drive from a Windows cmd prompt, for example:
mklink /D c:\tmp c:\cygwin\tmp

The preceding command points c:\tmp at the tmp directory inside the Cygwin directory, so whichever path is written to, the data ends up in the same location. (NTFS already supports symbolic links and hard links well. I do not know why Cygwin uses its own custom .lnk files instead of the operating system's native feature, which is what leads to HADOOP-8274.)
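
To confirm from the Cygwin shell that both paths now reach the same storage, you can write a marker file through one path and look for it through the other. This is a minimal sketch: `shares_storage` is a hypothetical helper name, and the paths in the usage comment assume the c:\cygwin installation described above.

```shell
# Succeed when a file created in directory $1 is visible in directory $2.
# Returns 2 if the marker file cannot be created in $1.
shares_storage() {
    marker=".same-dir-check.$$"
    touch "$1/$marker" 2>/dev/null || return 2
    if [ -e "$2/$marker" ]; then rc=0; else rc=1; fi
    rm -f "$1/$marker"   # clean up the marker either way
    return $rc
}

# Hypothetical usage under Cygwin:
#   shares_storage /cygdrive/c/tmp /tmp && echo "both paths share one directory"
```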

9. When you run wordcount from Eclipse, it may report that chmod cannot be found. Solution: add c:\cygwin\bin to the system PATH environment variable, then restart Eclipse.
