Liu Yong Email: [email protected]
This blog records the author's experience in work and research, partly to leave a record of work and life, and partly in the hope that it may be of help to others.
Introduction
For recent Hadoop programming work, it is convenient to compile source programs locally on Windows and then access the Hadoop cluster directly, which saves programmers a great deal of effort. This article summarizes the practical experience accumulated in the process and offers solutions to the problems commonly encountered in the early stages, in the hope that it helps you.
Environment Introduction
Hadoop cluster: Hadoop 2.7.1, CentOS 6.5, JDK 1.7
Eclipse environment: Windows 7, Eclipse Luna, JDK 1.8
Environment Setup
1) Install the Eclipse Hadoop Plugin
Download the plug-in matching your Hadoop version from the network, such as hadoop-eclipse-plugin-2.7.1.jar, copy it into Eclipse's plugins directory, such as D:\Program\eclipse\plugins, and then restart Eclipse.
2) Configure the MapReduce Plugin
Window -> Preferences -> Hadoop Map/Reduce. This article sets the Hadoop installation directory to D:\hadoop. Note that this directory must contain the JAR packages required to compile the source programs later, as well as the native library files required on Windows.
3) Switch the Perspective
Window -> Open Perspective -> Other, then select Map/Reduce.
4) Configure Map/Reduce Locations
This step configures the location and related information for accessing the Hadoop cluster from Windows. Click Map/Reduce Locations -> New Hadoop Location -> General. When filling in the parameters, pay attention to the Host and Port fields: they must match the DFS configuration of the Hadoop cluster. As for the user name, it is generally advisable to use the same user name under which Hadoop runs on the cluster; changing the Windows user from administrator to hadoop (under Users and Groups) saves a lot of unnecessary trouble. This article uses the user name root on Windows, and the resulting problem is discussed later (see FAQ 5).
Next, open Map/Reduce Locations -> New Hadoop Location -> Advanced Parameters. Here the only setting to worry about is hadoop.tmp.dir, which must match the core-site.xml configuration of the Hadoop cluster. At this point, the Eclipse Hadoop plugin is installed.
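For reference, a minimal sketch of the matching cluster-side core-site.xml is shown below. The address 192.168.101.244:9000 is the example used later in this article, and the hadoop.tmp.dir value is only a hypothetical placeholder; both must reflect your actual cluster settings.

<!-- core-site.xml on the cluster (excerpt, assumed values) -->
<configuration>
  <property>
    <!-- Must match Host/Port in Eclipse's DFS Master settings -->
    <name>fs.defaultFS</name>
    <value>hdfs://192.168.101.244:9000</value>
  </property>
  <property>
    <!-- Must match hadoop.tmp.dir in Eclipse's Advanced Parameters -->
    <name>hadoop.tmp.dir</name>
    <value>/home/hadoop/tmp</value>
  </property>
</configuration>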
5) Testing and Verification
In the project navigation bar, expand DFS Locations. If you can see the files that users have placed on the Hadoop cluster, just as the hadoop fs commands would show them, the plug-in installation is complete.
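Beyond the plug-in view, a small standalone program can also confirm that the cluster is reachable from Windows. This is only a sketch: the address and user name follow the examples in this article and should be adjusted to your cluster.

import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsSmokeTest {
    public static void main(String[] args) throws Exception {
        // Connect to HDFS as user "root" (the user used in this article)
        FileSystem fs = FileSystem.get(
                URI.create("hdfs://192.168.101.244:9000"),
                new Configuration(), "root");
        // List the user's home directory; any output means the
        // Windows-to-cluster connection works
        for (FileStatus status : fs.listStatus(new Path("/user/root"))) {
            System.out.println(status.getPath());
        }
        fs.close();
    }
}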
Programming Exceptions and Solutions
In the author's experience, a program written and run directly on the Hadoop cluster (in the local Linux environment) works fine, but writing it on Windows for convenience can cause problems. The solutions below are based on the author's actual experience.
1) FAQ: java.lang.NullPointerException
This exception occurs because, when the program runs on Windows, Hadoop is missing the native Windows support files it needs to handle its local temporary files, so the file system operations fail.
Solution: download hadoop.dll, winutils.exe, and related files matching your Hadoop version from the network, and copy them into Hadoop's bin directory, such as D:\hadoop\bin, and into the Windows system directory, such as C:\Windows\System32. Note also that the downloaded files must match the CPU architecture, i.e. 32-bit or 64-bit; the author wrestled with the CPU bitness for a long time. As the safest option, the author recommends compiling the Hadoop source code locally on Windows with the Windows SDK and then extracting the above DLL and EXE files, which resolves the problem.
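If copying files into C:\Windows\System32 is undesirable, a commonly used alternative is to point Hadoop at the directory that contains bin\winutils.exe before any Hadoop class is loaded. A minimal sketch, assuming the D:\hadoop directory used earlier in this article:

public class JobLauncher {
    public static void main(String[] args) throws Exception {
        // Tell Hadoop where to find bin\winutils.exe on this machine;
        // must run before the first Hadoop class is used
        System.setProperty("hadoop.home.dir", "D:\\hadoop");
        // ... normal job setup continues here ...
    }
}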
2) FAQ: java.lang.UnsatisfiedLinkError
Further analysis shows that this exception also arises from Hadoop's Windows native support not being resolved properly: either hadoop.dll does not match the Hadoop cluster version, or it does not match the CPU bitness.
Solution: locate files of the correct version on the network, or recompile the Hadoop source code locally.
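To check which variant the local JVM needs, a quick sketch (sun.arch.data.model is HotSpot-specific; os.arch is a more portable fallback):

public class BitnessCheck {
    public static void main(String[] args) {
        // Prints "32" or "64" on HotSpot JVMs
        System.out.println(System.getProperty("sun.arch.data.model"));
        // e.g. "amd64" for a 64-bit JVM
        System.out.println(System.getProperty("os.arch"));
    }
}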
3) FAQ: org.apache.hadoop.mapreduce.lib.input.InvalidInputException
This exception occurs because the HDFS file location is not specified explicitly: if the hdfs:// scheme is omitted, the path is resolved as a default local path on Windows.
Solution: expand DFS Locations in the Eclipse navigation bar and step through to the final location of the file. For example, the test file in this article is at hdfs://192.168.101.244:9000/user/root/input; the path must be prefixed with hdfs://ip:port.
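A sketch of what this looks like in job setup code (the job name and output path are illustrative only):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class PathExample {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "example");
        // Fully qualified hdfs://ip:port paths; a bare "input" would be
        // resolved against the local Windows file system and fail
        FileInputFormat.addInputPath(job,
                new Path("hdfs://192.168.101.244:9000/user/root/input"));
        FileOutputFormat.setOutputPath(job,
                new Path("hdfs://192.168.101.244:9000/user/root/output"));
    }
}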
4) FAQ: java.lang.UnsupportedClassVersionError
This exception occurs because the JDK version used to compile the program on Windows does not match the JDK version used on the Hadoop cluster.
Solution: first check the JDK version on the Hadoop cluster and install the same version on Windows. For example, since Windows uses JDK 1.8 here while the Hadoop cluster uses JDK 1.7, install JDK 1.7 on Windows, change the project's Build Path and Compiler settings to Java SE 1.7, and recompile.
5) FAQ: org.apache.hadoop.security.AccessControlException: Permission denied
The program stops abnormally after running, without producing any output. This exception occurs because the local user has no permission to access HDFS on the Hadoop cluster, so access is denied at the permission check and the program exits; for example, the user root used in this article was not granted access to HDFS, so no result data could be seen. In addition, many users prefer not to use the default user name hadoop, which makes setting permissions all the more important.
Solution: given the test environment, this article modifies the cluster configuration to turn off the permission check for the HDFS file system, as sketched below. In a real-world application, permissions should be set according to the actual scenario.
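A minimal sketch of the change: in Hadoop 2.x the property is dfs.permissions.enabled, conventionally placed in hdfs-site.xml (HDFS also reads core-site.xml); restart HDFS after changing it, and do not leave the check disabled in production.

<!-- Disable the HDFS permission check; test environments only -->
<property>
  <name>dfs.permissions.enabled</name>
  <value>false</value>
</property>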