Windows Hadoop Programming Environment Configuration Guide


Liu Yong Email: [email protected]

This blog records the author's experience from work and research: on the one hand, to leave a mark of work and life; on the other hand, if it can help you, so much the better!

Introduction

In recent Hadoop programming work, for the sake of convenience, the author compiles source programs locally on Windows and then accesses the Hadoop cluster directly, which is a great convenience for programmers. Some practical experience has been accumulated in this process, and solutions are given for problems that may be encountered (at the initial stage); hopefully they are helpful to you.

Environment Introduction

Hadoop cluster: Hadoop 2.7.1, CentOS 6.5, JDK 1.7

Eclipse environment: Windows 7, Eclipse Luna, JDK 1.8

Environment Construction

1) Install Eclipse Hadoop Plugin

Download a plug-in matching your Hadoop version from the network, such as hadoop-eclipse-plugin-2.7.1.jar, copy it into the plugins directory of the Eclipse installation, such as D:\Program\eclipse\plugins, and then restart Eclipse.

2) Configure the MapReduce Plugin

Window->Preferences->Hadoop Map/Reduce. This article sets the Hadoop processing directory to D:\hadoop. Note that this directory supplies the jar packages required for compiling source programs later, as well as the library files required when compiling under Windows.

3) Switch the Perspective

Window->Open Perspective->Other, then choose the Map/Reduce perspective.

4) Configure Map/Reduce Locations

This step configures the location and related information for accessing the Hadoop cluster from Windows. Click Map/Reduce Locations->New Hadoop Location->General. During parameter configuration, pay attention to the Host and Port fields, which must be consistent with the DFS configuration of the Hadoop cluster. As for the user name, it is generally recommended to use the same user name under which Hadoop is installed on the cluster; changing administrator to hadoop under Windows Users and Groups saves a lot of unnecessary hassle. This article uses the user name root under Windows and follows up on the resulting problem later (see FAQ 5).

Map/Reduce Locations->New Hadoop Location->Advanced Parameters. During this configuration, only hadoop.tmp.dir requires attention: it must be consistent with the core-site.xml configuration of the Hadoop cluster (see the sketch below). At this point, the Eclipse Hadoop plug-in is installed.
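
For reference, the relevant entries of a cluster's core-site.xml might look like the following. This is only an illustrative sketch: the NameNode address reuses the 192.168.101.244:9000 endpoint that appears later in this article, and the hadoop.tmp.dir value is an assumption. The Host, Port, and hadoop.tmp.dir entered in the plug-in dialogs must match whatever your cluster actually uses.

    <configuration>
      <!-- NameNode address; the plug-in's Host/Port must match this -->
      <property>
        <name>fs.defaultFS</name>
        <value>hdfs://192.168.101.244:9000</value>
      </property>
      <!-- Base temporary directory; the plug-in's hadoop.tmp.dir must match this
           (the path below is an assumption for illustration) -->
      <property>
        <name>hadoop.tmp.dir</name>
        <value>/home/hadoop/tmp</value>
      </property>
    </configuration>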

5) Testing and Verification

In the project navigation bar, expand DFS Locations. If you can see the files that users have written to the Hadoop cluster through hadoop fs commands, the plug-in installation is complete.
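
Connectivity can also be verified with a few lines of code. Below is a minimal sketch (the class name ListHdfs is chosen here for illustration) that lists the HDFS root directory, assuming the hdfs://192.168.101.244:9000 endpoint used elsewhere in this article:

    import java.net.URI;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class ListHdfs {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Connect to the cluster NameNode (address assumed from this article)
            FileSystem fs = FileSystem.get(URI.create("hdfs://192.168.101.244:9000"), conf);
            // Print every entry under the HDFS root directory
            for (FileStatus status : fs.listStatus(new Path("/"))) {
                System.out.println(status.getPath());
            }
            fs.close();
        }
    }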

Programming Exceptions and Solutions

In the author's experience, programs written directly on the cluster (in its local Linux environment) run without issue, but writing them under Windows for convenience can cause problems. The solutions below are based on the author's actual experience.

1) FAQ: java.lang.NullPointerException

This exception occurs when compiling under Windows: the HDFS temporary files exist, but the file system type cannot be identified.

Solution: Download files such as hadoop.dll and winutils.exe with the same Hadoop version number from the network and copy them into the bin directory of the Hadoop directory, such as D:\hadoop\bin, as well as into the Windows system directory, such as C:\Windows\System32. Note also that the downloaded files must match the CPU word size, i.e., 32-bit or 64-bit; the author wrestled with the CPU word size for a long time. In addition, the safest recommendation is to compile the Hadoop source code locally on Windows with the Windows SDK and then extract the above .dll and .exe files; this solves the problem.
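
If the program still cannot find winutils.exe, the Hadoop directory can also be pointed to from code before any Hadoop classes are used. A minimal sketch, assuming the D:\hadoop directory from this article (the hadoop.home.dir system property is consulted by Hadoop's Shell class before it falls back to the HADOOP_HOME environment variable):

    public class HadoopHomeSetup {
        public static void main(String[] args) {
            // Must run before any Hadoop file system classes are loaded,
            // so winutils.exe is located under D:\hadoop\bin (path assumed)
            System.setProperty("hadoop.home.dir", "D:\\hadoop");
            // ... continue with normal HDFS/MapReduce code here ...
        }
    }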

2) FAQ: java.lang.UnsatisfiedLinkError

Further analysis shows that this exception occurs because the HDFS temporary file problem under Windows has not been properly resolved: on the one hand, hadoop.dll may not match the Hadoop cluster version; on the other hand, the CPU word size may be inconsistent.

Solution: Find files of the correct version on the network, or recompile the Hadoop source code locally.

3) FAQ: org.apache.hadoop.mapreduce.lib.input.InvalidInputException

This exception occurs because the HDFS directory is not specified explicitly: if the hdfs:// scheme is omitted, the path is interpreted as a default local path under Windows.

Solution: Expand DFS Locations in the Eclipse navigation bar and step down to the final location of the file. For example, the test file is hdfs://192.168.101.244:9000/user/root/input; that is, the path needs the hdfs://ip:port prefix, as in the sketch below.
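
In a MapReduce driver this corresponds to passing fully qualified paths. A minimal sketch, reusing the input path above (the output path and the omitted job setup are assumptions for illustration):

    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class InputPathExample {
        public static void main(String[] args) throws Exception {
            Job job = Job.getInstance();
            // Fully qualified HDFS paths avoid falling back to local Windows paths
            FileInputFormat.addInputPath(job,
                    new Path("hdfs://192.168.101.244:9000/user/root/input"));
            FileOutputFormat.setOutputPath(job,
                    new Path("hdfs://192.168.101.244:9000/user/root/output"));
            // ... set mapper, reducer, and output types, then submit the job ...
        }
    }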

4) FAQ: java.lang.UnsupportedClassVersionError

This exception occurs because the JDK version used to compile the Hadoop program under Windows does not match the JDK version used on the Hadoop cluster.

Solution: First check the JDK version on the Hadoop cluster and install the same version on Windows. For example, if JDK 1.8 is used on Windows while JDK 1.7 is used on the Hadoop cluster, install JDK 1.7 on Windows, change the Build Path and Compiler settings of the project in question to Java SE 1.7, and recompile.
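
As a quick sanity check (a trivial sketch, not from the original article), printing the runtime version from within a program confirms which JDK actually executes it; class files built by JDK 1.8 carry class-file major version 52 and fail on a 1.7 runtime with exactly this error:

    public class JavaVersionCheck {
        public static void main(String[] args) {
            // Prints e.g. "1.7.0_80" on a JDK 1.7 runtime
            System.out.println(System.getProperty("java.version"));
        }
    }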

5) FAQ: org.apache.hadoop.security.AccessControlException: Permission denied

This exception occurs when the program stops abnormally without producing any output. The cause is that the local user has no permission to access HDFS on the Hadoop cluster, so access is denied during the permission check and the program exits. For example, the root user in this article was not granted permission to access HDFS and therefore could not see the data results. In addition, many users dislike the default user name hadoop, so setting permissions correctly is all the more important.

Solution: Based on the test environment, this article modifies the cluster's core-site.xml as sketched below and turns off the permission check of the HDFS file system. In a real-world application, however, permissions should be set according to the application scenario.
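
The author's original screenshot is not reproduced here; the following is a sketch of the change being described. In stock Hadoop 2.x the switch is the dfs.permissions.enabled property (normally documented under hdfs-site.xml, although the NameNode reads core-site.xml as well):

    <property>
      <!-- Disable HDFS permission checking; suitable for test environments only -->
      <name>dfs.permissions.enabled</name>
      <value>false</value>
    </property>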
