Liu Yong Email: [email protected]
This blog records the author's experience in work and research, partly to leave a record of work and life, and partly in the hope that it may be of help to others.
Introduction
For recent Hadoop programming work, it is convenient to compile source programs locally on Windows and then access the Hadoop cluster directly, which saves programmers a great deal of effort. This article summarizes the practical experience accumulated in the process and offers solutions to the problems commonly encountered in the early stages, in the hope that it helps you.
Environment Introduction
Hadoop cluster: Hadoop 2.7.1, CentOS 6.5, JDK 1.7
Eclipse environment: Windows 7, Eclipse Luna, JDK 1.8
Environment Setup
1) Install the Eclipse Hadoop Plugin
Download the plug-in matching your Hadoop version from the network, such as hadoop-eclipse-plugin-2.7.1.jar, copy it into Eclipse's plugins directory, such as D:\Program\eclipse\plugins, and then restart Eclipse.
2) Configure the MapReduce Plugin
Window -> Preferences -> Hadoop Map/Reduce. This article sets the Hadoop installation directory to D:\hadoop. Note that this directory must contain the JAR packages required to compile the source programs later, as well as the native library files required on Windows.
3) Switch the Perspective
Window -> Open Perspective -> Other, then select Map/Reduce.
4) Configure Map/Reduce Locations
This step configures the location and related information for accessing the Hadoop cluster from Windows. Click Map/Reduce Locations -> New Hadoop Location -> General. When filling in the parameters, pay attention to the Host and Port fields: they must match the DFS configuration of the Hadoop cluster. As for the user name, it is generally advisable to use the same user name under which Hadoop runs on the cluster; changing the Windows user from administrator to hadoop (under Users and Groups) saves a lot of unnecessary trouble. This article uses the user name root on Windows, and the resulting problem is discussed later (see FAQ 5).
Next, open Map/Reduce Locations -> New Hadoop Location -> Advanced Parameters. Here the only setting to worry about is hadoop.tmp.dir, which must match the core-site.xml configuration of the Hadoop cluster. At this point, the Eclipse Hadoop plugin is installed.
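For reference, a minimal sketch of the matching cluster-side core-site.xml is shown below. The address 192.168.101.244:9000 is the example used later in this article, and the hadoop.tmp.dir value is only a hypothetical placeholder; both must reflect your actual cluster settings.

<!-- core-site.xml on the cluster (excerpt, assumed values) -->
<configuration>
  <property>
    <!-- Must match Host/Port in Eclipse's DFS Master settings -->
    <name>fs.defaultFS</name>
    <value>hdfs://192.168.101.244:9000</value>
  </property>
  <property>
    <!-- Must match hadoop.tmp.dir in Eclipse's Advanced Parameters -->
    <name>hadoop.tmp.dir</name>
    <value>/home/hadoop/tmp</value>
  </property>
</configuration>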
5) Testing and Verification
In the project navigation bar, expand DFS Locations. If you can see the files that users have placed on the Hadoop cluster, just as the hadoop fs commands would show them, the plug-in installation is complete.
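Beyond the plug-in view, a small standalone program can also confirm that the cluster is reachable from Windows. This is only a sketch: the address and user name follow the examples in this article and should be adjusted to your cluster.

import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsSmokeTest {
    public static void main(String[] args) throws Exception {
        // Connect to HDFS as user "root" (the user used in this article)
        FileSystem fs = FileSystem.get(
                URI.create("hdfs://192.168.101.244:9000"),
                new Configuration(), "root");
        // List the user's home directory; any output means the
        // Windows-to-cluster connection works
        for (FileStatus status : fs.listStatus(new Path("/user/root"))) {
            System.out.println(status.getPath());
        }
        fs.close();
    }
}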
Programming Exceptions and Solutions
In the author's experience, a program written and run directly on the Hadoop cluster (in the local Linux environment) works fine, but writing it on Windows for convenience can cause problems. The solutions below are based on the author's actual experience.
1) FAQ: java.lang.NullPointerException
This exception occurs because, when the program runs on Windows, Hadoop is missing the native Windows support files it needs to handle its local temporary files, so the file system operations fail.
Solution: download hadoop.dll, winutils.exe, and related files matching your Hadoop version from the network, and copy them into Hadoop's bin directory, such as D:\hadoop\bin, and into the Windows system directory, such as C:\Windows\System32. Note also that the downloaded files must match the CPU architecture, i.e. 32-bit or 64-bit; the author wrestled with the CPU bitness for a long time. As the safest option, the author recommends compiling the Hadoop source code locally on Windows with the Windows SDK and then extracting the above DLL and EXE files, which resolves the problem.
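If copying files into C:\Windows\System32 is undesirable, a commonly used alternative is to point Hadoop at the directory that contains bin\winutils.exe before any Hadoop class is loaded. A minimal sketch, assuming the D:\hadoop directory used earlier in this article:

public class JobLauncher {
    public static void main(String[] args) throws Exception {
        // Tell Hadoop where to find bin\winutils.exe on this machine;
        // must run before the first Hadoop class is used
        System.setProperty("hadoop.home.dir", "D:\\hadoop");
        // ... normal job setup continues here ...
    }
}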
2) FAQ: java.lang.UnsatisfiedLinkError
Further analysis shows that this exception also arises from Hadoop's Windows native support not being resolved properly: either hadoop.dll does not match the Hadoop cluster version, or it does not match the CPU bitness.
Solution: locate files of the correct version on the network, or recompile the Hadoop source code locally.
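To check which variant the local JVM needs, a quick sketch (sun.arch.data.model is HotSpot-specific; os.arch is a more portable fallback):

public class BitnessCheck {
    public static void main(String[] args) {
        // Prints "32" or "64" on HotSpot JVMs
        System.out.println(System.getProperty("sun.arch.data.model"));
        // e.g. "amd64" for a 64-bit JVM
        System.out.println(System.getProperty("os.arch"));
    }
}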
3) FAQ: org.apache.hadoop.mapreduce.lib.input.InvalidInputException
This exception occurs because the HDFS file location is not specified explicitly: if the hdfs:// scheme is omitted, the path is resolved as a default local path on Windows.
Solution: expand DFS Locations in the Eclipse navigation bar and step through to the final location of the file. For example, the test file in this article is at hdfs://192.168.101.244:9000/user/root/input; the path must be prefixed with hdfs://ip:port.
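A sketch of what this looks like in job setup code (the job name and output path are illustrative only):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class PathExample {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "example");
        // Fully qualified hdfs://ip:port paths; a bare "input" would be
        // resolved against the local Windows file system and fail
        FileInputFormat.addInputPath(job,
                new Path("hdfs://192.168.101.244:9000/user/root/input"));
        FileOutputFormat.setOutputPath(job,
                new Path("hdfs://192.168.101.244:9000/user/root/output"));
    }
}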
4) FAQ: java.lang.UnsupportedClassVersionError
This exception occurs because the JDK version used to compile the program on Windows does not match the JDK version used on the Hadoop cluster.
Solution: first check the JDK version on the Hadoop cluster and install the same version on Windows. For example, since Windows uses JDK 1.8 here while the Hadoop cluster uses JDK 1.7, install JDK 1.7 on Windows, change the project's Build Path and Compiler settings to Java SE 1.7, and recompile.
5) FAQ: org.apache.hadoop.security.AccessControlException: Permission denied
The program stops abnormally after running, without producing any output. This exception occurs because the local user has no permission to access HDFS on the Hadoop cluster, so access is denied at the permission check and the program exits; for example, the user root used in this article was not granted access to HDFS, so no result data could be seen. In addition, many users prefer not to use the default user name hadoop, which makes setting permissions all the more important.
Solution: given the test environment, this article modifies the cluster configuration to turn off the permission check for the HDFS file system, as sketched below. In a real-world application, permissions should be set according to the actual scenario.
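A minimal sketch of the change: in Hadoop 2.x the property is dfs.permissions.enabled, conventionally placed in hdfs-site.xml (HDFS also reads core-site.xml); restart HDFS after changing it, and do not leave the check disabled in production.

<!-- Disable the HDFS permission check; test environments only -->
<property>
  <name>dfs.permissions.enabled</name>
  <value>false</value>
</property>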