Original link: http://www.cnblogs.com/vincentzh/p/6055850.html
I had planned to write this up last weekend, but my environment was not set up properly and the problems dragged on until Monday before they were solved. Writing it up this week also gave me a chance to review what I had read earlier; going over the code again while reviewing made the understanding much deeper.
Directory
- 1. Overview
- 2. Environment Preparation
- 3. Plug-in Configuration
- 4. Configure the File System Connection
- 5. Test Connection
- 6. Code Writing and Execution
- 7. Troubleshooting
- 7.1 No Log Output in the Console
- 7.2 Permissions Issues
1. Overview
Hadoop provides Java APIs for developing processing programs. By building a familiar Eclipse development environment locally, you can develop and debug those programs and run them, with their output, directly from Eclipse without deploying any code. Running the processing logic against sampled data makes it convenient to debug and validate. Once the logic has been verified, the whole program can be packaged and uploaded to the cluster to process the complete data set.
Before building the development environment you need to deploy a Hadoop environment of your own. This is closer to real-world use, and the scheduling and configuration of such a cluster is almost identical to the configuration and maintenance of a large cluster (for setting up the Hadoop environment, see: Hadoop standalone/pseudo-distributed deployment, Hadoop cluster/distributed deployment).
Programs are generally developed and debugged in a standalone or pseudo-distributed environment. Standalone mode uses the local file system, so the Linux command line makes it easy to fetch and inspect the results of a run. In pseudo-distributed and cluster mode, by contrast, the code reads and writes data directly on HDFS, so compared with the local environment you have to put/get data between the local file system and HDFS, which is a lot of extra work. Development and debugging use sampled data anyway (otherwise runs take far too long), so the code is verified in a standalone or pseudo-distributed environment and only then deployed to the cluster to process the complete data. I have two environments deployed in virtual machines, one pseudo-distributed and one small cluster. A single deployment can of course be switched between standalone/pseudo-distributed/cluster modes, but since switching my own deployment back and forth is a hassle, I simply keep both environments and connect to whichever one I need directly from the development environment.
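For reference, here is a minimal sketch of the put/get step between the local file system and HDFS done from Java. The NameNode address is the one used later in this post; the local Windows paths are placeholders I made up for illustration.

```java
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Copy sample input up to HDFS before a run and pull a result file back down afterwards.
public class HdfsPutGet
{
    public static void main(String[] args) throws Exception
    {
        Configuration conf = new Configuration();
        // NameNode address as used in the WordCount example later in this post.
        FileSystem fs = FileSystem.get(URI.create("hdfs://192.168.1.110:9000"), conf);

        // "put": local sample data -> HDFS input directory (local path is a placeholder).
        fs.copyFromLocalFile(new Path("E:/sample/input.txt"),
                             new Path("/user/hadoop/input/input.txt"));

        // "get": HDFS output -> local directory for inspection (local path is a placeholder).
        fs.copyToLocalFile(new Path("/user/hadoop/output/part-00000"),
                           new Path("E:/sample/result"));

        fs.close();
    }
}
```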
2. Environment Preparation
1) Configure the cluster and start all daemons. For cluster setup, see: Hadoop standalone/pseudo-distributed deployment, Hadoop cluster/distributed deployment.
2) Install Eclipse, and install locally the same versions of the JDK and Hadoop that are used on the cluster.
3. Plug-in Configuration
Download Hadoop2.x-eclipse-plugin.jar, put it into Eclipse's \plugins directory, and restart Eclipse. Under Window->Show View->Other you will then see the Map/Reduce view, and a DFS Locations entry, which looks like a folder, appears in the project explorer on the left.
4. Configure the File System Connection
Switch to the Map/Reduce view to do the configuration.
Be careful here: the settings must be consistent with the core-site.xml configuration file on your cluster (my own configuration is attached). Some people put the hostname directly into the Host field, but the hostname-to-IP mapping only exists inside the Hadoop environment; the local Windows environment cannot resolve the 'Master' or 'Hadoop' you typed there. The most straightforward approach is to configure with the IP address, and to use the IP in the core-site.xml configuration file as well.
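For reference, a minimal core-site.xml sketch using the NameNode address that appears in the code later in this post (hdfs://192.168.1.110:9000); adjust the IP and port to your own cluster. The DFS Master host/port you enter in the plug-in should match this value.

```xml
<!-- core-site.xml (sketch): use the IP, not a hostname the Windows side cannot resolve -->
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://192.168.1.110:9000</value>
  </property>
</configuration>
```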
5. Test Connection
After the configuration is complete, test the connection; first make sure that all daemons have started correctly.
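If you prefer to verify the connection from code rather than through the plug-in, a minimal sketch like the following (assuming the hdfs://192.168.1.110:9000 address used elsewhere in this post) simply lists the contents of an HDFS directory:

```java
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Minimal HDFS connectivity check: list a directory on the cluster.
public class HdfsConnectionTest
{
    public static void main(String[] args) throws Exception
    {
        Configuration conf = new Configuration();
        // NameNode address taken from the WordCount example below; change it to your own.
        FileSystem fs = FileSystem.get(URI.create("hdfs://192.168.1.110:9000"), conf);
        for (FileStatus status : fs.listStatus(new Path("/user/hadoop")))
        {
            System.out.println(status.getPath());
        }
        fs.close();
    }
}
```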
6. Code Writing and Execution
You can try writing the test code yourself; if you just want the satisfaction of seeing the environment work, take the example straight from the official website. The link is here.
When you start writing (or pasting) code you will run into this question: why does the project have none of the packages needed for MapReduce development, and where are those packages supposed to come from? The answer is right here.
Before testing the code, note that a newly created project does not contain the jar packages that MapReduce development needs. This is exactly why the same version of Hadoop has to be installed locally on Windows: the jars required to compile MapReduce programs are taken from its installation directory. Set the local Windows Hadoop path under Window->Preferences->Hadoop Map/Reduce (for example: E:\ProgramPrivate\hadoop-2.6.0). Once this is set, the Hadoop-related jars are imported automatically whenever a Hadoop (Map/Reduce) project is created.
I will stick to implementing WordCount with the basic Mapper/Reducer implementation classes that the Hadoop API already provides; the specific parameter configuration can be used as a reference.
```java
package com.cnblogs.vincentzh.hadooptest;

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.lib.LongSumReducer;
import org.apache.hadoop.mapred.lib.TokenCountMapper;

// WordCount implemented with the basic Mapper/Reducer classes provided by the Hadoop API
public class WordCount2
{
    public static void main(String[] args)
    {
        JobClient client = new JobClient();
        Configuration conf = new Configuration();
        JobConf jobConf = new JobConf(conf);
        jobConf.setJobName("WordCount2");
        Path in = new Path("hdfs://192.168.1.110:9000/user/hadoop/input");
        Path out = new Path("hdfs://192.168.1.110:9000/user/hadoop/output");
        FileInputFormat.addInputPath(jobConf, in);
        FileOutputFormat.setOutputPath(jobConf, out);
        jobConf.setMapperClass(TokenCountMapper.class);
        jobConf.setCombinerClass(LongSumReducer.class);
        jobConf.setReducerClass(LongSumReducer.class);
        jobConf.setOutputKeyClass(Text.class);
        jobConf.setOutputValueClass(LongWritable.class);
        // client.setConf(jobConf);
        try
        {
            JobClient.runJob(jobConf);
        }
        catch (IOException e)
        {
            e.printStackTrace();
        }
    }
}
```
When execution completes, the corresponding job statistics are printed, and refreshing the folder under DFS Locations on the left shows the output files in the HDFS file system.
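To check the results without leaving the IDE or copying anything to the local disk, a small sketch like the following streams an output file to the console. The output path matches the WordCount job above; the part file name part-00000 is the usual default for a single reducer, so treat it as an assumption.

```java
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

// Print a job output file from HDFS to the console.
public class PrintJobOutput
{
    public static void main(String[] args) throws Exception
    {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(URI.create("hdfs://192.168.1.110:9000"), conf);
        // part-00000 is the conventional name of the first (here, only) reduce output file.
        FSDataInputStream in = fs.open(new Path("/user/hadoop/output/part-00000"));
        try
        {
            IOUtils.copyBytes(in, System.out, 4096, false);
        }
        finally
        {
            IOUtils.closeStream(in);
            fs.close();
        }
    }
}
```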
7. Troubleshooting
Many problems can come up at run time; here I only list the ones I ran into myself and how I solved them. Problems I did not encounter naturally cannot be shared here.
7.1 No Log Output in the Console
When Eclipse runs a MapReduce program, the console prints nothing from the program: no log messages and no execution information, so there is no way to tell how the run went.
Cause: the console has no log output because there is no log4j configuration in the project.
Solution: copy the log4j.properties file from the Hadoop configuration directory ($HADOOP_HOME/etc/hadoop/) directly into the project so that it ends up on the classpath.
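If you would rather create the file by hand, a minimal console-only log4j.properties along these lines is enough to see job progress. This is a sketch of a typical configuration, not a copy of the file Hadoop ships.

```properties
# Minimal log4j configuration: send everything at INFO and above to the console.
log4j.rootLogger=INFO, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.err
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{2}: %m%n
```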
7.2 Permissions Issues
Running a MapReduce program in Eclipse fails with an error similar to: org.apache.hadoop.security.AccessControlException: org.apache.hadoop.security.AccessControlException: Permission denied: user=john, access=WRITE, inode="input":hadoop:supergroup:rwxr-xr-x ...
Cause: on HDFS only the user that deployed the Hadoop environment has read/write permission (most people probably use a user called 'hadoop'). Our development environment, however, is built locally on Windows, and the program submits and runs the job as the local Windows user. Hadoop authenticates the submitting user when a job is submitted and executed, and the Windows user has no permission to read and write HDFS files or to submit and run jobs.
Solution: set the property dfs.permissions (dfs.permissions.enabled in Hadoop 2.x) to false in the hdfs-site.xml configuration file on the cluster and restart HDFS.
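A sketch of the corresponding hdfs-site.xml entry (property name assumed per Hadoop 2.x; note that this disables permission checking cluster-wide, which is only appropriate for a private test environment):

```xml
<!-- hdfs-site.xml (sketch): disable HDFS permission checking on a private test cluster.
     In Hadoop 1.x the property was named dfs.permissions. -->
<property>
  <name>dfs.permissions.enabled</name>
  <value>false</value>
</property>
```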