[Hadoop] Cause of the java.lang.NullPointerException when using a FileSystem object in the map function, and the solution

Problem Description:
Process multiple files in Hadoop, with one map task per file.
I used the approach of generating a file that lists the full HDFS paths of all the files to be compressed; each map task then receives one path name as its input.
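To make that step concrete, here is a hedged sketch (not from the original post) of one way to build the path-list file; the directory names, the list file location, and the class name BuildPathList are assumptions:

import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class BuildPathList {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(URI.create("hdfs://192.168.2.2:9000"), conf);

        Path srcDir = new Path("/data/to-compress");              // assumed directory holding the files to compress
        Path listFile = new Path("/tmp/files-to-compress.txt");   // assumed list file, one full path per line

        // write the full path of every file in the source directory into the list file
        try (FSDataOutputStream out = fs.create(listFile, true)) {
            for (FileStatus status : fs.listStatus(srcDir)) {
                if (status.isFile()) {
                    out.writeBytes(status.getPath().toString() + "\n");
                }
            }
        }
    }
}

The list file then serves as the job input; each line holds one HDFS path, which the map function reads from its Text value (how lines are distributed across map tasks depends on the input format, e.g. NLineInputFormat gives one line per map).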
While debugging in Eclipse, the FileSystem object used in the map function to access files on HDFS was a static member variable of the enclosing class. The job ran without errors in Eclipse, but once it was packaged into a jar and submitted to the cluster, the following line in the map function
FileStatus fileStatus = tmpfs.getFileStatus(inputDir);
threw java.lang.NullPointerException. I was stuck on this for two days with no idea what was wrong, and only realized yesterday afternoon that tmpfs was a null reference that had never been assigned inside the map task. Although tmpfs is declared as a static variable in the outermost class and is assigned in the main function, inside the map function it is still null and the NullPointerException remains.
This also confirms that debugging in Eclipse runs the job locally: it only calls the Hadoop class libraries, and no submitted application appears on the cluster's monitoring web page at port 8088.
The job must be packaged as a jar and launched with bin/hadoop jar to actually be submitted to the cluster. When it runs there, the static variables initialized inside the main function are still uninitialized in the map; my guess is that the map tasks running on the cluster and the local main function are independent of each other (they run in separate JVMs).
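For reference, here is a minimal reconstruction of the broken pattern described above; apart from the static tmpfs field and the getFileStatus() call, the class layout and names are assumptions, not the original code:

import java.io.IOException;
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class CompressJob {

    // static field: assigned only in the client-side main(), never in the map tasks
    private static FileSystem tmpfs;

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        tmpfs = FileSystem.get(URI.create("hdfs://192.168.2.2:9000"), conf);   // only this JVM sees the assignment
        // ... job setup and submission ...
    }

    public static class PathMapper extends Mapper<Object, Text, Text, Text> {
        @Override
        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            Path inputDir = new Path(value.toString());
            FileStatus fileStatus = tmpfs.getFileStatus(inputDir);   // NullPointerException on the cluster: tmpfs is null here
        }
    }
}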
Corrected code:
@Override
public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
    Configuration conf = context.getConfiguration();
    FileSystem tmpfs = FileSystem.get(URI.create("hdfs://192.168.2.2:9000"), conf);

    Path inputDir = new Path(value.toString());              // get the Path object of the file to be processed
    FileStatus fileStatus = tmpfs.getFileStatus(inputDir);

    // do the appropriate processing

    context.write(new Text(value.toString()), new Text(""));
}
The two key lines are:

Configuration conf = context.getConfiguration();   // get the Configuration object set up for the job, via the context
FileSystem tmpfs = FileSystem.get(URI.create("hdfs://192.168.2.2:9000"), conf);   // the assignment must happen inside the map function
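Beyond the minimal fix, one possible refinement, sketched here under assumptions rather than taken from the original post, is to open the FileSystem once per task in setup() and to fill in the "do the appropriate processing" step by gzipping each listed file back onto HDFS. The class name CompressFileMapper, the output naming scheme, and the choice of GzipCodec are assumptions:

import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.compress.CompressionCodec;
import org.apache.hadoop.io.compress.GzipCodec;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.util.ReflectionUtils;

public class CompressFileMapper extends Mapper<Object, Text, Text, Text> {

    private Configuration conf;
    private FileSystem tmpfs;
    private CompressionCodec codec;

    @Override
    protected void setup(Context context) throws IOException {
        conf = context.getConfiguration();
        // open the FileSystem once per task attempt rather than once per record
        tmpfs = FileSystem.get(URI.create("hdfs://192.168.2.2:9000"), conf);
        codec = ReflectionUtils.newInstance(GzipCodec.class, conf);
    }

    @Override
    public void map(Object key, Text value, Context context)
            throws IOException, InterruptedException {
        Path inputPath = new Path(value.toString());
        Path outputPath = new Path(value.toString() + codec.getDefaultExtension());   // e.g. ".gz"

        // stream the file through the gzip codec and write the result back to HDFS
        try (InputStream in = tmpfs.open(inputPath);
             OutputStream out = codec.createOutputStream(tmpfs.create(outputPath, true))) {
            IOUtils.copyBytes(in, out, conf, false);
        }

        context.write(new Text(inputPath.toString()), new Text(outputPath.toString()));
    }
}

If the job configuration's fs.defaultFS already points at hdfs://192.168.2.2:9000, FileSystem.get(conf) works as well and avoids hardcoding the NameNode address.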
Appendix:
How do you process multiple files, with one map per file?
For example, to compress (zip) a set of files on a cluster, you can use one of the following approaches:
- Using Hadoop Streaming and a user-written mapper script:
  - Generate a file containing the full HDFS paths of all the files to be compressed. Each map task obtains one path name as its input.
  - Write a mapper script that does the following: takes a file name, copies the file to the local machine, compresses it, and puts it into the desired output directory.
- Using the existing Hadoop framework:
  - Add the following calls to the main function:

    FileOutputFormat.setCompressOutput(conf, true);
    FileOutputFormat.setOutputCompressorClass(conf, org.apache.hadoop.io.compress.GzipCodec.class);
    conf.setInputFormat(NonSplitableTextInputFormat.class);   // user-defined class; see the sketch after this list
    conf.setNumReduceTasks(0);
  - Write the map function:

    public void map(WritableComparable key, Writable value,
                    OutputCollector output, Reporter reporter) throws IOException {
        output.collect((Text) value, null);
    }
  - Note that the output file names will differ from the original file names.
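The NonSplitableTextInputFormat referenced above is not a class that ships with Hadoop; the recipe assumes a small user-defined subclass along these lines (old org.apache.hadoop.mapred API assumed), which refuses to split files so that each whole file is handled by exactly one map task:

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.TextInputFormat;

public class NonSplitableTextInputFormat extends TextInputFormat {
    // Returning false keeps every input file in a single split,
    // so one map task processes one whole file.
    @Override
    protected boolean isSplitable(FileSystem fs, Path file) {
        return false;
    }
}

The idea is that with zero reduce tasks and compressed output, each map simply re-emits the lines of its single input file, and the framework writes one gzip-compressed output file per input file.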