[Hadoop] Causes of java.lang.NullPointerException when using a FileSystem object in the map function, and the solution

Source: Internet
Author: User

Problem Description:

The task: process multiple files in Hadoop, with one map per file.

I used the approach of generating a file containing the full paths of all the files to be compressed on HDFS; each map task obtains one path name as its input.
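For reference, a minimal sketch of how such a path-list file might be generated (the HDFS URI matches the one used later in this post; the directory and file names are illustrative assumptions, not from the original):

    import java.io.OutputStreamWriter;
    import java.io.PrintWriter;
    import java.net.URI;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class PathListWriter {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(URI.create("hdfs://192.168.2.2:9000"), conf);
            // Write one full HDFS path per line; each line then serves as the
            // input of one map task (e.g. via NLineInputFormat)
            try (PrintWriter out = new PrintWriter(
                    new OutputStreamWriter(fs.create(new Path("/tmp/file-list.txt"))))) {
                for (FileStatus status : fs.listStatus(new Path("/data/to-compress"))) {
                    if (!status.isDirectory()) {
                        out.println(status.getPath().toString());
                    }
                }
            }
        }
    }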

When debugging in Eclipse, the FileSystem object used in map() to access files on HDFS was a static member variable of the enclosing class, and everything ran without errors. After packaging the job into a jar and submitting it to the cluster, however, this line in the map function

    FileStatus fileStatus = tmpFs.getFileStatus(inputDir);

threw a java.lang.NullPointerException. I was stuck on it for two days without knowing what was wrong.

It only occurred to me yesterday afternoon: tmpFs was an empty object that had never been assigned a value. Although tmpFs is declared as a static variable in the outermost class and is assigned in the main function, inside the map function it is still null, hence the NullPointerException.

This also confirms that debugging in Eclipse runs the job locally: it merely calls the Hadoop class libraries, and no submitted application appears on the port-8088 monitoring web page. The job must be packaged as a jar and run with bin/hadoop jar to actually be submitted to the cluster. Static variables initialized inside the main function are still uninitialized in map(); my guess is that the map tasks run on the cluster in JVMs that are independent of the local main function.
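In other words, the failing pattern looked roughly like this (a minimal sketch; the class and field names are illustrative, not the original code):

    import java.io.IOException;
    import java.net.URI;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    public class CompressJob {
        static FileSystem tmpFs; // static field, assigned only in main()

        public static class CompressMapper extends Mapper<Object, Text, Text, Text> {
            @Override
            public void map(Object key, Text value, Context context)
                    throws IOException, InterruptedException {
                // On the cluster this runs in a separate JVM where main() never
                // executed, so tmpFs is still null: NullPointerException here
                FileStatus fileStatus = tmpFs.getFileStatus(new Path(value.toString()));
            }
        }

        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // This assignment happens only in the client JVM that submits the job
            tmpFs = FileSystem.get(URI.create("hdfs://192.168.2.2:9000"), conf);
            // ... job configuration and submission ...
        }
    }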

Corrected code:
    @Override
    public void map(Object key, Text value, Context context)
            throws IOException, InterruptedException {
        Configuration conf = context.getConfiguration();
        FileSystem tmpFs = FileSystem.get(URI.create("hdfs://192.168.2.2:9000"), conf);

        Path inputDir = new Path(value.toString()); // path object of the file to be processed
        FileStatus fileStatus = tmpFs.getFileStatus(inputDir);

        // do the appropriate processing

        context.write(new Text(value.toString()), new Text(""));
    }
The two key lines:

    Configuration conf = context.getConfiguration(); // get the Configuration set on the job, through the context
    FileSystem tmpFs = FileSystem.get(URI.create("hdfs://192.168.2.2:9000"), conf); // must be assigned inside the map function
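A common variant (a sketch, under the same cluster URI as above) is to obtain the FileSystem once per map task in the mapper's setup() method, instead of on every map() call:

    private FileSystem tmpFs;

    @Override
    protected void setup(Context context) throws IOException, InterruptedException {
        // setup() runs once per map task, inside the task's own JVM,
        // so the field is initialized where map() will actually use it
        tmpFs = FileSystem.get(URI.create("hdfs://192.168.2.2:9000"), context.getConfiguration());
    }

This avoids repeating the FileSystem.get() lookup for every input record.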

Appendix:

How to work with multiple files, one map per file?

For example, to compress (zip) some files on a cluster, you can use one of the following methods:

    1. Using Hadoop streaming and a user-written mapper script:
      • Generate a file containing the full paths of all the files to be compressed on HDFS. Each map task obtains one path name as its input.
      • Create a mapper script that, given a file name, copies the file locally, compresses it, and sends it to the desired output directory.
    2. Using the existing Hadoop framework:
      • Add the following to the main function (a sketch of NonSplitableTextInputFormat appears after this list):
               FileOutputFormat.setCompressOutput(conf, true);
               FileOutputFormat.setOutputCompressorClass(conf, org.apache.hadoop.io.compress.GzipCodec.class);
               conf.setInputFormat(NonSplitableTextInputFormat.class);
               conf.setNumReduceTasks(0);
      • Write the map function:
               public void map(WritableComparable key, Writable value,
                               OutputCollector output,
                               Reporter reporter) throws IOException {
                   output.collect((Text) value, null);
               }
      • Note that the output file names will differ from the original file names.
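NonSplitableTextInputFormat above is not a class shipped with Hadoop; the user has to write it. A minimal sketch, assuming the old (mapred) API that the map function above uses:

    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapred.TextInputFormat;

    // Marks every input file as non-splittable, so each file becomes a single
    // split and is processed by exactly one map task.
    public class NonSplitableTextInputFormat extends TextInputFormat {
        @Override
        protected boolean isSplitable(FileSystem fs, Path file) {
            return false;
        }
    }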

