The client uses the Java API to remotely manipulate HDFS and remotely submit MapReduce jobs (source code and exception handling)


There are two classes: one is an HDFS file-operation class, the other is a WordCount word-count class, both adapted from examples found online. The code follows:

package mapreduce;

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hdfs.DistributedFileSystem;
import org.apache.hadoop.hdfs.protocol.DatanodeInfo;
import org.apache.hadoop.io.IOUtils;

// File operations on HDFS
// @author Liuxingjiaofu
public class Hdfs_File {
    // Read a file from HDFS and print its contents to standard output
    public void ReadFile(Configuration conf, String FileName) {
        try {
            FileSystem hdfs = FileSystem.get(conf);
            FSDataInputStream dis = hdfs.open(new Path(FileName));
            IOUtils.copyBytes(dis, System.out, 4096, false);
            dis.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    // Copy a file from HDFS to the local file system
    public void GetFile(Configuration conf, String srcFile, String dstFile) {
        try {
            FileSystem hdfs = FileSystem.get(conf);
            Path srcPath = new Path(srcFile);
            Path dstPath = new Path(dstFile);
            hdfs.copyToLocalFile(true, srcPath, dstPath);
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    // Copy a local file to HDFS
    public void PutFile(Configuration conf, String srcFile, String dstFile) {
        try {
            FileSystem hdfs = FileSystem.get(conf);
            Path srcPath = new Path(srcFile);
            Path dstPath = new Path(dstFile);
            hdfs.copyFromLocalFile(srcPath, dstPath);
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    // Create a new file on HDFS and return its output stream
    public FSDataOutputStream CreateFile(Configuration conf, String FileName) {
        try {
            FileSystem hdfs = FileSystem.get(conf);
            Path path = new Path(FileName);
            FSDataOutputStream outputStream = hdfs.create(path);
            return outputStream;
        } catch (IOException e) {
            e.printStackTrace();
        }
        return null;
    }

    // Rename a file
    public boolean ReNameFile(Configuration conf, String srcName, String dstName) {
        try {
            FileSystem hdfs = FileSystem.get(conf);
            Path fromPath = new Path(srcName);
            Path toPath = new Path(dstName);
            boolean isRenamed = hdfs.rename(fromPath, toPath);
            return isRenamed;
        } catch (IOException e) {
            e.printStackTrace();
        }
        return false;
    }

    // Delete a file or directory: type = true deletes a directory recursively, type = false deletes a single file
    public boolean DelFile(Configuration conf, String FileName, boolean type) {
        try {
            FileSystem hdfs = FileSystem.get(conf);
            Path path = new Path(FileName);
            boolean isDeleted = hdfs.delete(path, type);
            return isDeleted;
        } catch (IOException e) {
            e.printStackTrace();
        }
        return false;
    }

    // Get the modification time of an HDFS file
    public long GetFileModTime(Configuration conf, String FileName) {
        try {
            FileSystem hdfs = FileSystem.get(conf);
            Path path = new Path(FileName);
            FileStatus fileStatus = hdfs.getFileStatus(path);
            long modificationTime = fileStatus.getModificationTime();
            return modificationTime;
        } catch (IOException e) {
            e.printStackTrace();
        }
        return 0;
    }

    // Check whether a file exists in HDFS
    public boolean CheckFileExist(Configuration conf, String FileName) {
        try {
            FileSystem hdfs = FileSystem.get(conf);
            Path path = new Path(FileName);
            boolean isExists = hdfs.exists(path);
            return isExists;
        } catch (IOException e) {
            e.printStackTrace();
        }
        return false;
    }

    // Get the block locations (host names) of a file in the HDFS cluster
    public List<String[]> GetFileBolckHost(Configuration conf, String FileName) {
        try {
            List<String[]> list = new ArrayList<String[]>();
            FileSystem hdfs = FileSystem.get(conf);
            Path path = new Path(FileName);
            FileStatus fileStatus = hdfs.getFileStatus(path);
            BlockLocation[] blkLocations = hdfs.getFileBlockLocations(fileStatus, 0, fileStatus.getLen());
            int blkCount = blkLocations.length;
            for (int i = 0; i < blkCount; i++) {
                String[] hosts = blkLocations[i].getHosts();
                list.add(hosts);
            }
            return list;
        } catch (IOException e) {
            e.printStackTrace();
        }
        return null;
    }

    // Get the host names of all DataNodes in the HDFS cluster (the author had no authorization for this operation)
    public String[] GetAllNodeName(Configuration conf) {
        try {
            FileSystem fs = FileSystem.get(conf);
            DistributedFileSystem hdfs = (DistributedFileSystem) fs;
            DatanodeInfo[] dataNodeStats = hdfs.getDataNodeStats();
            String[] names = new String[dataNodeStats.length];
            for (int i = 0; i < dataNodeStats.length; i++) {
                names[i] = dataNodeStats[i].getHostName();
            }
            return names;
        } catch (IOException e) {
            System.out.println("error!!!!");
            e.printStackTrace();
        }
        return null;
    }
}

WordCount

package mapreduce;

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class MyWordCount {

    public static class WordCountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
            String line = value.toString();
            StringTokenizer itr = new StringTokenizer(line);
            while (itr.hasMoreElements()) {
                word.set(itr.nextToken());
                context.write(word, one);
            }
        }
    }

    public static class WordCountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        public void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable str : values) {
                sum += str.get();
            }
            context.write(key, new IntWritable(sum));
        }
    }

    /**
     * Two args: the file whose words you want to count and the directory to save the result in,
     * e.g. /home/hadooper/testmp/testtext /home/hadooper/testmp/testresult
     * @throws Exception
     */
    public static void main(String args[]) throws Exception {
        // First define two temporary HDFS names; a random suffix plus the file name
        // would make the chance of a name collision very small.
        String dstFile = "temp_src";
        String srcFile = "temp_dst";
        // Create a file-manipulation object.
        Hdfs_File file = new Hdfs_File();

        Configuration conf = new Configuration();
        // Must configure fs.default.name with the same value as in core-site.xml.
        conf.set("fs.default.name", "hdfs://node1");
        conf.set("mapred.job.tracker", "node1:54311");

        // Upload the local file (or directory) to HDFS.
        file.PutFile(conf, args[0], dstFile);
        System.out.println("up ok");

        Job job = new Job(conf, "mywordcount");
        job.setJarByClass(MyWordCount.class);
        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        job.setMapperClass(WordCountMapper.class);
        job.setReducerClass(WordCountReducer.class);
        job.setCombinerClass(WordCountReducer.class);
        // Note that the input and output here must be files or directories on HDFS.
        FileInputFormat.setInputPaths(job, new Path(dstFile));
        FileOutputFormat.setOutputPath(job, new Path(srcFile));
        // Run the job.
        job.waitForCompletion(true);

        // Copy the result from HDFS back to the local file system.
        file.GetFile(conf, srcFile, args[1]);
        System.out.println("down the result ok!");
        // Delete the temporary files/directories on HDFS.
        file.DelFile(conf, dstFile, true);
        file.DelFile(conf, srcFile, true);
        System.out.println("delete file on hdfs ok!");
    }
}


Several errors were encountered along the way:

1. HDFS version problem -- Call to node1/172.*.*.*:8020 failed on local exception: java.io.EOFException

main() {
    ......
    Configuration conf = new Configuration();
    // must match the value in conf/core-site.xml
    conf.set("fs.default.name", "hdfs://node1");
    Hdfs_File file = new Hdfs_File();
    // print all node names
    String[] host_name = file.GetAllNodeName(conf);
    ......
}

public String[] GetAllNodeName(Configuration conf) {
    try {
        FileSystem fs = FileSystem.get(conf);
        DistributedFileSystem hdfs = (DistributedFileSystem) fs;
        DatanodeInfo[] dataNodeStats = hdfs.getDataNodeStats();
        String[] names = new String[dataNodeStats.length];
        for (int i = 0; i < dataNodeStats.length; i++) {
            names[i] = dataNodeStats[i].getHostName();
        }
        return names;
    } catch (IOException e) {
        System.out.println("eeeeeeeeeeeeeeeeeeeerror!!!!");
        e.printStackTrace();
    }
    return null;
}
Exception:
eeeeeeeeeeeeeeeeeeeerror!!!!
java.io.IOException: Call to node1/172.10.39.250:8020 failed on local exception: java.io.EOFException
    at org.apache.hadoop.ipc.Client.wrapException(Client.java:775)
    at org.apache.hadoop.ipc.Client.call(Client.java:743)
    at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:220)
    at $Proxy0.getProtocolVersion(Unknown Source)
    at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:359)
    at org.apache.hadoop.hdfs.DFSClient.createRPCNamenode(DFSClient.java:112)
    at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:213)
    at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:176)
    at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:82)
    at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1378)
    at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:66)
    at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1390)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:196)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:95)
    at mapreduce.Hdfs_File.GetAllNodeName(Hdfs_File.java:151)
    at mapreduce.File_Operation.main(File_Operation.java:15)
Caused by: java.io.EOFException
    at java.io.DataInputStream.readInt(DataInputStream.java:392)
    at org.apache.hadoop.ipc.Client$Connection.receiveResponse(Client.java:501)
    at org.apache.hadoop.ipc.Client$Connection.run(Client.java:446)
Exception in thread "main" java.lang.NullPointerException
    at mapreduce.File_Operation.main(File_Operation.java:16)
Cause: a version mismatch. Make sure that the Hadoop jar used by the Java client is the same version as the jars running on the Hadoop cluster.
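A quick way to double-check the client side is to print the version of the Hadoop jar the code is actually compiled against and compare it with what the cluster reports (for example, running hadoop version on a cluster node). This is a minimal sketch, assuming org.apache.hadoop.util.VersionInfo is on the client classpath; the class name PrintClientVersion is just for illustration:

import org.apache.hadoop.util.VersionInfo;

public class PrintClientVersion {
    public static void main(String[] args) {
        // Version of the Hadoop jar on the client classpath; compare this with
        // the version reported by the cluster (e.g. "hadoop version" on a node).
        System.out.println("client Hadoop version: " + VersionInfo.getVersion());
    }
}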
2. HDFS permission issues

org.apache.hadoop.security.AccessControlException: org.apache.hadoop.security.AccessControlException: Permission denied: user=hadooper, access=WRITE, inode="/user":root:supergroup:drwxr-xr-x

Solutions:
(1) Add this entry to conf/hdfs-site.xml:
<property>
<name>dfs.permissions</name>
<value>false</value>
</property>
(2) Open up write permission on the HDFS directory, for example: $ hadoop fs -chmod 777 /user/
I used the second option.
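For completeness, the second option can also be done from the Java client rather than the shell. A minimal sketch, assuming the connecting user is actually allowed to change permissions on /user (for example, when connecting as the HDFS superuser) and reusing the node1 address from this article; the class name OpenUserDir is just for illustration:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.permission.FsPermission;

public class OpenUserDir {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("fs.default.name", "hdfs://node1"); // same NameNode as above
        FileSystem hdfs = FileSystem.get(conf);
        // Rough equivalent of "hadoop fs -chmod 777 /user/"; succeeds only if
        // the connecting user has the right to change permissions on /user.
        hdfs.setPermission(new Path("/user"), new FsPermission((short) 0777));
    }
}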

3. HDFS native library warning

2011-12-20 17:00:32 org.apache.hadoop.util.NativeCodeLoader <clinit>
WARNING: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

You can control whether the native library is used in Hadoop's core-site.xml configuration file:
<property>
<name>hadoop.native.lib</name>
<value>true</value>
<description>Should native hadoop libraries, if present, be used.</description>
</property>

Hadoop is configured to enable the native library by default.
In addition, you can point an environment variable at the native library location:
export JAVA_LIBRARY_PATH=/path/to/hadoop-native-libs
Sometimes you will find that the native library shipped with Hadoop is not usable, in which case you need to compile it yourself. In the $HADOOP_HOME directory, run:
ant compile-native
Once the compilation is complete, you can find the resulting files under $HADOOP_HOME/build/native; then either specify their path explicitly or move them to the default directory.
I tried this, but the bundled native library is 64-bit while my machine is 32-bit, and without the source code I could not recompile it. So instead I added some code to narrow down which call triggers the warning:

try {
    FileSystem hdfs = FileSystem.get(conf);
    Path srcPath = new Path(srcFile);
    Path dstPath = new Path(dstFile);
    hdfs.copyToLocalFile(true, srcPath, dstPath); // the warning is triggered by this line
} catch (IOException e) {
    e.printStackTrace();
}

For now I just have to live with it. Why is that? Isn't Java supposed to be cross-platform?
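Rather than bisecting the program by hand, the loader status can also be queried directly. A minimal sketch, assuming the same NativeCodeLoader class that prints the warning is on the classpath; the class name CheckNativeLib is just for illustration:

import org.apache.hadoop.util.NativeCodeLoader;

public class CheckNativeLib {
    public static void main(String[] args) {
        // true only if libhadoop was found and loaded for this platform;
        // false means the builtin pure-Java implementations are being used,
        // which is exactly what the warning above is reporting.
        System.out.println("native hadoop library loaded: " + NativeCodeLoader.isNativeCodeLoaded());
    }
}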

4. Missing jars when running the MapReduce job

ClassNotFoundException: org.codehaus.jackson.map.JsonMappingException
NoClassDefFoundError: org/apache/commons/httpclient/HttpMethod

Add the following jars to the Java project:

jackson-core-asl-1.5.2.jar
jackson-mapper-asl-1.5.2.jar
commons-httpclient-3.0.1.jar

I prefer not to add every Hadoop jar to the project up front; adding more than is needed wastes time and space.
With that, the first MapReduce job ran to completion.

5. The remote job died, yet it still "ran successfully". It turned out that the mapred.job.tracker property was not set, so the job runs in local mode by default; the correct value can be found in mapred-site.xml on the NameNode:

Conf.set ("Mapred.job.tracker", "node1:54311");

With that configured, the job initializes and starts running, but then the Mapper class cannot be found:

INFO: Task Id : attempt_201112221123_0010_m_000000_0, Status : FAILED
java.lang.RuntimeException: java.lang.ClassNotFoundException: mapreduce.MyWordCount$WordCountMapper
    at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:996)
    at org.apache.hadoop.mapreduce.JobContext.getMapperClass(JobContext.java:212)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:611)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:325)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1127)
    at org.apache.hadoop.mapred.Child.main(Child.java:264)

When the program is packaged into a jar and run on the JobTracker node of the Hadoop cluster, it works normally and the result is correct; but when it is submitted from the client it reports the error above. For the time being this is not resolved.
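A workaround that is often suggested for this symptom (not something verified in this article) is to build the job classes into a jar on the client first and point the submission at it explicitly, so the TaskTrackers can load the Mapper and Reducer classes. A sketch, assuming a hypothetical jar path; mapred.jar is the property that JobConf.setJar() writes:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class SubmitWithExplicitJar {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("fs.default.name", "hdfs://node1");
        conf.set("mapred.job.tracker", "node1:54311");
        // Ship a pre-built jar with the submission instead of relying on
        // setJarByClass() finding one on the client classpath.
        conf.set("mapred.jar", "/path/to/mywordcount.jar"); // hypothetical path
        Job job = new Job(conf, "mywordcount");
        // ...set mapper/reducer/input/output exactly as in MyWordCount.main() above...
        job.waitForCompletion(true);
    }
}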

Summary

1. To remotely operate on HDFS files and remotely submit MapReduce jobs, two items must be configured (I have not found any others to be required):

Conf.set ("Fs.default.name", "Hdfs://node1"), corresponding to the values in Conf/core-site.xml, must
Conf.set ("Mapred.job.tracker", "node1:54311");//mapred-site.xml 2. Patiently analyze problems and solve problems
