HDFS is a distributed file system, but to the programmer it looks much like an ordinary one: Hadoop encapsulates the low-level details, and working with files on HDFS through the corresponding APIs is not very different from working with files on a local disk. Still, you may run into the following problem when you first start.
For example, when obtaining a FileSystem instance:
java.lang.NullPointerException
    at org.apache.hadoop.conf.Configuration.get(Configuration.java:382)
    at org.apache.hadoop.conf.Configuration.getBoolean(Configuration.java:570)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:192)
    at hadoop.test.URLCat.copyFileToAnotherFile(URLCat.java:38)   // this is the method I wrote, where the error is thrown
    at hadoop.test.URLCat.main(URLCat.java:83)
Code:
package hadoop.test;

import java.io.BufferedInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URL;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.FsUrlStreamHandlerFactory;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hdfs.DistributedFileSystem;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.util.Progressable;

public class URLCat extends Configured {

    /*
    static {
        Configuration.addDefaultResource("hdfs-default.xml");
        Configuration.addDefaultResource("hdfs-site.xml");
        Configuration.addDefaultResource("mapred-default.xml");
        Configuration.addDefaultResource("mapred-site.xml");
    }
    */
    // Errors will be reported if this static block does not exist.

    // Copies a local file (args[0]) to a file on HDFS (args[1]).
    public void copyFileToAnotherFile(String[] args) {
        InputStream in = null;
        OutputStream out = null;
        try {
            String sourceFile = args[0];
            String targetFile = args[1];
            in = new BufferedInputStream(new FileInputStream(sourceFile));
            Configuration conf = new Configuration();
            // Debug prints used while troubleshooting the NullPointerException.
            System.out.println(conf);
            System.out.println(URI.create(targetFile) == null);
            System.out.println(conf == null);
            System.out.println(FileSystem.get(URI.create(targetFile), conf) == null);
            FileSystem fs = DistributedFileSystem.get(URI.create(targetFile), conf);
            System.out.println(fs);
            // Create the target file on HDFS, printing a dot for each progress callback.
            out = fs.create(new Path(targetFile), new Progressable() {
                public void progress() {
                    System.out.print(".");
                }
            });
            IOUtils.copyBytes(in, out, 4096, true);
        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            IOUtils.closeStream(in);
            IOUtils.closeStream(out);
        }
    }

    // Lets java.net.URL understand hdfs:// URLs (needed by displayFile).
    static {
        URL.setURLStreamHandlerFactory(new FsUrlStreamHandlerFactory());
    }

    // Prints the file referenced by the URL in args[0] to standard output.
    public static void displayFile(String[] args) {
        InputStream in = null;
        try {
            in = new URL(args[0]).openStream();
            IOUtils.copyBytes(in, System.out, 4096, false);
        } catch (MalformedURLException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        } finally {
            IOUtils.closeStream(in);
        }
    }

    /**
     * @param args source file and target file
     */
    public static void main(String[] args) {
        new URLCat().copyFileToAnotherFile(args);
        // URLCat.displayFile(args);
    }
}
Cause: Configuration only loads two basic resource files by default, so you have to add the other configuration files yourself.
From the Configuration class:
defaultResources.add("hadoop-default.xml");
finalResources.add("hadoop-site.xml");
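If you prefer not to rely on the commented-out static block inside URLCat itself, the extra resources can be registered up front. A minimal sketch, assuming hdfs-site.xml and mapred-site.xml are on the classpath for the addDefaultResource variant; the absolute path in the second variant is only a hypothetical example and must be adjusted to your installation:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;

public class ConfSetup {
    public static Configuration buildConf() {
        // Variant 1: register classpath resources before the first Configuration is read.
        Configuration.addDefaultResource("hdfs-site.xml");
        Configuration.addDefaultResource("mapred-site.xml");
        Configuration conf = new Configuration();
        // Variant 2: add a specific config file by absolute path (hypothetical location).
        conf.addResource(new Path("/home/hadoop/hadoop/conf/hdfs-site.xml"));
        return conf;
    }
}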
Next, I will walk through the whole process from code to execution, hoping it helps people who are new to Hadoop programming:
1. Configure the Java environment: JAVA_HOME and CLASSPATH.
export JAVA_HOME=/usr/lib/jvm/java-6-sun
export PATH=$PATH:$JAVA_HOME/bin
export CLASSPATH=.:/usr/lib/jvm/java-6-sun/lib
2. Write the code locally; you can of course use an IDE such as Eclipse.
3. Set HADOOP_CLASSPATH.
HADOOP_CLASSPATH must point to the root directory of the class files, i.e. the directory that contains the hadoop.test package; in this example that is /home/hadoop/eclipseworkspace/testproject/bin.
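For example, using the project directory above (adjust the path to your own build output; the line can go in your shell profile or in conf/hadoop-env.sh):
export HADOOP_CLASSPATH=/home/hadoop/eclipseworkspace/testproject/bin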
4. Run the class through the hadoop launcher: hadoop hadoop.test.URLCat /home/hadoop/documents/test.txt hdfs://192.186.54.1:8020/user/hadoop/test.txt
Another error occurred:
java.lang.IllegalArgumentException: Wrong FS: hdfs://192.186.54.1:8020/user/hadoop/test.txt, expected: hdfs://hadoop1
    at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:310)
    at org.apache.hadoop.hdfs.DistributedFileSystem.checkPath(DistributedFileSystem.java:99)
    at org.apache.hadoop.hdfs.DistributedFileSystem.getPathName(DistributedFileSystem.java:155)
    at org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:195)
    at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:484)
    at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:384)
    at hadoop.test.URLCat.copyFileToAnotherFile(URLCat.java:46)
    at hadoop.test.URLCat.main(URLCat.java:86)
Cause: the hdfs:// URI cannot use the raw IP address here; it has to use the hostname that the filesystem expects. Run the command again with the hostname:
hadoop hadoop.test.URLCat /home/hadoop/documents/test.txt hdfs://hadoop1:8020/user/hadoop/test.txt
Everything is OK.
In my configuration I use IP addresses rather than hostnames, because there is no DNS server to resolve them, but the command still has to use the hostname.
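If you are unsure which authority DistributedFileSystem is checking against, you can print it from the client-side Configuration. A small sketch, assuming the old-style key fs.default.name used by this Hadoop version:

import org.apache.hadoop.conf.Configuration;

public class ShowDefaultFs {
    public static void main(String[] args) {
        Configuration conf = new Configuration();
        // This is the authority (e.g. hdfs://hadoop1:8020) that checkPath() compares the
        // target URI against; the URI on the command line must use the same host form.
        System.out.println(conf.get("fs.default.name"));
    }
}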
To sum up, pay attention to two points: the Configuration resources, and writing the target as hdfs://hostname:port/user/pathToFile/file.