Uploading and downloading files on HDFS are basic operations against a cluster. "Hadoop: The Definitive Guide" gives sample code for uploading and downloading files, but it does not clearly explain how to configure the Hadoop client. After a long round of searching and debugging, I worked out how to configure the client for a real cluster and tested a working program that can manipulate files on it. First, configure the corresponding environment variables:
The code is as follows:
HADOOP_HOME="/home/work/tools/java/hadoop-client/hadoop"
for f in $HADOOP_HOME/hadoop-*.jar; do
    HADOOP_CLASSPATH=${HADOOP_CLASSPATH}:$f
done

for f in $HADOOP_HOME/lib/*.jar; do
    HADOOP_CLASSPATH=${HADOOP_CLASSPATH}:$f
done

HADOOPVFS_HOME="/home/work/tools/java/hadoop-client/hadoop-vfs"
for f in $HADOOPVFS_HOME/lib/*.jar; do
    HADOOP_CLASSPATH=${HADOOP_CLASSPATH}:$f
done

export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/home/work/tools/java/hadoop-client/hadoop/lib/native/Linux-amd64-64/
Here LD_LIBRARY_PATH is the path of the native libraries needed at run time, and HADOOP_CLASSPATH collects the various jar packages of our Hadoop client.
One thing to note: it is best not to reuse the HADOOP_HOME variable name for your own purposes, because it is an environment variable already used by the system, and you want to avoid conflicting with it.
To compile the class:
The code is as follows:
javac -classpath $CLASSPATH:$HADOOP_CLASSPATH HDFSUtil.java
To run it:
The code is as follows:
java -classpath $CLASSPATH:$HADOOP_CLASSPATH HDFSUtil
However, in actual use the program reports errors such as "no permission", or, even when you are sure the code itself is fine, it throws strange errors at run time.
So the question is: what on earth is going on?
Answer: it is because no configuration files for the target cluster have been loaded.
"Hadoop: The Definitive Guide" glosses over configuration, so problems appear as soon as you work against a concrete cluster. The way to solve it is this:
The code is as follows:
this.conf = new Configuration(false);
conf.addResource("./hadoop-site.xml");
conf.addResource("./hadoop-default.xml");
conf.set("fs.hdfs.impl", org.apache.hadoop.hdfs.DistributedFileSystem.class.getName());
conf.set("fs.file.impl", org.apache.hadoop.fs.LocalFileSystem.class.getName());
Why is this? The book simply uses:
this.conf = new Configuration();
That works because by default the cluster is assumed to be local, so nothing needs to be configured. In actual use, however, every cluster is configured differently, so we have to load the cluster's own configuration.
This is a very important point, because in practice we operate through the Hadoop client against an already running cluster, so the local configuration has to be set up properly.
The files hadoop-site.xml and hadoop-default.xml live in the conf directory of the client you are using; just point addResource at the right location.
Once the configuration described above is in place, the program really runs, so configuration is a crucial step.
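As a quick sanity check, here is a minimal sketch of my own (not code from the book): it loads the two client configuration files and verifies that the FileSystem handle really points at the cluster. The config paths and the hdfs:// address are assumptions you should replace with your own.

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ConfCheck {
    public static void main(String[] args) throws IOException {
        // Build the configuration from the client's own files instead of the built-in defaults
        Configuration conf = new Configuration(false);
        conf.addResource(new Path("./hadoop-default.xml"));   // assumed to sit next to the program
        conf.addResource(new Path("./hadoop-site.xml"));
        conf.set("fs.hdfs.impl", org.apache.hadoop.hdfs.DistributedFileSystem.class.getName());
        conf.set("fs.file.impl", org.apache.hadoop.fs.LocalFileSystem.class.getName());

        // hypothetical namenode address, replace with your own
        FileSystem fs = FileSystem.get(java.net.URI.create("hdfs://your-namenode:54310"), conf);
        System.out.println("Root exists: " + fs.exists(new Path("/")));
        fs.close();
    }
}

If this prints true, the client configuration is wired up correctly and the utility class below will work against the same cluster.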
Below is the corresponding utility code; take a look if you are interested. It works purely on file streams, which also makes it possible to pipe files back and forth between FTP and HDFS:
import java.io.BufferedInputStream;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.URI;
import java.net.URL;
import java.io.*;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.util.Progressable;

public class HDFSUtil {
    private String hdfs_node = "";
    private String hdfs_path = "";
    private String file_path = "";
    private String hadoop_site = "";
    private String hadoop_default = "";
    private Configuration conf = null;

    public HDFSUtil(String hdfs_node) { this.hdfs_node = hdfs_node; }

    public String getHdfsNode() { return this.hdfs_node; }

    public void setHdfsPath(String hdfs_path) { this.hdfs_path = hdfs_path; }

    public String getHdfsPath() { return this.hdfs_path; }

    public void setFilePath(String file_path) { this.file_path = file_path; }

    public String getFilePath() { return this.file_path; }

    public void setHadoopSite(String hadoop_site) { this.hadoop_site = hadoop_site; }

    public String getHadoopSite() { return this.hadoop_site; }

    public void setHadoopDefault(String hadoop_default) { this.hadoop_default = hadoop_default; }

    public String getHadoopDefault() { return this.hadoop_default; }

    // Load the cluster configuration files; flag == true falls back to the default local configuration
    public int setConfigure(boolean flag) {
        if (flag == false) {
            if (this.getHadoopSite().isEmpty() || this.getHadoopDefault().isEmpty()) {
                return -1;
            } else {
                this.conf = new Configuration(false);
                conf.addResource(this.getHadoopDefault());
                conf.addResource(this.getHadoopSite());
                conf.set("fs.hdfs.impl", org.apache.hadoop.hdfs.DistributedFileSystem.class.getName());
                conf.set("fs.file.impl", org.apache.hadoop.fs.LocalFileSystem.class.getName());
                return 0;
            }
        }
        this.conf = new Configuration();
        return 0;
    }

    public Configuration getConfigure() { return this.conf; }

    // Upload a local file to HDFS
    public int upLoad(String localName, String remoteName) throws FileNotFoundException, IOException {
        InputStream inStream = null;
        FileSystem fs = null;
        try {
            inStream = new BufferedInputStream(new FileInputStream(localName));
            fs = FileSystem.get(URI.create(this.hdfs_node), this.conf);
            OutputStream outStream = fs.create(new Path(remoteName), new Progressable() {
                public void progress() {
                    System.out.print('.');
                }
            });

            IOUtils.copyBytes(inStream, outStream, 4096, true);
            inStream.close();
            return 0;
        } catch (IOException e) {
            if (inStream != null) inStream.close();
            e.printStackTrace();
            return -1;
        }
    }

    // Upload from an arbitrary InputStream (for example, a stream coming from FTP) to HDFS
    public int upLoad(InputStream inStream, String remoteName) throws FileNotFoundException, IOException {
        FileSystem fs = null;
        try {
            fs = FileSystem.get(URI.create(this.hdfs_node), this.conf);
            OutputStream outStream = fs.create(new Path(remoteName), new Progressable() {
                public void progress() {
                    System.out.print('.');
                }
            });

            IOUtils.copyBytes(inStream, outStream, 4096, true);
            inStream.close();
            return 0;
        } catch (IOException e) {
            if (inStream != null) inStream.close();
            e.printStackTrace();
            return -1;
        }
    }

    // Download the first `lines` lines of an HDFS file to a local file
    public int downLoad(String remoteName, String localName, int lines) throws FileNotFoundException, IOException {
        FileOutputStream fos = null;
        InputStreamReader isr = null;
        BufferedReader br = null;
        String str = null;
        OutputStreamWriter osw = null;
        BufferedWriter buffw = null;
        PrintWriter pw = null;
        FileSystem fs = null;
        InputStream inStream = null;
        try {
            fs = FileSystem.get(URI.create(this.hdfs_node + remoteName), this.conf);
            inStream = fs.open(new Path(this.hdfs_node + remoteName));
            fos = new FileOutputStream(localName);
            osw = new OutputStreamWriter(fos, "UTF-8");
            buffw = new BufferedWriter(osw);
            pw = new PrintWriter(buffw);
            isr = new InputStreamReader(inStream, "UTF-8");
            br = new BufferedReader(isr);
            while ((str = br.readLine()) != null && lines > 0) {
                lines--;
                pw.println(str);
            }
        } catch (IOException e) {
            throw new IOException("Couldn't write.", e);
        } finally {
            if (pw != null) pw.close();
            if (buffw != null) buffw.close();
            if (osw != null) osw.close();
            if (fos != null) fos.close();
            if (inStream != null) inStream.close();
        }
        return 0;
    }

    // main to test
    public static void main(String[] args) {
        String hdfspath = null;
        String localname = null;
        String hdfsnode = null;
        int lines = 0;

        if (args.length == 4) {
            hdfsnode = args[0];
            hdfspath = args[1];
            localname = args[2];
            lines = Integer.parseInt(args[3]);
        } else {
            hdfsnode = "hdfs://nj01-nanling-hdfs.dmop.baidu.com:54310";
            hdfspath = "/app/ps/spider/wdmqa/wangweilong/test/HDFSUtil.java";
            localname = "/home/work/workspace/project/dhc2-0/dhc/base/ftp/papapa";
            lines = 5;
        }
        HDFSUtil hdfsutil = new HDFSUtil(hdfsnode);
        hdfsutil.setFilePath(hdfsutil.getHdfsNode() + hdfspath);
        hdfsutil.setHadoopSite("./hadoop-site.xml");
        hdfsutil.setHadoopDefault("./hadoop-default.xml");
        hdfsutil.setConfigure(false);
        try {
            hdfsutil.downLoad(hdfspath, localname, lines);
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
If you want to learn how to download files from FTP, refer to this article:
FTP Download Tool
If you want to pipe files between FTP and HDFS, just create a class of your own and call the tool interfaces from these two articles. I have written and tested such code myself, and it works.
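As a rough illustration of that idea, here is a sketch of my own (not the code from those articles): it assumes Apache Commons Net is on the classpath, and the FTP host, credentials, namenode address and file paths are placeholders. It feeds the FTP download stream straight into the upLoad(InputStream, String) overload above:

import java.io.InputStream;
import org.apache.commons.net.ftp.FTP;
import org.apache.commons.net.ftp.FTPClient;

public class FtpToHdfs {
    public static void main(String[] args) throws Exception {
        // hypothetical FTP endpoint and file, replace with your own
        FTPClient ftp = new FTPClient();
        ftp.connect("ftp.example.com");
        ftp.login("user", "password");
        ftp.enterLocalPassiveMode();
        ftp.setFileType(FTP.BINARY_FILE_TYPE);

        // stream the remote FTP file without writing a local temporary copy
        InputStream in = ftp.retrieveFileStream("/remote/ftp/file.dat");

        // hypothetical namenode address; HDFSUtil is the class listed above
        HDFSUtil hdfsutil = new HDFSUtil("hdfs://your-namenode:54310");
        hdfsutil.setHadoopSite("./hadoop-site.xml");
        hdfsutil.setHadoopDefault("./hadoop-default.xml");
        hdfsutil.setConfigure(false);
        hdfsutil.upLoad(in, "/hdfs/target/file.dat");

        ftp.completePendingCommand();
        ftp.logout();
        ftp.disconnect();
    }
}

The point is simply that, because upLoad works on an InputStream, any source that can hand you a stream (FTP, HTTP, another process) can be written to HDFS without touching the local disk.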
That is all for this article; I hope it is helpful to those learning Java.