Uploading and downloading files on a Hadoop cluster with Java

Source: Internet
Author: User

Uploading and downloading files on HDFS are basic cluster operations. "Hadoop: The Definitive Guide" contains example code for uploading and downloading files, but it does not clearly explain how to configure the Hadoop client. After lengthy searching and debugging, I worked out how to configure the client for a real cluster and wrote a tested, working program for manipulating files on it. First, configure the corresponding environment variables:

The code is as follows:

HADOOP_HOME="/home/work/tools/java/hadoop-client/hadoop"
for f in $HADOOP_HOME/hadoop-*.jar; do
    HADOOP_CLASSPATH=${HADOOP_CLASSPATH}:$f
done
for f in $HADOOP_HOME/lib/*.jar; do
    HADOOP_CLASSPATH=${HADOOP_CLASSPATH}:$f
done

HADOOPVFS_HOME="/home/work/tools/java/hadoop-client/hadoop-vfs"
for f in $HADOOPVFS_HOME/lib/*.jar; do
    HADOOP_CLASSPATH=${HADOOP_CLASSPATH}:$f
done

export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/home/work/tools/java/hadoop-client/hadoop/lib/native/Linux-amd64-64/

Here LD_LIBRARY_PATH is the path to the native libraries needed at run time, and HADOOP_CLASSPATH collects the various jar packages of our Hadoop client.
One thing to note: be careful with the HADOOP_HOME variable, since it is an environment variable used by the system itself, and it is best not to conflict with it.
To compile the class:

The code is as follows:

javac -classpath $CLASSPATH:$HADOOP_CLASSPATH HdfsUtil.java

To run it:

The code is as follows:

java -classpath $CLASSPATH:$HADOOP_CLASSPATH HdfsUtil

However, in actual use the program reports errors such as "no permission", and even when you are sure the code is fine, it still throws strange errors at run time.
So the question is: what is going on?
Answer: the configuration files for the corresponding cluster have not been loaded.
"Hadoop: The Definitive Guide" downplays configuration, so problems appear as soon as you use a specific cluster. The solution is as follows:

The code is as follows:

this.conf = new Configuration(false);
conf.addResource("./hadoop-site.xml");
conf.addResource("./hadoop-default.xml");
conf.set("fs.hdfs.impl", org.apache.hadoop.hdfs.DistributedFileSystem.class.getName());
conf.set("fs.file.impl", org.apache.hadoop.fs.LocalFileSystem.class.getName());

Why? Because the book simply uses:

this.conf = new Configuration();
That works because the defaults assume the cluster is local, so no extra configuration is needed; in real use, however, every cluster is configured differently, so we have to load the cluster's own configuration.
This is a very important point, because in practice we go through the Hadoop client to an already running cluster, so the local client configuration has to match it.
The hadoop-site.xml and hadoop-default.xml files are in the client's conf directory; just point addResource at the right location.

Once the configuration above is in place, the program really runs, so configuration is a crucial step.
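To make the flow concrete, here is a minimal standalone sketch of the client-side setup. The conf-file paths and the namenode URI below are examples only (substitute your own); it uses the Path overload of addResource so the files are read from the local filesystem rather than the classpath:

import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsConfigSketch {
    public static void main(String[] args) throws Exception {
        // Start from an empty Configuration so no local defaults leak in
        Configuration conf = new Configuration(false);

        // Example paths: point these at the hadoop-default.xml / hadoop-site.xml
        // in your client's conf directory
        conf.addResource(new Path("/home/work/tools/java/hadoop-client/hadoop/conf/hadoop-default.xml"));
        conf.addResource(new Path("/home/work/tools/java/hadoop-client/hadoop/conf/hadoop-site.xml"));
        conf.set("fs.hdfs.impl", org.apache.hadoop.hdfs.DistributedFileSystem.class.getName());
        conf.set("fs.file.impl", org.apache.hadoop.fs.LocalFileSystem.class.getName());

        // Example namenode address; use your own cluster's URI
        FileSystem fs = FileSystem.get(URI.create("hdfs://namenode.example.com:54310"), conf);
        System.out.println("root exists: " + fs.exists(new Path("/")));
        fs.close();
    }
}

If FileSystem.get returns without errors and the exists check succeeds, the client configuration matches the cluster.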

The corresponding tool code follows for anyone interested. It works with file streams, which also makes it possible to pass files back and forth between FTP and HDFS:

import java.io.BufferedInputStream;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.URI;
import java.net.URL;

import java.io.*;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.util.Progressable;

public class HdfsUtil {
  private String hdfs_node = "";
  private String hdfs_path = "";
  private String file_path = "";
  private String hadoop_site = "";
  private String hadoop_default = "";

  private Configuration conf = null;

  public HdfsUtil(String hdfs_node) {
    this.hdfs_node = hdfs_node;
  }

  public String getHdfsNode() { return this.hdfs_node; }

  public void setHdfsPath(String hdfs_path) { this.hdfs_path = hdfs_path; }

  public String getHdfsPath() { return this.hdfs_path; }

  public void setFilePath(String file_path) { this.file_path = file_path; }

  public String getFilePath() { return this.file_path; }

  public void setHadoopSite(String hadoop_site) { this.hadoop_site = hadoop_site; }

  public String getHadoopSite() { return this.hadoop_site; }

  public void setHadoopDefault(String hadoop_default) { this.hadoop_default = hadoop_default; }

  public String getHadoopDefault() { return this.hadoop_default; }

  // Load the cluster configuration; flag == true falls back to the local defaults
  public int setConfigure(boolean flag) {
    if (flag == false) {
      if (this.getHadoopSite().equals("") || this.getHadoopDefault().equals("")) {
        return -1;
      } else {
        this.conf = new Configuration(false);
        conf.addResource(this.getHadoopDefault());
        conf.addResource(this.getHadoopSite());
        conf.set("fs.hdfs.impl", org.apache.hadoop.hdfs.DistributedFileSystem.class.getName());
        conf.set("fs.file.impl", org.apache.hadoop.fs.LocalFileSystem.class.getName());
        return 0;
      }
    }
    this.conf = new Configuration();
    return 0;
  }

  public Configuration getConfigure() { return this.conf; }

  // Upload a local file to HDFS
  public int upLoad(String localName, String remoteName) throws FileNotFoundException, IOException {
    InputStream inStream = null;
    FileSystem fs = null;
    try {
      inStream = new BufferedInputStream(new FileInputStream(localName));
      fs = FileSystem.get(URI.create(this.hdfs_node), this.conf);
      OutputStream outStream = fs.create(new Path(remoteName), new Progressable() {
        public void progress() {
          System.out.print('.');
        }
      });

      IOUtils.copyBytes(inStream, outStream, 4096, true);
      inStream.close();
      return 0;
    } catch (IOException e) {
      if (inStream != null) inStream.close();
      e.printStackTrace();
      return -1;
    }
  }

  // Upload from an arbitrary InputStream (e.g. an FTP download) to HDFS
  public int upLoad(InputStream inStream, String remoteName) throws FileNotFoundException, IOException {
    FileSystem fs = null;
    try {
      fs = FileSystem.get(URI.create(this.hdfs_node), this.conf);
      OutputStream outStream = fs.create(new Path(remoteName), new Progressable() {
        public void progress() {
          System.out.print('.');
        }
      });

      IOUtils.copyBytes(inStream, outStream, 4096, true);
      inStream.close();
      return 0;
    } catch (IOException e) {
      inStream.close();
      e.printStackTrace();
      return -1;
    }
  }

  // Download the first `lines` lines of an HDFS file to a local file
  public int downLoad(String remoteName, String localName, int lines) throws FileNotFoundException, IOException {
    FileOutputStream fos = null;
    InputStreamReader isr = null;
    BufferedReader br = null;
    String str = null;
    OutputStreamWriter osw = null;
    BufferedWriter buffw = null;
    PrintWriter pw = null;
    FileSystem fs = null;
    InputStream inStream = null;
    try {
      fs = FileSystem.get(URI.create(this.hdfs_node + remoteName), this.conf);
      inStream = fs.open(new Path(this.hdfs_node + remoteName));
      fos = new FileOutputStream(localName);
      osw = new OutputStreamWriter(fos, "UTF-8");
      buffw = new BufferedWriter(osw);
      pw = new PrintWriter(buffw);
      isr = new InputStreamReader(inStream, "UTF-8");
      br = new BufferedReader(isr);
      while ((str = br.readLine()) != null && lines > 0) {
        lines--;
        pw.println(str);
      }
    } catch (IOException e) {
      throw new IOException("couldn't write.", e);
    } finally {
      if (pw != null) pw.close();
      if (buffw != null) buffw.close();
      if (osw != null) osw.close();
      if (fos != null) fos.close();
      if (inStream != null) inStream.close();
    }
    return 0;
  }

  // main to test
  public static void main(String[] args) {
    String hdfsPath = null;
    String localName = null;
    String hdfsNode = null;
    int lines = 0;

    if (args.length == 4) {
      hdfsNode = args[0];
      hdfsPath = args[1];
      localName = args[2];
      lines = Integer.parseInt(args[3]);
    } else {
      hdfsNode = "hdfs://nj01-nanling-hdfs.dmop.baidu.com:54310";
      hdfsPath = "/app/ps/spider/wdmqa/wangweilong/test/hdfsutil.java";
      localName = "/home/work/workspace/project/dhc2-0/dhc/base/ftp/papapa";
      lines = 5;
    }
    HdfsUtil hdfsutil = new HdfsUtil(hdfsNode);
    hdfsutil.setFilePath(hdfsutil.getHdfsNode() + hdfsPath);
    hdfsutil.setHadoopSite("./hadoop-site.xml");
    hdfsutil.setHadoopDefault("./hadoop-default.xml");
    hdfsutil.setConfigure(false);
    try {
      hdfsutil.downLoad(hdfsPath, localName, lines);
    } catch (IOException e) {
      e.printStackTrace();
    }
  }
}

If you want to learn how to download files from FTP, refer to this article:

FTP Download Tool

If you want to pass files back and forth between FTP and HDFS, just create a class that calls the tool interfaces from these two articles. I wrote that glue code myself and verified that it works; a rough sketch is shown below.
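As an illustration only (the FTP tool from the article above is not reproduced here, so this sketch uses Apache Commons Net for the FTP side, and the host name, credentials, and paths are placeholders), piping an FTP download straight into HDFS with the stream-based upLoad overload looks roughly like this:

import java.io.InputStream;

import org.apache.commons.net.ftp.FTP;
import org.apache.commons.net.ftp.FTPClient;

// Hypothetical glue class: host, credentials and paths are placeholders
public class FtpToHdfs {
    public static void main(String[] args) throws Exception {
        // FTP side: Apache Commons Net is used here as an example client
        FTPClient ftp = new FTPClient();
        ftp.connect("ftp.example.com");
        ftp.login("user", "password");
        ftp.enterLocalPassiveMode();
        ftp.setFileType(FTP.BINARY_FILE_TYPE);

        // Open the remote FTP file as a stream instead of saving it locally
        InputStream in = ftp.retrieveFileStream("/remote/source.dat");

        // HDFS side: reuse the HdfsUtil tool class above (namenode URI is a placeholder)
        HdfsUtil hdfsutil = new HdfsUtil("hdfs://namenode.example.com:54310");
        hdfsutil.setHadoopSite("./hadoop-site.xml");
        hdfsutil.setHadoopDefault("./hadoop-default.xml");
        hdfsutil.setConfigure(false);

        // Stream the FTP download straight into HDFS without touching local disk
        hdfsutil.upLoad(in, "/app/target/source.dat");

        // Finish the FTP transfer and clean up
        ftp.completePendingCommand();
        ftp.logout();
        ftp.disconnect();
    }
}

The reverse direction works the same way: open the HDFS file with FileSystem.open and hand the resulting stream to the FTP client's storeFile method.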

That is the entire content of this article. I hope it is helpful to those learning Java.
