Hadoop In-depth Study (2): Accessing HDFS from Java


When reprinting, please credit the source: http://blog.csdn.net/lastsweetop/article/details/9001467

All of the source code is on GitHub: https://github.com/lastsweetop/styhadoop

Reading data with a Hadoop URL

A simple way to read HDFS data is to open a stream through java.net.URL. Before doing so, URL.setURLStreamHandlerFactory must be set to an FsUrlStreamHandlerFactory (the factory that knows how to handle hdfs:// URLs). This method can only be called once per JVM, so it is invoked in a static block. IOUtils.copyBytes is then called to copy the HDFS data stream to the standard output stream System.out. The first two parameters are easy to understand (one input, one output), the third is the buffer size, and the fourth specifies whether to close the streams once the copy completes. We set it to false so that System.out is not closed, and we close the input stream manually.

package com.sweetop.styhadoop;

import org.apache.hadoop.fs.FsUrlStreamHandlerFactory;
import org.apache.hadoop.io.IOUtils;

import java.io.InputStream;
import java.net.URL;

public class URLCat {

    static {
        // may only be called once per JVM, hence the static block
        URL.setURLStreamHandlerFactory(new FsUrlStreamHandlerFactory());
    }

    public static void main(String[] args) throws Exception {
        InputStream in = null;
        try {
            in = new URL(args[0]).openStream();
            IOUtils.copyBytes(in, System.out, 4096, false);
        } finally {
            IOUtils.closeStream(in);
        }
    }
}

Reading data with the FileSystem API

First instantiate a FileSystem object by calling the FileSystem class's static get method, passing in a java.net.URI and a Configuration. The FileSystem can then open an input stream from a Path object; the rest is the same as the previous example.
package com.sweetop.styhadoop;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

import java.io.InputStream;
import java.net.URI;

public class FileSystemCat {
    public static void main(String[] args) throws Exception {
        String uri = args[0];
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(URI.create(uri), conf);
        InputStream in = null;
        try {
            in = fs.open(new Path(uri));
            IOUtils.copyBytes(in, System.out, 4096, false);
        } finally {
            IOUtils.closeStream(in);
        }
    }
}

FSDataInputStream

The object returned by FileSystem's open method is an FSDataInputStream, which implements the Seekable interface:
public interface Seekable {
    void seek(long pos) throws java.io.IOException;
    long getPos() throws java.io.IOException;
    boolean seekToNewSource(long targetPos) throws java.io.IOException;
}
The seek method can jump to any position in the file. In the following example we jump back to the start of the file and read it through a second time.
public class FileSystemDoubleCat {
    public static void main(String[] args) throws Exception {
        String uri = args[0];
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(URI.create(uri), conf);
        FSDataInputStream in = null;
        try {
            in = fs.open(new Path(uri));
            IOUtils.copyBytes(in, System.out, 4096, false);
            in.seek(0);
            IOUtils.copyBytes(in, System.out, 4096, false);
        } finally {
            IOUtils.closeStream(in);
        }
    }
}
FSDataInputStream also implements the PositionedReadable interface:
public interface PositionedReadable {
    int read(long position, byte[] buffer, int offset, int length) throws java.io.IOException;
    void readFully(long position, byte[] buffer, int offset, int length) throws java.io.IOException;
    void readFully(long position, byte[] buffer) throws java.io.IOException;
}
These methods read length bytes (the fourth argument) from an arbitrary position in the file (the first argument) into the given array (the second argument) starting at the given offset (the third argument), without moving the stream's current position. A small sketch follows.
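As a rough illustration (this sketch is not in the original post, and the class name PositionedReadCat is made up here), readFully can pull a fixed number of bytes from the start of a file without disturbing the stream position:

package com.sweetop.styhadoop;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

import java.net.URI;

public class PositionedReadCat {
    public static void main(String[] args) throws Exception {
        String uri = args[0];
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(URI.create(uri), conf);
        FSDataInputStream in = null;
        try {
            in = fs.open(new Path(uri));
            // read 16 bytes from position 0 into buffer[0..16) without
            // changing the stream's current position (assumes the file is
            // at least 16 bytes long, otherwise readFully throws EOFException)
            byte[] buffer = new byte[16];
            in.readFully(0, buffer, 0, buffer.length);
            System.out.println(new String(buffer, "UTF-8"));
        } finally {
            IOUtils.closeStream(in);
        }
    }
}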
You can experiment with the other overloads yourself.

Write data

There are many ways to create a file through the FileSystem class; the simplest is:

create(Path f) throws IOException

It also has many overloaded versions that let you specify whether to overwrite an existing file, the replication factor, the write buffer size, the block size, the file permissions, and so on. You can also pass in a callback interface:
public interface Progressable {
    void progress();
}
Like an ordinary file system, HDFS also supports appending to files; writing logs is the most common use:

append(Path f) throws IOException

Not every Hadoop file system supports append: HDFS does, but S3 does not. A small sketch of appending is shown below.
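As a rough sketch (not in the original post; the class name AppendToFile is made up, and it assumes the target file already exists on a file system that supports append):

package com.sweetop.styhadoop;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

import java.io.OutputStream;
import java.net.URI;

public class AppendToFile {
    public static void main(String[] args) throws Exception {
        String uri = args[0];
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(URI.create(uri), conf);
        // append returns an output stream positioned at the end of the
        // existing file; it throws an exception on file systems that do
        // not support append
        OutputStream out = fs.append(new Path(uri));
        out.write("one more log line\n".getBytes("UTF-8"));
        IOUtils.closeStream(out);
    }
}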
The next example copies a local file to HDFS, printing a dot each time Hadoop reports progress:
package com.sweetop.styhadoop;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.util.Progressable;

import java.io.BufferedInputStream;
import java.io.FileInputStream;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.URI;

public class FileCopyWithProgress {
    public static void main(String[] args) throws Exception {
        String localSrc = args[0];
        String dst = args[1];

        InputStream in = new BufferedInputStream(new FileInputStream(localSrc));
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(URI.create(dst), conf);
        OutputStream out = fs.create(new Path(dst), new Progressable() {
            @Override
            public void progress() {
                System.out.print(".");
            }
        });
        IOUtils.copyBytes(in, out, 4096, true);
    }
}
Directory

Creating a directory:

mkdirs(Path f) throws IOException

The mkdirs method automatically creates any parent directories that do not yet exist; a small sketch follows.
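As a rough illustration (not in the original post; the class name MakeDirs is made up):

package com.sweetop.styhadoop;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

import java.net.URI;

public class MakeDirs {
    public static void main(String[] args) throws Exception {
        String uri = args[0];
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(URI.create(uri), conf);
        // creates the directory and every missing parent, like mkdir -p;
        // returns true if the directory exists afterwards
        boolean created = fs.mkdirs(new Path(uri));
        System.out.println("created = " + created);
    }
}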
Retrieve

Retrieving directories and viewing information about directories and files is an essential feature of any file system, and HDFS is no exception, though it has a few points of its own:
FileStatus

FileStatus encapsulates the metadata of HDFS files and directories, including file length, block size, replication factor, modification time, owner, permissions, and so on. FileSystem's getFileStatus method returns this information:
package com.sweetop.styhadoop;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

import java.io.IOException;
import java.net.URI;

public class ShowFileStatus {
    public static void main(String[] args) throws IOException {
        Path path = new Path(args[0]);
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(URI.create(args[0]), conf);
        FileStatus status = fs.getFileStatus(path);
        System.out.println("path = " + status.getPath());
        System.out.println("owner = " + status.getOwner());
        System.out.println("block size = " + status.getBlockSize());
        System.out.println("permission = " + status.getPermission());
        System.out.println("replication = " + status.getReplication());
    }
}

Listing files

Sometimes you need to find a set of files that meet certain criteria. The example below does this with FileSystem's listStatus method, which returns the matching FileStatus objects; listStatus has several overloads, can take multiple paths, and can also filter with a PathFilter, which we discuss below. Another important method is FileUtil.stat2Paths, which converts an array of FileStatus objects into an array of Path objects and is very handy.

package com.sweetop.styhadoop;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.FileUtil;
import org.apache.hadoop.fs.Path;

import java.io.IOException;
import java.net.URI;

public class ListStatus {
    public static void main(String[] args) throws IOException {
        String uri = args[0];
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(URI.create(uri), conf);

        Path[] paths = new Path[args.length];
        for (int i = 0; i < paths.length; i++) {
            paths[i] = new Path(args[i]);
        }
        FileStatus[] status = fs.listStatus(paths);
        Path[] listedPaths = FileUtil.stat2Paths(status);
        for (Path p : listedPaths) {
            System.out.println(p);
        }
    }
}
PathFilter

Next, the PathFilter interface. It has only one method to implement, accept, which returns true for paths that should be kept. Here we implement a filter that excludes paths matching a regular expression; it is put to work in the example that follows.
package com.sweetop.styhadoop;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.PathFilter;

public class RegexExludePathFilter implements PathFilter {

    private final String regex;

    public RegexExludePathFilter(String regex) {
        this.regex = regex;
    }

    @Override
    public boolean accept(Path path) {
        return !path.toString().matches(regex);
    }
}

File patterns

When you need many files at once, listing paths one by one is inconvenient. Hadoop provides wildcard (glob) listing through FileSystem's globStatus method; globStatus also has an overload that takes a PathFilter, so the following example combines the two.
package com.sweetop.styhadoop;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.FileUtil;
import org.apache.hadoop.fs.Path;

import java.io.IOException;
import java.net.URI;

public class GlobStatus {
    public static void main(String[] args) throws IOException {
        String uri = args[0];
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(URI.create(uri), conf);

        FileStatus[] status = fs.globStatus(new Path(uri), new RegexExludePathFilter("^.*/1901"));
        Path[] listedPaths = FileUtil.stat2Paths(status);
        for (Path p : listedPaths) {
            System.out.println(p);
        }
    }
}
Delete data

Deleting data is even simpler:
delete(Path f, boolean recursive) throws IOException
The first parameter is self-explanatory. The second indicates whether to delete recursively: it is ignored when the path is a file or an empty directory, but if the path is a non-empty directory and recursive is false, the delete throws an IOException. A small sketch follows.
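As a rough illustration (not in the original post; the class name DeletePath is made up):

package com.sweetop.styhadoop;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

import java.net.URI;

public class DeletePath {
    public static void main(String[] args) throws Exception {
        String uri = args[0];
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(URI.create(uri), conf);
        // with recursive = true a non-empty directory and its contents are
        // removed; with false, deleting a non-empty directory throws IOException
        boolean deleted = fs.delete(new Path(uri), true);
        System.out.println("deleted = " + deleted);
    }
}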


Thanks to Tom White: this article draws mostly on his Hadoop: The Definitive Guide. The Chinese translation of that book is poor, so these notes were written from the English original and the official documentation, with some of my own understanding added. They are only reading notes.
