This article uses file streams as the entry point for reading the source code of the org.apache.hadoop.fs package in Hadoop. Streams fall into two kinds: input streams and output streams. The sections below look at each kind in turn.
Input stream class
The hierarchy of interfaces and classes related to the input stream is as follows:
    java.io.InputStream (implements java.io.Closeable)
        java.io.FilterInputStream
            java.io.DataInputStream (implements java.io.DataInput)
                org.apache.hadoop.fs.FSDataInputStream (implements org.apache.hadoop.fs.Seekable, org.apache.hadoop.fs.PositionedReadable)
                    org.apache.hadoop.fs.HarFileSystem.HarFSDataInputStream
The FSDataInputStream class implements the Seekable and PositionedReadable interfaces, which give file input streams in the Hadoop file system seek semantics and positioned-read semantics, respectively.
The Seekable interface is defined as follows:
    package org.apache.hadoop.fs;

    import java.io.*;

    /** Stream that permits seeking. */
    public interface Seekable {
        /**
         * Seek to the given offset pos from the start of the file.
         * The next read starts from that position.
         */
        void seek(long pos) throws IOException;

        /**
         * Return the current offset from the start of the file.
         */
        long getPos() throws IOException;

        /**
         * Seek to targetPos on a different copy of the file data.
         * Returns true if a new source was found, false otherwise.
         */
        boolean seekToNewSource(long targetPos) throws IOException;
    }
All methods defined in the Seekable interface operate on the position within the file stream, which makes position-based stream operations convenient within a file system or between file systems.
The PositionedReadable interface is defined as follows:
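To make the seek semantics concrete, here is a minimal sketch that runs without Hadoop on the classpath. SeekableSketch is a local stand-in that copies the three methods of Seekable, and InMemorySeekableStream and SeekableDemo are hypothetical names used only for illustration. The sketch relies on the protected pos and count fields that java.io.ByteArrayInputStream exposes to subclasses.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

// Local stand-in for org.apache.hadoop.fs.Seekable (same three methods).
interface SeekableSketch {
    void seek(long pos) throws IOException;
    long getPos() throws IOException;
    boolean seekToNewSource(long targetPos) throws IOException;
}

// Hypothetical in-memory stream. ByteArrayInputStream keeps its read cursor
// in the protected field `pos` and the end of the data in `count`.
class InMemorySeekableStream extends ByteArrayInputStream implements SeekableSketch {
    InMemorySeekableStream(byte[] data) {
        super(data);
    }

    @Override
    public synchronized void seek(long target) throws IOException {
        if (target < 0 || target > count) {
            throw new IOException("Cannot seek to " + target);
        }
        pos = (int) target; // the next read() starts here
    }

    @Override
    public synchronized long getPos() {
        return pos;
    }

    @Override
    public boolean seekToNewSource(long targetPos) {
        return false; // a byte array holds only one copy of the data
    }
}

class SeekableDemo {
    public static void main(String[] args) throws IOException {
        byte[] data = "hello hadoop".getBytes(StandardCharsets.US_ASCII);
        InMemorySeekableStream in = new InMemorySeekableStream(data);
        in.seek(6);                           // jump to the start of "hadoop"
        System.out.println((char) in.read()); // h
        System.out.println(in.getPos());      // 7
    }
}
```

A real Hadoop stream would move a file or block cursor instead of an array index, but the contract is the same: seek changes where the next read starts, and getPos reports that position.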
    package org.apache.hadoop.fs;

    import java.io.*;
    import org.apache.hadoop.fs.*;

    public interface PositionedReadable {
        /**
         * Read up to length bytes from the given position in the file
         * into the byte buffer, and return the number of bytes read.
         * This does not change the current offset of the file, and is thread-safe.
         */
        public int read(long position, byte[] buffer, int offset, int length) throws IOException;

        /**
         * Read exactly length bytes from the given position in the file
         * into the byte buffer.
         * This does not change the current offset of the file, and is thread-safe.
         */
        public void readFully(long position, byte[] buffer, int offset, int length) throws IOException;

        /**
         * Read buffer.length bytes from the given position in the file
         * into the byte buffer.
         * This does not change the current offset of the file, and is thread-safe.
         */
        public void readFully(long position, byte[] buffer) throws IOException;
    }
The PositionedReadable interface defines three position-based read operations.
FSDataInputStream extends DataInputStream and implements the two interfaces above, so it must implement the operations they define:
    package org.apache.hadoop.fs;

    import java.io.*;

    public class FSDataInputStream extends DataInputStream implements Seekable, PositionedReadable {

        public FSDataInputStream(InputStream in) throws IOException {
            super(in); // the base class stores the wrapped InputStream in its "in" field
            // The wrapped stream must implement both Seekable and PositionedReadable.
            if (!(in instanceof Seekable) || !(in instanceof PositionedReadable)) {
                throw new IllegalArgumentException(
                    "In is not an instance of Seekable or PositionedReadable");
            }
        }

        public synchronized void seek(long desired) throws IOException {
            ((Seekable) in).seek(desired); // the next read of "in" starts at desired
        }

        public long getPos() throws IOException {
            return ((Seekable) in).getPos();
        }

        public int read(long position, byte[] buffer, int offset, int length)
                throws IOException {
            return ((PositionedReadable) in).read(position, buffer, offset, length);
        }

        public void readFully(long position, byte[] buffer, int offset, int length)
                throws IOException {
            ((PositionedReadable) in).readFully(position, buffer, offset, length);
        }

        public void readFully(long position, byte[] buffer) throws IOException {
            ((PositionedReadable) in).readFully(position, buffer, 0, buffer.length);
        }

        public boolean seekToNewSource(long targetPos) throws IOException {
            return ((Seekable) in).seekToNewSource(targetPos);
        }
    }
The most significant feature of FSDataInputStream is that it supports stream operations based on the position within the stream. It does no positioning itself: every such call is delegated to the wrapped stream.
In addition, the org.apache.hadoop.fs package defines input stream classes modeled on random file access (in the manner of a random access file), which can read a stream at arbitrary positions. The inheritance hierarchy is as follows:
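The validate-then-delegate shape of FSDataInputStream can be tried out without Hadoop. The following is only a sketch under simplifying assumptions. Positionable is a reduced, hypothetical stand-in for Seekable, the PositionedReadable half is left out for brevity, and ArraySource, PositionableDataInputStream and WrapperDemo are illustrative names.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;

// Reduced, hypothetical stand-in for org.apache.hadoop.fs.Seekable.
interface Positionable {
    void seek(long pos) throws IOException;
    long getPos() throws IOException;
}

// Hypothetical seekable source backed by a byte array (no bounds checks, for brevity).
class ArraySource extends ByteArrayInputStream implements Positionable {
    ArraySource(byte[] data) { super(data); }
    @Override public synchronized void seek(long target) { pos = (int) target; }
    @Override public synchronized long getPos() { return pos; }
}

// Same shape as FSDataInputStream: validate the wrapped stream once in the
// constructor, then delegate every positioning call to it through a cast.
class PositionableDataInputStream extends DataInputStream implements Positionable {
    PositionableDataInputStream(InputStream in) {
        super(in);
        if (!(in instanceof Positionable)) {
            throw new IllegalArgumentException("in is not an instance of Positionable");
        }
    }

    @Override
    public synchronized void seek(long desired) throws IOException {
        ((Positionable) in).seek(desired);
    }

    @Override
    public long getPos() throws IOException {
        return ((Positionable) in).getPos();
    }
}

class WrapperDemo {
    public static void main(String[] args) throws IOException {
        byte[] data = {0, 0, 0, 1, 0, 0, 0, 42}; // two big-endian ints: 1 and 42
        PositionableDataInputStream in =
            new PositionableDataInputStream(new ArraySource(data));
        in.seek(4);                        // skip the first int
        System.out.println(in.readInt()); // 42
        System.out.println(in.getPos());  // 8
    }
}
```

Because the check runs in the constructor, wrapping a stream without the required capability fails immediately rather than with a ClassCastException on the first seek.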
    java.io.InputStream (implements java.io.Closeable)
        org.apache.hadoop.fs.FSInputStream (implements org.apache.hadoop.fs.Seekable, org.apache.hadoop.fs.PositionedReadable)
            org.apache.hadoop.fs.FSInputChecker
                org.apache.hadoop.fs.ChecksumFileSystem.ChecksumFSInputChecker
First, let's look at the abstract input stream class FSInputStream. The source code is as follows:
    package org.apache.hadoop.fs;

    import java.io.*;

    public abstract class FSInputStream extends InputStream implements Seekable, PositionedReadable {
        /**
         * Seek to the given offset pos from the start of the file.
         * The next read operation starts from this position.
         */
        public abstract void seek(long pos) throws IOException;

        /**
         * Return the current offset from the start of the file.
         */
        public abstract long getPos() throws IOException;

        /**
         * Seek to a different copy of the file data.
         * Returns true if a new source was found, false otherwise.
         */
        public abstract boolean seekToNewSource(long targetPos) throws IOException;

        public int read(long position, byte[] buffer, int offset, int length)
                throws IOException {
            synchronized (this) {
                long oldPos = getPos(); // remember the current offset
                int nread = -1;
                try {
                    seek(position);
                    nread = read(buffer, offset, length);
                } finally {
                    seek(oldPos); // restore the offset even if the read fails
                }
                return nread;
            }
        }

        public void readFully(long position, byte[] buffer, int offset, int length)
                throws IOException {
            int nread = 0;
            while (nread < length) {
                int nbytes = read(position + nread, buffer, offset + nread, length - nread);
                if (nbytes < 0) {
                    throw new EOFException("End of file reached before reading fully.");
                }
                nread += nbytes;
            }
        }

        public void readFully(long position, byte[] buffer) throws IOException {
            readFully(position, buffer, 0, buffer.length);
        }
    }
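The two default implementations in FSInputStream can be exercised with a small stand-alone sketch. PositionedArrayStream and PositionedReadDemo are hypothetical names, and the stream is backed by a byte array rather than a file. The read and readFully bodies follow the listing above: the positioned read remembers the offset, seeks, reads, and restores the offset, and readFully loops until enough bytes have arrived.

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

// Hypothetical in-memory stream that follows the FSInputStream pattern.
class PositionedArrayStream extends ByteArrayInputStream {
    public PositionedArrayStream(byte[] data) { super(data); }

    // No bounds checks, for brevity.
    public synchronized void seek(long target) { pos = (int) target; }
    public synchronized long getPos() { return pos; }

    // Positioned read: remember the offset, seek, read, then restore the offset.
    public int read(long position, byte[] buffer, int offset, int length) {
        synchronized (this) {
            long oldPos = getPos();
            int nread = -1;
            try {
                seek(position);
                nread = read(buffer, offset, length);
            } finally {
                seek(oldPos);
            }
            return nread;
        }
    }

    // Repeat positioned reads until exactly `length` bytes have been read.
    public void readFully(long position, byte[] buffer, int offset, int length)
            throws IOException {
        int nread = 0;
        while (nread < length) {
            int nbytes = read(position + nread, buffer, offset + nread, length - nread);
            if (nbytes < 0) {
                throw new EOFException("End of file reached before reading fully.");
            }
            nread += nbytes;
        }
    }
}

class PositionedReadDemo {
    public static void main(String[] args) throws IOException {
        byte[] data = "0123456789".getBytes(StandardCharsets.US_ASCII);
        PositionedArrayStream in = new PositionedArrayStream(data);
        in.read();
        in.read(); // the sequential offset is now 2

        byte[] buf = new byte[3];
        in.read(5L, buf, 0, 3); // positioned read of bytes 5..7
        System.out.println(new String(buf, StandardCharsets.US_ASCII)); // 567
        System.out.println(in.getPos()); // 2

        try {
            in.readFully(8L, new byte[5], 0, 5); // only 2 bytes remain after offset 8
        } catch (EOFException e) {
            System.out.println("EOF: " + e.getMessage());
        }
    }
}
```

Both the seek and the restore happen inside one synchronized block, so a concurrent sequential reader never observes the temporarily moved offset.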
Output stream class
The hierarchy of the interfaces and classes related to the output stream is as follows:
    java.io.OutputStream (implements java.io.Closeable, java.io.Flushable)
        java.io.FilterOutputStream
            java.io.DataOutputStream
                org.apache.hadoop.fs.FSDataOutputStream (implements org.apache.hadoop.fs.Syncable)
The FSDataOutputStream class defines a private static nested class, PositionCache, a filter output stream that caches the current position of the output stream. This class is implemented as follows:
    /**
     * PositionCache is a filter stream that caches the position of the output stream.
     */
    private static class PositionCache extends FilterOutputStream {

        private FileSystem.Statistics statistics;
        long position; // cached offset of the wrapped output stream "out"

        public PositionCache(OutputStream out, FileSystem.Statistics stats, long pos) throws IOException {
            super(out); // initialize the OutputStream "out" field inherited from the base class
            statistics = stats;
            position = pos;
        }

        public void write(int b) throws IOException {
            out.write(b); // write one byte b to the wrapped output stream
            position++;   // advance the cached position by 1
            if (statistics != null) {
                statistics.incrementBytesWritten(1); // update the file system statistics
            }
        }

        public void write(byte b[], int off, int len) throws IOException {
            out.write(b, off, len);
            position += len; // update the cached position
            if (statistics != null) {
                statistics.incrementBytesWritten(len); // update the file system statistics
            }
        }

        public long getPos() throws IOException {
            return position; // return the cached position, where the next write will land
        }

        public void close() throws IOException {
            out.close(); // close the wrapped output stream
        }
    }
A PositionCache wraps the underlying output stream, and all data is written to the file through it. Besides the wrapped stream it holds two pieces of state: the FileSystem.Statistics object and the current position in the stream. Whenever the current write position is needed, it is read from the PositionCache, which has counted every byte written through it.
The FSDataOutputStream constructor builds the output stream object on top of a PositionCache:
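The position-tracking idea can be reproduced with plain java.io classes. The sketch below is a simplification: it drops FileSystem.Statistics entirely, and CountingOutputStream and CountingDemo are hypothetical names. It only shows how a filter stream can keep a running position starting from a supplied offset.

```java
import java.io.ByteArrayOutputStream;
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.OutputStream;

// Hypothetical simplification of PositionCache without FileSystem.Statistics.
class CountingOutputStream extends FilterOutputStream {
    private long position;

    public CountingOutputStream(OutputStream out, long startPosition) {
        super(out);
        this.position = startPosition;
    }

    @Override
    public void write(int b) throws IOException {
        out.write(b);
        position++; // one byte written
    }

    @Override
    public void write(byte[] b, int off, int len) throws IOException {
        // Write the range in one call instead of FilterOutputStream's
        // default byte-by-byte loop, then advance the position by len.
        out.write(b, off, len);
        position += len;
    }

    public long getPos() {
        return position;
    }
}

class CountingDemo {
    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        // Start at 100 to mimic a stream whose first 100 bytes already exist.
        CountingOutputStream out = new CountingOutputStream(sink, 100);
        out.write('a');
        out.write(new byte[] {1, 2, 3, 4}, 1, 2); // writes the bytes 2 and 3
        System.out.println(out.getPos()); // 103
        System.out.println(sink.size());  // 3
    }
}
```

Tracking the position in the wrapper means the underlying stream never has to report its own offset, which many plain OutputStream implementations cannot do.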
    public FSDataOutputStream(OutputStream out, FileSystem.Statistics stats, long startPosition) throws IOException {
        // Wrap out in a PositionCache that also holds stats and startPosition.
        super(new PositionCache(out, stats, startPosition));
        wrappedStream = out;
    }
Instantiating FSDataOutputStream therefore also records the stream that data is ultimately written to: the wrappedStream field, of type OutputStream, holds the original stream passed to the constructor.
Supplying default values for some of the parameters of this constructor yields the following two constructors:
    @Deprecated
    public FSDataOutputStream(OutputStream out) throws IOException {
        this(out, null);
    }

    public FSDataOutputStream(OutputStream out, FileSystem.Statistics stats)
            throws IOException {
        this(out, stats, 0);
    }
The remaining methods of the FSDataOutputStream class are implemented as follows:
    public long getPos() throws IOException {
        return ((PositionCache) out).getPos();
    }

    public void close() throws IOException {
        out.close();
    }

    public OutputStream getWrappedStream() {
        return wrappedStream;
    }

    /** If wrappedStream implements the Syncable interface, force all of its buffers to be synchronized. */
    public void sync() throws IOException {
        if (wrappedStream instanceof Syncable) {
            ((Syncable) wrappedStream).sync();
        }
    }
Among these, the sync method synchronizes the stream buffers. If the wrapped stream implements Syncable, the call is delegated to it so that buffered data is pushed through to the underlying output stream, which helps ensure that the written data is correct. If the wrapped stream does not implement Syncable, sync does nothing.
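The optional delegation in sync can be illustrated in isolation. The sketch below assumes a local stand-in interface, SyncableSketch, with a single sync method. RecordingSink, SyncingOutputStream and SyncDemo are hypothetical names, and the sink only counts sync calls rather than flushing anything to disk.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.OutputStream;

// Local stand-in for org.apache.hadoop.fs.Syncable.
interface SyncableSketch {
    void sync() throws IOException;
}

// Hypothetical sink that records how often it was asked to sync.
class RecordingSink extends ByteArrayOutputStream implements SyncableSketch {
    int syncCalls = 0;

    @Override
    public void sync() {
        syncCalls++;
    }
}

// Same shape as FSDataOutputStream.sync(): delegate only if the wrapped
// stream offers the capability, otherwise do nothing.
class SyncingOutputStream extends DataOutputStream {
    private final OutputStream wrappedStream;

    public SyncingOutputStream(OutputStream out) {
        super(out);
        this.wrappedStream = out;
    }

    public void sync() throws IOException {
        if (wrappedStream instanceof SyncableSketch) {
            ((SyncableSketch) wrappedStream).sync();
        }
    }
}

class SyncDemo {
    public static void main(String[] args) throws IOException {
        RecordingSink sink = new RecordingSink();
        SyncingOutputStream syncing = new SyncingOutputStream(sink);
        syncing.writeInt(7);
        syncing.sync();
        System.out.println(sink.syncCalls); // 1

        // A plain stream has nothing to delegate to, so sync() is a no-op.
        SyncingOutputStream plain = new SyncingOutputStream(new ByteArrayOutputStream());
        plain.sync();
    }
}
```

One consequence of this design is that a caller cannot tell from sync alone whether anything was synchronized, so code that depends on durability may want to inspect the wrapped stream itself.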