Hadoop has an abstract notion of a file system, and HDFS is just one implementation of it. The Java abstract class org.apache.hadoop.fs.FileSystem defines the file system interface in Hadoop, and any class that implements this interface can serve as a Hadoop file system. Besides HDFS there are other implementations, such as the local file system classes LocalFileSystem and RawLocalFileSystem, which use the native disk file system.
One: HDFS command-line interface
Like a traditional file system, HDFS provides a command-line interface for manipulating the file system. An HDFS command generally takes the form: % hadoop fs -ls /temp. Here hadoop invokes the Hadoop command, fs selects the file system shell, and -ls is the file system command. Every file system command is prefixed with "-" and followed by its arguments. The example command lists information about all the files in the /temp directory in HDFS.
Two: Java interface
There are two ways for a Java program to access the HDFS file system: one is to read the data through a URL, and the other is to access the file system through the Hadoop FileSystem API.
- Reading data from a Hadoop URL
One of the simplest ways to read data from HDFS is to open an input stream through the java.net.URL class and read the data from that stream. Java does not recognize Hadoop's URL scheme by default, so some setup is required: call java.net.URL's setURLStreamHandlerFactory method and pass it an FsUrlStreamHandlerFactory, which makes the hdfs URL scheme known to the JVM. In addition, Hadoop provides an IOUtils class for copying and closing streams. The setURLStreamHandlerFactory method may be called only once per Java virtual machine, so if a third-party component in the same virtual machine has already set a factory, this approach cannot be used.
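The URL approach can be sketched roughly as follows. This is a minimal illustration, assuming the Hadoop client library is on the classpath; the class name UrlCat and the example URL in the comment are hypothetical placeholders, not part of the original text.

```java
import java.io.InputStream;
import java.net.URL;

import org.apache.hadoop.fs.FsUrlStreamHandlerFactory;
import org.apache.hadoop.io.IOUtils;

// Hypothetical example class: prints the file behind a Hadoop URL to standard output.
public class UrlCat {
    static {
        // Allowed at most once per JVM; a second call throws java.lang.Error.
        URL.setURLStreamHandlerFactory(new FsUrlStreamHandlerFactory());
    }

    public static void main(String[] args) throws Exception {
        InputStream in = null;
        try {
            // args[0] is assumed to be an HDFS URL such as hdfs://namenode/path/to/file
            // (placeholder host and path).
            in = new URL(args[0]).openStream();
            // Copy with a 4096-byte buffer; do not close the streams here.
            IOUtils.copyBytes(in, System.out, 4096, false);
        } finally {
            IOUtils.closeStream(in);
        }
    }
}
```

Registering the factory in a static initializer keeps the call in one place, which matters because the JVM rejects any later attempt to set a factory.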
- Accessing data through the API
When the setURLStreamHandlerFactory method cannot be used, the only option is to access HDFS through the FileSystem API provided by Hadoop. FileSystem provides a static get method to obtain a file system instance. Which instance is returned is determined by a Configuration object, whose contents come from the configuration files, specifically core-site.xml. The get method also has overloads that take a URI and a user: the URI's scheme determines which file system to use, and the user is the user to access it as. After you get the file system instance, you can call open on a file to get an FSDataInputStream. This stream extends java.io.DataInputStream and implements interfaces such as Seekable and PositionedReadable, which allow random reads: it can read from an arbitrary position in the file and report its current offset from the start of the file.
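A rough sketch of this API path is shown below: it obtains a FileSystem from a URI and a Configuration, opens the file, and uses seek to read it a second time. The class and method names (FileSystemDoubleCat, doubleCat) are hypothetical, and the sketch assumes the Hadoop client library is on the classpath. Because the URI scheme selects the implementation, a file:// URI exercises the local file system and an hdfs:// URI exercises HDFS.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

// Hypothetical example class: copies a file to an output stream twice.
public class FileSystemDoubleCat {

    /** Writes the file at uri to out twice; returns the stream offset after the first pass. */
    public static long doubleCat(String uri, OutputStream out) throws IOException {
        // Reads core-site.xml from the classpath when it is present.
        Configuration conf = new Configuration();
        // The scheme of the URI decides which FileSystem implementation is returned.
        FileSystem fs = FileSystem.get(URI.create(uri), conf);
        FSDataInputStream in = null;
        try {
            in = fs.open(new Path(uri));
            IOUtils.copyBytes(in, out, 4096, false);
            long offsetAfterFirstPass = in.getPos(); // current offset from the start of the file
            in.seek(0);                              // jump back to the beginning
            IOUtils.copyBytes(in, out, 4096, false);
            return offsetAfterFirstPass;
        } finally {
            IOUtils.closeStream(in);
        }
    }

    public static void main(String[] args) throws IOException {
        doubleCat(args[0], System.out);
    }
}
```

Seeking is a relatively expensive operation on a distributed file system, so a sketch like this is meant to show the capability rather than recommend frequent seeks.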
To write a file in HDFS, call create or append to obtain an FSDataOutputStream object. The stream returned by append adds data to the end of an existing file. Hadoop does not support random writes to a file. The API also provides the mkdirs method for creating a new directory. FileStatus holds the metadata of a file, the FileSystem API provides the listStatus method to list all the files in a directory, and the delete method deletes a file or directory.
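The write-side calls can be sketched together in one small example. It is only an illustration, assuming the Hadoop client library is on the classpath; the class name WriteListDelete, the method run, and the file name hello.txt are hypothetical. The append call is mentioned only in a comment because not every FileSystem implementation supports it.

```java
import java.io.IOException;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Hypothetical example class: create a directory, write a file, list it, delete it.
public class WriteListDelete {

    /** Returns "name:length" for each entry found in the directory before it is deleted. */
    public static List<String> run(String dirUri) throws IOException {
        FileSystem fs = FileSystem.get(URI.create(dirUri), new Configuration());
        Path dir = new Path(dirUri);
        fs.mkdirs(dir); // creates missing parent directories as well

        Path file = new Path(dir, "hello.txt");
        // create(Path) overwrites an existing file; data is written sequentially.
        try (FSDataOutputStream out = fs.create(file)) {
            out.write("hello\n".getBytes(StandardCharsets.UTF_8));
        }
        // fs.append(file) would return a stream positioned at the end of an
        // existing file, on file systems that support appending.

        List<String> entries = new ArrayList<>();
        for (FileStatus status : fs.listStatus(dir)) {
            entries.add(status.getPath().getName() + ":" + status.getLen());
        }

        fs.delete(dir, true); // true = delete the directory recursively
        return entries;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(run(args[0]));
    }
}
```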
- Wildcard characters and PathFilter
Processing a set of files in a single operation is a very common requirement, and sometimes you need to select only the files that match certain criteria as input. You can use wildcards (glob patterns) to match just a subset of the files. However, wildcards cannot always describe the set you want, and a PathFilter can be used to refine the match further when the filtering requirements are more complex.
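Combining the two might look like the sketch below, which expands a glob with FileSystem.globStatus and then drops paths matching a regular expression. The filter class RegexExcludePathFilter, the class GlobWithFilter, and the date-style directory layout used in the usage note are all hypothetical illustrations; the sketch assumes the Hadoop client library is on the classpath.

```java
import java.io.IOException;
import java.net.URI;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.PathFilter;

// Hypothetical example class: glob expansion refined by a PathFilter.
public class GlobWithFilter {

    /** Hypothetical filter: rejects any path whose string form matches the regex. */
    static class RegexExcludePathFilter implements PathFilter {
        private final String regex;

        RegexExcludePathFilter(String regex) {
            this.regex = regex;
        }

        @Override
        public boolean accept(Path path) {
            return !path.toString().matches(regex);
        }
    }

    /** Returns the sorted final path components of entries matching the glob but not the regex. */
    public static List<String> match(String baseUri, String glob, String excludeRegex)
            throws IOException {
        FileSystem fs = FileSystem.get(URI.create(baseUri), new Configuration());
        Path pattern = new Path(baseUri, glob);
        FileStatus[] matches = fs.globStatus(pattern, new RegexExcludePathFilter(excludeRegex));

        List<String> names = new ArrayList<>();
        if (matches != null) { // globStatus may return null when nothing matches
            for (FileStatus status : matches) {
                names.add(status.getPath().getName());
            }
        }
        Collections.sort(names);
        return names;
    }
}
```

For instance, with a layout of year/month/day directories, a call such as match(base, "2007/*/*", "^.*/2007/12/31$") would select every day under 2007 except December 31, something a glob pattern alone cannot express easily.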