The classes used for file operations in Hadoop live in the org.apache.hadoop.fs package. Basic file operations include open, read, write, and close. The Hadoop file API is generic, however, and can be used with file systems other than HDFS.
The starting point of the Hadoop file API is the FileSystem class, an abstract class for interacting with a file system; different concrete subclasses handle HDFS and the local file system. You obtain a FileSystem instance by calling the factory method FileSystem.get(Configuration conf). The Configuration class is a special-purpose class that holds key/value configuration parameters; its default instantiation loads the resource configuration of the HDFS installation.
You can obtain a FileSystem object for HDFS as follows:
Configuration conf = new Configuration();
FileSystem hdfs = FileSystem.get(conf);
To obtain a FileSystem object dedicated to the local file system:
FileSystem local = FileSystem.getLocal(conf);
The Hadoop file API uses the Path object to encode file and directory names, and the FileStatus object to store metadata for files and directories. Use the listStatus() method to get the list of files in a directory:
Path inputDir = new Path(args[0]);
FileStatus[] inputFiles = local.listStatus(inputDir);
The length of the inputFiles array equals the number of files in the specified directory. Each FileStatus object in inputFiles carries metadata such as the file length, permissions, and modification time.
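Building on that array, here is a minimal sketch that prints a few of those metadata fields; getPath(), getLen(), getPermission(), and getModificationTime() are standard FileStatus accessors, while the loop itself is illustrative and not part of the original text:
for (FileStatus status : inputFiles) {
    // Print name, size in bytes, permission string, and modification time (epoch millis)
    System.out.printf("%s\t%d\t%s\t%d%n",
        status.getPath().getName(), status.getLen(),
        status.getPermission(), status.getModificationTime());
}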
You can copy a local file to HDFS with the command line bin/hadoop fs -put, or you can implement the same operation yourself.
After the program below is compiled and packaged, you can run your own upload with:
hadoop jar filecopy.jar FileCopy cite2.txt cite2.txt
The following is the FileCopy code.
import java.net.URI;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.BufferedInputStream;
import java.io.FileInputStream;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

public class FileCopy
{
    public static void main(String[] args) throws Exception
    {
        if (args.length != 2) {
            System.err.println("Usage: filecopy <source> <target>");
            System.exit(2);
        }
        Configuration conf = new Configuration();
        // Read the local source file through a buffered stream
        InputStream input = new BufferedInputStream(new FileInputStream(args[0]));
        // Resolve the FileSystem for the target URI (HDFS here)
        FileSystem fs = FileSystem.get(URI.create(args[1]), conf);
        OutputStream output = fs.create(new Path(args[1]));
        // Copy with a 4 KB buffer; the final argument closes both streams when done
        IOUtils.copyBytes(input, output, 4096, true);
    }
}
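The reverse operation, reading a file from HDFS (the open/read side of the API mentioned at the start), follows the same pattern. Below is a minimal sketch; the class name FileCat and the single-argument usage are illustrative, not part of the original program:
import java.io.InputStream;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

public class FileCat
{
    public static void main(String[] args) throws Exception
    {
        Configuration conf = new Configuration();
        // Resolve the default (HDFS) file system from the configuration
        FileSystem fs = FileSystem.get(conf);
        // open() returns a stream over the HDFS file; copy it to stdout
        InputStream input = fs.open(new Path(args[0]));
        IOUtils.copyBytes(input, System.out, 4096, false);
        IOUtils.closeStream(input);
    }
}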