HDFS -- how to copy files to HDFS


The main classes used for file operations in Hadoop are located in the org.apache.hadoop.fs package. Basic file operations include open, read, write, and close. The Hadoop file API is generic and can be used with file systems other than HDFS.

The starting point of the Hadoop file API is the FileSystem class, an abstract class that models interaction with a file system. Different implementation subclasses exist for HDFS and the local file system. You obtain a FileSystem instance by calling the factory method FileSystem.get(Configuration conf). The Configuration class is a special class that holds key/value configuration parameters; its default instantiation reads the resource configuration of the HDFS installation.

You can obtain the FileSystem object of the HDFS interface as follows:

Configuration conf = new Configuration();
FileSystem hdfs = FileSystem.get(conf);

To obtain a FileSystem object dedicated to the local file system:

FileSystem local = FileSystem.getLocal(conf);

The Hadoop file API uses Path objects to encode file and directory names, and FileStatus objects to store the metadata of files and directories. Use the listStatus() method to obtain the list of files in a directory:

Path inputDir = new Path(args[0]);
FileStatus[] inputFiles = local.listStatus(inputDir);

The length of the inputFiles array equals the number of files in the specified directory. Each FileStatus object in inputFiles carries metadata such as the file length, permissions, and modification time.
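
For example, a minimal sketch (it assumes the inputFiles array obtained above) that prints this metadata for each entry:

for (FileStatus status : inputFiles) {
    // getPath(), getLen(), getPermission(), and getModificationTime()
    // are standard FileStatus accessors.
    System.out.println(status.getPath().getName()
            + " length=" + status.getLen()
            + " permissions=" + status.getPermission()
            + " modified=" + status.getModificationTime());
}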

You can copy a local file to HDFS with the command line bin/hadoop fs -put (see the example below), or you can implement the copy yourself.
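
For example, to copy cite2.txt from the current local directory to a file of the same name in your HDFS home directory:

bin/hadoop fs -put cite2.txt cite2.txt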

After the following program is compiled and packaged, you can run the following command to perform the upload yourself:

hadoop jar filecopy.jar FileCopy cite2.txt cite2.txt

The following is the FileCopy code.

import java.net.URI;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.BufferedInputStream;
import java.io.FileInputStream;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

public class FileCopy
{
    public static void main(String[] args) throws Exception
    {
        if (args.length != 2) {
            System.err.println("Usage: filecopy <source> <target>");
            System.exit(2);
        }
        Configuration conf = new Configuration();
        // Read the local source file through a buffered stream.
        InputStream input = new BufferedInputStream(new FileInputStream(args[0]));
        // Obtain the FileSystem for the target URI (HDFS by default).
        FileSystem fs = FileSystem.get(URI.create(args[1]), conf);
        // Create the target file on HDFS and copy with a 4 KB buffer;
        // the final 'true' closes both streams when the copy finishes.
        OutputStream output = fs.create(new Path(args[1]));
        IOUtils.copyBytes(input, output, 4096, true);
    }
}
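
As an alternative to copying the bytes manually, the FileSystem class also provides a copyFromLocalFile() method. A minimal sketch of the same upload (argument checking omitted):

Configuration conf = new Configuration();
FileSystem fs = FileSystem.get(URI.create(args[1]), conf);
// copyFromLocalFile() opens, buffers, and closes the streams internally.
fs.copyFromLocalFile(new Path(args[0]), new Path(args[1]));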

