2. HDFS operations

Tags: hadoop fs

1. Use the command line
1) Four common command lines
<1> Create an archive file
Purpose:
Because Hadoop is designed to process big data, the ideal file size is a multiple of the block size. The namenode loads all metadata into memory at startup,
so a large number of files smaller than the block size not only uses storage space inefficiently but also consumes a large amount of namenode memory.
archive can package multiple small files into one large file for storage, and the packaged file can still be processed by MapReduce, because
the packaged file consists of two parts: an index and the stored data. The index records the original directory structure and file status.
Usage:
hadoop archive -archiveName test.har -p /A/B/E/F/G/G1 /home
-archiveName test.har specifies the name of the archive file (.har is the suffix), -p specifies the path of the files to be packaged (several source paths may be given),
and the final parameter is the location where the archive file is stored (a relative path is allowed).
Example:
hadoop archive -archiveName ins.har -p hdfs://192.168.80.11:9000/user/hadoop/in
When the command runs you can see that it is executed as a MapReduce job.
<2> distcp distributed copy
Purpose:
Copy files in parallel between HDFS clusters, which must run the same version.
If the versions differ, the copy may fail because the RPC protocols are incompatible. In that case you can use the HTTP-based hftp protocol as the source,
while the target is still HDFS:
hadoop distcp hftp://namenode:50070/user/hadoop/input hdfs://namenode:9000/user/hadoop/input1
You can also use webhdfs. Both the source address and the target address can use webhdfs, which is fully compatible across versions:
hadoop distcp webhdfs://namenode:50070/user/hadoop/input webhdfs://namenode:50070/user/hadoop/input1
Usage:
hadoop distcp hdfs://namenode1/foo hdfs://namenode2/bar
This copies the contents of the foo directory on the first cluster into the bar directory on the second cluster; if bar does not exist, it is created.
Multiple source paths can be specified, and the source paths must be absolute paths.
By default, distcp skips files that already exist in the target path; with the -overwrite option those files are overwritten,
and with -update only the files that have changed are copied.
distcp distributes a large number of files evenly among the map tasks, and each file is copied by a single map. If the total size is less than 256 MB,
distcp allocates only one map; if the split would result in more than 20 maps per node, the count is capped at 20 maps per node.
You can use -m to specify the number of maps explicitly.
<3> jar
Purpose:
Run a jar file that contains Hadoop code.
Usage:
hadoop jar <jar file> <main class> <arguments>
Example:
hadoop jar sample.jar mainmethod args
<4> fs
Purpose:
Run an ordinary basic file system command (the common ones are listed in the next section).


2) 18 common basic commands
Format: hadoop fs -cmd <args>. You can use hadoop fs -help for help.
<1> -cat
Purpose: Print the contents of a specified file to the screen.
Example: hadoop fs -cat URI
<2> -copyFromLocal
Purpose: Copy a local file to HDFS.
Example: hadoop fs -copyFromLocal <localsrc> URI
<3> -copyToLocal
Purpose: Copy a file from HDFS to the local file system.
Example: hadoop fs -copyToLocal URI <localdst>
<4> -cp
Purpose: Copy files from a source path to a target path. There can be multiple source paths, but only one target path.
Example:
hadoop fs -cp /user/file /user/files
hadoop fs -cp /user/file1 /user/file2 /user/files
<5> -du
Purpose: When only a directory is specified (no specific file), the size of each file in that directory is displayed; when a specific file is specified, its size is displayed.
With no argument, it displays the sizes of all HDFS files.
Example: hadoop fs -du URI
<6> -dus
Purpose: Display the total size of the target file or directory.
Example: hadoop fs -dus or
hadoop fs -dus hdfs://master:9000/user/hadoop/in
<7> -expunge
Purpose: Empty the recycle bin (trash).
Example: hadoop fs -expunge
<8> -get
Purpose: Copy a file from HDFS to the local file system.
Example: hadoop fs -get hdfs://master:9000/user/hadoop/in/ins /home/hadoop/
<9> -ls
Purpose: List files. If a directory is given, a list of its child files is returned.
Example: hadoop fs -ls returns the list of files and folders in the current directory.
hadoop fs -ls hdfs://master:9000/user/hadoop/in returns all files and folders in the in folder.
<10> -lsr
Purpose: List files recursively.
Example: hadoop fs -lsr displays the list of all files in the file system.
<11> -mkdir
Purpose: Create a folder. If the parent path does not exist, the parent directories are created as well.
Example: hadoop fs -mkdir hdfs://master:9000/user/hadoop/in2
<12> -mv
Purpose: Move files within the same file system. Multiple source files may be given, in which case the target must be a directory.
Example: hadoop fs -mv src target
<13> -put
Purpose: Copy one or more files from the local file system to the target file system.
Example: hadoop fs -put localfile hdfs://master:9000/user/hadoop/in
hadoop fs -put /home/hadoop/jdk1.6.0_24/ hdfs://master:9000/user/hadoop/in
<14> -rm
Purpose: Delete the specified files; only non-empty directories and files are deleted (see -rmr for recursive deletes).
Example: hadoop fs -rm URI
<15> -rmr
Purpose: Recursively delete the specified directory and its sub-files.
Example: hadoop fs -rmr URI
<16> -setrep
Purpose: Change the replication factor (number of copies) of a file.
Example: hadoop fs -setrep -w 3 -R hdfs://master:9000/user/hadoop/in/ins
<17> -test
Purpose: Check a file with the -e, -z, or -d options.
Example: -e checks whether the file exists; returns 0 if it does.
-z checks whether the file is 0 bytes long; returns 0 if it is.
-d checks whether the path is a directory; returns 1 if it is, 0 otherwise.
<18> -text
Purpose: Output a source file in text format. The allowed formats are zip and text.
Example: hadoop fs -text srcfile

* A file path in HDFS can be written directly as /user/hadoop/in/ins, with the leading hdfs://192.168.80.11:9000 omitted,
because it is already defined in core-site.xml.
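The same holds for the Java API used below. The following is a minimal sketch, not from the original post: the class name HdfsDefaultFsDemo is made up for illustration, and it assumes the default file system in core-site.xml (fs.default.name in Hadoop 1.x) points at hdfs://192.168.80.11:9000, so FileSystem.get(conf) resolves plain paths against that namenode.

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsDefaultFsDemo {
  public static void main(String[] args) throws IOException {
    // core-site.xml is loaded from the classpath, so the default
    // file system URI does not have to be repeated in the code
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    // a plain path is resolved against the configured namenode
    System.out.println(fs.exists(new Path("/user/hadoop/in/ins")));
  }
}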

2. Use a web browser to browse HDFS files
Enter http://192.168.80.11:50070 in the browser to view HDFS information and logs.
 
3. Use the FileSystem API to operate on HDFS files
1) Read data from HDFS
 

import java.io.IOException;
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

/**
 * Read data from HDFS
 */
public class Hdfsoper1 {
  public static void main(String[] args) throws IOException {
    Configuration conf = new Configuration();
    String path = "hdfs://192.168.80.11:9000/user/hadoop/in/ins";
    FileSystem fs = FileSystem.get(URI.create(path), conf);
    // open the file and copy its contents to standard output
    FSDataInputStream fsin = fs.open(new Path(path));
    IOUtils.copyBytes(fsin, System.out, 1024);
  }
}
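As a small aside that is not in the original post: the example above relies on the JVM exiting to release the stream. A minimal variant, assuming the same path (the class name Hdfsoper1Closing is made up here), closes the stream explicitly with IOUtils.closeStream:

import java.io.IOException;
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

public class Hdfsoper1Closing {
  public static void main(String[] args) throws IOException {
    Configuration conf = new Configuration();
    String path = "hdfs://192.168.80.11:9000/user/hadoop/in/ins";
    FileSystem fs = FileSystem.get(URI.create(path), conf);
    FSDataInputStream fsin = null;
    try {
      fsin = fs.open(new Path(path));
      IOUtils.copyBytes(fsin, System.out, 1024);
    } finally {
      // close the input stream whether or not the copy succeeded
      IOUtils.closeStream(fsin);
    }
  }
}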

2) You can set the read position with seek()
 

import java.io.IOException;
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

/**
 * Use the seek() method to set the read position
 */
public class Hdfsoper2 {
  public static void main(String[] args) throws IOException {
    Configuration conf = new Configuration();
    String path = "hdfs://192.168.80.11:9000/user/hadoop/in/ins";
    FileSystem fs = FileSystem.get(URI.create(path), conf);
    FSDataInputStream fsin = fs.open(new Path(path));
    // read the whole file once
    IOUtils.copyBytes(fsin, System.out, 1024);
    // jump back to byte offset 18 and read from there again
    fsin.seek(18);
    System.out.println("********** read again ***********");
    IOUtils.copyBytes(fsin, System.out, 1024);
  }
}

3) Upload a local file to HDFS

import java.io.BufferedInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.util.Progressable;

/**
 * Upload a file to HDFS
 */
public class Hdfsoper3 {
  public static void main(String[] args) throws IOException {
    Configuration conf = new Configuration();
    // source file
    String source = "/home/hadoop/jdk-6u24-linux-i586.bin";
    InputStream in = new BufferedInputStream(new FileInputStream(source));
    // target file
    String target = "hdfs://192.168.80.11:9000/user/hadoop/in/jdk.bin";

    FileSystem fs = FileSystem.get(URI.create(target), conf);
    OutputStream out = fs.create(new Path(target), new Progressable() {
      // each time 64 KB has been uploaded to HDFS, print a * on the console
      @Override
      public void progress() {
        System.out.print("*");
      }
    });

    IOUtils.copyBytes(in, out, 4096, true);
  }
}

4) Delete files in HDFS

import java.io.IOException;
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

/**
 * Delete a file with FileSystem.delete(new Path(), true)
 */
public class Hdfsoper4 {
  public static void main(String[] args) throws IOException {
    Configuration conf = new Configuration();

    String path = "hdfs://192.168.80.11:9000/user/hadoop/in/jdk.bin";

    FileSystem fs = FileSystem.get(URI.create(path), conf);

    // the second argument must be true if we want to delete a directory recursively
    fs.delete(new Path(path), true);
  }
}
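A related sketch that is not part of the original post: FileSystem.exists() can be used to check the path before calling delete(), and delete() returns a boolean indicating whether anything was removed. The class name Hdfsoper4Check is made up; the path simply reuses the jdk.bin example above.

import java.io.IOException;
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class Hdfsoper4Check {
  public static void main(String[] args) throws IOException {
    Configuration conf = new Configuration();
    String path = "hdfs://192.168.80.11:9000/user/hadoop/in/jdk.bin";
    FileSystem fs = FileSystem.get(URI.create(path), conf);

    Path p = new Path(path);
    if (fs.exists(p)) {
      // second argument true = recursive, needed when p is a directory
      boolean deleted = fs.delete(p, true);
      System.out.println("deleted: " + deleted);
    } else {
      System.out.println(p + " does not exist");
    }
  }
}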

This article is from the "accumulated water" blog; please be sure to retain this source: http://xiaochu.blog.51cto.com/1048262/1436715
