Application Scenarios
Keeping a large number of small files in HDFS (of course, not producing small files in the first place is the best practice) puts heavy pressure on the NameNode's namespace. The namespace holds the inode information for every HDFS file, so the more files there are, the more memory the NameNode needs; memory, after all, is limited (this is a current weak point of Hadoop).
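To make the memory pressure concrete, here is a back-of-the-envelope estimate in Java. The figure of roughly 150 bytes of NameNode heap per namespace object (file, directory, or block) is a commonly quoted rule of thumb, not an exact number, and the class and counts below are illustrative assumptions:

```java
public class NameNodeMemoryEstimate {
    // Rule of thumb: each file, directory, and block object costs
    // on the order of 150 bytes of NameNode heap (an approximation).
    static final long BYTES_PER_OBJECT = 150;

    static long estimateBytes(long files, long dirs, long blocks) {
        return (files + dirs + blocks) * BYTES_PER_OBJECT;
    }

    public static void main(String[] args) {
        // Hypothetical cluster: 100 million small files (one block each)
        // plus 1 million directories.
        long bytes = estimateBytes(100_000_000L, 1_000_000L, 100_000_000L);
        System.out.println(bytes / (1024L * 1024 * 1024) + " GB of heap, roughly"); // ~28 GB
    }
}
```

Packing those small files into HAR archives collapses many file entries into a few, which is exactly what shrinks this estimate.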
The following image shows the structure of a HAR file. A HAR file is generated by a MapReduce job, and the source files are not deleted after the job finishes.
har command description: the "-p" parameter gives the common prefix of the src paths, and <src> may be listed multiple times.

hadoop archive -archiveName <name> -p <parent path> <src>* <dest>

Generate a har file from a single src folder:

hadoop archive -archiveName 419.har -p /fc/src/20120116/ 419 /user/heipark

Multiple src folders:

hadoop archive -archiveName combine.har -p /fc/src/20120116/ 419 334 /user/heipark

Without specifying any src path, the parent path itself is archived (here "/fc/src/20120116/"; "/user/heipark" is still the output path). This trick was dug out of the source code:

hadoop archive -archiveName combine.har -p /fc/src/20120116/ /user/heipark

Using pattern matching in the src path, the following example archives the data for the October, November, and December folders. This trick also comes from the source code:

hadoop archive -archiveName combine.har -p /fc/src/ 20111[0-2] /user/heipark

View the har file:
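The `[0-2]` in the src path is shell-style glob syntax. As a quick sanity check of which directory names such a pattern selects, the same character-class syntax can be exercised with java.nio's glob matcher (used here only as a stand-in to illustrate the matching, not as Hadoop's own implementation):

```java
import java.nio.file.FileSystems;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;

public class GlobDemo {
    public static void main(String[] args) {
        // "[0-2]" is a character class: the last character must be 0, 1, or 2.
        PathMatcher m = FileSystems.getDefault().getPathMatcher("glob:20111[0-2]");
        for (String dir : new String[]{"201109", "201110", "201111", "201112"}) {
            System.out.println(dir + " -> " + m.matches(Paths.get(dir)));
        }
        // 201109 does not match; 201110, 201111, 201112 do.
    }
}
```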
hadoop fs -ls har:///user/heipark/20120108_15.har/

# output:
drw-r--r--   - hdfs hadoop      0 2012-01-17 16:30 /user/heipark/20120108_15.har/2025
drw-r--r--   - hdfs hadoop      0 2012-01-17 16:30 /user/heipark/20120108_15.har/2029

# View the har file through the ordinary HDFS file system:
hadoop fs -ls /user/yue.zhang/20120108_15.har/

# output:
-rw-r--r--   2 hdfs hadoop      0 2012-01-17 16:30 /user/heipark/20120108_15.har/_SUCCESS
-rw-r--r--   5 hdfs hadoop   2411 2012-01-17 16:30 /user/heipark/20120108_15.har/_index
-rw-r--r--   5 hdfs hadoop        2012-01-17 16:30 /user/heipark/20120108_15.har/_masterindex
-rw-r--r--   2 hdfs hadoop 191963 2012-01-17 16:30 /user/heipark/20120108_15.har/part-0
HAR Java API (HarFileSystem), Java code:

import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.HarFileSystem;
import org.apache.hadoop.fs.Path;

public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    conf.set("fs.default.name", "hdfs://xxx.xxx.xxx.xxx:9000");
    HarFileSystem fs = new HarFileSystem();
    fs.initialize(new URI("har:///user/heipark/20120108_15.har"), conf);
    FileStatus[] listStatus = fs.listStatus(new Path("sub_dir"));
    for (FileStatus fileStatus : listStatus) {
        System.out.println(fileStatus.getPath().toString());
    }
}