Comparing the compression performance of common compression methods for Hadoop files


I wrote a small Java program to compare several compression methods commonly used with Hadoop files.

The idea is: given a large file (bigfile.txt), compress it with each codec and write the result to HDFS.

The code is simple: construct a codec instance and let it create a compressed output stream to HDFS.

package com.charles.hadoop.fs;

import java.io.BufferedInputStream;
import java.io.FileInputStream;
import java.io.InputStream;
import java.io.OutputStream;

import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.io.compress.CompressionCodec;
import org.apache.hadoop.io.compress.CompressionCodecFactory;
import org.apache.hadoop.io.compress.GzipCodec;
import org.apache.hadoop.util.ReflectionUtils;

/**
 * Description: compare the time taken by common compression codecs when
 * copying a local file to HDFS.
 *
 * @author charles.wang
 * @created May, 3:23:21 PM
 */
public class HadoopCodec {

    public static void main(String[] args) throws Exception {
        String inputFile = "bigfile.txt";
        String outputFolder = "hdfs://192.168.129.35:9000/user/hadoop-user/codec/";

        // Read the Hadoop file system configuration
        Configuration conf = new Configuration();
        conf.set("hadoop.job.ugi", "hadoop-user,hadoop-user");

        // Test the efficiency of each compression format
        // gzip
        long gzipTime = copyAndZipFile(conf, inputFile, outputFolder,
                "org.apache.hadoop.io.compress.GzipCodec", "gz");
        // bzip2
        long bzip2Time = copyAndZipFile(conf, inputFile, outputFolder,
                "org.apache.hadoop.io.compress.BZip2Codec", "bz2");
        // deflate
        long deflateTime = copyAndZipFile(conf, inputFile, outputFolder,
                "org.apache.hadoop.io.compress.DefaultCodec", "deflate");

        System.out.println("The compressed file name is: " + inputFile);
        System.out.println("Using gzip compression, time: " + gzipTime + " milliseconds!");
        System.out.println("Using bzip2 compression, time: " + bzip2Time + " milliseconds!");
        System.out.println("Using deflate compression, time: " + deflateTime + " milliseconds!");
    }

    public static long copyAndZipFile(Configuration conf, String inputFile,
            String outputFolder, String codecClassName, String suffixName) throws Exception {

        long startTime = System.currentTimeMillis();

        // The local file system is based on the java.io package, so create a local file input stream
        InputStream in = new BufferedInputStream(new FileInputStream(inputFile));

        // Strip the extension to get the base name
        String baseName = inputFile.substring(0, inputFile.indexOf("."));
        // Construct the output file name: output path + base name + codec-specific extension
        String outputFile = outputFolder + baseName + "." + suffixName;

        FileSystem fs = FileSystem.get(URI.create(outputFile), conf);

        // Create the codec instance dynamically from the class name via reflection
        CompressionCodec codec = (CompressionCodec) ReflectionUtils.newInstance(
                Class.forName(codecClassName), conf);

        // Create a compressed output stream that points to the HDFS target file
        OutputStream out = codec.createOutputStream(fs.create(new Path(outputFile)));

        // Use the IOUtils tool to copy the local file into the HDFS destination file
        try {
            IOUtils.copyBytes(in, out, conf);
        } finally {
            IOUtils.closeStream(in);
            IOUtils.closeStream(out);
        }

        long endTime = System.currentTimeMillis();
        return endTime - startTime;
    }
}
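
To check the results, you can read one of the compressed files back from HDFS and decompress it on the fly. The sketch below is my own addition rather than part of the original program: it uses CompressionCodecFactory (already imported above) to pick the codec from the file extension; the HDFS path and the local output name bigfile_restored.txt are assumptions based on the values used in main().

package com.charles.hadoop.fs;

import java.io.FileOutputStream;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.io.compress.CompressionCodec;
import org.apache.hadoop.io.compress.CompressionCodecFactory;

public class HadoopDecompress {

    public static void main(String[] args) throws Exception {
        // Assumed HDFS path: one of the files written by HadoopCodec above
        String inputFile = "hdfs://192.168.129.35:9000/user/hadoop-user/codec/bigfile.gz";

        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(URI.create(inputFile), conf);
        Path inputPath = new Path(inputFile);

        // Infer the codec from the extension (.gz -> GzipCodec, .bz2 -> BZip2Codec, ...)
        CompressionCodecFactory factory = new CompressionCodecFactory(conf);
        CompressionCodec codec = factory.getCodec(inputPath);
        if (codec == null) {
            System.err.println("No codec found for " + inputFile);
            return;
        }

        // Wrap the HDFS stream in a decompressing stream and copy it to a local file
        InputStream in = codec.createInputStream(fs.open(inputPath));
        OutputStream out = new FileOutputStream("bigfile_restored.txt");
        try {
            IOUtils.copyBytes(in, out, conf);
        } finally {
            IOUtils.closeStream(in);
            IOUtils.closeStream(out);
        }
    }
}

After both programs run, hadoop fs -ls /user/hadoop-user/codec/ lists the compressed files with their sizes, so you can compare compression ratio as well as the timings printed above.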
