To compare the compression methods commonly used with Hadoop files, I wrote a small Java program. The idea: given a large local file (bigfile.txt), compress it with each codec in turn while copying it into HDFS, and time each run. The code is simple: construct an instance of the codec, then let it wrap an output stream pointing at the HDFS target file.
package com.charles.hadoop.fs;

import java.io.BufferedInputStream;
import java.io.FileInputStream;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.io.compress.CompressionCodec;
import org.apache.hadoop.util.ReflectionUtils;

/**
 * Description: compare the time taken by several Hadoop compression codecs
 * when copying a local file into HDFS.
 *
 * @author Charles.wang
 */
public class HadoopCodec {

    public static void main(String[] args) throws Exception {
        String inputFile = "bigfile.txt";
        String outputFolder = "hdfs://192.168.129.35:9000/user/hadoop-user/codec/";

        // Read the configuration of the Hadoop file system.
        Configuration conf = new Configuration();
        conf.set("hadoop.job.ugi", "hadoop-user,hadoop-user");

        // Test the efficiency of the various compression formats.
        // gzip
        long gzipTime = copyAndZipFile(conf, inputFile, outputFolder,
                "org.apache.hadoop.io.compress.GzipCodec", "gz");
        // bzip2
        long bzip2Time = copyAndZipFile(conf, inputFile, outputFolder,
                "org.apache.hadoop.io.compress.BZip2Codec", "bz2");
        // deflate
        long deflateTime = copyAndZipFile(conf, inputFile, outputFolder,
                "org.apache.hadoop.io.compress.DefaultCodec", "deflate");

        System.out.println("The compressed file name is: " + inputFile);
        System.out.println("gzip compression took: " + gzipTime + " milliseconds");
        System.out.println("bzip2 compression took: " + bzip2Time + " milliseconds");
        System.out.println("deflate compression took: " + deflateTime + " milliseconds");
    }

    public static long copyAndZipFile(Configuration conf, String inputFile,
            String outputFolder, String codecClassName, String suffixName)
            throws Exception {
        long startTime = System.currentTimeMillis();

        // The local file system is based on the java.io package, so open a
        // local file input stream.
        InputStream in = new BufferedInputStream(new FileInputStream(inputFile));

        // Strip the extension to get the base name, then construct the output
        // file name: path + base name + new extension.
        String baseName = inputFile.substring(0, inputFile.indexOf("."));
        String outputFile = outputFolder + baseName + "." + suffixName;

        FileSystem fs = FileSystem.get(URI.create(outputFile), conf);

        // Create the codec dynamically from the incoming class name via the
        // reflection mechanism.
        CompressionCodec codec = (CompressionCodec) ReflectionUtils.newInstance(
                Class.forName(codecClassName), conf);

        // Create a compressed output stream that points at the HDFS target file.
        OutputStream out = codec.createOutputStream(fs.create(new Path(outputFile)));

        // Use the IOUtils helper to copy the local file into the HDFS target.
        try {
            IOUtils.copyBytes(in, out, conf);
        } finally {
            IOUtils.closeStream(in);
            IOUtils.closeStream(out);
        }

        long endTime = System.currentTimeMillis();
        return endTime - startTime;
    }
}
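If you want to try the same kind of timing comparison without an HDFS cluster, here is a minimal standalone sketch (hypothetical, not part of the program above) using only the JDK's java.util.zip streams. Note the JDK ships gzip and raw deflate but not bzip2, so only two of the three formats can be compared this way; the in-memory buffer stands in for the HDFS output stream.

```java
import java.io.ByteArrayOutputStream;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.GZIPOutputStream;

public class LocalCodecTiming {

    public static void main(String[] args) throws Exception {
        // Stand-in for bigfile.txt: a few MB of repetitive text in memory.
        byte[] data = new byte[4 * 1024 * 1024];
        for (int i = 0; i < data.length; i++) {
            data[i] = (byte) ('a' + (i % 26));
        }

        // gzip: compress the whole buffer and time it.
        long t0 = System.currentTimeMillis();
        ByteArrayOutputStream gzBuf = new ByteArrayOutputStream();
        GZIPOutputStream gz = new GZIPOutputStream(gzBuf);
        gz.write(data);
        gz.close();
        long gzipTime = System.currentTimeMillis() - t0;

        // deflate: same data through a raw DeflaterOutputStream.
        long t1 = System.currentTimeMillis();
        ByteArrayOutputStream defBuf = new ByteArrayOutputStream();
        DeflaterOutputStream def = new DeflaterOutputStream(defBuf);
        def.write(data);
        def.close();
        long deflateTime = System.currentTimeMillis() - t1;

        System.out.println("gzip:    " + gzipTime + " ms, " + gzBuf.size() + " bytes");
        System.out.println("deflate: " + deflateTime + " ms, " + defBuf.size() + " bytes");
    }
}
```

The timings you see locally will not match a real HDFS run, since the Hadoop version also pays for the network copy, but the relative ordering of the codecs on the same data is usually similar.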