1. Enable compression for the Map intermediate output.
Intermediate output is usually compressed with an algorithm that trades a lower compression ratio for fast compression and decompression, such as LZO or Snappy.
set hive.exec.compress.intermediate=true;
set mapred.map.output.compression.codec=com.hadoop.compression.lzo.LzoCodec;
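The Snappy alternative mentioned above could be configured along the following lines. This is a minimal sketch that assumes the Snappy native libraries are installed on every node of the cluster, which the original does not state.

```sql
-- Compress data passed between the stages of a multi-stage Hive query.
set hive.exec.compress.intermediate=true;
-- Assumption: Snappy native libraries are available on every node.
set mapred.map.output.compression.codec=org.apache.hadoop.io.compress.SnappyCodec;
-- Entering a property name without a value prints its current setting.
set mapred.map.output.compression.codec;
```

Codec class names are Java class names and therefore case-sensitive, so they need to be typed exactly as shown.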
2. Enable compression for final output results
Note that some compression formats (gzip, for example) are not splittable, so a large output file in such a format cannot be processed in parallel by subsequent MapReduce tasks.
set hive.exec.compress.output=true;
set mapred.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec;
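If later jobs need to read the output in parallel, a splittable codec such as bzip2 may be a better fit than gzip, at the cost of slower compression. The sketch below illustrates this; the table names results_bz2 and source_table are hypothetical placeholders, not names from the original.

```sql
-- Compress the final output of the query.
set hive.exec.compress.output=true;
-- bzip2 files are splittable, so one large file can feed several map tasks.
set mapred.output.compression.codec=org.apache.hadoop.io.compress.BZip2Codec;
-- Hypothetical tables: replace results_bz2 and source_table with real ones.
INSERT OVERWRITE TABLE results_bz2
SELECT * FROM source_table;
```

On Hadoop 2 and later the mapred.* property names used here are deprecated aliases; the current name for the output codec is mapreduce.output.fileoutputformat.compress.codec.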
3. Use sequence file format for output
CREATE TABLE tname STORED AS SEQUENCEFILE;
Turn on block compression for sequence files:
set mapred.output.compression.type=BLOCK;
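Putting step 3 together, an end-to-end session might look roughly like this. The table name tname_seq, its columns, and source_table are hypothetical and used only for illustration; the Snappy codec is likewise an assumed choice.

```sql
-- Hypothetical table and columns, for illustration only.
CREATE TABLE tname_seq (id INT, msg STRING) STORED AS SEQUENCEFILE;

-- Compress the final output, one block of records at a time.
set hive.exec.compress.output=true;
set mapred.output.compression.type=BLOCK;
-- Assumption: Snappy native libraries are installed on the cluster.
set mapred.output.compression.codec=org.apache.hadoop.io.compress.SnappyCodec;

-- source_table is a placeholder for an existing table with matching columns.
INSERT OVERWRITE TABLE tname_seq
SELECT id, msg FROM source_table;
```

Because a sequence file is compressed block by block, it stays splittable even when the codec itself (gzip or Snappy, for instance) is not.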
Common compression formats:
deflate org.apache.hadoop.io.compress.DefaultCodec
gzip org.apache.hadoop.io.compress.GzipCodec
bzip2 org.apache.hadoop.io.compress.BZip2Codec
snappy org.apache.hadoop.io.compress.SnappyCodec