[Author]: Kwu
Configuring Hive compression based on Cloudera Manager 5
Configuring compression for Hive actually means configuring compression for MapReduce. This covers both the final job output and the intermediate results.
1. Configuration from the Hive command line
set hive.enforce.bucketing=true;
set hive.exec.compress.output=true;
set mapred.output.compress=true;
set mapred.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec;
set io.compression.codecs=org.apache.hadoop.io.compress.GzipCodec;
Run the statements above in the Hive command line. This example uses gzip compression, and the codec must be given as its exact, case-sensitive Java class name.
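If these session settings are applied from scripts, it can help to keep them in one place and render the SET statements from that. The following is a minimal Python sketch. `GZIP_SETTINGS` and `build_set_statements` are hypothetical names, not part of Hive, and the values simply mirror the statements above.

```python
from typing import Dict

# Hypothetical settings table; names and values mirror the command-line example.
GZIP_SETTINGS: Dict[str, str] = {
    "hive.enforce.bucketing": "true",
    "hive.exec.compress.output": "true",
    "mapred.output.compress": "true",
    "mapred.output.compression.codec": "org.apache.hadoop.io.compress.GzipCodec",
    "io.compression.codecs": "org.apache.hadoop.io.compress.GzipCodec",
}


def build_set_statements(settings: Dict[str, str]) -> str:
    """Render one 'set key=value;' line per setting, in insertion order."""
    return "\n".join(f"set {key}={value};" for key, value in settings.items())


if __name__ == "__main__":
    # The printed lines can be pasted into the Hive CLI before a query.
    print(build_set_statements(GZIP_SETTINGS))
```

Keeping the values in one dictionary means switching to another codec is a single edit.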
2. Configuration based on XML files
mapred-site.xml
<property>
  <name>mapred.output.compress</name>
  <value>true</value>
  <description>Should the job outputs be compressed?</description>
</property>
<property>
  <name>mapred.output.compression.codec</name>
  <value>org.apache.hadoop.io.compress.GzipCodec</value>
  <description>If the job outputs are compressed, how should they be compressed?</description>
</property>
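A quick way to confirm what a mapred-site.xml actually enables is to parse it. This Python sketch assumes the two properties above are wrapped in the usual `<configuration>` root element. `read_properties` and `output_compression_codec` are hypothetical helpers, and only the `mapred.*` property names used in this article are checked.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Optional

# Assumed sample file: the two properties above inside a <configuration> root.
MAPRED_SITE = """<configuration>
  <property>
    <name>mapred.output.compress</name>
    <value>true</value>
  </property>
  <property>
    <name>mapred.output.compression.codec</name>
    <value>org.apache.hadoop.io.compress.GzipCodec</value>
  </property>
</configuration>"""


def read_properties(xml_text: str) -> Dict[str, str]:
    """Map each <property>'s <name> to its <value>, stripping whitespace."""
    root = ET.fromstring(xml_text)
    props: Dict[str, str] = {}
    for prop in root.iter("property"):
        name = (prop.findtext("name") or "").strip()
        if name:
            props[name] = (prop.findtext("value") or "").strip()
    return props


def output_compression_codec(props: Dict[str, str]) -> Optional[str]:
    """Codec class if job output compression is on; None if off or codec unset."""
    if props.get("mapred.output.compress", "false").lower() != "true":
        return None
    return props.get("mapred.output.compression.codec")


if __name__ == "__main__":
    print(output_compression_codec(read_properties(MAPRED_SITE)))
```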
hive-site.xml
<property>
  <name>hive.enforce.bucketing</name>
  <value>true</value>
</property>
<property>
  <name>hive.exec.compress.output</name>
  <value>true</value>
</property>
<property>
  <name>io.compression.codecs</name>
  <value>org.apache.hadoop.io.compress.GzipCodec</value>
</property>
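Hand-edited XML is easy to break, because a stray space inside a closing tag makes the file unparseable. One way to avoid that is to generate the property blocks. The following is a minimal Python sketch using the standard-library `xml.etree.ElementTree`. `build_configuration` is a hypothetical helper, and `ET.indent` needs Python 3.9 or later.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

# Hive-side properties from the hive-site.xml example above.
HIVE_SETTINGS: List[Tuple[str, str]] = [
    ("hive.enforce.bucketing", "true"),
    ("hive.exec.compress.output", "true"),
    ("io.compression.codecs", "org.apache.hadoop.io.compress.GzipCodec"),
]


def build_configuration(settings: List[Tuple[str, str]]) -> str:
    """Serialize (name, value) pairs as a well-formed <configuration> document."""
    root = ET.Element("configuration")
    for name, value in settings:
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    ET.indent(root)  # pretty-print in place (Python 3.9+)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(build_configuration(HIVE_SETTINGS))
```

Because the serializer writes every closing tag itself, the output always parses back cleanly.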
3. Configuration based on Cloudera Manager 5
1) MapReduce configuration in the YARN service (the output-compression settings shown in mapred-site.xml above)
2) Hive configuration
Add the following properties:
<property>
  <name>hive.enforce.bucketing</name>
  <value>true</value>
</property>
<property>
  <name>hive.exec.compress.output</name>
  <value>true</value>
</property>
<property>
  <name>io.compression.codecs</name>
  <value>org.apache.hadoop.io.compress.GzipCodec</value>
</property>
With this configuration in place, MapReduce jobs, including the ones Hive launches, write their results with gzip compression.
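To check that the results really are gzip-compressed, one can look at the first two bytes of each output file, which are `1f 8b` for gzip data. This Python sketch assumes the table's output files have already been copied to a local directory (for example with `hdfs dfs -get`). `is_gzip_file` and `uncompressed_files` are hypothetical helpers. Names starting with `_` or `.` are skipped because Hadoop treats such files as hidden.

```python
from pathlib import Path
from typing import List

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of gzip data (RFC 1952)


def is_gzip_file(path: Path) -> bool:
    """True if the file starts with the gzip magic bytes."""
    with open(path, "rb") as fh:
        return fh.read(2) == GZIP_MAGIC


def uncompressed_files(directory: Path) -> List[str]:
    """Sorted names of visible files in `directory` (not recursive) lacking gzip magic."""
    return sorted(
        p.name
        for p in directory.iterdir()
        if p.is_file()
        and not p.name.startswith(("_", "."))  # Hadoop-hidden files, e.g. _SUCCESS
        and not is_gzip_file(p)
    )
```

An empty list from `uncompressed_files` means every visible file in the directory is gzip data.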