Hive Optimization: Automatically Merging Small Output Files
1. First, set the small-file threshold in hive-site.xml. The value is in bytes; 536870912 bytes is 512 MB.
<property>
<name>hive.merge.smallfiles.avgsize</name>
<value>536870912</value>
<description>When the average output file size of a job is less than this number, Hive will start an additional map-reduce job to merge the output files into bigger files. This is only done for map-only jobs if hive.merge.mapfiles is true, and for map-reduce jobs if hive.merge.mapredfiles is true.</description>
</property>
2. Merge small files at the end of map-only jobs.
<property>
<name>hive.merge.mapfiles</name>
<value>true</value>
<description>Merge small files at the end of a map-only job</description>
</property>
3. Merge small files at the end of map-reduce jobs (jobs that include a reduce stage).
<property>
<name>hive.merge.mapredfiles</name>
<value>true</value>
<description>Merge small files at the end of a map-reduce job</description>
</property>
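To see how the three settings interact, the following Python sketch models the decision described in the property descriptions: the merge job is considered only when the flag matching the job type is true, and it runs when the average output file size is below the threshold. This is a simplified illustration, not Hive's actual implementation. The names `MergeSettings` and `would_trigger_merge` are hypothetical, and the rule that a single output file is never merged is an assumption added here.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class MergeSettings:
    """Hypothetical container mirroring the three hive-site.xml properties."""
    smallfiles_avgsize: int = 536870912  # hive.merge.smallfiles.avgsize, in bytes
    merge_mapfiles: bool = True          # hive.merge.mapfiles
    merge_mapredfiles: bool = True       # hive.merge.mapredfiles


def would_trigger_merge(
    output_sizes: Sequence[int],
    has_reduce_stage: bool,
    settings: Optional[MergeSettings] = None,
) -> bool:
    """Return True if, under this simplified model, an extra merge job would run."""
    if settings is None:
        settings = MergeSettings()

    # Pick the flag that matches the job type.
    enabled = settings.merge_mapredfiles if has_reduce_stage else settings.merge_mapfiles
    if not enabled:
        return False

    # Assumption: nothing to merge when there are fewer than two output files.
    if len(output_sizes) < 2:
        return False

    # Compare the average output file size against the threshold.
    average = sum(output_sizes) / len(output_sizes)
    return average < settings.smallfiles_avgsize


if __name__ == "__main__":
    ten_mb = 10 * 1024 * 1024
    one_gb = 1024 ** 3
    # 100 files of 10 MB from a map-only job: average is far below 512 MB.
    print(would_trigger_merge([ten_mb] * 100, has_reduce_stage=False))
    # 4 files of 1 GB from a map-reduce job: average is above the threshold.
    print(would_trigger_merge([one_gb] * 4, has_reduce_stage=True))
```

With the values configured above, a job that writes many 10 MB files falls under the 512 MB average and would be followed by a merge job, while a job that writes a few 1 GB files would not.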