HQL operation
1. Distribute by
DISTRIBUTE BY col distributes rows to different reducers according to the value of col.
Sort by
SORT BY col sorts the data by col within each reducer.
SELECT col1, col2 FROM table_name DISTRIBUTE BY col1 SORT BY col1 ASC, col2 DESC;
Combining the two ensures that the output of each reducer is sorted.
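The behaviour described above can be modeled outside Hive. The following is a minimal Python sketch, not Hive code: the function names `distribute_by` and `sort_by` are made up for illustration, and the CRC32-based partitioner is an assumption standing in for whatever hash Hive actually uses. It shows that rows with the same col1 land on the same reducer and that each reducer's output is sorted by col1 ascending, then col2 descending.

```python
import zlib
from collections import defaultdict


def distribute_by(rows, key, num_reducers):
    """Route each row to a reducer chosen from a hash of its key column."""
    reducers = defaultdict(list)
    for row in rows:
        # Stand-in partitioner (an assumption); Hive uses its own hash function.
        bucket = zlib.crc32(str(row[key]).encode("utf-8")) % num_reducers
        reducers[bucket].append(row)
    return dict(reducers)


def sort_by(reducers, col1, col2):
    """Sort inside each reducer: col1 ascending, then col2 descending."""
    result = {}
    for bucket, rows in reducers.items():
        # Two stable sorts: secondary key first (descending), then primary key.
        by_col2 = sorted(rows, key=lambda r: r[col2], reverse=True)
        result[bucket] = sorted(by_col2, key=lambda r: r[col1])
    return result


rows = [
    {"col1": "b", "col2": 1},
    {"col1": "a", "col2": 2},
    {"col1": "b", "col2": 3},
    {"col1": "a", "col2": 5},
    {"col1": "c", "col2": 4},
]
output = sort_by(distribute_by(rows, "col1", num_reducers=2), "col1", "col2")
for bucket, bucket_rows in sorted(output.items()):
    print(bucket, bucket_rows)
```

Note that the ordering holds only inside each reducer; nothing in this model (or in SORT BY) orders rows across reducers.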
Application Scenarios:
* Map output file sizes are uneven
* Reduce output file sizes are uneven
* There are too many small files
* Individual files are very large
2. Cluster by
Puts rows with the same value of a column together and sorts them. For example:
CLUSTER BY col
is equivalent to
DISTRIBUTE BY col SORT BY col
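As a rough illustration, CLUSTER BY on a column can be modeled as distributing rows by that column and then sorting within each reducer on the same column. The Python sketch below is a model only: `cluster_by` is a hypothetical helper, and the CRC32 partitioner is an assumption rather than Hive's real hash.

```python
import zlib
from collections import defaultdict


def cluster_by(rows, key, num_reducers):
    """Model CLUSTER BY key: distribute by key, then sort by key per reducer."""
    reducers = defaultdict(list)
    for row in rows:
        # Stand-in partitioner (an assumption), not Hive's actual hash.
        bucket = zlib.crc32(str(row[key]).encode("utf-8")) % num_reducers
        reducers[bucket].append(row)
    # Sort within each reducer only; there is no ordering across reducers.
    return {b: sorted(rs, key=lambda r: r[key]) for b, rs in reducers.items()}


rows = [{"col": v} for v in [3, 1, 2, 3, 1, 2, 3]]
clustered = cluster_by(rows, "col", num_reducers=2)
for bucket, bucket_rows in sorted(clustered.items()):
    print(bucket, [r["col"] for r in bucket_rows])
```

Equal values of col always end up adjacent in the same reducer's output, which is the "together and sorted" property the section describes.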
3. Union All
Merges data from multiple tables into a single result. Older versions of Hive support only UNION ALL, not plain UNION.
SELECT col FROM (SELECT a AS col FROM t1 UNION ALL SELECT b AS col FROM t2) tmp;
Requirements:
* The field names are the same
* The field types are the same
* The number of fields is the same
* The subqueries being combined cannot have aliases
* To query data from the combined result, the combined result must have an alias
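The requirements above can be expressed as a small check. This is a minimal Python sketch under stated assumptions: each sub-select is represented as a hypothetical (schema, rows) pair, where the schema is a list of (name, type) tuples, and `union_all` is an illustrative helper, not a Hive API. It enforces the field-count, name, and type rules as listed here and keeps duplicate rows, as UNION ALL does.

```python
def union_all(left, right):
    """Concatenate two result sets, keeping duplicates, after a schema check."""
    left_schema, left_rows = left
    right_schema, right_rows = right
    if len(left_schema) != len(right_schema):
        raise ValueError("the number of fields differs")
    if left_schema != right_schema:
        raise ValueError("field names or types differ")
    # UNION ALL keeps every row, including duplicates.
    return left_schema, left_rows + right_rows


# Mirrors "select a as col from T1" and "select B as col from T2":
# both sides are aliased to the same field name, col.
t1 = ([("col", "int")], [(1,), (2,)])
t2 = ([("col", "int")], [(2,), (3,)])
schema, tmp = union_all(t1, t2)
print(tmp)  # [(1,), (2,), (2,), (3,)]
```

The value 2 appears twice in the result because no deduplication takes place; a mismatched schema raises ValueError before any rows are combined.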