https://issues.apache.org/jira/browse/HIVE-2340
select userid, count(*) from u_data group by userid order by userid would produce MRR.
I think when the result of userid, count(*) is small (one reducer can process the result), this query plan could be optimized to MR?
Reducer merging is guarded: it only kicks in when the
optimizer thinks it gets a perf boost.
MR vs. MRR is not a big win when it comes to Tez, due to container reuse.
Going wide on the large cardinality is safer in case map-side
aggregation is missing.
If hive.map.aggr=true and the userid set fits within memory, then smushing
the reducers together would be nicer.
To reset the wide-narrow checks, do
set hive.optimize.reducededuplication.min.reducer=1;
But be aware that it'll fail (I've seen full disks) as you scale upwards
to the terabyte-scale cases.
Cheers,
Gopal
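The suggestion above can be tried out in a session like the following. This is only a sketch: it assumes the u_data table from the question exists, that Tez is available as the execution engine, and that the data is small enough for the caveat about full disks not to apply. The exact plan you get depends on your Hive version and table statistics.

```sql
-- Assumption: Tez is installed and u_data is the table from the question.
set hive.execution.engine=tez;
set hive.map.aggr=true;

-- Plan with the current threshold; look at how many Reducer stages it has.
explain
select userid, count(*) from u_data group by userid order by userid;

-- Lower the threshold so that a single-reducer ORDER BY no longer
-- blocks the merge of the two reduce sinks.
set hive.optimize.reducededuplication.min.reducer=1;

-- Same query again; compare the Reducer stages with the first plan.
explain
select userid, count(*) from u_data group by userid order by userid;
```

Comparing the two EXPLAIN outputs is the simplest way to confirm whether the group-by and order-by stages were actually merged before running the query on real data.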
hive.optimize.reducededuplication.min.reducer
Reduce deduplication merges two RSs (reduce sink operators) by moving the key/parts/reducer-num of the child RS to the parent RS. That means if the reducer-num of the child RS is fixed (order by or forced bucketing) and small, the merge can produce a very slow, single-MR job. The optimization is disabled if the number of reducers is less than the specified value.
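To see which values are in effect for the current session, and to put the guard back after experimenting, something like the following should work. It is a sketch only. The value 4 is, as far as I know, the documented default for this property, so check the configuration reference for your Hive version before relying on it.

```sql
-- "set <name>;" without a value prints the current setting.
set hive.optimize.reducededuplication;
set hive.optimize.reducededuplication.min.reducer;

-- Restore the threshold after experimenting.
-- Assumption: 4 is the default in your Hive version.
set hive.optimize.reducededuplication.min.reducer=4;
```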
TEZ MRR optimize to MR?