Hive has a configuration parameter, hive.mapred.mode, that takes one of two values, nonstrict or strict; the default is nonstrict.
When it is set to strict, three kinds of statements are rejected at compile time:
1. Cartesian product joins. In this case no reduce-side join key is specified, so only one reducer is used, which becomes a performance bottleneck when the data volume is large.
    // Use only 1 reducer in case of cartesian product
    if (reduceKeys.size() == 0) {
      numReds = 1;

      // Cartesian product is not supported in strict mode
      if (conf.getVar(HiveConf.ConfVars.HIVEMAPREDMODE).equalsIgnoreCase(
          "strict")) {
        throw new SemanticException(ErrorMsg.NO_CARTESIAN_PRODUCT.getMsg());
      }
    }
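One way to see why a join with no key cannot be spread across reducers is to look at ordinary hash partitioning. The sketch below is an illustration only, not Hive's partitioner (the excerpt above simply sets numReds to 1); the class and method names are hypothetical. With an empty key every row hashes to the same value, so every row lands in the same partition regardless of how many reducers are configured.

```java
import java.util.Arrays;

// Illustrative only: this class does not exist in Hive.
final class EmptyKeyPartitionSketch {

    // MapReduce-style hash partitioning: hash the key columns,
    // clear the sign bit, then take the modulus of the reducer count.
    static int partitionFor(Object[] joinKey, int numReducers) {
        return (Arrays.hashCode(joinKey) & Integer.MAX_VALUE) % numReducers;
    }

    // True when every key in the batch is routed to the same partition.
    static boolean allInOnePartition(Object[][] keys, int numReducers) {
        if (keys.length == 0) {
            return true;
        }
        int first = partitionFor(keys[0], numReducers);
        for (Object[] key : keys) {
            if (partitionFor(key, numReducers) != first) {
                return false;
            }
        }
        return true;
    }
}
```

Calling allInOnePartition with a batch of empty keys returns true for any reducer count, whereas a batch with distinct join keys such as "a" and "b" is split across partitions.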
2. ORDER BY without a LIMIT. ORDER BY forces the number of reducers to 1; without a LIMIT, all of the data is sent to that single reducer for a full sort.
    if (sortExprs == null) {
      sortExprs = qb.getParseInfo().getOrderByForClause(dest);
      if (sortExprs != null) {
        assert numReducers == 1;
        // in strict mode, in the presence of order by, limit must be specified
        Integer limit = qb.getParseInfo().getDestLimit(dest);
        if (conf.getVar(HiveConf.ConfVars.HIVEMAPREDMODE).equalsIgnoreCase(
            "strict")
            && limit == null) {
          throw new SemanticException(generateErrorMessage(sortExprs,
              ErrorMsg.NO_LIMIT_WITH_ORDERBY.getMsg()));
        }
      }
    }
3. Reading a partitioned table without specifying a partition predicate.
Note: for a multi-level partitioned table, a predicate on any one of the partition columns is enough to pass the check.
    // If the "strict" mode is on, we have to provide partition pruner for
    // each table.
    if ("strict".equalsIgnoreCase(HiveConf.getVar(conf,
        HiveConf.ConfVars.HIVEMAPREDMODE))) {
      if (!hasColumnExpr(prunerExpr)) {
        throw new SemanticException(ErrorMsg.NO_PARTITION_PREDICATE
            .getMsg("for Alias \"" + alias + "\" Table \""
                + tab.getTableName() + "\""));
      }
    }
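The three checks above can be summarized in one small, self-contained sketch. This is a simplified model under stated assumptions, not Hive code: QueryShape and StrictModeSketch are hypothetical types introduced only for illustration, and the sketch collects every violation, whereas the excerpts above throw a SemanticException at the first one. Only the mode string and the three error names come from the excerpts.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: none of these types exist in Hive.
final class StrictModeSketch {

    // Hypothetical summary of the query features the three checks inspect.
    static final class QueryShape {
        final boolean hasJoin;
        final int joinKeyCount;              // 0 together with hasJoin means a Cartesian product
        final boolean hasOrderBy;
        final Integer limit;                 // null when the query has no LIMIT
        final boolean readsPartitionedTable;
        final boolean hasPartitionPredicate;

        QueryShape(boolean hasJoin, int joinKeyCount, boolean hasOrderBy,
                   Integer limit, boolean readsPartitionedTable,
                   boolean hasPartitionPredicate) {
            this.hasJoin = hasJoin;
            this.joinKeyCount = joinKeyCount;
            this.hasOrderBy = hasOrderBy;
            this.limit = limit;
            this.readsPartitionedTable = readsPartitionedTable;
            this.hasPartitionPredicate = hasPartitionPredicate;
        }
    }

    // Returns the names of the strict-mode rules the query breaks.
    // In any mode other than "strict" nothing is reported.
    static List<String> violations(String mode, QueryShape q) {
        List<String> found = new ArrayList<>();
        if (!"strict".equalsIgnoreCase(mode)) {
            return found;
        }
        if (q.hasJoin && q.joinKeyCount == 0) {
            found.add("NO_CARTESIAN_PRODUCT");
        }
        if (q.hasOrderBy && q.limit == null) {
            found.add("NO_LIMIT_WITH_ORDERBY");
        }
        if (q.readsPartitionedTable && !q.hasPartitionPredicate) {
            found.add("NO_PARTITION_PREDICATE");
        }
        return found;
    }
}
```

For example, a query that joins without keys, sorts without a LIMIT, and scans a partitioned table without a partition predicate yields all three names in strict mode and an empty list in nonstrict mode.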
In all three cases a large data volume produces inefficient MR jobs and hurts execution time, but throwing an exception outright feels too heavy-handed.
Strict mode can still be turned on for ad-hoc query front ends in non-production environments, such as Hiveweb and operations tools.