Hive itself supports very limited subqueries. Hive does not support subqueries in the where clause and only allows subqueries to appear in the from clause.
Incorrect syntax:
Insert into Table branch_atm zc_sumselect xt_op_trl, sa_tx_dt, "withdrawal-deposit", B. cr_tx_amt-. cr_tx_amt as cr_tx_amt from branch_atm ZC a join branch_atm zc B on (. xt_op_trl = B. xt_op_trl and. sa_tx_dt = B. sa_tx_dt and. tran_cd = 'atm deposit' and B. tran_cd = 'atm withdrawal '), counts from branch_atm zcgroup by xt_op_trl, sa_tx_dt, cr_tx_amt, counts;
Correct syntax:
Insert into Table branch_atm zc_sumselect. xt_op_trl,. sa_tx_dt, "withdrawal-deposit", B. cr_tx_amt-. cr_tx_amt, B. counts +. counts from branch_atm ZC a join branch_atm zc B on (. xt_op_trl = B. xt_op_trl and. sa_tx_dt = B. sa_tx_dt and. tran_cd = 'atm deposit' and B. tran_cd = 'atm withdrawal ')
Another one: efficiency considerations
Selectt1. product type, count (distinct (if (T2. user ID is null, null, T1. user ID) as keep_uvfrom (select product type, user idfrom fact table where ('date'> = 20140201 and 'date' <= 20140228) t1left Outer Join (select product type, user idfrom fact table where ('date'> = 20140101 and 'date' <= 20140131) T2 on (T1. product type = T2. product type and T1. user id = T2. user ID) group by T1. product type
The table contains many fields with a long time span. This is an efficient implementation of in/exists subqueries (specifically, this is a non-correlated subquery), that is, left semi join:
Left semi join implements the uncorrelated in/exists subquery semantics in an efficient way.
Left join
Select Product type, count (distinct T1. user ID) as keep_uvfrom (select product type, user idfrom fact table where ('date'> = 20140201 and 'date' <= 20140228 )) t1left semi join (select product type, user idfrom fact table where ('date'> = 20140101 and 'date' <= 20140131 )) t2 on (T1. product type = T2. product type and T1. user id = T2. user ID) group by product type
Hive: subquery