Two tables available: Sales lists the ID and name of the item listed for the id;things of the person and the item they purchased:
Hive> Select * fromSales;okjoe2Hank4Ali0Eve3Hank2Time taken:0.085Seconds, fetched:5row (s) Hive> Select * fromThings;ok2Tie4Coat3Hat1Scarftime taken:0.069Seconds, fetched:4Row (s)
1. Internal connection
Hive supports only equivalent connections , which means that only the equals sign can be used in an expression after the on keyword.
Hive> SelectSales.*, things.* > fromSalesJOINThings on(sales.id=things.id); Joe2 2Tiehank4 4Coateve3 3Hathank2 2Tie
You can also specify join conditions in the WHERE clause.
Hive> SelectSales.*, things.* > fromsales,things> whereSales.id=Things.id;okjoe2 2Tiehank4 4Coateve3 3Hathank2 2Tie
A single connection is implemented with an MR job . However, if the same columns are used in the connection conditions for multiple connections, the average connection can be implemented at least one MR job.
You can use the Explain keyword before a query to see how many MR jobs hive will use for a query: " This section is detailed later "
Hive> explain >Select Sales. *, things. * > from Join on (Sales.id=things.id);
2. External connection
An outer join allows you to find rows of data that cannot be matched in the join table .
In front of the inner connection, the "Ali" line does not appear in the output. Because the ID of the item she purchased did not appear in the things table.
left OUTER join: You can display all the data rows of the table on the left side:t_name1 OUTER JOIN t_name2 on ()
Hive> SelectSales.*, things.* > fromSales Left OUTER JOINThings on(sales.id=things.id); Okjoe2 2Tiehank4 4Coatali0 NULL NULLEve3 3Hathank2 2Tietime taken:13.387Seconds, fetched:5Row (s)
Right outer connection :t_name1 OUTER JOIN t_name2 on ()
Hive> SelectSales.*, things.* > fromSales Right OUTER JOINThings on(sales.id=things.id); Okjoe2 2Tiehank2 2Tiehank4 4Coateve3 3HatNULL NULL 1Scarftime taken:14.54Seconds, fetched:5Row (s)
All-out connection :t_name1 full OUTER JOIN t_name2 on ()
Hive> SelectSales.*, things.* > fromSales Full OUTER JOINThings on(sales.id=things.id); Okali0 NULL NULLNULL NULL 1Scarfhank2 2Tiejoe2 2Tieeve3 3Hathank4 4Coattime taken:44.671Seconds, fetched:6Row (s)
Half connection :t_name1 left SEMI JOIN t_name2 on ()
Hive> Select * fromthings> whereThings.idinch >(SelectId fromsales); OK2Tie4Coat3Hattime taken:15.633Seconds, fetched:3Row (s)
This in query can be converted to a semi-join query :
The left half join query must follow a restriction: the right table (sales) can only appear in the ON clause. For example, you cannot reference the right table in a select expression.
Hive>Select* from things >leftJOIN on (sales.id=things.id); OK2 Tie4 Coat 3 13.169 3 Row (s)
Hive Connection Query