Today, while testing some Hive SQL, I put an ORDER BY at the end of a query. When the field being sorted on was not in the SELECT list, the query failed with an error.
Hive reported an error. At first I wondered whether the query was simply written wrong, but on a closer look there seemed to be nothing wrong with it.
That made me wonder whether the problem lay in the order in which Hive executes the clauses of a statement. So I added the fields used in ORDER BY to the preceding SELECT list.
Sure enough, the query then ran without problems. This is a good moment to summarize the order in which Hive executes a statement.
The order in which Hive executes the clauses of a query:
Take this SQL:
SELECT ... FROM ... GROUP BY ... HAVING ... ORDER BY ...
Execution order:
FROM ... SELECT ... GROUP BY ... HAVING ... ORDER BY ...
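The failure described above follows from this order: ORDER BY works on whatever SELECT produced. Below is a toy Python model, not Hive's actual implementation, in which each clause is a plain function over a list of rows. The `employees` table, its columns, and `run_query` are hypothetical. It models only FROM, WHERE, SELECT, ORDER BY and LIMIT (no GROUP BY), which is enough to show that a column dropped by SELECT is no longer there to sort on.

```python
from typing import Any, Callable, Dict, List, Optional

# A row is a column-name -> value mapping (toy model, not Hive internals).
Row = Dict[str, Any]


def run_query(
    table: List[Row],
    select_cols: List[str],
    order_by: str,
    where: Optional[Callable[[Row], bool]] = None,
    limit: Optional[int] = None,
) -> List[Row]:
    # FROM: load the source table.
    rows = list(table)
    # WHERE: keep only the rows that satisfy the predicate.
    if where is not None:
        rows = [r for r in rows if where(r)]
    # SELECT: project the output columns, building a NEW table.
    projected = [{c: r[c] for c in select_cols} for r in rows]
    # ORDER BY: sorts the projected table, so a column that SELECT
    # dropped is simply missing and the lookup raises KeyError.
    projected.sort(key=lambda r: r[order_by])
    # LIMIT: truncate the sorted result.
    return projected if limit is None else projected[:limit]


# Hypothetical sample data.
employees = [
    {"name": "li", "dept": "ops", "salary": 300},
    {"name": "wang", "dept": "dev", "salary": 500},
    {"name": "zhang", "dept": "dev", "salary": 400},
]

# Fails: salary is used for sorting but was not selected.
try:
    run_query(employees, ["name"], order_by="salary")
except KeyError as err:
    print("error: missing column", err)

# Works: the sort column is added to the SELECT list.
print(run_query(employees, ["name", "salary"], order_by="salary", limit=2))
# -> [{'name': 'li', 'salary': 300}, {'name': 'zhang', 'salary': 400}]
```

The model raises a Python `KeyError` rather than Hive's real error message; the point is only that the sort step sees the projected table, not the source table.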
In fact, summarizing Hive's execution order amounts to summarizing the execution order of the MapReduce job the query runs as:
Execution order of the MapReduce program:
Map phase:
1. FROM: look up the table and load its data
2. WHERE: filter rows by the given conditions
3. SELECT: choose the columns to output
4. GROUP BY: group the rows and set up the functions to be computed for each group
5. Map-side file merge: the files each map task spills to local disk are merged, so each map task ends up with a single temporary file. The map output is then sent, by key column, to the corresponding reducer.
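The map-phase steps above can be sketched as one Python function, assuming a hypothetical query `SELECT dept, SUM(salary) FROM emp WHERE salary > 100 GROUP BY dept` (table and columns invented for illustration). It is a toy model: spill files are represented by a single in-memory dict per task, and `zlib.crc32` is an assumed deterministic stand-in for Hadoop's hash partitioning.

```python
import zlib
from collections import defaultdict
from typing import Any, Dict, List, Tuple

Row = Dict[str, Any]


def map_task(split: List[Row], num_reducers: int) -> Dict[int, List[Tuple[str, int]]]:
    # 1. FROM: this task reads only its own split of the table.
    # 2. WHERE: drop rows that fail the filter (salary > 100).
    rows = [r for r in split if r["salary"] > 100]
    # 3. SELECT: keep only the columns the rest of the query needs.
    projected = [(r["dept"], r["salary"]) for r in rows]
    # 4. GROUP BY: partial sum per dept inside this one map task.
    partial: Dict[str, int] = defaultdict(int)
    for dept, salary in projected:
        partial[dept] += salary
    # 5. Merge everything into one output per task, partitioned by the
    #    group key so that all rows for a dept reach the same reducer.
    out: Dict[int, List[Tuple[str, int]]] = defaultdict(list)
    for dept, subtotal in sorted(partial.items()):
        reducer = zlib.crc32(dept.encode("utf-8")) % num_reducers
        out[reducer].append((dept, subtotal))
    return dict(out)
```

Doing a partial sum on the map side means each task ships one subtotal per key instead of every raw row, which is why step 4 appears in the map phase as well as the reduce phase.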
Reduce phase:
1. GROUP BY: group the data sent over from the map side and compute the aggregates
2. SELECT: make the final choice of columns for the output
3. ORDER BY / LIMIT: sort the results, apply the limit, and write the output to a file on HDFS
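Continuing the same hypothetical `SELECT dept, SUM(salary) ... GROUP BY dept` query, now with an assumed `ORDER BY total DESC LIMIT n`, the reduce-phase steps can be sketched as below. This is a toy model, not Hive's code: the input is the list of `(dept, subtotal)` pairs each map task sent to this reducer, and the result is returned instead of being written to HDFS.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def reduce_task(
    map_outputs: List[List[Tuple[str, int]]], limit: int
) -> List[Tuple[str, int]]:
    # 1. GROUP BY: combine the partial sums sent by every map task.
    totals: Dict[str, int] = defaultdict(int)
    for output in map_outputs:
        for dept, subtotal in output:
            totals[dept] += subtotal
    # 2. SELECT: build the final output rows (dept, total).
    result = [(dept, total) for dept, total in totals.items()]
    # 3. Sort the selected rows and apply LIMIT; a real job would then
    #    write these rows to HDFS instead of returning them.
    result.sort(key=lambda row: row[1], reverse=True)
    return result[:limit]
```

Sorting here runs on the rows built by step 2, mirroring the article's point that ORDER BY sees only what SELECT produced. The sketch assumes one reducer receives every group; Hive's documentation likewise describes ORDER BY as using a single reducer to produce a total order.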
So the example above shows that SELECT first produces a table, and the sorting is then done on that table. That is why a field used in ORDER BY has to appear in the SELECT list.
That, in short, is the order in which Hive executes SQL.