To explain the problem as simply as possible, I designed a dedicated table like this one.
First, the GROUP BY single-value rule
Rule 1 (the single-value rule): every item in the SELECT list must return one and only one value for each group.
In practice, this means that any column in the SELECT list that is not wrapped in an aggregate function must also appear in the GROUP BY clause.
For example, the following query fails:
After grouping by department, the Technical group contains 3 IDs and the Sales group contains 2. Which ID should the database show for each group?
If we instead apply an aggregate function to the ID column, COUNT(Id), each department group yields exactly one value: the number of people in that department.
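The COUNT query can be sketched with Python's built-in `sqlite3` module, used here as a runnable stand-in for SQL Server. The article's table was an image, so a few things are assumptions: the table name `Payroll`, its columns `Id`, `Name`, `Department` and `Salary`, and all of the rows. The rows are hypothetical and chosen only to match the group sizes described above.

```python
import sqlite3

# Hypothetical schema and rows (assumption: the article's original table was an image).
ROWS = [
    (1, "Zhang", "Tech", 8000),
    (2, "Li", "Tech", 9000),
    (3, "Wang", "Tech", 9000),
    (4, "Zhao", "Sales", 7000),
    (5, "Liu", "Sales", 6000),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Payroll (Id INTEGER PRIMARY KEY, Name TEXT, "
             "Department TEXT, Salary INTEGER)")
conn.executemany("INSERT INTO Payroll VALUES (?, ?, ?, ?)", ROWS)

# COUNT(Id) collapses each department group to exactly one value,
# so this query satisfies the single-value rule.
headcount = conn.execute(
    "SELECT Department, COUNT(Id) AS Headcount "
    "FROM Payroll GROUP BY Department ORDER BY Department"
).fetchall()
print(headcount)  # [('Sales', 2), ('Tech', 3)]
```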
Now for a real-world case: for each department, we want the name, department, and salary of the highest-paid person.
Damn, how frustrating. The first attempt is wrong, so let's analyze why.
Obviously, the Name column above violates the single-value rule. We wishfully expected that, given MAX(Salary), SQL Server would automatically return the name that goes with it, even though that column violates the rule. Unfortunately, SQL Server does not do that, for the following reasons:
- If two people earn the same highest salary, whose name should be returned?
- If we use SUM, AVG, or a similar aggregate instead of MAX(), there may be no row whose salary matches the result. Whose name should be returned then?
- If the query uses two aggregate functions, such as MAX() and MIN(), should it return the name of the person with the maximum salary or with the minimum salary?
In summary, the database cannot look at the aggregate functions we write and work out for us which value to show for a column that violates the single-value rule.
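MySQL behaves differently. Before version 5.7.5, or whenever the ONLY_FULL_GROUP_BY SQL mode is disabled, it accepts a column that violates the single-value rule. It then returns that column's value from an indeterminate row of each group, which in practice is often the first. SQLite also accepts such a query. When the query uses a single MIN() or MAX(), the bare column is taken from a row that holds that minimum or maximum. Otherwise it comes from an arbitrary row of the group, which in practice is often the last.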
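The query that SQL Server rejects can be run against SQLite to see this leniency. This is a sketch with Python's `sqlite3`, using the same assumed `Payroll` table and hypothetical rows as before. In these rows Li and Wang tie for the Tech maximum, so SQLite is free to return either name for that group.

```python
import sqlite3

# Hypothetical schema and rows (assumption: the article's original table was an image).
ROWS = [
    (1, "Zhang", "Tech", 8000),
    (2, "Li", "Tech", 9000),
    (3, "Wang", "Tech", 9000),   # tied with Li for the Tech maximum
    (4, "Zhao", "Sales", 7000),
    (5, "Liu", "Sales", 6000),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Payroll (Id INTEGER PRIMARY KEY, Name TEXT, "
             "Department TEXT, Salary INTEGER)")
conn.executemany("INSERT INTO Payroll VALUES (?, ?, ?, ?)", ROWS)

# Name is neither grouped nor aggregated. SQL Server rejects this, but SQLite
# accepts it and, because there is a single MAX(), takes Name from a max row.
bare_rows = conn.execute(
    "SELECT Name, Department, MAX(Salary) FROM Payroll "
    "GROUP BY Department ORDER BY Department"
).fetchall()
print(bare_rows)  # Sales -> Zhao; Tech -> Li or Wang (unspecified because of the tie)
```

This behavior is specific to SQLite, so a query that relies on it will not port to SQL Server.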
Therefore, for the above query, we need to find a different solution.
Solution 1: Correlated subquery
SELECT * FROM Payroll AS T1
WHERE NOT EXISTS (SELECT NULL FROM Payroll AS T2
                  WHERE T2.Department = T1.Department AND T2.Salary > T1.Salary)
The output is as follows:
This fully meets the requirements. The correlated subquery above can be read as:
Go through every record in the payroll table and keep a record only if no other record in the same department has a higher salary.
Although the syntax of a correlated subquery is simple, its performance can be poor, because the subquery is (at least logically) executed once for every record of the outer query.
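Here is a runnable sketch of the correlated subquery, using Python's `sqlite3` with the assumed `Payroll` schema and hypothetical rows. Unlike the bare-column query, it returns both tied Tech employees.

```python
import sqlite3

# Hypothetical schema and rows (assumption: the article's original table was an image).
ROWS = [
    (1, "Zhang", "Tech", 8000),
    (2, "Li", "Tech", 9000),
    (3, "Wang", "Tech", 9000),
    (4, "Zhao", "Sales", 7000),
    (5, "Liu", "Sales", 6000),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Payroll (Id INTEGER PRIMARY KEY, Name TEXT, "
             "Department TEXT, Salary INTEGER)")
conn.executemany("INSERT INTO Payroll VALUES (?, ?, ?, ?)", ROWS)

# Keep a row only if no colleague in the same department earns more.
correlated_rows = conn.execute("""
    SELECT Name, Department, Salary
    FROM Payroll AS T1
    WHERE NOT EXISTS (SELECT NULL FROM Payroll AS T2
                      WHERE T2.Department = T1.Department
                        AND T2.Salary > T1.Salary)
    ORDER BY Department, Name
""").fetchall()
print(correlated_rows)
# [('Zhao', 'Sales', 7000), ('Li', 'Tech', 9000), ('Wang', 'Tech', 9000)]
```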
Solution 2: Derived tables
The idea behind a derived table is to run a subquery first to get a temporary result set holding each department and its highest salary. That result set is then INNER JOINed with the original table to get the details of the highest-paid people.
When I first wrote this SQL statement I thought it was clever, and after fully understanding it I still think so.
SELECT T1.*
FROM Payroll AS T1
JOIN (SELECT Department, MAX(Salary) AS Highest  -- executed once: records two fields, department and its highest salary
      FROM Payroll
      GROUP BY Department) AS T2
  ON T1.Department = T2.Department AND T1.Salary = T2.Highest
The derived table performs better than the correlated subquery because its subquery is executed only once. However, it needs a temporary table to hold the intermediate records, so it is still not the best solution.
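The same derived-table join can be run in Python's `sqlite3` as a sketch. The `Payroll` schema and rows are the same assumptions as in the earlier examples. The result matches the correlated subquery, including the Tech tie.

```python
import sqlite3

# Hypothetical schema and rows (assumption: the article's original table was an image).
ROWS = [
    (1, "Zhang", "Tech", 8000),
    (2, "Li", "Tech", 9000),
    (3, "Wang", "Tech", 9000),
    (4, "Zhao", "Sales", 7000),
    (5, "Liu", "Sales", 6000),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Payroll (Id INTEGER PRIMARY KEY, Name TEXT, "
             "Department TEXT, Salary INTEGER)")
conn.executemany("INSERT INTO Payroll VALUES (?, ?, ?, ?)", ROWS)

# T2 is computed once (department -> highest salary), then joined back.
derived_rows = conn.execute("""
    SELECT T1.Name, T1.Department, T1.Salary
    FROM Payroll AS T1
    JOIN (SELECT Department, MAX(Salary) AS Highest
          FROM Payroll
          GROUP BY Department) AS T2
      ON T1.Department = T2.Department AND T1.Salary = T2.Highest
    ORDER BY T1.Department, T1.Name
""").fetchall()
print(derived_rows)
# [('Zhao', 'Sales', 7000), ('Li', 'Tech', 9000), ('Wang', 'Tech', 9000)]
```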
Solution 3: Use JOIN + IS NULL
This is a better solution. When we match records with an outer join and no matching record exists, the columns from the unmatched side are filled with NULL.
Let's start by looking at a very simple SQL statement:
What do you notice? When T2 has no record with a higher salary than the T1 record, NULL is returned.
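The original statement was an image. A plausible form of it is a LEFT JOIN of the table to itself on "same department, higher salary", sketched here with Python's `sqlite3`. The `Payroll` schema, the rows and the alias `HigherSalary` are assumptions. The rows whose `HigherSalary` comes back NULL are exactly the department top earners.

```python
import sqlite3

# Hypothetical schema and rows (assumption: the article's original table was an image).
ROWS = [
    (1, "Zhang", "Tech", 8000),
    (2, "Li", "Tech", 9000),
    (3, "Wang", "Tech", 9000),
    (4, "Zhao", "Sales", 7000),
    (5, "Liu", "Sales", 6000),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Payroll (Id INTEGER PRIMARY KEY, Name TEXT, "
             "Department TEXT, Salary INTEGER)")
conn.executemany("INSERT INTO Payroll VALUES (?, ?, ?, ?)", ROWS)

# Each T1 row is paired with every colleague who earns more; if there is
# none, the outer join still keeps T1 and fills T2's columns with NULL.
joined = conn.execute("""
    SELECT T1.Name, T1.Department, T1.Salary, T2.Salary AS HigherSalary
    FROM Payroll AS T1
    LEFT JOIN Payroll AS T2
      ON T1.Department = T2.Department AND T2.Salary > T1.Salary
""").fetchall()

# NULL arrives in Python as None.
no_one_higher = sorted(name for name, _, _, higher in joined if higher is None)
print(no_one_higher)  # ['Li', 'Wang', 'Zhao']
```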
So can we solve the problem simply by filtering on IS NULL?
Exactly. A wonderful method; this use of the outer join is truly astonishing.
The JOIN solution works well for queries over large data sets and for scalable comparisons. It generally copes with varying data volumes better than subquery-based solutions.
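Adding the IS NULL filter turns the outer join into the full solution. This is a sketch in Python's `sqlite3`, again with the assumed `Payroll` schema and hypothetical rows. Filtering on `T2.Id` works because `Id` is never NULL in a real row, so a NULL there can only come from a missing match.

```python
import sqlite3

# Hypothetical schema and rows (assumption: the article's original table was an image).
ROWS = [
    (1, "Zhang", "Tech", 8000),
    (2, "Li", "Tech", 9000),
    (3, "Wang", "Tech", 9000),
    (4, "Zhao", "Sales", 7000),
    (5, "Liu", "Sales", 6000),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Payroll (Id INTEGER PRIMARY KEY, Name TEXT, "
             "Department TEXT, Salary INTEGER)")
conn.executemany("INSERT INTO Payroll VALUES (?, ?, ?, ?)", ROWS)

# Keep only the T1 rows for which the join found no higher-paid colleague.
top_rows = conn.execute("""
    SELECT T1.Name, T1.Department, T1.Salary
    FROM Payroll AS T1
    LEFT JOIN Payroll AS T2
      ON T1.Department = T2.Department AND T2.Salary > T1.Salary
    WHERE T2.Id IS NULL
    ORDER BY T1.Department, T1.Name
""").fetchall()
print(top_rows)
# [('Zhao', 'Sales', 7000), ('Li', 'Tech', 9000), ('Wang', 'Tech', 9000)]
```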
Solution 4: Use aggregate functions for additional columns
We know that with GROUP BY, every item in the SELECT list must return a single value. Can we then use an aggregate function to make the extra column return a single value? Yes, we can.
However, the returned data is flawed: when salaries are tied, it returns the name that sorts first in descending name order. In other words, when salaries are tied, it can return only one of the records.
Let's change the aggregate function to MIN and take a look.
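The article's query for this solution was an image, so the exact form is unknown. The sketch below is one possible form: it restricts the rows to each department's top salary, then wraps `Name` in MAX() or MIN(). It uses Python's `sqlite3`, the assumed `Payroll` schema and hypothetical rows. The helper `top_name_per_dept` is a hypothetical name introduced here. With the Tech tie, MAX() keeps only Wang and MIN() keeps only Li, which is the single-record limitation described above.

```python
import sqlite3

# Hypothetical schema and rows (assumption: the article's original table was an image).
ROWS = [
    (1, "Zhang", "Tech", 8000),
    (2, "Li", "Tech", 9000),
    (3, "Wang", "Tech", 9000),   # tied with Li
    (4, "Zhao", "Sales", 7000),
    (5, "Liu", "Sales", 6000),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Payroll (Id INTEGER PRIMARY KEY, Name TEXT, "
             "Department TEXT, Salary INTEGER)")
conn.executemany("INSERT INTO Payroll VALUES (?, ?, ?, ?)", ROWS)

def top_name_per_dept(agg):
    """Hypothetical helper: aggregate Name over each department's top earners."""
    # agg is interpolated into the SQL, so only these fixed strings are allowed.
    if agg not in ("MAX", "MIN"):
        raise ValueError("agg must be 'MAX' or 'MIN'")
    return conn.execute(f"""
        SELECT {agg}(T1.Name) AS Name, T1.Department, T1.Salary
        FROM Payroll AS T1
        JOIN (SELECT Department, MAX(Salary) AS Highest
              FROM Payroll GROUP BY Department) AS T2
          ON T1.Department = T2.Department AND T1.Salary = T2.Highest
        GROUP BY T1.Department, T1.Salary
        ORDER BY T1.Department
    """).fetchall()

max_rows = top_name_per_dept("MAX")
min_rows = top_name_per_dept("MIN")
print(max_rows)  # [('Zhao', 'Sales', 7000), ('Wang', 'Tech', 9000)]
print(min_rows)  # [('Zhao', 'Sales', 7000), ('Li', 'Tech', 9000)]
```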
Solution 5: ROW_NUMBER() + OVER
WITH T AS (
    SELECT ROW_NUMBER() OVER (PARTITION BY ... ORDER BY ...) AS part, score, Name, Createtime
    FROM xxx)
SELECT * FROM T WHERE part = 1
The output is as follows:
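Applied to the payroll problem, the ROW_NUMBER() template could look like the sketch below. It runs through Python's `sqlite3`, which needs SQLite 3.25 or later for window functions. The `Payroll` schema and rows are assumptions. So is the extra `Name` tiebreaker in the ORDER BY, which this sketch adds to make the tied Tech result deterministic.

```python
import sqlite3

# Hypothetical schema and rows (assumption: the article's original table was an image).
ROWS = [
    (1, "Zhang", "Tech", 8000),
    (2, "Li", "Tech", 9000),
    (3, "Wang", "Tech", 9000),   # tied with Li
    (4, "Zhao", "Sales", 7000),
    (5, "Liu", "Sales", 6000),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Payroll (Id INTEGER PRIMARY KEY, Name TEXT, "
             "Department TEXT, Salary INTEGER)")
conn.executemany("INSERT INTO Payroll VALUES (?, ?, ?, ?)", ROWS)

# Number the rows inside each department from highest salary down,
# then keep row number 1 of every department.
ranked_rows = conn.execute("""
    WITH T AS (
        SELECT ROW_NUMBER() OVER (PARTITION BY Department
                                  ORDER BY Salary DESC, Name) AS part,
               Name, Department, Salary
        FROM Payroll)
    SELECT Name, Department, Salary FROM T WHERE part = 1
    ORDER BY Department
""").fetchall()
print(ranked_rows)  # [('Zhao', 'Sales', 7000), ('Li', 'Tech', 9000)]
```

ROW_NUMBER() numbers tied rows 1, 2, 3 in turn, so it always returns exactly one person per department. Here that is Li, because of the added tiebreaker. Using RANK() instead would give both tied rows the number 1 and return Wang as well.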
Second, understanding HAVING
The difference between WHERE and HAVING:
- WHERE (filtering before grouping): WHERE cannot filter on aggregate function results. When WHERE is evaluated, grouping has not yet happened, so the aggregate functions have not been computed.
- HAVING (filtering after grouping): mainly used to filter on aggregate function results, because it is evaluated after the grouping has actually been done. HAVING is normally used together with GROUP BY. If there is no GROUP BY clause, SQL Server (like most databases) treats the whole result set as a single group.
An incorrect example using WHERE:
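The original example was an image. A mistake of this kind can be reproduced in Python's `sqlite3` by putting an aggregate inside WHERE. The `Payroll` schema and rows are assumptions. SQLite rejects the statement when it is prepared, and the exact error wording depends on the SQLite version.

```python
import sqlite3

# Hypothetical schema and rows (assumption: the article's original table was an image).
ROWS = [
    (1, "Zhang", "Tech", 8000),
    (2, "Li", "Tech", 9000),
    (3, "Wang", "Tech", 9000),
    (4, "Zhao", "Sales", 7000),
    (5, "Liu", "Sales", 6000),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Payroll (Id INTEGER PRIMARY KEY, Name TEXT, "
             "Department TEXT, Salary INTEGER)")
conn.executemany("INSERT INTO Payroll VALUES (?, ?, ?, ?)", ROWS)

# Wrong: WHERE runs before grouping, so COUNT(*) does not exist yet.
BAD_SQL = ("SELECT Department, COUNT(*) FROM Payroll "
           "WHERE COUNT(*) >= 2 GROUP BY Department")
try:
    conn.execute(BAD_SQL)
    where_error = None
except sqlite3.OperationalError as exc:
    where_error = str(exc)  # SQLite reports a misuse of the aggregate
print(where_error)
```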
A correct example using WHERE and HAVING:
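A correct combination of the two can be sketched in Python's `sqlite3`. WHERE filters individual rows before grouping, and HAVING filters the groups after COUNT(*) has been computed. The `Payroll` schema, rows and thresholds are hypothetical. Liu's 6000 row is removed by WHERE, which leaves Sales with one person, so HAVING then drops that group.

```python
import sqlite3

# Hypothetical schema and rows (assumption: the article's original table was an image).
ROWS = [
    (1, "Zhang", "Tech", 8000),
    (2, "Li", "Tech", 9000),
    (3, "Wang", "Tech", 9000),
    (4, "Zhao", "Sales", 7000),
    (5, "Liu", "Sales", 6000),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Payroll (Id INTEGER PRIMARY KEY, Name TEXT, "
             "Department TEXT, Salary INTEGER)")
conn.executemany("INSERT INTO Payroll VALUES (?, ?, ?, ?)", ROWS)

filtered = conn.execute("""
    SELECT Department, COUNT(*) AS Headcount
    FROM Payroll
    WHERE Salary >= 7000        -- row filter: applied before grouping
    GROUP BY Department
    HAVING COUNT(*) >= 2        -- group filter: applied after aggregation
""").fetchall()
print(filtered)  # [('Tech', 3)]
```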
Query anti-pattern: understanding GROUP BY and HAVING