Topic
The Employee
table holds all employees. Every employee has an ID, a salary, and there are also a column for the department ID.
+----+-------+--------+--------------+| Id | Name | Salary | DepartmentID |+----+-------+--------+--------------+| 1 | Joe | 70000 | 1 | | 2 | Henry | 80000 | 2 | | 3 | Sam | 60000 | 2 | | 4 | Max | 90000 | 1 |+----+-------+--------+--------------+
The Department
table holds all departments of the company.
+----+----------+| Id | Name |+----+----------+| 1 | IT | | 2 | Sales |+----+----------+
Write a SQL query to find employees who has the highest salary in each of the departments. For the above tables, Max have the highest salary in the IT department and Henry have the highest salary in the Sales depart ment.
+------------+----------+--------+| Department | Employee | Salary |+------------+----------+--------+| IT | Max | 90000 | | Sales | Henry | 80000 |+------------+----------+--------+
Successively wrote 5, 6 versions, the efficiency is different, pick out the typical 5, to analyze the SQL statement optimization
1.runtime:1539 ms
Select as
As
As
from Join
on =
where inch
(selectmaxfromgroup by DepartmentID);
2.runtime:1204 ms
SelectDepartment.name asDepartment,
Employee.Name asEmployee,
Employee.salary asSalary
fromDepartmentJoinEmployee
onDepartment.id=Employee.departmentid
where(Department.id, Employee.salary)inch
(SelectDepartmentID, Salary
from(Select * fromEmployeeOrder bySalarydesc) Q
Group byDepartmentID);
3.runtime:1399 ms
SelectA.name asDepartment,
B.name asEmployee,
B.salary asSalary
fromDepartment AJoinEmployee b
ona.ID=B.departmentid
where exists(Select 1 from(Select * fromEmployeeOrder bySalarydesc) c
Group byDepartmentID
havinga.ID=C.departmentid andB.salary= Max(c.salary));
4.runtime:980 ms
SelectA.name asDepartment,
B.name asEmployee,
B.salary asSalary
from(Department AJoinEmployee b ona.ID=B.departmentid)Join
(SelectC.departmentid,Max(c.salary) asSalary from(Select * fromEmployeeOrder bySalarydesc) cGroup byDepartmentID) d
ona.ID=D.departmentid andB.salary=D.salary;
5.runtime:957 ms
SelectA.name asDepartment,
B.name asEmployee,
B.salary asSalary
from(Department a straight_join Employee b ona.ID=B.departmentid) Straight_join
(SelectC.departmentid,Max(c.salary) asSalary from(Select * fromEmployeeOrder bySalarydesc) cGroup byC.departmentid) d
ona.ID=D.departmentid andB.salary=D.salary;
Summarize
- 1 compared to 2, aggregate function max() is less efficient than nested subqueries
- 2 compared with 3, in and exists efficiency is similar, then on-line check is:
1, in and not in also to use caution, otherwise it will cause full table scan
2, a lot of times with exists instead of in is a good choice
However, by following the optimization, you can see that in is actually quite slow
- Comparing 3 with 4, 4 uses join on instead of where to judge, the efficiency is improved a lot, then a read the MySQL source of the Great God said:
In MySQL's SELECT query, the core algorithm is the JOIN query algorithm. Other query statements are aligned to join: a single-table query is treated as a special case of a join, and a subquery is converted to a join query as much as possible
- 4 vs. 5, 5 replaces join for straight_join , or source God says:
For multi-table queries, it is recommended to add a straight_join clause to reduce the optimizer's process of reordering the table if it can be determined that the table is processed in a fixed order for better efficiency.
This clause can be used for the optimizer not to give the optimal array of SQL statements, on the other hand, the same applies to the optimizer can give the optimal arrangement of SQL statements, because MySQL calculates the optimal arrangement also requires a lengthy process.
For the latter situation, you can select the order of the tables according to EXPLAIN's prompts, and add the Straight_join clause to fix the order. The use of this condition is that the proportion of data between several tables will remain in a certain order, otherwise the table data will be counterproductive after this elimination.
For frequently invoked SQL statements, this method works better, and the more tables you manipulate, the better the effect.
Postscript
At this point, the optimization is not completely finished, Leetcode on the topic the fastest is 813MS, but did not share the code, finally posted two other people's home code:
Join Twice,890ms Accepted
SELECTName, Employee, Salary fromDepartmentJOIN(SELECTEmployee.Name asEmployee, employee.salary, Employee.departmentid fromEmployeeJOIN(SELECT' DepartmentID ',MAX(' Salary ') asSalary from' Employee 'GROUP by' DepartmentID ') T1 onT1. DepartmentID=Employee.departmentid andT1. Salary=employee.salary) T2 onDepartment.id=T2. DepartmentID
Easy solution. No joins. GROUP by IS enough. 916ms
SelectD.name, e.name, E.salary fromDepartment d,employee E, (Select MAX(Salary) asSalary, DepartmentID asDepartmentID fromEmployeeGROUP bydepartmentid) Hwheree.salary=H.salary andE.departmentid=H.departmentid andE.departmentid=D.id;
Leetcode-184-department highest Salary optimization record