Oracle Advanced query over (partition by...), advanced query ..
To facilitate learning and testing, all examples are created under Scott, an Oracle user.
create table EMP( empno NUMBER(4) not null, ename VARCHAR2(10), job VARCHAR2(9), mgr NUMBER(4), hiredate DATE, sal NUMBER(7,2), comm NUMBER(7,2), deptno NUMBER(2))alter table EMP add constraint PK_EMP primary key (EMPNO);insert into EMP (empno, ename, job, mgr, hiredate, sal, comm, deptno) values (7369, 'SMITH', 'CLERK', 7902, to_date('17-12-1980', 'dd-mm-yyyy'), 800, null, 20);insert into EMP (empno, ename, job, mgr, hiredate, sal, comm, deptno) values (7499, 'ALLEN', 'SALESMAN', 7698, to_date('20-02-1981', 'dd-mm-yyyy'), 1600, 300, 30);insert into EMP (empno, ename, job, mgr, hiredate, sal, comm, deptno) values (7521, 'WARD', 'SALESMAN', 7698, to_date('22-02-1981', 'dd-mm-yyyy'), 1250, 500, 30);insert into EMP (empno, ename, job, mgr, hiredate, sal, comm, deptno) values (7566, 'JONES', 'MANAGER', 7839, to_date('02-04-1981', 'dd-mm-yyyy'), 2975, null, 20);insert into EMP (empno, ename, job, mgr, hiredate, sal, comm, deptno) values (7654, 'MARTIN', 'SALESMAN', 7698, to_date('28-09-1981', 'dd-mm-yyyy'), 1250, 1400, 30);insert into EMP (empno, ename, job, mgr, hiredate, sal, comm, deptno) values (7698, 'BLAKE', 'MANAGER', 7839, to_date('01-05-1981', 'dd-mm-yyyy'), 2850, null, 30);insert into EMP (empno, ename, job, mgr, hiredate, sal, comm, deptno) values (7782, 'CLARK', 'MANAGER', 7839, to_date('09-06-1981', 'dd-mm-yyyy'), 2450, null, 10);insert into EMP (empno, ename, job, mgr, hiredate, sal, comm, deptno) values (7788, 'SCOTT', 'ANALYST', 7566, to_date('19-04-1987', 'dd-mm-yyyy'), 3000, null, 20);insert into EMP (empno, ename, job, mgr, hiredate, sal, comm, deptno) values (7839, 'KING', 'PRESIDENT', null, to_date('17-11-1981', 'dd-mm-yyyy'), 5000, null, 10);insert into EMP (empno, ename, job, mgr, hiredate, sal, comm, deptno) values (7844, 'TURNER', 'SALESMAN', 7698, to_date('08-09-1981', 'dd-mm-yyyy'), 1500, 0, 30);insert into EMP (empno, ename, job, mgr, hiredate, sal, comm, deptno) values (7876, 'ADAMS', 'CLERK', 7788, to_date('23-05-1987', 'dd-mm-yyyy'), 1100, null, 20);insert into EMP (empno, ename, job, mgr, hiredate, sal, comm, deptno) values (7900, 'JAMES', 'CLERK', 7698, to_date('03-12-1981', 'dd-mm-yyyy'), 950, null, 30);insert into EMP (empno, ename, job, mgr, hiredate, sal, comm, deptno) values (7902, 'FORD', 'ANALYST', 7566, to_date('03-12-1981', 'dd-mm-yyyy'), 3000, null, 20);insert into EMP (empno, ename, job, mgr, hiredate, sal, comm, deptno) values (7934, 'MILLER', 'CLERK', 7782, to_date('23-01-1982', 'dd-mm-yyyy'), 1300, null, 10);
Note:The red order by in the title indicates that order by must be included when this method is used.
1. rank ()/dense_rank () over (partition by... order ...)
Now the customer has such a requirement to query the information of the employees with the highest salaries in each department. I believe that students with certain oracle application knowledge can write the following SQL statement:
Select * from (select ename, job, hiredate, e. sal, e. deptno from emp e, (select deptno, max (sal) sal from emp group by deptno) t where e. deptno = t. deptno and e. sal = t. sal) order by deptno; select * from (select ename name, job occupation, hiredate entry date, e. sal salary, e. deptno department from emp e, (select deptno, max (sal) sal from emp group by deptno) t where e. deptno = t. deptno and e. sal = t. sal) order by department;
While meeting the customer's needs, we should habitually think about whether there are other methods. In this section, the rank () over (partition by...) or dense_rank () over (partition by...) syntax is used. The SQL statements are as follows:
select empno, ename, job, hiredate, sal, deptno from (select empno, ename, job, hiredate, sal, deptno, rank() over(partition by deptno order by sal desc) r from emp) where r = 1; select empno, ename, job, hiredate, sal, deptno from (select empno, ename, job, hiredate, sal, deptno, dense_rank() over(partition by deptno order by sal desc) r from emp) where r = 1
Why is the same result as the preceding statement? Here we will explain in detail the rank ()/dense_rank () over (partition by e. deptno order by e. sal desc) syntax.
Over: the conditions above which.
Partition by e. deptno: partition by department ID ).
Order by e. sal desc: sort by salary from high to low (order by must be included when rank ()/dense_rank () is used; otherwise, order by is invalid)
Rank ()/dense_rank (): Classification
The entire statement indicates that employees are classified based on their salaries from high to low Based on the Division by department, the "level" is represented by a number ranging from small to large (the minimum value must be 1 ).
So what is the difference between rank () and dense_rank?
Rank ():Skip sorting. If there are two first-level instances, the next step is Level 3.
Dense_rank ():Continuous sorting. If there are two first-level instances, the next step is the second-level instances.
Small job: Query Information about the minimum wage of a department.
2. min ()/max () over (partition ...)
Now we have obtained the highest/minimum wage for the department, and the customer demand is coming again. While querying the employee information, we can calculate the difference between the highest/minimum wage for the Department and the employee salary. This is relatively simple. Modify the groupby statement in Section 1 as follows:
-- Query the information of each employee and calculate the difference between the employee's salary and the highest/lowest employee's salary in the department
Select ename name, job occupation, hiredate entry date, e. deptno, e. sal salary, e. minimum sal-me.min_sal difference, me. maximum difference of max_sal-e.sal from emp e, (select deptno, min (sal) min_sal, max (sal) max_sal from emp group by deptno) me where e. deptno = me. deptno order by e. deptno, e. sal;
We have used min () and max (). The former is used to calculate the minimum value, and the latter is used to calculate the maximum value. What will happen if the two methods are used together with over (partition? Let's take a look at the following SQL statement:
Select ename name, job occupation, hiredate employment date, deptno department, min (sal) over (partition by deptno) Department minimum wage, max (sal) over (partition by deptno) highest salary of the Department from emp order by deptno, sal; select ename name, job occupation, hiredate employment date, deptno department, nvl (sal-min (sal) over (partition by deptno ), 0) Minimum wage difference of the department, nvl (max (sal) over (partition by deptno)-sal, 0) maximum wage difference of the Department from emp order by deptno, sal;
The query results of these two statements are the same. We can see that min () and max () are actually the minimum and maximum values, but only on the basis of partition by partition.
Small job: If order by is added in this example, what result will be obtained?
3. lead ()/lag () over (partition by... order ...)
The Chinese love phabi, good face, world-famous. This is even better for the customer. After comparison with the highest/minimum wage, the customer still finds that he is not addicted. This time, he proposed a more abnormal demand, calculate the difference between an individual's salary and a higher/lower employee's salary. This requirement makes me very embarrassed, and I don't know how to implement it in the groupby statement. But .... Now we have over (partition by...), and everything looks so simple. As follows:
-- Calculate the difference between an individual's salary and a higher/lower employee's salary
Select ename name, job occupation, sal salary, deptno department, lead (sal, 1, 0) over (partition by deptno order by sal) is the first department higher than your own salary, lag (sal, 1, 0) over (partition by deptno order by sal) after a department with lower salaries than itself, nvl (lead (sal) over (partition by deptno order by sal)-sal, 0) the previous difference in the Department that is higher than your own salary, nvl (sal-lag (sal) over (partition by deptno order by sal), 0) The last difference between the departments with higher salaries from emp;
After reading the above statements, do you also feel shocked )? Let's explain the two new methods used above.
Lead (column name, n, m): the value of the <column Name> record in the nth row after the current record. If not, the default value is m. If n, m, query the value of the record <column Name> In the first row after the current record. If no value exists, the default value is null.
Lag (column name, n, m): the value of the <column Name> record in the nth row before the current record. If no value exists, the default value is m. If n, m, query the value of the record <column Name> In the first row of the current record. If no value exists, the default value is null.
The following lists some common methods used in this syntax (Note: The method with the order by clause indicates that the order by clause must be used.):
Select ename name, job occupation, sal salary, deptno department, first_value (sal) over (partition by deptno) first_sal, last_value (sal) over (partition by deptno) last_sal, sum (sal) over (partition by deptno) Department total salary, avg (sal) over (partition by deptno) Department average salary, count (1) over (partition by deptno) total departments, row_number () over (partition by deptno order by sal) No. from emp;
Important: after reading this article, you may have some misunderstandings, that is, OVER (partition ..) it is better than group by. Actually, this is not the case. The former cannot replace the latter, and the former does not have a higher execution efficiency. It only provides more functions, therefore, we hope that you can choose based on your needs.
From http://blog.csdn.net/fu0208/article/details/7179001