IV. Statistical functions and group queries
1. Statistical functions
The COUNT() function introduced earlier returns the number of rows in a table. It is one of the statistical functions; the commonly used ones are:
COUNT(): counts the records in a table;
AVG(): calculates the average;
SUM(): calculates the sum;
MAX(): finds the maximum value;
MIN(): finds the minimum value;
Example: test COUNT(), AVG(), SUM()
Count all employees of the company, and calculate the average wage and the total wages paid each month.
SELECT COUNT(empno), AVG(sal), SUM(sal) FROM emp;
Example: test MAX(), MIN()
SELECT MAX(sal), MIN(sal) FROM emp;
Note: about the COUNT() function
COUNT() counts records, and it still returns a value when the table contains no records at all; in that case the value is 0.
SELECT COUNT(ename) FROM bonus;
Other statistical functions may return NULL in this situation, but COUNT() always returns a concrete number, a property that proves useful later in development.
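The behavior described in this note can be checked with a small sketch. It is only an illustration under assumptions: Python's built-in sqlite3 module with an in-memory SQLite database stands in for Oracle, and the column layout of the bonus table is made up for the demonstration.

```python
import sqlite3

# Assumption: SQLite stands in for Oracle; this bonus table layout is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bonus (ename TEXT, sal INTEGER, comm INTEGER)")

# The table is empty, so there is nothing to aggregate.
count_value, sum_value, avg_value, max_value = conn.execute(
    "SELECT COUNT(ename), SUM(sal), AVG(sal), MAX(sal) FROM bonus"
).fetchone()

print(count_value)                      # 0
print(sum_value, avg_value, max_value)  # None None None
```

COUNT() comes back as the number 0, while the other functions come back as NULL, which Python shows as None.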
2. Group queries
Before explaining the grouping operation, it helps to be clear about when grouping is appropriate, for example:
All employees of the company are split into a male group and a female group, after which the number of men and women can be counted;
Grouping by age: one group aged 18 and over, one group under 18;
Grouping by region: one group for Beijing, one for Shanghai, one for Sichuan;
If this information is stored in a database, some column must contain repeated values: grouping by gender relies on the gender column repeating (male and female), grouping by age relies on ages repeating within a range, and grouping by region relies on the region values repeating.
This gives an unwritten rule for grouping: grouping makes sense only when the data repeats. A single row can form a group of its own, but such a group is meaningless.
SELECT [DISTINCT] * | group field 1 [alias] [, group field 2 [alias], ...] | statistical function
FROM table name [alias], [table name [alias], ...]
[WHERE condition(s)]
[GROUP BY group field 1 [, group field 2, ...]]
[ORDER BY sort field ASC | DESC [, sort field ASC | DESC]];
Example: group by department number and find the number of employees and the average wage of each department
SELECT deptno, COUNT(empno), AVG(sal)
FROM emp
GROUP BY deptno;
Example: group by position and find the highest and lowest wage for each position
SELECT job, MAX(sal), MIN(sal)
FROM emp
GROUP BY job;
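The two grouping examples can be tried out with the following sketch. It rests on assumptions: SQLite (through Python's sqlite3 module) stands in for Oracle, the emp table holds only a few sample rows modeled on the tutorial's data, and an ORDER BY is added so the output order is predictable.

```python
import sqlite3

# Assumption: SQLite stands in for Oracle; only a few sample emp rows are loaded.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (empno INTEGER, job TEXT, sal INTEGER, deptno INTEGER)")
conn.executemany(
    "INSERT INTO emp VALUES (?, ?, ?, ?)",
    [
        (7782, "MANAGER", 2450, 10),
        (7934, "CLERK", 1300, 10),
        (7369, "CLERK", 800, 20),
        (7902, "ANALYST", 3000, 20),
        (7499, "SALESMAN", 1600, 30),
        (7521, "SALESMAN", 1250, 30),
    ],
)

# One result row per department: head count and average wage.
per_department = conn.execute(
    "SELECT deptno, COUNT(empno), AVG(sal) FROM emp GROUP BY deptno ORDER BY deptno"
).fetchall()

# One result row per position: highest and lowest wage.
per_job = conn.execute(
    "SELECT job, MAX(sal), MIN(sal) FROM emp GROUP BY job ORDER BY job"
).fetchall()

print(per_department)  # [(10, 2, 1875.0), (20, 2, 1900.0), (30, 2, 1425.0)]
print(per_job)
```

Each distinct value of the grouping field produces exactly one row, however many employees share that value.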
Once grouping is involved, however, the syntax carries additional restrictions:
1. A statistical function can be used on its own without GROUP BY, but then no other field may appear in the SELECT clause.
Correct, the statistical function is used alone:
SELECT COUNT(empno) FROM emp;
Incorrect, another field appears:
SELECT empno, COUNT(empno) FROM emp;
2. With GROUP BY, the SELECT clause may contain only the grouping fields and statistical functions.
Correct:
SELECT job, COUNT(empno), AVG(sal) FROM emp GROUP BY job;
Incorrect, deptno is not a grouping field:
SELECT deptno, job, COUNT(empno), AVG(sal) FROM emp GROUP BY job;
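Example: group by position and find the highest average wage
1. First calculate the average wage of each position
SELECT job, AVG(sal)
FROM emp
GROUP BY job;
2. Then find the highest of these average wages
SELECT MAX(AVG(sal))
FROM emp
GROUP BY job;
Statistical functions can be nested in this way, but once they are nested no other field, not even the grouping field, may appear in the SELECT clause.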
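Oracle accepts the nested form MAX(AVG(sal)), but not every database does. The sketch below reaches the same result in a different way, by taking the maximum over an inline subquery that computes the per-position averages. This is a swapped-in technique, not the tutorial's query, and it relies on assumptions: SQLite stands in for Oracle, the rows are sample data, and the alias avg_sal is invented for the illustration.

```python
import sqlite3

# Assumption: SQLite stands in for Oracle; only a few sample emp rows are loaded.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (empno INTEGER, job TEXT, sal INTEGER)")
conn.executemany(
    "INSERT INTO emp VALUES (?, ?, ?)",
    [
        (7782, "MANAGER", 2450),
        (7934, "CLERK", 1300),
        (7369, "CLERK", 800),
        (7902, "ANALYST", 3000),
        (7499, "SALESMAN", 1600),
        (7521, "SALESMAN", 1250),
    ],
)

# Step 1: the average wage of each position.
averages = dict(conn.execute("SELECT job, AVG(sal) FROM emp GROUP BY job"))

# Step 2: the highest of those averages, taken over an inline subquery
# (avg_sal is a hypothetical alias).
highest_average = conn.execute(
    "SELECT MAX(avg_sal) FROM (SELECT AVG(sal) AS avg_sal FROM emp GROUP BY job)"
).fetchone()[0]

print(averages)
print(highest_average)  # 3000.0
```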
Example: find the name, number of employees, and average wage of each department
1. Determine the required tables:
dept table: name of each department;
emp table: number of employees and average wage of each department;
2. Determine the known join field: emp.deptno=dept.deptno;
Example: join the dept table with the emp table
SELECT d.dname, e.empno, e.sal
FROM dept d, emp e
WHERE d.deptno=e.deptno;
DNAME               EMPNO        SAL
-------------- ---------- ----------
ACCOUNTING           7782       2450
ACCOUNTING           7839       5000
ACCOUNTING           7934       1300
RESEARCH             7369        800
RESEARCH             7876       1100
RESEARCH             7902       3000
RESEARCH             7788       3000
RESEARCH             7566       2975
SALES                7499       1600
SALES                7698       2850
SALES                7654       1250
SALES                7900        950
SALES                7844       1500
SALES                7521       1250
14 rows selected.
The DNAME column in this result contains duplicate values, and as explained above, duplicated data is what makes grouping possible. The difference from the earlier examples is that those grouped a single physical table (emp and dept are both physical tables), whereas this data is the result of a query and therefore a temporary, virtual table. Whether the table is physical or virtual, as long as it contains duplicates it can be grouped directly.
SELECT d.dname, COUNT(e.empno), AVG(e.sal)
FROM dept d, emp e
WHERE d.deptno=e.deptno
GROUP BY d.dname;
This result is incomplete, however: the dept table holds four departments, but a department without employees drops out of the join. An outer join (left or right) is needed to keep every department in the result.
SELECT d.dname, COUNT(e.empno), NVL(AVG(e.sal), 0)
FROM dept d, emp e
WHERE d.deptno=e.deptno(+)
GROUP BY d.dname;
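The effect of the outer join can be seen in the sketch below. It is an approximation under assumptions: SQLite stands in for Oracle, so the Oracle-specific (+) marker is written as LEFT JOIN and NVL() as COALESCE(); the emp rows are a small sample.

```python
import sqlite3

# Assumption: SQLite stands in for Oracle; LEFT JOIN replaces (+), COALESCE replaces NVL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dept (deptno INTEGER, dname TEXT, loc TEXT)")
conn.execute("CREATE TABLE emp (empno INTEGER, sal INTEGER, deptno INTEGER)")
conn.executemany(
    "INSERT INTO dept VALUES (?, ?, ?)",
    [
        (10, "ACCOUNTING", "NEW YORK"),
        (20, "RESEARCH", "DALLAS"),
        (30, "SALES", "CHICAGO"),
        (40, "OPERATIONS", "BOSTON"),  # no employees
    ],
)
conn.executemany(
    "INSERT INTO emp VALUES (?, ?, ?)",
    [(7782, 2450, 10), (7934, 1300, 10), (7369, 800, 20), (7499, 1600, 30)],
)

# Plain join: the department without employees disappears.
inner_rows = conn.execute(
    "SELECT d.dname, COUNT(e.empno), AVG(e.sal) "
    "FROM dept d JOIN emp e ON d.deptno = e.deptno "
    "GROUP BY d.dname ORDER BY d.dname"
).fetchall()

# Outer join: every department is kept; a missing average is shown as 0.
outer_rows = conn.execute(
    "SELECT d.dname, COUNT(e.empno), COALESCE(AVG(e.sal), 0) "
    "FROM dept d LEFT JOIN emp e ON d.deptno = e.deptno "
    "GROUP BY d.dname ORDER BY d.dname"
).fetchall()

print(len(inner_rows))  # 3
print(outer_rows)
```

COUNT(e.empno) reports 0 for OPERATIONS because the outer join fills e.empno with NULL for that row, and COUNT() skips NULL values.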
All the previous operations grouped on a single field; a grouping operation can also group on multiple fields.
Example: display the number, name, location, number of employees, and average wage of each department
1. Determine the required tables:
dept table: number, name, and location of each department;
emp table: number of employees and average wage of each department;
2. Determine the known join field: emp.deptno=dept.deptno;
Example: join the emp table with the dept table
SELECT d.deptno, d.dname, d.loc, e.empno, e.sal
FROM dept d, emp e
WHERE d.deptno=e.deptno(+);
    DEPTNO DNAME          LOC                EMPNO        SAL
---------- -------------- ------------- ---------- ----------
        10 ACCOUNTING     NEW YORK            7782       2450
        10 ACCOUNTING     NEW YORK            7839       5000
        10 ACCOUNTING     NEW YORK            7934       1300
        20 RESEARCH       DALLAS              7369        800
        20 RESEARCH       DALLAS              7876       1100
        20 RESEARCH       DALLAS              7902       3000
        20 RESEARCH       DALLAS              7788       3000
        20 RESEARCH       DALLAS              7566       2975
        30 SALES          CHICAGO             7499       1600
        30 SALES          CHICAGO             7698       2850
        30 SALES          CHICAGO             7654       1250
        30 SALES          CHICAGO             7900        950
        30 SALES          CHICAGO             7844       1500
        30 SALES          CHICAGO             7521       1250
        40 OPERATIONS     BOSTON
15 rows selected.
Duplicate data is present again, and this time the duplication spans three columns together (DEPTNO, DNAME, LOC), so all three fields can be written in the GROUP BY clause:
SELECT d.deptno, d.dname, d.loc, COUNT(e.empno), NVL(AVG(e.sal), 0)
FROM dept d, emp e
WHERE d.deptno=e.deptno(+)
GROUP BY d.deptno, d.dname, d.loc;
This is a multi-field grouping. Whether grouping uses a single field or several, the premise is the same: the data must contain duplicates.
Example: display the details of each department, but only for departments whose average wage is higher than 2000;
The only filtering syntax covered so far is the WHERE clause, so first try to meet the requirement with WHERE, building on the previous query.
SELECT d.deptno, d.dname, d.loc, COUNT(e.empno) mycount, NVL(AVG(e.sal), 0) myavg
FROM dept d, emp e
WHERE d.deptno=e.deptno(+) AND AVG(e.sal)>2000
GROUP BY d.deptno, d.dname, d.loc;
This produces the following error message:
WHERE d.deptno=e.deptno(+) AND AVG(e.sal)>2000
                               *
ERROR at line 3:
ORA-00934: group function is not allowed here
The core meaning of this message is that statistical functions cannot be used in the WHERE clause. This follows from the purpose of WHERE, which is to pick out a subset of rows from all the data, before any grouping has taken place.
To filter the data again after grouping, use the HAVING clause. The SQL syntax then becomes:
SELECT [DISTINCT] * | group field 1 [alias] [, group field 2 [alias], ...] | statistical function
FROM table name [alias], [table name [alias], ...]
[WHERE condition(s)]
[GROUP BY group field 1 [, group field 2, ...]]
[HAVING post-grouping filter condition (statistical functions allowed)]
[ORDER BY sort field ASC | DESC [, sort field ASC | DESC]];
The query below filters with HAVING.
SELECT d.deptno, d.dname, d.loc, COUNT(e.empno) mycount, NVL(AVG(e.sal), 0) myavg
FROM dept d, emp e
WHERE d.deptno=e.deptno(+)
GROUP BY d.deptno, d.dname, d.loc
HAVING AVG(sal)>2000;
Note: the difference between WHERE and HAVING
- WHERE: filters before the GROUP BY operation, selecting rows from all the data; statistical functions cannot be used in WHERE;
- HAVING: filters after GROUP BY has formed the groups; statistical functions can be used in the HAVING clause;
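The difference between the two clauses can be observed with this sketch. It assumes SQLite as a stand-in for Oracle and a handful of sample rows; the error text SQLite reports differs from Oracle's ORA-00934, but the cause is the same.

```python
import sqlite3

# Assumption: SQLite stands in for Oracle; the rows are sample data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (empno INTEGER, sal INTEGER, deptno INTEGER)")
conn.executemany(
    "INSERT INTO emp VALUES (?, ?, ?)",
    [
        (7782, 2450, 10), (7839, 5000, 10),
        (7369, 800, 20), (7902, 3000, 20),
        (7499, 1600, 30), (7521, 1250, 30),
    ],
)

# A statistical function inside WHERE is rejected.
where_error = None
try:
    conn.execute("SELECT deptno FROM emp WHERE AVG(sal) > 2000 GROUP BY deptno")
except sqlite3.Error as exc:
    where_error = str(exc)

# HAVING filters the groups after GROUP BY has formed them.
having_only = conn.execute(
    "SELECT deptno, AVG(sal) FROM emp "
    "GROUP BY deptno HAVING AVG(sal) > 2000 ORDER BY deptno"
).fetchall()

# WHERE removes rows first, which changes the averages that HAVING then sees.
where_then_having = conn.execute(
    "SELECT deptno, AVG(sal) FROM emp WHERE sal > 1500 "
    "GROUP BY deptno HAVING AVG(sal) > 2000 ORDER BY deptno"
).fetchall()

print(where_error)
print(having_only)        # [(10, 3725.0)]
print(where_then_having)  # [(10, 3725.0), (20, 3000.0)]
```

In the last query department 20 passes the HAVING test only because WHERE has already discarded its 800 wage, which shows that the row filter runs before the groups are built.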
Exercise: display each non-salesman job name and the total monthly wage of the employees doing that job, keeping only the jobs whose total monthly wage is greater than $5000, and list the output in ascending order of total monthly wage:
Step 1: query the information of all non-salesmen
SELECT * FROM emp WHERE job<>'SALESMAN';
Step 2: group by position and use the SUM() function for the statistics
SELECT job, SUM(sal)
FROM emp
WHERE job<>'SALESMAN'
GROUP BY job;
Step 3: the total monthly wage comes from a statistical function, so the filter after grouping must use a HAVING clause
SELECT job, SUM(sal)
FROM emp
WHERE job<>'SALESMAN'
GROUP BY job
HAVING SUM(sal)>5000;
Step 4: sort in ascending order
SELECT job, SUM(sal) sum
FROM emp
WHERE job<>'SALESMAN'
GROUP BY job
HAVING SUM(sal)>5000
ORDER BY sum ASC;
This exercise combines most of the syntax used in grouping operations. When similar problems come up later, analyze them step by step in the same way.
V. Subqueries
Subquery = simple query + qualified query + multi-table query + statistical query, combined;
It was stressed earlier that multi-table queries should be used sparingly because their performance is poor. The most effective alternative is the subquery, which is therefore used a great deal in real development;
A subquery is a query nested inside another query, and several subqueries can be nested in one statement. The SQL syntax with nested subqueries is as follows:
SELECT [DISTINCT] * | group field 1 [alias] [, group field 2 [alias], ...] | statistical function, (
    SELECT [DISTINCT] * | group field 1 [alias] [, group field 2 [alias], ...] | statistical function
    FROM table name [alias], [table name [alias], ...]
    [WHERE condition(s)]
    [GROUP BY group field 1 [, group field 2, ...]]
    [HAVING post-grouping filter condition (statistical functions allowed)]
    [ORDER BY sort field ASC | DESC [, sort field ASC | DESC]])
FROM table name [alias], [table name [alias], ...], (
    SELECT [DISTINCT] * | group field 1 [alias] [, group field 2 [alias], ...] | statistical function
    FROM table name [alias], [table name [alias], ...]
    [WHERE condition(s)]
    [GROUP BY group field 1 [, group field 2, ...]]
    [HAVING post-grouping filter condition (statistical functions allowed)]
    [ORDER BY sort field ASC | DESC [, sort field ASC | DESC]])
[WHERE condition(s) (
    SELECT [DISTINCT] * | group field 1 [alias] [, group field 2 [alias], ...] | statistical function
    FROM table name [alias], [table name [alias], ...]
    [WHERE condition(s)]
    [GROUP BY group field 1 [, group field 2, ...]]
    [HAVING post-grouping filter condition (statistical functions allowed)]
    [ORDER BY sort field ASC | DESC [, sort field ASC | DESC]])]
[GROUP BY group field 1 [, group field 2, ...]]
[HAVING post-grouping filter condition (statistical functions allowed)]
[ORDER BY sort field ASC | DESC [, sort field ASC | DESC]];
In theory a subquery can appear anywhere in a query statement, but in my experience subqueries appear mostly in the WHERE and FROM clauses;
The following characteristics are a personal summary, not an official statement:
WHERE: the subquery generally returns a single row with a single column, multiple rows of a single column, or a single row with multiple columns;
FROM: the subquery generally returns multiple rows and multiple columns and is used as a temporary table.
Example: query the full information of the employees whose wage is higher than SMITH's
First find out what SMITH's wage is:
SELECT sal FROM emp WHERE ename='SMITH';
This returns a single row with a single column, so the subquery can appear in the WHERE clause.
SELECT * FROM emp
WHERE sal>(
    SELECT sal FROM emp WHERE ename='SMITH');
Example: query the full information of the employees whose wage is higher than the company's average wage
The company's average wage is calculated with the AVG() function.
SELECT AVG(sal) FROM emp;
The result is a single row with a single column, so it can appear in the WHERE clause.
SELECT * FROM emp
WHERE sal>(
    SELECT AVG(sal) FROM emp);
The subqueries above return a single row with a single column. A subquery can also return a single row with multiple columns, although this form is rarely used.
Example: a subquery that returns a single row with multiple columns
SELECT * FROM emp
WHERE (job, sal)=(
    SELECT job, sal FROM emp WHERE ename='ALLEN');
If the subquery returns multiple rows of a single column, one of three operators is needed for the comparison: IN, ANY, and ALL;
1. IN operator: specifies the range of values for the comparison through a subquery
This operator works exactly like the IN explained earlier; the only difference is that the range is supplied by the subquery.
SELECT * FROM emp
WHERE sal IN (
    SELECT sal FROM emp WHERE job='MANAGER');
When using IN, also watch out for NOT IN: if the subquery of a NOT IN returns any NULL value, the query returns no results at all.
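The NOT IN pitfall is easy to reproduce. The sketch below assumes SQLite as a stand-in for Oracle and a four-row sample of emp that includes the mgr (manager number) column, which this section has not used so far.

```python
import sqlite3

# Assumption: SQLite stands in for Oracle; mgr holds each employee's manager number.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (empno INTEGER, ename TEXT, mgr INTEGER)")
conn.executemany(
    "INSERT INTO emp VALUES (?, ?, ?)",
    [
        (7839, "KING", None),   # the president has no manager, so mgr is NULL
        (7698, "BLAKE", 7839),
        (7499, "ALLEN", 7698),
        (7521, "WARD", 7698),
    ],
)

# Intended meaning: employees who are nobody's manager.
# The subquery returns a NULL, so NOT IN matches no row at all.
with_null = conn.execute(
    "SELECT ename FROM emp WHERE empno NOT IN (SELECT mgr FROM emp) ORDER BY ename"
).fetchall()

# Excluding NULL inside the subquery gives the expected answer.
without_null = conn.execute(
    "SELECT ename FROM emp "
    "WHERE empno NOT IN (SELECT mgr FROM emp WHERE mgr IS NOT NULL) ORDER BY ename"
).fetchall()

print(with_null)     # []
print(without_null)  # [('ALLEN',), ('WARD',)]
```

The empty result comes from three-valued logic: comparing any value with NULL is neither true nor false, so NOT IN can never confirm that a value is absent from the list.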
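2. ANY operator: compares against each value returned by the subquery; it has three forms
=ANY: equivalent to IN
SELECT * FROM emp
WHERE sal=ANY (
    SELECT sal FROM emp WHERE job='MANAGER');
>ANY: greater than the smallest value returned by the subquery
SELECT * FROM emp
WHERE sal>ANY (
    SELECT sal FROM emp WHERE job='MANAGER');
<ANY: less than the largest value returned by the subquery
SELECT * FROM emp
WHERE sal<ANY (
    SELECT sal FROM emp WHERE job='MANAGER');
3. ALL operator: compares against every value returned by the subquery; it has two forms:
>ALL: greater than the largest value returned by the subquery
SELECT * FROM emp
WHERE sal>ALL (
    SELECT sal FROM emp WHERE job='MANAGER');
<ALL: less than the smallest value returned by the subquery
SELECT * FROM emp
WHERE sal<ALL (
    SELECT sal FROM emp WHERE job='MANAGER');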
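One way to reason about ANY and ALL is through the smallest and largest value the subquery returns. The sketch below is an illustration under assumptions rather than the tutorial's own queries: SQLite, used as a stand-in for Oracle, has no ANY or ALL operators, so each form is rewritten with MIN() or MAX(); the helper function names() and the sample rows are made up.

```python
import sqlite3

# Assumption: SQLite stands in for Oracle and lacks ANY/ALL,
# so each comparison is rewritten with MIN() or MAX().
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (ename TEXT, job TEXT, sal INTEGER)")
conn.executemany(
    "INSERT INTO emp VALUES (?, ?, ?)",
    [
        ("SMITH", "CLERK", 800), ("ALLEN", "SALESMAN", 1600),
        ("JONES", "MANAGER", 2975), ("BLAKE", "MANAGER", 2850),
        ("CLARK", "MANAGER", 2450), ("SCOTT", "ANALYST", 3000),
        ("KING", "PRESIDENT", 5000),
    ],
)


def names(condition):
    """Hypothetical helper: names of employees whose sal meets the condition."""
    sql = "SELECT ename FROM emp WHERE sal " + condition + " ORDER BY ename"
    return [row[0] for row in conn.execute(sql)]


manager_sal = "(SELECT {}(sal) FROM emp WHERE job = 'MANAGER')"

equal_any = names("IN (SELECT sal FROM emp WHERE job = 'MANAGER')")  # =ANY
greater_than_any = names("> " + manager_sal.format("MIN"))           # >ANY
less_than_any = names("< " + manager_sal.format("MAX"))              # <ANY
greater_than_all = names("> " + manager_sal.format("MAX"))           # >ALL
less_than_all = names("< " + manager_sal.format("MIN"))              # <ALL

print(equal_any)         # ['BLAKE', 'CLARK', 'JONES']
print(greater_than_any)  # ['BLAKE', 'JONES', 'KING', 'SCOTT']
print(less_than_any)     # ['ALLEN', 'BLAKE', 'CLARK', 'SMITH']
print(greater_than_all)  # ['KING', 'SCOTT']
print(less_than_all)     # ['ALLEN', 'SMITH']
```

The MIN()/MAX() rewrite matches ANY and ALL only when the subquery returns at least one row and no NULL values, which holds for this sample.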
All the subqueries above appear in the WHERE clause. Next, look at subqueries in the FROM clause, which typically return multiple rows and columns and are used as a temporary table.
Example: query the number, name, location, number of employees, and average wage of each department
SELECT d.deptno, d.dname, d.loc, COUNT(e.empno), AVG(e.sal)
FROM emp e, dept d
WHERE e.deptno(+)=d.deptno
GROUP BY d.deptno, d.dname, d.loc;
This query in effect produces a Cartesian product first, 56 records in total (14 employees times 4 departments);
New solution: use a subquery. Statistical queries go together with GROUP BY, so the subquery is responsible for computing the statistics, and the outer query is responsible for combining those statistics with the dept table data.
SELECT d.deptno, d.dname, d.loc, temp.count, temp.avg
FROM dept d, (
    SELECT deptno dno, COUNT(empno) count, AVG(sal) avg
    FROM emp
    GROUP BY deptno) temp
WHERE d.deptno=temp.dno(+);
The amount of data this query handles:
The subquery reads the 14 emp records and produces 3 statistical records;
The dept table holds 4 records;
Even if a Cartesian product is produced now, it has only 12 records (3 times 4); adding the 14 employee records read by the subquery gives only 26 records;
This analysis shows that the subquery is indeed more efficient than the plain multi-table query (26 records versus 56), which is why subqueries appear so often in development. It also gives an unwritten rule: in most cases, when the final result needs statistical information in the SELECT clause but the statistical functions cannot be used there directly, compute the statistics in a subquery. In other words, complex statistics usually call for a subquery.
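The FROM-clause approach, statistics in the subquery and department details in the outer query, can be sketched as follows. The assumptions: SQLite stands in for Oracle, LEFT JOIN replaces the (+) marker, the emp rows are a small sample, and the aliases cnt and avg_sal are chosen for the illustration.

```python
import sqlite3

# Assumption: SQLite stands in for Oracle; LEFT JOIN replaces the (+) outer-join marker.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dept (deptno INTEGER, dname TEXT, loc TEXT)")
conn.execute("CREATE TABLE emp (empno INTEGER, sal INTEGER, deptno INTEGER)")
conn.executemany(
    "INSERT INTO dept VALUES (?, ?, ?)",
    [
        (10, "ACCOUNTING", "NEW YORK"),
        (20, "RESEARCH", "DALLAS"),
        (30, "SALES", "CHICAGO"),
        (40, "OPERATIONS", "BOSTON"),
    ],
)
conn.executemany(
    "INSERT INTO emp VALUES (?, ?, ?)",
    [(7782, 2450, 10), (7934, 1300, 10), (7369, 800, 20), (7499, 1600, 30)],
)

# The subquery (temp) does the statistics: one row per department that has employees.
# The outer query only attaches those rows to the dept table.
summary = conn.execute(
    "SELECT d.deptno, d.dname, d.loc, temp.cnt, temp.avg_sal "
    "FROM dept d LEFT JOIN ("
    "    SELECT deptno AS dno, COUNT(empno) AS cnt, AVG(sal) AS avg_sal "
    "    FROM emp GROUP BY deptno"
    ") temp ON d.deptno = temp.dno "
    "ORDER BY d.deptno"
).fetchall()

for row in summary:
    print(row)
```

Department 40 has no row in the subquery result, so its count and average come back as NULL (None in Python) unless they are wrapped in a function such as NVL() or COALESCE().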
Oracle Multi-table queries (2)