Oracle Analytic Functions


--Start

Oracle's analytic functions are remarkably powerful. They are especially useful for statistical queries that are difficult, or outright impossible, to write in ordinary SQL. Let's start with a simple example and uncover how they work step by step. Look at the following SQL:

CREATE TABLE employ
(
    NAME    VARCHAR2(30),  -- employee name (length assumed)
    DEPT    VARCHAR2(30),  -- department (length assumed)
    SALARY  NUMBER         -- wages
);

INSERT INTO employ VALUES ('John', 'Marketing Department', 4000);
INSERT INTO employ VALUES ('Zhaohong', 'Technical Department', 2000);
INSERT INTO employ VALUES ('Dick', 'Marketing Department', 5000);
INSERT INTO employ VALUES ('Li Bai', 'Technical Department', 5000);
INSERT INTO employ VALUES ('Harry', 'Marketing Department', NULL);
INSERT INTO employ VALUES ('Wang Lan', 'Technical Department', 4000);

SELECT
    ROW_NUMBER() OVER (ORDER BY SALARY) AS ordinal,
    NAME                                AS name,
    DEPT                                AS department,
    SALARY                              AS wages
FROM employ;
 
The results of the query are as follows:
 
ordinal    name        department              wages
1          Zhaohong    Technical Department    2000
2          John        Marketing Department    4000
3          Wang Lan    Technical Department    4000
4          Dick        Marketing Department    5000
5          Li Bai      Technical Department    5000
6          Harry       Marketing Department    (NULL)

Look at ROW_NUMBER() OVER (...) above. Many people do not understand how two functions can be written side by side like that, and some even doubt that the statement can run at all. ROW_NUMBER is indeed a function; as its name suggests, it numbers the rows of the query result set. OVER, however, is not a function but an analytic clause. Its role is to define a scope (a window of rows within the result set), and the function written before OVER operates only on the rows in that window. Not clear yet? It doesn't matter, we will explain it in detail below.

From the SQL above we can see that a typical Oracle analytic expression has two parts: the function part and the OVER clause. So which functions can appear in the function part? They are listed below:

ROW_NUMBER        Numbers the rows of the result set
RANK              Ranks the rows, leaving gaps after ties
DENSE_RANK        Ranks the rows without gaps
MIN               Minimum value
MAX               Maximum value
AVG               Mean
SUM               Sum
COUNT             Number of rows in the result set
FIRST_VALUE       First value in the window
LAST_VALUE        Last value in the window
FIRST             Value from the rows that rank first; used with DENSE_RANK
LAST              Value from the rows that rank last; used with DENSE_RANK
LAG               Value from a preceding row (backward offset)
LEAD              Value from a following row (forward offset)
LISTAGG           Concatenates column values
NTILE             Divides the ordered rows into n buckets
NTH_VALUE         Value of the nth row
VARIANCE          Variance
VAR_POP           Population variance
VAR_SAMP          Sample variance
STDDEV            Standard deviation
STDDEV_POP        Population standard deviation
STDDEV_SAMP       Sample standard deviation
CORR              Correlation coefficient
COVAR_POP         Population covariance
COVAR_SAMP        Sample covariance
CUME_DIST         Cumulative distribution
PERCENT_RANK      Percent rank; similar to CUME_DIST
PERCENTILE_CONT   Percentile under a continuous distribution model
PERCENTILE_DISC   Percentile under a discrete distribution model
RATIO_TO_REPORT   Ratio of a value to the sum of the set
REGR_SLOPE        Linear regression
REGR_INTERCEPT    Linear regression
REGR_COUNT        Linear regression
REGR_R2           Linear regression
REGR_AVGX         Linear regression
REGR_AVGY         Linear regression
REGR_SXX          Linear regression
REGR_SYY          Linear regression
REGR_SXY          Linear regression
 

The most important of these functions are introduced step by step below; for the rest, you can guess what each one does from its name.
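As a first taste, here is a minimal sketch of three functions from the list that the examples in this article do not otherwise exercise. It assumes the employ table created above; the aliases (pct_of_dept, half, dept_members) are made up for illustration.

```sql
-- Assumes the employ table and rows from the beginning of the article.
SELECT
    NAME,
    DEPT,
    SALARY,
    -- share of the department's salary total (NULL when SALARY is NULL)
    RATIO_TO_REPORT(SALARY) OVER (PARTITION BY DEPT)   AS pct_of_dept,
    -- split the rows, ordered by salary, into 2 buckets as evenly as possible
    NTILE(2) OVER (ORDER BY SALARY)                    AS half,
    -- all names in the same department as one comma-separated string
    LISTAGG(NAME, ', ') WITHIN GROUP (ORDER BY NAME)
        OVER (PARTITION BY DEPT)                       AS dept_members
FROM employ;
```

Every row of employ is still returned; each function simply adds a computed column next to the original ones.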

Suppose that, in addition to the columns of the query above, I also want to show the average salary of each employee's department and the average salary of all employees. That is hard to do with ordinary SQL, but with analytic functions it is very simple, as the following SQL shows:

SELECT
    ROW_NUMBER() OVER (ORDER BY DEPT, SALARY)              AS ordinal,
    ROW_NUMBER() OVER (PARTITION BY DEPT ORDER BY SALARY)  AS dept_ordinal,
    NAME                                                   AS name,
    DEPT                                                   AS department,
    SALARY                                                 AS wages,
    AVG(SALARY) OVER (PARTITION BY DEPT)                   AS dept_avg_wage,
    AVG(SALARY) OVER ()                                    AS total_avg_wage
FROM employ;

The results of the query are as follows:

ordinal  dept_ordinal  name      department            wages   dept_avg_wage  total_avg_wage
1        1             John      Marketing Department  4000    4500           4000
2        2             Dick      Marketing Department  5000    4500           4000
3        3             Harry     Marketing Department  (NULL)  4500           4000
4        1             Zhaohong  Technical Department  2000    3666.67        4000
5        2             Wang Lan  Technical Department  4000    3666.67        4000
6        3             Li Bai    Technical Department  5000    3666.67        4000

Note the difference between the ordinal and the department ordinal. For the department ordinal, the OVER clause contains two sub-clauses, PARTITION BY and ORDER BY. What do they do? Before answering, let's recall what OVER itself does.

OVER is an analytic clause whose function is to define a scope (a window of rows); the function written before OVER operates only on the rows in the window that OVER defines.

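The role of ORDER BY is well known: it sorts the rows. PARTITION BY is just as simple: like GROUP BY, it divides the result set into groups. Unlike GROUP BY, it does not collapse each group into a single row; every row is kept, and the function is evaluated over the rows of its own group.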
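To make the comparison with GROUP BY concrete, here is a minimal sketch that computes the same department average both ways. It assumes the employ table from the beginning of the article; the alias dept_avg is made up for illustration.

```sql
-- GROUP BY: one row per department, the individual employees are gone.
SELECT DEPT, AVG(SALARY) AS dept_avg
FROM employ
GROUP BY DEPT;

-- PARTITION BY: one row per employee, each carrying its department's average.
SELECT NAME, DEPT, SALARY,
       AVG(SALARY) OVER (PARTITION BY DEPT) AS dept_avg
FROM employ;
```

With the six rows inserted earlier, the first query should return two rows and the second six.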

By now we should have a feel for how analytic functions work. Look again at the result set of the SQL above: Harry's salary is NULL, and when we sort by salary in ascending order the NULL is placed last. What if we want NULL placed first? Use the NULLS FIRST keyword. For an ascending sort the default is NULLS LAST (for a descending sort Oracle defaults to NULLS FIRST). See the following SQL:

SELECT
    ROW_NUMBER() OVER (ORDER BY SALARY DESC NULLS FIRST) AS rn,
    RANK()       OVER (ORDER BY SALARY DESC NULLS FIRST) AS rk,
    DENSE_RANK() OVER (ORDER BY SALARY DESC NULLS FIRST) AS d_rk,
    NAME                                                 AS name,
    DEPT                                                 AS department,
    SALARY                                               AS wages
FROM employ;
 
The results of the query are as follows:
 
rn    rk    d_rk    name        department              wages
1     1     1       Harry       Marketing Department    (NULL)
2     2     2       Dick        Marketing Department    5000
3     2     2       Li Bai      Technical Department    5000
4     4     3       John        Marketing Department    4000
5     4     3       Wang Lan    Technical Department    4000
6     6     4       Zhaohong    Technical Department    2000

Notice the difference between ROW_NUMBER, RANK and DENSE_RANK. Dick and Li Bai both earn 5000, so both are ranked second. John and Wang Lan both earn 4000; RANK places them fourth, while DENSE_RANK places them third. That is exactly the difference between the two functions: because two rows share second place, RANK skips third place, whereas DENSE_RANK leaves no gap.
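A common use of these ranking functions is picking the top rows of each group. The sketch below is one way to do it, assuming the employ table with the six rows inserted at the beginning of the article; the alias d_rk is illustrative. Analytic functions cannot be used directly in a WHERE clause, so the rank is computed in an inline view and filtered outside it.

```sql
-- Highest-paid employee(s) per department.
-- NULLS LAST keeps Harry's NULL salary from taking first place,
-- which it otherwise would in a descending sort.
SELECT name, dept, salary
FROM (
    SELECT NAME   AS name,
           DEPT   AS dept,
           SALARY AS salary,
           DENSE_RANK() OVER (PARTITION BY DEPT
                              ORDER BY SALARY DESC NULLS LAST) AS d_rk
    FROM employ
)
WHERE d_rk = 1;
```

With that data this should return Dick for the marketing department and Li Bai for the technical department. Because DENSE_RANK is used, employees tied for the top salary would all be returned.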

Now a new problem: suppose that, next to each employee's salary, we want the running total of all salaries up to and including his own (in ascending salary order) and the total of all salaries from his own onward. How? Didn't catch the question? It doesn't matter, look at the following SQL:

SELECT
    NAME   AS name,
    SALARY AS wages,
    SUM(SALARY) OVER (ORDER BY SALARY NULLS FIRST
                      ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)         AS total_up_to_me,
    SUM(SALARY) OVER (ORDER BY SALARY NULLS FIRST
                      ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING)         AS total_from_me,
    SUM(SALARY) OVER (ORDER BY SALARY NULLS FIRST
                      ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS total_wages_1,
    SUM(SALARY) OVER ()                                                         AS total_wages_2
FROM employ;

The results of the query are as follows:

name        wages     total_up_to_me    total_from_me    total_wages_1    total_wages_2
Harry       (NULL)    (NULL)            20000            20000            20000
Zhaohong    2000      2000              20000            20000            20000
John        4000      6000              18000            20000            20000
Wang Lan    4000      10000             14000            20000            20000
Dick        5000      15000             10000            20000            20000
Li Bai      5000      20000             5000             20000            20000

The OVER clause in the SQL above contains a ROWS clause. Let's take a look at its structure:

ROWS BETWEEN <upper bound> AND <lower bound>

where <upper bound> can be one of the following keywords:
UNBOUNDED PRECEDING
<number> PRECEDING
CURRENT ROW

and <lower bound> can be one of the following keywords:
CURRENT ROW
<number> FOLLOWING
UNBOUNDED FOLLOWING

Note that all of these keywords are relative to the current row. UNBOUNDED PRECEDING means all rows before the current row, that is, there is no upper bound. <number> PRECEDING means the window starts <number> rows before the current row; with number = 2, for example, it starts two rows before the current row. CURRENT ROW means the current row itself. The two FOLLOWING keywords work the same way in the opposite direction. If this is still unclear, carefully analyze the results of the SQL above.
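The bounds are easiest to see on very simple data. The following sketch is self-contained: it generates the numbers 1 to 5 from dual instead of using the employ table, and the aliases are made up for illustration.

```sql
WITH t AS (
    -- five rows: n = 1, 2, 3, 4, 5
    SELECT LEVEL AS n FROM dual CONNECT BY LEVEL <= 5
)
SELECT
    n,
    SUM(n) OVER (ORDER BY n ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS running_total,
    SUM(n) OVER (ORDER BY n ROWS BETWEEN 2 PRECEDING AND CURRENT ROW)         AS last_three,
    SUM(n) OVER (ORDER BY n ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING)         AS neighbours,
    SUM(n) OVER (ORDER BY n ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) AS remaining
FROM t
ORDER BY n;

-- n  running_total  last_three  neighbours  remaining
-- 1  1              1           3           15
-- 2  3              3           6           14
-- 3  6              6           9           12
-- 4  10             9           12          9
-- 5  15             12          9           5
```

Near the edges the window simply contains fewer rows: for n = 1 there is nothing preceding, so last_three is just 1.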

The OVER clause can also contain a RANGE clause. It is written in almost exactly the same way as ROWS and its effect is similar, but there is one difference. Its structure is as follows:

RANGE BETWEEN <upper bound> AND <lower bound>

The <upper bound> and <lower bound> are written the same way as for ROWS. The following SQL demonstrates the difference between the two:


DELETE FROM employ;
INSERT INTO employ VALUES ('John', 'Marketing Department', 2000);
INSERT INTO employ VALUES ('Zhaohong', 'Technical Department', 2400);
INSERT INTO employ VALUES ('Dick', 'Marketing Department', 3000);
INSERT INTO employ VALUES ('Li Bai', 'Technical Department', 3200);
INSERT INTO employ VALUES ('Harry', 'Marketing Department', 4000);
INSERT INTO employ VALUES ('Wang Lan', 'Technical Department', 5000);

SELECT
    NAME   AS name,
    DEPT   AS department,
    SALARY AS wages,
    FIRST_VALUE(SALARY IGNORE NULLS) OVER (PARTITION BY DEPT)  AS dept_min_wage,
    NTH_VALUE(SALARY, 2) OVER (PARTITION BY DEPT)              AS dept_2nd_wage,
    LAST_VALUE(SALARY RESPECT NULLS) OVER (PARTITION BY DEPT)  AS dept_max_wage,
    SUM(SALARY) OVER (ORDER BY SALARY
                      ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING)      AS "ROWS",
    SUM(SALARY) OVER (ORDER BY SALARY
                      RANGE BETWEEN 500 PRECEDING AND 500 FOLLOWING) AS "RANGE"
FROM employ;

The results of the query are as follows:

name      department            wages  dept_min_wage  dept_2nd_wage  dept_max_wage  ROWS   RANGE
John      Marketing Department  2000   2000           3000           4000           4400   4400
Zhaohong  Technical Department  2400   3200           5000           2400           7400   4400
Dick      Marketing Department  3000   2000           3000           4000           8600   6200
Li Bai    Technical Department  3200   3200           5000           2400           10200  6200
Harry     Marketing Department  4000   2000           3000           4000           12200  4000
Wang Lan  Technical Department  5000   3200           5000           2400           9000   5000

The RANGE clause above defines a salary interval that runs from the current row's salary minus 500 to the current row's salary plus 500. For example, Dick's salary is 3000, so the interval runs from 3000 - 500 = 2500 to 3000 + 500 = 3500. Whose salary falls within 2500 to 3500? Only Dick's and Li Bai's, so the value of the RANGE column is 3000 (Dick) + 3200 (Li Bai) = 6200. The ROWS clause, by contrast, counts physical rows: for Dick it adds the row before, the current row and the row after, 2400 + 3000 + 3200 = 8600. That is the difference between ROWS and RANGE.
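The same distinction shows up when the sort column contains ties, even without numeric offsets: under RANGE, CURRENT ROW covers every row with the same sort value as the current one. The sketch below is self-contained and uses four made-up values from dual rather than the employ table; the aliases are illustrative.

```sql
WITH t AS (
    SELECT 1 AS x FROM dual UNION ALL
    SELECT 2      FROM dual UNION ALL
    SELECT 2      FROM dual UNION ALL
    SELECT 3      FROM dual
)
SELECT
    x,
    -- counts physical rows: the two rows with x = 2 get different totals
    SUM(x) OVER (ORDER BY x ROWS  BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS rows_total,
    -- works on values: both rows with x = 2 see each other
    SUM(x) OVER (ORDER BY x RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS range_total,
    -- with ORDER BY and no explicit frame, the default is the RANGE frame above
    SUM(x) OVER (ORDER BY x)                                                   AS default_total
FROM t
ORDER BY x, rows_total;

-- x  rows_total  range_total  default_total
-- 1  1           1            1
-- 2  3           5            5
-- 2  5           5            5
-- 3  8           8            8
```

This is worth keeping in mind for running totals: if tied rows should accumulate one at a time, the ROWS frame has to be written out explicitly.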

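The SQL above also uses the three functions FIRST_VALUE, NTH_VALUE and LAST_VALUE. They are simple too: they return the first value, the value of the nth row, and the last value of the window defined by OVER. These are the minimum and maximum only when the window is ordered by salary. The OVER clauses above have no ORDER BY, so the order of the rows inside each partition is not guaranteed, which is why the Technical Department shows 3200, 5000 and 2400 rather than its lowest, second-lowest and highest wage. It is also worth noting that these functions accept a keyword, IGNORE NULLS or RESPECT NULLS, which do what their names say: ignore null values or take them into account.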
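FIRST_VALUE, NTH_VALUE and LAST_VALUE return rows by position in the window, so they only yield the lowest, second-lowest and highest wage if the window is ordered by salary. The following is a minimal sketch of one way to do that, assuming the employ table with the six rows inserted just above; the aliases are illustrative. The explicit frame matters: once ORDER BY is present, the default frame ends at the current row, so LAST_VALUE would otherwise return the current row's own salary.

```sql
SELECT
    NAME,
    DEPT,
    SALARY,
    FIRST_VALUE(SALARY) OVER (PARTITION BY DEPT ORDER BY SALARY
        ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS dept_min_wage,
    NTH_VALUE(SALARY, 2) OVER (PARTITION BY DEPT ORDER BY SALARY
        ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS dept_2nd_lowest,
    LAST_VALUE(SALARY) OVER (PARTITION BY DEPT ORDER BY SALARY
        ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS dept_max_wage
FROM employ;
```

With that data, every marketing row should show 2000, 3000 and 4000, and every technical row 2400, 3200 and 5000.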

There are two more functions we have not introduced yet, LAG and LEAD. Both are very powerful. Look at the following SQL:

SELECT
    NAME   AS name,
    SALARY AS wages,
    LAG(SALARY, 0)                     OVER (ORDER BY SALARY) AS lag0,
    LAG(SALARY)                        OVER (ORDER BY SALARY) AS lag1,
    LAG(SALARY, 2)                     OVER (ORDER BY SALARY) AS lag2,
    LAG(SALARY, 3, 0)  IGNORE NULLS    OVER (ORDER BY SALARY) AS lag3,
    LAG(SALARY, 4, -1) RESPECT NULLS   OVER (ORDER BY SALARY) AS lag4,
    LEAD(SALARY)                       OVER (ORDER BY SALARY) AS lead1
FROM employ;

The results of the query are as follows:

name        wages    lag0    lag1      lag2      lag3    lag4    lead1
John        2000     2000    (NULL)    (NULL)    0       -1      2400
Zhaohong    2400     2400    2000      (NULL)    0       -1      3000
Dick        3000     3000    2400      2000      0       -1      3200
Li Bai      3200     3200    3000      2400      2000    -1      4000
Harry       4000     4000    3200      3000      2400    2000    5000
Wang Lan    5000     5000    4000      3200      3000    2400    (NULL)

Let's take a look at the declaration of the LAG and LEAD functions:

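LAG(expression or column, offset, default value) IGNORE NULLS or RESPECT NULLS

LAG returns the value from the row <offset> rows before the current row, so the values appear shifted downward; LEAD returns the value from the row <offset> rows after the current row, so the values appear shifted upward. The offset defaults to 1, and the default value, which is returned when the offset falls outside the window, defaults to NULL. The query results above make this clear at a glance.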
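A typical use of LAG and LEAD is comparing each row with its neighbour. The sketch below assumes the employ table with the second set of rows (salaries 2000 to 5000); the aliases step_from_prev and gap_to_next are made up for illustration.

```sql
SELECT
    NAME,
    SALARY,
    -- how much more this employee earns than the next-lower salary
    SALARY - LAG(SALARY)  OVER (ORDER BY SALARY) AS step_from_prev,
    -- how much more the next-higher salary is
    LEAD(SALARY) OVER (ORDER BY SALARY) - SALARY AS gap_to_next
FROM employ
ORDER BY SALARY;

-- NAME      SALARY  step_from_prev  gap_to_next
-- John      2000    (NULL)          400
-- Zhaohong  2400    400             600
-- Dick      3000    600             200
-- Li Bai    3200    200             800
-- Harry     4000    800             1000
-- Wang Lan  5000    1000            (NULL)
```

The first and last rows have no neighbour on one side, so LAG and LEAD return NULL there and the subtraction yields NULL as well.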

That covers the essentials of Oracle analytic functions. Let's revisit the parts that make up an analytic expression:

Analytic function OVER (PARTITION BY clause  ORDER BY clause  ROWS or RANGE clause)
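As a closing illustration, the following sketch uses all three parts of the OVER clause at once to compute a running total within each department. It assumes the employ table with the second set of rows; the alias dept_running_total is illustrative.

```sql
SELECT
    NAME,
    DEPT,
    SALARY,
    SUM(SALARY) OVER (
        PARTITION BY DEPT                                 -- restart for every department
        ORDER BY SALARY                                   -- accumulate from lowest to highest
        ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW  -- everything up to this row
    ) AS dept_running_total
FROM employ
ORDER BY DEPT, SALARY;
```

With that data the marketing department should accumulate 2000, 5000, 9000 and the technical department 2400, 5600, 10600.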

It takes a certain amount of time and practice to master all this, but once you have, you will have a formidable martial art at your command in Oracle.

-- See also: Oracle SQL Extract

-- Notice: when reprinting, please credit the source

--Last edited on 2015-02-28

--Created by Shangbo on 2014-12-19

--End
