A brief description of Oracle analytic functions, multidimensional functions, and the MODEL clause

Source: Internet
Author: User

The code below has been tested and can be run directly.

This article briefly describes Oracle analytic functions, multidimensional functions (ROLLUP, CUBE, GROUPING SETS), and the MODEL clause, mainly as they are used for BI report statistics. It is not comprehensive, but each example comes with a short explanation of the BI scenario.

-- Create a sales quantity table; the data trends upward

CREATE TABLE computersales AS
SELECT 120 + TRUNC(rn / 12) + ROUND(dbms_random.VALUE(1, 10)) salesnumber
  FROM (
    SELECT LEVEL, ROWNUM rn
      FROM dual
   CONNECT BY ROWNUM <= 120
  );

-- Compare the aggregates before and after inserting NULL values.
-- COUNT(*) counts the NULL rows, while COUNT(column), SUM, AVG, MAX, and MIN ignore them,
-- so the counts no longer agree. For this reason it is recommended to avoid nullable columns
-- in your database system.

SELECT COUNT(*), COUNT(a.salesnumber), COUNT(DISTINCT a.salesnumber),
       SUM(a.salesnumber), AVG(a.salesnumber),
       MAX(a.salesnumber), MIN(a.salesnumber)
  FROM computersales a;

DELETE FROM computersales WHERE salesnumber IS NULL;
COMMIT;
INSERT INTO computersales VALUES (NULL);
COMMIT;
INSERT INTO computersales VALUES (NULL);
COMMIT;

SELECT COUNT(*), COUNT(a.salesnumber), COUNT(DISTINCT a.salesnumber),
       SUM(a.salesnumber), AVG(a.salesnumber),
       MAX(a.salesnumber), MIN(a.salesnumber)
  FROM computersales a;
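
The NULL comparison above can be reproduced without an Oracle instance. The following is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for Oracle (an assumption made only so the example is runnable; these aggregates treat NULL the same way in both). The sample values and the function name null_aggregates are illustrative and not part of the original article.

```python
import sqlite3


def null_aggregates():
    """Return (COUNT(*), COUNT(col), COUNT(DISTINCT col), SUM, AVG, MAX, MIN)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE computersales (salesnumber INTEGER)")
    # Three real values plus two NULL rows, mirroring the two
    # INSERT ... VALUES (NULL) statements in the article.
    conn.executemany(
        "INSERT INTO computersales VALUES (?)",
        [(120,), (125,), (125,), (None,), (None,)],
    )
    row = conn.execute(
        """
        SELECT COUNT(*), COUNT(salesnumber), COUNT(DISTINCT salesnumber),
               SUM(salesnumber), AVG(salesnumber),
               MAX(salesnumber), MIN(salesnumber)
          FROM computersales
        """
    ).fetchone()
    conn.close()
    return row


result = null_aggregates()
print(result)
```

COUNT(*) reports 5 rows while COUNT(salesnumber) reports 3, and AVG divides the sum by 3 rather than 5, which is the discrepancy the article warns about.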
SELECT TRUNC(dbms_random.VALUE(1, 101)) FROM dual;  -- a random integer from 1 to 100

DELETE FROM computersales WHERE salesnumber IS NULL;
COMMIT;

-- Create a table with an added date column
CREATE TABLE computersalesbak AS
SELECT salesnumber,
       TRUNC(SYSDATE) + MOD(a.dateseq - 1, 10) salesdate
  FROM (SELECT salesnumber,
               ROW_NUMBER() OVER (ORDER BY ROWID) dateseq
          FROM computersales) a;
DROP TABLE computersales;
RENAME computersalesbak TO computersales;

-- Here are two ways to add the area column and the date column.
-- The first derives both from the row sequence, so each (area, salesdate) pair is unique.
CREATE TABLE computersalesbak AS
SELECT salesnumber,
       TRUNC(SYSDATE) + MOD(a.dateseq - 1, 24) salesdate,
       CASE
         WHEN TRUNC((dateseq - 1) / 24) = 1 THEN 'South China'
         WHEN TRUNC((dateseq - 1) / 24) = 2 THEN 'North China'
         WHEN TRUNC((dateseq - 1) / 24) = 3 THEN 'Northeast China'
         WHEN TRUNC((dateseq - 1) / 24) = 4 THEN 'East China'
         ELSE 'Other regions'
       END area
  FROM (SELECT salesnumber,
               ROW_NUMBER() OVER (ORDER BY ROWID) dateseq
          FROM computersales) a;
DROP TABLE computersales;
RENAME computersalesbak TO computersales;

-- The second way builds data in which salesdate and area repeat
CREATE TABLE computersalesbak AS
SELECT salesnumber,
       TRUNC(SYSDATE) + MOD(a.dateseq - 1, 10) salesdate,
       CASE
         WHEN areaseq = 1 THEN 'South China'
         WHEN areaseq = 2 THEN 'North China'
         WHEN areaseq = 3 THEN 'Northeast China'
         WHEN areaseq = 4 THEN 'East China'
         ELSE 'Other regions'
       END area
  FROM (SELECT salesnumber,
               ROW_NUMBER() OVER (ORDER BY ROWID) dateseq,
               ROUND(dbms_random.VALUE(1, 5)) areaseq
          FROM computersales) a;
DROP TABLE computersales;
RENAME computersalesbak TO computersales;

-- Moving averages and running totals versus whole-partition averages and totals:
-- how ORDER BY changes the window that an analytic function works on
SELECT area, salesdate, salesnumber,
       MIN(salesnumber) OVER (PARTITION BY area ORDER BY salesdate) AS min_area_salesdate,
       MAX(salesnumber) OVER (PARTITION BY area ORDER BY salesdate) AS max_area_salesdate,
       AVG(salesnumber) OVER (PARTITION BY area ORDER BY salesdate) AS avg_area_salesdate,
       SUM(salesnumber) OVER (PARTITION BY area ORDER BY salesdate) AS sum_area_salesdate,
       COUNT(*)         OVER (PARTITION BY area ORDER BY salesdate) AS count_area_salesdate,
       MIN(salesnumber) OVER (PARTITION BY area) AS min_area,
       MAX(salesnumber) OVER (PARTITION BY area) AS max_area,
       AVG(salesnumber) OVER (PARTITION BY area) AS avg_area,
       SUM(salesnumber) OVER (PARTITION BY area) AS sum_area,
       COUNT(*)         OVER (PARTITION BY area) AS count_area
  FROM computersales;

-- Compare RANK, DENSE_RANK, ROW_NUMBER, and COUNT.
-- RANK leaves gaps after ties, DENSE_RANK does not, ROW_NUMBER is always unique,
-- and the running COUNT also jumps on ties because it counts every row up to the current value.
-- If the PARTITION BY and ORDER BY columns together are unique, the four functions return the same numbers.
SELECT area, salesdate, salesnumber,
       RANK()             OVER (PARTITION BY area ORDER BY salesnumber) AS rank_area_salesnumber,
       DENSE_RANK()       OVER (PARTITION BY area ORDER BY salesnumber) AS denserank_area_salesnumber,
       ROW_NUMBER()       OVER (PARTITION BY area ORDER BY salesnumber) AS rownumber_area_salesnumber,
       COUNT(*)           OVER (PARTITION BY area ORDER BY salesnumber) AS countall_area_salesnumber,
       COUNT(salesnumber) OVER (PARTITION BY area ORDER BY salesnumber) AS count_area_salesnumber
  FROM computersales;

-- Compare LAG and LEAD and their arguments.
-- By default LAG takes the value from the previous row and LEAD takes the value from the next row.
-- The offset argument sets how many rows back (LAG) or forward (LEAD) to look;
-- the argument after it is the default returned when no such row exists.
SELECT area, salesdate, salesnumber,
       LAG(salesnumber)        OVER (PARTITION BY area ORDER BY salesdate) AS lag_area_salesnumber,
       LEAD(salesnumber)       OVER (PARTITION BY area ORDER BY salesdate) AS lead_area_salesnumber,
       LAG(salesnumber, 1)     OVER (PARTITION BY area ORDER BY salesdate) AS lag1_area_salesnumber,
       LAG(salesnumber, 2)     OVER (PARTITION BY area ORDER BY salesdate) AS lag2_area_salesnumber,
       LEAD(salesnumber, 1)    OVER (PARTITION BY area ORDER BY salesdate) AS lead1_area_salesnumber,
       LEAD(salesnumber, 2)    OVER (PARTITION BY area ORDER BY salesdate) AS lead2_area_salesnumber,
       LAG(salesnumber, 1, 0)  OVER (PARTITION BY area ORDER BY salesdate) AS lag10_area_salesnumber,
       LAG(salesnumber, 2, 1)  OVER (PARTITION BY area ORDER BY salesdate) AS lag21_area_salesnumber,
       LEAD(salesnumber, 1, 0) OVER (PARTITION BY area ORDER BY salesdate) AS lead10_area_salesnumber,
       LEAD(salesnumber, 2, 1) OVER (PARTITION BY area ORDER BY salesdate) AS lead21_area_salesnumber
  FROM computersales;

-- Compare FIRST_VALUE and LAST_VALUE.
-- To get the column value that belongs to the maximum or minimum within a group,
-- use FIRST_VALUE and sort ascending or descending.
-- LAST_VALUE behaves somewhat like the last row of a second-level grouping: with ORDER BY,
-- the default window ends at the current row and its ties, not at the end of the partition.
SELECT area, salesdate, salesnumber,
       FIRST_VALUE(salesdate) OVER (PARTITION BY area ORDER BY salesnumber)      AS firstvalue_area,
       FIRST_VALUE(salesdate) OVER (PARTITION BY area ORDER BY salesnumber DESC) AS firstvalue_area_desc,
       LAST_VALUE(salesdate)  OVER (PARTITION BY area ORDER BY salesnumber)      AS lastvalue_area,
       LAST_VALUE(salesdate)  OVER (PARTITION BY area ORDER BY salesnumber DESC) AS lastvalue_area_desc
  FROM computersales;

-- Unlike the above, KEEP must be combined with DENSE_RANK FIRST or DENSE_RANK LAST.
-- It returns the MIN or MAX salesdate among the rows that have the smallest or largest
-- salesnumber in the same area, whereas FIRST_VALUE and LAST_VALUE just take the first or last row.
SELECT area, salesdate, salesnumber,
       DENSE_RANK() OVER (PARTITION BY area ORDER BY salesnumber) dense_rank,
       MIN(salesdate) KEEP (DENSE_RANK FIRST ORDER BY salesnumber) OVER (PARTITION BY area) min_first,
       MIN(salesdate) KEEP (DENSE_RANK LAST  ORDER BY salesnumber) OVER (PARTITION BY area) min_last,
       MAX(salesdate) KEEP (DENSE_RANK FIRST ORDER BY salesnumber) OVER (PARTITION BY area) max_first,
       MAX(salesdate) KEEP (DENSE_RANK LAST  ORDER BY salesnumber) OVER (PARTITION BY area) max_last
  FROM computersales;

-- CUME_DIST and PERCENT_RANK are similar: both compute a cumulative ratio, but on a different base;
-- CUME_DIST is closer to common practice.
-- NTILE splits the data evenly into the given number of buckets, which suits quartile calculations.
-- RATIO_TO_REPORT returns the current value's share of the partition total; it cannot be combined with ORDER BY.
-- PERCENTILE_DISC and PERCENTILE_CONT return the value at the given percentile;
-- in general PERCENTILE_DISC is sufficient.
SELECT area, salesdate, salesnumber,
       ROUND(CUME_DIST()    OVER (PARTITION BY area ORDER BY salesnumber), 2) cume_dist,
       ROUND(PERCENT_RANK() OVER (PARTITION BY area ORDER BY salesnumber), 2) percent_rank,
       ROUND(RATIO_TO_REPORT(salesnumber) OVER (PARTITION BY area), 2) ratio_to_report,
       NTILE(4) OVER (PARTITION BY area ORDER BY salesnumber) ntile,
       PERCENTILE_DISC(0.7) WITHIN GROUP (ORDER BY salesnumber) OVER (PARTITION BY area) percentile_disc,
       PERCENTILE_CONT(0.7) WITHIN GROUP (ORDER BY salesnumber) OVER (PARTITION BY area) percentile_cont
  FROM computersales;

-- Add a sales amount column (salesvalue) so that correlation statistics can be computed
CREATE TABLE computersalesbak AS
SELECT salesnumber,
       ROUND(salesnumber * 10 + 5 * dbms_random.VALUE(1, 10)) salesvalue,
       TRUNC(SYSDATE) + MOD(a.dateseq - 1, 24) salesdate,
       CASE
         WHEN TRUNC((dateseq - 1) / 24) = 1 THEN 'South China'
         WHEN TRUNC((dateseq - 1) / 24) = 2 THEN 'North China'
         WHEN TRUNC((dateseq - 1) / 24) = 3 THEN 'Northeast China'
         WHEN TRUNC((dateseq - 1) / 24) = 4 THEN 'East China'
         ELSE 'Other regions'
       END area
  FROM (SELECT salesnumber,
               ROW_NUMBER() OVER (ORDER BY ROWID) dateseq
          FROM computersales) a;
DROP TABLE computersales;
RENAME computersalesbak TO computersales;
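
The difference between RANK, DENSE_RANK, ROW_NUMBER, and the running COUNT described earlier in this section is easiest to see on a tiny data set with one tie. The sketch below again uses Python's sqlite3 module in place of Oracle (an assumption for runnability; it needs an SQLite build with window functions, version 3.25 or later). The id column and the four sample rows are hypothetical; id is added only so that ROW_NUMBER breaks the tie deterministically.

```python
import sqlite3


def ranking_demo():
    """Return (salesnumber, rank, dense_rank, row_number, running_count) rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE computersales (id INTEGER, area TEXT, salesnumber INTEGER)"
    )
    # One area, four rows, with a tie on salesnumber = 20.
    conn.executemany(
        "INSERT INTO computersales VALUES (?, ?, ?)",
        [
            (1, "South China", 10),
            (2, "South China", 20),
            (3, "South China", 20),
            (4, "South China", 30),
        ],
    )
    rows = conn.execute(
        """
        SELECT salesnumber,
               RANK()       OVER (PARTITION BY area ORDER BY salesnumber)     AS rnk,
               DENSE_RANK() OVER (PARTITION BY area ORDER BY salesnumber)     AS dense_rnk,
               ROW_NUMBER() OVER (PARTITION BY area ORDER BY salesnumber, id) AS row_num,
               COUNT(*)     OVER (PARTITION BY area ORDER BY salesnumber)     AS running_count
          FROM computersales
         ORDER BY salesnumber, id
        """
    ).fetchall()
    conn.close()
    return rows


ranking_rows = ranking_demo()
for row in ranking_rows:
    print(row)
```

On the tied rows RANK and DENSE_RANK both give 2, ROW_NUMBER gives 2 and 3, and the running COUNT gives 3 for both; after the tie RANK jumps to 4 while DENSE_RANK continues with 3.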
    
SELECT * FROM computersales;

-- Other statistical functions; readers with a background in mathematical analysis
-- can explore their economic meaning
SELECT area, salesdate, salesvalue, salesnumber,
       REGR_SLOPE(salesvalue, salesnumber)     OVER (PARTITION BY area ORDER BY salesdate) "Slope",
       REGR_INTERCEPT(salesvalue, salesnumber) OVER (PARTITION BY area ORDER BY salesdate) "Intercept",
       REGR_R2(salesvalue, salesnumber)        OVER (PARTITION BY area ORDER BY salesdate) "R squared",
       REGR_AVGX(salesvalue, salesnumber)      OVER (PARTITION BY area ORDER BY salesdate) "Avg of independent variable",
       REGR_AVGY(salesvalue, salesnumber)      OVER (PARTITION BY area ORDER BY salesdate) "Avg of dependent variable",
       VAR_POP(salesvalue)                     OVER (PARTITION BY area ORDER BY salesdate) "VAR_POP dependent variable",
       VAR_POP(salesnumber)                    OVER (PARTITION BY area ORDER BY salesdate) "VAR_POP independent variable",
       COVAR_POP(salesvalue, salesnumber)      OVER (PARTITION BY area ORDER BY salesdate) "COVAR_POP",
       REGR_SXX(salesvalue, salesnumber)       OVER (PARTITION BY area ORDER BY salesdate) "REGR_SXX",   -- REGR_COUNT(expr1, expr2) * VAR_POP(expr2)
       REGR_SYY(salesvalue, salesnumber)       OVER (PARTITION BY area ORDER BY salesdate) "REGR_SYY",   -- REGR_COUNT(expr1, expr2) * VAR_POP(expr1)
       REGR_SXY(salesvalue, salesnumber)       OVER (PARTITION BY area ORDER BY salesdate) "REGR_SXY",   -- REGR_COUNT(expr1, expr2) * COVAR_POP(expr1, expr2)
       REGR_COUNT(salesvalue, salesnumber)     OVER (PARTITION BY area ORDER BY salesdate) "REGR_COUNT"
  FROM computersales;

-- Period-over-period comparison by date.
-- This is troublesome when the dates are not contiguous.
-- Delete a few random rows from computersales and run the query again to test.
SELECT area, salesdate, salesnumber,
       LAG(salesnumber) OVER (PARTITION BY area ORDER BY salesdate) AS lag_error,   -- wrong when dates are missing
       SUM(salesnumber) OVER (PARTITION BY area ORDER BY salesdate
                              RANGE BETWEEN 1 PRECEDING AND 1 PRECEDING) yesterday,    -- yesterday's value
       SUM(salesnumber) OVER (PARTITION BY area ORDER BY salesdate
                              RANGE BETWEEN 6 PRECEDING AND 6 PRECEDING) lastweek,     -- the value 6 days earlier, used as last week's figure
       SUM(salesnumber) OVER (PARTITION BY area ORDER BY salesdate
                              RANGE BETWEEN 6 PRECEDING AND 0 PRECEDING) last7_accu,   -- total for the last 7 days, including the current day
       SUM(salesnumber) OVER (PARTITION BY area ORDER BY salesdate
                              RANGE BETWEEN 29 PRECEDING AND 0 PRECEDING) last30_accu  -- total for the last 30 days, including the current day
  FROM computersales;

-- Add a product column to make the CUBE demonstration easier
CREATE TABLE computersalesbak AS
SELECT salesnumber,
       ROUND(salesnumber * 10 + 5 * dbms_random.VALUE(1, 10)) salesvalue,
       TRUNC(SYSDATE) + MOD(a.dateseq - 1, 24) salesdate,
       CASE
         WHEN TRUNC((dateseq - 1) / 24) = 1 THEN 'South China'
         WHEN TRUNC((dateseq - 1) / 24) = 2 THEN 'North China'
         WHEN TRUNC((dateseq - 1) / 24) = 3 THEN 'Northeast China'
         WHEN TRUNC((dateseq - 1) / 24) = 4 THEN 'East China'
         ELSE 'Other regions'
       END area,
       CASE
         WHEN ROUND(dbms_random.VALUE(1, 3)) = 1 THEN 'Product A'
         WHEN ROUND(dbms_random.VALUE(1, 3)) = 2 THEN 'Product B'
         ELSE 'Product C'
       END product
  FROM (SELECT salesnumber,
               ROW_NUMBER() OVER (ORDER BY ROWID) dateseq
          FROM computersales) a;
DROP TABLE computersales;
RENAME computersalesbak TO computersales;
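
The comments on REGR_SXX, REGR_SYY, and REGR_SXY in the regression query state how they relate to REGR_COUNT, VAR_POP, and COVAR_POP. The sketch below checks those identities in plain Python on a small hand-made sample. It is an illustration only: the sample data, the function name regr_stats, and the dictionary keys are hypothetical, and it assumes a whole partition with a non-zero variance of the independent variable rather than Oracle's running window.

```python
def regr_stats(pairs):
    """Compute regression statistics for (y, x) pairs.

    y plays the role of salesvalue (dependent variable, expr1) and
    x the role of salesnumber (independent variable, expr2).
    Assumes at least one pair and a non-zero population variance of x.
    """
    n = len(pairs)
    avg_y = sum(y for y, _ in pairs) / n
    avg_x = sum(x for _, x in pairs) / n
    var_pop_y = sum((y - avg_y) ** 2 for y, _ in pairs) / n
    var_pop_x = sum((x - avg_x) ** 2 for _, x in pairs) / n
    covar_pop = sum((y - avg_y) * (x - avg_x) for y, x in pairs) / n
    slope = covar_pop / var_pop_x
    return {
        "regr_count": n,
        "regr_avgx": avg_x,
        "regr_avgy": avg_y,
        "var_pop_x": var_pop_x,
        "var_pop_y": var_pop_y,
        "covar_pop": covar_pop,
        "regr_slope": slope,
        "regr_intercept": avg_y - slope * avg_x,
        "regr_sxx": n * var_pop_x,   # REGR_COUNT * VAR_POP(expr2)
        "regr_syy": n * var_pop_y,   # REGR_COUNT * VAR_POP(expr1)
        "regr_sxy": n * covar_pop,   # REGR_COUNT * COVAR_POP(expr1, expr2)
    }


# salesvalue = 10 * salesnumber + 5 exactly, so the fit should recover 10 and 5.
sample = [(10 * x + 5, x) for x in (120, 122, 125, 127)]
stats = regr_stats(sample)
print(stats["regr_slope"], stats["regr_intercept"], stats["regr_sxx"])
```

Because the sample lies exactly on a line, the slope and intercept come back as the line's own coefficients, which makes the identities easy to verify by hand.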
    
SELECT * FROM computersales;

-- The traditional GROUP BY syntax
SELECT product, area, salesdate, SUM(salesnumber), SUM(salesvalue)
  FROM computersales
 GROUP BY product, area, salesdate
 ORDER BY product, area, salesdate;

-- ROLLUP (grouping columns, in order)
-- Subtotals are produced automatically level by level, much like a typical daily report
SELECT product, area, salesdate, SUM(salesnumber), SUM(salesvalue)
  FROM computersales
 GROUP BY ROLLUP (product, area, salesdate)
 ORDER BY product, area, salesdate;
-- The author notes that the rows come back in grouping-column order with or without the ORDER BY;
-- keep the ORDER BY anyway, since only it guarantees the order.

-- Equivalent to
SELECT *
  FROM (SELECT product, area, salesdate,
               SUM(salesnumber) salesnumber, SUM(salesvalue) salesvalue   -- finest grouping
          FROM computersales
         GROUP BY product, area, salesdate
        UNION ALL
        SELECT product, area, NULL, SUM(salesnumber), SUM(salesvalue)     -- by product and area
          FROM computersales
         GROUP BY product, area, NULL
        UNION ALL
        SELECT product, NULL, NULL, SUM(salesnumber), SUM(salesvalue)     -- by product
          FROM computersales
         GROUP BY product, NULL, NULL
        UNION ALL
        SELECT NULL, NULL, NULL, SUM(salesnumber), SUM(salesvalue)        -- grand total
          FROM computersales
         GROUP BY NULL, NULL, NULL)
 ORDER BY 1, 2, 3;   -- sort at the end

-- CUBE (grouping columns, in order)
-- Similar to OLAP: it aggregates over every combination of the grouping columns
SELECT product, area, salesdate, SUM(salesnumber), SUM(salesvalue)
  FROM computersales
 GROUP BY CUBE (product, area, salesdate)
 ORDER BY product, area, salesdate;

-- The difference between the two:
-- ROLLUP produces n + 1 levels, here 3 + 1 = 4, in this order:
--   product, area, salesdate; product, area; product; all
-- CUBE produces 2 to the power n levels, that is, every combination, in this order:
--   product, area, salesdate; product, area; product, salesdate; product;
--   area, salesdate; area; salesdate; all
-- Expressed with ROLLUP, CUBE is equivalent to the union of ROLLUPs over the column combinations
SELECT *
  FROM (SELECT product, area, salesdate, SUM(salesnumber), SUM(salesvalue)   -- first ROLLUP by product, area, salesdate
          FROM computersales
         GROUP BY ROLLUP (product, area, salesdate)
        UNION
        SELECT product, NULL, salesdate, SUM(salesnumber), SUM(salesvalue)   -- then ROLLUP by product, salesdate
          FROM computersales
         GROUP BY ROLLUP (product, NULL, salesdate)
        UNION
        SELECT NULL, area, salesdate, SUM(salesnumber), SUM(salesvalue)      -- then ROLLUP by area, salesdate
          FROM computersales
         GROUP BY ROLLUP (NULL, area, salesdate)
        UNION
        SELECT NULL, NULL, salesdate, SUM(salesnumber), SUM(salesvalue)      -- finally ROLLUP by salesdate
          FROM computersales
         GROUP BY ROLLUP (NULL, NULL, salesdate))
 ORDER BY 1, 2, 3;

-- GROUPING SETS is equivalent to aggregating by each of the three columns separately; it is not used very often
SELECT product, area, salesdate, SUM(salesnumber), SUM(salesvalue)
  FROM computersales
 GROUP BY GROUPING SETS (product, area, salesdate)
 ORDER BY product, area, salesdate;

-- Equivalent to
SELECT *
  FROM (SELECT product, NULL area, NULL salesdate,
               SUM(salesnumber) salesnumber, SUM(salesvalue) salesvalue      -- group by product
          FROM computersales
         GROUP BY product, NULL, NULL
        UNION ALL
        SELECT NULL, area, NULL, SUM(salesnumber), SUM(salesvalue)           -- group by area
          FROM computersales
         GROUP BY NULL, area, NULL
        UNION ALL
        SELECT NULL, NULL, salesdate, SUM(salesnumber), SUM(salesvalue)      -- group by date
          FROM computersales
         GROUP BY NULL, NULL, salesdate)
 ORDER BY 1, 2, 3;

-- The GROUPING function takes a single argument, which must be a column of the table.
-- It returns 1 if the column is NULL because the row is a subtotal over that column, and 0 otherwise.
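
The level counts quoted above for ROLLUP (n + 1) and CUBE (2 to the power n), and the claim that CUBE equals the union of four ROLLUPs, can be checked by enumerating the grouping sets directly. The following is a small sketch in plain Python; the helper names rollup_sets, cube_sets, and rollup_with_nulls are hypothetical, and None stands in for the NULL placeholders used inside ROLLUP in the SQL.

```python
from itertools import combinations


def rollup_sets(cols):
    """ROLLUP(c1, ..., cn): n + 1 grouping sets, every prefix, longest first."""
    return [tuple(cols[:i]) for i in range(len(cols), -1, -1)]


def cube_sets(cols):
    """CUBE(c1, ..., cn): 2**n grouping sets, every subset in column order."""
    return [
        combo
        for size in range(len(cols), -1, -1)
        for combo in combinations(cols, size)
    ]


def rollup_with_nulls(spec):
    """Grouping sets of a ROLLUP whose argument list contains NULL placeholders."""
    return {
        tuple(col for col in prefix if col is not None)
        for prefix in rollup_sets(spec)
    }


cols = ("product", "area", "salesdate")
rollup = rollup_sets(cols)
cube = cube_sets(cols)

# The four ROLLUP branches of the UNION query in the article.
union_of_rollups = set()
for spec in [
    ("product", "area", "salesdate"),
    ("product", None, "salesdate"),
    (None, "area", "salesdate"),
    (None, None, "salesdate"),
]:
    union_of_rollups |= rollup_with_nulls(spec)

print(len(rollup), len(cube), union_of_rollups == set(cube))
```

The empty tuple represents the grand total. UNION (rather than UNION ALL) matters in the SQL version for the same reason a set is used here: several branches produce the same grouping set more than once.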
-- GROUPING can only be used together with GROUP BY, ROLLUP, CUBE, or GROUPING SETS.
-- Running it shows that the function is mainly useful for BI reports: subtotal rows are
-- flagged with 1, and the flag can later be replaced with a label string.
SELECT GROUPING(product), product,
       GROUPING(area), area,
       GROUPING(salesdate), salesdate,
       SUM(salesnumber), SUM(salesvalue)
  FROM computersales
 GROUP BY ROLLUP (product, area, salesdate)
 ORDER BY product, area, salesdate;

-- Standard BI report format
SELECT DECODE(productflag, 1, 'Product total', product),
       DECODE(areaflag, 1, 'Area total', area),
       DECODE(salesdateflag, 1, 'Date total', TO_CHAR(salesdate, 'yyyy-mm-dd')),
       salesnumber, salesvalue
  FROM (SELECT GROUPING(product) productflag, product,
               GROUPING(area) areaflag, area,
               GROUPING(salesdate) salesdateflag, salesdate,
               SUM(salesnumber) salesnumber, SUM(salesvalue) salesvalue
          FROM computersales
         GROUP BY ROLLUP (product, area, salesdate)
         ORDER BY product, area, salesdate);

-- GROUPING_ID works on the same principle as GROUPING.
-- GROUPING takes a single column and returns only 0 or 1;
-- GROUPING_ID takes several columns and returns the sum of the powers of 2
-- that correspond to the columns aggregated away (shown as NULL) in the row.
SELECT product, area, salesdate,
       GROUPING_ID(product, area, salesdate) grouping421,
       GROUPING_ID(product, area) grouping21,
       GROUPING_ID(product) grouping1,
       SUM(salesnumber), SUM(salesvalue)
  FROM computersales
 GROUP BY ROLLUP (product, area, salesdate)
 ORDER BY product, area, salesdate;

-- The GROUP_ID function distinguishes duplicated grouping results:
-- the first occurrence of a grouping gets 0, and each further occurrence adds 1.
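
The GROUP_ID numbering just described can be sketched in plain Python for the combination GROUP BY CUBE(product, area), CUBE(product, salesdate). The sketch assumes that concatenated groupings yield the cross product of the individual grouping sets, and it numbers repeated sets starting from 0. The function names and the cross alias are hypothetical, and the order in which duplicates are numbered here is illustrative rather than Oracle's actual row order.

```python
from collections import Counter
from itertools import combinations
from itertools import product as cross  # aliased to avoid clashing with the column name


def cube_sets(cols):
    """Every subset of cols, as the grouping sets of CUBE(cols)."""
    return [
        frozenset(combo)
        for size in range(len(cols) + 1)
        for combo in combinations(cols, size)
    ]


def concatenated_groupings(*cube_column_lists):
    """Return (grouping_set, group_id) pairs for GROUP BY CUBE(...), CUBE(...)."""
    seen = Counter()
    result = []
    for parts in cross(*(cube_sets(cols) for cols in cube_column_lists)):
        grouping = frozenset().union(*parts)
        result.append((grouping, seen[grouping]))  # plays the role of GROUP_ID()
        seen[grouping] += 1
    return result


groupings = concatenated_groupings(("product", "area"), ("product", "salesdate"))
kept = [g for g, group_id in groupings if group_id == 0]  # HAVING GROUP_ID() = 0

print(len(groupings), len(kept), max(group_id for _, group_id in groupings))
```

Of the 16 combined groupings only 8 are distinct, so filtering on a group id of 0 leaves exactly one copy of each, which is what a HAVING GROUP_ID() = 0 filter achieves.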
-- GROUP_ID on its own in the SELECT list is of little use; it is usually used in HAVING
-- to filter out duplicated groupings.
SELECT product, area, salesdate, GROUP_ID(), SUM(salesnumber), SUM(salesvalue)
  FROM computersales
 GROUP BY CUBE (product, area), CUBE (product, salesdate)
HAVING GROUP_ID() = 0
 ORDER BY 1, 2, 3;
-- For example, cubing by (product, area) and by (product, salesdate) separately computes
-- the product-by-area and product-by-date groupings more than once, which makes the report unclear.
-- HAVING GROUP_ID() = 0 removes the repeated rows.
-- In general, overly complicated groupings are not recommended in report programs;
-- in the end even the author of the report gets confused.
-- GROUP BY, ROLLUP, and CUBE can be combined, but the grouping columns in the SELECT list
-- must appear among the columns of the GROUP BY clause.

-- MODEL: the keyword of the MODEL clause; required.
-- DIMENSION BY: the dimensions, which can be thought of as the indexes of an array; required.
-- MEASURES: specifies the columns that are treated as arrays.
-- RULES: describes the operations performed on the arrays.
-- The author has not yet worked out every use of MODEL; this example simply computes
-- figures for the last month, the last 30 days, 7 days ago, and 1 day ago.
SELECT area, product, salesdate, salesnumber,
       avg30day, avg1month,                  -- average over the last 30 days, average over the last month
       accu30day, accu1month,                -- total over the last 30 days, total over the last month
       salesnumber1day, salesnumber7day,     -- yesterday's sales, sales a week ago
       salesnumber30day, salesnumber1month   -- sales 30 days ago, sales on the same day last month
  FROM computersales
 MODEL
   DIMENSION BY (area, product, salesdate)
   MEASURES (salesnumber,
             0 avg30day, 0 avg1month,
             0 accu30day, 0 accu1month,
             0 salesnumber1day, 0 salesnumber7day,
             0 salesnumber30day, 0 salesnumber1month)
   RULES UPDATE (
     avg30day[ANY, ANY, ANY] =
       AVG(salesnumber)[CV(), CV(), salesdate BETWEEN CV(salesdate) - 29 AND CV(salesdate)],
     avg1month[ANY, ANY, ANY] =
       AVG(salesnumber)[CV(), CV(), salesdate BETWEEN ADD_MONTHS(CV(salesdate), -1) AND CV(salesdate)],
     accu30day[ANY, ANY, ANY] =
       SUM(salesnumber)[CV(), CV(), salesdate BETWEEN CV(salesdate) - 29 AND CV(salesdate)],
     accu1month[ANY, ANY, ANY] =
       SUM(salesnumber)[CV(), CV(), salesdate BETWEEN ADD_MONTHS(CV(salesdate), -1) AND CV(salesdate)],
     salesnumber1day[ANY, ANY, ANY] =
       MAX(salesnumber)[CV(), CV(), salesdate BETWEEN CV(salesdate) - 1 AND CV(salesdate) - 1],
     salesnumber7day[ANY, ANY, ANY] =
       MAX(salesnumber)[CV(), CV(), salesdate BETWEEN CV(salesdate) - 7 AND CV(salesdate) - 7],
     salesnumber30day[ANY, ANY, ANY] =
       MAX(salesnumber)[CV(), CV(), salesdate BETWEEN CV(salesdate) - 30 AND CV(salesdate) - 30],
     salesnumber1month[ANY, ANY, ANY] =
       MAX(salesnumber)[CV(), CV(), salesdate BETWEEN ADD_MONTHS(CV(salesdate), -1) AND ADD_MONTHS(CV(salesdate), -1)]
   )
 ORDER BY 1, 2, 3;

-- Issues that may occur in time-based statistics
CREATE TABLE test (salesmonth VARCHAR(6), salesnumber INT);
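INSERT INTO test VALUES ('201002', 2);
INSERT INTO test VALUES ('201004', 4);
INSERT INTO test VALUES ('201007', 7);
INSERT INTO test VALUES ('201008', 8);
INSERT INTO test VALUES ('201010', 10);

SELECT salesmonth, salesnumber,
       -- LAG returns the previous row, which is the wrong month whenever a month is missing
       LAG(salesnumber) OVER (ORDER BY salesmonth) AS lag_error,
       -- a RANGE window over the month date returns the previous month's value, or NULL if that month is missing;
       -- a plain numeric offset on a DATE would be counted in days, so a month interval is used here
       SUM(salesnumber) OVER (ORDER BY TO_DATE(salesmonth || '01', 'YYYYMMDD')
                              RANGE BETWEEN NUMTOYMINTERVAL(1, 'MONTH') PRECEDING
                                        AND NUMTOYMINTERVAL(1, 'MONTH') PRECEDING) AS prev_month
  FROM test;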
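
The gap problem in the TEST table can be reproduced outside Oracle. The sketch below uses Python's sqlite3 module as a stand-in (an assumption for runnability; RANGE windows with numeric offsets need SQLite 3.28 or later). Because SQLite has no month intervals, each month is converted to a hypothetical integer month_index (year * 12 + month), so that an offset of 1 means exactly one month.

```python
import sqlite3


def month_gap_demo():
    """Return (salesmonth, salesnumber, lag_value, prev_month) rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE test (salesmonth TEXT, salesnumber INTEGER)")
    conn.executemany(
        "INSERT INTO test VALUES (?, ?)",
        [("201002", 2), ("201004", 4), ("201007", 7), ("201008", 8), ("201010", 10)],
    )
    rows = conn.execute(
        """
        SELECT salesmonth, salesnumber,
               -- previous row, whatever month it belongs to
               LAG(salesnumber) OVER (ORDER BY month_index) AS lag_value,
               -- only the row exactly one month earlier, else NULL
               SUM(salesnumber) OVER (ORDER BY month_index
                                      RANGE BETWEEN 1 PRECEDING AND 1 PRECEDING) AS prev_month
          FROM (SELECT salesmonth, salesnumber,
                       CAST(substr(salesmonth, 1, 4) AS INTEGER) * 12
                       + CAST(substr(salesmonth, 5, 2) AS INTEGER) AS month_index
                  FROM test)
         ORDER BY month_index
        """
    ).fetchall()
    conn.close()
    return rows


gap_rows = month_gap_demo()
for row in gap_rows:
    print(row)
```

LAG reports a "previous" value for every month after the first, even across gaps, while the RANGE window returns a value only for 201008, the one month whose predecessor (201007) actually exists.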
