@ Oracle Aggregate functions/analysis functions

Source: Internet
Author: User
References: Expert one-on-one and Oracle 9i reference

Oracle functions are classified into two types: single-row functions and multi-row functions. Multiline functions are also Aggregate functions and composite functions. The parameter is an array and the data size is the number of records. This array is not an array of common advanced languages, but a virtual array, when the number of records is large, data is written to the hard disk, and only images are stored in the memory.

Oracle provides analysis functions starting from 8.1.6 to calculate a group-based aggregate value. Unlike Aggregate functions, each group returns multiple rows. Aggregate functions return only one row for each group.
Window Function: Specifies the data window size used by the analysis function. The data size varies with the number of data rows, as shown in the following example:
Order by salary is a default window function.
Over (partition by deptno) by Department Partition
Over (order by salary range between 50 preceding and 150 following) the data window corresponding to each row is that the range value of the previous row does not exceed 50, and the range value of the subsequent row does not exceed 150
Over (order by salary rows between 50 preceding and 150 following) the data window corresponding to each row is the first 50 rows, followed by 150 rows
Over (order by salary rows between unbounded preceding and unbounded following) the data window corresponding to each row is from the first row to the last row, which is equivalent to over (order by salary range between unbounded preceding)

AVG function description, used to calculate the average value of expressions in a group and data window.
Sample: Select AVG (salary) over (partition by manager_id order by hire_date rows between 1 preceding and 1 following) as avg_salary from employee.

Corr returns the correlation coefficient of a pair of expressions. Abbreviations:
Covar_pop (expr1, expr2)/stddev_pop (expr1) * stddev_pop (expr2). Statistically speaking, correlation is the intensity of association between variables, the association between variables means that, to a certain extent, the value of a variable can be predicted by other variable values, and a-1 ~ The correlation coefficient shows the correlation strength. 0 indicates no correlation.

Covar_pop returns the population covariance of a pair of expressions.

Covar_samp, returns the sample covariance of a pair of expressions.

Count to accumulate events in the group. If * or some non-null constants are specified, Count counts all rows. If an expression is specified, Count returns the Count of non-null values in the expression. When the same value exists, these equal values are included in the calculated values. You can use distinct to record the number of rows that appear after removing identical data in a group.

Cume_dist: calculates the relative position of a row in the group. Cume_dist always returns a number greater than 0, less than or equal to 1, which indicates the position of the row in N rows.

Dense_rank: calculates the relative positions of each row returned by the query and other rows based on the value of the expression in the order by clause. The data in the group is ordered by the order by clause, and each row is assigned a number to form a sequence. The sequence starts from 1 and accumulates later. This sequence also increases each time the value of the order by expression changes. Rows with the same value obtain the same number (equivalent when null is considered ). The dense sequence returns no interval.

First, extract the row with the first value from the set returned by dense_rank (multiple rows may exist because the values may be equal ), therefore, the complete syntax needs to add a set function at the beginning to retrieve records.

First_value: returns the first value of the data window in the group.

Lag, which can access other rows in the result set without self-connection. It allows processing of the cursor, just as if the cursor is an array. In a given group, you can refer to the rows before the current row, so that you can select the previous row from the group with the current row. Offset is a positive integer. The default value is 1. If the index is out of the window range, the default value is returned (the first row in the group is returned by default). The opposite function is lead.

Last, extract the rows with the last value from the set returned by dense_rank (multiple rows may be taken because the values may be equal ), therefore, the complete syntax needs to add a set function at the beginning to retrieve records.

Last_value: returns the last value of the data window in the group.

Lead and lead are opposite to lag. Lead can access the rows following the current row in the group. Offset is a positive integer. The default value is 1. If the index is out of the window range, the default value is returned (the first row in the group is returned by default ).

MAX: Find the maximum value of the expression in the data window of a group.

Min to find the minimum value of the expression in the data window of a group.

Ntile, which is represented by a hash of "expression". For example, if expression = 4, a number (from 1 to 4) is allocated to each row in the group ), if the group contains 20 rows, 1 is allocated to the first five rows, 2 is allocated to the next five rows, and so on. If the base number of a group cannot be evenly divided by the expression value, when these rows are allocated, no number of percentile rows in the group exceeds the number of other percentile rows, the lowest percentile is the percentile with additional rows. For example, if the expression is 4 and the number of rows is 21, the percentile = 1 has five rows, and The percentile = 2 has five rows.

Percent_rank, which is similar to the cume_dist (cumulative allocation) function. For a given row in a group, when calculating the sequence number of that row, first minus 1, divide by N-1 (n is the number of all rows in the group ). This function always returns 0 ~ The number between 1 and 1.

Percentile_rank returns the data value corresponding to the input distribution percentage value. For the calculation method of the distribution percentage, see the percent_rank function. If there is no corresponding data value, the following algorithm is used to obtain the value:
Rn = 1 + (p * (N-1) where p is the percentage of input distribution, n is the number of rows in the group
CRN = Ceil (RN) FRN = floor (RN)
If (CRN = FRN = rn) then
(Value of expression from row at RN)
Else
(CRN-Rn) * (value of expression for row at FRN) +
(Rn-FRN) * (value of expression for row at CRN)
Note: The difference between this function and percentile_disc is that when the corresponding distribution value cannot be found, the calculation method of the returned alternative value is different.

Percentile_disc returns a data value that corresponds to the input distribution percentage value. For the calculation method of the distribution percentage, see the cume_dist function. If there is no corresponding data value, take the next value greater than the distribution value.
Note: The difference between this function and percentile_cont is that when the corresponding distribution value cannot be found, the calculation method of the returned alternative value is different.

Rank: calculates the relative positions of each row returned from the query based on the value of the expression in the order by clause. The data in the group is ordered by the order by clause, and each row is assigned a number to form a sequence. The sequence starts from 1 and accumulates later. This sequence also increases each time the value of the order by expression changes. Rows with the same value obtain the same number (equivalent when null is considered ). However, if the two rows are indeed sorted in the same order, the ordinal number will jump with the latter. If the ordinal number of the two rows is 1, there is no ordinal number 2. The sequence will assign a value of 3 to the next row in the group, and dense_rank will not skip.

Ratio_to_report: This function calculates the value of expression/(sum (expression), which gives the percentage relative to the total number, that is, the contribution of the current row to sum (expression.

Regr _ (linear regression) Functions
Function Description: these linear regression functions are suitable for regression with the least square method. Nine Different regression functions are available.
Regr_slope: Return slope, equal to covar_pop (expr1, expr2)/var_pop (expr2)
Regr_intercept: returns the y intercept of the regression line, equal
AVG (expr1)-regr_slope (expr1, expr2) * AVG (expr2)
Regr_count: returns the number of non-empty number pairs used to fill the regression line.
Regr_r2: returns the regression coefficient. The formula is:
If var_pop (expr2) = 0 then return null
If var_pop (expr1) = 0 and var_pop (expr2 )! = 0 then return 1
If var_pop (expr1)> 0 and var_pop (expr2! = 0 then
Return power (Corr (expr1, expr), 2)
Regr_avgx: calculates the average value of the regression independent variable (expr2), removes the null pair (expr1, expr2), and equals to AVG (expr2)
Regr_avgy: calculates the average value of the regression variable (expr1), removes the null pair (expr1, expr2), and equals to AVG (expr1)
Regr_sxx: return value equal to regr_count (expr1, expr2) * var_pop (expr2)
Regr_syy: returns regr_count (expr1, expr2) * var_pop (expr1)
Regr_sxy: return value equal to regr_count (expr1, expr2) * covar_pop (expr1, expr2)

Row_number, returns the offset of a row in an ordered group, which can be used to sort row numbers by specific criteria.

Stddev: calculates the standard deviation (standard deviation) of the current row about the group ).

Stddev_pop: calculates the population standard deviation and returns the square root of the population variable. The return value is the same as the square root of the var_pop function (standard deviation-population ).

Stddev_samp: calculates the standard deviation of the cumulative sample and returns the square root of the population variable. The return value is the same as the square root of the var_pop function (standard deviation-sample ).

Sum, the sum of the expressions in the function compute group.

Var_pop, (variance population) This function returns the population variable of a non-empty set (ignore null). var_pop performs the following calculation:
(Sum (expr2)-sum (expr2)/count (expr ).

Var_samp, (variance sample) This function returns the sample variable of a non-empty set (ignore null). var_pop performs the following calculation:
(Sum (expr * expr)-sum (expr) * sum (expr)/count (expr)/(count (expr)-1 ).

Variance. This function returns the variable of the expression. Oracle calculates the variable as follows:
If the number of rows in the expression is 1, 0 is returned.
If the number of rows in the expression is greater than 1, var_samp is returned.


In addition, the following information about rollup and cube was added in June March 28:
In addition to the basic syntax, group by statements also support rollup and cube statements.
Rollup (a, B, c) first performs group by on (a, B, c), then performs group by on (a, B), and then () perform the group by operation on the entire table.
Group by cube (A, B, C), group by (A, B, C) is first performed, followed by (a, B ), (A, C), (A), (B, C), (B), (c), and finally perform the group by operation on the entire table.
Grouping_id () can beautify the effect.
Related Article

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.