MySQL 8.0 adds SQL syntax support for window functions and CTE, and sqlcte
I tried some new features of MySQL 8.0.
If you have used MSSQL or window functions in Oracle (analytics functions in Oracle ),
Before using MySQL 8.0, you will know the pain of using window functions to process logic. Although pure SQL can also implement functions similar to window functions, however, this kind of SQL statement is not easy to read and use, and seems to be difficult to write.
When both MSSQL and Oracle and PostgreSQL fully support window functions, MySQL 8.0 also adds window functions, which makes SQL coding easier, this is one of the highlights of MySQL8.0.
For window functions, such as row_number (), rank (), dense_rank (), NTILE (), PERCENT_RANK (), etc., in MSSQL, Oracle, and PostgreSQL, the syntax used is basically the same as the expression logic.
In this regard, several database vendors are still relatively unified. If you are familiar with window functions (analysis functions) in any relational data, you can use them with confidence after MySQL 8.0.
Use a case to experience the convenience of window functions. If you are familiar with MSSQL, Oracle, or PostgreSQL, you don't need to read them.
Test case: Simulate an order table. The fields are the order number, user number, amount, and creation time.
drop table if exists order_infocreate table order_info( order_id int primary key, user_no varchar(10), amount int, create_date datetime);insert into order_info values (1,'u0001',100,'2018-1-1');insert into order_info values (2,'u0001',300,'2018-1-2');insert into order_info values (3,'u0001',300,'2018-1-2');insert into order_info values (4,'u0001',800,'2018-1-10');insert into order_info values (5,'u0001',900,'2018-1-20');insert into order_info values (6,'u0002',500,'2018-1-5');insert into order_info values (7,'u0002',600,'2018-1-6');insert into order_info values (8,'u0002',300,'2018-1-10');insert into order_info values (9,'u0002',800,'2018-1-16');insert into order_info values (10,'u0002',800,'2018-1-22');
SQL query is required to obtain the latest order for each user.
In the traditional way, try to format and read a little better. To tell the truth, this SQL statement is a bit confusing.
SELECT * FROM ( SELECT
IF(@y=a.user_no, @x:=@x+1, @x:=1) X , IF(@y=a.user_no, @y, @y:=a.user_no) Y, a.* FROM order_info a, (SELECT @x:=0, @y:=NULL) b ORDER BY a.user_no, a.create_date desc) aWHERE X <= 1;
The following is the execution result. Of course, the execution result can meet the requirements.
If you use the new window function method,
It is to use row_number () over (partition by user_no order by create_date desc) as row_num to encode a number for the original record,
Then, the data with the first number is naturally "The latest order of the user", which is logically clear and the code is concise and readable.
select * from ( select row_number()over(partition by user_no order by create_date desc) as row_num, order_id,user_no,amount,create_date from order_info)t where row_num=1;
Note that * is not allowed when using window functions in MySQL. Each field must be explicitly specified.
Row_number ()
(Group) sort numbers. as in the preceding example, row_number () over (partition by user_no order by create_date desc) as row_num is sorted by user group and create_date, generate a number for existing data.
Of course, you can also sort the whole without grouping. Any window function can be used to group statistics or not group statistics (that is, you can choose not to partition by ***). It depends on your needs)
Rank ()
Similar to row_number (), it is also a sorting function, but what is the difference between rank? The emergence of new things is bound to solve potential problems.
If a data entry is written to the test table: insert into order_info values (11, 'u0002', 800, '2017-1-22 ');
For the U002 user in the test table, two create_date data items are exactly the same (assuming such data exists). When row_number () is numbered, the two data records are encoded with two different numbers.
Theoretically, the rankings of these two data items are tied for the latest. Therefore, rank () is used to solve this problem, that is, when the sorting conditions are the same, the numbers are the same.
Dense_rank ()
Dense_rank () is used to solve the problem of rank () number,
When rank () is numbered, there is a gap. If there are two parallel numbers of 1st, the next rank number is 3, and the result is no data with the number 2.
If you do not want to skip, use dense_rank () instead.
Enhancement of Aggregate functions such as avg and sum in window functions
You can use the window function in aggregate functions, such as sum (amount) over (partition by user_no order by create_date) as sum_amont, to achieve a cumulative sum calculation function.
In the absence of window functions, this requirement can be written in pure SQL, which is too painful to be used.
NTILE (N) divides data into N groups according to certain sorting
For example, you can use NTILE (3) to sort the scores of students into three groups in descending order). This kind of requirement is not used very much.
The table above is used as follows. Orders of user_no = 'u0002' are divided into three groups based on the time latitude to see which group of data of each row.
First_value (column_name) and last_value (column_name)
First_value and last_value are basically named and known, that is, the values of the first and latest fields sorted by a certain type of data.
Let's take a look at the results.
Nth_value (column_name, n)
From the nth row of sorting, the value in the nth_value field is still returned. This function is rarely used. To be honest, it is difficult to express it in a language, let's take a look at the example.
N = 3
N = 4
Cume_dist
Under a certain sorting condition, the number of rows/total number of rows that are less than or equal to the value of the current row results in the distribution percentage of data at a certain latitude.
For example
The date (create_date) of the first row of data is 00:00:00, and the data smaller than or equal to 00:00:00 is one row. The calculation method is: 1st = 1/6
The date (create_date) of the first row of data is 00:00:00, and the data smaller than or equal to 00:00:00 is two rows. The calculation method is: 2nd = 2/6.
And so on
The date (create_date) of the first row of data is 00:00:00, and four rows of Data smaller than or equal to 00:00:00 are calculated as follows: 4th = 4/6
0.6666666666 of the data in the first row means that the data smaller than the date in the fourth row (create_date) accounts for 66.66666666666% of the Qualified Data
Percent_rank ()
It is also the calculation method of data distribution, but the algorithm is changed to: current RANK value-1/total number of rows-1.
The specific algorithm is not detailed, and there are not many actually used algorithms.
Lag and lead
Lag (column, n) obtains a field of the data of the previous n rows of the current data row according to a certain sorting rule, lead (column, n) obtains a field of the next n rows of data of the current data row according to a certain sorting rule,
It is really difficult.
For example, sort the order by time to obtain the time when the previous order occurred and the time when the next order occurred, (The interval between orders or the frequency of buying and buying orders can be calculated)
select order_id, user_no, amount, create_date, lag(create_date,1) over (partition by user_no order by create_date asc) 'last_transaction_time', lead(create_date,1) over (partition by user_no order by create_date asc) 'next_transaction_time'from order_info ;
CTE public table expression
There are two types of CTE: Non-recursive CTE and recursive CTE.
Non-recursive CTE can be used to increase the readability of the code and increase the structured expression of the logic.
At ordinary times, we hate SQL statements that contain dozens or even hundreds of rows. We don't know what they want to express, and it's hard to understand. For such SQL statements, we can use CTE to solve them in segments,
For example, Logical Block A is made into a cte, logical block B is made into a cte, and then the query continues on the basis of Logical Block A and Logical Block B, so as to implement the entire query with A direct sentence of code, logically, it becomes clearer and more intuitive.
For a simple example, it is not enough to explain the problem here. For example, it is still the first requirement to query the latest order of each user.
The first step is to sort the user's order by time and make a CTE. The second step is to query the CTE above and obtain the data with the row number equal to 1.
The other is recursive CTE, which has many application scenarios, such as querying sub-departments under most departments and sub-departments under each sub-department, the recursive method is required.
Here we do not detail the demonstration, only demonstrate a recursive usage, generate a continuous date in a recursive way.
Of course, recursion will not go on without limit. Different databases have different recursive limits. The default maximum number of recursion times in MySQL 8.0 is 1000.
If the maximum number of underestimates is exceeded, the following error occurs: Recursive query aborted after 1001 iterations. Try increasing @ cte_max_recursion_depth to a larger value.
It is determined by the @ cte_max_recursion_depth parameter.
The limitations of CTE are not significantly different from those of other databases. For example, fields must be included in the query results of CTE, and multiple queries of one CTE cannot be performed consecutively, I believe the old drivers familiar with the CTE are very clear.
The addition of window functions and CTE simplifies the compilation of SQL code and the implementation of logic. It does not mean that these functions cannot be implemented without these features, but the addition of new features, you can write SQL statements in a more elegant and readable way.
However, these are all new functions implemented in MySQL 8.0. Before MySQL 8.0, we should follow a complicated approach.