Reprint Please specify: theviper http://www.cnblogs.com/TheViper
Partially translated from How to select the First/least/max row per group in SQL
Some common SQL problems have similar workarounds, such as finding the most recent logs for each program, finding the most popular items in each category, and finding the top 5 points that each player plays. These problems can be summed up in selecting top N from each group.
Fruits table
Select the lowest-priced row in each category
Step: 1. Find the required value for the price. 2. Populating other fields
Method 1: Self-connect
Group By type and select the line with the lowest price
Select min as Minprice from Fruits Group by type;
Use the self-connection to merge the remaining lines with the rows above, as the above query has been grouped, and the remaining fields are connected to the table without grouping with a subquery.
Select F.type, f.variety, F.price from ( Selectmin as minprice fromGroup by asinnerjoinas on= and = X.minprice;
In fact, this method can be grouped directly with group, and I don't know what the author thinks.
SELECT MIN as Minprice from Fruits GROUP by TYPE;
Method 2 Dependent subquery
This method is inefficient, but very clear.
Select type, variety, price from Fruits where = (Selectminfrom aswhere= fruits.type);
Select top N rows for each group
Selecttype, variety, price fromFruitswherePrice=(Select min(Price) fromFruits asFwhereF.type=Fruits.type)orPrice=(Select min(Price) fromFruits asFwhereF.type=Fruits.type andPrice>(Select min(Price) fromFruits asF2whereF2.type=Fruits.type));
As you can see, first select the lowest price line, and then select the second low price line, two are connected with or.
This can also be used for self-connected writing, but more complex. As you can see, if you need to elect TOP3,TOP4,... , this approach gets worse.
There's a better way.
Select type, variety, price from Fruits where ( Selectcount(* from as F where=and<=<=2;
This can be understood as traversing the outside of the fruits rows, if the same classification, there are other rows <= the row and the number of such rows <=2, that line meets the requirements, take out.
You can see that this approach is elegant, because changing N to another value does not require rewriting. But this method is essentially the same as the previous method, and it uses the subquery. Some query optimizer does not perform well on subqueries.
Using Union
If there is an index on (type, price), and the index can filter many rows, then you can use limit for each category. Then the union merges them.
(Select * fromFruitswhereType= 'Apple' Order byPrice limit2)Union All(Select * fromFruitswhereType= 'Orange' Order byPrice limit2)Union All(Select * fromFruitswhereType= 'Pear' Order byPrice limit2)Union All(Select * fromFruitswhereType= 'Cherry' Order byPrice limit2)
Note that this is union all, not union. Doing so prevents the result from being sorted to remove duplicate rows before returning the result. There are no duplicate rows in this scenario, so let's tell the database not to sort the weight.
About union see Using union to implement loose index scan in MySQL
Use user variable (variables) for MySQL only
The above union is a good way to do this when the number of rows is small and the index can be used for sorting. The methods described below are only valid for MySQL. Before introducing this method, please see my other article how to number rows in MySQL.
In short, the article increments the number of rows in the same category
The method described below is based on this.
Set @num:= 0,@type:= "';Selecttype, variety, price from ( Selecttype, variety, price,@num:= if(@type =Type@num + 1,1) asRow_number,@type:=Type as Dummy fromFruitsOrder bytype, price) asXwhereX.row_number<= 2;
The subquery creates a temporary table and fills it with Row_number,dummy, which is an operation. Then choose the row_number<=2 line, which is another operation. Although there are two operations, its complexity is still O (n), which is only related to the size of the table, which is much better than the complexity O (N2) of the correlated subquery. The N of the correlated subquery is the number of categories, and if there are many classifications, the performance will be bad.
Finish
How to select the first row/last row/top row of each group in SQL