The distinct keyword can only be used to filter all records in the query field that are the same (the record set is the same). If you want to specify a field, it does not work. In addition, the distinct keyword is sorted, Which is inefficient.
Select distinct name from t1 can eliminate duplicate records, but only one field can be taken. Now, the value of id and name must be taken at the same time.
Select distinct id, name from t1 can take multiple fields, but only records with the same values of the two fields can be eliminated.
Therefore, you cannot use distinct to achieve the desired effect. You can use group by to solve this problem.
For example, the fields to be displayed are A, B, and C, and the content of field A cannot be repeated. You can use the following statement:
Select A, min (B), min (C), count (*) from [table] where [condition] group by
Having [condition] order by A desc
For better display of the title header, replace select A, min (B), min (C), count (*) with select A as A, min (B) as B, min (C) as C, count (*) as repeated times
The displayed fields and sorting fields must be included in group.
However, the displayed fields include min, max, count, avg, sum, and other Aggregate functions.
Min (B), min (C), count (*)
The general condition is written after the where clause
Conditions with Aggregate functions are written after having.
If having plus count (*)> 1 in the preceding statement, we can find records with repeat times of record A greater than 1.
If having plus count (*)> 2 in the preceding statement, we can find records with repeat times of record A greater than 2.
If having plus count (*)> = 1 in the preceding statement, all records can be queried, but only one record is displayed, and the number of repetitions is shown later ---- This is the expected result, and the statement can be passed through hibernate
The following statement can query the data that is duplicated:
Select Field 1, Field 2, count (*) from table name group by field 1, Field 2 having count (*)> 1
Change the ">" number above to "=" to query the non-duplicated data.
For example, select count (*) from (select gcmc, gkrq, count (*) from gczbxx_zhao t group by gcmc, gkrq having
Count (*)> = 1 order by GKRQ)
Select * from gczbxx_zhao where viewid in (select max (viewid) from gczbxx_zhao group
Gcmc) order by gkrq desc --- this is feasible.
One question says: the efficiency of distinct deduplication is very low. I saw this article on the Internet as if it was very efficient to use group by having?
In a test, I have a product table with 0.26 million records. Only the product number is indexed and the brand name field is distinct.
Select brand, count (*) from tab_commbaseinfo group by brand having count (*) = 1
Average time is: 0.453
Select distinct brand from tab_commbaseinfo
Average time is: 0.39
I don't know if there are other methods.