Deduplication in Oracle does not require distinct

Source: Internet
Author: User

The distinct keyword can only be used to filter all records in the query field that are the same (the record set is the same). If you want to specify a field, it does not work. In addition, the distinct keyword is sorted, Which is inefficient.

Select distinct name from t1 can eliminate duplicate records, but only one field can be taken. Now, the value of id and name must be taken at the same time.

Select distinct id, name from t1 can take multiple fields, but only records with the same values of the two fields can be eliminated.

Therefore, you cannot use distinct to achieve the desired effect. You can use group by to solve this problem.

For example, the fields to be displayed are A, B, and C, and the content of field A cannot be repeated. You can use the following statement:

Select A, min (B), min (C), count (*) from [table] where [condition] group by

Having [condition] order by A desc

For better display of the title header, replace select A, min (B), min (C), count (*) with select A as A, min (B) as B, min (C) as C, count (*) as repeated times

The displayed fields and sorting fields must be included in group.

However, the displayed fields include min, max, count, avg, sum, and other Aggregate functions.

Min (B), min (C), count (*)

The general condition is written after the where clause

Conditions with Aggregate functions are written after having.

If having plus count (*)> 1 in the preceding statement, we can find records with repeat times of record A greater than 1.

If having plus count (*)> 2 in the preceding statement, we can find records with repeat times of record A greater than 2.

If having plus count (*)> = 1 in the preceding statement, all records can be queried, but only one record is displayed, and the number of repetitions is shown later ---- This is the expected result, and the statement can be passed through hibernate

The following statement can query the data that is duplicated:

Select Field 1, Field 2, count (*) from table name group by field 1, Field 2 having count (*)> 1

Change the ">" number above to "=" to query the non-duplicated data.

For example, select count (*) from (select gcmc, gkrq, count (*) from gczbxx_zhao t group by gcmc, gkrq having

Count (*)> = 1 order by GKRQ)

Select * from gczbxx_zhao where viewid in (select max (viewid) from gczbxx_zhao group

Gcmc) order by gkrq desc --- this is feasible.

One question says: the efficiency of distinct deduplication is very low. I saw this article on the Internet as if it was very efficient to use group by having?

In a test, I have a product table with 0.26 million records. Only the product number is indexed and the brand name field is distinct.

Select brand, count (*) from tab_commbaseinfo group by brand having count (*) = 1

Average time is: 0.453

Select distinct brand from tab_commbaseinfo

Average time is: 0.39

I don't know if there are other methods.

Related Article

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.