Details the differences and usage of distinct and row_number () over () in SQL

Source: Internet
Author: User

1 Preface

When we write SQL statements to manipulate the data in the database, it is possible to encounter some less-than-good problems, such as the same field with the same name of the record, we only need to display one, but in fact, the database may contain multiple records with the same name, so that when retrieved, display multiple records, This is against our original intention! Therefore, in order to avoid this situation, we need to carry out the "heavy" treatment, then what is the "go heavy"? To put it bluntly, the same field allows records with the same content to display only one record.

So, how to achieve the function of "de-weight"? In this case, we have two ways to implement this function.

    • The first, when writing a SELECT statement, add distinct keywords;
    • Second, when writing the SELECT statement, call the Row_number () over () function.

What are the similarities and differences between the two ways to achieve the "go-to-weight" function? Next, the author will give a detailed explanation.

2 distinct

In SQL, the keyword distinct is used to return only different values. Its syntax format is:

SELECT DISTINCT 列名称 FROM 表名称

Suppose there is a table "Ceshidemo" that contains two fields, respectively NAME and age, in the following format:

Looking at the table above, we will find that there are two records with the same NAME and three records with the same age. If we run the following SQL statement,

/*** 其中 PPPRDER 为 Schema 的名字,即表 CESHIDEMO 在 PPPRDER 中*/select name from PPPRDER.CESHIDEMO

You will get the following results:

Observing this result, we will find that in the above four records, there are two records with the same NAME value, that is, the value of the 2nd record and the 3rd record is "Gavin". So, what if we want a record with the same NAME to show only one way? At this point, you need to use the DISTINCT keyword! Next, run the following SQL statement,

select distinct name from PPPRDER.CESHIDEMO

You will get the following results:

Observing the result, it is clear that our request has been fulfilled! However, we can not help but think, if the DISTINCT keyword at the same time in two fields will produce what effect? Now that you think about it, let's try it, run the following SQL statement,

select distinct name, age from PPPRDER.CESHIDEMO

The results are as follows:

Look at the results, gee, seems to have no effect ah? She showed all the records. There are two records with the same NAME value, there are three records with the same age value, there is no change at all! But in fact, the result should be this. Because when distinct functions in more than one field, she will only "de-weight" all records with the same field values , obviously our "pathetic" four records do not satisfy the condition, so distinct will think that the above four records are not the same. Word ..., next, let's add an identical record to the table "Ceshidemo" and verify it. The table after adding a record is as follows:

Then run the following SQL statement,

select distinct name, age from PPPRDER.CESHIDEMO

The results are as follows:

By observing the results, we have perfectly verified the above conclusions.

In addition, there is a need to pay special attention, that is: The keyword distinct can only be placed in the SQL statement in the first of all fields to work, if misplaced, SQL will not error, but will not play any effect.

3 Row_number () over ()

In the SQL Server database, we are provided with a function row_number () is used to label records in a database table, followed by a function over () when used, and the function over () to group and sort records in a table. The syntax used for both is:

BYORDERBY COLUMN2)

The records in the table are grouped by field COLUMN1, sorted by field COLUMN2, where

    • PARTITION by: Indicates grouping
    • ORDER BY: Indicates sort

Next, we test the data in the table "Ceshidemo". First, give the result of the query without using the Row_number () over () function, as follows:

Then, run the following SQL statement,

select  PPPRDER.CESHIDEMO.*, row_number() over(partition by age order by name desc) from PPPRDER.CESHIDEMO

The results are as follows:

As can be seen from the above results, on the basis of the original table, there is a column marked with a number of columns. In turn, we analyze the SQL statements we run and find that they are actually grouped by the value of the field age and sorted by the value of the field NAME. As a result, functions are validated.

Next, let's examine how to use the Row_number () over () function to achieve the function of "de-weight". By observing the above results, we can find that if you group by NAME, sort by age, and then take the first record of each group, you may be able to achieve the "go-heavy" function Ah! So try it, run the following SQL statement,

/** 其中 rn 表示最后添加的那一列*/select * from (select  PPPRDER.CESHIDEMO.*, row_number() over(partition by name order by age desc) rn from PPPRDER.CESHIDEMO)where rn = 1

After running, the results are as follows:

Observing the above results, we found that, alas, the data "go heavy" function is accidentally we realized Ah! Unfortunately, if we are careful, we will find a very uncomfortable thing, that is, in the execution of the above SQL statement "go to the weight", there is a NAME value of "Gavin", the Age value of "18" record is filtered out, but in real life, the same name of different age things too normal.

4 Summary

By reading and practicing the above, we already know that the function of data "de-weight" can be achieved either by using the keyword distinct or by using the function row_number () over (). But in the process of implementation, we should pay special attention to the use of the two characteristics and differences.

    • When using the keyword DISTINCT, we need to know that it works on a single field and multiple fields when there is a difference, when used in a single field, its "deduplication" is the table of all the values of the field duplicate data; When acting on multiple fields, its "deduplication" of all the fields in the table (that is, distinct The data that has the same value for the multiple fields of the specific action.
    • When the function row_number () over () is used, it is sorted by first grouping, then the first record of each group is removed for "de-weight" (as in this blog post). Of course, here we can also through the different constraints to carry out the "go to the weight", how to achieve, we need to move their brains to think about it!

concluding remarks: in this blog post, the author detailed his own on the keyword distinct and function row_number () over () to the data "de-weight" of some understanding, hope that the above content can be helpful to everyone!

Details the differences and usage of distinct and row_number () over () in SQL

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.