In my recent posts on CSDN, I have often advised the people replying above and below me, with the best of intentions, to use joins more. Very few have responded. Often a simple task ends up written as a subquery or a cursor, which makes it complicated and tedious. Admittedly, for beginners this style takes less mental effort and is easier to reason about, so those replies are often the ones that earn the points. But if you are really familiar with the SQL programming style, you will see that a join query is the most direct, clearest, and most powerful method, and that the best approach of all is the one where no fancy move beats every move: a single simple query that ends the fight. Let me give a few examples to illustrate this point.
Example 1-1: Finding and handling duplicate records
People online are always asking: my table has duplicate records, what do I do? Of course, in a well-designed relational database every table should have a primary key and unique indexes, so duplicate records should not exist at all. But sometimes things happen that should not, like the "July 7 Incident" or "9/11"... Well, what I actually mean is that sometimes someone has no concept of databases, does not know what a primary key is, or just slaps on an ID column and calls it done (which is nothing to be ashamed of; nobody is born knowing how to design a database, and what matters is being willing to admit your shortcomings and improve). More commonly, the data comes from a spreadsheet or text file, and the problem is only discovered once it is imported into the database.
Here, we set up a table that represents a store's stock. I intentionally did not add any indexes or constraints, so that problems can arise easily (like the nude mice in a lab).
CREATE TABLE PRODUCT (
    ID INT,
    PName CHAR(20),
    Price DECIMAL(10, 2),  -- the column type is missing in the source; DECIMAL(10, 2) is an assumption
    Number INT,
    PDescription VARCHAR(50)
)
Now, we can insert some data into it:
ID  PName      Price  Number  PDescription
1   Apple      12     3000
1   Apple      12     3000
2   Banana     16.99  7600
3   Olive      25.22  4500
4   Orange     15.99  5500
4   Coco Nut   40.99  2000
5   Pineapple  30     2500
6   Olive      25.22  3000
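To follow along without a database server, the table and rows above can be loaded into a scratch database. The sketch below is only an illustration: it assumes Python's built-in sqlite3 module in place of SQL Server or InterBase, stores Price as REAL, and treats the empty PDescription values as NULL. The helper name make_product_db is hypothetical.

```python
import sqlite3

# Rows as read from the listing above; PDescription is assumed to be NULL.
ROWS = [
    (1, "Apple", 12, 3000, None),
    (1, "Apple", 12, 3000, None),
    (2, "Banana", 16.99, 7600, None),
    (3, "Olive", 25.22, 4500, None),
    (4, "Orange", 15.99, 5500, None),
    (4, "Coco Nut", 40.99, 2000, None),
    (5, "Pineapple", 30, 2500, None),
    (6, "Olive", 25.22, 3000, None),
]


def make_product_db():
    """Create an in-memory database holding the unconstrained PRODUCT table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE product (id INT, pname CHAR(20), price REAL,"
        " number INT, pdescription VARCHAR(50))"
    )
    conn.executemany("INSERT INTO product VALUES (?, ?, ?, ?, ?)", ROWS)
    return conn


conn = make_product_db()
total = conn.execute("SELECT COUNT(*) FROM product").fetchone()[0]
print(total)  # 8: with no key or constraint, the duplicate row is accepted
```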
There are some obvious problems here. The first two rows are exactly the same, and such duplicate data is meaningless. InterBase is forgiving on this point: its IBConsole lets you edit the rows directly. In SQL Server, the system cannot tell the two rows apart, and we get an error message when we try to modify either one. In fact, that is how a relational database ought to respond. So what should we do?
In fact, handling the problem is simpler than finding the bad data, and no join query is needed. A single SQL statement,
SELECT DISTINCT * FROM PRODUCT
compresses the repeated rows and produces a dataset that contains only clean data. The results are as follows:
ID  PName      Price  Number  PDescription
1   Apple      12     3000
2   Banana     16.99  7600
3   Olive      25.22  4500
4   Orange     15.99  5500
4   Coco Nut   40.99  2000
5   Pineapple  30     2500
6   Olive      25.22  3000
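The effect of DISTINCT on the sample data can be checked with a small script. This is a sketch under the same assumptions as before (Python's sqlite3 standing in for the article's database, Price stored as REAL, PDescription as NULL); the ORDER BY is added only to make the output order predictable.

```python
import sqlite3

# Sample rows from the article; PDescription is assumed to be NULL.
ROWS = [
    (1, "Apple", 12, 3000, None),
    (1, "Apple", 12, 3000, None),
    (2, "Banana", 16.99, 7600, None),
    (3, "Olive", 25.22, 4500, None),
    (4, "Orange", 15.99, 5500, None),
    (4, "Coco Nut", 40.99, 2000, None),
    (5, "Pineapple", 30, 2500, None),
    (6, "Olive", 25.22, 3000, None),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE product (id INT, pname CHAR(20), price REAL,"
    " number INT, pdescription VARCHAR(50))"
)
conn.executemany("INSERT INTO product VALUES (?, ?, ?, ?, ?)", ROWS)

# DISTINCT collapses rows that match in every column, NULLs included.
distinct_rows = conn.execute(
    "SELECT DISTINCT * FROM product ORDER BY id, pname"
).fetchall()

for row in distinct_rows:
    print(row)
print(len(distinct_rows))  # 7: the two Apple rows have become one
```

Note that the two rows sharing ID 4 and the two Olive rows survive, because they differ in at least one column.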
For databases that support the SELECT ... INTO ... FROM statement, a statement such as
SELECT DISTINCT * INTO newtable FROM PRODUCT
imports the data into a new table (newtable). Alternatively, you can use INSERT INTO ... SELECT DISTINCT * FROM ... to import it into an existing table. In short, once you have the correct dataset, the rest is easy to handle, and I am sure you know how. Once the DISTINCT keyword has merged the duplicates, there is no need for a cursor to process them.
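Both ways of saving the compressed dataset can be sketched as follows. One caveat: SQLite, used here only as a convenient stand-in, does not support SELECT ... INTO, so the sketch uses CREATE TABLE ... AS SELECT as its nearest equivalent. The table name existing_table is hypothetical.

```python
import sqlite3

# Sample rows from the article; PDescription is assumed to be NULL.
ROWS = [
    (1, "Apple", 12, 3000, None),
    (1, "Apple", 12, 3000, None),
    (2, "Banana", 16.99, 7600, None),
    (3, "Olive", 25.22, 4500, None),
    (4, "Orange", 15.99, 5500, None),
    (4, "Coco Nut", 40.99, 2000, None),
    (5, "Pineapple", 30, 2500, None),
    (6, "Olive", 25.22, 3000, None),
]
COLUMNS = "(id INT, pname CHAR(20), price REAL, number INT, pdescription VARCHAR(50))"

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE product " + COLUMNS)
conn.executemany("INSERT INTO product VALUES (?, ?, ?, ?, ?)", ROWS)

# Variant 1: build a new table from the query result
# (SQLite's counterpart of SELECT DISTINCT * INTO newtable FROM PRODUCT).
conn.execute("CREATE TABLE newtable AS SELECT DISTINCT * FROM product")

# Variant 2: fill a table that already exists (hypothetical name).
conn.execute("CREATE TABLE existing_table " + COLUMNS)
conn.execute("INSERT INTO existing_table SELECT DISTINCT * FROM product")

new_count = conn.execute("SELECT COUNT(*) FROM newtable").fetchone()[0]
existing_count = conn.execute("SELECT COUNT(*) FROM existing_table").fetchone()[0]
print(new_count, existing_count)  # 7 7
```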
That is the first step. Sometimes, though, we do not want to compress the rows right away; we want to see which ones are causing the problem first. In that case, use the following statement to find the duplicate records. The rightmost column, "Row_count", shows how many times the row appears in the table:
SELECT ID, PName, Price, Number, PDescription, COUNT(*) AS Row_count
FROM PRODUCT
GROUP BY ID, PName, Price, Number, PDescription
HAVING COUNT(*) > 1
ID  PName  Price  Number  PDescription  Row_count
1   Apple  12     3000    NULL          2
(1 row affected)
This is just a simple use of GROUP BY ... HAVING together with the aggregate function COUNT. Remember to write the complete column list after GROUP BY: that tells the database we want rows that are identical in every field.
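The duplicate-finding query can be tried on the sample data like this. As before, this is a sketch that assumes Python's sqlite3 in place of the article's database, with PDescription stored as NULL; GROUP BY places NULL values in the same group, which is why the two Apple rows are counted together.

```python
import sqlite3

# Sample rows from the article; PDescription is assumed to be NULL.
ROWS = [
    (1, "Apple", 12, 3000, None),
    (1, "Apple", 12, 3000, None),
    (2, "Banana", 16.99, 7600, None),
    (3, "Olive", 25.22, 4500, None),
    (4, "Orange", 15.99, 5500, None),
    (4, "Coco Nut", 40.99, 2000, None),
    (5, "Pineapple", 30, 2500, None),
    (6, "Olive", 25.22, 3000, None),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE product (id INT, pname CHAR(20), price REAL,"
    " number INT, pdescription VARCHAR(50))"
)
conn.executemany("INSERT INTO product VALUES (?, ?, ?, ?, ?)", ROWS)

# Group on every column so that only fully identical rows share a group,
# then keep the groups that contain more than one row.
duplicates = conn.execute(
    """
    SELECT id, pname, price, number, pdescription, COUNT(*) AS row_count
    FROM product
    GROUP BY id, pname, price, number, pdescription
    HAVING COUNT(*) > 1
    """
).fetchall()

print(duplicates)  # [(1, 'Apple', 12.0, 3000, None, 2)]
```

If a column were left out of the GROUP BY list, rows that merely share an ID (such as Orange and Coco Nut) could be reported as duplicates even though they differ.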
When the PRODUCT table holds a lot of data, generating the correct dataset directly with the earlier method (SELECT DISTINCT over the whole table) is inefficient. With this result set in hand, we can work more efficiently. First, we run:
SELECT ID, PName, Price, Number, PDescription
FROM PRODUCT
GROUP BY ID, PName, Price, Number, PDescription
HAVING COUNT(*) > 1
This compresses the duplicated rows into a correct dataset. Export it to a temporary table using the earlier method, and then run:
DELETE FROM PRODUCT
WHERE ID IN (
    SELECT ID
    FROM PRODUCT
    GROUP BY ID, PName, Price, Number, PDescription
    HAVING COUNT(*) > 1
)
This removes the duplicated rows from the PRODUCT table. Then insert the compressed data from the temporary table back into PRODUCT. Now the PRODUCT table no longer contains fully duplicated, indistinguishable rows.
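Put together, the three steps (save one copy of each duplicated row, delete the duplicates, insert the saved copies back) might look like the sketch below. It assumes Python's sqlite3 as a stand-in database, and the temporary table name dup_rows is hypothetical. One thing to watch for: the DELETE matches on ID alone, so a row that shares its ID with a duplicated row but differs in other columns would also be deleted and would not be restored. That does not happen with this sample data, where only ID 1 is affected.

```python
import sqlite3

# Sample rows from the article; PDescription is assumed to be NULL.
ROWS = [
    (1, "Apple", 12, 3000, None),
    (1, "Apple", 12, 3000, None),
    (2, "Banana", 16.99, 7600, None),
    (3, "Olive", 25.22, 4500, None),
    (4, "Orange", 15.99, 5500, None),
    (4, "Coco Nut", 40.99, 2000, None),
    (5, "Pineapple", 30, 2500, None),
    (6, "Olive", 25.22, 3000, None),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE product (id INT, pname CHAR(20), price REAL,"
    " number INT, pdescription VARCHAR(50))"
)
conn.executemany("INSERT INTO product VALUES (?, ?, ?, ?, ?)", ROWS)

ALL_COLUMNS = "id, pname, price, number, pdescription"

with conn:  # run the three steps in a single transaction
    # Step 1: keep one copy of every fully duplicated row in a temp table.
    conn.execute(
        f"CREATE TEMP TABLE dup_rows AS SELECT {ALL_COLUMNS} FROM product "
        f"GROUP BY {ALL_COLUMNS} HAVING COUNT(*) > 1"
    )
    # Step 2: delete every row whose ID belongs to a duplicated group.
    conn.execute(
        f"DELETE FROM product WHERE id IN ("
        f"SELECT id FROM product GROUP BY {ALL_COLUMNS} HAVING COUNT(*) > 1)"
    )
    # Step 3: put the single saved copies back.
    conn.execute("INSERT INTO product SELECT * FROM dup_rows")

remaining = conn.execute("SELECT COUNT(*) FROM product").fetchone()[0]
still_duplicated = conn.execute(
    f"SELECT COUNT(*) FROM (SELECT 1 FROM product "
    f"GROUP BY {ALL_COLUMNS} HAVING COUNT(*) > 1)"
).fetchone()[0]
print(remaining, still_duplicated)  # 7 0
```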