SQL story Excerpt (ii) inquiry into ———— join query

Source: Internet
Author: User
Tags filter count interbase join joins sql query return

Example 1-2, information with duplicate key values
Now look at the product table that compresses the duplicate information
ID pname Price Number pdescription
1Apple 123000NULL
2Banana 16.997600NULL
3Olive 25.224500NULL
4Coco Nut 40.992000NULL
4Orange 15.995500NULL
5Pineapple 302500NULL
6Olive 25.223000NULL

Here are a few more problematic places. The IDs of Coco Nut and orange in the table are the names of 4,id 3 and 6, and the name of the two products (pname) are olive. And our intent is obviously to have each product correspond to an ID number, and the ID number and pname in the table should be unique. There are only 7 lines in this table, we can directly observe, with the naked eye to find the problem, the table in a large amount of data?
Now let's review the statements in Example 1 that query for duplicate data. We use
......
GROUP by ID, pname, price, number, pdescription
The dataset is grouped and used
Having COUNT (*) > 1
Filter out duplicate data, and so on, separate the ID column for grouping and filtering, can you find ID duplicate data? Give it a try:
SELECT ID
From PRODUCT
GROUP by ID
Having COUNT (*) > 1
return Result:
Id
-----------
4
So it appears that the ID number we want, but this report is really not practical, now we look at this ID is who:
SELECT ID, PName, price, number, pdescription
From PRODUCT
GROUP by ID
Having COUNT (*) > 1
This statement performs an error, and it is clear that the four columns after the ID number are neither in group by nor in statistical functions, and they should not appear here. And such a statement:
SELECT ID, PName, price, number, pdescription
From PRODUCT
GROUP by ID, pname, price, number, pdescription
Having COUNT (*) > 1
Returns an empty result set:
ID pname Price Number pdescription
----------- -------------------- --------------------- ----------- ------------------------------------



(The number of rows affected is 0 rows)
Many friends use subqueries
SELECT ID, PName, price, number, pdescription
From PRODUCT
WHERE ID in (
SELECT ID
From PRODUCT
GROUP by ID
Having COUNT (*) > 1
)
To solve, I have also seen an example of a two-level cursor (!? So there's no better way to do it?
I prefer the following line of statements:
SELECT l.id, R.pname, R.price, R.number, r.pdescription
From PRODUCT L
JOIN PRODUCT R
On l.id = R.id
GROUP by L.id, R.pname, R.price, R.number, r.pdescription
Having COUNT (*) > 1
The returned results are as follows:
ID Pnameprice Number Pdescription
4Coco Nut40.992000null
4Orange 15.995500NULL

With a join query, the speed is much faster than a subquery, because you do not need to search for data in the result set in the Sub statement each time you use the in operation. In particular, when there is a lot of data in the table and the result set is large, the difference is quite alarming. If you are running on a multiprocessor, multiple hard disk server, the join query can also make the most of the parallel operations to improve efficiency. In the summer of 1999, IBM engineers at Lanzhou University attended the National Database Technology conference, explaining to us the performance of optimizing join queries using parallel computing technology. By contrast, subqueries are a bit of a disadvantage in this regard. Some powerful database engines turn subqueries into join queries when appropriate, or vice versa. But is it better to hold the truth in our own hands?
Of course, subqueries are not necessarily slower than joins, and there are opportunities for me to demonstrate that some subqueries are faster than join queries, and even some subquery statements are difficult to implement with joins. Theoretically, a join query generates a Cartesian product, the size of which is the product of each subset that makes up it. This creates a huge space overhead (none of the database systems we see actually do). The subquery is more complex. The result set is divided into a standard quantum query and to the quantum query, (standard quantum query refers to the return of a simple data query, the SEED query in MS SQL Server can be directly as a column of the external query); The relationship between the subquery and the external query is divided, There are correlated subqueries and unrelated subqueries (the result set of the dependent subquery depends on the current row of rows of the external query, and not the unrelated subquery). Usually correlated subqueries are a bit of a headache, it needs to execute the subquery repeatedly, if the data set of the external query operation (not the returned dataset) is N, the subquery operation's data set row number is m, that its complexity will be the maximum of M n times! Coupled with the large space cost of the subquery data set expansion, the speed will be greatly affected. The subquery in the example above is lucky to be an unrelated to quantum query, but even so, it is also difficult to save a child result set in an operation and to operate it repeatedly, and the result is that it is not faster than a join query. This is why MySQL has not supported subqueries for a long time. In general, the operation of large datasets, the performance of the join query is always better than the subquery, so we should fully grasp this method.
Taking the final join query in Example 2 as an example, we analyze the idea of writing this join query. As mentioned earlier, in theory, when you join a dataset, a Cartesian product is generated. If there is a table t the contents are as follows:
Word
----
A
B
Then execute "Select L.word, R.word from T as L JOIN T as R on l.word = R.word" When Mr.
L.wordr.word
A A
A b
Ba
Bb

Then execute "on l.word = R.word" and filter it into
L.wordr.word
A A
Bb

Here, we use the middle of this Cartesian product to make a fuss. If the data in the ID column of the product table is indeed unique, then after it is joined, it should be the same as the T-table Word column that you just saw, and the ID in the result set remains unique. Now let's try this statement:
SELECT l.id, R.id, L.pname, R.pname
From PRODUCT L
JOIN PRODUCT R
On l.id = R.id
The results are as follows:
Ididpnamepname
11AppleApple
22BananaBanana
33OliveOlive
44OrangeOrange
44Coco Nutorange
44OrangeCoco Nut
44Coco Nutcoco Nut
55PineapplePineapple
66OliveOlive

Have you noticed? Originally repeated two times the ID number 4, now, repeated 4 times. This is because the coco nut and the Orange two row ID number repeats, the Cartesian product is its square, cannot filter for the join condition. So, when we group the result set by the ID field of one subset and the other fields of another subset, the data with ID 4 is divided into two groups, two rows per group, and the normal data is only one row per group. We can find the data with duplicate ID, even we can know how many times it repeats! Take a look at the following SQL statement:
SELECT l.id, R.pname, R.price, R.number, R.pdescription, COUNT (*) Row_count
From PRODUCT L
JOIN PRODUCT R
On l.id = R.id
GROUP by L.id, R.pname, R.price, R.number, r.pdescription
Having COUNT (*) > 1
return Result:
Idpnamepricenumberpdescriptionrow_count
4Coco Nut40.992000null2
4orange15.995500null2

(The number of rows affected is 2 rows)
This structural scalability is also less than subquery, and it also brings some interesting effects, both good and bad, which will be discussed later in the chapter.
Similarly, use
SELECT r.id, L.pname, R.price, R.number, R.pdescription, COUNT (*) Row_count
From PRODUCT L
JOIN PRODUCT R
On l.pname = R.pname
GROUP by R.id, L.pname, R.price, R.number, r.pdescription
Having COUNT (*) > 1
Statement, you can find the duplicate data of the PName column and the number of times it repeats:
ID pname Price number pdescription Row_count
3Olive 25.224500NULL 2
6Olive 25.223000NULL 2

(The number of rows affected is 2 rows)
Through the above two examples, we can see that more in-depth understanding of the operating mechanism of relational databases, skilled use of simple query and join query, can effectively improve the performance and maintainability of the program, reduce the complexity of the code. Why not?
There is no Money data type in the InterBase, so when you create the product table in InterBase, remember to define the Price field as another type, where I use numeric (8,4). In addition, there is a problem in InterBase that when you execute the following statement to delete duplicate data:
DELETE from PRODUCT
WHERE ID in (SELECT ID
From PRODUCT
GROUP by ID, pname, price, number, pdescription
Having COUNT (*) > 1)
Theoretically, it should have erased both lines of "Apple", as MS SQL Server2000 did. But it only erased one line! The data in the executed table is as follows:
Idpnamepricenumberpdescription
1apple12.00003000null
2banana16.99007600null

Obviously, after deleting a row of data, it requery the datasheet and then decides to delete the next line of data. For relational databases, this is not a good thing, neither rigorous nor graceful. But specifically to this statement, is a good thing, we only use a delete command to complete the operation should be divided into several operations of the data merge. In InterBase, there are similar places where there is no real collection operation like MS SQL Server. I will refer to it at any time in future examples. In practical work, we should also pay attention to.
The InterBase 6.0.1 I use is a free Open-source database, and MS SQL Server is the apple of Microsoft's eye, and MS SQL Server7 's designer won the 1998 Turing Prize. I have to admit that this lightweight and quick InterBase is a wonderful thing to do, and it implements some of the powerful features of MS SQL Server, such as cascading updates, until version 2000, and of course it has its own shortcomings. But considering its cost-effective, we really can't ask for more. It is also recommended that beginners take this opportunity to understand the use of temporary tables. It's not much space here.


Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.