SQL Story Excerpt (IX): Unequal Joins

Last Update: 2017-02-28 Source: Internet

Author: User


Unequal joins

Generally speaking, SQL operates on unordered sets. If you need ordered processing, such as comparing each element of a sequence with the ones before and after it, you normally have to use a cursor. In some cases, however, ordered information can be handled without a cursor by another method: the unequal join (also called a non-equi join). Let's look at an example.

A while ago, CSDN user BuildIT and I discussed the following problem by letter. Consider the table HISTORY:

CREATE TABLE [HISTORY] (
    [Thedate] [DateTime] NULL,
    [Quantity] [INT] NULL
) ON [PRIMARY]

It stores a series of historical records, for example:

INSERT HISTORY VALUES ('2002-01-01 00:00:00.0', 11)
GO
INSERT HISTORY VALUES ('2002-01-02 00:00:00.0', 34)
GO
INSERT HISTORY VALUES ('2002-01-03 00:00:00.0', 27)
GO
INSERT HISTORY VALUES ('2002-01-04 00:00:00.0', 43)
GO

Now we want to query the running total from the start date up to each date, that is, to produce this result set:

Thedate                Quantity  Q_sum
2002-01-01 00:00:00.0  11        11
2002-01-02 00:00:00.0  34        45
2002-01-03 00:00:00.0  27        72
2002-01-04 00:00:00.0  43        115

Intuitively, we could open a cursor on SELECT * FROM HISTORY ORDER BY Thedate and, starting from the first row, add each quantity to a running total. Is there another way? Suppose we build a result set in which each date is paired with its own record and with every record before it. Then we can group by that date and sum the quantities. Pairing each date with all dates on or before it is exactly an unequal join. My original version was wrong; the following is BuildIT's corrected final statement:

SELECT l.Thedate,
       l.Quantity,
       SUM(r.Quantity) AS Q_sum
FROM HISTORY l
JOIN HISTORY r
  ON l.Thedate >= r.Thedate
GROUP BY l.Thedate, l.Quantity
ORDER BY l.Thedate
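To try the query without SQL Server, here is a minimal sketch that runs the same statement through Python's built-in sqlite3 module. It assumes SQLite behaves like SQL Server for this query, and it stores the dates as ISO-formatted text rather than DATETIME, which keeps them in chronological order when compared.

```python
import sqlite3

# In-memory copy of the HISTORY table; SQLite stands in for SQL Server,
# and ISO-8601 date strings sort in chronological order.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE HISTORY (Thedate TEXT, Quantity INTEGER)")
conn.executemany(
    "INSERT INTO HISTORY VALUES (?, ?)",
    [("2002-01-01", 11), ("2002-01-02", 34), ("2002-01-03", 27), ("2002-01-04", 43)],
)

# Unequal self-join: each left row l is paired with every right row r dated
# on or before it, so SUM(r.Quantity) per l row is the running total.
rows = conn.execute(
    """
    SELECT l.Thedate, l.Quantity, SUM(r.Quantity) AS Q_sum
    FROM HISTORY l
    JOIN HISTORY r ON l.Thedate >= r.Thedate
    GROUP BY l.Thedate, l.Quantity
    ORDER BY l.Thedate
    """
).fetchall()

for row in rows:
    print(row)
# ('2002-01-01', 11, 11)
# ('2002-01-02', 34, 45)
# ('2002-01-03', 27, 72)
# ('2002-01-04', 43, 115)
```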

An unequal join is not a one-to-one correspondence: which rows get paired depends directly on their order. That is why it can be used for ordered operations. A more natural example:

SELECT l.i, SUM(r.i)
FROM N l
JOIN N r
  ON l.i >= r.i
GROUP BY l.i

Table N has a single integer column i holding the natural numbers in sequence. There is no mystery here: this is simply the summation of the natural numbers, where SUM(r.i) is the cumulative sum of the sequence from its first value up to i. It is simpler than the previous problem, but it is clearly not where an unequal join shows its strength, because it builds a huge triangular intermediate dataset, like this:

l.i  r.i
1    1
2    1
2    2
3    1
3    2
3    3
...
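The triangle grows quadratically: a table with n rows yields n(n+1)/2 pairs before GROUP BY collapses them, so a table holding all 65,536 16-bit values would produce 65,536 × 65,537 / 2, roughly 2.1 billion, pairs. The sketch below (Python with sqlite3; the helper name `triangle_demo` is made up for illustration) runs the query on a small N and counts the pairs the join materializes.

```python
import sqlite3

def triangle_demo(n):
    """Hypothetical helper: load N with 1..n, run the cumulative-sum self-join,
    and count the (l.i, r.i) pairs the join produces before grouping."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE N (i INTEGER)")
    conn.executemany("INSERT INTO N VALUES (?)", [(k,) for k in range(1, n + 1)])

    # The article's query: cumulative sum per l.i.
    sums = conn.execute(
        "SELECT l.i, SUM(r.i) FROM N l JOIN N r ON l.i >= r.i GROUP BY l.i ORDER BY l.i"
    ).fetchall()

    # Size of the triangular intermediate result.
    pairs = conn.execute(
        "SELECT COUNT(*) FROM N l JOIN N r ON l.i >= r.i"
    ).fetchone()[0]
    conn.close()
    return sums, pairs

sums, pairs = triangle_demo(5)
print(sums)   # [(1, 1), (2, 3), (3, 6), (4, 10), (5, 15)]
print(pairs)  # 15, i.e. 5 * 6 / 2
```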

When I ran this query on a table of 16-bit integers, my Athlon XP 1700+ with 256 MB of DDR memory ran for nearly 30 minutes, and while I was writing this text it returned an overflow error. Clearly, even a table of 16-bit integers is too large for this query. My advice is to use an unequal join only when the result cannot be expressed as a formula. For a cumulative sum like this we already have a well-known formula, so why make the computer grind through it? Use the following statement instead:

SELECT i, ((1 + i) * i) / 2
FROM N
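As a quick sanity check, the sketch below (Python with sqlite3, on a small hypothetical N holding 1..100) confirms that the closed-form expression and the unequal-join version return identical rows. The formula needs one arithmetic expression per row and no join at all; since (1 + i) * i is always even, the integer division by 2 is exact.

```python
import sqlite3

# Hypothetical small N table holding 1..100.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE N (i INTEGER)")
conn.executemany("INSERT INTO N VALUES (?)", [(k,) for k in range(1, 101)])

# Closed form: one expression per row.
formula = conn.execute("SELECT i, ((1 + i) * i) / 2 FROM N ORDER BY i").fetchall()

# The unequal-join version, for comparison.
joined = conn.execute(
    "SELECT l.i, SUM(r.i) FROM N l JOIN N r ON l.i >= r.i GROUP BY l.i ORDER BY l.i"
).fetchall()

print(formula == joined)  # True
print(formula[-1])        # (100, 5050)
```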

Compared with the brute-force accumulation, this is extraordinarily fast: it hit the overflow in well under a second. But the computer could never have come up with this method on its own, alas...

Legend has it that when the great mathematician Gauss was in elementary school, his teacher set him exactly this problem. As a result, almost every Chinese schoolchild has been tormented with it by their teachers. Their purpose seems to be to show us that we are not as clever as Gauss. But I never wanted to compare myself with anyone...

In college, the teacher of our first course, Mathematical Analysis, said that computers are fools. At the time I just found it amusing; today I see his point. When it comes to summation, the computer is at my elementary-school level, and it will never catch up with the elementary-school Gauss.

Still, this technique is useful where a formula is hard to find. For example, a friend of mine wrote a prime sieve with an unequal join, which is quite interesting. It is no more efficient than a program written in procedural code, but it expresses the essence of the sieve method, and perhaps one day we will study number theory in this kind of SQL notation. This friend has taught me a great deal about computer science, and out of respect for him I won't reproduce his code. The statement itself is not complex, though, and anyone who thinks of using a join query should be able to write it; if you are interested, give it a try. The technique can also generate some other sequences, a few of which we will discuss later.
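The friend's statement is deliberately withheld, so the following is only one possible way to express a sieve with an unequal join, not that code. The idea assumed here: every composite c up to the limit can be written as a * b with 2 <= a <= b, so joining N to itself on b.i >= a.i and collecting the products crosses out all composites, and whatever remains is prime. The function name `primes_up_to` is hypothetical; the sketch uses Python with sqlite3.

```python
import sqlite3

def primes_up_to(limit):
    """Hypothetical sieve-style query: cross out every product a * b with
    2 <= a <= b that fits in the range; the numbers left over are prime."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE N (i INTEGER PRIMARY KEY)")
    conn.executemany("INSERT INTO N VALUES (?)", [(k,) for k in range(2, limit + 1)])
    rows = conn.execute(
        """
        SELECT i FROM N
        WHERE i NOT IN (
            SELECT a.i * b.i
            FROM N a
            JOIN N b ON b.i >= a.i   -- the unequal join
            WHERE a.i * b.i <= ?     -- keep only products inside the range
        )
        ORDER BY i
        """,
        (limit,),
    ).fetchall()
    conn.close()
    return [r[0] for r in rows]

print(primes_up_to(30))  # [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
```

Like the cumulative sum, this still walks a triangle of pairs, so it illustrates the idea rather than competing with a procedural sieve.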

An unequal join can also be used to generate an ordinal (row number) column, for example:

SELECT COUNT(r.afield) AS ID,
       l.afield
FROM MYTABLE l
JOIN MYTABLE r
  ON l.afield >= r.afield   -- >= so each row matches itself and the smallest value gets ID 1
GROUP BY l.afield
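A minimal sketch of the ordinal-column query in Python with sqlite3, assuming a hypothetical MYTABLE whose afield holds distinct strings. It uses >= so that each row also matches itself: with a strict > and an inner join, the smallest value would find no partner and drop out of the result, and every other ID would be one too low.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical MYTABLE with distinct, sortable string values.
conn.execute("CREATE TABLE MYTABLE (afield TEXT)")
conn.executemany(
    "INSERT INTO MYTABLE VALUES (?)", [("pear",), ("apple",), ("fig",), ("kiwi",)]
)

# Each l row joins itself and every smaller r row, so the count is its 1-based rank.
rows = conn.execute(
    """
    SELECT COUNT(r.afield) AS ID, l.afield
    FROM MYTABLE l
    JOIN MYTABLE r ON l.afield >= r.afield
    GROUP BY l.afield
    ORDER BY ID
    """
).fetchall()
print(rows)  # [(1, 'apple'), (2, 'fig'), (3, 'kiwi'), (4, 'pear')]
```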

The afield field can be a string, a date or, of course, a number; anything that can be sorted will do. The trick is of limited practical value, though: with a large amount of data it is no fun at all, so in general you are better off with a physical row number, which is not part of the SQL standard but is practical. I have seen this example in MCDBA exam review material (it is said to have appeared on the exam), but my friend came up with it on his own, and you may well have discovered it independently.

The ordering power of an unequal join clearly comes from the joined fields being sortable and distinct, so it is best not to do unequal joins on fields with duplicate values (in fact, it is best not to join on fields with duplicate values at all, unless you are quite sure what you are doing). The data explosion caused by duplicates in an equi-join is bad enough; if an unequal join blows up... heh heh...

Consider a pair of duplicate values in an equi-join: it may produce two pairs of duplicate rows. In an unequal join, however, the damage depends on where the duplicate occurs. Because the intermediate result is a triangle, a duplicate near the top is tolerable, but if it appears in the lower part of the triangle...
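To see the problem concretely, the sketch below runs the ordinal-column query (written with >=) on a hypothetical MYTABLE in which 'b' appears twice. Each of the two 'b' rows on the left matches 'a', 'b' and 'b' on the right, three rows each, and GROUP BY merges them, so 'b' gets an ID of 6 and the numbering is no longer a sequence.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MYTABLE (afield TEXT)")
# 'b' is duplicated in the joined field.
conn.executemany("INSERT INTO MYTABLE VALUES (?)", [("a",), ("b",), ("b",), ("c",)])

rows = conn.execute(
    """
    SELECT COUNT(r.afield) AS ID, l.afield
    FROM MYTABLE l
    JOIN MYTABLE r ON l.afield >= r.afield
    GROUP BY l.afield
    ORDER BY l.afield
    """
).fetchall()
print(rows)  # [(1, 'a'), (6, 'b'), (4, 'c')]
```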

An unequal join is clearly a powerful tool, but it is also one of the quickest ways to get into trouble. Here are a few suggestions from my own experience:

If the join generates a large "triangle", don't use it; try a subquery or even a cursor;

If the final result set is smaller than the original table, filter out the useless data first, as far as possible;

Unequal joins express sequence calculations very clearly (because they are not procedural), but they usually offer no efficiency advantage, so they are fine for experimenting; for real work, think twice first;

Also, avoid unequal joins inside multi-table joins, where they can multiply the row count.
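The first two suggestions can be sketched together on the HISTORY data (Python with sqlite3; the cutoff date is an arbitrary example). A correlated subquery computes each running total without a join plus GROUP BY, and the outer WHERE keeps only the rows actually wanted, so fewer subqueries are evaluated. The subquery itself must still see every earlier row. Whether this is faster than the join depends on the engine and on indexes such as one on Thedate; the point is only to show the shape of the alternative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE HISTORY (Thedate TEXT, Quantity INTEGER)")
conn.executemany(
    "INSERT INTO HISTORY VALUES (?, ?)",
    [("2002-01-01", 11), ("2002-01-02", 34), ("2002-01-03", 27), ("2002-01-04", 43)],
)

# Correlated subquery instead of join + GROUP BY; the outer WHERE filters
# the reported rows first, while the subquery still sums from the start.
rows = conn.execute(
    """
    SELECT h.Thedate, h.Quantity,
           (SELECT SUM(p.Quantity) FROM HISTORY p WHERE p.Thedate <= h.Thedate) AS Q_sum
    FROM HISTORY h
    WHERE h.Thedate >= '2002-01-03'
    ORDER BY h.Thedate
    """
).fetchall()
print(rows)  # [('2002-01-03', 27, 72), ('2002-01-04', 43, 115)]
```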

I wish you all a pleasant journey in this magical world!

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
