Ten steps to fully understand SQL

Source: Internet
Author: User
Tags joins what sql

Many programmers view SQL as a scourge. SQL is one of the few declarative languages that run in a way that is completely different from the command-line language we know, the object-oriented programming language, or even the functional language (although some people think the SQL language is also a functional language).

We write SQL every day and apply it to open source software Jooq. So I wanted to introduce the beauty of SQL to those who still have a headache, so this article is specially written for the following readers:

1, in the work will use the SQL but it does not fully understand the people.

2. People who are skilled in using SQL but do not understand their grammatical logic.

3. People who want to teach others about SQL.

This article focuses on the SELECT sentence, and other DML (data Manipulation Language command) will be introduced in other articles.

10 simple steps to fully understand SQL

First, SQL is a declarative language

The first thing to do is to keep this concept in mind: "declaration." The SQL language is an example of declaring a computer what you want from the raw data, rather than telling the computer how to get the results. Isn't that great?

(Translator note: In short, the SQL language declares the properties of the result set, and the computer picks up the data from the database according to what SQL declares, rather than the traditional programming thinking to indicate how the computer operates.)

Select first_name,last_name from Employees where salary>100000

The above example is easy to understand, and we don't care where these employee records come from, all we need is the data for those who are well-paid (translator note: salary>100000).

Where did we learn that?

If the SQL language is so simple, what makes people "smell SQL"? The main reason is that our subconscious mind is thinking in terms of imperative programming. It's like this: "Computer, take this step first, then perform that step, but before that, check to see if condition A and condition B are met." For example, using variable parameters, using loop statements, iterating, calling functions, and so on, are all thought-ins of this type of imperative programming.

The syntax of SQL is not executed in syntax order

The SQL statement has a feature that makes most people confused: the order in which the SQL statements are executed is inconsistent with the syntax order of their statements. The syntax order of the SQL statement is:

SELECT[DISTINCT]
From
WHERE
GROUP by
Having
UNION
ORDER by

For ease of understanding, the above does not list all the SQL syntax structure, but it is sufficient to explain that the syntax of the SQL statement and its order of execution is completely different, the above statement as an example, the order of execution is:

From
WHERE
GROUP by
Having
SELECT
DISTINCT
UNION
ORDER by
There are three interesting areas to note about the execution order of SQL statements:

1, from is the first step of SQL statement execution, not SELECT. The first step in the database execution of the SQL statement is to load the data from the hard disk into the data buffer in order to manipulate the data. (Translator Note: "The first thing that happens was loading data from the disk into memory, in order to operate on such data.", but not so , in the case of common databases such as Oracle, data is extracted from the hard disk into a data buffer. )

2. SELECT is executed after most of the statements have been executed, strictly speaking, after the From and GROUP by. It is important to understand this, which is why you cannot use the field that sets the alias in the SELECT in the where to determine the condition.

1 SELECT a.x + a.y as Z
2 from A
3 WHERE z = 10
--Z is not available here, because select is the last executed statement!

If you want to reuse alias Z, you have two choices. Or just re-write it again. The expression represented by Z:

1 SELECT a.x + a.y as Z
2 from A
3 WHERE (a.x + a.y) = 10
... or resorting to derivative tables, universal data expressions, or views to avoid alias reuse. Take a look at the examples below.

3. The UNION is always ranked before order by, both syntactically and in order of execution. Many people think that each UNION segment can use ORDER by ordering, but this is not true according to the SQL language standard and the differences in the execution of each database SQL. Although some databases allow for the ordering of SQL statement subqueries (subqueries) or derived tables (derived tables), this does not mean that the sort order will remain sorted after the UNION operation.

Note: Not all databases use the same parsing method for SQL statements. such as MySQL, PostgreSQL, and SQLite will not be executed as described in the 2nd above.

What have we learned?

Since not all databases execute SQL forecasts in the way described above, what are our gains? Our harvest is always to remember that the syntax order of SQL statements is inconsistent with the order in which they are executed, so that we can avoid generic errors. If you can remember the differences in syntax order and execution Order of SQL statements, you can easily understand some of the most common SQL problems.

Of course, if a language is designed to respond directly to the order of execution of its statements in a grammatical order, the language is very friendly to programmers, and the programming language level design concept has been applied by Microsoft to the LINQ language.

Third, the core of the SQL language is a reference to the table (table references)

Because of the differences in the syntax order and execution Order of SQL statements, many students think that the field information in select is the core of the SQL statement. The real core, in fact, is the reference to the table.

According to the SQL standard, the FROM statement is defined as:

1 <from clause>:: = from <table reference> [{<comma> <table reference>} ...]
The "output" from statement is a union table that comes from the union of all referenced tables in a dimension. Let's take a slow analysis:

1 from A, b
The output from the statement above is a union table, combined with table A and table B. If Table A has three fields and the B table has 5 fields, then this "output table" has 8 (=5+3) fields.

The data in this joint table is a*b, the Cartesian product of A and B. In other words, each piece of data in table A is paired with every piece of data in table B. If a table has 3 data and B has 5 data, then the Union table will have 15 (=5*3) data.

The result of the from output is filtered by the WHERE statement to be processed by the GROUP by statement to form a new output result. We will discuss this issue again later.

If we look at the perspective of Set theory (relational algebra), a table of a database is a set of data-element relationships, and each SQL statement alters one or more relationships, resulting in a new relationship (that is, the creation of a new table) of the data element.

What have we learned?

It's easy to understand how the data changes in the "pipelining" of the SQL statement when thinking about issues from a table perspective.

Iv. flexible referencing tables can make SQL statements more powerful

Flexible referencing of tables can make SQL statements more powerful. A simple example is the use of joins. Strictly speaking, the JOIN statement is not a part of SELECT, but rather a special table reference statement. The connection definition for tables in the SQL language standard is as follows:

1 <table reference>:: =
2 <table name>
3 | <derived table>
4 | <joined table>

Take the previous example:

1 from A, b

A May lose a connection to the following table:

1 A1 JOIN A2 on a1.id = A2.id

Putting it in the previous example becomes:

1 from A1 JOIN A2 on a1.id = A2.id, b

Although it is not common practice to combine a join table with a comma with another table, you can do so. As a result, the final output table has a a1+a2+b field.

(Translator Note: The original word is degree, translated into dimensions.) If we view a table, we can imagine that each table is composed of two dimensions, the horizontal dimension is the field or column we call, the English language is columns, the longitudinal dimension represents each data, the English record, according to the context, the author refers to the number of fields. )

Applications that derive tables in SQL statements are even more powerful than table joins, so let's talk about table joins.

What have we learned?

When thinking about a problem, it's easy to understand how the data is handled by the SQL statement and to help you understand what the complex table references do, from the perspective of the table reference.

More importantly, understanding joins is the key to building the join table, not part of the SELECT statement. There are some databases that allow JOIN in INSERT, UPDATE, DELETE.

V. Table joins are recommended in SQL statements

Let's take a look at the first sentence:

1 from A, b

Advanced SQL programmers may learn to give you advice: try not to use commas instead of joins for table joins, which can improve the readability of your SQL statements and avoid errors.

Using commas to simplify SQL statements can sometimes cause confusion in thinking about the following statement:

1 from A, B, C, D, E, F, G, H
2 WHERE a.a1 = B.BX
3 and a.a2 = C.c1
4 and D.D1 = B.BC
5--etc ...

It is not difficult to see the benefits of using JOIN statements:

    • Safety. Joins and tables to be joined are close enough to avoid errors.
    • More connections, join statements can be distinguished from external connections and internal connections.

What have we learned?

Remember to use join as a connection to the table, never use a comma to join the table after the from.

Vi. different connection operations in SQL statements

In SQL statements, table joins are fundamentally divided into five ways:

    • Equi JOIN
    • SEMI JOIN
    • ANTI JOIN
    • Cross JOIN
    • Division

Equi JOIN

This is the most common join operation, and it contains two ways to connect:

    • INNER join (or join)
    • OUTER join (includes: Left, right, full OUTER join)

It is easiest to use examples to illustrate the difference:

1--This table reference contains authors and their books.
2--there is a record for each book and its author.
3--authors without books is not included
4 author JOIN book on author.id = book.author_id
5
6--This table reference contains authors and their books
7--There is a record for each book and its author.
8--... OR there is a "empty" record for authors without books
9--("empty" meaning that all book columns is NULL)
Ten author left OUTER JOIN book on author.id = book.author_id

SEMI JOIN

This connection relationship behaves in two ways in SQL: using in, or using EXISTS. "SEMI" is the meaning of "half" in Latin. This connection is only part of the connection to the target table. What does that mean? Think again about the connection between the author and the title. Let's imagine this: we don't need a combination of the author/title, just the author of the book in the title list. Then we can write it like this:

1--Using in
2 from author
3 WHERE author.id in (SELECT book.author_id from book)
4
5--Using EXISTS
6 from author
7 where EXISTS (SELECT 1 from book WHERE book.author_id = author.id)

Although there are no strict rules on when you should use in and when you should use EXISTS, you should know these things:

    • In better readability than EXISTS
    • EXISTS is better than in expression (more suitable for complex statements)
    • There is no difference in performance between the two (but the performance difference is very large for some databases)

Because using INNER join can also get the author information corresponding to the book in the title table, many beginners have the opportunity to think that they can go through DISTINCT, and then write the SEMI join statement like this:

1--Find only those authors who also has books
2 SELECT DISTINCT first_name, last_name
3 from author
4 JOIN book on author.id = book.author_id

This is a very bad way to do this, for the following reasons:

    • SQL statement performance is poor: because Deduplication (DISTINCT) requires the database to repeatedly read data from the hard disk into memory. (Translator Note: DISTINCT is really a resource-intensive operation, but each database may operate differently for DISTINCT).
    • This is not entirely true: although there may not be a problem with this writing, it becomes very difficult to get the right results as SQL statements become more complex.

More about the dangers of abusing DISTINCT can refer to this blog post

(http://blog.jooq.org/2013/07/30/10-common-mistakes-java-developers-make-when-writing-sql/).

ANTI JOIN

This connection is just the opposite of the SEMI join. You can use this connection by adding a not keyword in or before EXISTS. For example, we have a list of books that are not in the title list.

1--Using in
2 from author
3 WHERE author.id not in (SELECT book.author_id from book)
4
5--Using EXISTS
6 from author
7 where not EXISTS (SELECT 1 from book WHERE book.author_id = author.id)

It is also possible to refer to SEMI JOIN for features such as performance, readability, and expression.

This blog post describes what to do if you encounter NULL when using not in, because there is a little deviation from this topic, it is not described in detail, interested students can read

(http://blog.jooq.org/2012/01/27/sql-incompatibilities-not-in-and-null-values/).

Cross JOIN

This connection process is the product of two connected tables: Each data for the first table corresponds to each data in the second table. As we have seen before, this is the use of commas in the from statement. In practical applications, there are few places where cross joins can be used, but once they are used, you can express them in such an SQL statement:

1--Combine every author with every book
2 Author Cross JOIN book

Division

Division is really a freak. In short, if join is a multiplication operation, then division is the inverse of the join process. Division's relationship is difficult to express in SQL, which is a beginner's guide, explaining that division is beyond our means. But interested students can still come to see these three articles

(http://blog.jooq.org/2012/03/30/advanced-sql-relational-division-in-jooq/)

(http://en.wikipedia.org/wiki/Relational_algebra#Division)

(https://www.simple-talk.com/sql/t-sql-programming/divided-we-stand-the-sql-of-relational-division/).

Recommended Reading →_→ "Drawing interpreting SQL Union statement"

What have we learned?

Learned a lot! Let's think again in our minds. SQL is a reference to a table, and JOIN is a complex way to reference a table. However, there is a difference between the expression of the SQL language and the actual logic we need, not all the logical relationships can find the corresponding join operation, so it is necessary for us to accumulate and learn the logic of the relationship in peacetime, so that you can later write SQL statements to choose the appropriate join operation.

Vii. derived tables in SQL as variables

Before that, we learned that SQL is a declarative language, and that SQL statements cannot contain variables. But you can write statements that are similar to variables, which are called derived tables:

Plainly, the so-called derived table is a subquery in parentheses:

1--A derived table
2 from (SELECT * from author)

It is important to note that there are times when we can define a correlation name for a derived table (that is, what we call an alias).

1--A derived table with an alias
2 from (SELECT * from author) a

Derived tables can effectively avoid problems that arise from SQL logic. For example, if you want to reuse a result that is queried by a SELECT and WHERE statement, you can write (in Oracle, for example):

1--Get authors ' first and last names, and their
2 SELECT first_name, last_name, age
3 From (
4 SELECT first_name, last_name, Current_date-date_of_birth age
5 from author
6)
7--If The age was greater than 10000 days
8 WHERE Age > 10000

It is important to note that in some databases, as well as in the sql:1990 standard, derived tables are classified as the next-level-universal Table statement (common table experssion). This allows you to reuse a derived table multiple times in a SELECT statement. The above example is (almost) equivalent to the following statement:

1 with A as (
2 SELECT first_name, last_name, Current_date-date_of_birth age
3 from author
4)
5 SELECT *
6 from a
7 WHERE Age > 10000

Of course, you can also create a separate view of "a" so that you can reuse the derived table in a broader context. More information can be found in the following article (Http://en.wikipedia.org/wiki/View_%28SQL%29).

What have we learned?

We have repeatedly emphasized that, in general, SQL statements are references to tables, not to fields. Take advantage of this and don't be afraid to use derived tables or other more complex statements.

Viii. a GROUP by in an SQL statement is an operation on a reference to a table

Let's recall the previous from statement:

1 from A, b
Now, we apply the GROUP by to the above statement:

1 GROUP by a.x, A.Y, b.z
The result of the above statement is that a reference to a new table with three fields is generated. Let's take a closer look at the phrase: When you apply group BY, columns that do not use aggregate functions after SELECT are shown behind group by. (Translator Note: The main idea is "when you are using GROUP by, you will be able to do the next level of logical operation of the column is reduced, including the column in Select").

It is important to note that the other fields are able to use aggregate functions:
1 SELECT a.x, A.Y, SUM (A.Z)
2 from A
3 GROUP by a.x, A.Y

It is also worth noting that MySQL does not adhere to this standard, which is indeed a confusing place. (Translator Note: This is not to say that MySQL does not have a GROUP by function) but not to be confused by MySQL. GROUP by changes the way the table is referenced. You can refer to a field in SELECT as well as group by in Group by.

What have we learned?

GROUP by, once again, is a reference to a table that has been manipulated to convert it into a new way of referencing.

Nine, the SELECT in the SQL statement is essentially a mapping of the relationship

I personally prefer the word "mapping", especially when it is used in relational algebra. (Translator Note: The original word for projection, the word has two layers of meaning, the first meaning is prediction, planning, design, the second meaning is projection, mapping, after repeated deliberation, I think the map can be more intuitive to express the role of SELECT). Once you've built up a reference to a table, modified, deformed, you can map it to another model step-by-step. The SELECT statement is like a "projector", which we can interpret as a function that transforms the data in the source table into the data of the target table in a certain logic.

With the SELECT statement, you can manipulate each field to generate the required data through a complex expression.

The SELECT statement has many special rules, at least you should be familiar with the following:

    1. You can only use fields that are derived from the table reference;
    2. If you have a GROUP BY statement, you can only use the fields or aggregate functions that follow the group by statement;
    3. When you do not have GROUP by in your statement, you can use the window-opening function instead of the aggregate function.
    4. When you do not have GROUP by in your statement, you cannot use aggregate functions and other functions at the same time;
    5. There are ways to encapsulate common functions in aggregate functions;
    6. ......

Some of the more complex rules are more than enough to write another article. For example: Why can't you use common functions and aggregate functions in a SELECT statement without GROUP by? (4th of the above)

The reasons are as follows:

    1. Intuitively, this approach is logically illogical.
    2. If intuition cannot persuade you, then the grammar certainly can. The sql:1999 standard introduced the GROUPING SETS,SQL:2003 standard with the introduction of group Sets:group by (). Whenever an aggregate function appears in your statement, and there is no explicit GROUP by statement, an ambiguous, empty GROUPING SET is applied to this SQL. Therefore, the rules of the original logical order are broken, and the mapping (that is, the SELECT) relationship first affects the logical relationship, followed by the grammatical relationship. (Translator Note: This paragraph of the original is more difficult, can be easily understood as follows: in the existing aggregate functions and common functions of the SQL statement, if there is no group by group, the SQL statement by default, the entire table is a grouping, when the aggregate function on a field aggregated statistics, each of the referenced table The record loses its meaning, and all the data is aggregated into a single statistic, and you have no sense in using other functions for each record at this time.

Are you confused? Yes, me too. Let's go back and look at something plain.

What have we learned?

The SELECT statement is probably the hardest part of the SQL statement, although he looks simple. The effect of other statements is actually a different form of reference to the table. While the SELECT statement consolidates these references together, the source table is mapped to the target table by a logical rule, and the process is reversible, and we can clearly know how the target table's data is coming from.

To learn a good SQL language, you need to understand the other statements before using the SELECT statement, although select is the first keyword in the syntax structure, but it should be our last master.

X. Several simple keywords in SQL statements: DISTINCT, UNION, ORDER by and OFFSET

After studying the complex SELECT Henan Opera, let's look at some simple things:

    • Set operations (DISTINCT and UNION)
    • Sort operations (Order By,offset ... FETCH)

Set operation (set operation):

The main operation of set operations lies in the collection, in fact, refers to a table operation. Conceptually, they are well understood:

    • DISTINCT Data deduplication after mapping
    • UNION combines two sub-queries and goes back
    • UNION all stitching up two subqueries but not heavy
    • EXCEPT to remove the results from the first subquery from the second word query
    • INTERSECT retains the results from two subqueries and goes back to

Sort operations (ordering operation):

A sort operation is not related to a logical relationship. This is a SQL-specific feature. The sort operation is executed not only at the end of the SQL statement, but also in the process of running the SQL statement. Use ORDER by and OFFSET ... FETCH is the most efficient way to ensure that data is sorted in order. All other sorting methods have some randomness, although the resulting sorting results are reproducible.

OFFSET ... Set is a statement that does not have a uniform syntax, and different databases have different expressions, such as the LIMIT for MySQL and PostgreSQL ... OFFSET, SQL Server, and Sybase TOP ... START at and so on. Specific about OFFSET: Different syntax for FETCH can refer to this article

(http://www.jooq.org/doc/3.1/manual/sql-building/sql-statements/select-statement/limit-clause/).

Let's use sql! in our work

Just like in other languages, it's a lot of practice to learn SQL language well. The 10 simple steps above will help you get a better understanding of the SQL statements you write every day. On the other hand, you can accumulate a lot of experience from common mistakes. The following two articles describe some of the common SQL errors that JAVA and other developers have made:

    • Ten Common mistakes Java developers make when Writing SQL
    • Common Mistakes Java developers make when Writing SQL

Turn http://blog.jobbole.com/55086/

Ten steps to fully understand SQL

Related Article

E-Commerce Solutions

Leverage the same tools powering the Alibaba Ecosystem

Learn more >

Apsara Conference 2019

The Rise of Data Intelligence, September 25th - 27th, Hangzhou, China

Learn more >

Alibaba Cloud Free Trial

Learn and experience the power of Alibaba Cloud with a free trial worth $300-1200 USD

Learn more >

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.