[Introduction to Cartesian product SQL query in]sql statement

Source: Internet
Author: User
Tags first string joins one table

In this article, the main description of the various connections in SQL and the scope of use, as well as the further interpretation of relational algebra and relational algorithms on the same query in different ways.

Introduction to multi-table connections

In a relational database, a query tends to involve more than one table, because few have a single table, and if most queries involve only one table, then that table is often lower than the third paradigm, with a lot of redundancy and exceptions.

Therefore, a join is an important means of connecting multiple tables into a single table.

For example, a simple two-table connection student table (Student) and class table:

After the connection


Cartesian product

The Cartesian product is implemented in SQL as a cross join. All connections are made to the temporary Cartesian product table, the Cartesian product is a concept in relational algebra, representing two tables of each row of data any combination, the two tables connected to the Cartesian product (cross connection)

In the practical application, the Cartesian product itself is mostly not practical use, only in two tables connected with the restrictions, it will have practical significance, the following see the inner connection


Internal connection

If it is understood in steps, the inner connection can be considered as a cross-linking of two tables, and then by adding a restriction (in SQL through the keyword on) to exclude a subset of the rows that do not meet the criteria, the result is an inner connection. The above figure, if I add a restriction

For the first two tables, if the query statement is as follows:


SELECT * from  [Class] C       Inner joins        [Student] s on       c.classid=s.studentclassid


The above query statements can be partial understanding, first the class table and the student table to cross-connect, generated the following table:

Then, with the constraints on the back, select only those columns that are equal to the Studentclassid and ClassID (the green ones), and finally, the subset of the selected table

Of course, the restrictions after the inner join on are not only the equals sign, but you can also use comparison operators, including > (greater than), >= (greater than or equal), <= (less than or equal to), < (less than),!> (not greater than),!< (not less than), and <> (Not equal to). Of course, the data types of the two columns involved in the constraint must match.

For the above query statement, if the on after constraint is changed from equal to greater than:


SELECT * from  [Class] C       Inner joins        [Student] s on       c.classid>s.studentclassid


The results are filtered from the Cartesian product of the first step to a subset of those classid greater than Studentclassid:

Although the attached table above is of no practical significance, this is only used as a demo:-)


Relational calculus

The concept of the Cartesian product above is the concept of relational algebra, and I mentioned in the previous article that there is a query method for relational calculus. The above relational algebra is a distributed understanding, the above statement derivation process is this: "The table student and class are connected inside, Match all classid and studentclassid equal rows, select all Columns "

and the relational algorithm, more attention is what I want, for example, the same query above, using relational algorithm to think the way is "to find all the students information, including their class information, class ID, student ID, student name"

The SQL query statement using the relational calculus method is as follows:


SELECT * from  [Class] C       ,       [Student] s  where c.classid=s.studentclassid


Of course, the results returned after the query are not changed:


External connection

Let's say the top two tables, students and classes. I added a student named Eric to the student, but for some reason forgot to fill in its class ID:

When I want to execute such a query: Give me the names of all the students and the classes they belong to:


SELECT s.studentname,c.classname from           [fordemo].[ DBO]. [Student] s       inner join        [fordemo].[ DBO]. [Class] C       on        s.studentclassid=c.classid


Results such as:

As you can see, this query "lost" Eric.

At this point, you need to use the outer connection, the outer connection can make the join table of the party, or both parties do not have to adhere to the connection restrictions on the following. Change the inner join in the query statement above to the left outer join:


SELECT s.studentname,c.classname from           [fordemo].[ DBO]. [Student] s left       outer join        [fordemo].[ DBO]. [Class] C       on        s.studentclassid=c.classid


The results are as follows:

Eric has come back again.


Right outer connection

The concept of right outer joins and left outer joins is the same, but the order is different, for the above query statements can also be changed to:


SELECT s.studentname,c.classname from           [fordemo].[ DBO]. [Class] C Right       outer join        [fordemo].[ DBO]. [Student] s       on        s.studentclassid=c.classid


The effect is the same as using the left outer join above.


Full outer connection

The full outer join is to output at least once per row of the left and right tables, connected by the keyword "full outer join", and can be seen as a combination of left and right outer joins.


Self-connect

Turning to self-linking, let's start with a table and a question first:

The above employee table (employee), because the manager is also a kind of employee, so put two kinds of people into a table, the Mangerid field represents the employee ID of the immediate manager of the current employee.

Now, my question is, how do I find the name of the manager of Careyson?

As can be seen, although the data is stored in a single table, but in addition to nested queries (this will be discussed in subsequent articles), only self-connection can be done. The correct self-connection statement is as follows:


SELECT m.employeename from  [fordemo].[ DBO]. [Employee] e  inner JOIN [fordemo].[ DBO]. [Employee] M  on  e.managerid=m.id and e.employeename= ' Careyson '


Before you explain the concept of self-linking in detail, take a look at a more illustrative example of the self-connected application:

This table is a table that attends the minutes of the meeting, each row represents the record of the meeting (here, I don't have to EmployeeID and Meetingid to express it because of the simple table, it's easier to understand the table by name)

Well, now my question is: find the people who participate in the "Talk about project progress" meeting and participate in the "Discuss career development" meeting.

At first glance, it's confusing, right? Maybe you see the first impression in your head is:


SELECT  employeename from  [fordemo].[ DBO]. [Meettingrecord] m  where meetingname= ' ¨??????????? ¨¨ ' and meetingname= ' ¨??????? ¨°??? ¤?é?1 '


(My Code highlighting plugin does not support Chinese, so the first string after the WHERE clause above is ' talk about project progress ' and the second is ' talk about career development ')

Well, congratulations, wrong answer. If you write like this, you will not get any data. The correct wording is to use self-connect!

Self-connected is a special kind of connection that is a way to connect to a physically identical but logically different table. I see Baidu Encyclopedia said since the connection is a special internal connection, but this is wrong, because two of the same table can not only be connected, but also outside the connection, cross-connect ... When you make a self-join, you must specify an alias for at least one of these tables to differentiate between the two tables!

Back to the example above, using self-linking, the correct wording is:


SELECT  m.employeename from  [fordemo].[ DBO]. [Meettingrecord] m,        [fordemo].[ DBO]. [Meettingrecord] m2  where m.meetingname= ' ¨??????????? ¨¨ ' and M2. Meetingname= ' ¨??????? ¨°??? ¤?é?1 ' and        m.employeename=m2. EmployeeName


(Please refer to the above for garbled questions)


Multi-table Connection

Multiple table joins can actually be seen as n-1 two-table connections to n tables. This understanding makes the problem much easier!

For example, the top three tables, the first two tables we have already begun to understand in the article, suppose now added a teacher table, the three tables to the Cartesian product as follows:


SELECT * from  [fordemo].[ DBO]. [Class]       Cross Join        [fordemo].[ DBO]. [Teacher]       Cross Join        [fordemo].[ DBO]. [Student]


The results can be expressed as:


Summarize

This paper makes a brief introduction to various connection query methods in SQL, and uses some demos to explore the usefulness of various connections, and to master the principle of various connections is necessary to write a good SQL query!

-------------------------------------------------------------

No join condition results in Cartesian product
People who have studied linear algebra know that the Cartesian product is a popular saying that each member of the two set is associated with any member of the other party's collection. As you can imagine, in a SQL query, you produce a Cartesian product if you do not have join conditions for two table join queries. This is the most common case of performance problems caused by our Cartesian product: Developers miss the join condition when writing code.

SQL in which the Cartesian product occurs:

View Plaincopy to Clipboardprint?select sum (Project_fj.danjia*project_fj.mianji) from Project_fj,orderform where Project_fj.zhuangtai= ' not sold ' and project_fj.project_id=30
Select SUM (Project_fj.danjia*project_fj.mianji) from Project_fj,orderform where project_fj.zhuangtai= ' not sold ' and project _fj.project_id=30

This statement is only part of the SQL statement, the problem is that the other part uses the table OrderForm, so there is a orderform in the from, but the above part of the statement is not used at all OrderForm, but not set the condition leads to the Cartesian product.

Workaround: Use the left JOIN

View Plaincopy to Clipboardprint?select sum (Project_fj.danjia*project_fj.mianji) from PROJECT_FJ left JOIN OrderForm on p roject_fj.id=orderform.project_id
Where project_fj.zhuangtai= ' not sold ' and project_fj.project_id=30
Select SUM (Project_fj.danjia*project_fj.mianji) from PROJECT_FJ left JOIN OrderForm on Project_fj.id=orderform.project _id
Where project_fj.zhuangtai= ' not sold ' and project_fj.project_id=30

This article is from the "Suixufeng column"

[Introduction to Cartesian product SQL query in]sql statement

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.