Go: How to learn SQL (PART II: Understanding SQL from a relational perspective)

Source: Internet
Author: User
Tags joins microsoft sql server 2005 sql server books scalar table definition

Transferred from: Http://blog.163.com/[email protected]/blog/static/285720652010950825538/

6. Understanding SQL from a relational perspective

6.1. Relationships and Tables

As we all know, the database we use today is usually a relational database. The relationship naturally lies in its critical position. Beginners of the database principle may be confused about the relationship and what the table is, if not clearly understood, it is likely that the concept of relationship is not meaningful, will only cause confusion.
In fact, these two sets of concepts are only due to the difference between the theoretical and technical circles. The former requires a professional, non-ambiguous concept for theoretical discussion, while the latter hopes to use an intuitive, easy-to-understand vocabulary in practical applications. Typically, relationships and tables can be considered one thing.

In terms of definition: A relationship is a collection of tuples (that is, records of a table, or rows). In addition, the relationship has the following characteristics:
-A relationship contains a set of attributes (that is, a field of a table, or a column), and a relationship with N attributes can be called an n-ary relationship.
-a tuple of a relationship has the same attributes as the relationship, and the tuple of the N-ary relationship is an n-tuple, with one value for each attribute in a tuple.
-a domain of a property (that is, the data type of the field, but the requirements for the domain are more stringent, see "Data Type" below), which is the collection of all possible values for the property.

From here you can see the difference between a relationship and a table: a relationship as a collection that does not contain repeating tuples, and a table that can contain duplicate records. This is one of the many accusations that SQL faces, but it has its technical rationality. The differences here have little effect on comprehension, so consider the table as "a collection that may (but should not) be duplicated." Noting this difference, we can use the relationship and table without distinction.

In addition, the relationships and tables here refer to all table-valued things, including physical tables, virtual tables (views), derived tables (a subquery used in the FROM clause), table variables, table-valued functions, and so on. They are physically different, but logically equivalent.

6.2. Relational Model

The process of database modeling (i.e., table structure design) is to design a relational data model representing and storing business data according to the real-world business requirements. The E-R model can be used to simplify the problem during the design process, because the E-R model can be more visually relevant to the real world and can easily be transformed into a relational model. For skilled designers, we can omit the E-R model and construct the relationship model directly.

The relationship model can be directly represented in the relational database, so the relationship model is not very different from the physical model. The physical model usually simply adds the necessary indexes as needed, or physically maps the conceptual tables to partitioned views or partitioned tables.

The relationships of the above models are as follows:


A brief summary of the two key points in the design of the relational model:

1. Integrity constraints (Integrity constraint):

Integrity constraints Guarantee data consistency (in accordance with basic criteria) and contain the following 3 types:
-Entity integrity (PRIMARY KEY constraint): The primary key for a table cannot be empty.
-Referential integrity (FOREIGN KEY constraint): A table's foreign key must exist in the referenced table.
-Custom Integrity (check constraint, UNIQUE constraint): The data in the table cannot violate the conditions defined by the constraint (you cannot make the check expression false, and you cannot have duplicate values for the field or field combination of a unique constraint).

Integrity constraints define the boundaries of the system conceptual model, which is very important to prevent dirty data from entering the system, because dirty data tends to be more annoying than no data (this is the opposite of "wrong view over No idea").

When designing table structures, foreign keys, CHECK, and unique constraints may be properly omitted (for operational performance and development efficiency, and trust that the table data is only modified by uniform stored procedures and no dirty data is present), but the primary key is usually required. The primary key not only means that it is efficient to query (because the primary key of the DBMS is usually implemented through a B + Tree clustered index), and more importantly, it clearly illustrates what the data in the table is uniquely identified. (At the moment I only find a situation where the primary key is not needed: the log table--there may be multiple records at the same time, so datetime cannot be the primary key, and an incremented logid is not significant, see questions about the clustered index selection scheme.) )

For a selection of primary keys, see a basic question.

I am still puzzled by the sample of the table data and (possibly) the outdated data dictionary and program documentation. If you show me the complete table definition (including the various integrity constraints, especially the primary key), you usually don't need to look at the data sample in the table, even the document can be omitted. (These two words borrowed from Brooks in the "man-month myth" of the words. See footnote to section 1.6, "UNIX programming Art". )

2. Paradigm (normal Form, NF):

A paradigm is the principle of a set of relational (table) designs that prevent the updating of data by avoiding redundancy (the embodiment of the dry principle). The following 3 levels of paradigms are commonly used in practice:
-1NF: The fields in the table are atomic.
-2NF: All fields in a table can be determined by the primary key (function dependent).
-3NF: In addition to the full primary key, there is no decision relationship (function dependency) between the other fields (including the partial primary key).

First, explain the "atom" of 1NF. This "atom" means that the business requirements do not need to be split (without preconditions, the word "split" is interpreted in many ways, such as a string can be split into characters, and integers can be split into binary bits or elements of the product). For example, the city, street, house number is the three parts of the address, if the address is just as a record, do not need more granular processing, you can have three parts of a field, if you need to query and group statistics according to urban areas, you need at least the city as a separate field. Therefore, a field that is not "atomic" must be defined according to the requirements of the business. In practice, the business requirements will change, so the system design also needs some foresight. The current field of an atom may not be atomic as the demand changes.

The paradigm gives the principle of how a set of tables should be designed, but does not show how to design the table as such. The relational paradigm decomposition of database theory is too abstract, the following is a practical idea:
-1NF: Each field in the table does not need to be split (at least it does not require too complex split processing). such as the structure of the name is very simple, usually do not need to design into a first name and two fields, but if it is an international system, the structure of the names of different cultures can be different, then it is best to separate LastName and FirstName, such as Facebook, Twitter and other web site design.
-2NF: Defines the primary key for the table. See the above discussion on entity integrity.
-3NF: Do not store related or derived data in the same table, store only primary data, and other data are obtained through join queries or calculations. If you do not store both the Department ID and the department name (related data) in the employee table, or both the date of birth and age (derived data), where the department ID or birth date is the primary data, the department name can be obtained through a join query, and age can be obtained by calculation.

In some cases, the design of the inverse paradigm may be required for structural visualization or query performance considerations. such as a comma-separated number of values in a string field (such as ' 1,2,3,5,8 ', violation of 1NF), or in a table at the same time to store related data or derived data to avoid joins or computational overhead (such as storing Department ID and department information at the same time to avoid joining the departmental table, or to store employees Total payroll to avoid complex payroll calculations, violating 3NF). Inverse paradigm design can bring complex query processing or redundancy, the better solution is the basic data with the form of the table storage, through a unified process to calculate and refresh the buffer table to improve the performance of the query, see "The programmer of the Way" 7th section on the dry principles of discussion.

6.3. Relational Operations

The query of the table is equivalent to the relational operation defined by the relational algebra (relational Algebra). Understanding relational operations may simplify the understanding of queries.

The basic relational operations used are only 4 classes (simple enough):

1. Basic Set operation (binocular operation)

A relationship is a collection of tuples, so a relationship also supports basic set operations:
-and (Union): corresponds to the SQL keyword Union
-Turn (intersection): Corresponding SQL keyword intersect
-Poor (difference): Corresponding SQL keyword except

The difference is that the set operation of the relationship requires that the two relationships that participate in the operation must contain the same set of attributes (the number and type of attributes are the same).
Since tables allow duplicate records, the above three operations in SQL can also be in the form of Union All/intersect all/except all, resulting in no duplication of records.

2. Extraction of a part of a relationship (single-mesh operation)

-Select (selection): Filters out the specified tuple (row) according to the criteria, corresponding to the WHERE clause of the SQL query
-Projection (projection): Filters out the specified attributes (columns) According to the list, corresponding to the SELECT clause of the SQL query

Because the table allows duplicate records, the projection operation of the relationship is in fact equivalent to the effect of select DISTINCT. The default effect of select is not to remove duplicate records.

3. Join of two relationships (binocular operation)

-Cartesian product (Cartesian product): Corresponds to the SQL keyword cross JOIN (with the same comma-separated effect as multiple tables from the From)
-INNER JOIN (Inner join): Corresponding SQL keyword Inner join
-Outer JOIN (Outer join): Corresponding SQL keyword {left | Right | Full} OUTER JOIN

4. Aggregation operation (single-mesh operation)

Groups according to the specified attributes (columns), and you can use aggregate functions. The GROUP BY clause that corresponds to the SQL query.

The above 4 types of relational operations, whether monocular or binocular, are still a relationship and can continue to operate.

In general, SQL queries, in addition to some special language features (such as top, sort functions, etc.), the main query logic is the combination of these 4 types of relational operations.

6.4. Data Query

1. Logical Processing of queries

In the case of T-SQL, the logical processing of a query (the full SELECT statement) is as follows (where the numbers in parentheses denote the processing order):
[Code=sql]
(8) SELECT (9) DISTINCT (one) <TOP_specification> <select_list>
(1) from <left_table>
(3) <join_type> join <right_table>
(2) on <join_condition>
(4) WHERE <where_condition>
(5) GROUP by <group_by_list> (6) with {CUBE | ROLLUP}
(7) Having (ten) ORDER by <order_by_list>
[/code]
Description
+ some clauses are optional. For example, a join can occur 0 to several times, and GROUP by and having may occur 0 to 1 times.
+ from the above sequence, it can be seen that the WHERE clause cannot use the computed result of select, but in the ORDER BY clause.
+ The logical processing of a query may not be the same as the physical processing process. But for the study of SQL, it is necessary to understand the logical process first. First you need to know how to calculate the correct results, only to talk about how to calculate the correct results more efficiently.

This is described in the 1th chapter of Microsoft SQL Server 2005 Technology Insider: T-SQL query.

2. Query criteria

In SQL Server Books Online, the BNF syntax diagram for query criteria is as follows:
[Code=sql]
Search Condition
< search_condition >:: =
{[NOT] <predicate> | (<search_condition>)}
[{and | OR} [not] {<predicate> | (<search_condition>)} ]
[,... N]
<predicate>:: =
{Expression {= | < > |! = | > | > = |! > | < | < = |! <} expression
| String_Expression [NOT] like string_expression [ESCAPE ' Escape_character ']
| expression [not] between expression and expression
| expression is [not] NULL
| expression [NOT] in (subquery | expression [,... n])
| Expression {= | < > |! = | > | > = |! > | < | < = | | <} {ALL | SOME | Any} (subquery)
| EXISTS (subquery)}
| CONTAINS ({column | *}, ' < Contains_search_condition > ')
| FREETEXT ({column | *}, ' freetext_string ')
[/code]
Where: predicate is asserted, expression is a scalar, and subquery is a subquery. The query statement returns a result that causes the query condition to be true.

3. Sub-query

A query is called a subquery if it is part of a statement.

A. Classification by results:
-Scalar subquery: If the result of the query is a scalar value, that is, a single row and one column, then the scalar subquery (a scalars expression).
-table-valued subquery: The inverse is a table-valued subquery (table-valued expression).

B. Whether the query involves an outer table classification:
-Self-contained subquery: subqueries that do not involve outer tables are unrelated subqueries, such as select A.* from a WHERE a.id in (select b.ID from B).
-Correlated subquery: The inverse is a correlated subquery, such as select A.* from a WHERE EXISTS (select * from b where b.id = a.id).

C. By sub-query where the location of the classification:
-In search_condition: a subquery in the query condition, such as all subquery in the syntax diagram above.
-In FROM clause: a subquery in the FROM clause, also known as a derived table, such as a select * FROM (SELECT * from a) TMP, the derived table must specify a table alias. (Common table Expressions, which is supported after SQL Server 2005, can be considered a variant of a derived table, but can be used multiple times in a query and supports advanced features such as the recursive CTE, see Books Online.) )
-In SELECT clause: Sub-query in the SELECT clause, such as select D.depid, ManagerName = (select E.empname from Employee e WHERE e.empid = D.manag Erid) from Department D WHERE ..., this seed query performance is poor, usually can be replaced by join. If possible, avoid using subqueries in the SELECT clause.

6.5. Data modification

In SQL Server, statements that modify data (add, delete, change) support the following formats:

1. Add (INSERT)
[Code=sql]
INSERT into <table> (<column_list>) VALUES (<values>)
INSERT into <table> (<column_list>) SELECT <values> from ...
INSERT into <table> (<column_list>) EXEC <usp>
SELECT <values> into <table> from ...
[/code]
Above 4 statements:
The 1th one is the SQL standard INSERT statement (SQL Server 2008 also supports specifying multiple tuples in the values clause);
The 2nd and 3rd are INSERT statements for T-SQL extensions, but require that the result set of the SELECT statement and the exec stored procedure match the specified number of Insert column fields in the target table and the data type one by one (or can be implicitly converted);
The 4th is not an INSERT statement, but instead creates a table based on the result set of the SELECT statement and inserts the resulting data into it, noting the difference from the 2nd statement.

2. Delete (delete)
[Code=sql]
DELETE from <table> [WHERE <where_condition>]!!!
DELETE from <table> from <table> joins <another_table> on <join_condition> WHERE <where_ Condition>
TRUNCATE TABLE <table>
[/code]
Above 3 statements:
The 1th is the SQL standard DELETE statement (WHERE clause omits the result is to delete all data, note!) );
The 2nd is the DELETE statement for the T-SQL extension, and the effect is to delete the target table data that conforms to the Join query criteria (change delete from <table> to select DISTINCT <table>.* to see what data is deleted);
The 3rd is actually DDL, not DML, and requires permissions and run conditions that are different from delete, but the effect is to clear all the data in the table and be more efficient than delete.

3. Change (UPDATE)
[Code=sql]
UPDATE <table> SET <col> = <new_value> [WHERE <where_condition>]!!!
UPDATE <table> SET <col> = <new_value> from <table> JOIN <another_table> on <join_condit Ion> WHERE <where_condition>
[/code]
Above 2 statements:
The 1th one is the SQL Standard UPDATE statement (WHERE clause omits the result is to update all data, note!) );
The 2nd is an UPDATE statement for the T-SQL extension that updates the target table data that meets the join query criteria to the specified result (update <table> Set <col> = <new_value> is changed to select < Table>.<col> <new_value> can see which data is updated to which new values, where <new_value> can be the computed value of the join query, but if the join query results make the <col of the target table > and new values <new_value> become one-to-many relationships, <col> updates to which <new_value> is not deterministic, which can lead to unexpected bugs.

This is described in the 8th chapter of Microsoft SQL Server 2005 Technology Insider: T-SQL query.

6.6. Logical meaning of a table

Many people who use the database, do not understand the principle of the database, can not logically understand the meaning of the table, so that only the table as a data structure, as a similar to a two-dimensional array of things, so write a low-efficient loop statement is not difficult to understand, data consistency is difficult to ensure.

The table can be understood from the following two levels:

1. A table is a collection of things (object objects and fact facts collectively), where each row of a table represents a class of things, and a primary key is the unique identity of a thing.
such as: Student table (#学号, name, professional, ... ) is a collection of students (objects), a school number can uniquely identify a student; the students choose the curriculum (#学号, #课程ID, course selection time) is a collection of students elective course (fact), the joint primary key number and the course ID can uniquely identify the fact that a student selected a course.

Database modeling is the design of a set of tables based on business requirements to represent everything in a business system.

2. The table structure of a table defines a predicate, and each row of records in a table is a true quantization of the predicate. Since the quantized predicate is a proposition, a table is a set of true propositions.
(for the concept of predicates and quantification, see the section on data logic in the book "Discrete Mathematics and its Applications (5th edition)"). )
Such as: The student selected the timetable (#学号, #课程ID, the course time) defined a predicate-"Student {#学号}, in {Class time}, selected course {#课程ID}." ", where {} is part of a variable. A predicate is not a proposition, but a proposition only after it is quantified (or instantiated). A row of records of the table (' S001 ', ' C0001 ', ' 2010-08-24 17:16:58 ') represents a true quantization, the quantified proposition is "student S001, in 2010-08-24 17:16:58, selected course C0001." ”。

A database system contains many tables, each of which is a set of true propositions, all of which represent trusted knowledge in the system.
Designing a table is the design of a predicate. As long as the table structure document illustrates the meaning of this predicate, the meaning of the record in the table is naturally clear; conversely, if the predicate of a table is ambiguous or ambiguous, the records in the table are meaningless.

Understand the integrity constraints from this perspective:
-Entity integrity (PRIMARY KEY constraint): If a table contains exactly the same two records, then repeating a true proposition does not increase knowledge; If two different records of a table have the same primary key, the two propositions are conflicting. Entity integrity guarantees that each proposition is unique and conflict-free.
-Referential integrity (FOREIGN KEY constraint): Referential integrity guarantees that everything involved in each proposition is meaningful (defined in the table of the thing).
-Domain integrity (check constraint): Domain integrity guarantees that each proposition is consistent (does not violate check constraints). "Each proposition is consistent" is the necessary non-sufficient condition that "every proposition is correct", so the constraint is only a minimum guarantee.

Trackback:http://topic.csdn.net/u/20100826/18/ba72991f-f961-4c18-8d9b-f234c87a609d.html

Go: How To learn SQL (PART II: Understanding SQL from a relational perspective)

Related Article

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.