2.1 Structure of the relational database
A relational database consists of a collection of tables (table) with unique names for each table. For example, the instructor table records information about the teacher, which has four header columns: ID, Name, Dept_name, and salary. Each row in the table records a teacher's information, including the teacher's ID, name, dept_name, and salary. Similarly, the course table contains information about the course, including course_id, title, Dept_name, and credits for each course. Note that each teacher is identified by the value of the ID column, and each course is identified by the value of the course_id column.
The third table is Prereq, which contains information on the refresher courses for each course. The table has course_id and prereq_id two columns, each of which consists of a course pair, which shows that the second course is the first course.
As a result, each row in the Prereq table identifies the link between the two courses: one course is a first course in another. As another example, we examine the instructor table, where rows in a table can be considered to represent a link from a specific ID to the corresponding name, dept_name, and salary values.
In general, a row in a table represents a link between a set of values. Since a table is a collection of such links, the concept of the table and the teaching relationship is closely related, which is the origin of the relational data model name. In mathematical terms, a tuple (tuple) is simply a sequence (or list) of a set of values. A connection between n values can be mathematically represented by an n-tuple (n-tuple) about these values, in other words, an n-tuple is a tuple with n values that corresponds to a row in the table.
Instructor Relationship
| Id |
Name |
Dept_name |
Salary |
| 10101 |
Srinivasan |
Comp. Sci. |
65000 |
| 12121 |
Wu |
Finance |
90000 |
| 15151 |
Mozart |
Music |
40000 |
| 22222 |
Einstein |
Physics |
95000 |
| 32343 |
El said |
History |
60000 |
| 33456 |
Gold |
Physics |
87000 |
| 45565 |
Katz |
Comp. Sci. |
75000 |
| 58583 |
Califieri |
History |
62000 |
| 76543 |
Singh |
Finance |
80000 |
| 76766 |
Crick |
Biology |
72000 |
| 83821 |
Brandt |
Comp. Sci. |
92000 |
| 98345 |
Kim |
Elec. Eng. |
80000 |
Course Relationship
| course_id |
Title |
Dept_name |
Credits |
| BIO-101 |
Intro. To biology |
Biology |
4 |
| BIO-301 |
Genetics |
Biology |
4 |
| BIO-399 |
Compultational Biology |
Biology |
3 |
| CS-101 |
Intro. to computer |
Comp. Sci. |
4 |
| CS-190 |
Game Design |
Comp. Sci. |
4 |
| CS-315 |
Robotics |
Comp. Sci. |
3 |
| CS-319 |
Image processing |
Comp. Sci. |
3 |
| CS-347 |
Database System Concepts |
Comp. Sci. |
3 |
| EE-181 |
Intro. To Digital Systems |
Elec. Eng. |
3 |
| FIN-201 |
Investment Banking |
Finance |
3 |
| HIS-351 |
World History |
History |
3 |
| MU-199 |
Music Video Production |
Music |
3 |
| PHY-101 |
Physical Principles |
Physics |
4 |
Prereq relationship
| course_id |
prereq_id |
| BIO-301 |
BIO-101 |
| BIO-399 |
BIO-101 |
| CS-190 |
CS-101 |
| CS-315 |
CS-101 |
| CS-319 |
CS-101 |
| CS-347 |
CS-101 |
| EE-181 |
PHY-101 |
Thus, in the terms of the relational model, the relationship is equivalent to a table, and the tuple is equivalent to a row, similar to a property equivalent to a column. Visible instructor relationships have four properties: ID, Name, Dept_name, and salary.
We use the term relationship instance to represent a specific instance of a relationship, that is, a specific set of rows that are contained. An instance of instructor is visible with 12 tuples, corresponding to 12 teachers.
Because relationships are tuple collections, the order in which tuples appear in relationships is irrelevant , that is, they are the same set of tuples, no matter how they are sorted.
For each property of a relationship, there is a collection of allowed values, called the domain of the property . So the domain of the salary attribute of the instructor relationship is the collection of all possible wage values, and the domain of the name attribute is the collection of all possible teacher names.
We require that all properties of R be atomic for all relationship R. If the elements in the field are considered to be non-sub-units, the fields are atomic. For example, suppose that the instructor table has an attribute phone_number, which holds the teacher's set of phone numbers, then Phone_number is not atomic because a set of phone numbers can also be subdivided into individual phone numbers.
An empty (null) value is a special value that has an unknown or nonexistent identity value. If a teacher does not have a phone number, or does not provide it, then we can only use null values to emphasize that the value is unknown or nonexistent.
2.2 Database Schema
When we talk about databases, we have to differentiate between database schemas and database instances , which are logical designs of databases, which are a snapshot of the data in a given moment of the database.
The concept of relationships corresponds to the concept of variables in programming languages, and the concept of relational patterns corresponds to the concept of type definitions in programming languages. The concept of a relationship instance corresponds to the concept of the value of a variable in a programming language. The value of a given variable may change with the event, and similarly, when the relationship is updated, the content of the relationship instance changes with the event. On the contrary, the pattern of relationships is not always changing.
Although it is important to know the difference between relationship patterns and relationship instances, we often use the same name, such as instructor, to refer to both the pattern and the instance. We explicitly indicate a pattern or an instance when needed. For example, "Instructor mode" or "an instance of a instructor relationship." However, in cases where the meaning of a pattern or instance is clear, we simply use the name of the relationship.
Examine the relationship of department, the pattern of which is:
Department (Dept_name, Building, Budget)
Department relationship
| Dept_name |
Building |
Budget |
| Biology |
Watson |
90000 |
| Comp. Sci. |
Taylor |
100000 |
| Elec. Eng. |
Taylor |
85000 |
| Finance |
Painter |
120000 |
| History |
Painter |
50000 |
| Music |
Packard |
80000 |
| Physics |
Watson |
70000 |
Note that the attribute Dept_name appears both in instructor mode and in department mode. Such repetition is not a coincidence. In fact, using the same property in relational schemas is one way to associate tuples of different relationships. For example, let's say we want to find out about all the teachers working in the Watson building. We first find all the dept_name of the Department of Watson in the department relationship. Then, for each of these departments, we find the teacher information corresponding to Dept_name in the instructor relationship.
2.3 Yards (main code-external code)
We must have a way to differentiate between different tuples in a given relationship. This is indicated by their properties. That is, the attribute value of a tuple must be able to uniquely differentiate a tuple .
A Superkey is a collection of one or more properties that uniquely identifies a tuple. For example, the ID is a instructor, which uniquely identifies a tuple of instructor.
The extra code may contain insignificant attributes. For example, the combination of ID and name is a instructor, but the name is irrelevant, but its true subset ID is also a super-code, and any true subset of the ID collection is no longer a instructor super-code. We are usually interested in such a number of super-code, any of their true subset can not be called Super code, such a minimum code is called candidate code .
We use the term Master code to represent the candidate code that is selected by the Database Designer and is used primarily to differentiate different tuples in a relationship. Code is a property of a whole relationship, not a single tuple. Any two different tuples in a relationship are not allowed to have the same value on the code attribute at the same time.
A relational pattern, such as R1, may include a master code for another relational pattern (such as R2) in its properties. This attribute is called the Outer Code of the reference R2 on R1 . The relation R1 is also called the reference relation of the outer code dependence,R2 is called the reference relation of the outer code. For example, the Dept_name property in instructor is a foreign code on instructor, and it references department because Dept_name is the main code of department.
Now examine the section and teaches relationship. The following requirements are reasonable: If a course is divided into classes, it must be taught by at least one teacher, but it may be taught by more than one teacher. To impose this constraint, we need to ensure that if a particular (course_id, sec_id, semester, year) combination appears in the section, then the combination must also appear in teaches. However, this set of values does not constitute teaches's main code, because more than one teacher may teach the same course segment. As a result, we cannot declare the outer code constraint from section to teaches. The constraints from section to teaches are referential integrity constraints .
2.4 Pattern Diagram
A database schema with a master code and an external code dependency can be identified by a schema diagram. Shows the pattern of our university organization. Each relationship is identified by a rectangle, the name of the relationship is displayed above the rectangle, and the attributes are listed within the rectangle. The main code attribute is underlined with an underscore. External code dependency is identified by an arrow from the outer code attribute of the reference relationship to the main code attribute of the referenced relationship.
2.5 Relational Query Language
The query Language is the language in which users request information from the database. These languages are usually higher than the standard programming language level. The query language can be divided into procedural and non-procedural . In a procedural language, the user knows that the system performs some column operations on the database to calculate the desired results. In a non-procedural language, the user only needs to describe the information needed, rather than giving the exact process of getting the information.
2.6 Relational operations
All procedural relational query languages provide a set of operations that are either applied to a single relationship or applied to a pair of relationships. These operations have a good, and also required, property: The result of the operation is always a single relationship. This nature makes it possible to combine several of these operations in a modular way. In particular, because the result of a relational query is itself a relationship, relational operations can be applied to query results just as they are applied to a given set of relationships.
Filter tuples : The most common relational operation is to select a special tuple from a single relationship (such as instructor) that satisfies some particular predicate (such as $ SALARY>85 $000), which results in a new relationship, It is a subset of the primitive Relationship (instructor), and the tuples that do not satisfy the predicate are filtered out.
Select Properties : Another common operation is to pick a specific attribute (column) from a relationship. The result is a new relationship that contains only those attributes that are selected. For example, suppose we just find out the ID and salary of the instructor relationship, then the name of the teacher and the name of the department will not be detected.
Join Operations can combine two relationships by combining tuple pairs from two relationships into a single tuple, as described in the following ways. There are several different ways to connect to a relationship. For example, when the teacher information is found, but also to find out the information of the teacher's department, so that the instructor and department connected together. The Cartesian product operation is different, and its operation result is to contain all pairs from two relationship tuples, regardless of whether their property values match, such as a relationship total of 3 tuples, the other relationship has a total of 5 tuples, Cartesian product operation after a total of 15 tuples.
and the Operation is suitable for two "similar structure" of the table execution, such as an order table and the history of the order table, we have to find out the past and present all the orders, then we can put together the two tables.
The object of the relational operation is not only the established table, but also the result of the meta-calculation. For example, if we were to find the ID and salary of a teacher with a salary of more than $85 000, we could first select a tuple with a salary value greater than 85 000 dollars from the instructor relationship and then select the ID and salary two attributes from the result.
deduplication: Sometimes the query results contain duplicate tuples, we want to remove duplicates, for example we just want to find out the names of all the teachers in the instructor relationship, because the teacher may have the same name, then we can remove the duplicate name at the same time as the query.
Introduction to Relational Models