I learned a Chinese Database textbook six years ago, but at that time my mind was not enlightened, Or I lacked a perceptual knowledge about the content of the textbook, so I had no effect after learning it. Recently, to create a system with CMS, we need to re-learn the database theory. Therefore, we have bought the basic tutorial for this database system, this series of articles is a summary and generalization of this study.
1.1 Database System
A database is the data and set managed by a DBMS.
DBMS must have the following functions: (1) You can use DDL to define the logical structure of data. (2) You can use dql and DML to query and update stored data. (3) supports efficient processing of massive data. (4) Persistence and recoverability. (5) concurrency control or atomicity. The file system does not meet the above five points.
Databases have gone through several generations of development, from the early hierarchy, network to modern relational. Relational databases originated from Ted codd's famous paper "relational model of large shared databases". Currently, data is evolving towards the object field. Many developments have been made between super large and small poles. in information-oriented integration, data warehouses are used to integrate multiple databases.
1.2 DBMS Overview
This section describes the basic modules of DBMS.
The preceding figure is explained as follows. There are two types of database users at the top. The first type is DBA. They create database modes. The other is for users and applications who query or modify data. The bottom of the figure is the physical storage. data is stored on the disk and written and read into the cache by the storage manager. Buffer Management is used to manage available memory and provide high-speed storage of block data. The record manager is used to manage data files, index files, and data formats. Managing this information helps you quickly retrieve required data from data files. The query processor includes the query compiler and execution engine. The query compiler converts queries into query sequences, including query analysis, preprocessing, and query optimizer. Transaction Management requires log and concurrency control. The execution engine is used to execute command sequences compiled at the upper layer and obtain data. It usually obtains data with buffering. Transaction processing includes log and concurrency control, and concurrency control requires support for Lock tables.
2. Relational Data Model
After introducing the basic concepts of the database, we will learn the basic concepts of the relational database.
2.1 Data Model
A data model includes three aspects: physical data structure, data operations, and data constraints. Typical data model relational model, header structured data model XML, oo Data Model
2.2 relational model Basics
The relational model can be viewed as a two-dimensional table. A link column is called an attribute.
The combination of the Link name and attribute is called the link mode, which is recorded as R (A, B, C,...). The Link Attributes can be arranged in any way.
The rows in the link are called tuples.
Each attribute of a link has a type called a field, which can be embodied in the link mode, such as R (A: String, B: int ...).
A set of meta groups in a link is called an instance of a link. Meta groups in a link can be arranged in any way.
A key is composed of one or more attributes. The key constraint ensures that no two tuples in a relational instance can have the same key. The key represents the attribute with an underscore. Such as R (ID, A, B ).
2.3 define the relationship in SQL
SQL has three types of relationships: Persistent storage relationships, that is, tables. Temporary Storage relationship, that is, view. The relationship temporarily generated by the DBMS itself is called a temporary table.
Data Type: only date and time are introduced here. In SQL, the date is expressed as 'date' 2013-01-24 'and the time is expressed as '21: 00: 100'. That is, the string value is enclosed by single quotation marks after the type.
Next we will use SQL to create a simple relationship:
create table Country( id int auto_increment, name char(32) unique default "", primary key(id));
Drop table country is required to delete a link. alter table country add location char (16)/drop name is used to modify the link. The primary key is declared with the primary key, the key of a single attribute can be described after the attribute. If multiple attributes constitute a primary key, the preceding method must be used. The attribute of primary key must not be the same and cannot be null. Unique must be unique but can be null. Note that null cannot be compared with null. If the unique attribute is not null, multiple null values may exist.
2.4 relational algebra
2.4.1 relational algebra Overview
The traditional relational algebra operations mainly include the following four types: (1) relational operations parallel, intersection, and difference. (2) Select an operation to remove some columns. (3) combined operations, including Cartesian Product Operations and join operations. (4) Rename. Relational algebra expressions are usually called queries.
2.4.2 set operations on links
The most common set operations are parallel r u s, cross R ∩ S, and difference r-s. These three operations are relatively simple, that is, the relational instance is used as a set for calculation, not to mention too much here. Note that R and S must have the same property set and the types of each property must also match.
2.4.3 projection π
Projection is used to generate a new relationship with only some columns from the relational R.
Expressionπ a1a2,... an (r)Indicates a link. It only contains attributes A1, A2 ,...., the column represented by an, that is, the new link can be expressed as S (A1, A2 ,.... an ).
2.4.4 select σ
Select a child set to generate a new link that contains only the r tuples. The new link tuples must meet the condition c Based on the R attribute. Select expressionσ C (r). [Sigma]. For example, for a transcript table, you can select a student whose score is above 90: σ grade> 90 (R ).
2.4.5 Cartesian Product
Relational R (AI) x s (BI) is a set of ordered pairs. The first element of an ordered pair is any tuple AI in R, the second element of the ordered pair is the BI of any tuples in S. The number of tuples in the product is the product of the number of tuples in two relations.
The Relationship Mode of product is R and S. If the two have the same properties, you need to change the corresponding attribute names in a link to different names. For the sake of clarity, if attribute a is both the property of R and S, it can be recorded as r.a and S. A respectively in the accumulation.
2.4.6 natural connectionBytes
The link can also be connected according to the conditions. If R and S share the same attributes A1, A2 ,..., an, R, and s are two relational tuples. if and only when R and S are in the common attributes A1, A2 ,..., when an is in the same phase, the tuples R and S can be connected. This is called a natural connection.R ⋈ s. For example, for relation student (ID, name) and grade (ID, grade), student and grade can be naturally connected Based on ID in order to find the Student name and score.
In a connection, if a tuples cannot be connected to any tuples in another link, they are called floating tuples.
2.4.7 θ connection
θ connections are extensions of natural connections. They do not have to be connected based on equal attributes, but can be connected based on any condition C.R branch CS. You can construct the result of rxs first, and then find the tuples that meet the condition C in the result. For example, for student (ID, name, age) and teacher (ID, name, age), we want to find students older than the teacher.R ⋈ S. age> T. Ages.
2.4.8 rename P
P s (r) indicates renaming the link to S. If you still need to rename the column while renaming, it is expressed as p s (A1, A2 ,..., an) (R), that is, the names of R columns are called Ai.
2.4.9 relationship between operations
Some of the operations described above can be expressed as equivalent by a combination of other operations. The six operations are parallel, differential, selection, projection, product, and rename to form an independent set. Each operation cannot be implemented by other operations in the set.
2.5 link Constraints
2.5.1 constraint Representation
The premise is that R and S are links. If R = tuples, no tuples exist in R. R ⊆ s indicates that all the tuples of R appear in S.
2.5.2 reference integrity constraints
Attribute a in relational R. If it references attribute B in relational S, the component of any of the tuples of R in attribute, the tuples of link S must be included in the component of attribute B, expressedπA (r) BranchπB (s)
.
2.5.3 key constraints
Relationship R (A1, A2,..., an), assuming A1 is the key attribute of the link, the A2-An of A1 must be the same for the two tuples. Record the rename relationship in which s is R, then σ r. A1 = S. A1 and R. Ai! = S. AI (R * S) = artificial.
2.5.4 other constraints
Take the value constraints of the link property as an example. It is required that a of R (A) can only take [0-5], then σ a <0 and a> 5 (R) = random.
3. Package-based algebra and logic Query Language
3.1 relational algebra operations on the package
The relationships implemented by commercial DBMS are based on packages instead of collections. Considering a link as a package rather than a set can improve the efficiency of some relational operations. For example, a package can combine two relationships without removing the need for weight sorting. For the projection operation, you only need to select the data in the column and do not need to re-arrange the data.
3.1.1 package combination, delivery, and difference
Assume that R and S are packets, where the meta R appears n times in R and m times in S, then
In package r u s, the tuples R appear N + m times
In the package R ∩ S, the tuples R appear min (m, n) times,
In the packet R-S, The tuples R appear max (0, N-m) times
3.1.2 package projection operation
During the package projection operation, if multiple identical tuples are generated, the duplicates are not eliminated from the package projection results.
3.1.3 selection operation on the package
When selecting an operation on the package, duplicate tuples are not removed in the result.
3.1.4 pack of flute Products
Every tuples in a link are paired with every tuples in another link, regardless of whether the element is repeated.
3.1.5 package connection
First, compare the tuples in the two relationships to see if they can be paired. If yes, the paired tuples are a member of the result. Duplicate tuples are not required in the result.
3.2 extended relational algebra operations
3.2.1 eliminate duplicates
Delta (r) indicates that a relation R with no repeated tuples is returned. This operator converts a package into a set.
3.2.2 aggregation Operator
The aggregate operator is used to aggregate values that appear in a column of a link. Standard operators include the following:
1) sum generates the sum of a column
2) AVG generates the mean value of a column
3) min & Max generate the minimum and maximum values in a column respectively.
4) Count generates the number of values in a column, including duplicate tuples.
3.2.3 grouping operators
Sometimes people not only can calculate clustering for columns, but also need to Group by one or more attribute values, and then consider the clustering operation of group inner elements.
The grouping operator gamma L (r) is defined here. The calculation process of the returned relationship is: grouping the tuples of the relational R first. Each group contains all the tuples whose group attributes are specific values. Then, a new Meta Group is generated for each group. The content of this tuples is a group attribute value, and the result of all the tuples in this group are clustered.
Delta operations are the redundancy of Gamma operations. Deduplication is equivalent to grouping in each column.
3.2.4 extended projection operation
We can see the projection operation above.πA1a2,...(R), extend this operation to support the calculation of component components or selected components. The projection list can be one of the following elements: R attribute, X-> Y rename, e-> Z, E is an expression that involves the properties, constants, arithmetic operators, or string operators of R. Z is the new name after the name.
3.2.5 sorting Operator
The expression Tau L (r) indicates the relation R in L-order. L can be an attribute list. The tuples of R are first sorted by a1 values, and then sorted by A2 if the A1 attributes are equal.
3.2.6 external connections
External Connection is a variant of the connection. It executes the natural connection first, and then adds the floating tuples from R or s to it. If no component exists in the corresponding attribute, null is used.
There are several variants of the natural outer join. The left outer natural join only adds the floating tuples of the mechanical variable R. The outer right join only adds the floating tuples of the right variable s to the result.
There are also corresponding outer θ connections for θ connections, left outer θ connections, and right outer θ connections.
3.3 datalog logical Query Language
Relational operations can also be based on the datalog logic query language, which is not described here currently.
3.4 relational algebra and datalog
Same as above, omitted
References
Wikipedia-relational algebra http://zh.wikipedia.org/wiki/%E5%85%B3%E7%B3%BB%E4%BB%A3%E6%95%B0_%28%E6%95%B0%E6%8D% AE %E5%BA%93%29#.E8.87.AA.E7.84.B6.E8.BF.9E.E6.8E.A5_.28.E2.8B.88.29
Basic tutorial on database systems Jeffrey D. Ullman, the third edition of the original book, Jenifer widom