Database paradigm-Easy to understand "turn" database paradigm is an essential knowledge in database design, without the understanding of the paradigm, can not design a high-efficiency, elegant database. Even design the wrong database. And it's not that easy to understand and master the paradigm. In textbooks, relational algebra is generally used to interpret database paradigms. Although this can be very accurate expression of the database paradigm, but more abstract, not too intuitive, not easy to understand, more difficult to remember.
This paper introduces the paradigm in a more straightforward language, which is intended to facilitate understanding and memory, and may present some imprecise representations. But for beginners should be a good introduction. I write these purposes mainly to strengthen the memory, in fact, I also compare dishes, I hope when I am unfamiliar with some concepts, back to look at their notes, can quickly enter the state. If you find a mistake in it, please correct it.
Here's how to get started:
first, the basic concept
To understand the paradigm, you must first know what isrelational database, if you don't know, I can simply not say that: A relational database is a two-dimensional table to save data. Between table and table can ... (Omit 10W words).
Then you should understand the following concepts:
-
- things that exist in the real world and can be distinguished. such as "A student", "a book", "a course" and so on. It is worth emphasizing that the word "things" here is not only the "things" that can be seen and touched, but also the virtual one, rather "the relationship between the teacher and the school".
- The textbook is interpreted as: "An entity has an attribute", thus, the beginning of the property is a logical concept, for example, "Gender" is a property of "human". In a relational database, a property is a physical concept, and attributes can be thought of as "a column in a table."
- : A row in a table is a tuple.
- component: A property value for the tuple. In a relational database, it is an atomic operation, that is, when the relational database is doing anything, the attribute is "non-divided." Otherwise it's not a relational database.
- code: table can uniquely determine one of a tuple's properties (or ), If there is more than one such code, then everyone is called candidate code, we choose one from the candidate code to do the boss, it is called main code.
- If a code contains all the attributes, the code is full size.
- A property is the primary attribute as long as it appears in any candidate code.
- non-primary property: In contrast to the above, there is no candidate code, this property is a non-primary attribute.
- a property (or attribute group), it is not a code, but its other table code, it is the outer code.
Two, 6 paradigms
Well, the above has introduced all the basic concepts we need to master the paradigm, so let's talk about the paradigm. The first thing to understand is the inclusive relationship of the paradigm. A database design if it conforms to the second paradigm, it must also conform to the first paradigm. If it conforms to the third paradigm, it must also conform to the second paradigm ...
The first paradigm (1NF): attribute is not divided.
In front of us we've introducedProperty ValueConcept, we say that it is "non-divided". And the first paradigm requires that attributes are also not divided. So what is the difference between it and the attribute value? Give an example:
Name |
Tel |
Age |
Big Treasure |
13612345678 |
22 |
Xiao ming |
13988776655 |
010-1234567 |
21st |
Ps: In this table, the attribute value is "divided".
Name |
Tel |
Age |
Cell phone |
Landline |
Big Treasure |
13612345678 |
021-9876543 |
22 |
Xiao ming |
13988776655 |
010-1234567 |
21st |
Ps: In this table, the attributes are "divided".
Neither of these cases satisfies the first paradigm. A database that does not meet the first paradigm, not a relational database! Therefore, we do not have such a "table" in any relational database management system.
The second paradigm (2NF): 1NF compliant, and,the non-primary attribute is completely dependent on the code.
It sounds like a mystery, but it's really nothing.
The main attribute in a candidate code can also be several. If a primary property is not a single candidate, it cannot determine any of the non-primary properties. Give a counter example: We consider a primary school administration system, students assigned a teacher in class, a textbook, a classroom, a time, we all go to class, no problem. So how is the database designed? (Student class table)
Students |
Course |
Teacher |
Teacher Title |
Textbooks |
Classroom |
Class time |
Xiao ming |
First grade Chinese (top) |
Big Treasure |
Associate professor |
"Primary language 1" |
101 |
14:30 |
A student in a class, must be in a particular classroom. So there are classrooms (students, classes)
A student has a class that must be taught by a particular teacher. So there are teachers (students, courses)
A student in a class, his teacher's job title can be determined. So there are (students, courses)--teacher title
A student in a class must be a specific textbook. So there are textbooks (students, courses)
A student in a class, must be at a certain time. So there are (students, courses)--Class time
So (student, course) is a code.
However, a course, must designate a textbook, first-grade language is certainly used in the "primary language 1", then there is the curriculum----teaching materials. (student, course) is a code, but the course decides the textbook, which is called incomplete dependence, or partial dependence. In this case, the second paradigm is not satisfied!
What's wrong with it? You can think about:
1, the headmaster to add a new course called "Calculus", the textbook is "College Mathematics", How to do? Students have not chosen a class, and the student is the main attribute, the main attribute can not be empty, how to record the course, the textbook Where to remember?...... Are you depressed?(Insert Exception)
2, the next semester no students to learn first-year language (on), to learn a first-year language (the next) went, then the table will not exist in the first grade language (on), there is no "primary language 1". At this time, the headmaster asked: first grade Chinese (on) What teaching materials AH? ...... Are you depressed?(Delete Exception)
3, the headmaster said: First grade language (on) for the teaching materials, replaced by the "University of Chinese." 10,000 students have chosen such a class, the change is very big Ah! It's exhausting. Are you depressed?(Modify exception)
How should that be solved? Projection decomposition, which decomposes a table into two or more tables
Students |
Course |
Teacher |
Teacher Title |
Classroom |
Class time |
Xiao ming |
First grade Chinese (top) |
Big Treasure |
Associate professor |
101 |
14:30 |
Student Class Table New
Course |
Textbooks |
First grade Chinese (top) |
"Primary language 1" |
Table of the courseThe third paradigm (3NF):2NF compliant, and,Eliminate delivery dependencies
The above "Student class table new" conforms to 2NF, you can verify that: Two primary properties are used alone, without having to determine any of the other four non-primary properties. But it has a transitive dependency!
Where is it? The problem is in the "teacher" and "teacher title" here. A teacher must be able to determine a teacher's title.
What's the problem? Think about:
1, the teacher upgraded, changed the professor, to change the database, the table has n, changed n times ...(Modify exception)
2, no candidate this teacher's class, the teacher's title has not been recorded ...(Delete exception)
3, a new teacher, not assigned to teach what class, his title to remember? ......(Insert exception)
How should that be solved? As with the above, the projection decomposition:
Students |
Course |
Teacher |
Classroom |
Class time |
Xiao ming |
First grade Chinese (top) |
Big Treasure |
101 |
14:30 |
Teacher |
Teacher Title |
Big Treasure |
Associate professor |
BC Paradigm ( BCNF ): conforms to 3NF, and the primary attribute is not dependent on the primary attribute
If the relationship pattern is in the first paradigm, and each attribute is not passed by a key code, then r belongs to the BC paradigm.
Usually
The conditions of the BC paradigm have multiple equivalents: each non-trivial dependency must have a key code on the left side, and each determinant must contain a key code.
The BC paradigm checks both non-primary and primary properties. When only the non-primary attribute is checked, it becomes the third paradigm. The relationship of the BC paradigm must satisfy the third paradigm.
It is also possible to say that if a relationship reaches the third paradigm, and it has only one candidate code, or if each of its candidate codes is a single attribute, the relationship naturally reaches the BC paradigm.
Generally, a database design conforms to 3NF or BCNF. In the BC paradigm, there are the fourth paradigm and the five paradigm.
The Fourth paradigm: requires the deletion of many-to-many relationships within the same table.
The fifth paradigm: re-establishing the original structure from the final structure.
However, in most applications it is not necessary to design to this extent. And, in some cases, too much normalization can even hinder the logical readability and efficiency of the database. A certain degree of redundancy in a database is not necessarily a bad thing. If you are interested in the fourth paradigm and the five paradigm, you can take a look at the professional textbooks, learn from the beginning, and forget everything I said, so as not to mislead you
Some techniques in database design
1. Relationship between the original document and the entity Can be a pair of one or one-to-many, many-to-many relationships. In general, they are one-to-one relationships: a single original document corresponds to and corresponds to only an entity. In special cases, they may be one-to-many or many-to-one relationships, that is, a single original document corresponds to multiple entities, or multiple original documents corresponding to an entity. The entities here can be understood as basic tables. Clear this correspondence, to our design input interface is very good. Example 1〗: An employee biographical data, in the Human Resources information System, the corresponding three basic tables: Employee basic Situation table, social relations table, work Resume table. This is a typical example of "a single original document corresponding to multiple entities". 2. Primary key and foreign key Generally, an entity cannot have no primary key and no foreign key. In a e-r diagram, an entity in the leaf area can define a primary key or not define a primary key (because it has no descendants), but must have a foreign key (because it has a father). The design of primary key and foreign key plays an important role in the design of global database. When the design of the global database was completed, an American database design expert said: "Key, Everywhere is the key, in addition to the key, nothing", this is his database design experience, but also reflects his information system core (data model) of the highly abstract ideas. Because: The primary key is the height abstraction of the entity, and the primary key is paired with the foreign key, representing the connection between the entities. 3. Nature of basic tables A base table differs from an intermediate table, a temporary table, because it has the following four attributes: (1) atomicity. The fields in the base table are non-biodegradable. (2) Primitive nature. The records in the base table are the records of the original data (the underlying data). (3) Deductive nature. All the output data can be derived from the base table and the data in the Code table. (4) stability. The structure of the base table is relatively stable, and the records in the table are stored for a long time. After you understand the nature of the base table, you can differentiate the base table from the intermediate and temporal tables when you design the database. 4. Paradigm Standard The relationship between the base table and its fields should satisfy the third paradigm as much as possible. However, the design of the database that satisfies the third paradigm is often not the best design. In order to improve the efficiency of database operation, it is often necessary to reduce the standard of normalization: to increase redundancy appropriately and to achieve the purpose of space-changing time. Example 2〗: There is a basic table for storing goods, as shown in table 1. The existence of the "Amount" field indicates that the design of the table does not satisfy the third paradigm, since "amount" can be obtained by multiplying the "unit price" by "quantity", stating that "amount" is a redundant field. However, increasing the "amount" of this redundant field can increase the speed of query statistics, which is the practice of space-changing time. In Rose2002, you specify two types of columns: data columns and computed columns. Columns such as amounts are referred to as computed columns, and columns such as unit price and quantity are referred to as data columns. Table 1 table structure of the commodity table Product name commodity model Unit Price quantity amount TV 29 "2,500 40 100,000 5. Popular understanding of three paradigms A popular understanding of the three paradigms is of great benefit to database design. In the database design, in order to better apply the three paradigms, it is necessary to understand the three paradigms in a popular way (the popular understanding is sufficient understanding, not the most scientific and accurate understanding): The first paradigm: 1NF is an atomic constraint on attributes, requiring attributes to be atomic and non-decomposed. The second paradigm: 2NF is a unique constraint on records, requiring records to have a unique identity, that is, the uniqueness of the entity; The third paradigm: 3NF is a constraint on field redundancy, that is, any field cannot be derived from another field, it requires no redundancy in the field. There is no redundant database design to do. However, a database without redundancy is not necessarily the best database, and sometimes in order to improve operational efficiency, it is necessary to lower the paradigm standard and properly retain redundant data. The practice is to adhere to the third paradigm when designing the conceptual data model, and to lower the standard of normalization into the design of the physical data model. Lowering the paradigm is adding fields, allowing redundancy. 6. Be good at identifying and correctly dealing with many-to-many relationships This relationship should be eliminated if there is a many-to-many relationship between the two entities. The solution is to add a third entity between the two. In this way, the original a many-to-many relationship, now becomes two one-to-many relationship. To properly assign the original two entity's attributes to three entities. The third entity here is essentially a more complex relationship that corresponds to a basic table. Generally speaking, the database design tool does not recognize many-to-many relationships, but it can handle many-to-many relationships. Example 3: In "Library information System", "book" is an entity, "reader" is also an entity. The relationship between these two entities is a typical many-to-many relationship: A book can be borrowed by multiple readers at different times, and a reader can borrow more books. To do this, add a third entity between the two, the entity is named "borrowed book", its properties are: Borrow time, borrow also sign (0 means borrowing, 1 means return), in addition, it should also have two foreign keys ("books" The primary Key, "reader" of the primary key), so that it can be connected with the "book" and "Reader". 7. The primary key PK value method PK is an inter-table connection tool for programmers, which can be a string of numbers with no physical meaning, which is implemented automatically by the program. It can also be a combination of field names or field names that have a physical meaning. But the former is better than the latter. When PK is a combination of field names, the number of suggested fields should not be too many, more not only the index occupies a large space, and the speed is slow. 8. Correct understanding of data redundancy The repetition of the primary key and the foreign key in multiple tables is not data redundancy, and the concept must be clear, in fact many people are unclear. The duplication of non-key fields is the data redundancy! And is a kind of low-level redundancy, that is, repetitive redundancy. Advanced redundancy is not a recurring occurrence of a field, but a derivation of a field. Example 4〗: "Unit price, quantity, amount" of three fields in a commodity, "amount" is derived from "unit price" multiplied by "quantity", it is redundant, and it is a kind of advanced redundancy. The purpose of redundancy is to improve processing speed. Only low-level redundancy increases the inconsistency of the data, because the same data can be entered multiple times, from different times, places, and roles. Therefore, we advocate advanced redundancy (derived redundancy) against low-level redundancy (repetitive redundancy). 9. e--r Chart No standard answer The E--r diagram of information system has no standard answer, because its design and drawing is not unique, as long as it covers the business scope and function content of the system requirement, it is feasible. Conversely, to modify the E--r diagram. Although it does not have the only standard answer, it does not mean that it can be arbitrarily designed. The standard of good E-r chart is: The structure is clear, the association is concise, the number of entities is moderate, the attribute allocation is reasonable, there is no low level redundancy. 10. View technology is useful in database design Unlike basic tables, code tables, and intermediate tables, a view is a virtual table that relies on a real table of data sources. View is a window for programmers to use the database, is a form of synthesis of the base table data, is a method of data processing, is a means of user data privacy. In order to perform complex processing, increase computation speed and save storage space, the definition depth of the view should not exceed three layers. If the three-tier view is still not enough, you should define a temporary table on the view and then define the view on the temporary table. With this iterative definition, the depth of the view is not restricted. The role of views is more important for certain information systems related to national political, economic, technical, military and security interests. Once the basic tables of these systems have been physically designed, a first-level view is immediately established on the base table, which has the same number and structure as the number and structure of the base table. It also stipulates that all programmers are only allowed to operate on the view. Only the database administrator, with the "security key" shared by multiple people, can operate directly on the base table. Let the reader think: why is this? 11. Intermediate tables, reports and temporary tables The intermediate table is the table that holds the statistics, which is designed for the data warehouse, the output report, or the query results, and sometimes it has no primary key and foreign key (except for the Data Warehouse). Temporary tables are designed by programmers to store temporary records that are used by individuals. The base table and the intermediate table are maintained by the DBA, and the temporary tables are automatically maintained by the programmer themselves using the program. 12. Integrity constraints are represented in three areas Domain Integrity: Use Check to implement constraints, and in the database design tool, when defining a field's range of values, there is a check button that defines the value city of the field. Referential integrity: The use of PK, FK, table-level triggers to achieve. User-defined integrity: It is a business rule that is implemented with stored procedures and triggers. 13. The way to prevent database design patching is "three-little principle" (1) The smaller the number of tables in a database, the better. Only the number of tables, can explain the system e--r diagram few but good, remove the redundant entities, formed a high degree of abstraction of the objective world, the system of data integration, to prevent patching-style design; (2) The fewer fields in a table combine primary keys, the better. Because of the role of the primary key, one is to build the primary key index, the second is to do as a sub-table foreign key, so the combination of the number of primary key fields is less, not only saves the running time, but also saves the index storage space; (3) The smaller the number of fields in a table, the better. Only the number of fields is small, to show that there is no duplication of data in the system, and there is little data redundancy, and more importantly, to urge readers to learn "column to row", so that the child table is prevented from pulling into the main table, leaving a lot of spare fields in the main table. The so-called "column-to-row" is to pull some of the contents of the main table and create a separate child table. This method is very simple, some people are not accustomed to, do not adopt, do not execute. The practical principle of database design is to find the right balance between data redundancy and processing speed. "Three little" is a whole concept, a comprehensive view, can not isolate a certain principle. The principle is relative, not absolute. The "more than three" principle is certainly wrong. Imagine: If the same functionality is covered by the system, the e--r graph of 100 entities (1000 properties) is certainly much better than the E--r diagram of 200 entities (2000 properties). Advocating the principle of "three little" is called the reader to learn to use the database design technology for system data integration. The steps of data integration are to integrate the file system into the application database, integrate the application database into a subject database, and integrate the subject database into a global consolidated database. The higher the degree of integration, the more data sharing, the less information island phenomenon, the whole enterprise information System of the global E-r diagram of the number of entities, the number of primary keys, the number of attributes will be less. The purpose of advocating the "three less" principle is to prevent readers from using patching technology, constantly adding and removing the database, making the enterprise database become a random design of the database table "garbage heap", or database table "clump", and finally caused the database of basic tables, code tables, intermediate tables, temporary tables disorganized, countless, Cause the information system of enterprises and institutions can not be maintained and paralyzed. "More than three" principle anyone can do, the principle is "patching method" design database crooked Science said. "Three less" principle is the principle of few but good, it requires a high degree of database design skills and art, not anyone can do, because the principle is to eliminate the "patching method" design database theory. 14. Ways to improve the efficiency of the database operation Under the condition of the given system hardware and system software, the way to improve the operation efficiency of the database system is: (1) in the database physical design, reduce the paradigm, increase redundancy, less use of triggers, multi-use stored procedures. (2) When the calculation is very complex, and the number of records is very large (for example, 10 million), the complex calculation must first outside the database, in the file system in the C + + language calculation processing finished |
Thinking of database paradigm and design of database