First, the basic concept
1 Relationship Model
Table : The concept of mathematical relationships is closely related, and the relational database is a collection of tables. A row in the table represents a link between a series of values.
Properties : The first column of a table is called a property.
domain : The value allowed by the property.
A mathematical description.
Relationship : A mathematical term that can replace the concept of "table". A relationship is a set of tuples.
Meta-group : A mathematical term that can replace the concept of "line".
tuple variable : A variable that represents a tuple, with all tuples set as a domain variable.
domain of atoms : the elements of a field are considered to be non-sub-units.
Super Code : A collection of one or more properties. The combination of these properties can uniquely identify a tuple in a relationship.
Candidate Code : the smallest super code, its arbitrary true subset can not constitute a super code.
Main code : the candidate code that is selected by the Database Designer to distinguish different tuples in the same relationship.
External Code : One of the properties of a relational schema R1 is the main code of another relational schema R2, which is called a foreign code.
Reference Relationship : Called R1 as the reference relationship of external code dependency.
Referenced Relationship : Called R2 as the reference relationship of external codes.
Database Schema : The logical design of the database.
DB instance : A snapshot of the data in the database at a given moment.
Analogy with programming language
relational pattern : equivalent to the concept of type definition in a programming language. The difference between it and the relationship is that the relationship is equivalent to the concept of variables in the programming language.
Relational Example : a worthwhile concept corresponding to variables in programming language
2 Entity-Contact model (E-R model)
Entity -Contact model (E-R data Model): Based on a perception of the real world, the world consists of a set of basic objects called entities and the connections between them. Three basic concepts of the E-R Model: entity sets, contact sets, attributes.
Entity : "Events" or "objects" that can be distinguished from other objects in the real world .
entity set : A collection of entities that have the same type and share the same nature.
An entity set may intersect.
The database includes a set of entity sets, each of which includes entities of the same type.
An entity set may have multiple properties.
An entity in an entity set can be represented by a collection of pairs (attributes, data values).
the extension of the entity set : The entities that make up the entity set.
Properties : The descriptive nature of each member in the entity set.
property of the field (the value set of the property): The property has a value, the value of the property is a collection of properties for the domain.
Simple Properties : Properties that cannot be divided into smaller parts.
Composite Properties : Attributes that can be subdivided into smaller parts.
single-valued property : A property that has only a single value for a specific entity.
multivalued Property : A property that may correspond to a set of values.
Derived Property : A property that can be derived from another related property or entity. Derived properties are not stored and can be computed when needed.
base Property (stored property): The property that is derived.
Contact : The correlation between multiple entities.
Contact Set : A collection of contacts of the same type.
Participation : The association between entity sets is called participation.
Contact Instance : Represents an association between named entities that are modeled as real-world.
The role of the entity : The role of the entity in the relationship.
degree of contact set : Number of entity sets participating in the contact set.
Map Cardinality (cardinality scale): The number of entities that can be associated with another entity at the same time through a contact set.
to Entity sets The two-yuan contact set between A and B has a mapping base of:
one to one : An entity in a is associated with at most one entity in B, and an entity in B is associated with at most one entity in a.
One -to-many : An entity in a can be associated with any number of entities in B, and an entity in B is associated with at most one entity in a.
Many-to-one : An entity in a is associated with at most one entity in B, while an entity in B can be associated with any number of entities in a.
Many-to-many : An entity in a can be associated with any number of entities in B, and an entity in B can also be associated with any number of entities in a.
entity set E all participate in the contact set R: Each entity inentity set E participates in at least one contact of the contact set R.
entity set Part E participates in the contact set R: only some entities inentity set e participate in the contact set R.
code : used to differentiate entities and connections, and uniquely identify links.
Super Code : A collection of one or more properties that allow us to uniquely identify an entity in an entity set.
Candidate Code : the smallest super code, its arbitrary true subset can not become super code.
Main code : A candidate code used to differentiate different entities in the same entity set.
You should select attributes that are never or rarely changed as the main code.
The main code structure of the contact is dependent on the contact set map cardinality. In a one-off connection, you can use either of the two main codes as the main code of the contact. In many-to-many connections, the main code of the contact is composed of two main codes. In a one-to-many connection, the main code of the contact is determined by either.
weak entity set: An entity set that is not sufficient to form the main code is called a weak entity set.
Strong entity set : An entity set that has a master code.
identifies the entity set (the Master entity set): The strong entity set on which the weak entity set depends.
Identity Contact : A contact set that links a weak entity set with its identity entity set. An identity link is a many-to-one relationship from the entity set to the identity entity set, and the weak entity set is all involved in the contact.
Resolution : A collection of attributes that can distinguish between weak entities.
Specialization : The process of grouping within an entity.
generalization : The containment relationship between a high-level entity set and one or more lower-layer entity sets.
Super class : The high-level entity set.
Subclass : The lower-level entity set.
attribute Inheritance : An important feature of high-level entities and lower-layer entities produced by specialization and generalization, and high-level entity assemblies are inherited by lower-layer entity sets.
Ii. the principle of normalization
First Paradigm
if a relational pattern All domains of R are atomic, and the relational pattern R belongs to the first paradigm. More generally, all properties are simple attributes, so that the entity set conforms to the first normal form.
Second Paradigm
If the entity set conforms to the first paradigm, and the non-master code is completely dependent on the master code, such an entity set conforms to the second paradigm.
Third Paradigm
If the entity set conforms to the second paradigm and its non-master code does not depend on any other non-master code, such an entity set conforms to the third paradigm.
Iii. Design Process (non-relational database available)
The first stage, the demand analysis. and domain experts, database users to communicate widely, to obtain more detailed specifications, the final submission of products for User Requirements Specification .
Phase II, conceptual design . Select the data model to convert the requirements into a conceptual model of the database.
If the relational database design, then must choose the relationship model, the final product submitted to the entity -Contact diagram and functional Requirements Specification . The specific work at this stage is to define the database to represent entities, entity attributes, connections between entities, and constraints on entities, and describe the various types of operations performed on the data (additional deletions) or things.
Get The E-R model is tested and optimized using the paradigm described above.
The third stage, the logical design. Map the conceptual pattern to the implementation data model of the database system that will be used.
To design a relational database, the data model is a relational data model, which typically The conceptual pattern defined by the E-R model is mapped to a relational schema, so the final submitted product is a relational pattern .
After acquiring the relational pattern, the pattern is tested and optimized using the paradigm described above.
fourth stage, physical design. define the physical characteristics of the database and complete the design.
Iv. Detailed design process
1 Conceptual Design Essentials
1) should I select an entity or a property?
The concept of entities and attributes is not very precise, and the difference depends on the characteristics or structure of the things that are modeled in reality.
The general principle is that, depending on the nature of the model being modeled, it is necessary to determine whether this property can be split again before using the attribute, that is, whether this property contains some other "attributes" that can be described more precisely, which should use entities rather than attributes.
2) can the main code of an entity set be a property of another entity set?
Generally speaking, it is inappropriate to use one entity set attribute as a property of another entity set during modeling, and the most obvious disadvantage is that it causes the data to be stored repeatedly, and it is more inappropriate to attribute the master code of one entity set as another entity set, because if the modeling process Found that two entities seem to have a common attribute, then there must be a connection between the two entities, at this time should be abstracted out of the two entity set of relations, the relationship between them with a clear expression, rather than hide the relationship.
3) can I use the main code attribute of a related entity set as a property of a contact set?
Because the main code attribute of the associated entity set is implied in the contact set, do not use the main code attribute of the related entity set as the property of the contact set. This is often caused by unclear concepts.
4) can I also design the contact set for two entity sets as an entity set?
Concrete analysis of specific circumstances.
If the related entity set and the contact set are all one-to-a relationships, then it is better to use a contact set. In addition, it is better to use a connection to describe the behavior between entities.
If this is not the case, you should consider designing the contact set of two entity sets as entity sets.
5) Avoid the use of four yuan or more than four yuan of the contact set
Some seemingly non-two-dollar contact sets can often be converted to multiple two-yuan contact sets, but it is also inappropriate to convert some of the high-dollar links to more than two yuan.
6) convert Many-to-many connections between two entity sets to one-to-many links whenever possible
7) layout of contact Properties
Whether a descriptive attribute is placed in a contact set or entity set depends on the business characteristics.
Single-to-one contact sets, the properties of a contact can be placed in any entity that participates in the contact.
A one -to-many contact set, the properties of the contact can only be placed on the "many" side of the entity that participates in the link.
Many-to-many contact sets, the properties of the contact are generally placed in the contact set
8) You should select attributes that are never or rarely changed as the main code
9) weak entity set and multi-valued attribute
A weak entity set can be represented as a composite property of the primary entity set if the weak entity sets participate only in the identity contact and the attributes are not many.
Specialization or generalization
The use of specificity is to express the differences between the subsets contained in the entity set, highlighting the characteristics of subsets to meet the needs of the business.
The reason for generalization is to highlight the similarities between entity sets to hide their differences to meet the needs of the business.
One) use aggregation
The E-R model cannot express the connection between the contacts. Aggregation expresses a more complex relationship. Use aggregation when you need to express contact between contacts or contacts between entities.
2 Logic Design Essentials
the logical design is to The ER model is converted to relational mode
1) relationship pattern derived from strong entity set
The main code of a strong entity set is the main code of the generated relational pattern. The description attribute of a strong entity set is the attribute of the relationship pattern.
2) relational patterns derived from weak entity sets
There are weak entity sets A and strong entity set b,a depend on B.
by a weak entity set The main code of the derived relational pattern is composed of the main code of B and the resolution of A.
in the relationship A indicates the outer code constraint,
3) relationship pattern derived from the contact set
have contact sets R, the set of the main code for all entity sets participating in R is C, and the set of attributes consisting of the P={P1,P2,P3,......}
The descriptive attribute of R is D ={d1,d2,d3,......}
The properties of the relationship pattern exported by the contact set correspond to the P and D are the various attributes of the set.
The rules for determining the main code of a relationship are:
A one to two yuan contact set, the main code of the relational pattern is the main code of any of the entities participating in the contact.
Many-to-one two-yuan contact sets, the main code of the relational pattern is the main code of the entity set of the "many" side of the contact set.
Many-to-many two-yuan contact sets, the main code of the relational pattern is the set of the main code of the entity that participates in the association.
4) mode redundancy
In general, the relationship pattern exported by this contact set is likely to be redundant because the entity that participates in the contact contains the weak entity set and its dependent strong entity set.
5) merging of relational patterns
One-to-one contact, the relationship pattern of a contact set can be combined with the relational schema of any entity set that participates in the contact.
Partially involved in a relationship, you can merge the patterns by using null values.
The external code constraint of the pattern after merging should be adjusted accordingly.
6) composite and multi-valued properties
How composite properties are handled: Create a separate property for each child property.
How multivalued properties are handled: For multi-valued properties M Create relationship mode R.
R has properties that correspond to the property A of M and the master code of the entity set where M resides.
The main code of R consists of all the attributes in the pattern.
7) Generalization
Creates a pattern for a high-level entity set that creates a pattern for each lower-level entity set for this high-layer entity set, with properties in the schema that correspond to each property of the lower-level entity set and each attribute corresponding to the master code of the upper-layer entity set. The main code attribute of a high-level entity set is also the main code attribute of the lower entity set.
You can also create a pattern for a high-level entity set, create a pattern for each lower-layer entity set, so that the properties of the lower-level entity set are the properties of the corresponding lower-layer entity set, and the corresponding upper-layer entity set.
8) The relationship pattern derived from the aggregation
Indicates that the aggregation does not require a separate export relationship pattern, which is the relational pattern that defines the aggregated contact set.
9) Relationship between aggregation and entity set export pattern
This linkage is derived from the relational schema, which includes a descriptive attribute of each attribute in the master code of the corresponding entity set, each attribute in the contact set master code, and the contact set (the aggregation and association between the entity sets).
V. Design examples
Only some of the key modeling issues are analyzed below.
Application Background Description:
provide online documentation to read and download websites that have regular users and site administrators if the platform is open to the outside API Service, then there is the developer, the user authorized to invoke the API.
Literature can be classified by Category: periodicals, conferences, newspapers, doctoral theses, books, etc.
If the literature classification by industry: building materials, water conservancy and hydropower, petrochemical, information industry.
If each industry segments, such as: telecommunications, telephone, printing, publishing, news, broadcasting and so on.
For the convenience of the user to read, so also to provide navigation, navigation similar to the results of each industry segmentation.
The system presents the user's behavior to the user, such as download volume, reading volume, and browsing history.
You can subscribe to your favorite publications and keep track of the latest articles in real time.
Users can customize the theme and retrieve the relevant literature according to a custom theme. The system also provides a default theme.
Users can collect favorite articles, and can also download literature.
Obtain the entity according to the above description:
General User (reader), the organization to which the user belongs, the webmaster, API developer, literature, industry, navigation, topics, statistics.
E-R modeling Analysis and export relationship mode:
The user's account number can be used as the main code, but in the actual use process, but often use the self-increment ID as the main code, called the UserID, although the self-increment ID is not necessary to form an entity, because the self-increment ID is an integer and the account is generally a string , and the general relational database is indexed by default, and the index on the integral type has some advantages.
Statistical information is only associated with ordinary users, you can combine statistics and ordinary users into one entity, that is, statistics become composite properties, but this is a bad design. Although this can reduce the number of entities, but the basic information of ordinary users, such as accounts, passwords, mailboxes, etc. are not often changed. Statistical information is often changed as the user's behavior changes. So here we use entities instead of attributes. There is no composite attribute to convert a weak entity set to a strong entity set.
Statistics form a weak entity set, which relies on a strong entity set, which is a set of entities composed of all ordinary users. The weak entity set resolution is taken from the increment Id, called Staticid. The main code of the export relationship mode is the staticid and user userid
Collection is a user's behavior, a user can collect more than one article, an article can also be collected by multiple users, they are many-to-many relationship . By collecting this behavior the user and the document establish a relationship.
Collection time is a descriptive attribute of the relationship, because it is a many-to-many relationship, so the collection time should be placed in the contact set. The main code for the exported relational pattern is the set of the master code for the user entity set and the article entity set.
The relationship between navigation and users is also many-to-many. The navigation of the client page should have a display order, and it can be adjusted. Then the navigation order attribute should also be placed in the contact set.
users of the platform include ordinary users and API consumers, they are users, there are similarities and differences. Depending on the access rights of the API interface, you can segment the user using the API, for example: A superuser can invoke an API interface that involves a modification operation, whereas a generic user can only invoke API interfaces that involve query operations, but this subdivision does not constitute a generalization and specialization concept. The generalization process is not implemented in a way that creates patterns for high-level entities. Instead, create a pattern for each lower-level entity.
Bibliography:
Database System Concept (original book 5th edition)
Author:(UnitedStates) Abraham silberschatz/(us) Henry f.korth/(United States) S.sudarshan
Publishing house : Mechanical Industry Press
translator : Yang Dongqing/MA/Dong
relational database design