SQL Server database design specifications (1)
1. Introduction
Database Design refers to constructing the optimal database mode for a given application environment, establishing a database and other application systems, so as to effectively store data and meet the needs of various users. Naming Conventions are very important in the database design process. Rational naming conventions can save developers a lot of time to differentiate database entities.
Recently, I have compiled this word document because of my work needs. I hope you can correct it.
2. Database Design
Database Planning → requirement analysis → database design → application design → implementation → test → running and Maintenance
2.1 Database Planning
Define the main objectives of the database application system, define system-specific tasks, including workload estimation, resource usage, and demand funding, and define the scope and boundaries of the system.
2.2 Requirement Analysis
2.1.1 Requirements Analysis Steps and results
Involved personnel: Users and analysts
Task: Conducts a detailed investigation of objects to be processed in the real world, collects basic data and processing methods, and analyzes the objects based on user surveys to gradually clarify users' requirements for the system, including information requirements and handling requirements.
Methods and steps: 1. Collect User information requirements through surveys with users.
2. while collecting data, the designer needs to process and organize it, describe it in the form of data dictionary and data flow diagram, and tell the user information from the perspective of the designer, modify the process based on user feedback and determine that the process is repeated)
Results: Data flow chart, data dictionary, various descriptive tables, statistical output tables, and system function structure.
2.1.2 data flow diagram basic elements and data flow diagram
External entityA person or organization square or cube that exists outside the software system ).
Processing: Data Processing, indicating that the input data is transformed here to generate a giant or circular representation of the output data rounded corner ).
Data Stream: Indicates the Flowing Data arrows ).
Data Storage: Indicates the Opening Rectangle or two parallel horizontal lines of the data to be stored ).
Top-level flowchart of the Order Processing System:
Layer-0 data flow diagram:
2.3 Database Design
2.3.1 Conceptual Structure Design
-
Abstract transactions and describe them in the form of a E-R Diagram
-
E-R graph entity contact diagram): includes entity, contact, attribute
Entity: Real things such as students and teachers
Contact: Relationship between two entities:, 1: N, and M: N
Attribute: Attributes of an object, such as student ID, name, and gender
For example: A student belongs to a class, a class has more than one student, the E-R figure is as follows
E-R diagram of the Online Shopping System, which has the following constraints between the system data
-
A unique customer ID) can have multiple orders. Each order belongs to only one customer.
-
A single order number is unique) can contain multiple order details, each order detail belongs to only one order.
-
A single item can contain multiple order items. One order item only contains multiple items.
-
A commodity category can contain multiple commodities, and a commodity belongs to only one commodity category.
2.3.2 Logical Structure Design
2.3.2.1E-R diagram converted to relational mode
Converts an object to a relational mode. The attributes of an object are the attributes of the relational mode, and the identifiers of an object are the keys of the relational mode.
-
If the link between an object is (), the two entities can be converted to two relational modes. The primary keys of another relational mode can be added to the attributes of any relational mode as foreign keys) and contact attributes
-
If the link between objects is one-to-multiple (1: n), convert the object type of n-end to the primary key of the 1-end object type added to the link mode as the foreign key) and contact type attributes.
-
If the link between objects is multi-to-many m: n), the link type is also converted to the link mode, and its attribute is 2 the primary key of the object type as the foreign key) add the attributes of the contact type, and the primary key of the link mode is a combination of two-end object primary keys.
-
If the link mode is, the conversion principle is the same.
-
If the link mode is 1: 1: n, the conversion principle is the same as 1: n.
-
If the link mode is 1: n: m, you can also convert the link type to the link mode. Its attribute is the primary key of the m and n object types as the foreign key) add the attributes of the contact type, and the primary key of the link mode is a combination of the primary keys of the n and m objects.
-
If the link mode is n: m: p, the conversion rule is the same as m: n.
According to E-RLinks between graph objects can be converted to the following link modes::
CustomerCustomer ID, name, phone number, E-mail ). Primary Key of the link: Customer ID; foreign key: None
OrderOrder Number, order time, customer number ). Primary Key of the link: Order Number; foreign key: Customer Number
Order DetailsOrder details number, order quantity, payment amount, order number ). Relational primary key: Order detail number; foreign key: Order Number.
AppearsOrder details number, Product Number, type ). Primary Key of the link: Order details number, product number; foreign key: Order details number, product number.
Product: Product ID, product name, unit price, production date, product category number, product alias ). Primary Key of the link: item number; foreign key: None
The following problems may occur in the Relationship Pattern Design: data redundancy, inconsistent data changes, abnormal data insertion, and abnormal data deletion. Therefore, the requirements of the paradigm are put forward, the goal is to minimize redundancy and avoid insertion, deletion, and modification exceptions.
2.3.2.2 paradigm
Primary attribute: All attributes that contain keys.
1NF): If each component of the relational model R is an inseparable data item, the relational model belongs to the first paradigm. Each attribute cannot be split.
Second Paradigm (2NF): R belongs to 1NF, and each non-primary attribute is completely dependent on the key but not partially dependent), R belongs to 2NF.
Example: Course Selection relationship (student ID, course number, score, credits)
The primary key of the relationship is student ID, course number), but course number → credits. Therefore, the credit attribute is partly dependent on the primary key, that is, the relationship Department satisfies the second paradigm and can be split into student ID, course number, score), course number, and credit)
Third Paradigm (3NF): R belongs to 2NF, and each non-primary attribute is not partly dependent on the code, nor is it transmitted dependent on the code
Example: Student Relationship student ID, name, Department, Department address)
The primary key of the link is student ID.
Student ID → department, department → student ID, department → department address; according to the function dependency principle, the system address transfer function depends on the student ID, that is, the relationship does not meet the third paradigm, you can split the link into student ID, name, and Department), department, and Department address)
If the data is not split, there will be an exception in data modification. For example, if the student changes the Department, the Department is changed, but the Department address is not modified, this will cause an exception in modification.
BCNF: R belongs to 3NF, and there is no main attribute-to-code part or transfer function dependency.
For example: link R part number, part name, manufacturer name), if you set that each part number has only one part name, but different part numbers can have the same part name, each part can be produced by multiple manufacturers, but each manufacturer must have a different part name. In this way, we can get:
Part number → part name, manufacturer name, part name) → part number
Therefore, the main attributes include the part number, manufacturer name, and part name. However, the "part name" is transferred Based on the code "manufacturer name, part name". Therefore, the relationship R does not meet BCNF requirements, when a part is produced by multiple manufacturers, since there is only one part number and the part name varies with the manufacturer, the relationship between the part name and the part number will be repeated multiple times, data redundancy and operational exceptions
Can be divided into parts number, manufacturer name), part number, part name)
4NF: relational mode R belongs to 1NF. If every non-trivial multi-value of R depends on X → Y and Y is not included in X, X must contain the code, then R belongs to 4NF.
5NF: projection the relationship to eliminate the connection dependency not contained by the candidate code.
For the preceding commodity relationship, because the primary key of the relation is the commodity number, the commodity category number → the commodity alias
Therefore, the commodity relationship Department satisfies the third paradigm. The delivery of non-primary product aliases depends on product numbers, which may cause data redundancy and abnormal data modification. The product relationship is divided:
Product ID, product name, unit price, production date, product category number)
Product Category Number, product alias)
2.3.3 Physical Structure Design
The process of designing a physical structure that best fits the application requirements for a given Logical Data Model
-
Database creation
-
Create a data table
-
Index creation
-
View Creation
-
Trigger Creation
-
Stored Procedure Design
-
User-Defined Function Design
-
Restrict data items in link mode, such as check constraints, primary key constraints, and integrity constraints to ensure data correctness
2.4 Application Design
Use advanced languages for structural design or object-oriented design
2.5 system implementation
3. Optimization Strategy
3. 1. Query Optimization policies
-
Minimize multi-table queries or create materialized views
-
Retrieve only required Columns
-
Replace or statements with the level of the conditional words with IN
-
COMMIT frequently to release the lock as soon as possible
3.2 Table Design
1. If frequent access involves joining two related tables, consider merging them.
2. If frequent access is only performed on some fields in the table, consider splitting the table and using this part as a separate table.
3. Introduce materialized views for tables with few updates
4. When there are a few duplicate values in the system, use dictionary tables to save storage space and optimize queries. For example, user code of the region or system. These values do not change during the running period of the program, but must be stored in the database.
For the region, if we want to query the records of a region, the database needs to query the records by string matching. If we change the region to the region code in the table, when you query by region code, the query efficiency will be greatly improved.
In the program, a large number of dictionary tables should be used to represent such values. The dictionary table stores the code of this type of value and the set of objects. The foreign key is associated with the table using this type of value. However, in the coding stage, programmers do not use dictionary tables because they first query the entity code in the dictionary table, which violates the original intention of improving the query efficiency. With the help of Data Dictionary, programmers directly use code to represent entities, thus improving efficiency.
Although the dictionary table is not actually used, it should be kept in the database at least during the development period ). A dictionary table appears as another form of "data dictionary document" to show which tables in the database use dictionary tables.
To improve the data integrity of the database, you can retain the foreign key constraints of the complete dictionary table and common table during the development phase. However, in the running stage of the database, the foreign keys of common tables and dictionary tables should be deleted to improve the running efficiency, especially when many dictionary tables are used in some tables.
Case: a database contains millions of user information, and the application system often needs to query user information by region. The user information table was previously saved according to the specific region name. Now, the specific name is changed to the region code in the dictionary table, which greatly improves the query efficiency.
3.3 Index
-
If a query is a bottleneck, an appropriate index is created on the link. Generally, an index is created on the attributes used as the query condition to improve the query efficiency.
-
If the update is a bottleneck because the index on the table is re-created for each update, resulting in lower efficiency, some indexes should be deleted.
-
Select an appropriate index. If range query is frequently used, the B-tree index is more efficient than the hash index.
-
Set indexes that are conducive to most queries and updates as clustered indexes.
3.4 improve IO Efficiency
-
Index files and data files are stored separately, and transaction log files are stored on high-speed devices.
-
Frequently modify the page size of data files and index files
-
Sort data regularly
-
Add necessary index items