I. Database Design Process
Database technology is the most effective means of information resource management. Database Design refers to constructing the optimal database mode for a given application environment, establishing a database and its application system, and effectively storing data to meet user information requirements and processing requirements.
In the demand analysis phase of database design, the application requirements of various users are integrated (real-world requirements ), in the conceptual design stage, a conceptual model (information world model) independent of the machine characteristics and various DBMS products is formed, which is described by E-R diagram. In the logic design stage, the E-R diagram is converted into the data model supported by the specific database product, such as the relational model, to form the database logic model. Then, based on the user's processing requirements and security considerations, the necessary view is created on the basis of the basic table to form an external data mode. In the physical design phase, you can arrange physical storage based on the characteristics and processing needs of the DBMS, design indexes, and form the internal database mode.
1. Demand analysis stage
Requirement collection and analysis: the data requirement described by the data dictionary is obtained (and the processing requirement described by the data flow diagram ).
Requirement Analysis focuses on investigating, collecting, and analyzing users' information requirements, processing requirements, and security and integrity requirements in data management.
Requirement Analysis Methods: Investigate the organization and organization, investigate the business activities of each department, assist users in clarifying various requirements for the new system, and determine the boundaries of the new system.
Common survey methods include: follow-up operations, opening survey meetings, inviting special personnel to introduce, ask, and designing surveys. Users are requested to fill in and check records.
The methods for analyzing and expressing user requirements mainly include top-down and bottom-up methods. The top-down structured analysis method (SA) starts with the upper-level system organization and analyzes the system layer by layer, the data flow diagram and data dictionary are used to describe each layer.
The data flow chart shows the relationship between the data and the processing process. The data in the system is described using the data dictionary (dd.
A data dictionary is a collection of various data descriptions. It describes the data in a database, that is, metadata, rather than the data itself. A data dictionary consists of data items, data structures, data streams, data storage, and processing processes (at least the Data Type of each field and the Primary and Foreign keys in each table should be included ).
Data item description = {data item name, description, alias, data type, length,
Value range, value meaning, and logical relationship with other data items}
Data Structure Description = {Data Structure name, description, composition: {data item or data structure }}
Data Stream description = {data stream name, description, data stream source, data stream destination,
Composition: {data structure}, average traffic, peak traffic}
Data storage description = {data storage name, description, number, inbound data stream, outbound data stream,
Composition: {data structure}, data volume, access method}
Process description = {process name, description, input: {data stream}, output: {data stream },
Handling: {brief description }}
2. Conceptual Structure Design Stage
A conceptual model independent of a specific DBMS is formed through the synthesis, induction and abstraction of user requirements, which can be expressed by a E-R diagram.
The conceptual model is used for modeling the information world. The conceptual model does not depend on the data model supported by a DBMS. A conceptual model can be converted to a specific data model supported by a DBMS on a computer.
Concept model features:
(1) Strong semantic expression ability, which can easily and directly express various semantic knowledge in applications.
(2) It should be simple, clear, and easy to understand. It is a language for communication between users and database designers.
A common method for conceptual model design is the idef1x method, which applies the entity-contact method to a Semantic Modeling Technology in the semantic data model and is used to establish a system information model.
The steps to create a E-R model using the idef1x method are as follows:
2.1 Step 1 -- initialize the project
In this phase, the task starts from the description of the purpose and scope, determines the modeling target, develops the modeling plan, organizes the modeling team, collects the source material, and develops constraints and specifications. Collecting source materials is the focus of this phase. Through investigation and observation results, business processes, input and output of the original system, various reports, and collection of raw data, a basic data table is formed.
2.2 Step 1 -- Define an object
Entity integrators have a common set of features and attributes that can identify most entities directly or indirectly from the collected source material-basic data tables. Indicates the term of the object in the source material name table and Code "The terminologies at the end, such as the customer code, agency code, and product code, mark the entities represented by the terms to initially identify potential entities and form a preliminary entity table.
2.3 Step 2 -- Define contact
Only binary connections are allowed in the idef1x model. N-element connections must be defined as N binary connections. Based on actual business requirements and rules, the entity contact matrix is used to identify the binary relationship between entities. Then, the potential, Link name, and description of the connection relationship are determined based on the actual situation, and the link type is determined, whether it is an identity link, a non-identity Link (mandatory or optional), a non-deterministic link, or a classification link. If each instance of the Child body needs to be identified by the relationship with the parent object, the relationship is identified, otherwise it is not identified. In a non-identity relationship, if the instance of each child entity is associated with only one parent entity, it is mandatory; otherwise, it is not mandatory. If the parent entity and the Child entity represent the same real object, they are classification relationships.
Step 3-definition code
Remove the non-deterministic relationship generated in the previous stage by introducing the cross entity, and then identify the Hou code attribute from the non-cross entity and the independent entity to uniquely identify the instance of each entity, then confirm the master code from the candidate code. To determine the validity of the master code and link, the non-empty rules and non-multi-value rules are used to ensure that an attribute of an object instance cannot be null, you cannot have more than one value at the same time. Identify the identified relationship by mistake, break down the entity, and finally construct the key base view of the idef1x model (kb chart ).
2.5 Step 4 -- Define attributes
Extract descriptive terms from the source data table to develop an Attribute Table and determine the attribute owner. Defines non-primary Code attributes and checks non-null and non-multi-value rules of attributes. In addition, check fully dependent function rules and non-pass dependency rules to ensure that a non-primary code attribute must depend on the primary code, the entire primary code, and only the primary code. In this way, the full attribute view of the improved idef1x model, which is at least in line with the third paradigm of the relational theory, is obtained.
Step 5: Define other objects and rules
Defines the data type, length, precision, non-null, default value, and constraint rules of an attribute. Defines trigger, stored procedure, view, role, synonym, sequence, and other object information.
3. Logical Structure Design Stage
Converts a conceptual structure to a data model (such as a relational model) supported by a DBMS and optimizes it. To design the logical structure, select the data model that best suits to describe and express the corresponding conceptual structure, and then select the most appropriate DBMS.
To convert a E-R diagram to a relational model is to convert the relationship between the object, the object attributes, and the object into a relational model. This conversion generally follows the following principles:
1) A real shape is converted into a relational model. An object attribute is a link attribute. The entity code is the link code.
2) One M: N contact is converted into a relational mode. The codes of the entities connected to the contact and the attributes of the contact are converted to the Link Attributes. The link code is a combination of Entity Codes.
3) A 1: n link can be converted into an independent link mode, or merged with the link mode corresponding to n. If it is converted to an independent link mode, the codes of the entities connected to the contact and the attributes of the contact are converted to the link attributes, the link code is the entity code of N.
4) A link can be converted into an independent link mode, or merged with the link mode corresponding to any end.
5) One multivariate link between three or more entities is converted into a relational model. The codes and attributes of each object connected to the multivariate link are converted to the link attributes. The link code is a combination of Entity Codes.
6) links between entities in the same entity set, that is, self-links, can also be handled in three situations:, 1: N, and M: N.
7) The link mode with the same code can be merged.
In order to further improve the performance of the database application system, the data model structure should be modified and adjusted appropriately under the guidance of standardization theory. This is the optimization of the data model. Determine the data dependency. Eliminate redundant connections. Determine which relationship modes belong to the nth paradigm. Determine whether to merge or break them down. In general, the relationship is divided into 3nf criteria, namely:
Each value in the table can be expressed only once.
•? Each row in the table should be uniquely identified (with a unique key ).
The table should not store non-key information that depends on other keys.
4. Physical database design phase
Select the physical structure (including the storage structure and access method) that best suits the application environment for the Logical Data Model ). Arranges physical storage and designs indexes based on DBMS features and processing needs to form the internal database mode.
5. database implementation stage
Use the Data Language (such as SQL) provided by DBMS and its host language (such as C) to establish a database based on the logic design and physical design results, and compile and debug applications. Program Organize data warehouse receiving and perform trial run. Database implementation mainly includes the following tasks: defining the database structure with DDL, organizing data warehouse receiving, compiling and debugging applications, and database trial run
6. Database Operation and Maintenance
The database application system can be put into operation after trial run. During the operation of the database system, it must be constantly evaluated, adjusted, and modified. Including: Database dumping and recovery, database security, integrity control, database performance monitoring, analysis and improvement, database re-organization and re-construction.
Use of modeling tools
To speed up database design, there are currently many database auxiliary tools (CASE tools), such as Rational Rose, Erwin and bpwin of CA, sybase powerdesigner and oracle designer.
Erwin is mainly used to establish conceptual and physical models of databases. It can describe the attributes of objects, links, and objects in a graphical manner. Erwin supports the idef1x method. By using the Erwin modeling tool to automatically generate, change, and analyze the idef1x model, we can not only obtain excellent business functions and data requirement models, but also transform the idef1x model to the physical design of the database. The model drawn by Erwin tool corresponds to two types: Logical Model and physical model. In the logical model, the idef1x toolbox can be used to easily create and draw object links and attributes in a graphical manner. In the physical model, Erwin can define the corresponding tables and columns and automatically convert them to appropriate types for various database management systems.
Designers can select appropriate database design modeling tools as needed. For example, after the requirement analysis is complete, the designer can use Erwin to draw an erdiagram, convert the erdiagram into a relational data model, generate a database structure, draw a data flow diagram, and generate an application.
2. Database Design Skills
1. Before designing the database (demand analysis stage)
1) understand customer requirements and ask users how to view future demand changes. Ask the customer to explain their needs, and as the development continues, ask the customer frequently to ensure that their needs are still under development.
2) understanding enterprise business can save a lot of time in the future development phase.
3) Pay attention to input and output.
When defining database tables and field requirements (input), Check Existing or designed reports, queries, and views (output) to determine which tables and fields are necessary to support these outputs.
For example, if you want a report to be sorted, segmented, and summed by zip code, make sure that it contains a separate zip code field instead of mixing the zip code into the address field.
4) create a data dictionary and ER chart
Er charts and data dictionaries allow anyone who knows the database to clearly learn how to obtain data from the database. The erdiagram is useful for displaying the relationship between tables, while the data dictionary describes the purpose of each field and any possible aliases. It is absolutely necessary to document SQL expressions.
5) define standard object naming rules
The names of various objects in the database must be standardized.
2. Table and field design (Database Logic Design)
Table Design Principles
1) standardization and standardization
Data Standardization helps eliminate data redundancy in databases. There are several forms of standardization, but third Normal Form (3nf) is generally considered to have achieved the best balance in terms of performance, scalability and data integrity. To put it simply, the table design principle for databases that comply with the 3nf standard is: "One fact in one place", that is, a table only includes its basic attributes, when they are not their own properties, they need to be decomposed. The relationships between tables are connected by foreign keys. It has the following features: a group of tables specifically store the associated data connected by keys.
For example, a 3nf database storing customers and their related orders may have two tables: customer and order. The Order table does not contain any information about the customer associated with the order, but stores a key value. The key points to the row in the customer table that contains the customer information.
In fact, for the sake of efficiency, it is sometimes necessary not to standardize the table.
2) data-driven
Using data-driven, not hard-coded methods, many policy changes and maintenance are much easier, greatly enhancing system flexibility and scalability.
For example, if you want to access external data sources (files, XML documents, and other databases) on the user interface, you may wish to store the connection and path information in the user interface support table. In addition, if you execute tasks such as workflows on the user interface (such as sending emails, printing letterhead, and modifying record status), the workflow data can also be stored in the database. Role permission management can also be completed through data-driven. In fact, if the process is data-driven, you can push a considerable amount of responsibility to users to maintain their own workflow processes.
3) consider various changes
When designing a database, consider which data fields may change in the future.
For example, the last name is the same as the last name of a Western person, for example, the last name of a female after marriage ). Therefore, when a system is created to store customer information, the last name field is stored in a separate data table, and fields such as the start date and end date are appended, in this way, we can track the changes of this data entry.
Field Design Principles
4) three useful fields should be added to each table
•? Drecordcreationdate. In VB, the default value is now (), and in SQL Server, the default value is getdate ()
•? Srecordcreator, which defaults to not null Default User in SQL Server
•? Nrecordversion indicates the version of the record. It helps to accurately explain the causes of NULL data or data loss in the record.
5) use multiple fields for the address and phone number
It is not enough to describe the street address in just one line. Address_line1, address_line2, and address_line3 provide greater flexibility. In addition, it is recommended that you have your own data table with your phone number and email address.
6) use the role entity to define columns of a certain type
When defining things belonging to a specific category or with a specific role, you can use the role entity to create a specific time-related relationship, so as to achieve self-documenting.
For example, the person object and the person_type object are used to describe the person. For example, when John Smith and engineer are promoted to John Smith, Director and finally to John Smith, the CIO is high, all you need to do is change the key value of the relationship between person and person_type in two tables, and add a date/time field to know when the change will happen. In this way, your person_type table contains the possible types of all persons, such as associate, engineer, ctor, CIO, or CEO. Another alternative is to change the person record to reflect the change of the new title, but in this way, the specific time of the individual's position cannot be tracked in time.
7) Select numeric and text types as adequate as possible
Be careful when using smallint and tinyint types in SQL. For example, if you want to see the total monthly sales amount and the Field Type of the total amount is smallint, you cannot perform the calculation if the total amount exceeds $32,767.
Text fields of the ID type, such as the customer ID or order number, should be set to be larger than expected. Assume that the customer ID is 10 digits long. Then you should set the field length of the database table to 12 or 13 characters. However, this extra space can increase the database size without restructuring the entire database in the future.
8) add and delete tag Fields
The table contains a "delete tag" field to mark rows as deleted. Do not delete a single row in a relational database. It is best to clear the Data Program and carefully maintain the integrity of the index.
3. Select keys and indexes (Database Logic Design)
Key selection principles:
1) Key design 4 Principles
•? Create a foreign key for the associated field.
•? All keys must be unique.
•? Avoid using compound keys.
•? Foreign keys are always associated with unique key fields.
2) use the primary key generated by the system
When designing a database, the System-generated key is used as the primary key, which actually controls the index integrity of the database. In this way, the database and non-manual mechanism effectively control access to each row of stored data. Using the system-generated key as the primary key has another advantage: when having a consistent key structure, it is easy to find logical defects.
3) do not use the user's key (do not make the primary key updatable)
When determining the fields used as the table key, you must be careful with the fields to be edited. Generally, do not select an Editable field as the key.
4) Optional keys can sometimes be used as primary keys.
The optional keys can be further used as primary keys, which can be used to build powerful indexes.
Indexing principles:
Indexing is one of the most efficient ways to obtain data from a database. Index technology can be used to solve 95% of database performance problems.
1) the logical primary key uses a unique group index, and uses a unique non-group index for the system key (as a stored procedure) and a non-group index for any foreign key column. Consider the size of the database space, how tables are accessed, and whether these accesses are mainly used for reading and writing.
2) most databases index primary key fields automatically created, but do not forget to index Foreign keys, which are also frequently used keys, for example, running a query to display a record of the master table and all associated tables can be used.
3) do not index Memo/Note fields, or index large fields (with many characters). This will occupy too much storage space for indexes.
4) do not index frequently used small tables
Do not set any keys for small data tables. Do not do this if they are often inserted or deleted. The index maintenance for these insert and delete operations may consume more time than the scan tablespace.
4. Data Integrity Design (Database Logic Design)
1) Integrity implementation mechanism:
Entity Integrity: Primary Key
Integrity of reference:
Delete data in the parent table: cascade Delete; restricted Delete; null
Insert data in the parent table: limited insert; recursive insert
Update Data in the parent table: cascade update; restricted update; Null Value
DBMS provides two methods to achieve the integrity of the Reference: The foreign key implementation mechanism (constraint rules) and the trigger implementation mechanism.
User-Defined integrity:
Not NULL; check; trigger
2) Use constraints rather than business rules to force data integrity
The database system is used for data integrity. This includes not only integrity achieved through standardization, but also data functionality. When writing data, you can also add triggers to ensure data correctness. Do not rely on the business layer to ensure data integrity; it cannot ensure the integrity of (Foreign keys) between tables, so it cannot be imposed on other integrity rules.
3) mandatory indication integrity
Remove harmful data before it enters the database. Activate the indication integrity feature of the database system. In this way, data can be kept clean and developers can be forced to spend more time processing error conditions.
4) use search to control data integrity
The best way to control data integrity is to restrict user selection. Whenever possible, users should be provided with a clear value list for their choice. This reduces the errors and misunderstandings of the entered code and provides data consistency. Some public data is particularly suitable for searching: Country Code, Status Code, etc.
5) view
To provide another abstraction layer between the database and application code, you can create a special view for the application without requiring the application to directly access the data table. This gives you more freedom to handle database changes.
5. other design skills
1) Avoid using triggers
The functions of a trigger can be implemented in other ways. When debugging a program, the trigger may become interference. If you do need a trigger, you 'd better document it in a centralized manner.
2) use common English (or any other language) instead of coding
When creating drop-down menus, lists, and reports, it is best to sort the reports by English name. If encoding is required, you can attach the English language that the user knows to the encoding.
3) save common information
It is very useful to allow a table to store general database information. Store the current database version, recent check/repair (ACCESS), name of the associated design document, and customer information in this table. In this way, a simple mechanism can be implemented to track databases. When customers complain that their databases do not meet the expected requirements and contact you, this is especially useful for non-client/server environments.
4) include version Mechanism
Introduce the version control mechanism in the database to determine the version of the database in use. After a long period of time, users' needs will always change. In the end, you may need to modify the database structure. It is more convenient to store version information directly in the database.
5) document preparation
All shortcuts, naming conventions, restrictions, and functions must be documented.
Database tools that annotate tables, columns, and triggers are used. It is useful for development, support, and tracking and modification.
Document the database, or create documents in the database itself or separately. In this way, after more than a year, I will go back to the 2nd versions, and the chances of making mistakes will be greatly reduced.
6) tests, tests, and repeated tests
After creating or revising the database, you must use the new data test data field entered by the user. The most important thing is to allow users to perform tests and ensure that the selected data type meets the commercial requirements. The test should be completed before the new database is put into the actual service.
7) Check Design
The common technology used to check the database design during development is to check the database through its supported application prototype. In other words, for each prototype application that ultimately expresses data, you must check the data model and view how to retrieve the data.
Iii. Database naming rules
1. Object (table) Naming
1) The table is named by a noun or a noun phrase to determine whether the table name is in the form of a plural or singular. In addition, a simple rule is defined for the table alias (for example, if the table name is a word, an alias is the first four letters of a word. If the table name is two words, the first two letters of each word are used to form an alias with four letters; if the table name is composed of three words, take one of the first two words, and then extract two letters from the last word, the result is a 4-letter alias, and the rest are in turn)
For a working table, the table name can be prefixed with work _ followed by the name of the application using the table. During the naming process, you can piece together the abbreviations Based on the semantics. Note that the orcle field names must be in uppercase or lowercase letters, so the field names must be underlined.
Example:
Sales: Sal sales;
Order: ord order;
Detail: DTL details;
The sales order list is named sal_ord_dtl;
2) If the table or field name has only one word, we recommend that you use a complete word instead of an abbreviation.
Example:
The abbreviation of the definition is material Ma;
The item table name is material, not Ma.
However, the field item code is ma_id, not material_id.
3) Add the prefix Z to the table of all stored values.
The purpose is to sort these Value List classes at the end of the database.
4) Add the prefix X to the names of all redundant classes (mainly accumulative tables ).
The redundancy class is used to improve database efficiency. It is used to add fields or tables to a non-standardized database.
5) join classes connect two basic classes with underscores and add the prefix R to name them. The names of the two tables or abbreviations of the table names are listed in alphabetical order.
An association table is used to save many-to-many relationships.
If the associated table name is greater than 10 letters, the original table name must be abbreviated. If there are no other reasons, we recommend that you use abbreviations.
For example, if a table object has many-to-many relationships with itself, the name of the table that stores many-to-many relationships is r_object;
The table depart and employee have many-to-many relationships. The associated table is named r_dept_emp.
2. Name of attributes (columns)
1) Use meaningful column names. The columns in the table must adopt a complete set of design rules for keys. Each table will have an automatic ID as the primary key, and the logical primary key is defined as the first group of Candidate Primary keys. If the encoding is automatically generated by the database, it is uniformly named: ID; for custom logic encoding, use the abbreviation "ID. If the key is of the numeric type, you can use _ No as the suffix. If the key is of the character type, you can use the _ code suffix. Standard prefixes and suffixes should be used for column names.
For example, name sal_ord_id as the number field of a sales order. If there is an automatic number generated by a database, name it: ID.
2) Add the type suffix to all attributes. Note that if other suffixes are required, they are placed before the type suffix.
Note: The data type is a text field, and the type suffix TX can be left empty. Some fields with obvious types can be left with a type suffix.
3) prefix naming
Use a uniform prefix for the column names of each table, which greatly simplifies the preparation of SQL expressions. This operation also has some disadvantages. For example, the function of the automatic table Connection Tool is damaged, and the latter associates the public column names with some databases.
3. View naming
1) The view is prefixed with V. Other naming rules are similar to the table naming rules;
2) Naming should reflect the functions of each view as much as possible.
4. Trigger name
The trigger uses tr as the prefix. The trigger name is suffixed with the corresponding table name. The insert trigger is appended with '_ I', the delete trigger is appended with '_ d', and the update trigger is prefixed with' _ U ', for example, tr_customer_ I, tr_customer_d, and tr_customer_u.
5. Stored Procedure name
The stored procedure should start with 'up _ 'and be differentiated from the stored procedure of the system. The subsequent parts are mainly in the form of dynamic objects and are separated by underscores. For example, the stored procedure for adding an agent account is 'up _ ins_agent_account '.
6. variable name
The variable name is in lowercase. If it is in the phrase format, each word is separated by an underscore, for example, @ my_err_no.
7. Other considerations
1) The names must not exceed 30 characters. The variable name length is limited to 29 (excluding the identifier character @).
2) The names of Data Objects and variables are all English characters. Chinese names are prohibited. Do not leave spaces between characters in the object name.
3) Be careful when retaining words. Ensure that your field names do not conflict with the reserved words, database systems, or common access methods.
5) ensure the consistency of Field Names and types. Ensure consistency when naming fields and specifying data types for them. If the data type is an integer in one table, the Data Type in the other table should not be converted to the numeric type.