Database Design Process

Source: Internet
Author: User
I. Database Design Process


Database technology is the most effective means of information resource management.

Database Design refers to constructing the optimal database mode for a given application environment, establishing a database and its application system, and effectively storing data to meet user information requirements and processing requirements.

Database Design phases:

A. Demand analysis stage: comprehensive application requirements of various users (real-world requirements ).

B. In the conceptual design stage: A Conceptual Model (information world model) independent of machines and various DBMS products is formed, which is described by a E-R diagram.

C. In the logic design stage: converts the E-R diagram to the data model supported by the specific database product, such as the relational model, to form the database logic model. Then, based on the user's processing requirements and security considerations, the necessary VIEW is created on the basis of the basic table to form an external data mode.

D. In the physical design stage: arranges physical storage and designs indexes based on the characteristics and processing needs of DBMS to form the internal database mode.

1. Demand analysis stage

Requirement collection and analysis: the data requirement described by the data dictionary is obtained (and the processing requirement described by the data flow diagram ).

Requirements Analysis focuses on investigating, collecting, and analyzing users' information requirements, handling requirements, and security and integrity requirements in data management.

Requirement Analysis Methods: Investigate the organization and organization, the business activities of each department, assist the user to clarify various requirements for the new system, and determine the boundaries of the new system.

Common survey methods include: follow-up operations, opening survey meetings, inviting special personnel to introduce, ask, and designing surveys. Users are requested to fill in and check records.

The methods for analyzing and expressing user requirements mainly include top-down and bottom-up methods. The top-down Structured Analysis method (SA) starts with the upper-level system organization and analyzes the system layer by layer, the data flow diagram and data dictionary are used to describe each layer.

The data flow chart shows the relationship between the data and the processing process. The Data in the system is described using the Data Dictionary (DD.

2. Conceptual Structure Design Stage

A conceptual model independent of a specific DBMS is formed through the synthesis, induction and abstraction of user requirements, which can be expressed by a E-R diagram.

The conceptual model is used for modeling the information world. The conceptual model does not depend on the data model supported by a DBMS. A conceptual model can be converted to a specific data model supported by a DBMS on a computer.

Concept model features:

(1) Strong semantic expression ability, which can easily and directly express various semantic knowledge in applications.

(2) It should be simple, clear, and easy to understand. It is a language for communication between users and database designers.

A common method for conceptual model design is the IDEF1X method, which applies the entity-contact method to a Semantic Modeling Technology in the semantic data model and is used to establish a system information model.

Author: Xiao Ling, source: Forum, responsible editor: Li shuqin

This article details the database design process, design skills, and summarizes the database naming rules ......

2.1 Step 1 -- initialize the project

In this phase, the task starts from the description of the purpose and scope, determines the modeling target, develops the modeling plan, organizes the modeling team, collects the source material, and develops constraints and specifications. Collecting source materials is the focus of this phase. Through investigation and observation results, business processes, input and output of the original system, various reports, and collection of raw data, a basic data table is formed.

2.2 Step 1 -- Define an object

Entity integrators have a common set of features and attributes that can identify most entities directly or indirectly from the collected source material-basic data tables. Mark the entities represented by the terms in the source material name table that indicate things and terms that end with "code", such as customer code, agent code, and product code, in this way, potential entities are preliminarily identified to form a preliminary entity table.

2.3 Step 2 -- Define contact

Only binary connections are allowed in the IDEF1X model. n-element connections must be defined as n binary connections. Based on actual business requirements and rules, the entity contact matrix is used to identify the binary relationship between entities. Then, the potential, Link name, and description of the connection relationship are determined based on the actual situation, and the link type is determined, whether it is an identity link, a non-identity Link (mandatory or optional), a non-deterministic link, or a classification link. If each instance of the Child body needs to be identified by the relationship with the parent object, the relationship is identified, otherwise it is not identified. In a non-identity relationship, if the instance of each child entity is associated with only one parent entity, it is mandatory; otherwise, it is not mandatory. If the parent entity and the Child entity represent the same real object, they are classification relationships.

Step 3-definition code

Remove the non-deterministic relationship generated in the previous stage by introducing the cross entity, and then identify the Hou code attribute from the non-cross entity and the independent entity to uniquely identify the instance of each entity, then confirm the master code from the candidate code. To determine the validity of the master code and link, the non-empty rules and non-multi-value rules are used to ensure that an attribute of an object instance cannot be null, you cannot have more than one value at the same time. Identify the identified relationship by mistake, break down the entity, and finally construct the key base view of the IDEF1X model (KB chart ).

2.5 Step 4 -- Define attributes

Extract descriptive terms from the source data table to develop an Attribute Table and determine the attribute owner. Defines non-primary Code attributes and checks non-null and non-multi-value rules of attributes. In addition, check fully dependent function rules and non-pass dependency rules to ensure that a non-primary code attribute must depend on the primary code, the entire primary code, and only the primary code. In this way, the full attribute view of the improved IDEF1X model, which is at least in line with the third paradigm of the relational theory, is obtained.

Step 5: Define other objects and rules

Defines the data type, length, precision, non-null, default value, and constraint rules of an attribute. Defines trigger, stored procedure, view, role, synonym, sequence, and other object information.

3. Logical Structure Design Stage

Converts a conceptual structure to a data model (such as a relational model) supported by a DBMS and optimizes it. To design the logical structure, select the data model that best suits to describe and express the corresponding conceptual structure, and then select the most appropriate DBMS.

To convert a E-R diagram to a relational model is to convert the relationship between the object, the object attributes, and the object into a relational model. This conversion generally follows the following principles: A real shape is converted into a relational model. An object attribute is a link attribute. The entity code is the link code.

Optimize the data model, determine data dependencies, eliminate redundant connections, and determine the relationship patterns that belong to the nth paradigm. Determine whether to merge or break them down. In general, the relationship is divided into 3NF criteria, namely:

Each value in the table can be expressed only once.

Each row in the table should be uniquely identified (with a unique key ).

The table should not store non-key information that depends on other keys.

Author: Xiao Ling, source: Forum, responsible editor: Li shuqin

This article details the database design process, design skills, and summarizes the database naming rules ......

4. Physical database design phase

Select the physical structure (including the storage structure and access method) that best suits the application environment for the Logical Data Model ). Arranges physical storage and designs indexes based on DBMS features and processing needs to form the internal database mode.

5. database implementation stage

Use the Data Language (such as SQL) provided by DBMS and its host language (such as C) to establish a database based on the logic design and physical design results, compile and debug applications, and organize the data warehouse receiving, and run the test. Database implementation mainly includes the following tasks: defining the database structure with DDL, organizing Data warehouse receiving, compiling and debugging applications, and testing the database (Data Definition Language) used to open a new data table, set fields, delete data tables, delete fields, and manage all related database structures)

● Create (DDL is used to add something about the database structure)

● Drop (delete something about the database structure, DDL)

● Alter (Alter structure, DDL)

6. Database Operation and Maintenance

During the operation of the database system, it must be constantly evaluated, adjusted, and modified. The content includes: Database dumping and recovery, database security, integrity control, database performance monitoring, analysis and improvement, and database re-organization and re-construction.

7. Use of modeling tools

To speed up database design, there are currently many database auxiliary tools (CASE tools), such as Rational Rose, Erwin and Bpwin of CA, sybase PowerDesigner and Oracle Designer.

ERwin is mainly used to establish conceptual and physical models of databases. It can describe the attributes of objects, links, and objects in a graphical manner. ERwin supports the IDEF1X method. By using the ERwin modeling tool to automatically generate, change, and analyze the IDEF1X model, we can not only obtain excellent business functions and data requirement models, but also transform the IDEF1X model to the physical design of the database. The model drawn by ERwin tool corresponds to two types: Logical Model and physical model. In the logical model, the IDEF1X toolbox can be used to easily create and draw object links and attributes in a graphical manner. In the physical model, ERwin can define the corresponding tables and columns and automatically convert them to appropriate types for various database management systems.

Designers can select appropriate database design modeling tools as needed. For example, after the requirement analysis is complete, the designer can use Erwin to draw an erdiagram, convert the erdiagram into a relational data model, generate a database structure, draw a data flow diagram, and generate an application.

2. Database Design Skills

1. Before designing the database (demand analysis stage)

1) understand customer requirements, including future changes to user needs.

2) understand the business types of enterprises and save a lot of time in the development phase.

3) Pay attention to input (the data to be recorded) and output (reports, queries, and views ).

4) create a data dictionary and ER chart

A Data Dictionary (DD) is a collection of various Data descriptions. It is a description of Data in a database, that is, metadata, not the Data itself. (At least the Data Type of each field and the Primary and Foreign keys in each table should be included ).

Data item description: data item name, description, alias, data type, length, value range, value meaning, logical relationship with other data items

Data Structure Description: Data Structure name, meaning description, composition: [data item or data structure]

Data Stream Description: Data Stream name, description, data stream source, data stream direction, composition: [data structure], average traffic, peak traffic

Data storage Description: Data Storage name, description, number, inbound data stream, outbound data stream, composition: [data structure], data volume, access method

Process description: process name, description, input: [data stream], output: [data stream], processing: [brief description]

ER charts and data dictionaries allow anyone who knows the database to clearly learn how to  from the database. The erdiagram is useful for displaying the relationship between tables, while the data dictionary describes the purpose of each field and any possible aliases. It is absolutely necessary to document SQL expressions.

5) define standard object naming rules

The names of various objects in the database must be standardized.

Author: Xiao Ling, source: Forum, responsible editor: Li shuqin

This article details the database design process, design skills, and summarizes the database naming rules ......

2. Table and field design (Database Logic Design)

Table Design Principles

1) standardization and standardization

Data Standardization helps eliminate data redundancy in databases. There are several forms of standardization, but Third Normal Form (3NF) is generally considered to have achieved the best balance in terms of performance, scalability and data integrity. To put it simply, the table design principle for databases that comply with the 3NF standard is: "One Fact in One Place", that is, a table only includes its basic attributes, when they are not their own properties, they need to be decomposed. The relationships between tables are connected by foreign keys. It has the following features: a group of tables specifically store the associated data connected by keys.

2) data-driven

Using data-driven, not hard-coded methods, many policy changes and maintenance are much easier, greatly enhancing system flexibility and scalability.

For example, if you want to access external data sources (files, XML documents, and other databases) on the user interface, you may wish to store the connection and path information in the table supported by the user interface. If you execute tasks such as workflows on the user interface (such as sending emails, printing letterhead, and modifying record status), the data generated for workflows can also be stored in the database. Role permission management can also be completed through data-driven. In fact, if the process is data-driven, you can push a considerable amount of responsibility to users to maintain their own workflow processes.

3) consider various changes

When designing a database, consider which data fields may change in the future.

4) naming conventions for table name, report name, and query name

(Prefix naming) the naming rules between the checklist name, report name, and query name. You may soon be confused by the names of these different database elements. You can name different components of these databases in a unified manner. At least you should use Table, Query, Report, and other prefixes at the beginning of these object names to differentiate them. If Microsoft Access is used, you can use symbols such as qry, rpt, tbl, and mod to identify objects (such as tbl_Employees ). Use sp_company to identify a stored procedure, and use udf _ (or similar tag) to identify a udf.

Field Design Principles:

1) three useful fields should be added to each table.

DRecordCreationDate. The default value is GETDATE () in SQL Server ()

SRecordCreator, which defaults to not null default user in SQL Server

NRecordVersion indicates the version of the record. It helps to accurately explain the causes of null data or data loss in the record.

Timeliness data should include the "Last updated date/time" field. Time tag is particularly useful for finding the cause of data problems, performing a date-based re-processing/reload of data, and clearing old data.

2) use multiple fields for the address and phone number

It is not enough to describe the street address in just one line. Address_Line1, Address_Line2, and Address_Line3 provide greater flexibility. In addition, it is recommended that you have your own data table with your phone number and email address.

3) Naming rules for the column [field] in the table (with prefix/suffix) and meaningful field names

Standard prefixes and suffixes should be used for column [field] names. For example, if the key type is a number, use the _ N suffix, character type: _ C suffix, and date type: _ D suffix. For another example, if your table contains many "money" fields, you may add a _ M suffix to each [field.

Author: Xiao Ling, source: Forum, responsible editor: Li shuqin

This article details the database design process, design skills, and summarizes the database naming rules ......

Assume there are two tables:

Customer and Order. The prefix of the Customer table is cu _. Therefore, the sub-segment names in the table are as follows: cu_name_id, cu_surname, cu_initials, and cu_address. The prefix of the Order table is or _, so the sub-segment name is:

Or_order_id, or_cust_name_id, or_quantity, and or_description.

In this way, the SQL statement for selecting all data from the database can be written as follows:

Select * From Customer, Order Where cu_surname = "MYNAME ";

And cu_name_id = or_cust_name_id and or_quantity = 1

If these prefixes are not available, write them as follows (alias is used to differentiate them ):

Select * From Customer, Order Where Customer. surname = "MYNAME ";

And Customer. name_id = Order. cust_name_id and Order. quantity = 1

The number of characters in the 1st SQL statements is not small. However, if the query involves five tables or more columns [fields], you will know how useful this technique is.

5) The length of the selected number type and text type should be as adequate as possible

Assume that the customer ID is 10 digits long. Then you should set the field length of the database table to 12 or 13 characters. However, this extra space can increase the database size without restructuring the entire database in the future.

6) add and delete tag Fields

The table contains a "delete tag" field to mark rows as deleted. Do not delete a single row in a relational database. It is best to clear the Data Program and carefully maintain the integrity of the index.

7) use case-insensitive object names and special characters

Use all uppercase letters and the name containing the delimiter has better readability (CUSTOMER_DATA). Do not leave spaces between characters of the Object Name.

8) Retain words with caution

Make sure that your field name does not conflict with the reserved word, database system, or common access method. For example, DESC is used as the description field name. The consequences can be imagined! DESC is the reserved word after DESCENDING. A select * Statement in the table can be used, but a lot of useless information is obtained.

9) maintain the consistency of Field Names and types.

Ensure consistency when naming a field and specifying its data type. If the field in table 1 is called "agreement_number", do not change the name to "ref1" in table 2 ". If the data type is an integer in Table 1, do not change it to positive in table 2. Of course, a key ID exists in table 1 (ABC). For readability, you can name it ABC_ID when associating table 2.

10) Avoid using triggers

The functions of a trigger can be implemented in other ways. When debugging a program, the trigger may become interference. If you do need a trigger, you 'd better document it in a centralized manner.

This article details the database design process, design skills, and summarizes the database naming rules ......

3. Select keys and indexes (Database Logic Design)

Reference: SQL Optimization-Index

4. Data Integrity Design (Database Logic Design)

1) Integrity implementation mechanism:

Entity Integrity: Primary Key

Integrity of reference:

Delete data in the parent table: cascade Delete; restricted Delete; null

Insert data in the parent table: limited insert; recursive insert

Update Data in the parent table: cascade update; restricted update; Null Value

DBMS can implement the following two methods: foreign key implementation mechanism (constraint rule) and trigger implementation mechanism. User-Defined integrity:

Not null; CHECK; trigger

2) Use constraints rather than business rules to force data integrity

The database system is used for data integrity. This includes not only integrity achieved through standardization, but also data functionality. Do not rely on the business layer to ensure data integrity; it cannot guarantee the integrity between tables (Foreign keys), so it cannot be imposed on other integrity rules. If you use constraints on the data layer, make sure that you have a way to notify the user interface of the reasons why the update cannot pass the constraints check in a language that the user understands.

3) mandatory indication integrity

Remove harmful data before it enters the database. Activate the indication integrity feature of the database system. In this way, data can be kept clean and developers can be forced to spend more time processing error conditions.

4) use search to control data integrity

The best way to control data integrity is to restrict user selection. Whenever possible, users should be provided with a clear value list for their choice. This reduces the errors and misunderstandings of the entered code and provides data consistency. Some public data is particularly suitable for searching: Country Code, Status Code, etc.

5) view

To provide another abstraction layer between the database and application code, you can create a special view for the application without requiring the application to directly access the data table. This gives you more freedom to handle database changes.

6) Distributed Data System

For a distributed system, estimate the data volume in the next five or ten years before you decide whether to copy all the data on each site or store the data in one place. When you transfer data to other sites, it is best to set some tags in the database field to update your tags after the target site receives your data. For this data transmission, write down your own batch processing or scheduling program to run at a specific interval instead of letting users transmit data after daily work. Copy your maintenance data locally, such as calculating constants and interest rates. Set the version number to ensure that the data is completely consistent on each site.

7) relationship

If there is a many-to-one relationship between two entities and it may be converted to many-to-many relationships, you 'd better set them to many-to-many relationships at the beginning. Switching from the existing multi-to-one relationship to the multi-to-many relationship is much harder than the first one.

8) plan data retention and recovery

Consider data retention policies and include them in the design process to design your data recovery process in advance. The data dictionary that can be released to users/developers is used for convenient data identification and data source documentation. Write online updates to "Update queries" for future updates if data is lost.

9) react the system with Stored Procedures

Provides a set of general stored procedures to access each group to speed up and simplify the development of customer program code. A database is not only a place to store data, but also a place to simplify coding.

This article details the database design process, design skills, and summarizes the database naming rules ......

5. other design skills

1) Avoid using triggers

The functions of a trigger can be implemented in other ways. When debugging a program, the trigger may become interference. If you do need a trigger, you 'd better document it in a centralized manner.

2) use common English (or any other language) instead of coding

When creating drop-down menus, lists, and reports, it is best to sort them by English name. If encoding is required, you can attach the English language that the user knows to the encoding.

3) save common information

It is very useful to allow a table to store general database information. Store the current database version, recent check/repair (Access), name of the associated design document, and customer information in this table. In this way, a simple mechanism can be implemented to track databases. When customers complain that their databases do not meet the expected requirements and contact you, this is especially useful for non-client/server environments.

4) include version Mechanism

Introduce the version control mechanism in the database to determine the version of the database in use. After a long period of time, users' needs will always change. In the end, you may need to modify the database structure. It is more convenient to store version information directly in the database.

5) document preparation

All shortcuts, naming conventions, restrictions, and functions must be documented.

Database tools that annotate tables, columns, and triggers are used. It is useful for development, support, and tracking and modification.

Document the database, or create documents in the database itself or separately. In this way, after more than a year, I will go back to the 2nd versions, and the chances of making mistakes will be greatly reduced.

6) tests, tests, and repeated tests

After creating or revising the database, you must use the new data test data field entered by the user. The most important thing is to allow users to perform tests and ensure that the selected data type meets the commercial requirements. The test should be completed before the new database is put into the actual service.

7) Check Design

The common technology used to check the database design during development is to check the database through its supported application prototype. In other words, for each prototype application that ultimately expresses data, you must check the data model and view how to retrieve the data.

Iii. Database naming rules

1. Object (table) Naming

1) The table is named by a noun or a noun phrase to determine whether the table name is in the form of a plural or singular. In addition, a simple rule is defined for the table alias (for example, if the table name is a word, an alias is the first four letters of a word. If the table name is two words, the first two letters of each word are used to form an alias with four letters; if the table name is composed of three words, take one of the first two words, and then extract two letters from the last word, the result is a 4-letter alias, and the rest are in turn)

For a working table, the table name can be prefixed with WORK _ followed by the name of the application using the table. During the naming process, you can piece together the abbreviations Based on the semantics. Note: The field names are unified into uppercase or lowercase letters, so they are underlined in the middle.

This article details the database design process, design skills, and summarizes the database naming rules ......

Example:

Sales: Sal Sales;

Order: Ord Order;

Detail: Dtl details;

The sales order list is named Sal_Ord_Dtl;

2) If the table or field name has only one word, we recommend that you use a complete word instead of an abbreviation.

Example:

The abbreviation of the definition is Material Ma;

The item table name is Material, not Ma.

However, the field item code is Ma_ID, not Material_ID.

3) Add the prefix Z to the table of all stored values.

The purpose is to sort these Value List classes at the end of the database.

4) Add the prefix X to the names of all redundant classes (mainly accumulative tables ).

The redundancy class is used to improve database efficiency. It is used to add fields or tables to a non-standardized database.

5) join classes connect two basic classes with underscores and add the prefix R to name them. The names of the two tables or abbreviations of the table names are listed in alphabetical order.

An association table is used to save many-to-many relationships.

If the associated table name is greater than 10 letters, the original table name must be abbreviated. If there are no other reasons, we recommend that you use abbreviations.

For example, if a table Object has many-to-many relationships with itself, the name of the table that stores many-to-many relationships is R_Object;

Author: Xiao Ling, source: Forum, responsible editor: Li shuqin

This article details the database design process, design skills, and summarizes the database naming rules ......

2. Name of attributes (columns)

1) Use meaningful column names

The columns in the table must adopt a complete set of design rules for keys. Each table uses an automatic ID as the primary key, and the logical primary key is defined as the first group of candidates;

A. For the encoding automatically generated by the database, the uniform name is: ID

B. For custom logic encoding, use the abbreviation "ID", that is, "XXXX_ID"

C. If the key is a number, you can use _ NO as the suffix;

D. If it is a character type, use the _ CODE suffix.

E. Use standard prefixes and suffixes for column names.

For example, name Sal_Ord_ID as the number field of a sales order. If there is an automatic number generated by a database, name it: ID.

2) All attributes are suffixed with related types.

NOTE: If other suffixes are required, they are placed before the type suffix.

Note: The data type is a text field, and the type suffix TX can be left empty. Some fields with obvious types can be left with a type suffix.

3) prefix naming

Use a uniform prefix for the column names of each table, which greatly simplifies the preparation of SQL expressions. This operation also has some disadvantages. For example, the function of the automatic table Connection Tool is damaged, and the latter associates the public column names with some databases.

3. View naming

1) The view is prefixed with V. Other naming rules are similar to the table naming rules;

2) Naming should reflect the functions of each view as much as possible.

4. Trigger naming (do not use it whenever possible)

The trigger uses TR as the prefix. The trigger name is suffixed with the corresponding table name. The Insert trigger is appended with '_ I', the Delete trigger is appended with '_ d', and the Update trigger is prefixed with' _ U ', for example, TR_Customer_ I, TR_Customer_D, and TR_Customer_U.

5. Stored Procedure name

The stored procedure should start with 'up _ 'and be differentiated from the stored procedure of the system. The subsequent parts are mainly in the form of dynamic objects and are separated by underscores. For example, the stored procedure for adding an agent account is 'up _ Ins_Agent_Account '.

6. variable name

The variable name is in lowercase. If it is in the phrase format, each word is separated by an underscore, for example, @ my_err_no.

7. Other considerations

1) The names must not exceed 30 characters. The variable name length is limited to 29 (excluding the identifier character @).

2) The names of Data Objects and variables are all English characters. Chinese names are prohibited. Do not leave spaces between characters in the object name.

3) Be careful when retaining words. Ensure that your field names do not conflict with the reserved words, database systems, or common access methods.

4) maintain the consistency of Field Names and types. Ensure consistency when naming fields and specifying data types for them. If the data type is an integer in one table, the Data Type in the other table should not be converted to the numeric type.

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.