The process of database design
Database technology is the most effective means of information resource management. Database design refers to a given application environment, constructs the best database model, establishes the database and its application system, effectively stores the data, satisfies the user information request and the processing request.
In the database design, the requirement Analysis stage integrates the application requirements of each user (real world demand), and forms the concept model (Information World model) which is independent of the machine characteristic and independent of each DBMS product in the conceptual design stage, and is described by E-R diagram. In the logic design stage, the E-R diagram is transformed into the data model of the specific database product support, such as the relational model, which forms the database logic model. Then according to the user processing requirements, security considerations, based on the basic table and then establish the necessary views (view) to form the external model of data. In the physical design phase, according to the characteristics of DBMS and the need of processing, physical storage arrangement, design index, form the database model.
1. Requirements Analysis Phase
Requirements collection and analysis results from data dictionary Description of the data requirements (and flow diagram description of processing requirements).
The focus of the requirement analysis is to investigate, collect and analyze the information requirements, processing requirements, security and integrity requirements of users in data management.
Requirements Analysis Methods: Investigate the organizational situation, investigate the business activities of various departments, assist users to clear the new system requirements, determine the boundaries of the new system.
Commonly used methods of investigation are: Valet homework, open an investigation, please introduce, ask, design questionnaires please fill in, check the record.
The methods of analyzing and expressing users ' requirements mainly include Top-down and bottom-up methods. The Top-down structural analysis Method (structured, SA method) starts from the top-level system organization, uses the method of layer decomposition to analyze the system, and describes each layer with data flow diagram and a database dictionary.
The data flow diagram expresses the relationship between the information and the processing process. The data in the system is described with the aid of a data dictionary (Dictionary, abbreviated as DD).
A data dictionary is a collection of data descriptions that describe the data in a database, that is, metadata, not the data itself. Data dictionaries typically include five parts of data items, data structures, data flows, data storage, and processing (at least the data type of each field should be included and the primary foreign key in each table).
Data item Description ={data item name, data item meaning description, alias, data type, length,
Value range, value meaning, logical relationship to other data items}
Data structure Description ={data structure name, meaning description, composition: {Date Item or data structure}}
Data Flow description ={data stream name, description, data stream source, data flow whereabouts,
Composition: {Data structure}, average flow, peak traffic}
The data store describes the ={data store name, description, number, flow of data stream, streaming of data,
Composition: {Data structure}, amount of information, access mode}
Process description ={Process name, description, input: {Data flow}, output: {Data Flow},
Processing: {Brief description}}
2. Conceptual structure Design phase
Through the synthesis, induction and abstraction of user requirements, a conceptual model is formed, which can be represented by e-R diagram.
Conceptual models are used for modeling the information world. The conceptual model does not depend on a data model supported by one DBMS. A conceptual model can be converted to a specific data model supported by a DBMS on a computer.
Conceptual model Features:
(1) has the strong semantic expression ability, can express the various semantic knowledge in the application conveniently and directly.
(2) should be simple, clear, easy to understand users, users and database designers to communicate between the language.
A common method of conceptual model design is the IDEF1X method, which is a semantic modeling technique applied to the semantic data model by the entity-contact method, which is used to establish the system Information model.
The steps to create an E-R model using the IDEF1X method are as follows:
2.1 The 0th step--Initialization of the project
The task at this stage is to start with the objective description and scope description, to identify the modeling objectives, to develop the modeling plan, to organize the modeling team, to collect source materials, and to make constraints and specifications. Collecting source materials is the focus of this phase. Through the investigation and observation results, business processes, the original system input and output, various reports, the collection of raw data, the formation of a basic data table.
2.2 First step-defining entities
Entity set members have a common set of characteristics and attributes that can identify most entities directly or indirectly from the source material that is collected-the basic data table. According to the terms of the source material name list and the term with "code", such as customer code, Agent code, product code, etc., identify the entity represented by the noun part, thus preliminarily identify the potential entity and form the preliminary entity table.
2.3 Step Two-defining the connection
The IDEF1X model only allows two yuan to contact, n-ary connection must be defined as n two-yuan connection. According to the actual business requirements and rules, the entity contact matrix is used to identify the two-element relationship between entities, then the potential, relation name and description of the connection relation are determined according to the actual situation, and the relation type is identified, which is the identification relationship, the non identity relation (mandatory or optional) or the uncertain relation and the classification relation. If each instance of the child entity needs to be identified by its relationship to the parent entity, it is an identity relationship or it is not an identity relationship. In a non-identity relationship, if an instance of each child entity is associated with and only one parent entity, it is mandatory, otherwise it is not mandatory. If the parent entity represents the same real object as the child entity, then they are categorized.
2.4 Third Step-definition code
By introducing the cross entity to remove the indeterminate relationship from the previous stage, the candidate attributes are identified from the non-cross entities and independent entities to uniquely identify the instance of each entity and then determine the main code from the candidate code. In order to determine the validity of the principal code and the relationship, we guarantee that a property of an entity instance cannot be null or have more than one value at the same time by means of a non-empty rule and a non multivalued rule. Identify the mistaken relationship, further decompose the entity, and finally construct the key base view (KB graph) of the IDEF1X model.
2.5 Step fourth-defining attributes
Extract descriptive nouns from the source datasheet to develop the property sheet and determine the owner of the property. Defines a non-primary code property that checks for Non-null and non-multivalued rules for attributes. In addition, it is necessary to check the full dependency function rule and the non-transitive dependency rule to ensure that a non-code attribute must be dependent on the main code, the whole main code, and only the main code. A full attribute view of the improved IDEF1X model, at least in accordance with the third normal form of the relational theory, is obtained.
2.6 Step Fifth-defining other objects and rules
Defines the data type, length, precision, non-null, default value, constraint rule, and so on of the property. Define object information such as triggers, stored procedures, views, roles, synonyms, sequences, and so on.
3. Logical Structure design phase
Transform the conceptual structure into a data model (such as a relational model) supported by a DBMS and optimize it. The design logic structure should select the data model that best describes and express the corresponding conceptual structure, and then select the most appropriate DBMS.
To transform e-R diagram into relational model is to transform the relation between entity, entity and entity into relational mode, which generally follows the following principles:
1 A solid type is converted to a relational pattern. A property of an entity is a property of a relationship. The code of the entity is the code of the relationship.
2 A M:N connection is converted to a relational pattern. The code of each entity associated with the connection and the properties of the contact itself are converted to the properties of the relationship. The code of the relationship is the combination of each entity code.
3 A 1:n connection can be converted to a separate relational pattern, or it can be merged with the corresponding relational schema of N-terminal. If you convert to a separate relational schema, the code for each entity connected to the connection and the properties of the contact itself are converted to the properties of the relationship, and the code for the relationship is the code for the N-terminal entity.
4 A 1:1 relationship can be converted to a separate relational schema, or it can be merged with the corresponding relational schema at either end.
5) A multiple relationship between three or three entities is converted to a relational pattern. The code of each entity connected to the multivariate connection and the properties of the contact itself are converted to the properties of the relationship. The code of the relationship is the combination of each entity code.
6 The relationship between entities of the same entity set, that is, from the contact, can also be treated according to the above 1:1, 1:n and m:n three kinds of cases respectively.
7 A relational pattern with the same code can be merged.
In order to improve the performance of database application system, it is usually guided by the normalization theory, and the structure of data model should be modified and adjusted appropriately, which is the optimization of data model. Determine data dependencies. Eliminates redundancy of connections. Determine each relational pattern belongs to the first paradigm. Determine if you want to merge or decompose them. In general, the relationship is broken down into the standard of 3NF, namely:
Each value in the table can only be expressed once.
Each row in the table should be uniquely identified (with a unique key).
Non-key information that depends on other keys should not be stored in the table.
4. Database Physical Design Phase
Select a physical structure (including storage structure and access method) that is best suited to the application environment for the logical data model. According to the characteristics of DBMS and the need of processing, physical storage arrangement, design index, form the database model.
5. Database implementation phase
Using the data language provided by DBMS (such as SQL) and its host language (such as C), the database is established according to the logical design and physical design results, the application is compiled and debugged, the data is stored and the test run is carried out. Database implementation mainly includes the following tasks: Defining database structure with DDL, organizing data warehousing, compiling and debugging application, database test running
6. Database operation and Maintenance phase
The database application system can be put into operation after trial operation. In the process of database system operation, it must be evaluated, adjusted and modified continuously. Including: Database's dump and restore, database security, Integrity control, database performance monitoring, analysis and improvement, database reconfiguration and reconstruction.
Use of modeling tools
To speed up database design, there are currently a number of database AIDS (case tools), such as Erwin and Bpwin of rational Rose,ca Company of Rational Company, The PowerDesigner of Sybase and Oracle Designer of Oracle Company.
Erwin is mainly used to establish the conceptual model and physical model of the database. It can graphically describe the properties of entities, relationships, and entities. Erwin supports IDEF1X methods. By using the Erwin Modeling tool to automatically generate, change, and analyze the IDEF1X model, we can not only get excellent business function and data requirement model, but also realize the transformation from IDEF1X model to database physical design. The Erwin tool draws models that correspond to both logical and physical models. In the logical model, the IDEF1X Toolbox makes it easy to graphically build and draw entity relationships and attributes of entities. In the physical model, Erwin can define corresponding tables and columns, and can be automatically converted to the appropriate type for various database management systems.
Designers can choose the corresponding database design modeling tool according to the need. For example, after the requirement analysis is completed, the designer can use Erwin to draw ER graph, transform ER diagram into relational data model, build database structure, draw data flow diagram, build application.
Second, database design skills
1. Before designing the database (Requirements analysis phase)
1 understand customer needs, ask users how to view the future changes in demand. Let customers explain their needs, and as development continues, ask customers frequently to ensure that their needs are still being developed.
2 Understand the enterprise business can save a lot of time in the future development phase.
3) Attach importance to input and output.
When defining database tables and field requirements (input), you should first examine existing or designed reports, queries, and views (outputs) to determine which tables and fields are necessary to support these outputs.
For example: If a customer needs a report to sort, segment, and sum the ZIP code, make sure to include a separate ZIP code field instead of Leelawadee the ZIP code into the Address field.
4 Create a data dictionary and ER Chart
ER charts and data dictionaries make it clear to anyone who knows the database how to get data from the database. Er diagrams are useful for indicating relationships between tables, while data dictionaries describe the purpose of each field and any aliases that may exist. This is absolutely necessary for the documentation of SQL expressions.
5 Define standard Object naming conventions
The naming of various objects in the database must be canonical.
2. Table and field design (Database logic design)
Table Design Principles
1) Standardization and normalization
The standardization of data helps to eliminate data redundancy in the database. There are several forms of standardization, but third normal Form (3NF) is often considered to be the best balance in performance, extensibility, and data integrity. In simple terms, the table design principle for a database that adheres to the 3NF standard is that "one Fact in" is a table that contains only its own basic properties and is decomposed when it is not a property of their own. The relationship between tables is connected by a foreign key. It has the following features: There is a set of tables that specifically store the associated data connected by a key.
For example: A 3NF database that holds customers and their related orders may have two tables: Customer and order. The order table does not contain any information about the orders associated with the customer, but the table
Holds a key value that points to the line in the Customer table that contains the client information.
In fact, it is sometimes necessary to standardize a table for efficiency reasons.
2 Data-driven
With data-driven rather than hard coding, many policy changes and maintenance will be much more convenient, greatly enhancing the flexibility and scalability of the system.
For example, if the user interface to access external data sources (files, XML documents, other databases, etc.), it may be appropriate to store the connection and path information in the User Interface support table. Also, if the user interface executes tasks such as workflow (sending mail, printing letterhead, modifying record status, etc.), the data that generates the workflow can also be stored in the database. Role Rights management can also be done through data-driven. In fact, if the process is data-driven, you can push a lot of responsibility to the user and the user to maintain their workflow process.
3 Consideration of changes
When designing a database, consider which data fields may change in the future.
For example, the surname is the case (note is the western surname, such as women married after the husband surname, etc.). So, when you build a system to store customer information, you store the last Name field in a separate data table, plus fields such as start and end days, so that you can track changes in this data entry.
Field design principles
4 the 3 useful fields that should be added to each table
drecordcreationdate, in VB under the default is now (), and in SQL Server defaults to GETDATE ()
srecordcreator, defaults to not NULL default USER under SQL Server
nrecordversion, a recorded version mark, which helps to accurately describe the reason for null data or data loss in the record
5 Apply multiple fields to address and telephone
It is not enough to describe a street address on a single line. Address_line1, Address_line2 and Address_line3 can provide greater flexibility. Also, phone numbers and mailing addresses have their own data tables, with their own types and tag categories.
6 Use Role entities to define columns belonging to a category
When you need to define things that belong to a particular category or have a specific role, you can use a role entity to create a specific time association relationship that enables you to document yourself.
Example: the person entity and the Person_type entity are used to describe the personnel. For example, when John Smith, Engineer promoted to John Smith, Director, and eventually climbed to John Smith, the CIO's high, all you had to do was change the key value of the relationship between two table person and Person_type, and Add a Date/Time field to know when the change occurred. In this way, your Person_type table contains all the possible types of person, such as associate, Engineer, Director, CIO, or CEO. Another alternative would be to change the person record to reflect the change in the new title, but it would not be possible to track the time of the individual's position in time.
7 Select the number type and text type as much as possible
Use smallint and tinyint types in SQL with special care. For example, if you want to see the total monthly sales, the total field type is smallint, then, if the total exceeds the $32,767 can not be calculated operations.
A text field of type ID, such as a customer ID or order number, should be set larger than the general imagination. Assume a customer ID of 10 digits long. Then you should set the database table field length to 12 or 13 characters long. But this extra space does not need to refactor the entire database in the future to achieve the growth of the database scale.
8 Add delete tag field
Include a delete tag field in the table so that you can mark the row for deletion. Do not delete a row individually in a relational database, preferably with a data-clearing program and carefully maintain the integrity of the index.
3. Select keys and indexes (Database logic design)
Key Selection principle:
1 Key Design 4 principle
creates a foreign key for the associated field.
All keys must be unique.
Avoid using composite keys.
The foreign Key always associates a unique key field.
2 using system-generated primary keys
When designing a database, the system-generated key is used as the primary key, then the index integrity of the database is actually controlled. In this way, the database and the non human mechanisms effectively control access to each row in the stored data. The advantage of using a system-generated key as a primary key is that it is easy to find a logical defect when you have a consistent key structure.
3 Do not use the user's key (do not allow the primary key to be updatable)
When deciding what field to use as a key for a table, be sure to be careful what fields the user will edit. In general, do not select a user-editable field as the key.
4 optional keys can sometimes be the key
The option key to further use the master key, you can have the ability to build a strong index.
Index usage principles:
An index is one of the most efficient ways to get data from a database. 95% of database performance problems can be solved by indexing technology.
1 The logical primary key uses a unique group index, which takes a unique, nonclustered index on the system key (as a stored procedure) and a non group index on any foreign key columns. Consider how much space the database has, how the tables are accessed, and whether these accesses are primarily used for reading and writing.
2 most databases index automatically created primary key fields, but don't forget to index foreign keys, which are also frequently used keys, such as running a query that shows a record of the primary table and all associated tables.
3 Do not index memo/note fields, do not index large fields (there are many characters), this will make the index takes up too much storage space.
4 do not index the use of small tables
Do not set any keys for small data tables, even if they often have inserts and deletes. The index maintenance for these insert and delete operations may consume more time than the scan table space.
4. Data integrity Design (database logic design)
1) Integrity Implementation Mechanism:
Entity integrity: Primary key
Referential integrity:
Delete data in parent table: cascading delete; restricted deletion; null value
Insert data in parent table: Restricted insert; recursive insert
Update data in parent table: Cascade update; limited update; NULL value
There are two ways for DBMS to realize referential integrity: foreign key implementation mechanism (constraint rule) and trigger implementation mechanism
User-defined integrity:
Not null;check; triggers
2 Enforce data integrity with constraints rather than business rules
Using database system to achieve data integrity. This includes not only the integrity of the implementation through standardization but also the functionality of the data. When writing data, you can also increase the trigger to ensure the correctness of the data. Do not rely on the business layer to ensure data integrity; it does not guarantee the integrity of the tables (foreign keys) so it cannot be imposed on other integrity rules.
3 Force Indication Integrity
Remove unwanted data before it enters the database. Activates the indication integrity characteristics of the database system. This keeps the data clean and forces developers to devote more time to handling the error conditions.
4 Use Lookup to control data integrity
The best way to control data integrity is to limit the user's choice. Whenever possible, you should provide a clear list of values for the user to choose from. This will reduce the error and misunderstanding of the typing code while providing consistent data. Some public data is especially good for finding: Country code, status code, and so on.
5 Use View
To provide another layer of abstraction between the database and the application code, you can create a specialized view of your application without having to access the data table directly from the application. Doing so would also give you more freedom when dealing with database changes.
5. Other design techniques
1) Avoid using triggers
The functionality of triggers can often be implemented in other ways. Triggers can become interference when the program is debugged. If you really need to use triggers, you'd better focus on documenting them.
2 use common English (or any other language) rather than using code
It is best to sort by English name when creating Drop-down menus, lists, reports. If you need to code, you can attach the code next to the user knows English.
3) Save the commonly used information
It is useful to have a table that stores general database information specifically. This table holds the current version of the database, the most recent check/fix (for access), the name of the associated design document, the customer, and so on. This enables a simple mechanism to track the database, which is especially useful for non-client/server environments when customers complain that their database is not meeting the desired requirements.
4) containing the version mechanism
A version control mechanism is introduced into the database to determine the version of the database in use. A long time, the user's needs will always change. The database structure may eventually be required to be modified. It is more convenient to deposit version information directly into the database.
5) Document Preparation
Documentation is documented for all shortcuts, naming conventions, restrictions, and functions.
Use database Tools that annotate tables, columns, triggers, and so on. is useful for development, support, and tracking modifications.
Document the database, either inside the database itself or in a separate document. In this way, after more than a year and then go back to the 2nd version, the opportunity to err will be greatly reduced.
6 testing, testing, repeated testing
After you have established or revised a database, you must test the data field with the new data entered by the user. Most importantly, let the user test and work with the user to ensure that the selected data type meets the business requirements. Testing needs to be done before the new database is put into actual service.
7) Inspection Design
A common technique for checking database design during development is to check the database through its supported application prototypes. In other words, the prototype application for each final expression data ensures that you examine the data model and see how to get the data out.
Third, the database naming specification
1. Name of entity (table)
1 The table is named by noun or noun phrase to determine whether the table name is in plural or singular form, in addition, define a simple rule for the alias of the table (for example, if the table name is a word, the alias takes the first 4 letters of the word; if the table name is two words, the first two letters of the two words are each named as the 4-letter-long alias.) If the name of the table consists of 3 words, one from the two words from the beginning and then another two letters from the last word, the result is a 4-letter alias, and so on.
For work tables, the table name can be prefixed with Work_ appended with the name of the application that uses the table. In the naming process, the abbreviations can be pieced together according to semantics. Note that because Orcle unifies the field names into either uppercase or lowercase, the underscore is required.
Example:
Defined abbreviations for sales:sal sales;
Order:ord orders;
DETAIL:DTL details;
The sales order BOM is named: SAL_ORD_DTL;
2 If the name of a table or field has only one word, it is recommended not to use abbreviations, but to use complete words.
Example:
defined abbreviation Material Ma goods;
The Item table is named: Material, not Ma.
But the field code is: ma_id, not material_id.
3 All the tables that store the values list prefix Z
The goal is to sort the values list classes at the end of the database.
4 The name of all redundant classes (mainly cumulative table) preceded by prefix X
Redundant classes are fields or tables that are added to improve database efficiency and are not normalized to a database
5 The Association class is named after connecting two basic classes with an underscore followed by a prefix of R, following an alphabetical list of two table names or table name abbreviations.
Association tables are used to save many-to-many relationships.
If the associated table name is greater than 10 letters, the original table name must be abbreviated. If there are no other reasons, it is recommended to use abbreviations.
For example: The table Object has a many-to-many relationship with itself, then the table that holds the many-to-many relationship is named: R_object;
Table depart and employee; there are many-to-many relationships; The associated table is named R_dept_emp
2. Name of attribute (column)
1 Use a meaningful column name, the column in the table to use a set of design rules for the key. Each table will have an automatic ID as the primary health, the logical master is defined as the first set of candidates, and if the encoding is automatically generated by the database, it is named: ID, and if it is a custom logical encoding, it is named with the "id" method. If the key is a numeric type, you can use _no as the suffix, and if it is a character type, you can use the _code suffix. A standard prefix and suffix should be used for column names.
Example: The number field of a sales order is named: sal_ord_id; If there is a database-generated AutoNumber, it is named: ID.
2 All attributes plus the suffix of the type, note that if additional suffixes are required, they are placed before the type suffix.
Note: The data type is the field of the text, and the type suffix TX can not be written. Some of the more obvious types of fields, you can not write the type suffix.
3 using prefix naming
Using a uniform prefix for the column names of each table is greatly simplified when writing SQL expressions. This also does have drawbacks, such as breaking the role of the Automatic Table Connection tool, which links common column names to some databases.
3. Naming of views
1 The view is prefixed with V, and other naming rules and tables are named similarly;
2 name should try to embody the functions of each view.
4. Naming of triggers
Triggers are prefixed with TR, the trigger name is the corresponding table name plus the suffix, insert trigger plus "_i", delete trigger Plus "_d", UPDATE trigger Plus "_u", such as:
Tr_customer_i,tr_customer_d,tr_customer_u.
5. Stored Procedure Name
Stored procedures should be "Up_" beginning, and the system's stored procedures to distinguish, the next part mainly in the form of movable object, and the use of the underline segmentation of the various components. For example, the stored procedure for increasing the agent's account is "Up_ins_agent_account".
6. Variable Name
Variable names are lowercase, and if they are in the form of phrases, separate each word with an underscore, such as @my_err_no.
7. Other considerations in naming
1 The above name must not exceed 30 characters of the system limit. The variable name has a length limit of 29 (excluding the identity character @).
2 The names of the data objects and variables are written in English characters, which prohibit the use of Chinese names. Never leave spaces between characters in an object name.
3 Be careful to retain the word, to ensure that your field name does not and reserved words, database systems or common access methods conflict
5 Maintain consistency in field names and types, and ensure consistency when naming fields and specifying data types for them. If the data type is an integer in one table, then the other table can not be a character type.
The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion;
products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the
content of the page makes you feel confusing, please write us an email, we will handle the problem
within 5 days after receiving your email.
If you find any instances of plagiarism from the community, please send an email to:
info-contact@alibabacloud.com
and provide relevant evidence. A staff member will contact you within 5 working days.