The discussion of anti-standard technology in database design

Developer on Alibaba Coud: Build your first app with APIs, SDKs, and tutorials on the Alibaba Cloud. Read more ＞

Specification | design | data | database | Database design

Discussion on anti-standard technology in database design

Note: This is a long time ago in a forum to see the article, feel good writing; in the spirit of communication with many people in the industry, the purpose of disseminating ideas in this, the original forum name has not been remembered, was also reproduced. You may reprint this article arbitrarily, but any moral or legal dispute arising therefrom has nothing to do with http:blog.csdn.net/aceplus, and http:blog.csdn.net/aceplus guarantees that this article is not used for any commercial or illegal purposes. If you are the original author of this article, think Http:blog.csdn.net/aceplus reprint damage your rights and interests, please mail contact me: aceplus@263.net

Original title: Talking about the anti-norm in database design

1. The SUMMARY

From the angle of improving the database performance, this paper introduces several common counter standard methods in database design, and discusses the advantages and disadvantages of these methods as well as the precautions in use.

Key words: Database design counter specification improve performance

2. Database Design Brief
Database design is the process of transforming the real world business model and requirement into a database model, which is the core problem of establishing the database application system. The key of the design is how to make the database of the design reasonably store the user's data and make the data processing easy for the users.

Database design is a human problem, not a database management system. The system works regardless of whether the design is good or bad. Database design should be the database administrator and system analyst work together with users to understand the requirements of each user, together for the entire database to make the appropriate and complete design.

Database and its application performance and tuning are based on good database design, database data is the basis of all operations, if the database design is not good, then all other tuning methods to improve the performance of the database is limited.

Normalization of data
2.1. Paradigm Overview
The theory of normalization is a theory that studies how to transform a bad relationship model into a good relational one, and the normalization theory is built around the paradigm. The theory of normalization holds that all the relationships in a relational database should meet certain criteria (constraints). The normative theory divides the standard requirements that the relationship should meet into a few levels, the first level that satisfies the minimum requirement is called the primary normal form (1NF), the second normal form (2NF) is proposed on the basis of the first normal form, and the third normal form (3NF) is proposed on the basis of the second normal form, and then the BCNF paradigm is proposed and 4NF,5NF. The higher the rank of the paradigm, the stricter the constraint set conditions should be met. Each level of the specification depends on its previous level, for example, if a relational pattern satisfies the 2NF, it must satisfy 1NF. Here we'll just introduce the 1NF,2NF,3NF paradigm.
2.2 1NF.
1NF is the minimum requirement for a relational model, and its rules are:

Each column must be atomic and cannot be divided into several child columns.

There can be only one value for each row and column position.

Cannot have multivalued columns.

Example: If one student is required to choose a class, the following "Student" table does not satisfy 1nf:student (s-no,s-name,class-no)

Among them: S-no for the school number, s-name for the student name, class-no for the course number. Because a student can choose more than one course, the column class-no has multiple values, so the null does not conform to 1NF.

Normalization is to divide it into the following two tables: the "Students" table and the "Select Class" table, both tables are 1NF.

Student (S-no,s-name)

Stu-class (s-no,class-no)
2.3 2NF.
For tables that meet 2NF, in addition to 1NF, a non-code column must rely on all the main codes, not part of the combined master code.　　If the main code of a table that satisfies 1NF is only one column, it automatically satisfies 2NF. Example: The following "Course selection" table does not conform to 2NF.

Stu-class (S-no,class-no,class-name)

Where: Class-name is the course name. Because the main code for the Thesaurus is: (s-no,class-no), the non-main code column Class-name depends on a part of the combined master code class-no, so it does not conform to 2NF.

Normalization of the table also breaks it down into two tables: the Select Lesson table and the course table, and they all meet 2NF.

Stu-class (s-no,class-no)

Class (Class-no,class-name)
2.4 3NF.
The rule of 3NF is that any non-main code column cannot rely on other non-main code columns except for 2NF. Example: The following "course" table does not conform to 3NF.

Class (Class-no,class-name,teacher-no,teacher-name)

Among them: Teacher-no for the classroom teacher number, Teacher-name for the instructor name.　　Because a teacher-name column is dependent on another teacher-no column, it does not conform to 3NF. The solution is to break it down into two tables: the course table and the teachers table, and they all meet 3NF.

Class (Class-no,class-name,teacher-no)

Teacher (Teacher-no,teacher-name)
2.5. Summary
When a table is canonical, its non-main code columns depend on the main code column. From the relational model point of view, the table satisfies the 3NF most conforms to the standard, such design is easy to maintain. A fully normalized design does not always generate optimal performance, so it is usually first designed according to 3NF, if there is a performance problem, and then through the counter specification to solve.

Data normalization in the database has the advantage of reducing the data redundancy, saving the storage space, the corresponding logical and physical I/O times reduced, while speeding up the increase, deletion, change speed, but the full specification of the database query, usually requires more connection operations, thus affecting the speed of the query. As a result, it is sometimes necessary to break the canonical rule, that is, to improve the performance of certain queries or applications.
3. Anti-specification of data

3.1. The benefits of the anti-norm
Is the degree of normalization more high, the better? This depends on the need to decide, because the deeper the "separation", resulting in more relationships, more frequent connection operations, while the connection operation is the most time-consuming, especially for query-oriented database applications, frequent connection will affect the query speed. As a result, relationships are sometimes deliberately retained as non-standard, or normalized and then deserialized, usually to improve performance. For example, the Account table B-TB01 in the account system, its column busi-balance (the total balance of the enterprise account) violates the specification, and the values can be obtained from the following query:

Select Busi-code,sum (acc-balance)

From B-TB06

GROUP BY Busi-code

If you do not have this column in B-TB01, you will need to do a connection operation if you want to obtain the total balance of busi-name (enterprise name) and enterprise account:

Select Busi-name,sum (acc-balance)

From B-TB01,B-TB06

where B-tb01.busi-code=b-tb06.busi-code

GROUP BY Busi-code

If you do this kind of query frequently, it is necessary to add column busi-balance to the B-TB01, at the expense of creating an incremental, deleted, and modified trigger on the table b-tb06 to maintain the Busi-balance column values on the B-TB01 table. Similar situations occur frequently in decision support systems.

The benefit of the counter specification is to reduce the need for connection operations, to reduce the number of codes and indexes, and to reduce the number of tables, and the resulting problem is the likelihood of data integrity issues. Speed up the query, but reduce the speed of modification. Therefore, when deciding to do the counter specification, it is necessary to weigh the pros and cons, carefully analyze the application of data access requirements and actual performance characteristics, good indexes and other methods often can solve performance problems, and do not need to use the method of counter specification.

3.2. Common anti-standard technology
Before the operation of the counter specification, we should take into account the data access requirements, the size of common tables, some special calculations (such as totals), where the data is physically stored, and so on. Commonly used anti-specification techniques include adding redundant columns, adding derived columns, re-group tables, and split tables.

3.2.1. Adding redundant columns

Adding redundant columns refers to having the same columns in more than one table, which is often used to avoid connection operations while querying. For example, in the previous example, if you frequently retrieve the instructor name of a course, you need to do a connection query for class and teacher tables:

Select Class-name,teacher-name

From Class,teacher

where class.teacher-no=teacher.teacher-no

In this way, you can add a column of Teacher-name to the class table without the need for a connection operation.

Adding redundant columns can avoid connection operations while querying, but it requires more disk space while increasing the workload of table maintenance.
3.2.2. Add derived columns
Adding a derived column refers to the addition of data from other tables that are calculated by the generated columns. Its role is to reduce connection operations while querying, and to avoid using set functions. For example, the column busi-balance of the table B-TB01 in the account system mentioned earlier is the derived column. Derived columns also have the same disadvantages as redundant columns.
3.2.3. Re-group table
Re-group table refers to the fact that if many users need to see the two tables connected to the result data, the two tables will be formed into a table to reduce the connection and improve performance. For example, users often need to view the course number, the course name, the instructor number, the instructor's name, then the table Class (Class-no,class-name,teacher-no) and table teacher (Teacher-no,teacher-name) Merge into one table Class (Class-no,class-name,teacher-no,teacher-name). This improves performance, but requires more disk space, while also losing the conceptual independence of the data.

3.2.4. Split table
Sometimes partitioning a table can improve performance. There are two ways to split a table:

1 Horizontal segmentation: Place data rows into two separate tables based on the values of one or more columns of data. Horizontal segmentation is usually used under the following circumstances: A table is very large, after segmentation can reduce the query needs to read the data and index pages, but also reduce the number of layers of the index, improve query speed. The data in table B is inherently independent, for example, the data in each region is recorded in the table, or data in different periods, especially when some data is commonly used, while some other data is not used. 　　C need to store the data on multiple media. For example, statutes can be divided into two tables Active-law and Inactive-law. The contents of the Activea-authors table are the laws that are in force, are frequently used, and the Inactive-law table makes the laws that have been voided, not often queried. Horizontal segmentation adds complexity to the application, which typically requires multiple table names when querying, and the union operation is required to query all data. In many database applications, this complexity can outweigh the benefits, because as long as the index key is small, when the index is used for querying, the table adds two to three times times the amount of data, and the query increases the number of disks that read an index layer.

2 Vertical split: Place the main code and some columns in one table, and then place the main code and another column in another table. If some columns in a table are commonly used, while others are less common, vertical segmentation can be used, and vertical segmentation can make data rows smaller, a data page can hold more data, and the number of I/O times will be reduced when querying. The disadvantage is that you need to manage redundant columns, and querying all data requires a JOIN operation.
4. Anti-normative technology needs to maintain data integrity
No matter what kind of reverse specification technology is used, some management is needed to maintain the integrity of data, and the common methods are batch maintenance, application logic and triggers. Batch maintenance means that after a certain amount of time has elapsed for modification of a replicated column or derived column, a batch of processing jobs or stored procedures can be used to modify the copied or derived columns, which is only possible if the real time requirement is not high. The integrity of the data can also be implemented by application logic, which requires that all tables involved in the same transaction be added, deleted, and modified. The use of application logic to achieve data integrity risk is greater, because the same logic must be used in all applications and maintenance, easy to omit, especially when the requirements change, not easy to maintain. Another way is to use triggers, and any modifications to the data immediately trigger the corresponding modifications to the copied or derived columns. Triggers are real-time, and the corresponding processing logic appears only in one place and is easy to maintain. In general, it is the best way to solve such problems.
5. Concluding remarks
The counter specification design of database can improve query performance. Commonly used anti-specification techniques include adding redundant columns, adding derived columns, re-group tables, and split tables. But the anti-specification technology needs to maintain the integrity of the data. Therefore, in doing the counter specification, we must weigh the pros and cons, carefully analyze the application of data access requirements and actual performance characteristics.

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

Get Started for Free

Sales Support

1 on 1 presale consultation

Chat Contact Sales
After-Sales Support

24/7 Technical Support 6 Free Tickets per Quarter Faster Response

Open a Ticket
Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.

Learn More

The discussion of anti-standard technology in database design

Contact Us

What's Trending

Top 10 Tags

Top 10 Keywords

A Free Trial That Lets You Build Big!

Sales Support

After-Sales Support

The discussion of anti-standard technology in database design

Contact Us

What's Trending

Top 10 Tags

Top 10 Keywords

Trending Topic

A Free Trial That Lets You Build Big!

Sales Support

After-Sales Support