Improving system performance through database design: horizontal and vertical segmentation


This article introduces several techniques for improving system performance: column segmentation, row segmentation, instance segmentation, and segmentation of physical deployment.

The focus is on database design, because a good data structure is critical to how well the overall system runs.

When talking about program performance, we all know the time-complexity notation O(f(n)). Faced with a performance problem, many people try to reduce the complexity of the function f, or to make f run faster. But these approaches treat the symptom rather than the cause: if the cardinality n is large, they do not help much, because the problem is n itself. Do everything possible to reduce the cardinality n, and you get a qualitative improvement.

How can we reduce the cardinality n? Suppose you are a grocery store owner with a container for magazines. The kinds of magazines keep multiplying and the container no longer holds them all. What do you do: lengthen the container, or classify the magazines and add another container? I prefer adding a container to lengthening the original one indefinitely. The lesson from the grocery store container is that the effective way to reduce the cardinality n is divide and conquer.

First, column segmentation

Typically, as development progresses, the number of fields in the system's main tables keeps growing. However, the number of fields in a table is limited both by database specifications and by performance. For example, a table in a SQL Server database can contain up to 1,024 columns, but in practice it generally cannot exceed 246 columns, and each row of data can store at most 8,060 bytes. (See http://msdn.microsoft.com/zh-cn/library/ms143432.aspx)

In addition, for large tables the number of columns directly affects access speed: the storage structure is optimized for access by row, not by column. Several cases where columns should be split out into separate storage are discussed below.

1. Too many columns

I had a failed project in 2004. When designing a 5-year purchasing plan table, I laid the monthly plans out as columns, which added 60 columns and eventually brought the table to 120 columns. The result was that once the table held 1 million records, calculating the purchasing plan took 2.5 hours. The fix was to store the monthly plans as rows and add a plan-year field, which cut the total number of columns by nearly 40%; the row count grew to 5 million, but access became faster and the purchasing-plan calculation dropped to 0.5 hours.
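A minimal sketch of that redesign in SQL (the table and column names here are hypothetical; the article does not give the actual schema):

    -- Original layout (sketch): one plan column per month of the 5-year horizon
    CREATE TABLE PurchasePlan_Wide (
        MaterialId INT NOT NULL PRIMARY KEY,
        PlanM01    DECIMAL(18, 2),
        PlanM02    DECIMAL(18, 2),
        -- ... one plan column per month, 60 columns in total ...
        PlanM60    DECIMAL(18, 2)
    );

    -- Revised layout (sketch): months become rows, keyed by plan year and month
    CREATE TABLE PurchasePlan (
        MaterialId INT      NOT NULL,
        PlanYear   SMALLINT NOT NULL,
        PlanMonth  TINYINT  NOT NULL,
        PlanQty    DECIMAL(18, 2),
        PRIMARY KEY (MaterialId, PlanYear, PlanMonth)
    );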

Based on that experience, my opinions are:

• The number of columns in a table is best kept at no more than 80 (a quick check for this is sketched after this list)

• Row-wise storage is faster than column-wise storage
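In SQL Server, a rough way to spot tables that exceed such a column threshold is to count columns in the catalog views; a sketch using the 80-column figure above:

    -- List base tables whose column count exceeds 80
    SELECT c.TABLE_SCHEMA, c.TABLE_NAME, COUNT(*) AS ColumnCount
    FROM INFORMATION_SCHEMA.COLUMNS AS c
    JOIN INFORMATION_SCHEMA.TABLES AS t
      ON t.TABLE_SCHEMA = c.TABLE_SCHEMA AND t.TABLE_NAME = c.TABLE_NAME
    WHERE t.TABLE_TYPE = 'BASE TABLE'
    GROUP BY c.TABLE_SCHEMA, c.TABLE_NAME
    HAVING COUNT(*) > 80
    ORDER BY ColumnCount DESC;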

2. Columns whose data is relatively large or infrequently used

The following kinds of fields should be isolated into a separate table (a sketch of such a split follows the list):

• Picture objects

• Attachments (documents, etc.)

• Large amounts of memo information

• Columns that are almost never queried
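A minimal sketch of such a split, assuming hypothetical table and column names: the heavy columns move to a side table joined 1:1 on the primary key, so they no longer widen every row scan of the main table.

    -- Main table keeps the small, frequently queried columns
    CREATE TABLE Product (
        ProductId   INT           NOT NULL PRIMARY KEY,
        ProductName NVARCHAR(100) NOT NULL,
        Price       DECIMAL(18, 2)
    );

    -- Side table, joined 1:1, holds pictures, attachments, memos and
    -- rarely queried columns
    CREATE TABLE ProductDetail (
        ProductId  INT NOT NULL PRIMARY KEY
                   REFERENCES Product (ProductId),
        Picture    VARBINARY(MAX),
        Attachment VARBINARY(MAX),
        Memo       NVARCHAR(MAX)
    );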

3. Fields that can be grouped by business logic

According to database design norms, all attributes of the same object should belong to one table. In practice, however, dividing such a table along business-logic lines often yields better performance.

Second, row segmentation

For a table with a very large number of rows, the best approach is to split the rows by category so that the data is stored across multiple tables. The natural world has a universal "80/20" rule; applied to data, it can be stated as: 20% of the data satisfies 80% of our needs.

According to this principle, there are usually two ways to classify:

1. Classification by time

If the data is time-sensitive, we can assume that the 20% of the data updated most recently satisfies 80% of the business needs. For example, if we have 5 years of historical data, we can assume that data updated (or created) within the last year, roughly 20%, covers 80% of the business needs. We can therefore split the table into two tables holding the 20% and the 80% respectively, to gain efficiency. If those two tables still do not perform well enough, the 80/20 rule can be applied again to split further.
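A minimal sketch of a time-based split, with hypothetical table names; rows older than one year are periodically moved from the "current" table to the "history" table:

    CREATE TABLE SalesOrder_Current (
        OrderId    INT      NOT NULL PRIMARY KEY,
        CustomerId INT      NOT NULL,
        OrderDate  DATETIME NOT NULL,
        Amount     DECIMAL(18, 2)
    );

    CREATE TABLE SalesOrder_History (
        OrderId    INT      NOT NULL PRIMARY KEY,
        CustomerId INT      NOT NULL,
        OrderDate  DATETIME NOT NULL,
        Amount     DECIMAL(18, 2)
    );

    -- Periodically archive rows older than one year
    INSERT INTO SalesOrder_History (OrderId, CustomerId, OrderDate, Amount)
    SELECT OrderId, CustomerId, OrderDate, Amount
    FROM SalesOrder_Current
    WHERE OrderDate < DATEADD(YEAR, -1, GETDATE());

    DELETE FROM SalesOrder_Current
    WHERE OrderDate < DATEADD(YEAR, -1, GETDATE());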

Figure: the relationship between data volume and usage effectiveness.

2. Classification by index

When the data is not obviously time-sensitive, you can classify it by an index field instead. The index can be any field usable for categorization, such as department number, employee number, or process number. For example, suppose a table stores information about all parts, but in 80% of cases production workshop No. 1 only accesses the parts used by that department. We can then split the table into several tables by department number.
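A minimal sketch of a department-number split, with hypothetical names; a UNION ALL view is one way to keep a single logical view over the split tables when it is still needed:

    CREATE TABLE Part_Dept01 (
        PartId   INT     NOT NULL PRIMARY KEY,
        DeptNo   CHAR(2) NOT NULL CHECK (DeptNo = '01'),
        PartName NVARCHAR(100)
    );

    CREATE TABLE Part_Dept02 (
        PartId   INT     NOT NULL PRIMARY KEY,
        DeptNo   CHAR(2) NOT NULL CHECK (DeptNo = '02'),
        PartName NVARCHAR(100)
    );
    GO
    -- A view can still present all parts as one logical table
    CREATE VIEW AllParts AS
        SELECT PartId, DeptNo, PartName FROM Part_Dept01
        UNION ALL
        SELECT PartId, DeptNo, PartName FROM Part_Dept02;

Each workshop then queries its own small table directly, while reports that need every department go through the view.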

Third, database instance segmentation

In general, developers are used to configuring one database per project, but an application can actually be given multiple database instances. For example, a network game server often has an account database (for authentication), a storage database (for player state), a log database (for monitoring), a map database (for map state), and so on. By analogy, in an ERP system we can pull out the parts common to many projects and store them in separate database instances. For example, user information, department information, and system log information can be placed in a common database that every software project can use.
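A minimal sketch of that layout in SQL Server, with hypothetical database and table names; the shared database is then referenced from any project database by its three-part name:

    CREATE DATABASE CommonDB;
    GO
    USE CommonDB;
    GO
    CREATE TABLE dbo.Users (
        UserId   INT          NOT NULL PRIMARY KEY,
        UserName NVARCHAR(50) NOT NULL,
        DeptNo   CHAR(2)
    );
    GO
    -- From any project database on the same instance, the shared data is
    -- referenced by three-part name:
    SELECT u.UserId, u.UserName
    FROM CommonDB.dbo.Users AS u
    WHERE u.DeptNo = '01';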

Fourth, segmentation of the physical deployment of database instances

Splitting tables and splitting database instances gives us flexibility in physical deployment. For example, if we split a table by subject area and place the resulting tables in different database instances, we are then in a position to deploy those instances on separate physical database servers. The advantage of such a deployment is that, during material-requirements calculations for instance, all of the database servers can share the computation in a distributed fashion, greatly increasing processing speed.
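In SQL Server, one way to reach an instance that has been deployed to another physical server is a linked server queried by four-part name; a minimal sketch with hypothetical server, database, and table names:

    -- Register the remote server that hosts part of the data
    EXEC sp_addlinkedserver
        @server     = N'PLAN_SRV2',
        @srvproduct = N'',
        @provider   = N'SQLNCLI',
        @datasrc    = N'server2.example.local';
    GO
    -- Query the remote instance with a four-part name
    SELECT PlanYear, SUM(PlanQty) AS TotalQty
    FROM PLAN_SRV2.ErpDB.dbo.PurchasePlan
    GROUP BY PlanYear;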
