How to make SQL Server efficient-design (itput discussion summary)

Last Update:2018-12-03 Source: Internet

Author: User

Developer on Alibaba Coud: Build your first app with APIs, SDKs, and tutorials on the Alibaba Cloud. Read more ＞

1. What factors do you consider when designing SQL Server objects to avoid performance problems?

Discussion Summary-Overall Design

L targeted design should be conducted based on the system architecture type or main operations: such as OLTP, OLAP, ETL, etc.

L clear logic, making applications easier to develop. Excellent scalability and maintainability, reducing data redundancy, etc.

L in DB management, we must start with the CPU and HDD configurations. In dB development, we must accurately design and complete the objectives of business processes, logic, functions, DB structures, and SQL commands.

L The design view should be tailored to a specific requirement, and its reusability should not be considered too much

L The trigger should also be avoided, and the functions completed by the trigger should be rewritten to the trigger statement. For tuning and performance tracking, it is more intuitive without triggers.

L first consider what kind of performance problems will occur (access performance? Insert performance ?); Next we will consider the changing characteristics of data (small tables or large tables? How often does the data change? What is the data Repeatability ?), At the same time, we also consider the business features (whether there are many accesses, many inserts, or many accesses and inserts). Then we design them based on the principles and features of sqlserver, such as whether to use indexes, is it better to use clustered indexes or non-clustered indexes? How to store table data to make full use of the features of SQL Server (such as SQL server data storage, page, and fragmentation. Do you want to use read/write splitting? Do you want to consider the whole database level ?); Finally, we need to test the objects after the design, and then find the optimal solution.

L logical database and table design. The logical design of databases, including the relationship between tables, is the core of optimizing the performance of relational databases. A good logical database design can lay a good foundation for optimizing databases and applications. Standardized database logic design includes replacing long data tables with multiple and correlated narrow tables.

L read/write splitting architecture (such as report database and transaction database deployment and separation); pay attention to physical storage; file layout (data files and log files are isolated, and log files should preferably use high-speed devices)

L considering the size of the database and the size of the key table, the size of the database file group and the internal partition should also be taken into account. Here we can consider it together with the disk.

L hardware second-level cache affects data, logs, and indexes stored on different I/O devices, increasing the reading speed. The larger the data size (size), the more important it is to increase I/O.

Discussion and summarization-table and index design

L analysis of table application scenarios, including log tables, parameter tables, and frequently updated and deleted tables

L reasonable estimation of table data volume, which can be divided into large tables, medium tables, and small tables. Consider large table partitions/shards.

L clear logic and physical model design requires a third paradigm and an inverse Paradigm (refer to some of the following scenarios)

L The logical structure can be appropriately redundant to avoid too many table associations. The physical structure can be highly selective in disk policies, table partitions, and read/write splitting.

L when designing table objects, consider functions first, and then performance. If there are too many fields in the table, consider splitting multiple small objects (small tables). If the table has low access frequency but requires a large storage space, consider splitting other storage

L The Archive mechanism should be well considered to ensure that the data volume of Online Business tables cannot be too large, and the remaining data should be in historical tables. Otherwise, it would be hard to make a single SQL Server table of hundreds of millions of tables.

L select appropriate data types based on business needs to save data space while improving efficiency

L use data types that occupy as little space as possible to store data, such as date, smalldatetime, smallint, tinyint, etc.

L when designing an index, consider the query frequency and update frequency first. Do not create an index for occasionally executed statements. When creating a composite index, consider the column selectivity and usage frequency to determine the order of order.

L re-evaluate the table index (requires the participation of application developers to obtain their SQL statements), initially build an index, from the selection of index columns and the skew of index columns, index column usage frequency, index column usage sequence, and table index count

L The efficiency of indexing is also very important, depending on how the business will be used, the design of a reasonable index combination, and regular tracking

L create efficient indexes and use the indexing functions including columns and filtering indexes.

Discussion Summary-tempdb

L. tempdb is stopped on SQL server. It will automatically drop and re-create upon restart. by default, a new 8 MB (MDF file: 8 MB; LDF file: 1 MB, and ecoverymodel is simple) will be created based on the model database.

L tempdb has high Io requirements. It is best to allocate it to a disk with high Io and allocate it to a non-used disk with other data files to improve read/write efficiency.

L The number of tempdb files should be configured based on the number of CPUs (several CPUs are used to create several files)

Personal supplement: The initial size of the tempdb file can be set. You can use alter database Modify file or interface operations to directly set the initial size of the file. This setting will be used when the SQL service is restarted.

Personal supplement

L rationally divide the database. At the beginning of the design, database isolation should be considered, and nothing should be put in a database. Database division can consider these aspects comprehensively: business functions, data importance, query and read ratio, data storage cycle

L select a recovery model for the database ). This has a great impact on Database writing and backup.

L control requirements, clarify the necessity of using the database and the details of data storage

L define regular fields (especially when multiple development teams exist) to avoid performance loss caused by interaction (including mutual query) data due to different designs

L determine a reasonable object use period. Some people prefer to design tables closely based on their current needs. After a period of time, some changes may cause them to have to modify the table structure and modify the table structure containing data, which has a big impact; some people simply design a very loose table to avoid this situation, and the modification frequency of the result table structure may be very low, however, the performance may be greatly affected (for example, some key fields in the design have a wide length to meet future needs, resulting in poor query performance)

Controversial issues

L use auto-increment columns for primary keys

Supported points of view:

Use the primary key generated by the system to avoid the use of compound keys. Foreign keys are always associated with unique key fields. Do not select the social security number (SSN) or ID Number (ID) as the key.

Objection:

If an auto-incrementing column can be used as the primary key of a table, the primary key of the table can be unavailable. For example, a log table that records logon information of a Website user. This type of table does not have a primary key. You can set the logon time to a clustered index.

CPU usage is required when a guid is generated. When a table does not have a proper column to set a primary key, I will consider fields such as fixed-length sequential numbers as primary keys.

Whether the auto-incrementing column, guid, or serial number is used as the primary key, as long as the table field has a suitable field as the primary key, even if it is a composite key, as long as the length is not too long (within 100 bytes can be considered, if the value is less than 50, and the number of fields is not too large (three are acceptable ranges, and five can also be considered), these fields should be used as the primary key. The reason is: In table join and data query, these fields are actually queried. Pseudo-Primary keys have no advantage except space occupation.

If you set an auto-increment column as a primary key, what are the benefits of this column? In this way, the table is created to have a primary key. You may say that auto-incrementing columns are primary keys, which greatly reduces page spliting and fragments. Clustered indexes are included at the end of each non-clustered index and greatly reduces the index size. I think: 1. it is better to reflect the logical design of a table's primary key. This means that, as soon as I look at the primary key of the table, I will roughly know what is stored in the table. 2: Even if you use an auto-incrementing column as the primary key, you still have to create such an index for the logical primary key of the table. Because data is often queried based on this condition during data retrieval. The primary key of the auto-increment column is not used as a foreign key because its value is extremely unreliable. In this way, you still need to use the logical primary key field of the table to associate with other tables. Therefore, such an auto-incrementing primary key is redundant and unnecessary.

My opinion:

The focus is to make a reasonable choice based on the Application Scenario. It is not absolutely usable or unnecessary.

For a table, the primary key is not required, and the focusing index is not required.

In terms of function, the primary key is a constraint on the data in the table (unique, not allowed null value). We use a unique index (or constraint ), in addition, setting the not null attribute on the field can achieve the same effect. Therefore, the jade key can be replaced, which means it is not necessary (of course, even if there is no equivalent replaceable function, in terms of business, not every table must have a unique feature ).

When the primary key is raised, the focus index is raised, mainly because the primary key is focused index by default (if the table does not focus on the index and does not specify nonclustered ), this indicates that the primary key is not focused on the index, but can be set as focused index.

Therefore, it makes no sense to separate the auto-incrementing column as the primary key. We add an auto-incrementing column and confirm that it is unique and non-null. For our business data, there is no meaning (redundant and unnecessary ).

It is important to consider that the primary key is used as the index focus. When the primary key is used as the index focus, we should consider not only the primary key, but also the index Focus feature. The focus index is stored together with the data, which determines the data storage order. The non-leaf layer of the data storage in the table is the focus index value (the leaf layer is the data). In addition, the focusing index key is a non-focusing index row Locator (pointing to a pointer to a data record. If the clustered index is not a unique index, SQL Server adds the value generated internally (called a unique value) to make all duplicate keys unique. The values of these four bytes are invisible to users. This value is added only when the clustered key must be unique for non-clustered indexes ). If the table does not focus on the index, the row Locator of the non-focused index is the pointer to the row (by the File Identifier (ID), page number, and the row ID (RID) generated by the number of lines on the page )). Obviously, non-focused indexes do not rely on focused indexes, and data storage does not necessarily need to be stored in a certain order. Therefore, focusing on indexes is not necessary. However, in terms of efficiency, sequential reading is more efficient than unordered reading like row ID, so focusing on indexes is required for frequently-queried tables.

Based on the preceding information, we know that indexes must be focused for cases that may involve frequent queries (to ensure effective sequential READING). We also know that, the key value of the focused index is the row Locator of the non-focused index (stored in the leaf layer of the non-focused index), so the width of the focused index should be as small as possible (each non-focused index should be saved once, of course, the smaller the space usage, the higher the I/O efficiency), and ensure the unique (so there is no need to ensure that the unique value of the additional load is added); in addition, the null value is meaningless, so it is recommended that it be non-null. The write sequence of the data should be the same as that of the key value generated by the focused index, in addition, frequent modifications do not disrupt the sequence (so as to reduce the generation of fragments). Finally, if the definition of the focused index is modified, it is equivalent to re-organizing the storage of both the table and index, therefore, the best case is that the index-focused column definition is basically not modified. In combination with the evaluation of all these situations, if the business data column is not suitable, the auto-increment column will be a good choice.

Of course, some people may say that the index-focused retrieval efficiency is the highest. Using Auto-incrementing columns as the primary key wastes this most efficient opportunity. In fact, this is a bit of a misunderstanding. The leaf node that focuses on the index is data, and the leaf node that does not focus on the index is a row positioner. This means that if the table contains more than one column of data, the seek of a focused index may load more pages than a non-focused index (because its leaf layer is data, it occupies more space than the row positioner of a non-focused index ), the advantage is that when seek reaches the key value, the corresponding records can be retrieved directly (all on the same page ); instead of focusing on indexes, the corresponding records must be obtained through the row positioner. However, if all the data we want is included in the index, the efficiency of focusing on indexes may be lower. The other is scan. Obviously, non-focused indexes involve fewer pages than focused indexes, and scan has more advantages. Not all queries can be performed as seek. Generally, tables with frequent queries have a variety of common query combinations. Therefore, in general, the primary key of business data columns does not reflect the advantage in all scenarios.

Discussion post

Subsequent topics:

2. What factors do you think will affect SQL Server efficiency in writing T-SQL (including stored procedures, functions, and views?

3. What do you think should you pay attention to when designing database operation programs to ensure effective use of the database?

4. What performance problems are confusing to you during your SQL server usage?

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

Get Started for Free

Sales Support

1 on 1 presale consultation

Chat Contact Sales
After-Sales Support

24/7 Technical Support 6 Free Tickets per Quarter Faster Response

Open a Ticket
Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.

Learn More

How to make SQL Server efficient-design (itput discussion summary)

Contact Us

What's Trending

Top 10 Tags

Top 10 Keywords

A Free Trial That Lets You Build Big!

Sales Support

After-Sales Support

How to make SQL Server efficient-design (itput discussion summary)

Contact Us

What's Trending

Top 10 Tags

Top 10 Keywords

Trending Topic

A Free Trial That Lets You Build Big!

Sales Support

After-Sales Support