Database Performance Optimization, Part 1: Database-Level Optimization

Source: Internet
Author: User

Database optimization consists of three parts: database-level optimization, database table optimization, and program operation optimization. This article covers the first part.

Optimization ①: Add data files and set automatic file growth (rough data partitioning)
1.1 Add data files
Since SQL Server 2005, the database does not generate NDF data files by default; generally, one primary data file (MDF) is enough. However, in some large databases, because of the large amount of data and frequent queries, you can store some tables, or part of the records of some tables, separately in different data files to improve query speed.
Because CPU and memory are much faster than hard disk reads and writes, you can place different data files on different physical hard disks. When a query runs, multiple hard disks can then be read at the same time, making full use of CPU and memory performance and improving query speed. The write behavior of data files (MDF, NDF) and log files (LDF) differs, as described below:
Data files: SQL Server distributes new data across all data files in the same filegroup in proportion to each file's existing free space. For example, if there are three data files A.MDF, B.NDF, and C.NDF with free space of 200 MB, 100 MB, and 50 MB respectively, then writing a 70 MB object writes 40 MB, 20 MB, and 10 MB to the three files in one pass. If a data file is full, no more data is written to it.
Log files: log files are written sequentially. Only when one file is full is the next file written.
From the above, adding NDF data files helps when querying large amounts of data, but adding log files does not.
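For example, adding an NDF file in its own filegroup on a second disk can be sketched as follows (the database name MyDB, filegroup name FG_Data2, and file path are assumptions for illustration, not from the original):

```sql
-- Assumption: database MyDB exists and E:\ is a separate physical disk.
ALTER DATABASE MyDB
ADD FILEGROUP FG_Data2;

ALTER DATABASE MyDB
ADD FILE (
    NAME = MyDB_Data2,                       -- logical file name
    FILENAME = 'E:\SQLData\MyDB_Data2.ndf',  -- physical file on a different disk
    SIZE = 500MB,
    FILEGROWTH = 100MB                       -- fixed-size growth (see section 1.2)
) TO FILEGROUP FG_Data2;
```

Tables and indexes created ON FG_Data2 will then have their data written to this file.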
1.2 Set automatic file growth (needed for large data volumes; small databases do not need it)
In SQL Server 2005, the initial size of an MDF file defaults to 5 MB with a 1 MB auto-growth increment and unrestricted growth; an LDF file starts at 1 MB and grows by 10%, with growth limited to a certain maximum. For ordinary designs, the built-in defaults are fine, but when designing a large database it is best to set the initial size and growth increment yourself. If the initial size is too small, the database will soon be full. What happens then? When a data file is full and an operation needs more space, SQL Server makes the operation wait until automatic file growth completes, and only then can the operation continue. If auto-growth takes too long, the original operation is canceled on timeout (the default threshold is generally 15 seconds); the operation is rolled back, and the automatic file growth is canceled as well. In other words, the file is not grown that time. How long growth takes depends on the growth increment: if it is too small, one operation may need several consecutive growths to get enough space; if it is too large, a single growth takes a long time. Therefore, pay attention to the following points when setting automatic growth:
1) Grow by a fixed size rather than by a percentage. This avoids the trouble caused by growing too much or too little at a time. For a small database, we recommend a growth increment of 50 MB to 100 MB; for a large database, use a larger fixed increment of 200 MB or more at a time.
2) Regularly monitor the usage of each data file, and try to keep the remaining space of each file as equal as possible, or at the expected proportion.
3) Set a maximum file size, to prevent SQL Server files from growing until they exhaust disk space and affect the operating system.
4) After automatic growth, check the distribution of data across the new data files in a timely manner, to avoid SQL Server always writing data to a single file.
Therefore, for a busy database, we recommend enabling the automatic growth option to avoid application failures caused by running out of database space, but size the files so that automatic growth rarely actually occurs. Also, try not to use the auto-shrink feature.
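These recommendations can be applied with ALTER DATABASE. This is a sketch with assumed database and logical file names (MyDB, MyDB_Data, MyDB_Log) and illustrative sizes:

```sql
-- Replace the default growth settings with fixed increments and a maximum size.
ALTER DATABASE MyDB
MODIFY FILE (NAME = MyDB_Data, FILEGROWTH = 100MB, MAXSIZE = 50GB);

ALTER DATABASE MyDB
MODIFY FILE (NAME = MyDB_Log, FILEGROWTH = 50MB, MAXSIZE = 10GB);
```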
1.3 Store data and log files separately on different disks
Operations on data files and log files both generate a large amount of I/O. Where possible, log files should be stored on a different hard disk from the data files that hold the data and indexes. This distributes the I/O and also facilitates disaster recovery of the database.
Optimization ②: Table partitioning and index partitioning (optimization ① is rough data partitioning; optimization ② is precise data partitioning)
Why Table Partitioning?
When a table's data volume is too large, what do we most want to do? Split the table into two or more pieces. But the table is still logically one table; only its contents are stored separately, so reads become many times faster.
Principle: table data cannot be assigned to files directly, but a filegroup can contain files, and a table can be placed on a filegroup. In this way, table data is stored in different files indirectly. Partitioned storage covers tables, indexes, and large object data.
SQL Server 2005 introduced the table partition concept. When the data volume in a table grows, queries slow down and application performance degrades; in that case, you should consider partitioning the table. When a table holds a lot of data, we can split it into multiple parts, so that less data has to be scanned and queries run faster, which greatly improves performance. After the table is partitioned, it is still logically one complete table, but its data is physically stored in multiple filegroups (physical files), so the whole table is not scanned on every query.
2.1 When to use a partitioned table:
1. The table size exceeds 2 GB.
2. The table contains historical data. New data is added to the new partition.
2.2 Advantages and Disadvantages of Table Partitioning
Table partitions have the following advantages:
1. Improved query performance: You can query partition objects by searching only the partitions you are concerned about, improving the search speed.
2. Enhanced availability: If a partition of the table fails, the data of the table in other partitions is still available;
3. Easy Maintenance: If a partition of the table fails, you only need to fix the partition to fix the data;
4. Balanced I/O: You can map different partitions to disks to balance I/O and improve the overall system performance.
Disadvantages:
An existing table cannot be directly converted into a partitioned table. (Oracle, by contrast, provides an online table redefinition feature for this.)
2.3 Perform table partitioning in three steps
2.3.1 Create a partition function
CREATE PARTITION FUNCTION xx1 (int)
AS RANGE LEFT FOR VALUES (10000, 20000);
Note: this creates a partition function named xx1, partitioned on an int column. It divides values into three ranges: values up to 10000 go to area A, 10001 to 20000 to area B, and values above 20000 to area C.
2.3.2 Create a partition scheme
CREATE PARTITION SCHEME myRangePS2
AS PARTITION xx1
TO (a, b, c);
Note: this creates a partition scheme named myRangePS2 on the partition function xx1, mapping its three ranges to a, b, and c.
a, b, and c are the names of three filegroups, and the three NDF files must belong to these three filegroups. Once created, the filegroup mapping cannot be modified.
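The filegroups a, b, and c must exist, each with an NDF file, before the scheme is created. A minimal sketch (the database name MyDB and the file paths are assumptions):

```sql
-- One filegroup and one NDF per range; ideally each on a separate physical disk.
ALTER DATABASE MyDB ADD FILEGROUP a;
ALTER DATABASE MyDB ADD FILEGROUP b;
ALTER DATABASE MyDB ADD FILEGROUP c;

ALTER DATABASE MyDB ADD FILE
    (NAME = MyDB_a, FILENAME = 'D:\SQLData\MyDB_a.ndf', SIZE = 100MB)
    TO FILEGROUP a;
ALTER DATABASE MyDB ADD FILE
    (NAME = MyDB_b, FILENAME = 'E:\SQLData\MyDB_b.ndf', SIZE = 100MB)
    TO FILEGROUP b;
ALTER DATABASE MyDB ADD FILE
    (NAME = MyDB_c, FILENAME = 'F:\SQLData\MyDB_c.ndf', SIZE = 100MB)
    TO FILEGROUP c;
```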
2.3.3 Partition a table
In the table designer, under the common data specifications, change the data space type to Partition Scheme, and then select the partition scheme name and the partition column list.
You can also generate the table with an SQL statement:
CREATE TABLE [dbo].[AvCache](
    [AVNote] [varchar](300) NULL,
    [Bb] [int] IDENTITY(1,1)
) ON [myRangePS2](Bb); -- note that the myRangePS2 partition scheme is used here
2.3.4 Query table partitions
SELECT *, $PARTITION.[xx1](Bb) FROM dbo.AvCache;
Note: $PARTITION takes the partition function name (here xx1), not the partition scheme name.

In this way, we can clearly see how table data is partitioned.
2.3.5 Create a partitioned index
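A hedged sketch of creating an index on the partition column of the table above, stored on the same partition scheme so that each index partition is aligned with its table partition (the index name IX_AvCache_Bb is hypothetical):

```sql
CREATE NONCLUSTERED INDEX IX_AvCache_Bb
ON dbo.AvCache (Bb)
ON myRangePS2 (Bb);  -- same partition scheme as the table: an aligned index
```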

Optimization ③: Distributed Database Design
A distributed database system is developed from a centralized database system. The idea is easy to understand: the entire database is split up and distributed across different locations. In essence, distributed database systems fall into two types: 1. Logically unified but physically distributed: the database is a single logical whole, but its parts are stored on different physical nodes. Most distributed databases are of this type. 2. Logically distributed and physically distributed: this is a federated distributed database. Because the member database systems that make up the federation are relatively "autonomous", such a system can accommodate databases with very different purposes and characteristics, and is suitable for database integration over a wide scope.
Distributed databases are complex and are not used or described in detail here; the following example only illustrates where they fit. Distributed databases are mostly used in systems with strong user segmentation, such as a national chain store: typically each branch has its own sales and inventory data, while headquarters holds databases for employees, suppliers, branch information, and so on. The branch databases may be completely identical in structure, although in many systems they end up inconsistent. With this design, each chain store's data is stored locally, which improves response speed and reduces communication costs; and because the data is distributed across different sites with multiple copies, the whole system does not go down even if some sites fail. However, this also brings many problems, such as data consistency, implementing remote data transfer, and reducing communication overhead, which make developing a distributed database system more complex. The aim here is only to convey the principle; specific usage is not described in detail.
Optimization ④: Defragment the database
If a table already has indexes but its performance is still poor, it is likely that index fragmentation has occurred and you need to defragment the indexes.
What is index fragmentation?
Because of excessive insert, update, and delete operations on a table, index pages become split into many pieces, forming index fragmentation. If fragmentation is serious, scanning the index takes longer, and the index may even become unusable, so data retrieval slows down.
How do I know if index fragmentation has occurred?
In a SQL Server database, you can use DBCC SHOWCONTIG or DBCC SHOWCONTIG (table name) to check index fragmentation; its output tells us whether a periodic rebuild is needed.

By analyzing the scan density (too low) and scan fragmentation (too high) in the results, we can determine whether an index rebuild is required. The main indicators are as follows:
Scan Density [Best Count: Actual Count] (scan density [best value: actual value]): the most useful percentage returned by DBCC SHOWCONTIG. It is the ratio of the ideal extent count to the actual extent count, and should be as close to 100% as possible; a low value indicates external fragmentation.
Logical Scan Fragmentation (logical scan fragmentation): the percentage of out-of-order pages, which should be between 0% and 10%; a high percentage indicates external fragmentation.
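Checking the sample table from section 2.3 might look like this; DBCC SHOWCONTIG is the classic command, and sys.dm_db_index_physical_stats is the dynamic management function that SQL Server 2005 introduced as its replacement:

```sql
-- Classic fragmentation report:
DBCC SHOWCONTIG ('dbo.AvCache');

-- SQL Server 2005+ alternative: per-index fragmentation percentages.
SELECT index_id, avg_fragmentation_in_percent, page_count
FROM sys.dm_db_index_physical_stats(
    DB_ID(), OBJECT_ID('dbo.AvCache'), NULL, NULL, 'LIMITED');
```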
Solutions:
First, use DBCC INDEXDEFRAG to defragment the index.
Second, use DBCC DBREINDEX to rebuild the index.
The differences between the two are as follows:
The DBCC INDEXDEFRAG command is an online operation, so the index remains available while the command runs, and the command can be interrupted without losing the work already completed. The disadvantage of this method is that it does not reorganize the data as thoroughly as dropping and re-creating the clustered index does.
Re-creating a clustered index reorganizes the data, with the result that the data pages are filled up; the fill level can be configured with the FILLFACTOR option. The disadvantages of this method are that the index is offline during the drop/re-create period and that the operation is atomic: if index creation is interrupted, the index is not re-created. In other words, to get good results you still have to rebuild the index, so rebuilding is generally worth it.
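Both options can be sketched as follows (the database, table, and index names are illustrative, and the fill factor of 90 is an assumption; on SQL Server 2005 and later, the equivalent ALTER INDEX ... REORGANIZE / REBUILD syntax is preferred):

```sql
-- Option 1: online defragmentation; the index stays available.
DBCC INDEXDEFRAG ('MyDB', 'dbo.AvCache', 'IX_AvCache_Bb');

-- Option 2: offline rebuild; data pages are refilled to the given fill factor.
DBCC DBREINDEX ('dbo.AvCache', 'IX_AvCache_Bb', 90);
```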
