Optimize SQL Server databases with large data volumes

Source: Internet
Author: User

1.1: Add data files

Starting from SQL Server 2005, a database does not generate NDF secondary data files by default; a single primary data file (MDF) is generally enough. For large databases with a large amount of information, however, certain tables, or certain records within tables, can be stored in separate data files to improve query speed.

Because CPU and memory speeds are much higher than hard disk read/write speeds, you can place different data files on different physical hard disks. When a query runs, multiple hard disks can then be read at the same time, making full use of CPU and memory performance and improving query speed. Here we introduce the write principles in detail. The data files (MDF, NDF) and log files (LDF) are written in different ways:

Data files: SQL Server distributes new data across all data files in the same filegroup in proportion to the free space in each file. For example, if there are three data files A.MDF, B.NDF, and C.NDF with free space of 200 MB, 100 MB, and 50 MB respectively, then writing a 70 MB object writes 40 MB, 20 MB, and 10 MB to the three files A, B, and C at the same time. If a data file is full, no more data is written to it.

Log files: log files are written sequentially. Only when one file is full is the next one written.

It can be seen from the above that adding NDF data files helps when querying large amounts of data, but adding log files is useless.
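A minimal sketch of adding a secondary data file, assuming a database named MyDB, a filegroup named SecondaryFG, and the path D:\Data\ (all three are hypothetical placeholders):

```sql
-- Create a new filegroup and add a secondary (NDF) data file to it.
ALTER DATABASE MyDB ADD FILEGROUP SecondaryFG;

ALTER DATABASE MyDB
ADD FILE (
    NAME = MyDB_Data2,                   -- logical file name
    FILENAME = 'D:\Data\MyDB_Data2.ndf', -- ideally on a separate physical disk
    SIZE = 500MB,
    FILEGROWTH = 100MB
) TO FILEGROUP SecondaryFG;
```

Placing the new file's path on a different physical disk from the MDF is what allows the parallel reads described above.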

1.2: Configure automatic file growth (needed for large data volumes; unnecessary for small ones)

In SQL Server 2005, the initial size of the MDF file is 5 MB by default, with an auto-growth increment of 1 MB and unrestricted growth. The initial size of the LDF file is 1 MB, with growth of 10% and growth restricted to a maximum file size. For ordinary designs, the built-in defaults are fine. In a large database design, however, it is best to set the initial size and growth increment yourself, because if the initial value is too small, the database files will soon be full.

What happens when a file fills up and data is inserted? When a data file is full and a write operation is performed, SQL Server makes the operation wait until the automatic file growth finishes, and only then can the operation continue. If the auto-growth takes too long, the waiting operation times out and is canceled (the default timeout is 15 seconds); not only is the operation rolled back, but the automatic file growth is also canceled. In other words, the file is not grown at all that time.

The time a growth takes depends on the size of the auto-growth increment. If it is too small, a single operation may require several consecutive growths to be satisfied; if it is too large, each growth takes a long time. Therefore, pay attention to the following points when configuring automatic growth:

1) Grow by a fixed size rather than by a percentage. This avoids the trouble caused by growing too much or too little at a time. We recommend setting the increment to 50 MB to 100 MB for a small database, and to 100 MB to 200 MB for a large database.

2) Regularly monitor the usage of each data file, and try to keep the remaining free space of the files as balanced as possible, or at the expected proportion.

3) Set a maximum file size to prevent SQL Server files from exhausting disk space and affecting the operating system.

4) After an automatic growth, check the distribution of new data across the data files in a timely manner, and avoid SQL Server always writing data to an individual file.

Therefore, for a busy database, we recommend enabling the automatic growth option to avoid application failures caused by exhausted database space, but size the files so that automatic growth rarely actually occurs. Also, try not to use the auto-shrink function.
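The recommendations above can be sketched as a single statement; the database name MyDB, the logical file name MyDB_Data, and the specific sizes are placeholder assumptions:

```sql
-- Pre-size the file and grow by a fixed 100 MB up to a 50 GB cap.
ALTER DATABASE MyDB
MODIFY FILE (
    NAME = MyDB_Data,   -- logical name of the existing data file
    SIZE = 10GB,        -- generous initial size so auto-growth rarely fires
    FILEGROWTH = 100MB, -- fixed increment, not a percentage
    MAXSIZE = 50GB      -- cap so the file cannot exhaust the disk
);
```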

1.3: Store data and log files separately on different disks

Operations on data files and log files both generate a large amount of I/O. Where possible, the log files should be stored on a different hard disk from the one holding the data files (data and indexes), to distribute the I/O and to facilitate database disaster recovery.
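One way to relocate an existing log file to a separate disk is to repoint its logical name at a new path; a sketch, with placeholder database name, logical name, and path:

```sql
-- Point the log file at a path on a different physical disk.
-- The change takes effect the next time the database is started;
-- the physical .ldf file must be moved to the new path while the
-- database is offline.
ALTER DATABASE MyDB
MODIFY FILE (
    NAME = MyDB_log,                  -- logical name of the log file
    FILENAME = 'E:\Logs\MyDB_log.ldf' -- new location on a different disk
);
```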

Optimization ②: Partition tables and indexes (optimization ① is coarse file-level partitioning; optimization ② is precise data partitioning)

Why Table Partitioning?

When the data volume of a table is too large, what is the first thing we want to do? Split the table into two or more pieces. The table is still logically one table; only its contents are stored separately, so reads become several times faster.

Principle: table data cannot be placed directly in files, but filegroups can contain files, and tables can be placed in filegroups. In this way, table data is indirectly stored in different files. What can be stored partitioned includes tables, indexes, and large-object data.

SQL Server 2005 introduced the table partitioning concept. As the data volume in a table increases, query speed slows down and application performance degrades; in this case, we should consider partitioning the table. When a table holds a lot of data, we can split it into multiple partitions; because the data to be scanned becomes smaller, queries run faster, which greatly improves performance. After the table is partitioned, the logical table is still one complete table, but its data is physically stored across multiple tablespaces (physical files), so queries no longer have to scan the whole table every time.

2.1 When to use a partitioned table:

1. The table size exceeds 2 GB.

2. The table contains historical data. New data is added to the new partition.

2.2 Advantages and Disadvantages of Table Partitioning 

Table partitions have the following advantages:

1. Improved query performance: You can query partition objects by searching only the partitions you are concerned about, improving the search speed.

2. Enhanced availability: If a partition of the table fails, the data of the table in other partitions is still available;

3. Easy Maintenance: If a partition of the table fails, you only need to fix the partition to fix the data;

4. Balanced I/O: You can map different partitions to disks to balance I/O and improve the overall system performance.

Disadvantages:

Partitioned tables: an existing ordinary table cannot be directly converted into a partitioned table. (Oracle, by contrast, provides an online table redefinition function for this.)

2.3 perform Table Partitioning in three steps

2.3.1 Create a partition function

CREATE PARTITION FUNCTION xx1 (int)
AS RANGE LEFT FOR VALUES (10000, 20000);

Note: this creates a partition function named xx1 on an int column with three intervals: values up to 10000 go to zone A, values from 10001 to 20000 go to zone B, and values above 20000 go to zone C.

2.3.2 Create a partition scheme

CREATE PARTITION SCHEME myRangePS2
AS PARTITION xx1
TO (a, b, c);

Note: this creates a partition scheme named myRangePS2 on the partition function xx1, mapping its three intervals to a, b, and c.

Here a, b, and c are the names of three filegroups, and the three NDF files must belong to these three filegroups. Once created, the filegroups cannot be modified.
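The three filegroups referenced by the scheme must exist first; a sketch of creating them and their NDF files, assuming a database named MyDB and placeholder file paths:

```sql
-- Create the three filegroups the partition scheme maps to,
-- and add one NDF data file to each (paths are placeholders).
ALTER DATABASE MyDB ADD FILEGROUP a;
ALTER DATABASE MyDB ADD FILEGROUP b;
ALTER DATABASE MyDB ADD FILEGROUP c;

ALTER DATABASE MyDB ADD FILE
    (NAME = fa, FILENAME = 'D:\Data\fa.ndf', SIZE = 100MB) TO FILEGROUP a;
ALTER DATABASE MyDB ADD FILE
    (NAME = fb, FILENAME = 'D:\Data\fb.ndf', SIZE = 100MB) TO FILEGROUP b;
ALTER DATABASE MyDB ADD FILE
    (NAME = fc, FILENAME = 'D:\Data\fc.ndf', SIZE = 100MB) TO FILEGROUP c;
```

Placing the three files on different physical disks is what delivers the balanced I/O advantage described earlier.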

2.3.3 partition a table

In the table designer, under the common data specifications, change the data space type to Partition Scheme, then select the partition scheme name and the partition column list.

You can also generate the table with SQL statements:

 
 
CREATE TABLE [dbo].[AvCache](
    [AVNote] [varchar](300) NULL,
    [bb] [int] IDENTITY(1,1)
) ON [myRangePS2](bb);
-- Note that the [myRangePS2] partition scheme is used here, partitioning on the bb column

2.3.4 query table partitions

SELECT *, $PARTITION.[xx1](bb) FROM dbo.AvCache

In this way, we can clearly see how table data is partitioned.

2.3.5 create an index Partition
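An index can be partitioned by building it on the same partition scheme and column as the table, which keeps it aligned with the table's partitions; a sketch using the AvCache table defined above (the index name is a placeholder):

```sql
-- Create a nonclustered index partitioned on the same scheme as the table,
-- so each index partition is aligned with the matching table partition.
CREATE NONCLUSTERED INDEX IX_AvCache_bb
ON dbo.AvCache (bb)
ON myRangePS2 (bb);
```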

Optimization ③: Distributed Database Design

A distributed database system is developed on the basis of centralized database systems. The idea is easy to understand: the whole database is split up and distributed to various locations. In essence, distributed database systems come in two kinds:

1. Logically unified but physically scattered: the distributed database is logically one whole, but its parts are physically stored on different physical nodes. Most distributed databases are of this kind.

2. Logically distributed and physically distributed: this is a federated distributed database. Because the member database systems that make up the federation are relatively autonomous, such a system can accommodate databases with very different purposes and large differences, and is suitable for database integration over a wide scope.

Distributed databases are complex and are not used or described in detail here; the example below only illustrates that distributed databases are mostly used in systems with strong user segmentation. Take a national chain store: the usual design is that each branch has its own sales and inventory information, while the headquarters holds databases for employees, suppliers, branch information, and so on. The branch databases may be completely identical in structure, though in many systems they end up inconsistent. With this design, each chain store's data is stored locally, which increases response speed and reduces communication costs; and because the data is distributed across different sites with multiple copies, even if some sites fail, the whole system is not paralyzed. However, it also brings many problems, such as maintaining data consistency, implementing remote data transfer, and reducing communication overhead, all of which make developing a distributed database system more complex. The intent here is only to convey the principle; the specific usage will not be described in detail.

Optimization ④: Defragment the database indexes

If your table already has indexes but performance is still poor, it is likely that index fragmentation has occurred and you need to defragment the indexes.

What is index fragmentation?

Too many insert, update, and delete operations on a table cause index pages to split into many pieces, forming index fragmentation. If index fragmentation is severe, the time spent scanning the index grows longer, and the index may even become unusable, so data retrieval slows down.

How do I know if index fragmentation has occurred?

In a SQL Server database, you can use DBCC SHOWCONTIG or DBCC SHOWCONTIG (table name) to check the index fragmentation, which tells us whether a periodic rebuild is needed.

 

By analyzing two main values in the output, scan density (low) and logical scan fragmentation (high), we can determine whether an index rebuild is required:

Scan Density [Best Count: Actual Count]: the most useful percentage that DBCC SHOWCONTIG returns, the ratio of the best count of extent switches to the actual count. This percentage should be as close to 100% as possible; if it is low, external fragmentation exists.

Logical Scan Fragmentation: the percentage of out-of-order pages. It should be between 0% and 10%; if it is high, external fragmentation exists.
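A sketch of checking fragmentation for a single table; the table name is a placeholder:

```sql
-- Report fragmentation for one table; WITH TABLERESULTS returns the
-- figures as rows instead of text, and ALL_INDEXES covers every index.
DBCC SHOWCONTIG ('dbo.AvCache') WITH TABLERESULTS, ALL_INDEXES;
```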

Solution:

First, use DBCC INDEXDEFRAG to defragment the indexes.

Second, use DBCC DBREINDEX to rebuild the indexes.

The differences between the two are as follows:

DBCC INDEXDEFRAG is an online operation: the index remains available while the command runs, and the command can be interrupted without losing the work already completed. The disadvantage of this method is that it does not reorganize the data as thoroughly as a clustered index drop/re-create operation does.

Re-creating a clustered index reorganizes the data, with the result that the data pages are filled up; the fill level can be configured with the FILLFACTOR option. The disadvantage of this method is that the index is offline during the drop/re-create period, and the operation is atomic: if the index creation is interrupted, the index is not re-created. In other words, to achieve good results, you still need to rebuild the index, so rebuilding is the preferred choice.
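Sketches of the two approaches for a single table; the database, table, and index names are placeholders:

```sql
-- Online defragmentation: the index stays available while this runs.
DBCC INDEXDEFRAG (MyDB, 'dbo.AvCache', 'IX_AvCache_bb');

-- Offline rebuild: drops and re-creates the indexes on the table
-- (an empty index name means all of them), refilling pages to the
-- given fill factor (here 90%).
DBCC DBREINDEX ('dbo.AvCache', '', 90);
```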

