Database performance optimization 1: Database optimization (large data volumes)

Database optimization consists of three parts: database optimization, database table optimization, and program operation optimization. This article is the first part, Database performance optimization 1: Database optimization.

Optimization ①: Add data files and set automatic file growth (coarse data partitioning)

1.1: Add data files

From SQL Server 2005, a database does not create an NDF data file by default, and one primary data file (MDF) is generally enough. In some large databases, however, where the data volume is large and queries are frequent, you can store certain tables, or certain rows of a table, in separate data files to speed up queries.
Because CPU and memory are much faster than disk reads and writes, you can place different data files on different physical disks. A query can then read from several disks at the same time, making fuller use of CPU and memory and improving query speed. The following describes in detail how SQL Server writes to data files (MDF, NDF) and log files (LDF):

Data files: SQL Server distributes new data across all data files in the same filegroup in proportion to the free space in each file. For example, if three data files, A.MDF, B.NDF, and C.NDF, have 200 MB, 100 MB, and 50 MB free, and 70 MB is written, SQL Server writes 40 MB, 20 MB, and 10 MB to A, B, and C respectively.
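The proportional fill rule can be checked with a few lines of Python. This is a minimal sketch, not SQL Server code: it assumes an idealized split in which each file receives exactly its share of the free space, whereas the engine actually allocates extent by extent, so real amounts only approximate these figures. The function name `proportional_fill` is hypothetical.

```python
def proportional_fill(free_mb: dict[str, float], write_mb: float) -> dict[str, float]:
    """Split a write across the files of one filegroup in proportion to free space.

    Idealized model: SQL Server really allocates extent by extent, so actual
    amounts only approximate these proportions.
    """
    total_free = sum(free_mb.values())
    if write_mb > total_free:
        # The filegroup would have to autogrow first (see section 1.2).
        raise ValueError("not enough free space in the filegroup")
    # Each file gets write_mb * (its free space / total free space).
    return {name: write_mb * free / total_free for name, free in free_mb.items()}


# The example from the text: 200 MB, 100 MB and 50 MB free, 70 MB written.
print(proportional_fill({"A.MDF": 200, "B.NDF": 100, "C.NDF": 50}, 70))
# {'A.MDF': 40.0, 'B.NDF': 20.0, 'C.NDF': 10.0}
```

A file with no free space receives nothing, which is why a file that has just autogrown (and so has the most free space) attracts most new writes.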

Log files: log files are written sequentially. Only when one log file is full does writing move on to the next one.
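For contrast, here is a minimal Python sketch of sequential log writing. It assumes a simplified log that never wraps around or truncates (the real transaction log is circular and reuses space), and the helper name `sequential_fill` is hypothetical. It shows why a second log file sits idle until the first is full, so it adds no parallel I/O.

```python
def sequential_fill(free_mb: list[float], write_mb: float) -> list[float]:
    """Write log records into log files strictly one after another.

    Simplified model: the real transaction log is circular and reuses
    space after truncation, which is ignored here.
    """
    written = []
    remaining = write_mb
    for free in free_mb:
        chunk = min(free, remaining)  # fill the current file before moving on
        written.append(chunk)
        remaining -= chunk
    if remaining > 0:
        raise ValueError("all log files are full")
    return written


print(sequential_fill([100, 100], 70))  # [70, 0]: the second log file is idle
print(sequential_fill([50, 100], 70))   # [50, 20]: used only after the first is full
```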

Therefore, adding NDF data files helps queries on large data volumes, but adding log files does not help.

1.2: Set automatic file growth (for large data volumes; not needed for small ones)

In SQL Server 2005, by default the MDF file has an initial size of 5 MB and grows by 1 MB with unrestricted growth, while the LDF file starts at 1 MB, grows by 10%, and is restricted to a maximum size. For ordinary designs the SQL Server defaults are enough, but for a large database it is best to plan the initial size and the growth settings. If the initial size is too small, the database fills up quickly. What happens when an insert arrives after that? When a data file is full and an operation needs more space, SQL Server makes the operation wait until automatic file growth finishes, and only then does the operation continue. If the growth takes too long, the operation cannot keep waiting and is canceled when it times out (the default threshold is 15 seconds). Not only is the operation rolled back, but the automatic growth is canceled as well, so the file does not grow at all this time. How long a growth takes depends on the growth increment: if the increment is too small, one operation may need several growths in a row to get enough space; if it is too large, each growth takes a long time. Pay attention to the following points when setting automatic growth:
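The cost of a small growth increment can be counted directly. This hypothetical Python helper, `growth_events`, assumes each growth adds exactly the configured increment and ignores how long each growth takes; it only counts how many consecutive growths one operation must wait for.

```python
import math


def growth_events(free_mb: float, needed_mb: float, increment_mb: float) -> int:
    """Number of consecutive autogrowth steps one operation has to wait for."""
    shortfall = needed_mb - free_mb
    if shortfall <= 0:
        return 0  # enough free space already, no growth needed
    # Round up: a partial increment still requires one more growth.
    return math.ceil(shortfall / increment_mb)


# Hypothetical case: 10 MB free, a 70 MB insert.
print(growth_events(10, 70, 1))   # 60 growths with the 1 MB default increment
print(growth_events(10, 70, 50))  # 2 growths with a 50 MB increment
```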

1) Grow by a fixed size rather than by a percentage. This avoids the trouble caused by growing too much or too little at a time. We recommend a growth increment of 50 MB to 100 MB for small databases and 100 MB to 200 MB for large databases.

2) Regularly monitor the usage of each data file, and try to keep the free space in each file equal, or at the expected proportion.

3) Set a maximum file size, so that growing files do not use up all the disk space and affect the operating system.

4) After automatic growth, promptly check how new data is distributed across the data files, so that SQL Server does not keep writing to only a few of them.
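To illustrate point 1), this Python sketch compares successive increments under percentage growth and fixed growth. The starting size and settings are made-up values, and `growth_increments` is a hypothetical helper. With percentage growth each increment is computed from the already-grown file, so increments keep getting larger, which is the unpredictability that fixed-size growth avoids.

```python
def growth_increments(initial_mb: float, growth: float, steps: int, percent: bool) -> list[float]:
    """Size of each successive autogrowth increment."""
    size = initial_mb
    increments = []
    for _ in range(steps):
        # Percentage growth is taken of the current size; fixed growth is constant.
        inc = size * growth / 100 if percent else growth
        increments.append(inc)
        size += inc  # the next percentage is taken of the grown file
    return increments


print(growth_increments(1000, 10, 3, percent=True))    # [100.0, 110.0, 121.0]
print(growth_increments(1000, 100, 3, percent=False))  # [100, 100, 100]
```

At 10% on a 100 GB file, a single growth would add about 10 GB, which is why a fixed 100 MB to 200 MB increment is easier to plan for.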

Therefore, for a busy database, we recommend enabling the automatic growth option so that the application does not fail when the database runs out of space, while sizing the files so that automatic growth is rarely triggered. Also, avoid using the auto-shrink feature where possible.

1.3 Store data and log files on different disks

Operations on data files and log files generate a large amount of I/O. If possible, store the log files on a different disk from the data files that hold the data and indexes, to spread the I/O load; this also helps with database disaster recovery.

Optimization ②: Table partitioning and index partitioning (optimization ① partitions data coarsely; optimization ② partitions data precisely)

Why Table Partitioning?

What is the first thing we want to do when a table holds too much data? Split it into two or more parts: it stays the same table, only its contents are stored separately, so reads can be many times faster.

Principle: table data cannot be placed in files directly, but a filegroup is made up of files, and a table can be placed in filegroups. In this way, table data is stored indirectly in different files. Partitioned storage covers tables, indexes, and large object data.

Table partitioning was introduced in SQL Server 2005. As the amount of data in a table grows, queries slow down and application performance degrades, and at that point you should consider partitioning the table. A table holding a large amount of data can be split into multiple partitions; because less data has to be scanned, queries run faster, which greatly improves performance. After partitioning, the table is still logically one complete table, but its data is physically stored in multiple filegroups (physical files), so a query does not have to scan the entire table every time.

2.1 When to use a partitioned table:

1. The table size exceeds 2 GB.

2. The table contains historical data, and new data is added to a new partition.

2.2 Advantages and Disadvantages of Table Partitioning

Table partitions have the following advantages:
1) Better query performance: a query on a partitioned object can search only the partitions it needs, which speeds up the search.
2) Better availability: if one partition of the table fails, the data in the other partitions is still available.
3) Easier maintenance: if one partition of the table fails, only the data in that partition needs to be repaired.
4) Balanced I/O: different partitions can be mapped to different disks to balance I/O and improve overall system performance.
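Advantage 1) relies on partition elimination: a range predicate on the partitioning column only needs the partitions whose ranges overlap it. Below is a minimal Python model, assuming RANGE LEFT boundaries of 10000 and 20000 like the example in section 2.3; `partitions_touched` is a hypothetical helper, and in SQL Server the optimizer performs elimination itself.

```python
from bisect import bisect_left

BOUNDARIES = [10000, 20000]  # RANGE LEFT boundaries, as in section 2.3


def partitions_touched(lo: int, hi: int) -> list[int]:
    """Partitions a predicate `lo <= col <= hi` can read after partition elimination."""
    # Partition numbers increase with the value, so only the partitions
    # between those of lo and hi (inclusive) can contain matching rows.
    first = bisect_left(BOUNDARIES, lo) + 1
    last = bisect_left(BOUNDARIES, hi) + 1
    return list(range(first, last + 1))


print(partitions_touched(12000, 15000))  # [2]
print(partitions_touched(5000, 15000))   # [1, 2]
print(partitions_touched(0, 30000))      # [1, 2, 3]
```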
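Disadvantages:
An existing table cannot be converted into a partitioned table in place; it has to be rebuilt on a partition scheme (in SQL Server, for example, by re-creating its clustered index on the scheme). Oracle, however, provides an online table redefinition feature for this.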

2.3 Three steps to partition a table

2.3.1 Create a partition function

CREATE PARTITION FUNCTION xx1 (INT)
AS RANGE LEFT FOR VALUES (10000, 20000);

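Note: This creates the partition function xx1 on an int column, splitting values into three ranges: values up to and including 10000 go to zone A, values from 10001 to 20000 go to zone B, and values above 20000 go to zone C. (With RANGE LEFT, each boundary value belongs to the range on its left.)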
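The RANGE LEFT rule can be modeled in Python with `bisect`. This is a sketch for checking which partition a value lands in, not SQL Server's implementation; `partition_number` is a hypothetical helper, NULLs are not modeled, and RANGE RIGHT is included only for comparison.

```python
from bisect import bisect_left, bisect_right


def partition_number(value: int, boundaries: list[int], range_left: bool = True) -> int:
    """1-based partition number, like $PARTITION, for sorted boundary values.

    RANGE LEFT: each boundary value belongs to the partition on its left.
    RANGE RIGHT: each boundary value belongs to the partition on its right.
    NULL handling is not modeled.
    """
    if range_left:
        return bisect_left(boundaries, value) + 1
    return bisect_right(boundaries, value) + 1


bounds = [10000, 20000]
print([partition_number(v, bounds) for v in (1, 10000, 10001, 20000, 20001)])
# [1, 1, 2, 2, 3]
print([partition_number(v, bounds, range_left=False) for v in (9999, 10000, 20000)])
# [1, 2, 3]
```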

2.3.2 Create a partition scheme

CREATE PARTITION SCHEME myrangeps2
AS PARTITION xx1
TO (A, B, C);

Note: This creates the partition scheme myrangeps2 on the partition function xx1, mapping its three ranges to the filegroups A, B, and C.

A, B, and C are the names of three filegroups, each of which must contain an NDF file. Once a file has been created, the filegroup it belongs to cannot be changed.
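The function-plus-scheme mapping can be sketched in Python as a lookup from value to partition to filegroup. `build_scheme` and `filegroup_for` are hypothetical names. The check on the filegroup count reflects SQL Server's requirement that a scheme list at least one filegroup per partition (an extra filegroup, if given, is marked NEXT USED).

```python
from bisect import bisect_left


def build_scheme(boundaries: list[int], filegroups: list[str]):
    """Return a function mapping a column value to its filegroup (RANGE LEFT)."""
    partitions = len(boundaries) + 1
    if len(filegroups) < partitions:
        raise ValueError(f"need at least {partitions} filegroups, got {len(filegroups)}")

    def filegroup_for(value: int) -> str:
        partition = bisect_left(boundaries, value) + 1  # same rule as RANGE LEFT
        return filegroups[partition - 1]  # partition N maps to the Nth filegroup

    return filegroup_for


filegroup_for = build_scheme([10000, 20000], ["A", "B", "C"])
print(filegroup_for(500), filegroup_for(15000), filegroup_for(25000))  # A B C
```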

2.3.3 Partition a table

In the table designer, under Regular Data Space Specification, change Data Space Type to Partition Scheme, then select the partition scheme name and the partition column list.

You can also do this with an SQL statement:

CREATE TABLE [dbo].[avcache] (
    [avnote] [varchar](300) NULL,
    [bb] [int] IDENTITY(1, 1)
) ON [myrangeps2] ([bb]);  -- Note: created on the [myrangeps2] partition scheme, partitioned by column bb

2.3.4 Query table partitions

SELECT *, $PARTITION.[xx1](bb) FROM dbo.avcache;

In this way, we can clearly see how table data is partitioned.
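Grouping the $PARTITION result and counting rows per partition number gives a quick summary of the distribution. The Python sketch below reproduces that count for a hypothetical table of 25,000 rows whose IDENTITY(1, 1) column bb runs from 1 to 25000, assuming the RANGE LEFT boundaries 10000 and 20000; `rows_per_partition` is a hypothetical helper.

```python
from bisect import bisect_left
from collections import Counter


def rows_per_partition(values, boundaries):
    """Count rows per partition number, like GROUP BY on $PARTITION (RANGE LEFT)."""
    return Counter(bisect_left(boundaries, v) + 1 for v in values)


# Hypothetical: 25,000 rows whose IDENTITY(1, 1) column bb runs from 1 to 25000.
counts = rows_per_partition(range(1, 25001), [10000, 20000])
print(sorted(counts.items()))  # [(1, 10000), (2, 10000), (3, 5000)]
```

Because bb is an identity column, new rows always land in the last partition, which matches the "historical data" use case in section 2.1.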

2.3.5 Create an index partition

 

Optimization ③: Distributed Database Design

Distributed database systems were developed from centralized database systems. The idea is easy to understand: the whole database is split up and distributed across multiple sites. In essence there are two kinds of distributed database system: 1. Data is logically unified but physically scattered: the distributed database is logically a single whole, but physically it is stored on different nodes. Most distributed databases are of this kind. 2. Data is both logically and physically distributed: this is a federated distributed database. Because the member database systems that make up the federation are relatively "autonomous", such a system can accommodate many different databases serving different purposes, and it suits database integration over a wide area.

Distributed databases are complex, so they are not used or described in detail here; this is only an example to show that distributed databases are mostly used in systems with strong user segmentation. Take a national chain of stores: the usual design is for each branch to hold its own sales and inventory information, while headquarters needs databases for employees, suppliers, branch information, and so on. The branch databases may be completely identical in structure, although in many systems they differ. With this design, each store's data is stored locally, which improves response speed and reduces communication costs; and because data is distributed across different sites with multiple copies, the system as a whole is not paralyzed even if some sites fail. However, it also brings many problems, such as data consistency, implementing remote data transfer, and reducing communication overhead, which makes distributed database systems more complex to develop. The aim here is only to explain the principle; specific usage is not covered in detail.

Optimization ④: Defragment the database

If your table already has indexes but performance is still poor, the indexes have probably become fragmented, and you need to defragment them.

What is index fragmentation?

Heavy insert, update, and delete activity on a table splits index pages, which produces index fragmentation. If fragmentation is severe, index scans take longer and the index may even become ineffective, so data retrieval slows down.

How do I know whether index fragmentation has occurred?

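In SQL Server, you can check index fragmentation with DBCC SHOWCONTIG or DBCC SHOWCONTIG (table_name), and use the result to decide when the indexes need to be rebuilt. (From SQL Server 2005 on, DBCC SHOWCONTIG is deprecated; the sys.dm_db_index_physical_stats dynamic management function reports the same kind of information.)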

By analyzing the scan density (too low) and the logical scan fragmentation (too high), we can determine whether an index needs to be rebuilt. The main points are as follows:
Scan Density [Best Count:Actual Count]: the most useful percentage returned by DBCC SHOWCONTIG. It is the ratio of the ideal number of extent changes (Best Count) to the actual number (Actual Count), and it should be as close to 100% as possible. A low value indicates external fragmentation.

Logical Scan Fragmentation: the percentage of out-of-order pages. It should be between 0% and 10%; a higher value indicates external fragmentation.
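Both numbers can be illustrated with a simplified Python model. It assumes page numbers are listed in index (logical) order, treats page p as belonging to extent p // 8 (a SQL Server extent is 8 pages), and counts a page as out of order when the next page in index order is not the next physical page. `logical_scan_fragmentation` and `scan_density` are hypothetical helpers and the page lists are made up; DBCC SHOWCONTIG computes these values from the actual allocation.

```python
import math


def logical_scan_fragmentation(pages: list[int]) -> float:
    """Percent of leaf pages whose next page (in index order) is not the next physical page."""
    if not pages:
        return 0.0
    out_of_order = sum(1 for a, b in zip(pages, pages[1:]) if b != a + 1)
    return 100.0 * out_of_order / len(pages)


def scan_density(pages: list[int], pages_per_extent: int = 8) -> float:
    """Best Count / Actual Count, as a percentage."""
    if not pages:
        return 100.0
    extents = [p // pages_per_extent for p in pages]
    switches = sum(1 for a, b in zip(extents, extents[1:]) if a != b)
    best = math.ceil(len(pages) / pages_per_extent)  # extents needed if fully contiguous
    actual = switches + 1                             # extents actually visited in order
    return 100.0 * best / actual


contiguous = list(range(16))
interleaved = [0, 8, 1, 9, 2, 10, 3, 11]
print(logical_scan_fragmentation(contiguous), scan_density(contiguous))    # 0.0 100.0
print(logical_scan_fragmentation(interleaved), scan_density(interleaved))  # 87.5 12.5
```

The interleaved layout shows the pattern described above: low scan density together with high logical fragmentation, which points to rebuilding the index.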

Solution:

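First, use DBCC INDEXDEFRAG to defragment the index.

Second, use DBCC DBREINDEX to rebuild the index.

(Both commands are deprecated from SQL Server 2005 on; their replacements are ALTER INDEX ... REORGANIZE and ALTER INDEX ... REBUILD.)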

The differences between the two are as follows:
DBCC INDEXDEFRAG is an online operation: the index remains available while the command runs, and the command can be interrupted without losing the work already completed. Its disadvantage is that it does not reorganize the data as effectively as dropping and re-creating a clustered index.
Re-creating a clustered index reorganizes the data so that the data pages are full; the fill level can be set with the FILLFACTOR option. The disadvantage of this method is that the index is offline while it is dropped and re-created, and the operation is atomic: if the index creation is interrupted, the index is not re-created. In other words, for the best result you still need to rebuild the index.
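On SQL Server 2005 and later, ALTER INDEX ... REORGANIZE (online, like DBCC INDEXDEFRAG) and ALTER INDEX ... REBUILD (like DBCC DBREINDEX) are the current commands. The Python sketch below picks between them and returns the T-SQL text. The 5% and 30% thresholds are the commonly cited Microsoft rules of thumb, not something this article states or SQL Server enforces, and `defrag_statement` and the index name `IX_avcache_bb` are hypothetical. The fragmentation input would typically be avg_fragmentation_in_percent from sys.dm_db_index_physical_stats.

```python
from typing import Optional


def defrag_statement(table: str, index: str, fragmentation_pct: float,
                     fillfactor: Optional[int] = None) -> Optional[str]:
    """Return an ALTER INDEX statement for the given fragmentation level, or None.

    Thresholds follow commonly cited Microsoft guidance (above 5% reorganize,
    above 30% rebuild); they are rules of thumb, not hard limits.
    """
    if fragmentation_pct <= 5:
        return None  # too little fragmentation to be worth the work
    if fragmentation_pct <= 30:
        return f"ALTER INDEX {index} ON {table} REORGANIZE;"
    if fillfactor is not None:
        # A rebuild can also reset how full each page is left.
        return f"ALTER INDEX {index} ON {table} REBUILD WITH (FILLFACTOR = {fillfactor});"
    return f"ALTER INDEX {index} ON {table} REBUILD;"


# Hypothetical index name on the table from section 2.3.
print(defrag_statement("dbo.avcache", "IX_avcache_bb", 12))
# ALTER INDEX IX_avcache_bb ON dbo.avcache REORGANIZE;
print(defrag_statement("dbo.avcache", "IX_avcache_bb", 45, fillfactor=90))
# ALTER INDEX IX_avcache_bb ON dbo.avcache REBUILD WITH (FILLFACTOR = 90);
```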

Parts of this article draw on other articles found online. It is for learning purposes only; please credit the source when reposting.

-------------- AK (old K): 2012-12-28
