Database optimization includes the following three parts: database optimization, database table optimization, and program operation optimization. This article covers the first part.

Database Performance Optimization One: Database Self-Optimization

Optimization ①: Add secondary data files and set file autogrowth (coarse data partitioning)

1.1 Adding secondary data files
Starting with SQL Server 2005, a database does not create an NDF (secondary) data file by default; a single primary data file (MDF) is usually enough. For large databases with a lot of data and frequent queries, however, some tables (or parts of tables) can be stored in separate data files to improve query speed. Because CPU and memory are much faster than hard disk reads and writes, placing different data files on different physical disks lets multiple disks serve a query in parallel, taking full advantage of CPU and memory performance and improving query speed. Below is a detailed description of how writes work; data files (MDF, NDF) and log files (LDF) are written differently:
Data files: SQL Server distributes new data across all files in the same filegroup in proportion to each file's existing free space. For example, if three data files A.mdf, B.ndf, and C.ndf have 200 MB, 100 MB, and 50 MB of free space, a 70 MB write is spread across all three in a 4:2:1 ratio (40 MB, 20 MB, and 10 MB). Once a data file is full, nothing more is written to it.
Log files: log files are written sequentially; one file is filled up before writing begins on the next.
From the above: adding NDF data files helps query speed on large databases, but adding extra log files is of no use.
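As a sketch of the idea above (the database name, logical file name, path, and sizes are all hypothetical placeholders), a secondary data file on another physical disk can be added like this:

```sql
-- Hypothetical example: add a secondary data file on a second physical disk.
ALTER DATABASE MyDb
ADD FILE (
    NAME = MyDb_Data2,                        -- logical file name (placeholder)
    FILENAME = 'E:\SqlData\MyDb_Data2.ndf',   -- a different physical drive than the MDF
    SIZE = 500MB,
    FILEGROWTH = 100MB
);
```

Placing the NDF on a separate spindle from the MDF is what allows the parallel disk reads described above.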
1.2 Setting file autogrowth (for large data volumes; no need to set it for small ones)
In SQL Server 2005, the default MDF initial size is 1 MB, and the default LDF initial size is 1 MB with 10% growth. These defaults suit small databases, but for a large database you should design the initial size and the growth increment yourself. If the initial size is too small, the database files fill up quickly. What happens on an insert when a file is full? When a data file is full and an operation needs more space, SQL Server makes the operation wait until the autogrow finishes; if the growth cannot complete before the operation times out (roughly 15 seconds), not only is the operation rolled back, the file autogrow is also cancelled, so the file gets no increase at all. The growth time is determined by the autogrow increment: if it is too small, several successive growths may be needed to satisfy one operation; if it is too large, a single growth takes a long time. So note the following points when setting up autogrowth:
1) Set growth by a fixed size, not by a percentage. This avoids the unnecessary hassle of growing too much or too little at a time. As a rough guideline, use a growth increment of around 50–100 MB for smaller databases, and around 100–200 MB for large ones.
2) Regularly monitor each data file's usage, and try to keep the free space in each file as equal as possible, or at the desired ratio.
3) Set a maximum size for each file to prevent SQL Server files from growing until they exhaust disk space and affect the operating system.
4) After a growth event, promptly check how space is allocated in the new data files. Avoid letting SQL Server always write data to just a few files.
Therefore, for a busy database, the recommended setting is to turn on the autogrow option to keep the application from running out of database space, but to size the files so that autogrow rarely actually fires. Also, try not to use the auto-shrink feature.
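The recommendations above can be applied with ALTER DATABASE ... MODIFY FILE. A hypothetical sketch; the database name, logical file names, and all sizes are placeholders to be adapted:

```sql
-- Fixed-size growth with a cap, instead of the small/percentage defaults.
ALTER DATABASE MyDb
MODIFY FILE (NAME = MyDb_Data, SIZE = 2GB, FILEGROWTH = 100MB, MAXSIZE = 50GB);

ALTER DATABASE MyDb
MODIFY FILE (NAME = MyDb_Log, SIZE = 512MB, FILEGROWTH = 100MB, MAXSIZE = 10GB);

-- Make sure auto-shrink is off, as recommended above.
ALTER DATABASE MyDb SET AUTO_SHRINK OFF;
```

Generous initial sizes mean autogrow stays on as a safety net but rarely fires during normal operation.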
1.3 Data and log files are stored separately on different disks
Operations on data files and log files both generate a large amount of I/O. Where possible, log files should be stored on a different physical disk from the one holding the data and indexes, to distribute I/O; this also benefits the database's disaster recovery.
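Separating data and log at database creation time might look like this (the database name, paths, and sizes are hypothetical; D: and E: stand for two different physical disks):

```sql
-- Data on drive D, log on drive E (different physical disks), per the advice above.
CREATE DATABASE MyDb
ON PRIMARY (
    NAME = MyDb_Data,
    FILENAME = 'D:\SqlData\MyDb_Data.mdf',
    SIZE = 1GB,
    FILEGROWTH = 100MB
)
LOG ON (
    NAME = MyDb_Log,
    FILENAME = 'E:\SqlLogs\MyDb_Log.ldf',
    SIZE = 256MB,
    FILEGROWTH = 100MB
);
```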
Optimization ②: Table partitioning and index partitioning (optimization ① partitions data coarsely; optimization ② partitions data precisely)
Why table partitioning?
When the amount of data in a table is too large, what do we most want to do? Split the table into several pieces. The table is still one table, but its contents are stored separately, so reads can be nearly N times faster.
Principle: table data cannot be placed directly in files, but a filegroup can contain files and a table can be placed on a filegroup, so table data ends up stored in different files indirectly. Tables, indexes, and large-object data can all be stored in partitions.
SQL Server 2005 introduced the concept of table partitioning. When the amount of data in a table grows, queries slow down and application performance degrades, so you should consider partitioning the table. When a table is partitioned, it logically remains a single complete table, but its data is physically stored across multiple table spaces (physical files), so a query no longer always scans the whole table: because less data has to be scanned, queries run faster, greatly improving performance.
2.1 When to use the partition table:
1. The size of the table exceeds 2 GB.
2. The table contains historical data, and new data is added to the new partition.
2.2 Advantages and disadvantages of table partitioning
Table partitioning has the following advantages:
1. Improved query performance: queries on partitioned objects can search only the partitions they concern, improving retrieval speed.
2. Enhanced availability: if one partition of the table fails, the table's data in the other partitions is still available.
3. Easier maintenance: if a partition's data needs repairing, only that partition has to be repaired.
4. Balanced I/O: different partitions can be mapped to different disks to balance I/O and improve overall system performance.
Disadvantages: an existing ordinary table cannot be converted directly into a partitioned table. (Oracle, by contrast, provides online table redefinition for this.)
2.3 Table partitioning in three steps
2.3.1 Creating a partition function
CREATE PARTITION FUNCTION xx1 (int)
AS RANGE LEFT FOR VALUES (10000, 20000);
Note: this creates partition function xx1 on an int partitioning column, dividing values into three ranges: values up to 10,000 in zone A, 10,001–20,000 in zone B, and above 20,000 in zone C (with RANGE LEFT, each boundary value belongs to the lower partition).
2.3.2 Creating a partition scheme
CREATE PARTITION SCHEME myRangePS2
AS PARTITION xx1
TO (A, B, C);
Note: this creates the partition scheme myRangePS2 on partition function xx1, mapping its three ranges to filegroups A, B, and C respectively.
A, B, and C are the names of three filegroups; each must have an NDF data file belonging to it, and the filegroup a partition maps to cannot be changed once the scheme is created.
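For the scheme above to work, the three filegroups and their NDF files must already exist. A hypothetical sketch (the database name, file names, paths, and sizes are placeholders):

```sql
-- Create filegroups A, B, C and give each one an NDF data file.
ALTER DATABASE MyDb ADD FILEGROUP A;
ALTER DATABASE MyDb ADD FILEGROUP B;
ALTER DATABASE MyDb ADD FILEGROUP C;

ALTER DATABASE MyDb ADD FILE
    (NAME = MyDb_A, FILENAME = 'D:\SqlData\MyDb_A.ndf', SIZE = 100MB)
TO FILEGROUP A;
ALTER DATABASE MyDb ADD FILE
    (NAME = MyDb_B, FILENAME = 'D:\SqlData\MyDb_B.ndf', SIZE = 100MB)
TO FILEGROUP B;
ALTER DATABASE MyDb ADD FILE
    (NAME = MyDb_C, FILENAME = 'D:\SqlData\MyDb_C.ndf', SIZE = 100MB)
TO FILEGROUP C;
```

Putting the three NDF files on different physical disks combines this optimization with optimization ①.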
2.3.3 Partitioning a table
In the SSMS table designer, open the table's storage properties ("Common data specification"), change the data space type to "Partition scheme", then select the partition scheme name and the partition column list.
You can also use SQL statements to generate
CREATE TABLE [dbo].[Avcache] (
    [Avnote] [varchar](300) NULL,   -- the column length was lost in the original; 300 is illustrative
    [BB] [int] IDENTITY(1,1)
) ON [myRangePS2] (BB);  -- note: the myRangePS2 partition scheme is used here, partitioned by the BB column
2.3.4 Querying table partitioning
SELECT *, $PARTITION.xx1(BB) AS partition_number FROM dbo.Avcache
So you can clearly see how the table data is partitioned.
2.3.5 Creating an index partition
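The original stops at this heading; a minimal sketch of the step, assuming the sample table above, might look like this (the index name IX_Avcache_BB is hypothetical):

```sql
-- An index created ON the same partition scheme is partitioned the same
-- way as the table (an "aligned" index), so each partition's index data
-- lives in that partition's filegroup.
CREATE NONCLUSTERED INDEX IX_Avcache_BB
ON dbo.Avcache (BB)
ON myRangePS2 (BB);
```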
Optimization ③: Distributed database design
A distributed database system is developed on the basis of a centralized database system. Put simply, the whole database is split up and distributed across different places. In essence, distributed database systems come in two kinds:
1. Data that is logically unified but physically dispersed: the distributed database is logically a single whole but is physically stored on different physical nodes. This is what we usually mean by a distributed database.
2. Data that is distributed both logically and physically: this is the federated distributed database. Because the sub-databases making up the federation are relatively "autonomous", this kind of system can accommodate many databases with a variety of different purposes, and is better suited to large-scale database integration.
Distributed databases are complex, so this article only illustrates the idea without going into detailed usage. Today, distributed databases are mostly used in systems with strong user partitioning. Take a national chain of stores: typically each branch holds its own sales and inventory information, while headquarters holds the databases for staff, suppliers, and branch information. The per-store databases can be fully consistent in structure, although with many systems they may diverge. Storing each store's data locally improves response speed and reduces communication costs, and because the data is distributed across sites with multiple copies, the failure of an individual site does not paralyze the whole system. But this also brings many problems, such as data consistency, remote data transmission, and keeping communication costs down, which make developing a distributed database system more complex. This is just to explain the principle; specific usage is not covered in detail here.
Optimization ④: Defragmenting database fragments
If your table already has indexes but performance is still poor, it is likely that index fragmentation has occurred and you need to defragment the indexes.

What is index fragmentation?

Because of heavy insert, update, and delete activity on a table, the index pages become fragmented into scattered chunks. If fragmentation is severe, scanning the index takes longer, and the index may even become unusable, so data retrieval slows down.
How do I know whether index fragmentation has occurred?

In SQL Server, check index fragmentation with DBCC SHOWCONTIG or DBCC SHOWCONTIG (table name); the results guide scheduled rebuilds.
Decide whether an index needs rebuilding by analyzing the scan density (too low) and scan fragmentation (too high) in the results; the two key figures are: Scan Density [Best Count:Actual Count]: one of the most useful percentages DBCC SHOWCONTIG returns, the ratio of the best (ideal) extent count to the actual extent count. This percentage should be as close to 100% as possible; a low value means external fragmentation exists.
Logical Scan Fragmentation: the percentage of out-of-order pages. This percentage should be between 0% and 10%; a high value indicates external fragmentation.
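A sketch of the check described above, run against the sample table from earlier (substitute your own table name in a real database):

```sql
-- Report fragmentation statistics for one table; in the output, look for
-- Scan Density near 100% and Logical Scan Fragmentation under 10%.
DBCC SHOWCONTIG ('dbo.Avcache');
```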
How to resolve:
One approach is to defragment the index with DBCC INDEXDEFRAG.
The second is to rebuild the index with DBCC DBREINDEX.
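Hedged sketches of the two commands (the database, table, and index names and the fill factor are placeholders):

```sql
-- Online defragmentation of one index; the index stays available while it runs.
DBCC INDEXDEFRAG (MyDb, 'dbo.Avcache', 'IX_Avcache_BB');

-- Offline rebuild of all indexes on the table ('' = all), with a 90% fill factor.
DBCC DBREINDEX ('dbo.Avcache', '', 90);
```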
Microsoft describes the difference between the two as follows: DBCC INDEXDEFRAG is an online operation, so the index remains available while the command runs, and the command can be interrupted without losing the work already completed. Its disadvantage is that it is not as effective at reorganizing the data as dropping and re-creating the index. Re-creating a clustered index reorganizes the data, with the result that the data pages are filled up; the fill level can be configured with the FILLFACTOR option. The disadvantage of that approach is that the index is offline during the drop/re-create cycle, and the operation is atomic: if index creation is interrupted, the index is not re-created. In other words, to get the best result you still have to rebuild the index, so many people decide to rebuild.