Optimizing SQL Server Databases Under Large Data Volumes

Source: Internet
Author: User
1.1 Add secondary data files

Starting with SQL Server 2005, a new database does not get secondary (NDF) data files by default; a single primary data file (MDF) is usually enough. Large databases, however, hold a lot of data and are queried frequently, so to improve query speed you can store some tables, or some of the records in a table, in separate data files.

CPU and memory are much faster than disk reads and writes, so placing different data files on different physical disks lets several disks serve a query at the same time, which makes fuller use of CPU and memory and speeds up queries. The write mechanics are described below; data files (MDF, NDF) and log files (LDF) are written differently:

Data files: SQL Server spreads new data across all data files in the same filegroup in proportion to the free space in each file. For example, if three data files A.MDF, B.NDF, and C.NDF have 200 MB, 100 MB, and 50 MB free, a 70 MB write puts 40 MB, 20 MB, and 10 MB into A, B, and C respectively. If a data file is full, nothing is written to it.

Log files: Log files are written sequentially; one file must fill up before the next is used.

As the above shows, adding NDF data files helps query speed on large data volumes, but adding log files does not.
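The 40/20/10 split above can be checked with a small standalone query, and a secondary data file can be added with ALTER DATABASE. Both are only sketches: the database name SalesDb, the filegroup FG_Data2, the logical file name, the path, and the sizes are hypothetical and not taken from the original article, and the first query only reproduces the arithmetic (it needs SQL Server 2008 or later for the VALUES constructor).

```sql
-- Proportional fill arithmetic: share of a 70 MB write per file,
-- weighted by each file's free space (200 + 100 + 50 = 350 MB).
SELECT f.name,
       f.free_mb,
       CAST(70.0 * f.free_mb / SUM(f.free_mb) OVER () AS decimal(10, 1)) AS written_mb
FROM (VALUES ('A.MDF', 200), ('B.NDF', 100), ('C.NDF', 50)) AS f(name, free_mb);
-- written_mb: 40.0, 20.0, 10.0
```

```sql
-- Add a filegroup and one secondary data file on a separate physical disk.
-- SalesDb, FG_Data2, SalesDb_Data2 and the F:\ path are assumed names.
ALTER DATABASE SalesDb ADD FILEGROUP FG_Data2;

ALTER DATABASE SalesDb
ADD FILE
(
    NAME = SalesDb_Data2,
    FILENAME = 'F:\SQLData\SalesDb_Data2.ndf',
    SIZE = 1024MB,
    FILEGROWTH = 256MB
)
TO FILEGROUP FG_Data2;
```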

1.2 Configure automatic file growth (for large data volumes; small databases can keep the defaults)

In SQL Server 2005, the default MDF file has an initial size of 5 MB, grows in 1 MB increments, and has unrestricted growth; the default LDF file starts at 1 MB, grows by 10%, and has its growth restricted to a maximum size. For ordinary designs the SQL Server defaults are fine, but for a large database it is best to set the initial size and the growth yourself. If the initial size is too small, the database soon fills up. What happens to an insert when it does?

When a data file is full and an operation needs more space, SQL Server makes the operation wait until the automatic growth finishes, and only then does the original operation continue. If the growth takes a long time, the original operation may time out and be canceled (the default threshold is 15 seconds). In that case not only is the operation rolled back, the file growth is canceled too, so the file ends up no larger than before. How long a growth takes depends on the growth increment: if it is too small, a single operation may need several consecutive growths to get enough space; if it is too large, each growth means a long wait. Keep the following points in mind when configuring automatic growth:

1. Grow by a fixed size, not by a percentage. This avoids the trouble caused by a single growth being too large or too small. For smaller databases, a growth increment of [...] MB to [...] MB is recommended; for large databases, [...] MB to [...] MB.

2. Monitor the usage of each data file regularly, and as far as possible keep the free space in each file equal, or in the proportion you intend.

3. Set a maximum file size so that SQL Server files cannot grow until the disk is full and affect the operating system.

4. After a growth has occurred, promptly check how free space is distributed across the data files, so that SQL Server does not keep writing to only a few of them.

For a busy database, therefore, the recommendation is to leave automatic growth turned on as a safety net, so that the application never runs out of database space, but to size the files so that automatic growth rarely if ever actually happens. Also, try not to use automatic shrink.
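A minimal sketch of these settings follows. The database name SalesDb, the logical file name SalesDb_Data, and every size shown are hypothetical values chosen for illustration, not recommendations from the original article. Each property is changed in its own statement, and the new SIZE must be larger than the file's current size.

```sql
-- Pre-size the data file so that growth is rarely needed (assumed values).
ALTER DATABASE SalesDb MODIFY FILE (NAME = SalesDb_Data, SIZE = 20480MB);

-- Grow by a fixed increment rather than a percentage.
ALTER DATABASE SalesDb MODIFY FILE (NAME = SalesDb_Data, FILEGROWTH = 512MB);

-- Cap the file so it cannot fill the disk.
ALTER DATABASE SalesDb MODIFY FILE (NAME = SalesDb_Data, MAXSIZE = 102400MB);

-- Keep automatic shrink off.
ALTER DATABASE SalesDb SET AUTO_SHRINK OFF;

-- Monitor size, used space and free space per file (size is in 8 KB pages,
-- so dividing by 128 gives MB). Run this inside the database being checked.
SELECT name,
       type_desc,
       size / 128.0                                     AS size_mb,
       FILEPROPERTY(name, 'SpaceUsed') / 128.0          AS used_mb,
       (size - FILEPROPERTY(name, 'SpaceUsed')) / 128.0 AS free_mb
FROM sys.database_files;
```

Running the monitoring query on a schedule makes it possible to grow files manually during quiet periods instead of waiting for automatic growth to trigger under load.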

1.3 Store data files and log files on separate disks

Operations on data files and log files both generate a large amount of I/O. Where possible, put the log file on a different physical disk from the data file that holds the data and indexes. This spreads the I/O and also makes disaster recovery of the database easier.
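As an illustration, a database can be created with its data file and log file on different drives. This is a sketch only: the database name, logical file names, drive letters, and sizes are assumptions and should be replaced with values that match the actual server.

```sql
-- Data file on D:, log file on E: (hypothetical names, paths and sizes).
CREATE DATABASE SalesDb
ON PRIMARY
(
    NAME = SalesDb_Data,
    FILENAME = 'D:\SQLData\SalesDb.mdf',
    SIZE = 1024MB,
    FILEGROWTH = 256MB
)
LOG ON
(
    NAME = SalesDb_Log,
    FILENAME = 'E:\SQLLogs\SalesDb.ldf',
    SIZE = 512MB,
    FILEGROWTH = 128MB
);
```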

Optimizing ②: Table partitioning and index partitioning (① splits tables roughly across files; ② partitions the data precisely)

Why partition a table?

When a table holds too much data, what do we most want to do? Split it into two or more parts. The table is still one table; only its contents are stored separately, so reads can be up to nearly n times faster.

Principle: A table cannot be placed directly in a file, but a file can be placed in a filegroup and a table can be placed in a filegroup, which indirectly stores the table's data in different files. Tables, indexes, and large object data can all be partitioned.

SQL Server 2005 introduced table partitioning. As the amount of data in a table grows, queries slow down and application performance drops; that is when you should consider partitioning the table. A table with a lot of data can be split into multiple partitions. Because less data has to be scanned, queries can run faster, and performance can improve considerably. After partitioning, the logical table is still a single complete table, but its data is physically stored in several table spaces (physical files), so a query does not have to scan the entire table every time.

2.1 When to use a partitioned table

1. The table is larger than 2 GB.

2. The table contains historical data, and new data is added to a new partition.

2.2 Advantages and disadvantages of table partitioning

Table partitioning has the following advantages:

1. Better query performance: a query against a partitioned object can search only the partitions it needs, which speeds up retrieval.

2. Better availability: if one partition of a table fails, the data in the other partitions is still available.

3. Easier maintenance: if one partition of a table fails and its data needs repair, only that partition has to be repaired.

4. Balanced I/O: different partitions can be mapped to different disks to balance I/O and improve overall system performance.

Disadvantages:

Converting existing tables: there is no direct way to turn an existing table into a partitioned table in place; the data has to be rebuilt, for example by re-creating the table's clustered index on a partition scheme. Oracle, by comparison, provides online table redefinition.

2.3 Table partitioning in three steps

2.3.1 Create a partition function

CREATE PARTITION FUNCTION myRangePF2 (int)

AS RANGE LEFT FOR VALUES (10000, 20000);

Note: This creates the partition function myRangePF2 on an int column, with three ranges. Because the function uses RANGE LEFT, each boundary value belongs to the partition on its left: values up to and including 10000 go to filegroup A, values from 10001 to 20000 go to B, and values above 20000 go to C.

2.3.2 Create a partition scheme

CREATE PARTITION SCHEME myRangePS2

AS PARTITION myRangePF2

TO (A, B, C);

Note: This creates the partition scheme myRangePS2 on the partition function myRangePF2, mapping the three ranges to A, B, and C respectively.

A, B, and C are the names of three filegroups, and each of them must contain an NDF file. Once the scheme has been created, its filegroups cannot be modified.
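The three filegroups have to exist, each with at least one file, before the partition scheme is created. A minimal sketch of that preparation is shown here; the database name SalesDb, the logical file names, the paths, and the sizes are hypothetical, and only the filegroup names A, B, and C come from the text above.

```sql
-- Create the three filegroups named in the partition scheme.
ALTER DATABASE SalesDb ADD FILEGROUP A;
ALTER DATABASE SalesDb ADD FILEGROUP B;
ALTER DATABASE SalesDb ADD FILEGROUP C;

-- Give each filegroup one NDF file (assumed names, paths and sizes).
-- Placing the files on different physical disks spreads the I/O.
ALTER DATABASE SalesDb
ADD FILE (NAME = SalesDb_A, FILENAME = 'F:\SQLData\SalesDb_A.ndf',
          SIZE = 512MB, FILEGROWTH = 128MB)
TO FILEGROUP A;

ALTER DATABASE SalesDb
ADD FILE (NAME = SalesDb_B, FILENAME = 'G:\SQLData\SalesDb_B.ndf',
          SIZE = 512MB, FILEGROWTH = 128MB)
TO FILEGROUP B;

ALTER DATABASE SalesDb
ADD FILE (NAME = SalesDb_C, FILENAME = 'H:\SQLData\SalesDb_C.ndf',
          SIZE = 512MB, FILEGROWTH = 128MB)
TO FILEGROUP C;
```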

2.3.3 Partition the table

In the table designer, under Regular Data Space Specification, change Data Space Type to Partition Scheme, then select the partition scheme name and the partition column list. The result is as shown in the figure.

You can also create the table with a SQL statement:

CREATE TABLE [dbo].[Avcache] ([avnote] [varchar](+) NULL, [BB] [int] IDENTITY(1,1)) ON [myRangePS2] (BB);

-- Note: the table is created on the [myRangePS2] partition scheme and partitioned on the BB column

2.3.4 Query the table's partitions

SELECT *, $PARTITION.[myRangePF2](BB) FROM dbo.Avcache

The extra column shows the partition number each row falls in, so you can see clearly how the table's data is partitioned.
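Two further checks can help here, sketched under the assumption that the partition function is named myRangePF2 with boundaries 10000 and 20000 and RANGE LEFT, and that the table is dbo.Avcache as above. The first query tests individual values against the function without touching the table; the second counts rows per partition from the catalog.

```sql
-- RANGE LEFT: a boundary value belongs to the partition on its left.
SELECT $PARTITION.myRangePF2(10000) AS p_10000,   -- partition 1
       $PARTITION.myRangePF2(10001) AS p_10001,   -- partition 2
       $PARTITION.myRangePF2(20000) AS p_20000,   -- partition 2
       $PARTITION.myRangePF2(20001) AS p_20001;   -- partition 3

-- Row count per partition for the heap or clustered index of the table.
SELECT p.partition_number,
       p.rows
FROM sys.partitions AS p
WHERE p.object_id = OBJECT_ID('dbo.Avcache')
  AND p.index_id IN (0, 1)
ORDER BY p.partition_number;
```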

2.3.5 Create an index partition
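The original gives no statement for this step. One common approach, shown here only as a sketch, is to create the index on the same partition scheme as the table so that the index is partitioned the same way. The index name IX_Avcache_BB is hypothetical; the table, column, and scheme names follow the earlier examples.

```sql
-- Create an index partitioned on the same scheme and column as the table.
-- IX_Avcache_BB is an assumed name.
CREATE NONCLUSTERED INDEX IX_Avcache_BB
ON dbo.Avcache (BB)
ON myRangePS2 (BB);
```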

Optimizing ③: Distributed Database design

A distributed database system grows out of the centralized database system. The idea is simple: split the whole database up and distribute it across different places. In essence, distributed database systems come in two kinds:

1. Logically unified, physically distributed. The distributed database is logically one whole but is physically stored on separate nodes. This is what is usually meant by a distributed database.

2. Logically distributed and physically distributed. This is also called a federated distributed database. Because the sub-database systems that make up the federation are relatively "autonomous", such a system can accommodate many different kinds of databases and is better suited to integrating databases on a large scale.

Distributed databases are fairly complex and are not covered in detail here; one example will do. Distributed databases are now mostly used in systems that divide naturally by user. Take a national retail chain: typically each branch has its own sales and inventory data, while headquarters keeps databases for employees, suppliers, store information, and so on. The branch databases can be completely consistent with one another, although in many systems they are not. Each store's data is stored locally, which improves response speed and reduces communication costs. And because the data is spread across different sites with multiple replicas, the failure of an individual site does not bring down the whole system. It also brings many problems, however, such as keeping data consistent, implementing remote data transfer, and keeping communication overhead down, all of which make a distributed database system more complex to develop. The aim here is only to convey the principle; specific usage is not covered.

Optimizing ④: Defragmenting the database

If a table is already indexed but performance is still poor, index fragmentation is a likely cause, and the indexes need to be defragmented.

What is index fragmentation?

Heavy inserting, updating, and deleting on a table leaves its index pages fragmented. When an index is badly fragmented, scanning it takes longer and the index may even become unusable, so data retrieval slows down.

How do I know whether index fragmentation has occurred?

In SQL Server, index fragmentation can be checked with DBCC SHOWCONTIG or DBCC SHOWCONTIG (table name), and the results used to decide when to schedule a rebuild.

Analyze the results, in particular a scan density that is too low or a scan fragmentation that is too high, to decide whether the index needs rebuilding. The two values to look at are:

Scan Density [Best Count:Actual Count]: the most useful percentage that DBCC SHOWCONTIG returns. It is the ratio of the best count of extents to the actual count. The percentage should be as close to 100% as possible; a low value indicates external fragmentation.

Logical Scan Fragmentation: the percentage of out-of-order pages. The percentage should be between 0% and 10%; a high value indicates external fragmentation.
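A sketch of the check follows, using the dbo.Avcache table from the partitioning example as a stand-in for whichever table is being examined. The second query is an addition not mentioned in the original: sys.dm_db_index_physical_stats is the dynamic management function available from SQL Server 2005 onward that reports similar information, and its avg_fragmentation_in_percent column corresponds to logical fragmentation for indexes.

```sql
-- Report fragmentation for every index on one table.
DBCC SHOWCONTIG ('dbo.Avcache') WITH ALL_INDEXES;

-- Alternative: the same kind of information from the DMV.
SELECT OBJECT_NAME(ips.object_id)       AS table_name,
       i.name                           AS index_name,
       ips.partition_number,
       ips.avg_fragmentation_in_percent,
       ips.page_count
FROM sys.dm_db_index_physical_stats(
         DB_ID(), OBJECT_ID('dbo.Avcache'), NULL, NULL, 'LIMITED') AS ips
JOIN sys.indexes AS i
  ON i.object_id = ips.object_id
 AND i.index_id  = ips.index_id
ORDER BY ips.avg_fragmentation_in_percent DESC;
```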

Solution:

The first option is to defragment the index with DBCC INDEXDEFRAG.

The second option is to rebuild the index with DBCC DBREINDEX.

Microsoft describes the difference between the two as follows:

DBCC INDEXDEFRAG is an online operation, so the index remains available while the command is running, and you can interrupt the operation without losing the work already completed. The disadvantage of this approach is that it does not reorganize the data as effectively as dropping and re-creating the clustered index.

Re-creating the clustered index rearranges the data so that the data pages are filled. How full they are can be configured with the FILLFACTOR option. The disadvantage of this approach is that the index is offline during the drop/re-create cycle and the operation is atomic: if the index creation is interrupted, the index is not re-created. In other words, to get good results you have to rebuild the index, so rebuilding is usually the method chosen.
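Both options can be sketched as follows. The table dbo.Avcache and the index name IX_Avcache_BB are carried over from the earlier examples as assumptions, and the fill factor of 90 is an arbitrary illustrative value. The ALTER INDEX statements are not mentioned in the original; they are the equivalents introduced in SQL Server 2005.

```sql
-- Option 1: defragment one index online (0 means the current database).
DBCC INDEXDEFRAG (0, 'dbo.Avcache', 'IX_Avcache_BB');

-- Option 2: rebuild all indexes on the table with a fill factor of 90.
DBCC DBREINDEX ('dbo.Avcache', ' ', 90);

-- Equivalent ALTER INDEX forms (SQL Server 2005 and later).
ALTER INDEX ALL ON dbo.Avcache REORGANIZE;
ALTER INDEX ALL ON dbo.Avcache REBUILD WITH (FILLFACTOR = 90);
```

Because a rebuild takes the index offline for its duration, it is usually scheduled for a maintenance window, while the defragment or reorganize form can run during normal operation.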
