Database Performance Optimization One: Database Self-Optimization (Large Data Volume)


Database optimization consists of three parts: database optimization, database table optimization, and program operation optimization. This article is the first part of database performance optimization: database self-optimization.

Optimization ①: Add secondary data files and set file autogrowth (coarse data partitioning)

1.1 Add a secondary data file

Starting with SQL Server 2005, a database does not generate an NDF data file by default; in general, a single primary data file (MDF) is enough. But for some large databases with a lot of data and frequent queries, you can improve query speed by storing the records of one table, or of several tables, separately in different data files.
Because CPU and memory are much faster than hard-disk reads and writes, you can place these data files on different physical disks. When a query executes, multiple disks can then be read at the same time, taking full advantage of CPU and memory performance and improving query speed. Note that data files (MDF, NDF) and log files (LDF) are written differently:

Data files: SQL Server distributes new data across all the data files in the same filegroup in proportion to each file's free space. If there are three data files a.mdf, b.ndf, and c.ndf with 200 MB, 100 MB, and 50 MB free respectively, then writing 70 MB of data will write 40 MB, 20 MB, and 10 MB to the three files in one pass. If a data file is full, nothing more is written to it.

Log files: log files are written sequentially; one must be full before writing moves on to another.

From the above: adding NDF data files helps query speed on large data volumes, but adding more log files does not.
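As a minimal sketch of the idea, here is how a secondary data file might be added. The database name (MyDB), filegroup name (FG2), and file path are illustrative assumptions, not taken from the original article:

```sql
-- Add a new filegroup, then add a secondary data file (NDF) to it.
-- Place the NDF on a different physical disk from the MDF.
ALTER DATABASE MyDB
ADD FILEGROUP FG2;

ALTER DATABASE MyDB
ADD FILE
(
    NAME = MyDB_Data2,                       -- logical file name
    FILENAME = 'E:\SQLData\MyDB_Data2.ndf',  -- physical path on a second disk
    SIZE = 100MB,
    FILEGROWTH = 50MB
)
TO FILEGROUP FG2;
```

Tables (or partitions) created on filegroup FG2 will then have their data written to the new file on the second disk.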

1.2 Set file autogrowth (for large data volumes; small databases need no special setting)

In SQL Server 2005, the default MDF file has an initial size of 5 MB, autogrows by 1 MB, and is unrestricted in growth; the default LDF starts at 1 MB, grows by 10%, and is restricted to a certain maximum size. For ordinary designs the defaults are fine, but for a large database it is best to design the initial size and growth yourself, because if the initial value is too small the database will soon fill up.

What happens on an insert once the data file is full? When the data file is full and an operation needs more space, SQL Server makes the operation wait until the file autogrowth finishes, and only then does the original operation continue. If the autogrowth takes too long, the original operation may time out and be canceled (the default threshold is typically 15 seconds); in that case not only is the operation rolled back, the file autogrowth is canceled as well, so the file ends up no larger than before. The growth time is determined by the autogrowth increment: if it is too small, several successive growths may be needed to satisfy one operation; if it is too large, one growth will take a long time. So pay attention to the following points when configuring autogrowth:

1) Set growth to a fixed size, not a percentage. This avoids the unnecessary trouble of growing by too much or too little at a time. Choose an increment large enough that growth events are infrequent, but small enough that each one completes quickly; a larger database warrants a larger fixed increment.

2) Monitor the usage of each data file regularly, and try to keep the remaining free space of every file as equal as possible, or at the desired proportion.

3) Set a maximum file size to prevent SQL Server files from exhausting disk space and affecting the operating system.

4) After a growth event, promptly check the space allocation of the new data file, to avoid SQL Server always writing data to an individual file.

Therefore, for a busy database, the recommended setting is to turn on the database autogrow option so the application never runs out of database space, but to size the files so that autogrowth rarely actually triggers. Also, try not to use the auto-shrink feature.
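The points above can be applied with ALTER DATABASE ... MODIFY FILE. The database and logical file names below are assumptions; the actual logical names can be looked up first:

```sql
-- Find the logical file names for the database.
SELECT name, physical_name, size
FROM sys.master_files
WHERE database_id = DB_ID('MyDB');

-- Change autogrowth from the defaults to fixed increments with size caps
-- (increment and cap values here are only examples).
ALTER DATABASE MyDB
MODIFY FILE (NAME = MyDB_Data, FILEGROWTH = 100MB, MAXSIZE = 50GB);

ALTER DATABASE MyDB
MODIFY FILE (NAME = MyDB_Log, FILEGROWTH = 50MB, MAXSIZE = 10GB);
```

With a fixed increment each growth event takes a predictable amount of time, and MAXSIZE keeps a runaway file from filling the disk.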

1.3 Data and log files are stored separately on different disks

Operations on data files and log files both generate a large amount of I/O. Where possible, the log file should be stored on a different physical disk from the data files that hold the data and indexes. This distributes the I/O, and it also benefits disaster recovery.
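A sketch of this layout at database-creation time; the database name, drive letters, and paths are illustrative assumptions:

```sql
-- Data file on one physical disk, log file on another,
-- so data I/O and sequential log I/O do not compete.
CREATE DATABASE MyDB
ON PRIMARY
(
    NAME = MyDB_Data,
    FILENAME = 'D:\SQLData\MyDB_Data.mdf',
    SIZE = 500MB,
    FILEGROWTH = 100MB
)
LOG ON
(
    NAME = MyDB_Log,
    FILENAME = 'E:\SQLLogs\MyDB_Log.ldf',
    SIZE = 100MB,
    FILEGROWTH = 50MB
);
```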

Optimization ②: Table partitioning and index partitioning (optimization ① partitions data coarsely; optimization ② partitions it precisely)

Why table partitioning?

When the amount of data in a table grows too large, what do we most want to do? Split the table into several pieces. The table is still one table, but its contents are stored separately, so reads can be almost n times faster.

Principle: table data cannot be placed directly into a file, but a filegroup can contain files and a table can be placed on a filegroup, so table data ends up stored in different files indirectly. Tables, indexes, and large-object data can all be stored in partitions.

SQL Server 2005 introduced the concept of table partitioning. When the amount of data in a table keeps increasing, queries slow down and application performance drops; at that point you should consider partitioning the table. When a table holds a lot of data, it can be split into multiple parts: because each query then scans less data, queries run faster, which greatly improves performance. After partitioning, the logical table is still one complete table, but its data is physically stored in multiple filegroups (physical files), so querying the data no longer always scans the whole table.

2.1 When to use a partitioned table:

1. The size of the table exceeds 2 GB.

2. The table contains historical data, and new data is added to the new partition.

2.2 Advantages and disadvantages of table partitioning

Table partitioning has the following advantages:
1. Improved query performance: queries on partitioned objects can search only the partitions they need, which speeds up retrieval.
2. Enhanced availability: if one partition of the table fails, the table's data in the other partitions is still available.
3. Easier maintenance: if one partition of the table fails and the data needs repair, only that partition has to be repaired.
4. Balanced I/O: different partitions can be mapped to different disks to balance I/O and improve overall system performance.
Disadvantages:
There is no method to convert an existing ordinary table directly into a partitioned table. (Oracle, by comparison, provides online table redefinition for this.)

2.3 Table partitioning in three steps

2.3.1 Creating a partition function

CREATE PARTITION FUNCTION xx1 (int)

AS RANGE LEFT FOR VALUES (10000, 20000);

Note: this creates a partition function named xx1 that partitions on an int column and divides values into three ranges: values up to 10000 in range A, 10001 to 20000 in range B, and above 20000 in range C. (RANGE LEFT means each boundary value belongs to the partition on its left.)

2.3.2 Creating a partition scheme

CREATE PARTITION SCHEME myRangePS2

AS PARTITION xx1

TO (a, b, c);

Note: this creates a partition scheme named myRangePS2 on the partition function xx1, mapping the three ranges to a, b, and c respectively.

Here a, b, and c are the names of three filegroups; each of these groups must contain an NDF file, and the filegroup a partition belongs to cannot be changed once the scheme is created.

2.3.3 Partitioning a table

In the table designer, under the regular data space specification, change the data space type to "partition scheme", then select the partition scheme name and the partition column list.

You can also use SQL statements to generate

CREATE TABLE [dbo].[AvCache] (

[AvNote] [varchar] (300) NULL,

[BB] [int] IDENTITY (1, 1)

) ON [myRangePS2] (BB); -- note that the [myRangePS2] partition scheme is used here, partitioning by BB

2.3.4 Querying table partitioning

SELECT *, $PARTITION.[xx1] (BB) FROM [dbo].[AvCache]

This shows clearly which partition each row of the table is stored in.

2.3.5 Creating an index partition
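The original article gives no example here; as a sketch, an index on the AvCache table above can be partitioned on the same scheme as the table, so that the index is aligned with the table's partitions. The index name is a hypothetical one chosen for illustration:

```sql
-- Create a nonclustered index partitioned on the same partition scheme,
-- so each index partition lives with its corresponding table partition.
CREATE NONCLUSTERED INDEX IX_AvCache_BB
ON [dbo].[AvCache] (BB)
ON [myRangePS2] (BB);
```

An aligned index lets SQL Server eliminate whole partitions from index seeks and makes partition-level maintenance possible.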

Optimization ③: Distributed database design

A distributed database system is developed on the basis of a centralized database system. Simply put, the whole database is separated and distributed across different locations. Essentially, distributed database systems fall into two types:

1. The data is logically unified but physically decentralized. Such a distributed database is logically one unified whole, physically stored on different physical nodes. This is what we usually mean by a distributed database.

2. The data is both logically and physically distributed. This is a federated distributed database: because the sub-database systems that form the federation are relatively autonomous, this kind of system can accommodate a large number of databases with different purposes and is better suited to large-scale database integration.

Distributed databases are complex, so this article does not explain their use in detail but only illustrates the idea. Distributed databases are now mostly used in systems with strong user partitioning. Take a national chain of stores: a typical design gives each branch its own sales and inventory information, while headquarters keeps databases for staff, suppliers, and branch information. The store databases may be fully consistent in structure, though in many systems they can also differ. Storing each store's data locally improves response speed and reduces communication costs; and because the data is distributed across different sites with multiple copies, the failure of an individual site does not paralyze the whole system. But this design also brings many problems, such as data consistency, remote data transmission, and keeping communication costs down, which make developing a distributed database system more complex. The goal here is only to convey the principle; specific usage is not covered in detail.

Optimization ④: Defragmenting the database

If a table already has indexes but performance is still poor, it is likely that index fragmentation has occurred and the indexes need to be defragmented.

What is index fragmentation?

Excessive insert, update, and delete operations on a table fragment the index pages. If fragmentation is severe, scanning the index takes longer and the index may even become unusable, so data retrieval slows down.

How do I know whether index fragmentation has occurred?

In SQL Server, index fragmentation is checked with DBCC SHOWCONTIG or DBCC SHOWCONTIG (table name), and the results guide scheduled rebuilds.

Whether an index needs rebuilding is determined by analyzing the results for scan density (too low) and logical scan fragmentation (too high); mainly look at these two:

Scan Density [Best Count:Actual Count]: the most useful percentage DBCC SHOWCONTIG returns. It is the ratio of the best (ideal) extent count to the actual extent count, and should be as close to 100% as possible; a low value indicates external fragmentation.

Logical Scan Fragmentation: the percentage of out-of-order pages. It should be between 0% and 10%; a high value indicates external fragmentation.
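For example, the check can be run against the AvCache table from section 2.3 (any table name works; the output columns to read are the two just described):

```sql
-- Report fragmentation for one table; look at "Scan Density" and
-- "Logical Scan Fragmentation" in the output.
DBCC SHOWCONTIG ('dbo.AvCache');

-- Or report on every table in the current database:
DBCC SHOWCONTIG;
```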

How to resolve:

One is to defragment the index with DBCC INDEXDEFRAG.

The second is to rebuild the index with DBCC DBREINDEX.
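Both commands can be sketched against the AvCache table from section 2.3; the database name and index name here are assumptions for illustration:

```sql
-- Online defragmentation: the index stays available while this runs.
-- Arguments: database name, table name, index name.
DBCC INDEXDEFRAG (MyDB, 'dbo.AvCache', IX_AvCache_BB);

-- Offline rebuild of all indexes on the table ('' = all indexes),
-- refilling pages to a 90% fill factor.
DBCC DBREINDEX ('dbo.AvCache', '', 90);
```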

Microsoft describes the difference between the two as follows:
DBCC INDEXDEFRAG is an online operation, so the index remains available while the command runs, and the command can be interrupted without losing the work already completed. Its drawback is that it is not as effective at reorganizing the data as dropping and re-creating a clustered index.
Re-creating a clustered index reorganizes the data, with the result that the data pages are filled up; the fill level can be configured with the FILLFACTOR option. The drawbacks of this approach are that the index is offline during the drop/re-create cycle and that the operation is atomic: if the index creation is interrupted, the index is not re-created. In other words, to achieve a good result you still have to rebuild the index, so the usual decision is to rebuild it.

Transferred from: http://www.cnblogs.com/AK2012/archive/2012/12/25/2012-1228.html
