Database optimization covers three areas: database optimization, database table optimization, and program operation optimization. This article is the first part, database optimization.
Optimization ①: Add secondary data files and set automatic file growth (coarse data partitioning)
1.1: Add secondary data files
Starting with SQL Server 2005, the database does not generate NDF data files by default; one main data file (MDF) is generally enough. However, for some large databases with a lot of data and frequent queries, you can improve query speed by storing some of a table's records, or some of its tables, in separate data files.
Because CPU and memory speeds are much higher than hard disk read/write speeds, you can place different data files on different physical hard disks. A query can then read from multiple disks at the same time, making full use of CPU and memory performance and improving query speed. Here is a detailed description of how writes work, and how data files (MDF, NDF) and log files (LDF) are written differently:
Data files: SQL Server distributes new data across all the data files in the same filegroup in proportion to each file's existing free space. For example, if there are three data files A.MDF, B.NDF, and C.NDF with 200 MB, 100 MB, and 50 MB of free space respectively, then writing 70 MB of data puts 40 MB, 20 MB, and 10 MB into the three files. If a data file is full, no more data is written to it.
Log files: log files are written sequentially; one is filled before writing begins on another.
From the above it is clear that adding NDF data files helps the speed of queries over large data sets, but adding log files does not.
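As a minimal sketch of adding a secondary data file on a separate physical drive, assuming a database named MyDB and an E: drive (both names are placeholders, not from the source):

```sql
-- Add a secondary data file (NDF) on a second physical disk,
-- so reads can be spread across drives.
ALTER DATABASE MyDB
ADD FILE
(
    NAME = MyDB_Data2,                      -- logical file name
    FILENAME = 'E:\Data\MyDB_Data2.ndf',    -- path on the second drive
    SIZE = 100MB,
    FILEGROWTH = 50MB
);
```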
1.2: Set automatic file growth (for large amounts of data; no need to set this for small amounts)
In SQL Server 2005, the default MDF file has an initial size of 5 MB, grows by 1 MB at a time, and has no growth limit; the LDF file is initially 1 MB, grows by 10%, and its growth is limited to a certain size. For ordinary designs the SQL Server defaults are fine, but for a large database it is best to design the growth increment and initial size yourself, because if the initial value is too small the database files will soon fill up. What happens on an insert when they are full? When a data file is full and an operation is performed, SQL Server makes the operation wait until the automatic file growth finishes, and only then does the original operation continue. If the growth takes a long time, the original operation may time out and be cancelled (the default threshold is 15 seconds); not only is that operation rolled back, the automatic file growth is cancelled too. In other words, the file does not grow at all that time. How long growth takes depends on the automatic growth increment: if it is too small, a single insert may require several consecutive growths to be satisfied; if it is too large, you must wait a long time. So pay attention to a few points when setting automatic growth:
1. Set growth to a fixed size, not a percentage. This avoids the unnecessary trouble of growing by too much or too little at once. For smaller databases, set growth to 50 MB to 100 MB at a time; for large databases, set it to 100 MB to 200 MB at a time.
2. Monitor the usage of the data files regularly, and try to keep the remaining space in each file as equal as possible, or at the expected proportion.
3. Set a maximum file size to prevent SQL Server files from growing until disk space is exhausted and the operating system is affected.
4. After a growth occurs, promptly check how space is allocated across the new data files, to avoid SQL Server always writing data to only a few files.
Therefore, for a busier database, the recommended setting is to turn the automatic growth option on, to prevent the application from failing when the database runs out of space, but to strictly avoid automatic growth from actually occurring (by pre-sizing the files). Also, try not to use the automatic shrink feature.
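The recommendations above can be sketched as follows, assuming placeholder names MyDB and MyDB_Data and illustrative sizes:

```sql
-- Pre-size the file, use a fixed growth increment (not a percentage),
-- and cap the maximum size so the disk can never be exhausted.
ALTER DATABASE MyDB
MODIFY FILE
(
    NAME = MyDB_Data,
    SIZE = 500MB,          -- generous initial size so autogrowth rarely fires
    FILEGROWTH = 100MB,    -- fixed increment, per recommendation 1
    MAXSIZE = 10GB         -- upper bound, per recommendation 3
);
```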
1.3 Store data files and log files on separate disks
Operations on data files and log files both produce a large amount of I/O. Where possible, the log file should be stored on a different hard disk from the data file that holds the data and indexes. This distributes the I/O and also facilitates disaster recovery of the database.
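A sketch of creating a database with the data file and log file on different physical drives; the database name and the D: and E: paths are placeholders:

```sql
-- Data file on drive D:, log file on drive E:, to separate their I/O.
CREATE DATABASE MyDB
ON PRIMARY
(
    NAME = MyDB_Data,
    FILENAME = 'D:\Data\MyDB.mdf',
    SIZE = 500MB
)
LOG ON
(
    NAME = MyDB_Log,
    FILENAME = 'E:\Log\MyDB_log.ldf',
    SIZE = 100MB
);
```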
Optimization ②: Table partitioning and index partitioning (optimization ① is coarse data partitioning; optimization ② partitions the data precisely)
Why do we need table partitioning?
When the amount of data in a table is too large, what do we most want to do? Split the table into several parts. But the table is still one table; only its contents are separated, so reads become almost n times faster.
Principle: table data cannot be placed directly into files. Instead, a filegroup can be mapped to files, and a table can be placed in a filegroup, which indirectly places the table's data in different files. Tables, indexes, and large-object data can all be partitioned.
SQL Server 2005 introduced the concept of table partitioning. When the amount of data in a table keeps growing, querying becomes slower and application performance drops; at that point you should consider partitioning the table. When there is a lot of data, splitting it into multiple parts means less data has to be scanned, so queries run faster, which greatly improves performance. After a table is partitioned, the logical table is still one complete table, but its data is physically stored across multiple tablespaces (physical files), so queries no longer have to scan the whole table every time.
2.1 When to use a partitioned table:
1. The table is larger than 2 GB.
2. The table contains historical data, and new data is added to a new partition.
2.2 Advantages and disadvantages of table partitioning
Table partitioning has the following advantages:
1. Improved query performance: a query on a partitioned object can search only the partitions it cares about, improving retrieval speed.
2. Enhanced availability: if one partition of a table fails, the data in the other partitions is still available.
3. Convenient maintenance: if one partition of a table fails and its data needs repair, only that partition has to be repaired.
4. Balanced I/O: different partitions can be mapped to different disks to balance I/O and improve overall system performance.
Disadvantages:
Converting to a partitioned table: there is no way to directly convert an existing ordinary table into a partitioned table. (Oracle, by contrast, provides an online table-redefinition capability.)
2.3 Table partitioning in three steps
2.3.1 Create the partition function
CREATE PARTITION FUNCTION xx1 (int)
AS RANGE LEFT FOR VALUES (10000, 20000);
Note: this creates a partition function named xx1 with an int partitioning column, dividing values into three ranges: values up to 10000 go to range A, 10001 to 20000 go to range B, and values above 20000 go to range C.
2.3.2 Create the partition scheme
CREATE PARTITION SCHEME myRangePS2
AS PARTITION xx1
TO (A, B, C);
Note: this creates a partition scheme named myRangePS2 on the partition function xx1, mapping its three ranges to the filegroups A, B, and C respectively.
A, B, and C are the names of three filegroups; each must have an NDF file belonging to it, and once the scheme is created the filegroup mapping cannot be modified.
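Before the scheme can map to filegroups A, B, and C, those filegroups and their NDF files must exist. A sketch, with placeholder database name and file paths:

```sql
-- Create the three filegroups named in the partition scheme.
ALTER DATABASE MyDB ADD FILEGROUP A;
ALTER DATABASE MyDB ADD FILEGROUP B;
ALTER DATABASE MyDB ADD FILEGROUP C;

-- Give each filegroup one NDF file (ideally on different disks).
ALTER DATABASE MyDB ADD FILE
    (NAME = FG_A_1, FILENAME = 'D:\Data\FG_A_1.ndf', SIZE = 100MB)
TO FILEGROUP A;
ALTER DATABASE MyDB ADD FILE
    (NAME = FG_B_1, FILENAME = 'D:\Data\FG_B_1.ndf', SIZE = 100MB)
TO FILEGROUP B;
ALTER DATABASE MyDB ADD FILE
    (NAME = FG_C_1, FILENAME = 'D:\Data\FG_C_1.ndf', SIZE = 100MB)
TO FILEGROUP C;
```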
2.3.3 Partition the table
In the table designer, under the regular data space specification, change the data space type to "Partition scheme", then select the partition scheme name and the partition column list.
You can also generate the table with a SQL statement:
CREATE TABLE [dbo].[AvCache] (
    [AvNote] [varchar](300) NULL,
    [BB] [int] IDENTITY(1,1)
) ON [myRangePS2] (BB); -- note: the table is created on the [myRangePS2] scheme, partitioned by the BB column
2.3.4 Query the table's partitions
SELECT *, $PARTITION.xx1(BB) FROM dbo.AvCache;
This lets you see clearly which partition each row of the table is in.
2.3.5 Create a partitioned index
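The source gives no example for this step; a minimal sketch of creating a partitioned index on the AvCache table from 2.3.3, using the same partition scheme and column (the index name is a placeholder):

```sql
-- The index is partitioned on the same scheme and column as the table,
-- so each index partition is aligned with its table partition.
CREATE NONCLUSTERED INDEX IX_AvCache_BB
ON dbo.AvCache (BB)
ON myRangePS2 (BB);
```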
Optimization ③: Distributed database design
Distributed database systems developed from centralized database systems. Simply put, the whole database is separated and distributed to various places. Essentially, distributed database systems come in two kinds: 1. logically unified but physically dispersed: the distributed database is logically one unified whole but is physically stored on different physical nodes; this is what we usually mean by a distributed database. 2. logically distributed and physically distributed: a federated distributed database. Because each sub-database system in the federation is relatively "autonomous", such a system can accommodate a wide diversity of databases with different uses, and is better suited to database integration over a large scope.
Distributed databases are complex, and their detailed use is not described here; just an example. Distributed databases are currently used mostly in systems partitioned by user. For a national chain of stores, the typical design is that each branch has its own sales and inventory information, while headquarters needs databases for employees, suppliers, store information, and so on. The databases of these branches can be fully consistent, though many systems also allow inconsistency. Each chain store's data is stored locally, which improves response speed and reduces communication costs. The data is distributed across different sites with multiple replicas, so even if an individual site fails, the whole system is not paralyzed. But this also brings many problems, such as data consistency, the implementation of remote data transmission, and the reduction of communication costs, all of which make developing a distributed database system more complex. This is just to explain the principle; the specific way of using it is not introduced in detail.
Optimization ④: Cleaning up database fragmentation
If your table already has an index but performance is still poor, it is likely that index fragmentation has occurred and you need to defragment the index.
What is an index fragment?
Excessive inserts, updates, and deletes on a table fragment the index pages. If an index is fragmented, scanning the index takes longer, and the index may even become unusable, so data-retrieval operations slow down.
How do I know if an index fragment has occurred?
In SQL Server, check index fragmentation with DBCC SHOWCONTIG or DBCC SHOWCONTIG('table name'), which guides us in scheduling rebuilds.
Analyze the results for scan density (too low) and scan fragmentation (too high) to decide whether an index rebuild is needed; mainly look at the following two values:
Scan Density [Best Count: Actual Count] - scan density [best value: actual value]: the most useful percentage that DBCC SHOWCONTIG returns. It is the ratio of the best value to the actual value for extents. The percentage should be as close to 100% as possible; a low value indicates external fragmentation.
Logical Scan Fragmentation - logical scan fragmentation: the percentage of out-of-order pages. It should be between 0% and 10%; a high value indicates external fragmentation.
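A sketch of checking fragmentation on the AvCache table from section 2.3. DBCC SHOWCONTIG works as described above; on SQL Server 2005 and later, the DMV sys.dm_db_index_physical_stats reports similar information:

```sql
-- Legacy check described in the text:
DBCC SHOWCONTIG ('dbo.AvCache');

-- SQL Server 2005+ equivalent via a dynamic management function:
SELECT OBJECT_NAME(object_id) AS table_name,
       index_id,
       avg_fragmentation_in_percent   -- roughly corresponds to logical scan fragmentation
FROM sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID('dbo.AvCache'),
                                    NULL, NULL, 'LIMITED');
```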
Solutions:
One is to use DBCC INDEXDEFRAG to defragment the index.
The other is to use DBCC DBREINDEX to rebuild the index.
The difference between the two, as described by Microsoft, is as follows:
The DBCC INDEXDEFRAG command is an online operation, so the index remains available while the command runs, and the operation can be interrupted without losing completed work. The disadvantage of this approach is that it does not reorganize the data as effectively as a drop/re-create of the clustered index.
Re-creating a clustered index reorganizes the data so that the data pages are filled up; the level of fullness can be configured with the FILLFACTOR option. The disadvantage of this approach is that the index is offline during the drop/re-create cycle, and the operation is atomic: if the index creation is interrupted, the index is not re-created. In other words, to achieve good results, rebuilding the index is still the way to go, so decide on a rebuild accordingly.
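A sketch of both options discussed above, against the AvCache table (the index name IX_AvCache_BB is a placeholder):

```sql
-- Option 1: online defragmentation (0 = current database).
DBCC INDEXDEFRAG (0, 'dbo.AvCache', 'IX_AvCache_BB');

-- Option 2: offline rebuild with a fill factor of 90.
DBCC DBREINDEX ('dbo.AvCache', 'IX_AvCache_BB', 90);
```

On SQL Server 2005 and later, ALTER INDEX ... REORGANIZE and ALTER INDEX ... REBUILD are the modern replacements for these two DBCC commands.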