SQL Server Database Optimization

Designing an application does not seem difficult, but achieving optimal system performance is not easy. There are many choices to make in development tools, database design, application structure, query design, interface selection, and so on, depending on the specific application requirements and the skills of the development team. This article takes SQL Server as an example, discusses application performance optimization techniques from the perspective of the back-end database, and gives some useful suggestions.


1 Database Design


To achieve optimal performance in a SQL Server application, the key is a good database design scheme. In practice, many SQL Server applications perform poorly precisely because they were designed badly. Therefore, a good database design must take the following issues into account.


1.1 Logical Database Normalization


In general, a logical database design should satisfy the first three normal forms:


1. First normal form: columns contain no repeating groups or multiple values.


2. Second normal form: every non-key field must depend on the whole primary key and must not depend on only part of a composite primary key.


3. Third normal form: a non-key field must not depend on another non-key field.


A design that follows these rules produces fewer columns and more tables, thereby reducing data redundancy and the number of pages needed to store the data. However, table relationships may then have to be handled through complex joins, which can degrade system performance. To some extent, denormalization can improve system performance. Denormalization can be carried out in a number of different ways depending on performance considerations, but the following methods have often been validated in practice to improve performance.


1. If the normalized design produces many four-way or larger join relationships, consider adding redundant attributes (columns) to the database entity (table).


2. Commonly used computed fields, such as totals and maximum values, can be stored in the database entity.


For example, a project management system has a plan table with the fields: project number, initial annual plan, secondary plan, adjustment plan, supplementary plan, and plan total (initial plan + secondary plan + adjustment plan + supplementary plan). Because the plan total is frequently used in queries and reports, it makes sense, when the table holds a large number of records, to add the plan total to the table as a separate field. A trigger can then be used on the server to keep this data consistent.
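As an illustration, here is a minimal sketch of such a trigger; the table and column names are hypothetical stand-ins for the plan table described above:

    -- Plan table with a redundant, trigger-maintained total column
    CREATE TABLE plan_summary (
        project_no      int   NOT NULL PRIMARY KEY,
        initial_plan    money NOT NULL DEFAULT 0,
        secondary_plan  money NOT NULL DEFAULT 0,
        adjust_plan     money NOT NULL DEFAULT 0,
        supplement_plan money NOT NULL DEFAULT 0,
        plan_total      money NOT NULL DEFAULT 0
    )
    GO

    CREATE TRIGGER trg_plan_total ON plan_summary
    FOR INSERT, UPDATE
    AS
        -- Recompute the stored total for every row touched by the
        -- statement (direct trigger recursion is off by default, so
        -- this UPDATE does not re-fire the trigger)
        UPDATE p
        SET plan_total = p.initial_plan + p.secondary_plan
                       + p.adjust_plan + p.supplement_plan
        FROM plan_summary p
        JOIN inserted i ON p.project_no = i.project_no
    GO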


3. Redefine entities to reduce the overhead of peripheral attribute data or of row data. The corresponding denormalization approaches, with a sketch after this list, are:


(1) Split one entity (table) into two tables by dividing all the attributes into two groups. This separates frequently accessed data from less frequently accessed data. The method requires the primary key to be duplicated in each table. The resulting design facilitates parallel processing and produces tables with fewer columns.


(2) Split one entity (table) into two tables by dividing all the rows into two groups. This approach applies to entities (tables) that will contain large amounts of data. Applications often keep history, but the history is rarely used, so frequently accessed current data can be separated from rarely accessed historical data. This approach is also beneficial when rows are accessed as subsets by some logical workgroup (department, sales territory, geographic area, and so on).
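A minimal sketch of both kinds of split; all table and column names here are hypothetical:

    -- (1) Vertical split: frequently accessed columns in one table,
    --     rarely accessed wide columns in another; the primary key
    --     emp_id is duplicated in both.
    CREATE TABLE emp_hot (
        emp_id   int NOT NULL PRIMARY KEY,
        emp_name varchar(40),
        dept_no  smallint
    )
    CREATE TABLE emp_cold (
        emp_id int NOT NULL PRIMARY KEY,
        photo  image,
        resume text
    )

    -- (2) Horizontal split: current rows separated from rarely used
    --     history rows, with identical structure.
    CREATE TABLE orders_current (
        order_id   int NOT NULL PRIMARY KEY,
        order_date datetime,
        amount     money
    )
    CREATE TABLE orders_history (
        order_id   int NOT NULL PRIMARY KEY,
        order_date datetime,
        amount     money
    )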


1.2 Generating the Physical Database


To choose the basic physical implementation strategy correctly, you must understand the database access patterns and the operational characteristics of the hardware resources, mainly memory and disk subsystem I/O. This is a broad topic, but the following guidelines may help.


1. The data type of each table column should reflect the minimum storage space required for the data, especially for indexed columns. For example, using smallint instead of int where the value range allows lets index fields be read faster and lets more rows fit on a data page, thus reducing I/O operations.
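For instance, a sketch of this rule with hypothetical names:

    -- dept_no never exceeds a few thousand, so smallint (2 bytes)
    -- is enough; int (4 bytes) would widen both the row and the
    -- index key for no benefit.
    CREATE TABLE employee (
        emp_id  int      NOT NULL PRIMARY KEY,
        dept_no smallint NOT NULL
    )
    CREATE INDEX idx_emp_dept ON employee (dept_no)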


2. Placing a table on one physical device and then, with SQL Server segments, placing its nonclustered indexes on a different physical device can improve performance, especially when the system uses several intelligent disk controllers and data-striping technology.


3. Splitting a frequently used large table with SQL Server segments and placing it on database devices served by two separate intelligent disk controllers can also improve performance. Because multiple disk heads seek in parallel, data striping likewise improves performance.


4. Using SQL Server segments to store the data of text or image columns on a separate physical device can improve performance. A dedicated intelligent controller can improve it further.
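Segments are the placement mechanism of the SQL Server versions this article describes; later releases express the same idea with filegroups. A sketch under that assumption, with hypothetical database, path, and object names:

    -- Create a filegroup on a second physical device and place a
    -- nonclustered index there, away from the table's data.
    ALTER DATABASE sales ADD FILEGROUP fg_index
    ALTER DATABASE sales ADD FILE
        (NAME = sales_ix1, FILENAME = 'E:\mssql\data\sales_ix1.ndf')
        TO FILEGROUP fg_index

    CREATE NONCLUSTERED INDEX idx_orders_cust
        ON orders (customer_id) ON fg_index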


2 Hardware Systems Related to SQL Server


SQL Server-related hardware design includes the system processor, memory, disk subsystem, and network, which together form the hardware platform on which Windows NT and SQL Server run.


2.1 System Processor (CPU)


Determining the CPU configuration according to your specific needs requires estimating the CPU workload on the hardware platform. Past experience suggests at least one 80586/100 processor. That is sufficient if you have only 2 or 3 users, but if you intend to support more users and critical applications, Pentium Pro or PII-class CPUs are recommended.


2.2 Memory (RAM)


Determining the appropriate memory settings for a SQL Server installation is critical to good performance. SQL Server uses memory for the procedure cache, data and index page caching, static server overhead, and configurable overhead. SQL Server can use at most 2GB of virtual memory, which is also the maximum configurable value. Keep in mind as well that Windows NT and all of its associated services occupy memory too.


Windows NT gives each Win32 application a 4GB virtual address space, which the Windows NT Virtual Memory Manager (VMM) maps onto physical memory; physical memory can reach 4GB on some hardware platforms. SQL Server applications see only virtual addresses and cannot access physical memory directly; that mapping is controlled by the VMM. Windows NT allows the virtual address space to exceed the available physical memory, so performance suffers when more virtual memory is allocated to SQL Server than there is physical memory to back it.


This address space is set aside for the SQL Server system, so if other software runs on the same hardware platform (file and print sharing, application services, and so on), remember that it also takes part of the memory. In general, the hardware platform should have at least 32MB of memory, of which Windows NT takes at least 16MB. A simple rule is to add 100KB of memory for each concurrent user. For example, 100 concurrent users require at least 32MB + 100 users * 100KB = 42MB; the actual amount should be adjusted according to observed behavior. It is fair to say that increasing memory is the most economical way to improve system performance.
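On the SQL Server versions contemporary with this article, memory was assigned with sp_configure 'memory' (counted in 2KB pages); later releases use 'max server memory (MB)'. A sketch of the latter, applying the arithmetic above:

    -- 32MB base + 100 users * 100KB, roughly 42MB for SQL Server itself
    EXEC sp_configure 'show advanced options', 1
    RECONFIGURE
    EXEC sp_configure 'max server memory (MB)', 42
    RECONFIGURE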


2.3 Disk Subsystem


Designing a good disk I/O system is an important aspect of a well-performing SQL Server installation. The disk subsystem discussed here has at least one disk controller and one or more hard disk units, and also involves disk configuration and file system considerations. An intelligent SCSI-2 disk controller or disk array controller is a good choice. Its desirable characteristics are as follows:


(1) Controller cache.


(2) A processor on the bus board, which reduces interrupts to the system CPU.


(3) Asynchronous read and write support.


(4) 32-bit RAID support.


(5) Fast SCSI-2 drives.


(6) Read-ahead caching (at least one track).


3 Search Strategy


Having carefully selected the hardware platform and arrived at a good database design, and armed with knowledge of the users' needs and the application, you can now design your queries and indexes. Two things determine good query and index performance on SQL Server: the first is generating queries and indexes based on knowledge of how the SQL Server optimizer works; the second is leveraging SQL Server's performance features to enhance data access operations.


3.1 SQL Server Optimizer


The Microsoft SQL Server database kernel automatically optimizes data manipulation queries with a cost-based query optimizer. Data manipulation queries are queries that support the SQL keywords WHERE or HAVING, such as SELECT, DELETE, and UPDATE. The cost-based query optimizer estimates the cost of each clause from statistical information.

An easy way to understand how the optimizer processes a query is to examine the output of the SHOWPLAN command. If you use a character-based tool (such as isql), you can get SHOWPLAN output by typing SET SHOWPLAN ON. If you use a graphical query tool, such as the query window in SQL Enterprise Manager, you can set a configuration option to provide this information.
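For example, using the pubs sample database, the plan can be requested like this (SET SHOWPLAN_TEXT ON is the form used by later releases):

    -- Report the optimizer's plan for the query instead of executing it
    SET SHOWPLAN_TEXT ON
    GO
    SELECT au_lname FROM authors WHERE au_lname LIKE 'Ri%'
    GO
    SET SHOWPLAN_TEXT OFF
    GO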


SQL Server optimization is done in three phases: query analysis, index selection, and join selection.





1. Query Analysis


In the query analysis phase, the SQL Server optimizer looks at each clause of the regularized query tree and determines whether it can be optimized. SQL Server generally tries to optimize clauses that restrict a scan, such as search or join clauses. But not all legitimate SQL syntax can be turned into optimizable clauses; an example is a clause containing the SQL inequality operator "<>". Because "<>" is an exclusive rather than an inclusive operator, the selectivity of the clause cannot be determined before the entire table has been scanned. When a relational query contains non-optimizable clauses, the execution plan accesses that part of the query with a table scan; for the optimizable clauses in the query tree, the optimizer performs index selection.
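For example, with a hypothetical orders table indexed on status:

    -- Optimizable: "=" is inclusive, so the index can limit the scan.
    SELECT order_id FROM orders WHERE status = 'OPEN'

    -- Not optimizable: "<>" is exclusive, so every row must be examined
    -- before the optimizer can know which ones qualify.
    SELECT order_id FROM orders WHERE status <> 'OPEN'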


2. Index Selection


For each optimizable clause, the optimizer checks the database system tables to determine whether there is a related index that can be used to access the data. An index is considered useful only if a prefix of its columns exactly matches the columns in the query clause. Because an index is built according to its column order, the match must be an exact prefix match. For a clustered index, the underlying data is also sorted in indexed-column order. Trying to access data through a secondary column of an index is like searching a phone book for everyone with a given first name: the sorting is basically useless, because you still have to look at every row to see whether it meets the criteria. If a clause has a usable index, the optimizer determines its selectivity.
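The phone book analogy in code, with hypothetical names:

    -- Composite index: leading column last_name, second column first_name
    CREATE INDEX idx_name ON phone_book (last_name, first_name)

    -- Can use the index: the clause matches the index's leading prefix
    SELECT * FROM phone_book WHERE last_name = 'Smith'

    -- Cannot use the index effectively: first_name is not a leading
    -- prefix, so every index entry would still have to be examined
    SELECT * FROM phone_book WHERE first_name = 'John'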


Therefore, during the design process, carefully check all queries against the query design criteria, and base the index design on the queries' optimizable characteristics. A sketch follows the list below.


(1) A narrower index is more efficient. With a narrow index, more index rows fit on each page and there are fewer index levels, so more index pages fit in the cache, which also reduces I/O operations.


(2) The SQL Server optimizer can analyze a large number of index and join possibilities, so many narrow indexes give it more choices than a few wide ones. But do not keep unnecessary indexes, because they increase storage and maintenance costs. For composite (multi-column) indexes, the SQL Server optimizer keeps distribution statistics only for the leading column, so the first column of a composite index should be highly selective.


(3) Too many indexes on a table hurt the performance of UPDATE, INSERT, and DELETE, because every index must be adjusted accordingly. In addition, all page-split operations are recorded in the log, which also increases I/O operations.


(4) Indexing a frequently updated column can severely hurt performance.


(5) Because of storage overhead and I/O operations, a smaller composite index performs better than a large one. Its disadvantage is that the columns making up the composite must be maintained.


(6) Try to analyze how often each important query runs, so that the most heavily used indexes can be identified and optimized first.

(7) Any column that appears in a query's WHERE clause is a candidate for an index, because the optimizer concentrates on that clause.


(8) Indexing a small table that occupies less than one extent is not cost-effective, because for small tables a table scan tends to be faster and cheaper.


(9) Columns used with ORDER BY or GROUP BY are generally good candidates for a clustered index. If the column in an ORDER BY clause has a clustered index on it, no worktable is generated, because the rows are already sorted. A GROUP BY, however, must always produce a worktable.


(10) A clustered index should not be built on a frequently changing column, because that makes entire rows move. Watch for this especially when implementing large transaction processing systems, whose data tends to change frequently.
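A short sketch tying several of these criteria together (table and column names are hypothetical):

    -- Criterion (9): rows are usually listed by date, so cluster on it
    CREATE CLUSTERED INDEX idx_orders_date ON orders (order_date)

    -- Criteria (1) and (2): a narrow, highly selective nonclustered
    -- index for point lookups
    CREATE NONCLUSTERED INDEX idx_orders_cust ON orders (customer_id)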


3. Join Selection


When index selection ends and every clause has a processing cost based on its access plan, the optimizer begins join selection. Join selection finds an efficient order in which to combine the clause access plans. To do this, the optimizer compares different orderings of the clauses and then selects the join plan with the lowest processing cost in terms of physical disk I/O. Because the number of clause combinations grows exponentially with query complexity, the SQL Server query optimizer uses tree-pruning techniques to keep the cost of these comparisons down. When this phase ends, the optimizer has produced a cost-based query execution plan that takes advantage of the available indexes and accesses the underlying data with minimal system overhead and good execution performance.
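One way to observe the optimizer's join-order choice is to compare its normal plan with the textual order of the FROM clause; SET FORCEPLAN is a long-standing T-SQL option that suppresses join reordering (the tables are from the pubs sample database):

    -- Default: the optimizer picks the join order itself
    SELECT t.title, a.au_lname
    FROM titles t
    JOIN titleauthor ta ON t.title_id = ta.title_id
    JOIN authors a ON ta.au_id = a.au_id

    -- Force the joins to run in the order written; useful for
    -- comparing plans, rarely appropriate in production
    SET FORCEPLAN ON
    SELECT t.title, a.au_lname
    FROM titles t
    JOIN titleauthor ta ON t.title_id = ta.title_id
    JOIN authors a ON ta.au_id = a.au_id
    SET FORCEPLAN OFF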


3.2 Efficient Query Design


From the three phases of query optimization above, it is easy to see that minimizing physical and logical I/O, and balancing processor time against I/O time, are the main goals of efficient query design. In other words, the ideal query takes advantage of indexes, performs the fewest disk reads and writes, and makes the most efficient use of memory and CPU resources.


The following recommendations are distilled from the SQL Server optimizer's optimization strategy and are helpful in designing efficient queries.


1. If there is a unique index, a WHERE clause with the "=" operator performs best, followed by a closed interval (range), and then an open interval.


2. From a database access standpoint, a WHERE clause containing disjunctions (OR and IN) generally performs poorly. The optimizer may therefore adopt the OR strategy: it generates a worktable containing a row identifier for every row that might match, and treats these row markers (page number and row number) as a "dynamic index" pointing to the matching rows in the table. The optimizer then simply scans the worktable, takes out each row marker, and fetches the corresponding row from the data table, so the cost of the OR strategy is generating the worktable.


3. A WHERE clause containing NOT, "<>", or "!=" is of no use to the optimizer's index selection. Because such clauses are exclusive rather than inclusive, their selectivity cannot be determined until the entire table has been scanned.


4. Limit data conversions and string operations; the optimizer generally does not generate an index selection from expressions or data conversions in the WHERE clause. For example:


paycheck * 12 > 36000 OR substring(lastname, 1, 1) = 'L'


Even if the table has indexes on paycheck and lastname, this expression cannot use them. The condition can be rewritten as follows so that it can:


paycheck > 36000/12 OR lastname LIKE 'L%'

5. Local variables in a WHERE clause are treated as unknown by the optimizer and are not considered in index selection; the exception is variables defined as input parameters of a stored procedure (see the sketch after this list).


6. If no index supports a join clause, the optimizer builds a worktable holding the rows of the smaller table in the join and then builds a clustered index on that worktable to complete an efficient join. The cost of this approach is the worktable and the clustered index built on it afterwards; the process is called reformatting. Pay attention, therefore, to the size of tempdb, whether in RAM or on disk, which holds these worktables (those produced by SELECT INTO statements excepted). Also, if these kinds of operations are common, putting tempdb in RAM is good for performance.
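A sketch of recommendation 5 with a hypothetical employee table: the optimizer can use distribution statistics for a stored procedure parameter when it builds the plan, but a plain local variable is unknown at that point.

    -- The parameter's value is available when the plan is compiled,
    -- so an index on dept_no can be chosen using real selectivity
    CREATE PROCEDURE get_emps @dept smallint
    AS
        SELECT emp_name FROM employee WHERE dept_no = @dept
    GO

    -- A local variable is unknown at optimization time; the optimizer
    -- must guess at selectivity and may ignore the index
    DECLARE @dept smallint
    SELECT @dept = 10
    SELECT emp_name FROM employee WHERE dept_no = @dept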


4 Other Considerations for Performance Tuning


In fact, there are far more factors that affect SQL Server performance than those discussed above. The operating system has a heavy impact as well: under Windows NT, choices such as the file system, the network protocols, the services enabled, and the SQL Server priority all affect SQL Server performance to varying degrees.


The factors that affect performance are numerous, and applications differ, so it is unrealistic to look for a single general optimization scheme; tuning must continue throughout system development and maintenance. Indeed, most tuning and adjustment work can be done on the server, independently of the client, which is what makes it practical.