There are many choices in development tools, database design, application structure, query design, interface selection, and so on, depending on the specific application requirements and the skills of the development team. This article takes SQL Server as an example, discusses application performance optimization techniques from the perspective of the back-end database, and offers some useful suggestions.
1 Database Design
To get good performance from SQL Server, a sound database design is critical. In practice, many SQL Server databases perform poorly simply because they were designed poorly. A good database design must therefore consider the following issues.
1.1 Normalization of the logical database design
In general, the logical database design should satisfy the first three normal forms:
1. First normal form: columns contain no repeating groups or multiple values.
2. Second normal form: every non-key column must depend on the whole primary key, not on part of a composite primary key.
3. Third normal form: a non-key column must not depend on another non-key column.
A design that follows these rules produces more tables with fewer columns, which reduces data redundancy and the number of pages needed to store the data. However, table relationships may then have to be handled through complex joins, which can degrade system performance. Denormalizing to some degree can therefore improve performance, and denormalization can be done in several different ways depending on performance considerations; the following methods have been validated in practice as performance improvements.
1. If a normalized design produces many joins of four or more tables, consider adding duplicate attributes (columns) to the entity (table).
2. Frequently used computed fields, such as totals and maximum values, may be worth storing in the entity (table).
For example, a project plan management system has a plan table whose fields include project number, initial annual plan, second plan, adjustment plan, and supplementary plan. The plan total (initial plan + second plan + adjustment plan + supplementary plan) is frequently used in queries and reports, so when the table holds a large number of records it is worth adding the plan total to the table as a separate field; a trigger can then be used to keep the data consistent (a sketch appears after this list).
3. Redefine entities to reduce the overhead of excess attribute data or row data. The corresponding denormalization approaches are:
(1) Splitting one entity (table) into two tables by dividing its attributes (columns) into two groups. This separates frequently accessed data from rarely accessed data. The method requires the primary key to be duplicated in each table. The resulting design facilitates parallel processing and produces tables with fewer columns.
(2) Splitting one entity (table) into two tables by dividing its rows into two groups. This approach suits entities (tables) that will contain large amounts of data. Applications often keep history, but history is rarely used, so frequently accessed current data can be separated from rarely accessed historical data. This approach is also beneficial when rows are accessed as subsets by some logical workgroup (department, sales region, geographic area, and so on).
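To make the trigger idea above concrete, here is a minimal sketch. The table project_plan and its columns (proj_no, init_plan, second_plan, adjust_plan, suppl_plan, plan_total) are hypothetical names invented for the example:

CREATE TABLE project_plan (
    proj_no     int   NOT NULL PRIMARY KEY,
    init_plan   money NOT NULL DEFAULT 0,   -- plan at the start of the year
    second_plan money NOT NULL DEFAULT 0,
    adjust_plan money NOT NULL DEFAULT 0,
    suppl_plan  money NOT NULL DEFAULT 0,
    plan_total  money NOT NULL DEFAULT 0    -- derived column kept current by the trigger
)
GO
CREATE TRIGGER trg_plan_total ON project_plan
FOR INSERT, UPDATE
AS
    -- recompute the stored total for the rows just inserted or changed
    UPDATE project_plan
    SET plan_total = project_plan.init_plan + project_plan.second_plan
                   + project_plan.adjust_plan + project_plan.suppl_plan
    FROM project_plan, inserted
    WHERE project_plan.proj_no = inserted.proj_no
GO

With this in place, any INSERT or UPDATE of the component columns refreshes plan_total automatically, so queries and reports can read the total without recomputing it.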
1.2 Generating the physical database
To choose the basic physical implementation strategy correctly, you must understand the operating characteristics of the database access patterns and of the hardware resources, mainly memory and disk subsystem I/O. This is a broad topic, but the following guidelines may help.
1. The data type of each table column should use the minimum storage space the data requires, especially for indexed columns. For example, use smallint instead of int where the value range permits: indexed fields can then be read faster, more rows fit on each data page, and I/O operations are reduced (see the sketch after this list).
2. Performance can be improved by using SQL Server segments to place a table on one physical device and its nonclustered indexes on a different physical device. The benefit is especially noticeable when the system uses several intelligent disk controllers and data-striping technology.
3. Splitting a frequently used large table with SQL Server segments and placing it on database devices attached to two separate intelligent disk controllers can also improve performance, because striping the data lets multiple disk heads seek in parallel.
4. Using SQL Server segments to store text or image column data on a separate physical device can improve performance; a dedicated intelligent controller can improve it further.
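As a small sketch of guideline 1, assume a hypothetical department table whose codes comfortably fit in a smallint; the names are illustrative only:

CREATE TABLE dept (
    dept_no   smallint    NOT NULL,   -- 2 bytes instead of int's 4: a narrower index key
    dept_name varchar(30) NOT NULL
)
CREATE INDEX idx_dept_no ON dept (dept_no)   -- more index rows per page, fewer I/O operations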
2 Hardware Systems Related to SQL Server
SQL Server-related hardware designs include system processors, memory, disk subsystems, and networks, which basically form the hardware platform on which Windows NT and SQL Server run.
2.1 System Processor (CPU)
Determining a CPU configuration for your specific needs amounts to estimating the CPU workload on the hardware platform. From past experience, the configuration should be at least one 80586/100 processor. That is sufficient if you have only two or three users, but if you intend to support more users and critical applications, Pentium Pro or Pentium II class CPUs are recommended.
2.2 Memory (RAM)
Determining appropriate memory settings for SQL Server is critical to good performance. SQL Server uses memory for the procedure cache, data and index page caching, static server overhead, and configuration overhead. SQL Server can use up to 2 GB of virtual memory, which is also the maximum configurable value. Bear in mind that Windows NT and all of its related services also occupy memory.
Windows NT provides a 4 GB virtual address space for each Win32 application. This virtual address space is mapped to physical memory by the Windows NT Virtual Memory Manager (VMM), and physical memory can reach 4 GB on some hardware platforms. SQL Server sees only virtual addresses and has no direct access to physical memory; that access is controlled by the VMM. Windows NT allows the virtual address space to exceed the available physical memory, so SQL Server performance suffers when more virtual memory is allocated to SQL Server than physical memory is available.
This address space is set for the SQL Server system as a whole, so if other software runs on the same hardware platform (such as file and print sharing or application services), remember that it also takes part of the memory. In general, the hardware platform should be configured with at least 32 MB of memory, of which Windows NT occupies at least 16 MB. A simple rule is to add 100 KB of RAM for each concurrent user. For example, with 100 concurrent users you need at least 32 MB + 100 users * 100 KB = 42 MB of memory, and the actual amount should be adjusted according to observed operation. It is fair to say that adding memory is the most economical way to improve system performance.
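A hedged sketch of applying this rule on SQL Server 6.x follows; it assumes the memory option of sp_configure is expressed in 2 KB units in that release, and the figure is illustrative rather than prescriptive:

-- 32 MB base + 100 users * 100 KB = 42 MB, i.e. roughly 21504 two-kilobyte units
EXEC sp_configure 'memory', 21504
RECONFIGURE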
2.3 Disk Subsystem
Designing a good disk I/O system is an important part of implementing a good SQL Server solution. The disk subsystem discussed here has at least one disk controller and one or more hard disk units, plus disk configuration and file system considerations. An intelligent SCSI-2 disk controller or disk array controller is a good choice, with the following features:
(1) Controller cache.
(2) An on-board bus processor, which reduces interrupts to the system CPU.
(3) Asynchronous read and write support.
(4) 32-bit RAID support.
(5) Fast SCSI-2 drive.
(6) Read-ahead caching (at least one track).
3 Search Strategy
With the hardware platform carefully selected, a good database design in place, and knowledge of the user needs and the application, it is time to design the queries and indexes. Two things are important for good query and index performance on SQL Server: the first is to build queries and indexes based on knowledge of how the SQL Server optimizer works, and the second is to exploit SQL Server's performance features to enhance data access operations.
3.1 SQL Server Optimizer
The Microsoft SQL Server database kernel automatically optimizes data manipulation queries with a cost-based query optimizer. Data manipulation queries are queries that support the SQL keywords WHERE or HAVING, such as SELECT, DELETE, and UPDATE. The cost-based query optimizer estimates the cost of each clause from statistical information.
An easy way to understand how the optimizer processes data is to examine the SHOWPLAN output. If you use a character-based tool (such as isql), you can obtain SHOWPLAN output by typing SET SHOWPLAN ON. If you use a graphical tool, such as the query tool in SQL Enterprise Manager or isql/w, you can set a configuration option to provide this information.
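A minimal sketch of doing this from isql, assuming SQL Server 6.x syntax (later releases use SET SHOWPLAN_TEXT ON instead) and the hypothetical project_plan table from section 1.1:

SET SHOWPLAN ON
GO
-- the plan chosen for this query is displayed along with its results
SELECT proj_no, plan_total
FROM project_plan
WHERE plan_total > 10000
GO
SET SHOWPLAN OFF
GO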
SQL Server optimization is done in three phases: query analysis, index selection, and join selection.
1. Query analysis
During the query analysis phase, the SQL Server optimizer looks at each clause represented in the canonical query tree and determines whether it can be optimized. SQL Server generally tries to optimize clauses that restrict a scan, such as search and join clauses. Not all legal SQL syntax can be turned into an optimizable clause, however; an example is a clause containing the SQL inequality operator "<>". Because "<>" is an exclusive rather than an inclusive operator, the selectivity of such a clause cannot be determined without scanning the whole table. When a relational query contains non-optimizable clauses, the execution plan accesses that part of the query with a table scan; for the optimizable clauses in the query tree, the optimizer proceeds to index selection.
2. Index selection
For each optimizable clause, the optimizer looks at the database system tables to determine whether a related index can be used to access the data. An index is considered useful only if a prefix of the columns in the index exactly matches the columns in the query clause. Because an index is built in column order, an exact prefix match is required. For a clustered index, the underlying data is also sorted in indexed column order. Trying to access data through a secondary column of the index is like trying to find every entry with a given first name in a phone book sorted by last name: the sort order is essentially useless, because you still have to look at every row to see whether it meets the criteria. If a clause has a usable index, the optimizer determines its selectivity.
Therefore, during design, all queries should be checked carefully against the query design guidelines, and the optimizer's characteristics should be used as the basis for index design; a brief sketch follows.
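As a rough illustration of the prefix-matching rule, assume a hypothetical employee table; all names are invented for the sketch:

CREATE TABLE employee (
    emp_id    int         NOT NULL PRIMARY KEY,
    lastname  varchar(30) NOT NULL,
    firstname varchar(30) NOT NULL
)
CREATE INDEX idx_emp_name ON employee (lastname, firstname)

-- can use the index: lastname is the leading column of the index
SELECT * FROM employee WHERE lastname = 'Smith'

-- cannot use the index effectively: firstname is only a secondary column
SELECT * FROM employee WHERE firstname = 'John'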
(1) A comparatively narrow index is more efficient. With a narrower index, more index rows fit on each page and the index has fewer levels. More index pages can therefore be held in cache, which also reduces I/O operations.
(2) The SQL Server optimizer can analyze a large number of index and join possibilities, so more narrow indexes give the optimizer more choices than a few wide ones. But do not keep unnecessary indexes, because they add storage and maintenance cost. For composite (multi-column) indexes, the SQL Server optimizer keeps distribution statistics only for the leading column, so the first column of such an index should be highly selective.
(3) Too many indexes on a table hurt UPDATE, INSERT, and DELETE performance, because every index must be adjusted accordingly. In addition, all the page split operations are written to the log, which further increases I/O.
(4) An index on a frequently updated column can severely hurt performance.
(5) For storage and I/O reasons, an index over a smaller group of columns performs better; its disadvantage is that columns outside the group are not covered.
(6) Try to analyze how frequently each important query runs, so that the most heavily used indexes can be identified and optimized first.
(7) Any column that appears in a query's WHERE clause is a candidate for indexing, because the optimizer concentrates on that clause.
(8) Indexing a small table is often not cost-effective, because a table scan tends to be faster and cheaper for small tables.
(9) Columns used with ORDER BY or GROUP BY are generally good candidates for a clustered index. If the column used in an ORDER BY has a clustered index on it, no worktable has to be generated, because the rows are already sorted; a GROUP BY, however, must produce a worktable (see the sketch after this list).
(10) A clustered index should not be built on a frequently changing column, because each change can force the whole row to move. Pay special attention to this when implementing large transaction processing systems, where data tends to change frequently.
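A small sketch of guideline (9), assuming a hypothetical orders table that is routinely listed in date order:

CREATE TABLE orders (
    order_id   int      NOT NULL,
    order_date datetime NOT NULL,
    amount     money    NOT NULL
)
CREATE CLUSTERED INDEX idx_order_date ON orders (order_date)

-- rows are stored in order_date order, so this ORDER BY needs no extra worktable
SELECT order_id, order_date, amount
FROM orders
ORDER BY order_date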
3. Join selection
When index selection ends and every clause has a processing cost based on its access plan, the optimizer begins join selection. Join selection is used to find an effective order in which to combine the clause access plans. To do this, the optimizer compares different orderings of the clauses and then selects the join plan with the lowest processing cost in terms of physical disk I/O. Because the number of clause combinations grows exponentially with query complexity, the SQL Server query optimizer uses tree-pruning techniques to keep the cost of these comparisons down. When the join selection phase ends, the SQL Server query optimizer has produced a cost-based query execution plan that takes advantage of the available indexes and accesses the underlying data with minimal system overhead and good execution performance.
3.2 Efficient Query Selection
From the three optimization phases above, it is clear that minimizing physical and logical I/O and balancing processor time against I/O time are the main goals of efficient query design. In other words, the aim is to design queries that take advantage of indexes, perform the fewest disk reads and writes, and use memory and CPU resources most efficiently.
The following recommendations are summarized from the SQL Server Optimizer's optimization strategy and are useful for designing efficient queries.
1. Given a unique index, a WHERE clause with the "=" operator performs best, followed by a closed range, and then an open range (see the sketch after this list).
2. From the standpoint of database access, a WHERE clause with disjunctive connectors (OR and IN) generally does not perform well. The optimizer may therefore adopt the OR strategy, which generates a worktable containing a row identifier for every row that might match; the optimizer treats these row markers (page number and row number) as a "dynamic index" pointing to the matching rows of the table. The optimizer then simply scans the worktable, takes each row marker, and fetches the corresponding row from the data table, so the cost of the OR strategy is generating the worktable.
3. A WHERE clause containing NOT, <>, or != does not help the optimizer choose an index. Because such clauses are exclusive rather than inclusive, their selectivity cannot be determined without scanning the entire underlying table.
4. Limit data conversions and string operations; the optimizer generally does not base index selection on expressions or data conversions in the WHERE clause. For example:
paycheck * 12 > 36000 or substring(lastname, 1, 1) = "L"
Even if the table has indexes on paycheck and lastname, they cannot be used here; the conditional expressions above can be rewritten as follows:
paycheck > 36000/12 or lastname like "L%"
5. Local variables in the WHERE clause are treated by the optimizer as unknown values and are not considered, except for variables defined as input parameters to a stored procedure.
6. If no index supports a join clause, the optimizer builds a worktable holding the rows of the smaller table in the join and then builds a clustered index on that worktable to carry out an efficient join. The cost of this approach is creating the worktable and then building the index on it; the process is called reformatting. Keep an eye on the size of the tempdb database, whether in RAM or on disk (this applies to everything except the SELECT INTO statement). Also, if these kinds of operations are common, placing tempdb in RAM is good for performance.
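Finally, a minimal illustration of recommendation 1, reusing the hypothetical orders table from section 3.1 and adding a unique index purely for the sketch:

CREATE UNIQUE INDEX idx_order_id ON orders (order_id)

-- best: equality against the unique index
SELECT * FROM orders WHERE order_id = 1001
-- next best: a closed range
SELECT * FROM orders WHERE order_id BETWEEN 1000 AND 1100
-- then: an open range
SELECT * FROM orders WHERE order_id > 1000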
4 Other Considerations for Performance Optimization
The main factors that affect SQL Server performance are listed above, but there are actually many more. The operating system also has a large influence: under Windows NT, choices such as the file system, network protocols, the services that are enabled, and SQL Server's priority all affect SQL Server performance to varying degrees.
There are so many factors that affect performance, and applications differ so much, that it is unrealistic to look for a single general optimization scheme; tuning must be adjusted continually during system development and maintenance. In fact, most tuning and adjustment work is done on the server, independently of the client, and is therefore practical to carry out.