Designing an application system does not seem difficult, but optimizing its performance is not easy. There are many options in development tools, database design, application structure, query design, and interface selection, depending on the specific application requirements and the skills of the development team. This article takes SQL Server as an example, discusses application performance optimization techniques from the perspective of the backend database, and offers some useful suggestions.
1. Database Design
To achieve optimal performance from a SQL Server solution, the most important thing is a good database design. In practice, many SQL Server solutions suffer poor performance because of poor database design. Therefore, a good database design must take the following issues into account.
1.1 Logical Database Normalization
In general, a logical database design should satisfy the first three normal forms of normalization:
1. First normal form: no repeating groups or multi-valued columns.
2. Second normal form: every non-key column must depend on the entire primary key and cannot depend on only part of a composite primary key.
3. Third normal form: a non-key column cannot depend on another non-key column.
A design that complies with these rules produces more tables with fewer columns, which reduces data redundancy and the number of pages needed to store data. However, queries may then require complicated joins between tables, which can hurt performance. To some extent, denormalization can therefore improve system performance. Denormalization can be carried out in a variety of ways depending on the performance considerations, but the following methods have been verified to improve performance.
1. If the normalized design produces many four-way or larger join relationships, consider adding duplicate attributes (columns) to the relevant database entities (tables).
2. Commonly used computed values (such as totals and maximum values) can be stored in database entities.
For example, suppose a project plan management system has a schedule table whose columns include the project number, the beginning-of-year plan, the secondary plan, the adjustment plan, the supplementary plan, and so on. The plan total (beginning-of-year plan + secondary plan + adjustment plan + supplementary plan) is frequently used in queries and reports. When the table holds many records, it is worthwhile to add the plan total to the table as an independent column, with a trigger maintaining its consistency, as in the sketch below.
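A minimal sketch of such a trigger, assuming a hypothetical schedule table (the table and column names are illustrative, not taken from the original system):

    -- Hypothetical schedule table with a stored (denormalized) total.
    CREATE TABLE schedule (
        project_no  int   NOT NULL PRIMARY KEY,
        plan_begin  money NOT NULL DEFAULT 0,  -- beginning-of-year plan
        plan_second money NOT NULL DEFAULT 0,  -- secondary plan
        plan_adjust money NOT NULL DEFAULT 0,  -- adjustment plan
        plan_supp   money NOT NULL DEFAULT 0,  -- supplementary plan
        plan_total  money NOT NULL DEFAULT 0   -- stored total
    )
    GO
    -- Recompute the stored total whenever the component columns change,
    -- so queries and reports can read plan_total directly.
    CREATE TRIGGER trg_schedule_total ON schedule
    FOR INSERT, UPDATE
    AS
    UPDATE schedule
    SET plan_total = schedule.plan_begin + schedule.plan_second
                   + schedule.plan_adjust + schedule.plan_supp
    FROM schedule, inserted
    WHERE schedule.project_no = inserted.project_no
    GO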
3. Redefine entities to reduce the overhead of external attribute data or row data. The corresponding denormalization techniques are:
(1) Split one entity (table) into two tables by dividing its attributes into two groups. This separates frequently accessed data from less frequently accessed data. The primary key must be copied into each table. This design facilitates parallel processing and produces tables with fewer columns (see the sketch after this list).
(2) Split one entity (table) into two tables by dividing its rows into two groups. This method suits entities (tables) that will contain large amounts of data. Applications often retain history that is rarely used, so frequently accessed data can be separated from rarely accessed historical data. This method is also very beneficial when rows are accessed as subsets by a logical working group (department, sales territory, geographic region, and so on).
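As a minimal sketch of the vertical split in (1), assume a hypothetical customer table (all names are illustrative):

    -- Frequently accessed columns stay in the main table.
    CREATE TABLE customer_active (
        customer_id int         NOT NULL PRIMARY KEY,
        name        varchar(40) NOT NULL,
        balance     money       NOT NULL
    )
    -- Rarely accessed columns move to a companion table that
    -- duplicates the primary key, as the method requires.
    CREATE TABLE customer_detail (
        customer_id  int          NOT NULL PRIMARY KEY,
        mail_address varchar(120) NULL,
        remarks      varchar(255) NULL
    )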
1.2 Generating the Physical Database
To correctly choose the basic physical implementation strategy, you must understand the database's access patterns and the operating characteristics of the hardware resources, chiefly memory and disk subsystem I/O. This is a broad topic, but the following guidelines may help.
1. The data type of each table column should occupy the minimum storage space the data requires, especially for indexed columns. For example, if the smallint type is sufficient, do not use the integer type; index fields can then be read more quickly, and more data rows fit on each data page, which reduces I/O operations (see the sketch after this list).
2. Placing a table on one physical device and its nonclustered indexes on a different physical device through SQL Server segments improves performance. (A clustered index cannot be separated from its table, since its leaf level is the data itself.) The benefit is even more pronounced when the system uses multiple smart disk controllers and data-striping technology.
3. Splitting a frequently used large table with SQL Server segments and placing it on the database devices of two separate smart disk controllers can also improve performance, because multiple disk heads then search in parallel.
4. Using SQL Server segments to store text or image column data on a separate physical device also improves performance, and a dedicated intelligent controller can improve it further.
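As a minimal sketch of guideline 1, here is a hypothetical table definition that chooses the smallest adequate types:

    -- smallint (2 bytes) instead of int (4 bytes) where the value
    -- range allows; narrower rows fit more rows on each data page.
    CREATE TABLE order_item (
        order_no   int      NOT NULL,
        line_no    smallint NOT NULL,  -- stays well below 32767
        qty        smallint NOT NULL,
        unit_price money    NOT NULL
    )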
2. SQL Server-Related Hardware Systems
The hardware related to SQL Server includes the system processor, memory, disk subsystem, and network. These four parts essentially constitute the hardware platform on which Windows NT and SQL Server run.
2.1 System Processor (CPU)
Determining the CPU configuration for your specific needs amounts to estimating the CPU workload on the hardware platform. From past experience, the CPU configuration should be at least one 80586/100 processor. That is enough for only 2 to 3 users, but if you plan to support more users and critical applications, a Pentium Pro or Pentium II CPU is recommended.
2.2 Memory (RAM)
Choosing appropriate memory settings for a SQL Server solution is crucial to good performance. SQL Server uses memory for the procedure cache, data and index page caching, static server overhead, and configurable overhead. SQL Server can use up to 2 GB of virtual memory, which is also its maximum setting. Keep in mind that Windows NT and all of its related services occupy memory as well.
Windows NT provides each Win32 application with a 4 GB virtual address space, which the Windows NT Virtual Memory Manager (VMM) maps onto physical memory; physical memory can reach 4 GB on some hardware platforms. SQL Server applications see only virtual addresses, so they cannot access physical memory directly; that access is controlled by the VMM. Windows NT allows virtual address space to exceed the available physical memory, so when the virtual memory allocated to SQL Server exceeds the available physical memory, paging occurs and SQL Server performance suffers.
These address spaces are set aside for the SQL Server system, so if other software (such as file and print sharing or an application service) runs on the same hardware platform, remember that it occupies part of the memory too. In general, the hardware platform needs at least 32 MB of memory, of which Windows NT requires at least 16 MB. A simple rule of thumb is to add 100 KB of memory for each concurrent user: for example, 100 concurrent users require at least 32 MB + 100 users * 100 KB = 42 MB of memory. The actual amount should be adjusted according to actual running conditions. It is fair to say that increasing memory is the most economical way to improve system performance.
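As an illustration, on SQL Server 6.5-era systems the fixed memory allocation was set through sp_configure in units of 2 KB pages (a sketch under that assumption; later versions manage memory dynamically and use different options):

    -- Allocate roughly 42 MB to SQL Server.
    -- The unit is 2 KB pages: 42 MB = 21504 pages.
    EXEC sp_configure 'memory', 21504
    RECONFIGURE
    GO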
2.3 Disk Subsystem
Designing a good disk I/O system is an important aspect of implementing a good SQL Server solution. The disk subsystem discussed here consists of at least one disk controller and one or more hard disk units, plus disk configuration and file system considerations. An intelligent SCSI-2 disk controller or disk array controller is a good choice. Its features are as follows:
(1) Controller high-speed cache.
(2) An on-board bus processor, which reduces interrupts to the system CPU.
(3) Asynchronous read/write support.
(4) 32-bit RAID support.
(5) Fast SCSI-2 drives.
(6) Read-ahead caching (at least one track).
3. Retrieval Strategy
After carefully selecting the hardware platform and implementing a good database solution, and with knowledge of the user requirements and applications in hand, it is time to design the queries and indexes. Two things are important for good query and index performance on SQL Server: first, generate queries and indexes based on knowledge of the SQL Server optimizer; second, enhance data access operations by taking advantage of SQL Server's performance characteristics.
3.1 SQL Server Optimizer
The Microsoft SQL Server database kernel uses a cost-based query optimizer to automatically optimize data manipulation queries submitted to SQL Server. A data manipulation query is any query that supports the SQL keywords WHERE or HAVING, such as SELECT, DELETE, and UPDATE. The cost-based query optimizer estimates the cost of each candidate clause from statistical information.
An easy way to understand how the optimizer processes a query is to examine the output of the SHOWPLAN command. If you use a character-based tool (such as isql), you can enter SET SHOWPLAN ON to get SHOWPLAN output. If you use a graphical query tool, such as the SQL Enterprise Manager query tool or ISQL/W, you can set the configuration options to provide this information.
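For example, here is a minimal session that requests the plan along with the results (a sketch; the exact SHOWPLAN output varies by SQL Server version, and the authors table is from the pubs sample database):

    SET SHOWPLAN ON
    GO
    -- The server now reports the indexes and join order it chose.
    SELECT au_lname FROM authors WHERE au_lname LIKE 'Ring%'
    GO
    SET SHOWPLAN OFF
    GO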
SQL Server performs optimization in three phases: query analysis, index selection, and join selection.
1. Query Analysis
In the query analysis phase, the SQL Server optimizer examines every clause represented in the regular query tree and determines whether it can be optimized. SQL Server generally tries hardest to optimize the clauses that restrict a scan, such as search or join clauses. However, not all valid SQL syntax can be broken into optimizable clauses; an example is a clause containing the SQL inequality operator "<>". Because "<>" is an exclusive rather than an inclusive operator, the selectivity of the clause cannot be determined without scanning the entire table. When a relational query contains unoptimizable clauses, the execution plan uses a table scan to access that part of the query. For clauses in the query tree that can be optimized, the optimizer proceeds to index selection.
2. Index Selection
For each optimizable clause, the optimizer checks the database system tables to determine whether a related index exists for accessing the data. An index is considered useful only when a leading prefix of its columns exactly matches the columns in the query clause. Exact prefix matching is required because an index is built according to its column order; for a clustered index, the underlying data is sorted in that column order as well. Trying to access data through a secondary index column is like trying to look up everyone with a given first name in a phone book sorted by last name: the sort order is essentially useless, because you still have to check every row to see whether it qualifies. If a clause has a usable index, the optimizer then determines its selectivity.
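A minimal sketch of the prefix rule, using a hypothetical table and index:

    -- Composite index ordered by (lastname, firstname).
    CREATE INDEX idx_emp_name ON employee (lastname, firstname)
    -- Can use the index: lastname matches the leading prefix.
    SELECT * FROM employee WHERE lastname = 'Smith'
    -- Cannot seek through the index: firstname alone does not match
    -- the prefix, so every row must still be examined.
    SELECT * FROM employee WHERE firstname = 'John'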
Therefore, during the design process you should carefully examine all queries against the query design principles below and design indexes according to these optimization characteristics.
(1) Narrow indexes are more efficient. With a narrow index, a large number of index rows fit on each page, and the index has fewer levels, so more index pages fit in the cache, which also reduces I/O operations.
(2) The SQL Server optimizer can analyze a large number of indexes and join possibilities, so many narrow indexes give it more choices than a few wide indexes. On the other hand, do not keep indexes you do not need, because they add storage and maintenance costs. For a composite (multi-column) index, the SQL Server optimizer keeps distribution statistics only for the leading column, so the first column of such an index should be highly selective (see the sketch after this list).
(3) Too many indexes on a table hurt the performance of UPDATE, INSERT, and DELETE operations, because every index must be adjusted accordingly. In addition, all page-split operations are logged, which increases I/O operations further.
(4) Indexing a column that is frequently updated will seriously hurt performance.
(5) Because of storage costs and I/O operations, a small surrogate (artificially constructed) index column can perform better than a wide natural one; its disadvantage is the cost of maintaining the surrogate column.
(6) Try to analyze how frequently each important query is used, so that you can identify the most heavily used indexes and optimize them first.
(7) Any column appearing in a query's WHERE clause is a candidate index column, because the optimizer focuses on that clause.
(8) Indexing small tables of less than one extent is not cost-effective, because a table scan is often faster and cheaper for a small table.
(9) Columns used with ORDER BY or GROUP BY are generally good candidates for a clustered index. If the column in an ORDER BY clause has a clustered index, no worktable is generated, because the rows are already sorted; a GROUP BY clause always generates a worktable.
(10) Do not build a clustered index on a frequently changing column, because that forces the entire row to move. Pay special attention to this when implementing large transaction processing systems, whose data tends to change frequently.
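A minimal sketch of guideline (2), leading a composite index with its most selective column (hypothetical table):

    -- customer_id has many distinct values; status has only a few.
    -- Lead with customer_id, since distribution statistics are kept
    -- only for the first column of a composite index.
    CREATE INDEX idx_order_cust ON orders (customer_id, status)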
3. Join Selection
When index selection ends and every clause has a processing cost based on its access plan, the optimizer begins join selection. Join selection finds an efficient order in which to combine the clause access plans: the optimizer compares different clause orderings and selects the join plan with the lowest processing cost in terms of physical disk I/O. Because the number of clause combinations grows extremely fast with query complexity, the SQL Server query optimizer uses tree-pruning techniques to keep the cost of these comparisons down. At the end of the join selection phase, the optimizer has produced a cost-based query execution plan that makes full use of the available indexes and accesses the original data with minimal system overhead and good execution performance.
3.2 Designing Efficient Queries
From the three phases of query optimization above, it is clear that the main goal of efficient query design is a plan with the least physical and logical I/O and a good balance between processor time and I/O time. In other words, the ideal query makes full use of the indexes, reads and writes the disk the least, and uses memory and CPU resources most efficiently.
The following suggestions, distilled from the optimization strategies of the SQL Server optimizer, are helpful in designing efficient queries.
1. If a unique index exists, a WHERE clause with the "=" operator performs best, followed by closed intervals (ranges), and then open intervals.
2. From the perspective of database access, WHERE clauses containing discontinuous connectors (OR and IN) generally do not perform well, so the optimizer may adopt the OR strategy. That strategy generates a worktable containing the row identifier (page number and row number) of every row that might match, treating the worktable as a "dynamic index" pointing to the matching rows of the table. The optimizer then simply scans the worktable, retrieves each row identifier, and fetches the corresponding row from the data table. The cost of the OR strategy is the generation of the worktable.
3. WHERE clauses containing NOT, "<>", or "!=" are of no use to the optimizer's index selection. Because such clauses are exclusive rather than inclusive, their selectivity cannot be determined without scanning the entire underlying table.
4. Limit data conversions and string operations. The optimizer generally does not generate an index choice from expressions or data-type conversions in the WHERE clause. For example:
    paycheck * 12 > 36000 OR substring(lastname, 1, 1) = "L"
Even if the table has indexes on paycheck and lastname, they cannot be used for optimization here. The conditional expression can be rewritten so that they can:
    paycheck > 36000/12 OR lastname LIKE "L%"
5. Local variables in a WHERE clause are treated as unknown by the optimizer and are not considered in index selection. The exception is variables defined as input parameters of a stored procedure.
6. If no index covers the join clause, the optimizer builds a worktable holding the rows of the smaller table in the join and then builds a clustered index on that worktable to complete an efficient join. The cost of this approach is the worktable generation plus the subsequent clustered index generation; the process is called reformatting.
Therefore, pay attention to the size of the tempdb database, in RAM or on disk (except for SELECT INTO statements). In addition, if these types of operations are common, placing tempdb in RAM is very good for performance, as sketched below.
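On SQL Server 6.5-era systems this was done through sp_configure (a sketch under that assumption; the option was removed in later versions, so verify its name and availability on your release):

    -- Place tempdb in RAM, sized at 20 MB. This requires enough
    -- physical memory beyond SQL Server's own allocation, and may
    -- require 'show advanced options' to be enabled first.
    EXEC sp_configure 'tempdb in ram (MB)', 20
    RECONFIGURE
    GO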
4. Other Performance Optimization Considerations
The factors listed above are the main influences on SQL Server performance, but the operating system also matters a great deal: under Windows NT, the choice of file system, the network protocols, the enabled services, and the SQL Server priority all affect SQL Server performance to varying degrees. With so many factors at play, and with every application different, it is unrealistic to look for a universal optimization solution; conditions must be adjusted continuously during system development and maintenance. In fact, most optimization and tuning work can be performed on the server, independently of the client, so this approach is practical.