The advantage of parallel query is that it can process query jobs through multiple threads, thus improving the query efficiency. The SQL Server database provides the parallel query function for database servers with multiple CPUs to optimize the performance of query jobs. That is to say, as long as the database server has multiple CPUs, the database system can use multiple operating system processes to execute query operations in parallel to accelerate query tasks.
1. Perform parallel query in three steps.
Parallel query jobs are mainly performed in the database in three steps.
First, the database determines whether parallel queries are required. There is a query optimizer in the database that optimizes the SQL statements before the database executes the query statements. When the queryer optimizes the SQL statements, one of the actions is to determine whether to optimize the SQL statements. That is to say, not all SQL query statements can obtain benefits from parallel queries. If the query optimizer deems that the query statement can get the benefit from the parallel query, it inserts the exchange operator into the query execution plan to prepare for the parallel query. Therefore, you do not need to concern the database administrator about the statements that require parallel queries and those that do not. The database query optimizer will help the Administrator make this decision.
The database administrator must be clear about the situations in which the database SQL optimizer determines that parallel queries are not suitable. Generally, no parallel query is executed if any of the following conditions is met.
First, for specific queries, the query optimizer considers the execution plan of the serial query to be faster than any possible parallel execution plan.
Second, the serial execution cost of queries is not high, and parallel queries are not required.
Third, the query contains scalar operators or Relational operators that cannot be run in parallel.
From the perspective of the database administrator, the third condition has the greatest impact on us. When the database is expected to improve the database performance by using parallel queries in the future, you should avoid using operators that cannot be used in the parallel query function during database design. Some Relational operators or logical operators may require that the query plan be performed in serial mode, or some of them must be performed in serial mode. In this case, the query optimizer will not use the parallel query function to improve the query statement performance. This is a detailed issue that must be taken into account by the database administrator during database design.
Secondly, determine the number of parallel processes. When the query optimizer inserts a crossover operator in a query statement, the database executes parallel queries. Parallel query can use multiple threads to execute a plan. Now, another problem occurs. How many process operations will the database divide the query job? In this case, the database administrator needs to know what the degree of parallelism is. Actually. When processing parallel queries, the data needs to know the maximum processes available and actually used processes. The maximum available process is the degree of parallelism.
The value of this degree of parallelism is set at the server level, or can be modified through the system stored procedure. However, the maximum number of processes that can be used is not necessarily equal to the actual number of processes used. The actual number of processes is determined when the database initializes the query plan. That is to say, no additional settings are required for the database administrator. The database system automatically determines a reasonable number of processes based on the complexity of the plan. Of course, the actual number of processes used cannot exceed the degree of parallelism, that is, the maximum number of processes that can be used.
Finally, execute the query. After the preceding content is determined, the database executes the specific query statement. In this step, pay attention to one problem. The database administrator can also specify the MAXDOP query prompt in the query statement to modify the progress value. That is to say, if the database administrator of a query job thinks it may take a long time, you can set a large progress value for the query job. When you use the MAXDOP query prompt to set the parallel progress value, it overwrites the preset default value. This allows you to set additional progress values for a single query statement to improve the performance of some special query jobs.
2. Notes for parallel queries.
Note 1: Pay attention to hardware restrictions.
Parallel query is a powerful way for databases to improve query performance. However, it is often subject to relatively large constraints. For example, some of the above improvements are based on cost considerations and there are some hard restrictions. For example, in normal cases, the database will consider executing parallel queries only when the database server has multiple microprocessor (CPU. That is, only computers with multiple CPUs can use parallel queries. This is a hard constraint.
In addition, during the execution of the query plan, the database will determine whether there are enough threads available at that time. Each query operation requires a certain number of threads for execution, and execution of parallel plans requires more threads than execution of serial plans, the number of required threads will also increase with the increase of the degree of parallelism. If the database server does not have enough threads to use the parallel plan during the execution of the parallel plan, the database engine will automatically reduce the degree of parallelism and even discard the parallel query and change it to the serial plan. Therefore, whether a database can execute parallel queries is subject to hardware restrictions. To this end, if enterprises really need to improve database performance through parallel queries, the administrator needs to adjust the hardware configuration as needed.
NOTE 2: Parallel queries are not recommended for all queries.
Under normal circumstances, I think it is best to apply the parallel query function only to join queries for large tables, aggregate operations for a large amount of data, and repeated sorting operations for large result sets. If you execute parallel queries on these operations, the database performance improvement is very obvious. On the contrary, if parallel queries are executed for simple queries, the extra coordination work required for parallel queries may be greater than the potential performance improvement. Therefore, the database administrator should be careful when determining whether to execute the parallel query function.
I suggest that you do not set parallel queries at the database server level. Set the degree of parallelism to 1 or a smaller value. Then, for some special query operations, use the MAXDOP query prompt to set the maximum number of processes that can be used. In this case, it may be more reasonable. If the Database Administrator does not know whether to use the parallel query function, the database administrator can determine whether to use the statistical function provided by the database. To determine whether a parallel query plan has benefited from parallel queries, the database engine can compare the estimated overhead of executed queries with the overhead threshold value of parallel queries. A parallel plan is generally more useful only for queries that take a long time, because its performance advantage will offset the additional time overhead required to initialize, synchronize, and terminate the parallel plan.
NOTE 3: the database determines whether to perform parallel queries based on the number of rows involved in the query.
As mentioned above, it is best to apply the parallel query function to join queries for large tables, aggregate operations for large amounts of data, repeated sorting of large result sets, and so on. Only in this way can the benefits of parallel queries exceed the cost. However, this does not mean that concurrent queries are suitable for connection queries, aggregate operations, sorting, and other jobs. When the database is considering a parallel query plan, the query optimizer determines the number of rows involved. If the number of rows involved is small, parallel query plans will not be considered. The query statement is executed in serial mode.
In this case, the overhead of startup, distribution, and coordination can be avoided, greatly exceeding the benefits of parallel execution of jobs. This is a good design, but it also brings some trouble to the database administrator. If the database administrator wants to test the extent to which parallel queries can affect query operations, it is a little troublesome. Because of its limited data size. If the database administrator needs to perform this test, he must first import enough data into the database system. This limits the test operations of the database administrator. However, this mechanism is still good. Because the Database Administrator does not need to consider how large the database is to use parallel queries.
Note 4: different processes are used for the same operation at different times.
As mentioned above, the number of processes used in parallel query is not only related to the complexity of operations, but also directly related to the current server status, such as whether there are enough processes. Therefore, the number of processes used for parallel query may vary with the data and operations at different times. The time required is different. Because the database engine collects the workload of the current system, such as the number of processes, and other configuration information, only when parallel queries are actually performed, then the database determines the optimal number of parallel processes.
The number of processes will be used from the start of the query to the end of the query job. If you want to continue the query next time, the database engine will continue to collect the information. At this point, if the system workload is improved, more processes may be used in the database to execute this query. Therefore, the query job has a higher performance. On the contrary, if the system load is heavier than the previous query, the database may use fewer processes to process the job. At this point, the second query is slower. Therefore, if other applications are deployed on the database server at the same time, the amount of system resources occupied by other applications will have an unpredictable impact on parallel execution.