The advantage of a parallel query is that it processes a query job with multiple threads, which improves query efficiency. SQL Server provides parallel queries for database servers with multiple CPUs to optimize the performance of query jobs. That is, as long as the database server has more than one CPU, the database system can use multiple operating system threads to perform the query in parallel and complete the job faster.
First, the three steps of a parallel query.
A parallel query job in the database goes through three main steps.
First, the database determines whether a parallel query is required. The database has a query optimizer that optimizes a SQL statement before it is executed, and one of the things it does during that optimization is decide whether the statement should run in parallel. In other words, not all SQL query statements benefit from parallel queries. If the query optimizer considers that a statement can benefit from a parallel query, it inserts exchange operators into the query execution plan to prepare for parallel execution. Which statements need parallel queries and which do not is therefore not something the database administrator has to worry about; the query optimizer makes this decision for the administrator.
However, the database administrator does need to be clear about the circumstances in which the SQL Server query optimizer considers it inappropriate to use parallel queries. In general, a parallel query is not executed if any one of the following conditions is true.
First, for a particular query, the query optimizer considers the serial query execution plan to be faster than any possible parallel execution plan.
Second, the serial execution cost of the query is low, so no parallel query is needed.
Third, the query contains scalar operators or relational operators that cannot run in parallel.
From the database administrator's point of view, the third condition has the greatest impact. If parallel queries are expected to be used to improve database performance in the future, care must be taken when designing the database to avoid operators that cannot run in parallel. Some relational or logical operators require the query plan, or part of it, to run in serial mode, and in that case the query optimizer cannot use the parallel query feature to improve the performance of the statement. This is a detail the database administrator must take into account when designing the database.
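A practical way to see this decision is to inspect the execution plan: when the optimizer has chosen a parallel plan, the plan contains Parallelism (exchange) operators such as Gather Streams. The following is a minimal sketch, assuming a hypothetical table dbo.Orders with CustomerID and TotalAmount columns:

-- Return the estimated execution plan as XML instead of running the query.
SET SHOWPLAN_XML ON;
GO
-- Hypothetical aggregation over a large table; if the optimizer decides a
-- parallel plan is worthwhile, the plan XML contains Parallelism (exchange)
-- operators.
SELECT CustomerID, SUM(TotalAmount) AS TotalSpent
FROM dbo.Orders
GROUP BY CustomerID;
GO
SET SHOWPLAN_XML OFF;
GO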
Second, determine the number of concurrent processes. When the query optimizer has inserted exchange operators into a query plan, the database executes a parallel query, and the plan can be carried out by multiple threads. This raises a question: into how many parallel processes will the database split the query job? To answer it, the database administrator needs to understand what is called the degree of parallelism. When handling parallel queries, the database distinguishes between the maximum number of processes it may use and the number of processes actually used. The maximum number of processes available is called the degree of parallelism.
The value of the degree of parallelism is set at the server level and can be modified through a system stored procedure. However, the maximum number of processes that may be used is not necessarily the number actually used. The actual number of processes is determined when the database initializes execution of the query plan. In other words, the database administrator does not need to make any additional settings for this; the database system automatically determines a reasonable number of processes based on the complexity of the plan. Of course, the number of processes actually used can never exceed the degree of parallelism, that is, the maximum number of processes that may be used.
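At the server level this option is called 'max degree of parallelism' and is changed through the sp_configure system stored procedure. A minimal sketch follows; the value 4 is only an example and should be chosen according to the server's CPU count:

-- 'max degree of parallelism' is an advanced option, so advanced options
-- must be made visible first.
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;

-- Cap every parallel query on this server at 4 processes/threads.
-- The default value 0 lets SQL Server use all available CPUs.
EXEC sp_configure 'max degree of parallelism', 4;
RECONFIGURE;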
Finally, the query is executed. Once the above has been determined, the database executes the specific query statement. In this step there is one thing to be aware of: the database administrator can also override the degree of parallelism for an individual statement by specifying the MAXDOP query hint in the query. That is, if the administrator expects a particular query job to take a long time, a larger degree of parallelism can be set for that job. When the degree of parallelism is set with the MAXDOP query hint, it overrides the preset default value. This makes it possible to apply a different degree of parallelism to individual query statements and improve the performance of special query jobs.
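For an individual statement, the hint is written in an OPTION clause at the end of the query. A minimal sketch, again using the hypothetical dbo.Orders table; the value 8 is purely illustrative:

-- Allow this statement alone to use up to 8 processes/threads,
-- overriding the server-wide 'max degree of parallelism' setting.
SELECT CustomerID, SUM(TotalAmount) AS TotalSpent
FROM dbo.Orders
GROUP BY CustomerID
OPTION (MAXDOP 8);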
Second, points that need attention in parallel queries.
Note point one: you need to be aware of hardware limitations.
Parallel query is a powerful means of improving database query performance. However, it is subject to considerable constraints. In addition to the cost considerations raised above, there are some hard limits. In general, the database only considers executing parallel queries if the database server has more than one microprocessor (CPU). That is, only computers with multiple CPUs can use parallel queries. This is a rigid constraint.
In addition, while executing a query plan, the database also determines whether enough threads are available at that time. Each query operation requires a certain number of threads to execute, a parallel plan requires more threads than a serial plan, and the number of threads required increases with the degree of parallelism. If the database server does not have enough threads available when the parallel plan executes, the database engine automatically reduces the degree of parallelism or even abandons the parallel plan in favor of a serial plan. Therefore, whether a database can execute parallel queries is limited by its hardware. For this reason, if an enterprise really needs to improve database performance through parallel queries, the administrator may need to adjust the hardware configuration accordingly.
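Both hardware factors, the number of CPUs and the size of the worker thread pool, can be checked from within SQL Server itself. A minimal sketch using the sys.dm_os_sys_info dynamic management view:

-- cpu_count is the number of logical CPUs the server sees; parallel plans
-- are only considered when it is greater than 1.
-- max_workers_count is the total pool of worker threads available to the
-- engine for serial and parallel plans.
SELECT cpu_count, max_workers_count
FROM sys.dm_os_sys_info;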
Note point two: it is not recommended to use parallel queries for all queries.
Usually, the author thinks it is best to apply parallel queries only to join queries on large tables, aggregation over large amounts of data, repeated sorting of large result sets, and similar operations. For these operations, the improvement in database performance is obvious. Conversely, if a parallel query is executed for a simple query, the additional coordination required to run it in parallel outweighs the potential performance gain. Therefore, database administrators need to be cautious when deciding whether to use parallel queries.
The author's suggestion is that, at the database server level, it is best not to enable parallel queries; that is, set the degree of parallelism to 1 or another small value. Then, for particular query operations, use the MAXDOP query hint to set the maximum number of available processes. This is usually the more reasonable approach. If the database administrator is not sure whether to use parallel queries, the database's own statistics can help with the decision. To decide whether a query would benefit from a parallel plan, the database engine compares the estimated cost of executing the query with the cost threshold for parallelism. A parallel plan is usually more useful for long-running queries, because the performance benefit offsets the extra overhead required to initialize, synchronize, and terminate the parallel plan.
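Both of the server-level knobs mentioned here, the default degree of parallelism and the cost threshold, are set with sp_configure. A minimal sketch of the suggested setup, assuming advanced options are already visible as in the earlier sketch; the threshold value 25 is only an example:

-- Keep parallelism effectively off by default at the server level;
-- individual statements can still go parallel via OPTION (MAXDOP n).
EXEC sp_configure 'max degree of parallelism', 1;
RECONFIGURE;

-- Raise the estimated-cost threshold above which the optimizer even
-- considers a parallel plan (the default is 5).
EXEC sp_configure 'cost threshold for parallelism', 25;
RECONFIGURE;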
Note point three: the database decides whether to run a query in parallel according to the number of rows involved.
As mentioned above, it is best to apply parallel queries to join queries on large tables, aggregations over large amounts of data, repeated sorting of large result sets, and so on, because for these jobs the benefits of parallel queries outweigh the costs. However, this does not mean that every join, aggregation, or sorting job is suitable for a parallel query. When the database considers a parallel query plan, the query optimizer also looks at the number of rows involved. If the number of rows is small, a parallel plan is not considered and the query statement is executed serially.
In this way, the database avoids situations where the cost of starting, distributing, and coordinating the work far exceeds the benefit of running the job in parallel. This is a good design, but it also brings some trouble for the database administrator. If the administrator wants to test how much a parallel query improves a query operation, there is a problem: with only a limited amount of data, the administrator must first import enough data into the database system before the test can be run. This restricts the administrator's testing options. Still, the mechanism itself is a good one, because the administrator does not have to work out at what data volume parallel queries become worthwhile.
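When the test database is still small, enough rows have to be generated first before the optimizer will even consider a parallel plan. A minimal sketch that fills a hypothetical dbo.OrdersTest table with one million synthetic rows:

-- A hypothetical test table, large enough that parallel plans become a
-- realistic option for the optimizer.
CREATE TABLE dbo.OrdersTest
(
    OrderID     INT IDENTITY(1, 1) PRIMARY KEY,
    CustomerID  INT            NOT NULL,
    TotalAmount DECIMAL(10, 2) NOT NULL
);

-- Cross-joining a system catalog view with itself is a common way to
-- generate many rows without writing a loop.
INSERT INTO dbo.OrdersTest (CustomerID, TotalAmount)
SELECT TOP (1000000)
       ABS(CHECKSUM(NEWID())) % 10000,                                -- random customer id
       CAST(ABS(CHECKSUM(NEWID())) % 100000 AS DECIMAL(10, 2)) / 100  -- random amount
FROM sys.all_objects AS a
CROSS JOIN sys.all_objects AS b;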
Note point four: the same operation may use a different number of processes at different times.
As mentioned above, the number of processes used by a parallel query depends not only on the complexity of the operation but also on the state of the server at the time, such as whether enough threads are available. So at different times, even with the same data and the same operation, the number of processes used by a parallel query may differ, and the time it takes may differ as well. This is because the database engine collects information about the current system workload, such as the number of available threads and other configuration details, and only determines the optimal number of concurrent processes when the parallel query is actually about to run.
The number of processes determined at that point is used from the start of the query until the query job ends. The next time the query runs, the database engine collects this information again. If the system workload has eased, the database may use more processes to execute the query, and the query job's performance will be higher. Conversely, if the system load is heavier than during the previous run, the database may use fewer processes to handle the job, and the second run will be slower. Therefore, if other applications are deployed on the same database server, the amount of system resources those applications consume can have an unpredictable impact on parallel execution.
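One way to watch this effect is to check the degree of parallelism that was actually granted while a long-running query is executing in another session. A minimal sketch; note that the dop column of sys.dm_exec_requests is an assumption that only holds on more recent SQL Server versions:

-- For each currently executing request, show how many threads (dop)
-- the engine actually granted this time, together with the statement text.
SELECT r.session_id,
       r.status,
       r.dop,                    -- degree of parallelism in use right now
       t.text AS query_text
FROM sys.dm_exec_requests AS r
CROSS APPLY sys.dm_exec_sql_text(r.sql_handle) AS t
WHERE r.session_id <> @@SPID;    -- exclude this monitoring session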