Recently, a project has been made to change all tables and indexes under the entire Data Warehouse platform to page-level data compression. A workload in the test environment was found to be running slower than usual yesterday. In the end, we have positioned the workload to the point where the performance may be degraded, in fact, to locate an SQL statement. This statement is a merge statement. We copy out the other two tables (the target table in the merge statement). So we have three tables: one is page data_compression, one is row data_compression, the last one is the None data_compression. The move is to get the overall performance statistics.
The data volume for the table is more than 23 million, and the clustered index plus four nonclustered indexes. The server has 58 processing cores and 256G of memory.
No. |
Nondcp |
DCP (PAGE) |
DCP (ROW) |
Duration |
CPU time |
Elapse time |
Duration |
CPU time |
Elapse time |
Duration |
CPU time |
Elapse time |
(MM:SS) |
(MS) |
(MS) |
(MM:SS) |
(MS) |
(MS) |
(MM:SS) |
(MS) |
(MS) |
1 |
2:03 |
176,442 |
123,660 |
4:18 |
1,122,304 |
257,814 |
5:03 |
1,067,797 |
303,248 |
2 |
1:05 |
217,218 |
65,448 |
5:56 |
1,093,818 |
355,671 |
4:45 |
1,056,553 |
285,461 |
In a situation where everyone is affecting 448211 rows of records
The result is that a table with no compression takes only 1-2 minutes to complete the process, while page and row compression takes 4-6 minutes to complete the task. I was amazed at the big difference. After all, the data is the same. So to find root cause, I need to borrow two queries to collect data from the running process, and as much as possible to derive an actual execution plan to analyze whether it will cause significant changes in the execution plan due to compression.
--bring the thread number of the running statement into the wordsSELECTStatus asStat* fromSys.sysprocesseswherespidinch(314,563,1090)--call Sys.dm_os_waiting_tasks to tell us exactly what resources the running process is waiting for.SELECTdm_ws.wait_duration_ms,dm_ws.wait_type,dm_es.status,dm_ws.session_id,dm_es.cpu_time,dm_es.memory_usage,dm_ Es.logical_reads,dm_es.total_elapsed_time,dm_es.program_name,db_name(dm_r.database_id) DatabaseName,--Optional ColumnsDm_ws.blocking_session_id,dm_r.wait_resource,dm_es.login_name,dm_r.command,dm_r.last_wait_type fromsys.dm_os_waiting_tasks Dm_wsINNER JOINSys.dm_exec_requests Dm_r ondm_ws.session_id=dm_r.session_idINNER JOINSys.dm_exec_sessions Dm_es ondm_es.session_id=dm_r.session_idWHEREDm_es.is_user_process= 1 anddm_ws.session_idinch(314,563,1090)ORDER byWait_duration_msDESC
The whole process I see is that most of the thread wait state type is Cxpacket, followed by Sos_scheduler_yield, and Pageiolatch_sh. The maximum number of parallel threads is 35-40 threads.
The execution plan looks similar to everybody, the graph execution plan shows that 97% of the overhead is spent on the clustered Index merge operator
It seems that the execution plan does not give too much useful information or the problem itself is not in the execution plan.
I turned my mind to the data I collected earlier and imagined the wait for some kind of resource. I suspect that the focus is on CPU resources, because the difference between several tables is whether the data page is compressed. Compression technology can actually be understood as CPU consumption in exchange for IO and memory pressure.
From the sys.sysprocesses system view, the page compressed table corresponding to the merge statement starts when the 43 threads are running, where you can see that there are nearly 10 threads in runnable and the previous wait state is Sos_scheduler_yield, In this case, many threads may be in a cxpacket wait state. I checked the server. Max Dop is 8. Assuming that the entire execution plan is now operating in a certain step, SQL Server gives it 8 threads, so if one of the threads is in front of runnable, it means that the remaining 7 threads may be waiting for it to complete the task. Because its last wait state is Sos_scheduler_yield, it shows that the SCHEDULER that was originally assigned to it is given to a queue of SQL SERVER Thread/worker that is in the runnable state. This time it goes back to the waiting queue inside the suspended state, waiting to be moved to the runnable queue, and for a short time all the other parallel threads have to wait patiently if they are done with their things. This can be clearly seen from the waittime column of the sys.sysprocesses system view.
That from the other view sys.dm_os_waiting_tasks can be seen, in fact, very multithreaded are out in the cxpacket waiting state.
Cxpacket itself is not a problem. However, like this example, I still have the heart of this idea, the return is Max Dop set low?
So I went through an article I wrote a few days ago to find statistics on the current wait time and type of SQL Server: http://www.cnblogs.com/jenrrychen/p/4610231.html
The results showed that Cxpacket and Sos_scheduler_yield ranked 2 and 3, of course, because this environment is a test environment, the data may not be so reliable. So I went to the pre-production environment to check the next. They were both in the top two. Cxpacket accounted for 70% of the waiting time.
Results in the test environment
Results in pre-production environment
So my first thought is, what happens if I add option (maxdop=12) to this merge statement?
The result is not what I expected, whether it is row COMPRESSION or page COMPRESSION, the running time from the original 4-6 minutes reduced to 1 minute 30 seconds or so.
This value is obtained after many tests to feel more appropriate. Testing 12, 16, 24, 32, in fact, performance is not because of the larger number of parallel threads, the better. More stable.
Therefore, the decision to test the server-level MAXDOP from 8 to 12 to see the performance change of the entire Dataware House platform, observing the completion of workload to determine whether to apply to production.
Reference:
https://support.microsoft.com/en-us/kb/2806535
https://technet.microsoft.com/en-us/library/cc879317 (v=sql.105). aspx
http://blogs.msdn.com/b/sqlserverfaq/archive/2009/01/05/ Why-do-i-see-more-threads-per-spid-in-sysprocesses-than-maxdop.aspx
SQL SERVER->> cxpacket wait type