VITAM POST mortem–analyzing deadlocked schedulers MINI DUMP from SQL SERVER

Source: Internet
Author: User
Tags server error log

https://gennadny.wordpress.com/2014/11/

Since SQL Server 7.0, SQL Server has its own scheduling mechanism, in SQL 7.0 and it is called UMS (User Mode schedu Ling) and later was renamed to SOS (SQL OS Scheduler).  Ums\sos is Non-preemptive\cooperative Scheduler, which means it relying on threads to voluntarily five up control–or yield –to the next thread waiting in line vs. preemptively taking control from the threads.

    • preemptive scheduling: The preemptive scheduling is prioritized. The highest priority process should always be the process, which is currently utilized.

    • non-preemptive scheduling: When a process enters the state of running, the state of that process is not deleted F Rom the scheduler until it finishes its service time.

That means, there is locations in the SQL Server code where the Microsoft developer built in yield points, causing Execution to ' pause ' and gracefully let another citizen ' in the SQL kingdom to exercise their right to execute. of  course, once a thread "leaves" SQL Server SOS Kingdom, that thread was still handed off to the Windows preemptive Scheduler. The goal of SOS is for SQL Server to expose-one  thread at a time per CPU and thus minimize competition among SQ L threads exposed to the OS. The ultimate goal is to reduce the very expensive Kernel-mode context switching from one thread to another.

Since SQL Server uses a cooperative scheduler, it relies on the good heart of all developer who  writes code in SQL Server to call a Yield () function of some kind that prevents the thread from monopolizing the CPU. That's doesn ' t always happen despite best intentions, moreover as there is calls to resources outside SQL Server (Disk IO b EST example) a SQL thread could is at the mercy of some external component. Because of that, the SOS scheduler have a dedicated Thread-scheduler Monitor–that periodically checks the  state of Each scheduler and reports any "irregularities". Examples of irregularities include a thread not yielding voluntarily (a non-yielding scheduler) or all schedulers is "Stu CK "Not processing any requests (deadlocked schedulers). Typically this  Scheduler Monitor thread would report problems every, seconds it though the monitors thread th At accepts connections every seconds. A very good white paper are actually written on SQL Server SchedulinG and can be found here– http://technet.microsoft.com/en-us/library/cc917684.aspx

So, deadlocked scheduler. A Deadlocked Scheduler scenario is reported if all SOS schedulers was reported to be "stuck". Technically, this means that since the last Scheduler Monitor Check–say about seconds earlier–nothing have changed And no work is being processed. Here are being checked by Scheduler Monitor:

    • Is there any tasks waiting to be processed? (check if sys.dm_os_schedulers.work_queue_count > 0)
    • Has any new worker threads been created since last check?
    • been processed since last check (i.e. are any new tasks been assigned to workers)?

If any of these conditions was met (not all), the Scheduler Monitor declares this schedulers is stuck–deadlocked. This is also somewhat NUMA dependent–sql Server can report deadlocked schedulers if all the SOS schedulers on a single NUMA node is stuck, even if other NUMA nodes is processing tasks just fine.

So away from theory to practice. As deadlocked scheduler condition is reported by scheduler Monitor SQL Server usually generates a mini dump and entry in E  Rror Log. Through the years I has received a fare amount of these dumps from customers for this condition, Non-yielding Scheduler,  Non-yielding IOCP Listener, etc. Here is some pointers on what to look at these.

The dumps generated by default for such condition is usually minidumps by SQL Server. A minidump file does not contain as much information as a full crash dump file, but it contains enough information to perf ORM Basic debugging operations. We can open this dump in Windows Debugger and granted we setup symbols correctly does some analysis on this dump. I won't go to the Windbg, symbol path to MS public symbol server, and other basics. However you can read excellent tutorial here–Http://www.codeproject.com/Articles/6084/Windows-Debuggers-Part-A-WinDbg-Tutorial and some resources here– http://msdn.microsoft.com/en-us/library/windows/desktop/ee416588 (v=vs.85). aspx,http://blogs.msdn.com/b/ Ntdebugging/archive/2008/08/28/basics-of-debugging-windows.aspx (Huge fan of ntdebugging blog myself)

So now I had a minidump and wonderful message in SQL Server Error Log like this:

2014-02-07 09:18:14.54 Server Using 'Dbghelp.dll ' version ' 4.0.5 ' 2014-02-07 09:18:14.55 Server **dump thread-spid = 0, EC = 0x00000000000000002014-02-07 09:18:14.55 Server ***stack Dump being sent to O:\MSSQL10_50.PRD_SQL_CLA_1\MSSQL\LOG\SQLDump0005.txt2014-02-07 09:18 : 14.55 Server * *******************************************************************************2014-02-07 09:18:14.55 Server *2014-02-07 09:18:14.55 server * BEGIN STACK dump:2014-02-07 09:18:14.55 Server * 02/0 7/14 09:18:14 spid 114162014-02-07 09:18:14.55 server *2014-02-07 09:18:14.55 server * deadlocked Schedulers2014 -02-07 09:18:14.55 Server *2014-02-07 09:18:14.55 Server * ***************************************************** 2014-02-07 09:18:14.55 Server *-------------------------------------------------------- -----------------------2014-02-07 09:18:14.55 Server * Short stack dump2014-02-07 09:18:14.61 server Stack Signa Ture for the dump is 0x00000000000002b4 

Once We open the dump in Debugger lets look for patterns. First lets take a look what we is working with here. Looks like we have threads

0:004> ~* 0 id:770.774 suspend:1 teb:7ffdd000 unfrozen start:sqlservr!wmaincrtstartup (024489f8) Prior ity:0 priority class:32 affinity:f 1 id:770.cc suspend:1 teb:7ffda000 unfrozen start:msvcr80!_threadstart EX (781329E1) priority:0 priority class:32 affinity:f 2 id:770.e0 suspend:1 teb:7ffd9000 unfrozen Sta Rt:msvcr80!_threadstartex (781329E1) priority:0 priority class:32 affinity:f 3 Id:770.e8 suspend:1 teb:7f fd8000 unfrozen Start:msvcr80!_threadstartex (781329e1) priority:0 priority class:32 affinity:f. 4 id:7   70.f0 suspend:1 teb:7ffd7000 unfrozen start:msvcr80!_threadstartex (781329e1) priority:1 priority class:32 Affinity:f 5 id:770.b8 suspend:1 teb:7ffd6000 unfrozen start:msvcr80!_threadstartex (781329e1) Priorit y:0 Priority class:32 affinity:f 6 id:770.108 suspend:1 teb:7ffd5000 unfrozen start:msvcr80!_threadstarte X (781329E1) Priority:0 Priority class:32 affinity:f 7 id:770.104 suspend:1 teb:7ffd4000 unfrozen start:msvcr80!_threadstart EX (781329E1) priority:0 priority class:32 Affinity:f 8 id:770.144 suspend:1 teb:7ffaf000 unfrozen St Art:msvcr80!_threadstartex (781329E1) priority:0 priority class:32 affinity:f 9 id:770.140 suspend:1 Teb:  7ffae000 unfrozen Start:msvcr80!_threadstartex (781329e1) priority:0 priority class:32 Affinity:f Id:  770.14c suspend:1 teb:7ffad000 unfrozen start:msvcr80!_threadstartex (781329E1) priority:0 Priority class: Affinity:f id:770.c0 suspend:1 teb:7ffac000 unfrozen start:msvcr80!_threadstartex (781329e1) Prio rity:0 priority class:32 affinity:f id:770.264 suspend:1 teb:7ffab000 unfrozen start:msvcr80!_threadsta      Rtex (781329E1) priority:0 priority class:32 affinity:f id:770.28c suspend:1 teb:7ffaa000 unfrozen Start:msvcr80!_threadstArtex (781329E1) priority:0 priority class:32 affinity:f id:770.2a8 suspend:1 teb:7ffa9000 unfrozen Start:msvcr80!_threadstartex (781329E1) priority:0 priority class:32 Affinity:f ID:770.2BC suspend:1 Te  b:7ffa8000 unfrozen Start:msvcr80!_threadstartex (781329e1) priority:0 priority class:32 Affinity:f 16 Id:770.1d4 suspend:1 teb:7ffa7000 unfrozen start:msvcr80!_threadstartex (781329E1) priority:0 priority CLA ss:32 affinity:f id:770.8ec suspend:1 teb:7ffa5000 unfrozen start:msdtcprx! Cconnectionmanager::timerproc (62a6ae8b) priority:0 priority class:32 Affinity:f id:770.8f0 suspend:1 Teb      : 7ffa4000 unfrozen priority:0 priority class:32 affinity:f id:770.8f4 suspend:1 teb:7ffa3000 Unfrozen start:msdtcprx! Csessionobject::maintainsession (62ab1061) priority:0 priority class:32 Affinity:f id:770.900 suspend:1 T eb:7ffa2000 unfrozen Priority:0 priority class:32 affinity:f id:770.940 suspend:1 teb:7ffa1000 unfrozen start:msvcr80!_threadstarte X (781329E1) priority:0 priority class:32 affinity:f id:770.954 suspend:1 teb:7ff9f000 unfrozen Sta Rt:msvcr80!_threadstartex (781329E1) priority:2 priority class:32 Affinity:f id:770.95c suspend:1 teb:7 ff9d000 unfrozen Start:msvcr80!_threadstartex (781329e1) priority:0 priority class:32 Affinity:f Id: 770.960 suspend:1 teb:7ff9c000 unfrozen start:msvcr80!_threadstartex (781329E1) priority:0 Priority class: Affinity:f id:770.aa8 suspend:1 teb:7ff9b000 unfrozen start:msvcr80!_threadstartex (781329e1) Prio rity:0 priority class:32 affinity:f id:770.ad0 suspend:1 teb:7ff9a000 unfrozen start:msvcr80!_threadsta      Rtex (781329E1) priority:0 priority class:32 affinity:f id:770.c80 suspend:1 teb:7ff91000 unfrozen start:netapi32! NetbioswAiter (71c5110d) priority:0 priority class:32 Affinity:f-id:770.e54 suspend:1 teb:7ff82000 unfrozen Start:msvcr80!_threadstart (7813286e) priority:0 priority class:32 Affinity:f id:770.10cc suspend:1 Teb : 7ff81000 unfrozen start:00070cbe priority:0 priority class:32 affinity:f-id:770.e2c suspend:1 Teb : 7ffa6000 unfrozen Start:msvcr80!_threadstartex (781329e1) priority:0 priority class:32 Affinity:f d:770.360 suspend:1 teb:7ff92000 unfrozen start:msvcr80!_threadstartex (781329e1) priority:0 priority Clas      s:32 affinity:f id:770.12b4 suspend:1 teb:7ff93000 unfrozen Start:msvcr80!_threadstartex (781329e1) priority:0 priority class:32 affinity:f id:770.2b8 suspend:1 teb:7ff83000 unfrozen start:msvcr80!_threa      Dstartex (781329E1) priority:0 priority class:32 affinity:f id:770.7a8 suspend:1 teb:7ffdb000 unfrozen Start:msvcr80!_threAdstartex (781329E1) priority:0 priority class:32 affinity:f ID:770.1CFC suspend:1 teb:7ff96000 unfrozen Start:msvcr80!_threadstartex (781329E1) priority:0 priority class:32 Affinity:f id:770.1360 Suspend   : 1 teb:7ff97000 unfrozen Start:msvcr80!_threadstartex (781329e1) priority:0 priority class:32 Affinity:f Panax Notoginseng id:770.ef0 suspend:1 teb:7ff94000 unfrozen start:msvcr80!_threadstartex (781329e1) priority:0 Priori Ty class:32 affinity:f id:770.18c8 suspend:1 teb:7ff9e000 unfrozen Start:msvcr80!_threadstartex (781329e1 ) priority:0 priority class:32 affinity:f id:770.1c00 suspend:1 teb:7ff8e000 unfrozen start:msvcr8 0!_threadstartex (781329E1) priority:0 priority class:32 affinity:f id:770.1dc0 suspend:1 teb:7ffa0000 U Nfrozen Start:msvcr80!_threadstartex (781329E1) priority:0 priority class:32 Affinity:f

Next I'll dump stacks on the threads and look for patterns here:

~*kl100

For sake of brevity I won't list my output here, but majority of my threads has following calls on the stacks, as this Example with thread 40:

ID:770.1DC0 suspend:1 teb:7ffa0000 unfrozenchildebp retaddr 4dd6f460 7c827a19 ntdll! kifastsystemcallret4dd6f464 77ea4a4c ntdll! Ntsignalandwaitforsingleobject+0xc4dd6f4d8 01008721 kernel32! SIGNALOBJECTANDWAIT+0XAF4DD6F4FC 01002166 sqlservr! sos_scheduler::switch+0x814dd6f678 010028b2 sqlservr! sos_scheduler::switchcontext+0x3374dd6f694 01001b7a sqlservr! Sos_scheduler::suspendnonpreemptive+0xb54dd6f6ac 01002aea sqlservr! Sos_scheduler::suspend+0x2d4dd6f6d0 012d58ce sqlservr! sos_event::wait+0x14d4dd6f708 012d7c34 sqlservr! sos_event::waitallowprematurewakeup+0x714dd6f784 012d7fe9 sqlservr! xchngwait<31,10> >+0xfe4dd6f7a8 012d7d3b sqlservr! CXPORT::OPEN+0X13B4DD6F7BC 012dff7a sqlservr! Cqscanexchangenew::open+0x884dd6f7f4 012E0AAF sqlservr! cqscanhash::consumebuild+0xca4dd6f818 012e1246 sqlservr! CQSCANHASH::ITERATE+0X434DD6F83C 012E2A15 sqlservr! cqscanhash::open+0x6a4dd6f85c 012e04fe sqlservr! cqscanhash::consumeprobe+0x1284dd6f884 012e1246 sqlservr! Cqscanhash::iterate+0x544dd6f8a8 01025745 sqlservr! CQSCANHASH::OPEN+0X6A4DD6F8BC 012d7d8f sqlservr! cqscannew::openhelper+0x1f4dd6f930 012d7e7d sqlservr! cqscanxproducernew::open+0x494dd6f950 012d7a17 sqlservr! FNPRODUCEROPEN+0X5B4DD6FA10 012d754f sqlservr! FNPRODUCERTHREAD+0X3814DD6FD4C 010069f3 sqlservr! SUBPROCENTRYPOINT+0X5CC4DD6FDBC 01006b19 sqlservr! Sos_task::P aram::execute+0xe24dd6fdf0 01006829 sqlservr! Sos_scheduler::runtask+0xb94dd6fe30 0145ca28 sqlservr! Sos_scheduler::P rocesstasks+0x1414dd6fe98 0145c948 sqlservr! Schedulermanager::workerentrypoint+0x1ad4dd6feb0 0145c610 sqlservr! SYSTEMTHREAD::RUNWORKER+0X7F4DD6FF10 0145C7AD sqlservr! Systemthreaddispatcher::P rocessworker+0x2464dd6ff78 781329bb sqlservr! Schedulermanager::threadentrypoint+0x1434dd6ffb0 78132a47 msvcr80!_callthreadstartex+0x1b4dd6ffb8 77e6482f msvcr80 !_threadstartex+0x66

Note calls to sqlservr! Cqscanxproducernew::open above. This or it related calls like sqlservr! Fnproduceropen, sqlservr! Fnproducerthread, sqlservr! Subprocentrypoint point to excessive parallelism waits in SQL Server (Cxpacket wait) as a culprit in my case.

There is other patterns that I could has seen in the dump as well that would has pointed out other issues that can Caus E deadlocked schedulers Condition:

    • Blocking. Usually majority of stacks on your threads would has lockowner::sleep () function and lck_lockinternal:
ntdll! zwsignalandwaitforsingleobjectkernel32! signalobjectandwaitsqlservr! sos_scheduler::switchcontextsqlservr! sos_scheduler::suspendsqlservr! sos_event::waitsqlservr! lockowner::sleepsqlservr!lck_lockinternalsqlservr! Getlock
    • Buffer Pool Memory Issue or Lazy Writer IO waits–look for sqlservr! Bpool::steal calls
sqlservr! bpool::stealsqlservr! sqlsinglepageallocator::allocatepagessqlservr! memorynode::allocatepagesinternalsqlservr! memoryclerkinternal::allocatepagessqlservr! Imemobj::P bgetnewpagessqlservr! Cslotpagemgr::P ballocate
    • Other reasons for the condition may include super slow disk IO, very liberal use of SQL Server Profiler to capture profiles R traces, etc.

Check out very detailed article on Deadlock Scheduler analysis on Mssqlwiki–http://mssqlwiki.com/2010/06/15/how-to-analy ze-deadlocked-schedulers-dumps/, troubleshooting SQL blog–http://troubleshootingsql.com/2011/08/16/ Debugging-the-deadlock-but-for-the-scheduler-part-2-of-2/, as well ashttps://blog.basefarm.com/blog/ sql-server-deadlocked-schedulers-leads-to-unexpected-cluster-failoverpart-1/

VITAM POST mortem–analyzing deadlocked schedulers MINI DUMP from SQL SERVER

Related Article

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.