Introduction
On Windows, SQL Server OS is a user-level operating system level used to serve SQL Server. It abstracts some functions of the operating system from the entire SQL Server engine and forms a separate layer to provide services for the storage engine. SQL Server OS provides multiple functions such as task scheduling, memory allocation, Deadlock Detection, resource detection, lock management, and Buffer Pool management. This article focuses on the task scheduling mechanism provided in SQL OS.
Preemptive scheduling and non-Preemptive Scheduling
The source of Database-level task scheduling is the "Operating System Support for Database Management" in ACM ". However, for Windows, adding task scheduling that supports databases at the operating system level is not as good as abstracting a layer of scheduling in SQL Server, since task scheduling at the database layer can be abstracted, why not manage the memory and I/O at this abstraction layer? This idea is the origin of SQL Server OS.
After Windows NT4, Windows Task Scheduling is preemptible, that is, Windows tasks are determined based on the task priority and time slice. If the time slice of a task is used up or a task with a higher priority is waiting, the operating system can forcibly deprive the running thread (the thread is the basic unit of task scheduling) of the CPU occupied, assign CPU resources to other threads.
However, for SQL Server, this non-cooperative time slice-based task scheduling mechanism is not suitable. If SQL Server uses the job scheduling mechanism in Windows for Job Scheduling, Windows will not optimize the job scheduling mechanism based on SQL Server, but will interrupt the thread based on the time slice and priority, this causes the following two defects:
Windows does not know the optimal breakpoint for tasks in SQL Server (that is, tasks in SQL OS, which will be discussed later in this article, this will inevitably lead to more Context switches (the price of Context switches is very high and requires conversion between the thread user State and the core State ), because Windows scheduling is not decided by the thread itself, but by Windows. Windows does not know whether the corresponding thread in the current database is performing a key task, but it will only randomly capture the CPU of the thread. The connection to SQL Server cannot be executed all the time, and there will be a lot of idle time between each Batch. If each connection needs to occupy one thread separately, it is unwise for SQL Server to maintain these threads and consume additional resources.
For SQL Server OS, thread scheduling adopts the cooperative mode instead of the preemption mode. This is because all the tasks in these databases are within the SandBox of SQL Server. SQL Server fully believes in its threads, so unless the thread voluntarily gives up the CPU, SQL Server OS will not forcibly deprive the CPU of the thread. In this way, although the Worker switchover is still performed through the Context Switch of Windows, this cooperation mode will greatly reduce the number of Context switches required.
SQL Server determines which thread is running at which point in time. It is implemented through something called Scheduler. Let's take a look at Scheduler.
Scheduler
In SQL Server, each logical CPU has a corresponding Scheduler. Only tasks with the ownership of Scheduler can be executed. schedlos can be considered as a logical CPU of SQLOS. You can view all Scheduler in the system through the sys. dm_ OS _schedulers DMV, as shown in 1.
Figure 1. View sys. dm_ OS _schedulers
My notebook is an i7 quad-core 8-thread CPU, corresponding to, you can see that in addition to DAC and HIDDEN Scheduler running system tasks, there are a total of 8 schedcpu, each corresponding to a logical CPU, used to process internal tasks. Of course, you can set Affinity to include some Scheduler Offline, as shown in figure 2. Note that this process is online and can be achieved without restarting SQL Server.
Figure 2. Set Affinity
In this case, you can see that four Scheduler instances are Offline without restarting the instance, as shown in Figure 3:
Figure 3. Four online Offline schedges
Generally, you do not need to control Affinity unless your server runs other instances or programs.
In Figure 1, we also note that apart from the Visible schedible, there are some special Scheduler whose IDs are greater than 255, and these schedid are used inside the system, for example, resource management, DAC, backup and restoration operations. In addition, although the number of schedcpu and logical CPUs is the same, it does not mean that schedcpu is bound to a fixed logical CPU, but Scheduler can run on any CPU, scheduler is fixed on a CPU only when Affinity Mask is set. One advantage is that when a Scheduler is very busy, it may not lead to only one physical CPU busy, because Scheduler will move between multiple CPUs, so that the CPU usage tends to average.
This means that for a long query, the first half can be executed on CPU0, And the last half can be executed on cpu1.
In addition, on each Scheduler, only one Worker can run at a time. If all resources are ready but no Scheduler is obtained, the Worker is in the Runnable state. Let's take a look at Worker.
Worker
Each Worker can be considered to correspond to a thread (or fiber). sched does not directly schedule the thread, but schedules the Worker. The number of workers increases as the load increases. In other words, the number of workers increases on demand until the maximum number is reached. In SQL Server, the maximum number of workers by default is managed by SQL Server. Set the maximum Worker based on 32-bit or 64-bit, and the number of cpus. For the specific formula, see BOL: http://msdn.microsoft.com/zh-cn/library/ms187024 (v = SQL .105). aspx. You can also set the maximum number of workers, as shown in figure 4.
Figure 4. Set the maximum number of workers
For automatic configuration, the maximum number of running threads of SQL Server can be seen in sys. dm_ OS _sys_info, as shown in Figure 5.
Figure 5. view the maximum number of automatically configured workers
In general, you do not need to set this value, but in some cases, you need to set this value. That is, the Worker thread is used up. In addition to DAC, you cannot even connect to SQL Server.
The Worker actually corresponds to a thread on Windows and is bound to a specific Scheduler. Every Worker will never give up the Task unless the Task is completed, if a Task is waiting due to locks, IO, etc. during running, the Worker will actually be waiting.
In addition, multiple batchs in the same connection tend to use the same Worker. For example, if the first Batch uses Worker 100, the second Batch also tends to use Worker 100, but this is not absolute.
The running task is a Worker. We can use DMV sys. dm_exec_requests: view the running Task. In the Task_Address column, you can view the running Task, and then use sys. dm_ OS _tasks Worker_Address to view the corresponding Worker.
SQL Server reserves about 2 MB of memory for each Worker. The maximum number of Worker workers on each Scheduler is the maximum number of Worker workers on the Server/online scheder, the Worker bound to each Scheduler will form a Worker pool, which means that when each Scheduler requires a Worker, it first searches for idle Worker in the Worker pool. If there is no idle Worker, to create a new Worker. This behavior will be similar to the connection pool.
When a Scheduler is idle for more than 15 minutes, or Windows is under memory pressure. SQL Server tries to Trim the Worker pool to release the memory occupied by the Worker.
Task
A Task is the smallest Task unit running on a Worker. Only tasks of the Worker can be run. Let's take a look at the following simple example, as shown in Code 1.
SELECT @@VERSION goSELECT @@SPID go
Code 1. Two batchs connected
Two batchs in code 1 belong to one connection, and each Batch is a simple Task. As we mentioned earlier, these two tasks tend to reuse the same Worker because they belong to the same connection. However, it is also possible that the two tasks use different workers or even different Scheduler.
In addition to the tasks used by users, there are also some permanent system tasks which occupy the Worker forever. These tasks include Deadlock Detection and Lazy Writer.
Average assignment of tasks on Scheduler
The new Task will also try to distribute evenly among schedul. you can see a load_factor column through sys. dm_ OS _schedulers. The value of this column is used for the Task to allocate to Scheduler for reference.
Each time a new Task enters the Node, Scheduler with the least load will be selected. However, if you make a choice every time, it will cause a bottleneck when the Task is in the queue (this bottleneck is similar to the TempDB SGAM page competition ). Therefore, for each connection, SQL OS will remember the Scheduler ID that was last run as a prompt (Hint) when the new Task enters ). However, if a Scheduler load is greater than 20% of the average value of all Scheduler, this prompt is ignored. The load can be seen from the load_factor column mentioned above. If a Task runs for a long time, it is likely that the Task allocation on Scheduler is uneven.
Worker's Yield
SQL Server is a non-preemptible scheduling Task, so it is impossible for the Worker to occupy Scheduler for a long time to complete a Task. In this case, the Worker in Runnable will be hungry, which is not conducive to a large number of concurrent jobs and violates the original intention of SQL OS scheduling.
Therefore, it is critical to make Scheduler available at the right time. The process in which a Worker allows other workers to run the CPU is called yield. Yield can be divided into two types: one is the so-called "natural yield", which means that the Worker is locked during running or some resources are blocked, this Worker will let out the Scheduler to run other workers. In another case, the Worker does not encounter congestion, but after the time slice arrives, schedeld is automatically released. This is the so-called "voluntarily yield", which is the origin of the SOS_SCHEDULER_YIELD waiting type, the process of switching a Worker from the RUNNING state to the WAITING state is called switching. One basic idea of SQL OS is to implement switching to ensure high concurrency. The following describes several common yield scenarios:
The voluntarily yield based on time slice makes the Worker yield every 4 seconds. This value can be seen in the quantum_length_us column of sys. dm_ OS _schedulers. Yield is performed every 64 K result sets. Statement complie, yield. When reading the data page, yield is performed once every sentence in the batch. If the client cannot remove the data in time, the worker will also perform yield.
Preemptible Task Scheduling in SQL Server OS
For some code, SQL Server has some preemptible code. If you see the "PREEMPTIVE _ *" type wait in the waiting type, it indicates that the code is running in the preemptible task scheduling mode. Such tasks include extended stored procedures, Windows API calls, and log growth (0 for logs ). We know that cooperative task scheduling requires the Yield of the task itself, but this type of code is out of SQL Server. If you want them to run in the SandBox of cooperative task scheduling, if this type of code is not yield, scheduler will be used forever. This is very dangerous.
Therefore, before entering the preemptible mode, you must first give control of Scheduler to the next Worker in the Runable queue. In this case, the code running in preemptible mode is no longer controlled by SQL OS, but controlled by the Windows Task Scheduling System. Therefore, if the lifecycle of a Task is upgraded to the preemptible Task scheduling mode6..
Figure 6. complete lifecycle of a Task
Scheduling of each Scheduler task
A Simple Model For Scheduler Scheduling7..
Figure 7. A Scheduler scheduling cycle model
Summary
On Windows, SQL Server OS abstracts a non-preemptible task scheduling mechanism, which reduces the Context Switch. At the same time, there is also a set of thread-based yield mechanism. Compared with Windows's random preemption of the threads in the database, enabling the thread to come to yield will greatly reduce the Context Switch, thus improving concurrency.