Brief Introduction
SQL Server OS is a user-level operating system hierarchy on top of Windows that serves SQL Server. It abstracts the functionality of the operating system part from the entire SQL Server engine and forms a separate layer to serve the storage engine. SQL Server OS mainly provides a variety of functions such as task scheduling, memory allocation, deadlock detection, resource detection, lock management, Buffer pool management, etc. This article is mainly about the task scheduling mechanism provided in SQL OS.
preemptive (preemptive) scheduling and non-preemptive (non-preemptive) scheduling
The origin of task scheduling at the database level is an ACM article called "Operating System Support for database Management". But for Windows, at the operating system level to participate in supporting the database task scheduling, rather than in the SQL Server to abstract out a layer of scheduling, since you can abstract a layer of database-level task scheduling, so why not in this abstraction layer for memory and IO management? The idea is the origin of the SQL Server OS.
After Windows NT4, Windows task scheduling is preemptive, meaning that Windows tasks are determined based on priority and time slices for tasks. If a task's time slice runs out, or a task with a higher priority is waiting, the operating system can force the CPU that is occupied by the running thread (the basic unit of the Task Scheduler) to cede CPU resources to other threads.
However, for SQL Server, this kind of non cooperative, time based task scheduling mechanism is not so appropriate. If SQL Server uses the task scheduling mechanism within Windows to schedule tasks, Windows does not optimize according to SQL Server's scheduling mechanism, but breaks threads in terms of time slices and priorities, which results in the following two defects:
Windows does not know the best point of interruption in the tasks in SQL Server (that is, the task in SQL OS, which is described later in the article), which is bound to result in more context switch (the context switch is very, very expensive, Requires a conversion between the thread character user state and the kernel mentality, because Windows scheduling is not the thread itself deciding whether to sell the CPU, but is determined by Windows. Windows does not know whether the corresponding threads in the current database are doing critical tasks, and only indiscriminately captures the thread's CPU. The connection to SQL Server cannot be performed all the time, and there will be a lot of slack between each batch. If each connection needs to occupy a single thread, it is unwise for SQL Server to maintain these threads by consuming additional resources.
In the case of SQL Server OS, the mode of collaboration used in thread scheduling is not preemption mode. This is because the tasks within these databases are within the sandbox of SQL Server, and SQL Server is fully confident of its internal thread, so unless the thread voluntarily abandons the Cpu,sql Server OS does not force the thread to be stripped of its CPU. This way, although the switch between the worker is still through the Windows context switch, this mode of collaboration will significantly reduce the number of context switches required.
SQL Server determines which thread runs at which point in time, through a thing called scheduler, let's look at scheduler.
Scheduler
Each logical CPU in SQL Server has a corresponding scheduler, only the task that gets scheduler ownership is allowed to be executed, scheduler can be regarded as a logical CPU of a team Sqlos. You can look at all the scheduler in the system by sys.dm_os_schedulers this DMV, as shown in Figure 1.
Figure 1. View Sys.dm_os_schedulers
My notebook is a i7 four-core 8-thread CPU, corresponding to the hidden Scheduler that can be seen in addition to the DAC and run system tasks, the remaining Scheduler altogether 8, each corresponding to a logical CPU, for handling internal tasks. Of course, you can also set affinity to some scheduler Offline, as shown in Figure 2. Note that this process is online and can be implemented without restarting SQL Server.
Figure 2. Setting up affinity
At this point, you can see 4 scheduler being offline without restarting the instance, as shown in Figure 3:
Figure 3. Online Offline 4 Scheduler
In general, you do not need to control affinity unless you are running other instances or programs on your server.
In Figure 1, we also note that in addition to the visible scheduler, there are some special scheduler, these scheduler IDs are greater than 255, such scheduler are used for internal use of the system, such as resource management, DAC, Backup restore operations, and so on. In addition, although the number of scheduler and logical CPUs is the same, this does not mean that scheduler is bound to a fixed logical CPU, but scheduler can run on any CPU, only after you set up affinity mask The scheduler will be fixed on a CPU. One advantage of this is that when a scheduler is very busy, it may not cause only one physical CPU to be busy, because scheduler will move between multiple CPUs, making CPU usage tend to average.
This means that for a longer query, the first half can be executed on the CPU0, while the latter part executes on the CPU1.
In addition, on each scheduler, the same time can only have a worker running, all the resources are ready but did not get scheduler, then this worker is in runnable state. Now let's take a look at the worker.
Worker
Each worker can be considered as a corresponding thread (or fiber), scheduler does not directly dispatch the thread, but dispatches the worker. The worker increases as the load increases, in other words, the worker increases on demand until the maximum number is added. In SQL Server, the default maximum number of worker is managed by SQL Server. Depending on the 32-bit or 64-bit, and the number of CPUs to set the maximum worker, the specific calculation formula, you can refer to bol:http://msdn.microsoft.com/zh-cn/library/ms187024 (v=sql.105). aspx Of course, you can also set the maximum number of worker numbers, as shown in Figure 4.
Figure 4. Set maximum number of worker
If it is autoconfiguration, then the maximum number of worker threads for SQL Server can be seen in Sys.dm_os_sys_info, as shown in Figure 5.
Figure 5. View the maximum number of worker for automatic configuration
In general, you do not need to set this value, but there are some cases where you need to set this value. That is, the worker thread is exhausted and you cannot even connect to SQL Server except the DAC.
The worker actually corresponds to a thread on windows and binds to a particular scheduler, and every worker starts to execute a task unless the task completes, or the worker never abandons the task if a task is running due to a lock , Io and so on wait, in fact, the worker will fall into wait.
In addition, multiple batch in the same connection tend to use the same worker, such as the first batch using worker 100, then the second batch is also inclined to use worker 100, but this is not absolute.
The running task is used by the worker, we can view the running task through the DMV sys.dm_exec_requests, where the task_address column can see the task that is running, and then pass Sys.dm_os_ The Tasks worker_address to view the corresponding worker.
SQL Server retains approximately 2M of memory for each worker, and the number of worker that can be on each scheduler is the server's maximum number of worker/online scheduler, Each scheduler binds to a worker pool, which means that every scheduler needs a worker, finds the idle worker in the worker pool first, and creates a new worker if there is no idle worker. This behavior is similar to the connection pool.
So when a scheduler is idle for more than 15 minutes, or if Windows is under memory pressure. SQL Server will attempt to trim this worker pool to release the memory occupied by the worker.
Task
A task is the smallest task unit that runs on a worker. A task that only gets a worker can run. We can look at a simple example below, as shown in code 1.
SELECT @ @VERSION goselect @ @SPID Go
code 1. Two batch on a connection
The two batch in code 1 belong to a connection, and each batch is a simple task, as we mentioned earlier, these two tasks tend to reuse the same worker because they belong to the same connection. But it is also possible that these two tasks use different worker, or even different scheduler.
In addition to the tasks used by users, there are permanent system tasks that occupy the worker forever, including deadlock detection, Lazy writer, and so on.
The average distribution of tasks on scheduler
The new task will also try to distribute evenly between the scheduler, and you can see a load_factor column in Sys.dm_os_schedulers to refer to when the task is allocated to the scheduler.
Each time a new task enters node, it selects the scheduler with the least load. However, if you make a choice each time, you create a bottleneck in the task team (this bottleneck is similar to the tempdb Sgam page scramble). Therefore, for each connection, the SQL OS remembers the scheduler ID that was last run, as a hint when a new task enters (Hint). However, if a scheduler load is greater than 20% of all scheduler averages, this hint is ignored. Load can be seen in the Load_factor column mentioned above, for a task to run for a long time, it is very likely to cause scheduler task assignment uneven.
the yield of the worker
Because SQL Server is a preemptive dispatch, it is not possible to have the worker Occupy scheduler running to complete a task. If so, then the worker in Runnable will be hungry, which is not conducive to a lot of concurrency, but also violates the original intent of SQL OS scheduling.
Therefore, at the right time to give up scheduler is the key. The process by which the worker gives up the CPU makes it possible for other worker to run the call yield. Yield can be roughly divided into two kinds, one is called "natural yield", this way is the worker in the process of being locked or some resources blocked, at this time, the worker will give up scheduler to let other worker run. Another situation is that the worker did not encounter congestion, but after the time slice to the initiative to let SCHEDULER, this is the so-called "voluntarily yield", which is the origin of Sos_scheduler_yield wait type, The process by which a worker transfers from a running state to a waiting state is called a switching. One of the basic ideas of SQL OS is to do more switching to ensure high concurrency. Let's look at several common yield scenarios:
Voluntarily yield, based on time slices, probably makes the worker yield once every 4 seconds. This value can be seen through the sys.dm_os_schedulers quantum_length_us column.
Every 64K result set is sorted, do one yield.
Statement Complie, will do yield.
Every word in the batch is done, and it will be yield.
If the client can not take the data in time, the worker will also do yield.
Preemptive task scheduling in SQL Server OS
For some code, there are some preemption code for SQL Server. If you see the "preemptive_*" type of wait in the wait type, the code inside it is running in preemptive task scheduling mode. Such tasks include extended stored procedures, invoking Windows APIs, log growth (log fill 0). We know that collaborative task scheduling requires the task itself to yield, but such code is outside of SQL Server, and if it is allowed to run within the sandbox of the cooperative Task Scheduler, such code will always occupy scheduler if it is not yield. This is very dangerous.
Therefore, before entering preemption mode, you first need to give control of scheduler to the next worker in the runable queue. At this point, the code that is running in preemptive mode is no longer controlled by the SQL OS and is controlled by the Windows Task Scheduler system. So the lifecycle of a task, if added to the preemption task scheduling pattern, is shown in Figure 6 .
Figure 6. The complete lifecycle of a task
task scheduling for each scheduler
For each scheduler schedule, a simple model is shown in Figure 7 .
Figure 7. A Scheduler scheduling cycle model
Summary
SQL Server OS Abstracts a set of non preemptive task scheduling mechanisms on top of windows, reducing the context Switch. At the same time, there is a set of threads of its own yield mechanism, compared to Windows random preemption of the database within the thread, so that the thread itself to yield will significantly reduce the context Switch, thereby enhancing concurrency.