[Sorting]-UMS Internals of Relational Engine

Source: Internet
Author: User
Tags apc
SQL Server 6.5 uses Windows scheduling and management multithreading. Like other Windows applications, it uses Windows Standard APIs and does not use any hidden APIs, this makes the SQL Server worker thread identical to other multi-threaded Windows programs, without any special priority, equivalent to Windows scheduling. After SQL Server 7.0, it must be able to process thousands of concurrent users. However, the Windows Thread Scheduling Method restricts the efficiency of SQL Server in multi-thread processing, therefore, SQL Team decided to add its own Thread Scheduling Method to SQL Server 7.0. In this case, the UMS of SQL Server is born.

Preemptive Scheduling and Cooperative Scheduling)
In terms of task scheduling, Windows 95 and 98 use collaborative, Windows NT series, 2000, and so on, and use preemptible methods. In a collaborative manner, after the operating system assigns a processor to a process, the processor is completely controlled by the process itself, including when it is released for other processes. In this way, if the process that currently occupies the processor crashes and cannot release the processor, no other process can obtain the processor right, and the entire operating system will crash. Therefore, Windows 95 and 98 often crash, and the system does not respond. Only restart the system can solve this problem. In preemptible mode, the processor is completely under the process scheduling control of the operating system. The operating system divides the processor into Slice in time and assigns a time Slice to a process each time. If the time Slice ends, no matter whether the worker Process has completed its work, the operating system will suspend the process and distribute the subsequent time slices to other processes in the process priority order. In this way, as long as the operating system kernel has no bugs and the hardware driver does not go down, the system will never crash, because as long as the allocated time slice ends, the process scheduling of the operating system gets control of the processor. For example, after Windows 2000, process scheduling is performed in preemptible mode, which rarely causes the operating system to crash and no response due to applications. However, in this way, a process can be suspended at any time to wake up other processes. This frequent inter-process switchover occurs, especially when processes need to communicate and collaborate, many additional mechanisms and overhead are required to reduce the overall utilization of the processor.
SQL Server uses multithreading. In the preemptible mode, parallel threads (of course, parallel threads here refer to applications and users) in terms of time sequence, the application is uncertain and managed by the operating system. Data sharing between parallel threads, inter-thread communication, and collaboration all require various mechanisms. These mechanisms will reduce the efficiency of SQL Server Applications with high processing capacity and high real-time requirements. To avoid this, SQL Server UMS uses a collaborative approach to manage its own multithreading.

How to schedule UMS
Under the preemptible scheduling policy of the operating system, how does SQL Server avoid being scheduled by the operating system by multiple threads? In this regard, SQL Server uses a spoofing method. Windows considers that some threads are not active and they are in a permanent waiting state. For such threads, Windows does not consider scheduling them. With this mechanism, SQL Server ensures that there is only one active working thread at any time, and other threads are in the permanent waiting state, the operating system will not consider the multi-thread scheduling of SQL Server, and the multi-thread scheduling is completely under the control of UMS (note that the current active SQL Server worker thread is still scheduled by Windows, in this case, Windows considers SQL Server as a single-threaded application ). All SQL Server threads have a related event object. By passing in the INFINITE parameter to call the WaitForSingleObject method of the relevant event object, this thread can enter the permanent waiting state; by sending a signal to the relevant event object, the thread can be awakened to enter the execution state. When the active thread stops working, it will enter the permanent waiting state, and send signals to the event objects related to the subsequent threads to wake the subsequent threads into the active state. SQL Server uses this mechanism to ensure that there is only one active thread on each processor.
Compared with the operating system, SQL Server is more clear about what tasks each thread is running, how should tasks be coordinated, and how can the system's processing efficiency be maximized, therefore, the efficiency of SQL Server multithreading can be effectively improved under UMS scheduling.
This set of mechanisms for UMS to manage scheduling processes is called UMS Scheduler (UMS Scheduler ). When SQL Server is started, an UMS scheduler is created for each processor. For example, in a four-processor environment, SQL Server may create four UMS scheduler at startup. If the maximum working thread is 255, each UMS scheduler has a maximum of 64 threads that can be scheduled.
Of course, the UMS of SQL Server is not completely rewritten, replacing the scheduling algorithm of Windows. UMS only schedules the SQL Server thread according to its own needs, more effective management of asynchronous I/O. Windows-based thread management is equivalent to implementing a user-defined multi-thread Scheduler for SQL Server. UMS can use the thread mode or the fiber mode introduced in Windows NT 4.0. The basic management of threads and fiber is implemented using Windows.

UMS Scheduler
Each UMS scheduler consists of five components: a worker list, a runnable list, a waiter list, an I/O list, and a timer list.

The Worker List
Worker refers to the SQL Server Worker process, that is, the threads or fibers of SQL Server. The workers in the worker list is the currently available worker process.

The Connection Process
The connection of each client is allocated to an UMS scheduler. The allocation method is very simple. When there is a new connection request, which UMS scheduler has the least connection, the new connection will be allocated to this UMS scheduler. Once this connection is assigned, it is always managed by the specified UMS scheduler. Even if this UMS scheduler is very busy, there are other idle schedulers in the system.
For example, if the server has two CPUs and the application creates four connections, the possible cause is, the two connections allocated to the UMS scheduler of cpu a require 90% CPU usage, while the two connections allocated to the UMS scheduler of cpu B only require 10% CPU usage. In this case, you will find that one of the CPUs remains at a very high usage, while the other CPUs are basically idle. SQL Server cannot handle this type of load balancing. We can only balance the load at the application connection layer.
After a Connection is assigned to a scheduler, if the Worker List contains a worker available, a request will be taken to execute the connection. If the worker List does not have a Worker available, if the number of workers in the system has not reached the set value of the maximum number of workers, the scheduler creates a new worker to execute the request. If the number of workers has reached the maximum number, the request will be placed in the waiter list, waiting for the available worker to be allocated in FIFO mode.

Work Requests
Each work requests will be handed over to a worker for execution. When a worker executes a task, the worker does not accept other work requests until the task is completed. After the current work request is processed, the worker accepts the next work request, or enters the idle loop to wait for the next work request. A worker completes a work request from beginning to end. This processing method does not require frequent switching between workers for communication.
After a new work request is submitted, if a worker is executing the task, another worker is assigned to the request and enters the runnable list for execution. When many concurrent requests exist, SQL Server can easily reach the maximum number of workers. In this case, the new connection can only be accepted when other workers are idle.
If SQL Server is working in thread mode, a worker will be destroyed after being idle for 15 minutes. Of course, the system will ensure the minimum number of workers.

The "Runnable" List
Workers in the Runnable list are Worker Process queues that have been assigned a work request and are waiting for execution.
Workers in the Runnabel list are in a permanent waiting state and are not scheduled by windows. They are active only after the event object receives the signal, so that they can be scheduled by windows. So who sends signals to these workers?
As you can see from the previous introduction to the cooperative scheduling method of the operating system, if a task takes a long time in a collaborative manner, the computer cannot do anything else during this period, you can only wait for the process to finish processing. To solve this problem, the scheduler can intercept some other calls, such as system interruptions and operating system methods, during processing of each process, complete some necessary tasks, and then return the control to the process. UMS provides many such calling methods. The currently executed worker will call the method provided by UMS as needed based on its own running status to temporarily release the control, for other workers to complete some necessary operations. When these UMS methods are called, they will check the workers in the runnable list and send a signal to a worker-related event object to wake up the worker and start execution.
Although the UMS scheduler thread will be created after SQL Server is started, in fact, there is no dedicated thread in UMS's working process to be responsible for overall scheduling and all scheduling control, all the methods provided by UMS are called by each worker. Under this mechanism, global scheduling tasks are implemented.
Therefore, if you want to answer the previous question, the worker in the permanent waiting state, it may be awakened by a worker in the execution by calling the UMS method (during a long working process or at the end of the current job ), the UMS method is also called during or after the worker is executed to wake up other workers.

The Waiter List
The workers list in the Waiter list is the workers waiting for a resource. When an UMS worker requests a resource that is occupied by other workers, the worker is placed in the waiter list. When a worker that occupies a resource releases the resource, it is responsible for checking the waiter list and transferring the worker waiting for the resource to the runnable list. After the worker of the current job is executed, or when the UMS method is called halfway to release the control, the worker transferred to the runnable list may be awakened to start execution. For example, if the SQL Server has reached the maximum number of worker processes, and a new work request is submitted, the work requests will be placed in the waiter list, because no worker is available to them.

The I/O List
Maintain the UMS asynchronous I/O Request Object list in the I/O list.
Windows 98/ME does not support asynchronous I/O, and Windows NT series start to support. Asynchronous I/O is implemented by using Asynchronous Procedure CALS (APCs, APCs enables programs and system components to execute code in the context of a specific thread (that is, the kernel mode and user mode of the operating system) and transmit data in the address space of different processes. In asynchronous I/O, the I/O thread asynchronously reads data from the disk to the memory, transmits the read data to the thread that sends an asynchronous I/O Request and allows the I/O Request thread to process subsequent tasks. When a thread requests asynchronous I/O operations, it needs to perform asynchronous I/O initialization, create and set an OVERLAPPED structure, and pass the structure to the ReadFile/ReadFileEx or WriteFile/WriteFileEx function. During asynchronous operations, the I/O thread sets the State member inside the structure to STATUS_PENDING to indicate that the asynchronous I/O operation is in progress, the I/O Request thread can call HasOverlappedIoCompleted to check whether the asynchronous I/O operation is complete. The OVERLAPPED structure contains an APC function (created and set to the OVERLAPPED structure during asynchronous I/O initialization ), this function copies the results of I/O operations and related status information from the system memory space to the virtual address space of the I/O Request thread. After the asynchronous I/O operation is complete, the I/O Request thread queries HasOverlappedIoCompleted to true, and calls the APC function to convert the read data and status information to its own virtual address space, for subsequent processing.
The UMS asynchronous I/O Request object encapsulates the I/O Request Information and OVERLAPPED structure.
When UMS receives an I/O request, if it is Win98/ME, It initializes a synchronous I/O operation. If it is a Windows NT series, it initializes an asynchronous I/O operation. Then, call the asynchronous I/O Method of the operating system to put the asynchronous I/O Request object into the I/O list. Check the I/O list when any worker stops or calls the UMS method temporarily. It traverses the I/O list and uses the OVERLAPPED Member of the asynchronous I/O Request object as the parameter to call the HasOverlappedIoCompleted method. If an asynchronous I/O operation has ended, the request object is removed from the I/O list and the APC function ended with the asynchronous I/O request is called. We can see from this that the asynchronous I/O operation Of UMS is different from that of the operating system, the I/O Request thread monitors its status. After the I/O operation is completed, the I/O Request thread still calls the APC function. During the UMS and asynchronous I/O operations, the current worker is responsible for monitoring the status. After the I/O operation is complete, it is also the worker that detects the end signal. The APC function is called through the UMS method, subsequent operations for processing asynchronous I/O requests. In the Win98/ME environment, because I/O operations are synchronized, the I/O Request object will not be put into the I/O list, the I/O Request thread calls the Win32 I/o api and immediately calls the I/O request termination function after the I/O API returns.

The Timer List
Timer list maintains the Timer request. For example, if a worker needs to wait for a specified time for a resource, the worker will be placed in the timer list. When the current worker process calls the UMS method to interrupt, check the timer list after I/O list is processed. If a request exceeds the specified wait time, the worker process removes the worker from the timer list and places it in the runnable list.

The Idle Loop
The UMS method is called after the current worker process ends. If runnable list and I/O list are empty, timer list is empty, or the request has not reached the execution time, the idle loop is entered.
An Idle loop method: When the UMS scheduler is created, a windows event object is created and assigned to the UMS scheduler. If the current working process needs to enter the idle loop, it will use the event object of the UMS scheduler, and a time interval value, call the WaitForSingleObject method. In this way, when the time interval reaches, the operating system will wake up the event object of the UMS scheduler to enter the execution status, so that UMS starts to perform various checks according to the UMS mechanism.
Another method is to wake the UMS scheduler from the idle loop and execute the end APC function of asynchronous I/O after the asynchronous I/O operation is completed. Therefore, during the idle loop, either the asynchronous I/O operation is terminated or the wait interval is reached. when either of the two events occurs, will wake the UMS scheduler from the idle loop.
UMS's idle loop method ensures that UMS does not occupy any CPU resources during idle time.

Going Preemtive
In some cases, UMS needs to work in the preemptible mode of the operating system, such as extended stored procedures. In the UMS collaborative mode, the worker calls the UMS method at a specific point or interval to ensure the SQL Server runs smoothly. The development of extended stored procedures cannot meet such requirements. ODS APIs do not provide related functions using the UMS mechanism. Therefore, the execution of extended stored procedures is scheduled by the operating system and is not managed by UMS. The worker that executes the extended stored procedure is ignored by UMS. Therefore, multiple UMS workers run on the same processor when the extended stored procedure is executed.
If the extended stored procedure is improperly developed, a fatal error may cause the SQL Server process to crash.
In addition, the Linked server and distributed query processes are executed under the scheduling management of the operating system, just like the extended storage process.

Fiber Mode
Windows NT supports fiber mode. Fiber can be seen as a lightweight thread. In the fiber mode, multiple fiber share one thread. This processing method utilizes the advantages of multithreading, on the other hand, It also avoids some additional overhead for inter-thread context switching to a certain extent.
When SQL Server is working in fiber mode, a worker is a fiber, so multiple workers share one thread. Some operations in this mode are different from those in the thread mode, such as extending the stored procedure and using the linked server. In fiber mode, multiple workers share one thread, while a separate thread is required for execution of extended stored procedures and is scheduled by windows. If the worker is still processed in normal fiber mode, when the extended storage process is executed, other fiber related to this thread will be blocked. Therefore, in fiber mode, SQL Server uses a hidden scheduler (a scheduler that cannot be queried by DBCC SQLPERF) to separately process extended stored procedures and other operations.

Hide Scheduler
Operations such as backup and recovery require a large number of asynchronous I/O operations, which take a long time and consume a large amount of resources. If UMS processes these operations properly, the performance of all workers related to these UMS schedulers will be affected.
SQL Server creates some hidden schedulers for processing these types of operations. In this way, when SQL Server performs these operations, the operating system can coordinate among multiple processors, thus avoiding blocks of individual UMS schedulers.
Dbcc sqlperf (umsstats)
Unpublished commands are used to view information about UMS schedulers, including the total number of users of related schedulers, the number of workers in the workers list, and so on.
 
Reference
Inside the SQL Server 2000 User Mode Scheduler
Microsoft Windows Internals-Fouth Edition

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.