Original address: http://www.cnblogs.com/lyhabc/p/3367274.html
Before reading this article, you can read the following article first
SQL Server's unique task scheduling algorithm "Sqlos"
Task Scheduler for SQL Server Sqlos [go]
Translated from:
http://rusanu.com/2013/08/01/understanding-how-sql-server-executes-a-query/
Http://www.codeproject.com/Articles/630346/Understanding-how-SQL-Server-executes-a-query
I do not know which article to copy which article, regardless of him, I also stole their article, quack quack ...
I will use all my skill and combine the knowledge of SQL Server to translate this article.
I hope you have a lot of support O (∩_∩) O, in fact, I also stand on the shoulders of giants, hehe ~
Special note: In order to save space, the text will be deleted, the deletion of the parts are some unimportant part
Here thank Youdao Dictionary (I am not selling ads AH!!!) )
Official start
Connection methods and requests
If you are a developer and your program uses SQL Server to do the database
You'll want to know what actually happened when you used your program to execute a query.
I hope this article will help you write a better database application and help you gain a deeper understanding of the database performance issues you are experiencing
SQL Server is a platform for the C/S model. The only way to interact with the database is to send requests that contain database commands to the database server side.
The Protocol for client and database communication uses a protocol called TDS (tabular Data Sream)
Articles in the garden:
Tabular Data Stream Protocol TDS
TDS Protocol resolution (reprint)
If you use Microsoft's Network Monitor tool to crawl the networking packets between SQL Server and the client
You will see that the TDS protocol is used
In the Description column
tds:response,version=7.1 ...
tds:sqlbatch,version=7.1 .....
The four SSL connections are encrypted connections made before the client logs on to SQL Server (here, regardless of whether you have encrypted data transfer with SSL, SQL Server encrypts it before logging on
The user name and password sent by the user, and then the SSL certificate that you configured is used to encrypt the data between the client and SQL Server.
SQL Server encrypts the user name and password sent by the client (using SQL authentication is not using Windows authentication).
You can take a look at the log of SQL errorlog at the time of server startup
will see: A self-generated certificate was sccessfully loaded for encryption
By default, SQL Server automatically generates a certificate and uses the certificate to encrypt the connection to SQL Server when the client logs on to SQL Server, and does not encrypt the data that is connected/transmitted.
and SQL Server automatically generates the certificate. Each time SQL Server starts, it automatically generates a different certificate.
MSDN is described in this way: the Tabular Data Stream protocol enables applications to use one of the following several drivers that have implemented the TDS protocol
Driver to connect to the database, including:
The CLR managed SqlClient
OLE DB
Odbc
Jdbc
PHP Driver for SQL Server
FreeTDS implementation of Open source
When your application commands the database, it sends a request to the database through the TDS protocol.
The request itself can carry information in the following formats
(1) Batch processing request
This type of request will only contain a batch of tsql text that needs to be executed. This type of request cannot have parameters, however, the TSQL batch script
can contain definitions of local variables. This type of request is typically sent using the SqlClient driver,
When you use the SqlCommand object to invoke any one of the following statements, and no arguments are passed in
SqlCommand.ExecuteReader ()
SqlCommand.ExecuteNonQuery ()
Sqlcommand.executescalar ()
Sqlcommand.executexmlreader ()
When you monitor with SQL Profiler you will see a: Sql:batchstarting event type
(2) Remote Procedure Call request
This request type contains a stored procedure with several parameters.
When you monitor with SQL Profiler you will see a: Rpc:starting event type
(3) Bulk Load Bulk Load Request
A bulk load request is a special request that uses the BULK insert operator,
For example, using
Bcp. EXE tool (we often speak of bcp command)
BULK INSERT statement
IRowsetFastLoad OLE DB interface
SqlBulkCopy class in C #
Bulk load requests are different from other types of requests,
As the request passes through the TDS protocol to SQL Server and the transfer is not complete, SQL Server begins to perform the requested action.
(Generally, the entire requested packet is sent to SQL Server, and SQL Server considers the complete packet to start executing the request)
However, the bulk load request is not the same, the packet contains a large amount of data, which is attached to the request, if the entire request is sent to complete
SQL Server only starts to execute the request, do not know to wait what year and month???
This allows SQL Server to start executing the request and start consuming the data inserted in the data stream
Here is a picture of a long time ago, you can refer to the picture content for the current SQL Server is not necessarily correct
Connection Module (Modular)
Tasks and workers (Workers)
When a full TDS request arrives at the SQL Server database engine, SQL Server creates a task to process the request
To query all requests in the current SQL Server, you can use the sys.dm_exec_requests DMV view
Task (Tasks)
The tasks mentioned above will be created to process the request until the request is processed.
For example, if the request is a request of the type of a batch request, the task executes the entire SQL batch and is not solely responsible for executing a single SQL statement in the SQL batch
A separate SQL statement in the SQL batch does not create a new task (tasks).
Of course, a separate SQL statement in the SQL batch may be executed in parallel (typically using MAXDOP, or degree of Parallelism)
In this case, the task will regenerate the new subtask (sub-tasks) to execute this separate SQL statement in parallel.
If the request returns the complete result set required by the batch, and the result set has been taken away from the SQL Server's result set cache by the client
And you dispose of SqlDataReader in the C # code, you'll see your request in the DMV view of Sys.dm_os_tasks.
The task (tasks) used.
When a new request arrives at the server side of SQL Servers and the task is created to handle the request
If the task is in a suspended (PENDING) state at this point, SQL Server does not yet know the actual content of the request.
The created task must first execute the request, and the database engine assigns a worker to handle the request.
Worker (Workers)
Worker (Workers) is a thread of SQL Server line constructor, some workers (Workers)/worker threads in SQL Server
is created when initialized, while others (Workers) are created on demand, when the number created reaches max worker threads
This configuration value can no longer be created, displayed as 0, he is not to say that the creation of unlimited workers (Workers)
The configuration of my machine is core i3, dual core four thread, then if max worker threads is configured to 0
The largest number of workers (Workers) can reach 256
In fact, only the worker (Workers) actually executes the SQL code.
Workers (Workers) wait for tasks that have been sent into SQL Server's request (tasks)
From a suspended (PENDING) state to a functioning state, each worker (Workers) processes and executes only one task (tasks)
At this time, the worker (Workers) will remain in working condition and be occupied until his work is completed (task finishes)
If there are currently no available workers (Workers) to use for tasks that are in a pending state, then this task (tasks)
You can only wait until the tasks that are already executing/running are completed, and the worker (Workers) finishes processing a
Task (tasks) will continue to process the next pending task (tasks).
For a SQL batch request, the worker (Workers) handles the task that carries the SQL batch (tasks)
Each SQL statement inside the SQL batch is executed.
Some people will ask: SQL in a batch of SQL statements is not parallel execution?
(= = Requests the request and Task task "), a batch request comes in, and multiple workers process each SQL statement in the batch request,
This is obviously the concurrency SQL statement that is being processed by SQL Server.
Many people have this idea, which is actually wrong, and actually these SQL statements are executed serially , and these SQL statements are executed only by the
a separate thread (worker worker) to execute, and the thread (worker worker) executes an SQL statement before executing the next SQL statement.
When the SQL statement inside the SQL batch uses the parallel hint maxdop>1 to execute the SQL statement, this results in creating subtasks (sub-tasks),
Each sub-task (sub-tasks) is executed by the loop above: The task is created and then suspended .
The other (worker worker) has to deal with this subtask (sub-tasks)
You will see the current list of worker workers in SQL Server and their current status in the DMV view of Sys.dm_os_workers.
Explanation (parsing) and compilation (compilation)
Once a task starts executing a request, the first thing to do is to understand the contents of the request
In this step, SQL Server behaves more like a virtual machine (similar to a JVM) that is interpreted by the code: the TSQL code in the request will be interpreted individually
and an abstract syntax tree is generated to handle this request. The entire batch request is interpreted and compiled, and if an error occurs in this step,
SQL Server will give you a hint to compile/interpret the error, the request will be terminated and will not be executed, the task and worker (worker) will be released.
The freed worker will continue to process the next suspended task.
SQL language and TSQL (called tsql,oracle plsql in SQL Server) language is an advanced descriptive language
When a SQL statement is complicated, imagine that a SELECT statement is accompanied by multiple join
1 use [Gposdb] 2 GO 3 SELECT * FROM [dbo]. [Ct_append] As a 4 INNER JOIN 5 [dbo].[ Ct_fuelingdata] as B 6 on A.[vc_a_cardno]=b.[vc_fd_cardno] 7 INNER JOIN 8 [dbo].[ Ct_dis_fuelingdata] as D 9 on a.[vc_a_cardno]=d.[vc_fd_cardno]10 INNER JOIN one [dbo]. [Ct_inhousecard] As E12 on e.[vc_ic_cardno]=d.[vc_fd_cardno]13 INNER JOIN14 [dbo]. [Ct_outercard] As F15 on F.[vc_oc_cardno]=a.[vc_a_cardno]
A compiled TSQL batch does not produce executable code (executable code, like an executable binary EXE file),
This is more like a local CPU directive, or even a CLI directive like C # or a Java JVM bytecode
However, there is an execution plan (query plans) for accessing table data, which describes how to access tables and indexes,
How to search and locate the row data inside the table, how to do the data operation according to SQL statements in SQL batch processing.
For example, an execution plan would describe a data access path-"Access the index idx1on the T-table , and locate the row record with the keyword ' k ',
Finally, both column A and column B columns are returned.
Also: Developers often make a common mistake.
In a TSQL statement to write a lot of conditional selection, usually these conditions will be used in a WHERE clause with or
For example:[email protected] OR @parameter is NULL
This is something that must be avoided for developers.
At this point, the compilation must come up with a generic execution plan that will produce optimal results regardless of any parameters being placed into the execution plan.
Parameterization in TSQL (Dynamic Search Conditions)
For example, the following SQL statement:
1 SET STATISTICS profile ON2 GO3 INSERT into [dbo]. [Systempara] ([Paravalue], [Name], [Description]) 4 VALUES (' 2 ',---Paravalue-varchar (5) ' 3 ',--Name-varchar () 6 ' 4 ' --Description-varchar (7)
When you open the Set STATISTICS profile on switch, you will see in the Argument column and the Definedvalues column
SQL Server assigns the value entered 2,3,4 to the three variables in expr1004,expr1005,expr1006
And do some type conversions, Expr1004=convert_implicit (VARCHAR (50), [@1],0)
2 This value will be entered into the @1 variable and then assigned to Expr1004 by the type conversion.
Recordno This column is also, through the Getidentity ((277576027), (), NULL) function to obtain self-increment
and assign it to Expr1003.
So, in SQL Server's execution plan, you can imagine the following
1 INSERT into [dbo]. [Systempara] ([Recordno], [Paravalue], [Name], [Description]) 2 VALUES (expr1003,expr1004,expr1005,expr1006)
Assigns the actual value to the @1,@2,@3,@4 and assigns the value to expr1003,expr1004,expr1005,expr1006 by type conversion
expr1003= type conversion (@1)
expr1004= type conversion (@2)
expr1005= type conversion (@3)
expr1006= type conversion (@4)
Why does SQL Server not use the following execution plan directly?
1 INSERT into [dbo]. [Systempara] ([Recordno], [Paravalue], [Name], [Description]) 2 VALUES (1,2,3,4)
Also type conversion, parameter generation in this trouble, SQL Server is not sick???
This involves performing plan reuse, and if you use the above execution plan, the compilation time is very fast, but
If the value I inserted is: 9,8,6,7
1 INSERT into [dbo]. [Systempara] ([Recordno], [Paravalue], [Name], [Description]) 2 VALUES (9,8,6,7)
SQL Server cannot reuse the last execution plan and rebuild the execution plan, you say such efficiency ...
Optimization (optimization)
Just talking about choosing a data access Path (execution plan), now continue to the next step in the life cycle of a request: optimizing
In SQL Server, optimization means selecting the best data access path from multiple selection criteria.
Consider that if you have a simple join query involving two tables, each table has an additional index.
There are 4 optional execution options to access the data in the table
Because there are so many alternatives, query complexity is already high, and if the index in the table continues to increase, query complexity may grow exponentially
Plus join joins are three ways to join:nested loops join, merge join, hash join
As you can imagine, how important it is to optimize this noun in SQL Server, SQL Server uses a query optimizer to estimate how much time is spent in the middle, io,cpu
The query optimizer considers a variety of execution scenarios, and SQL Server makes every effort to evaluate it based on the cost of each execution scenario, and then chooses as much as possible one of the lowest-cost
Implementation of the programme. SQL Server first calculates how much overhead each of the various execution scenarios requires under the existing table data volume. In order to elect a least-cost execution plan,
SQL Server needs to know the amount of data for each table to be joined and the distribution of the data for each field in the table, which relies on statistical information ,
Because statistics are meant to be used to count these data. Another factor to consider is the CPU and memory consumption required for each execution scenario
Combined with all of these factors, Sqlsrver will work out a cost value in each of the execution scenarios.
SQL Server will select an execution plan with the lowest cost in these execution plans as execution
Let's see, SQL Server has to consider all the above factors, and it takes time to think about it, so why SQL Server
The execution plan needs to be cached in memory in order to continue using the execution plan in the future in order to save compilation time
The same requests go into SQL Server in the future, and these requests can find a compiled and optimized execution plan in the cache
They'll be able to skip the optimization phase of the query optimizer
It is important to note that the same request comes in SQL Server, regardless of whether there are any reusable execution plans in the cache, SQL Server needs
The SQL statement in the request is parsed , so I said above: just to save compilation time and not to save parsing/compiling time
Interpreting and compiling modules (modular)
Execution (execution)
Once the query optimizer chooses an execution plan, the request is ready to execute. The execution plan is translated into an actual execution tree
Each tree node is an operator, and all operators implement an abstract interface with 3 methods, open (), next (), Close ()
If you are a C # programmer or a Java programmer, it is not difficult to understand what is an interface, what is a method, what an abstract interface
There are relevant information on MSDN:Showplan logical operator and physical operator reference
A query plan is a tree (the execution tree) that consists of physical operators
logical operators
The logical operators describe the relational algebra operations used to process statements. In other words, a logical operator conceptually describes what needs to be done.
Physical operators
The physical operator implements the operation described by the logical operator. Each physical operator is an object or routine that performs an action. For example, some physical operators can access columns or rows in a table, index, or view. Other physical operators perform other operations, such as calculations, aggregations, data integrity checks, or joins. Physical operators have the overhead associated with them.
The physical operator initializes, collects data, and then shuts down. Specifically, the physical operator can respond to the following three method invocations:
Init (): The init () method initializes the physical operator and sets all required data structures. Although a physical operator typically receives only one Init () call, it can also receive many calls.
GetNext (): The GetNext () method causes the physical operator to obtain the first or subsequent row of the data. Physical operators can not receive GetNext () calls, or they can receive many calls.
The close (): Close () method causes the physical operator to perform some cleanup operations and then close. A physical operator receives only one Close () call.
The GetNext () method returns a data row whose number of calls is displayed as actualrows in the Showplan output generated using the set STATISTICS profile on or set STATISTICS XML on. For more information about these set options, see Set STATISTICS profile (Transact-SQL) and set STATISTICS XML (Transact-SQL).
The operator in this article actually refers to the physical operator: three methods refer to open () =init (), Next () =getnext (), Close () =close ()
Each physical operator is a three method that calls itself
During SQL Server execution of the request, the root node of the execution tree is repeatedly called open () and then repeated calls to next () until the value of false is returned
Finally call Close (). The operator of the root node of the tree calls the same operator of his child node in turn, and the child node calls the same operator of his child node in turn
has been called down. Leaf nodes in a tree are generally the physical operators that read table data or table indexes. The middle node of the execution tree is generally a number of operators that implement different data operations
For example: Filter table data, join joins, sort data. Those queries that use parallelism use a special operator called Exchange oprators (Exchange operation)
The exchange operator operator uses multithreading (tasks = workers) during execution, invoking each thread to execute the subtree's execution plan.
The output of these operators is then aggregated and the typical (multi-producer "-" one consumer pattern) is used in this process.
Refer to this article for Exchange oprators (Exchange Operations):bitmap operators in parallel processing and execution plans in SQL Server 2000
We can see the execution tree using SET STATISTICS profile, here are some of the names of the columns, not to mention in more detail, there is a lot of information on the Internet
NodeId: Tree node
Parent: Father Node
Physicalop: physical operator
LogicalOp: Logical operators
The execution model of this execution tree is not only applied to query, insert, delete, update execution is also executed using the execution tree.
Inserting records, deleting records, and updating records have the appropriate operators
A situation in which an execution tree has no subtrees
A situation in which an execution tree has subtrees
If the execution tree has a subtree, he executes it from the leaf node of the subtree and executes to the root node of the tree.
In particular, these operators also have a stop-and-continue behavior feature, meaning that unless their child node operators have absorbed all of their input, they can generate input
For example: the sort operator, which does not return any results when the next () function is initially called because his child nodes have not yet read all the data,
This is the time to stop executing the next () function (each operator will call the next function when it is created) until his child has read all the data.
To sort (continue calling the next () function), take out the result set and sort
If the data is already cached in memory, SQL Server does not need to go to the disk to fetch data, directly in the memory of the data, in the memory of this space,
SQL Server official term is called: Buffer pool
In memory, the official term for SQL Server is to cache the space in the execution plan: plan cache
Module of Execution (modular)
Results (Results)
After execution, Sqlerver returns the result set to the client application
When executing to the root node of the execution tree, the root node is usually responsible for writing the result set to the network buffer (buffers)
These result sets are then sent back to the client. A complete result set has not been created, and some of the results are first stored in intermediate storage (memory or disk)
Then piecemeal to the client, such as a SQL statement query results need to return 10 records, 3 records have been generated, can be returned to the client
SQL Server first places these 3 records in intermediate storage (memory or disk), or it can be called a network buffer, and so on the client to fetch the 3 records, and so on.
When the result set is returned to the client, SQL Server uses the network flow Control protocol.
If the client is not actively taking these result sets (for example, calling Sqldatareader.read ()). Will eventually cause the network flow control component to block
The result set send side and suspends execution of the query.
Only the network flow control component coordinates and alleviates the need for network resources (the network is not blocked), the query recovers, and the result set continues to be generated
Don't know if you've ever met a wait type: Async_network_io waiting
, the client two waits, in the Sqlsrver query will show the async_network_io type of wait
Interestingly, the output parameter returns, and the value of the output parameter is inserted into the network data stream that is returned to the client's result set.
When the request is complete, the output parameter value can only be written to the result set at the end of the query execution, which is why the output parameter value
The value of the output parameter can be checked only if all result sets are returned
Memory to be assigned during query execution (query execution Memories Grant)
Some operators require a fixed amount of memory to perform their work. Sort operator in order to sort requires memory to store data entered into the sort operator
Hash join and hash aggregation must build large hash tables to perform their work. The execution plan knows how much memory is required for those outstanding operators
Depending on the type of operator, the estimated row record, the operator must handle the size of the field that the statistic provides to his table.
The total memory required by those operators in the execution plan is often referred to as memory-giving .
Imagine that when very many concurrent queries are executed, because of the large number of expensive operators (these operators generally require a lot of memory, so called expensive)
Need to request memory, at the same time they can run out of computer memory.
To prevent this from happening, SQL Server uses something called a " resource semaphore ." This will ensure that the total memory allocation of the query being executed will not exceed
The sum of memory in the current computer. When the total memory allocation is running out of available memory on the current server, the queries that are executing must wait for those to be executed quickly.
Queries to release the memory they have.
You can query sys.dm_exec_query_memory_grants this DMV view to get the current memory allocation (requested memory, allocated memory)
When a query must wait for memory to be assigned/allocated, you can see the execution Warnings event type in SQL Profiler
Execution Warnings Event type indicates a memory allocation warning during SQL statement or stored procedure execution
This event type is capable of monitoring certain queries that must wait for one second or more of memory, or for queries that fail to get memory
In SQL Profiler, some memory-related event types
Exchange Spill Event Types
Sort Warnings Event Type: Insufficient memory required when sorting
Hash Warning Event Type
Related statements
1 SELECT * FROM Sys.dm_exec_query_resource_semaphores2 3 4 select * FROM sys.dm_exec_query_memory_grants
1 SELECT [session_id], 2 [request_id], 3 [start_time], 4 [status], 5 [command], 6 [Wait_type], 7 [text_size], 8 [language] , 9 [transaction_isolation_level],10 [row_count],11 [granted_query_memory],12] [executing_managed_code]13 from Sys. [Dm_exec_requests]
How do I take advantage of this information (how can I use all this information)
The above information may help you resolve performance troubleshooting problems (performance issues)
Once you understand that your client is sending multiple requests to sqlserver,sqlserver in creating multiple tasks (Task) to process
You send him a request, the performance of the puzzle can be easily solved: many times, your task is not executing (occupied CPU) is waiting
Each time you wait, SQL Server relies on internal waiting statistics to collect waiting information (wait for what and how long).
It's a great way to use the collected statistics to solve performance bottlenecks.
Enclosed are two complete graphs
Summarize
The text seems to be missing scheduler.
Scheduler
For each logical cpu,sqlserver there will be a scheduler corresponding to the CPU object on the SQL plane,
Only the worker who gets scheduler ownership can run on this logical CPU.
The translation has ended ~
If it is helpful to you, I hope to give you a recommendation O (∩_∩) o
Related articles:
SQL Server Performance Issues-Wait Resource_semaphore
How to know how long the TSQL statement has been running
SQL Server connection encryption (1)--SQL Server Connection encyption
If there is a wrong place, welcome everyone to shoot brick O (∩_∩) o
-----------------------------------------------------------------------------------------
2013-10-26 Supplement
About Time Statistics
1 SET STATISTICS time on 2 GO
1 SQL Server parse and compile time: 2 CPU time = 0 milliseconds, occupied time = 58 milliseconds. 3 4 SQL Server Execution time: 5 CPU time = 0 milliseconds, occupied time = 1 milliseconds.
I think the data statistics that SQL Server displays should be the insertion of time statistics code at the beginning and end of the parsing and compiling modules and executing modules.
To make a count of the time used.
About Time Statistics range
The red circle part of the figure I think is where the SQL Server team inserts the time statistics code
Parse and compile time: Regardless of the plan cache, from the start of the command parser to the left of the query optimizer end
Execution time: starting from the query executor, leave the query executor end
Some people will think that it should be the result set is stored in the network buffer or the result set real to the client's hand is the end of execution time
-------------------------------------------------------------------------
But I don't think so. The result set has been generated before reaching the network buffer , indicating that the query execution is complete.
Query execution complete meaning: All the results have been generated, not to say that the customer to 10 records, Sir, 3 Records, the 3 records into the network buffer to the client to take away
This statistic is unscientific, it should be 10 records have been generated (as standard), and before transmission to the network buffer
The client does not have to take away and when the result set arrives at the client, SQL Server does not need to be concerned because of various situations, such as: Network blocking
This cannot be counted as the execution time of SQL Server.
It is not possible for the SQL Server team to write the time statistics code on the client side, and the client is not part of the Sqlserver,sqlserver team.
Time statistics code written in the client's application ah???
I wonder what your opinion is???
Takes you to understand how SQL Server executes a query.