Takes you to understand how SQL Server executes a query.

Source: Internet
Author: User
Tags bulk insert joins logical operators ole ssl certificate

Original address: http://www.cnblogs.com/lyhabc/p/3367274.html

Before reading this article, you can read the following article first

SQL Server's unique task scheduling algorithm "Sqlos"

Task Scheduler for SQL Server Sqlos [go]

Translated from:

http://rusanu.com/2013/08/01/understanding-how-sql-server-executes-a-query/

Http://www.codeproject.com/Articles/630346/Understanding-how-SQL-Server-executes-a-query

I do not know which article to copy which article, regardless of him, I also stole their article, quack quack ...

I will use all my skill and combine the knowledge of SQL Server to translate this article.

I hope you have a lot of support O (∩_∩) O, in fact, I also stand on the shoulders of giants, hehe ~

Special note: In order to save space, the text will be deleted, the deletion of the parts are some unimportant part

Here thank Youdao Dictionary (I am not selling ads AH!!!) )

Official start

Connection methods and requests

If you are a developer and your program uses SQL Server to do the database

You'll want to know what actually happened when you used your program to execute a query.

I hope this article will help you write a better database application and help you gain a deeper understanding of the database performance issues you are experiencing

SQL Server is a platform for the C/S model. The only way to interact with the database is to send requests that contain database commands to the database server side.

The Protocol for client and database communication uses a protocol called TDS (tabular Data Sream)

Articles in the garden:

Tabular Data Stream Protocol TDS

TDS Protocol resolution (reprint)

If you use Microsoft's Network Monitor tool to crawl the networking packets between SQL Server and the client

You will see that the TDS protocol is used

In the Description column

tds:response,version=7.1 ...

tds:sqlbatch,version=7.1 .....

The four SSL connections are encrypted connections made before the client logs on to SQL Server (here, regardless of whether you have encrypted data transfer with SSL, SQL Server encrypts it before logging on

The user name and password sent by the user, and then the SSL certificate that you configured is used to encrypt the data between the client and SQL Server.

SQL Server encrypts the user name and password sent by the client (using SQL authentication is not using Windows authentication).

You can take a look at the log of SQL errorlog at the time of server startup

will see: A self-generated certificate was sccessfully loaded for encryption

By default, SQL Server automatically generates a certificate and uses the certificate to encrypt the connection to SQL Server when the client logs on to SQL Server, and does not encrypt the data that is connected/transmitted.

and SQL Server automatically generates the certificate. Each time SQL Server starts, it automatically generates a different certificate.

MSDN is described in this way: the Tabular Data Stream protocol enables applications to use one of the following several drivers that have implemented the TDS protocol

Driver to connect to the database, including:

The CLR managed SqlClient

OLE DB

Odbc

Jdbc

PHP Driver for SQL Server

FreeTDS implementation of Open source

When your application commands the database, it sends a request to the database through the TDS protocol.

The request itself can carry information in the following formats

(1) Batch processing request

This type of request will only contain a batch of tsql text that needs to be executed. This type of request cannot have parameters, however, the TSQL batch script

can contain definitions of local variables. This type of request is typically sent using the SqlClient driver,

When you use the SqlCommand object to invoke any one of the following statements, and no arguments are passed in

SqlCommand.ExecuteReader ()

SqlCommand.ExecuteNonQuery ()

Sqlcommand.executescalar ()

Sqlcommand.executexmlreader ()

When you monitor with SQL Profiler you will see a: Sql:batchstarting event type

(2) Remote Procedure Call request

This request type contains a stored procedure with several parameters.

When you monitor with SQL Profiler you will see a: Rpc:starting event type

(3) Bulk Load Bulk Load Request

A bulk load request is a special request that uses the BULK insert operator,

For example, using

Bcp. EXE tool (we often speak of bcp command)

BULK INSERT statement

IRowsetFastLoad OLE DB interface

SqlBulkCopy class in C #

Bulk load requests are different from other types of requests,

As the request passes through the TDS protocol to SQL Server and the transfer is not complete, SQL Server begins to perform the requested action.

(Generally, the entire requested packet is sent to SQL Server, and SQL Server considers the complete packet to start executing the request)

However, the bulk load request is not the same, the packet contains a large amount of data, which is attached to the request, if the entire request is sent to complete

SQL Server only starts to execute the request, do not know to wait what year and month???

This allows SQL Server to start executing the request and start consuming the data inserted in the data stream

Here is a picture of a long time ago, you can refer to the picture content for the current SQL Server is not necessarily correct

Connection Module (Modular)

Tasks and workers (Workers)

When a full TDS request arrives at the SQL Server database engine, SQL Server creates a task to process the request

To query all requests in the current SQL Server, you can use the sys.dm_exec_requests DMV view

Task (Tasks)

The tasks mentioned above will be created to process the request until the request is processed.

For example, if the request is a request of the type of a batch request, the task executes the entire SQL batch and is not solely responsible for executing a single SQL statement in the SQL batch

A separate SQL statement in the SQL batch does not create a new task (tasks).

Of course, a separate SQL statement in the SQL batch may be executed in parallel (typically using MAXDOP, or degree of Parallelism)

In this case, the task will regenerate the new subtask (sub-tasks) to execute this separate SQL statement in parallel.

If the request returns the complete result set required by the batch, and the result set has been taken away from the SQL Server's result set cache by the client

And you dispose of SqlDataReader in the C # code, you'll see your request in the DMV view of Sys.dm_os_tasks.

The task (tasks) used.

When a new request arrives at the server side of SQL Servers and the task is created to handle the request

If the task is in a suspended (PENDING) state at this point, SQL Server does not yet know the actual content of the request.

The created task must first execute the request, and the database engine assigns a worker to handle the request.

Worker (Workers)

Worker (Workers) is a thread of SQL Server line constructor, some workers (Workers)/worker threads in SQL Server

is created when initialized, while others (Workers) are created on demand, when the number created reaches max worker threads

This configuration value can no longer be created, displayed as 0, he is not to say that the creation of unlimited workers (Workers)

The configuration of my machine is core i3, dual core four thread, then if max worker threads is configured to 0

The largest number of workers (Workers) can reach 256

In fact, only the worker (Workers) actually executes the SQL code.

Workers (Workers) wait for tasks that have been sent into SQL Server's request (tasks)

From a suspended (PENDING) state to a functioning state, each worker (Workers) processes and executes only one task (tasks)

At this time, the worker (Workers) will remain in working condition and be occupied until his work is completed (task finishes)

If there are currently no available workers (Workers) to use for tasks that are in a pending state, then this task (tasks)

You can only wait until the tasks that are already executing/running are completed, and the worker (Workers) finishes processing a

Task (tasks) will continue to process the next pending task (tasks).

For a SQL batch request, the worker (Workers) handles the task that carries the SQL batch (tasks)

Each SQL statement inside the SQL batch is executed.

Some people will ask: SQL in a batch of SQL statements is not parallel execution?

(= = Requests the request and Task task "), a batch request comes in, and multiple workers process each SQL statement in the batch request,

This is obviously the concurrency SQL statement that is being processed by SQL Server.

Many people have this idea, which is actually wrong, and actually these SQL statements are executed serially , and these SQL statements are executed only by the

a separate thread (worker worker) to execute, and the thread (worker worker) executes an SQL statement before executing the next SQL statement.

When the SQL statement inside the SQL batch uses the parallel hint maxdop>1 to execute the SQL statement, this results in creating subtasks (sub-tasks),

Each sub-task (sub-tasks) is executed by the loop above: The task is created and then suspended .

The other (worker worker) has to deal with this subtask (sub-tasks)

You will see the current list of worker workers in SQL Server and their current status in the DMV view of Sys.dm_os_workers.

Explanation (parsing) and compilation (compilation)

Once a task starts executing a request, the first thing to do is to understand the contents of the request

In this step, SQL Server behaves more like a virtual machine (similar to a JVM) that is interpreted by the code: the TSQL code in the request will be interpreted individually

and an abstract syntax tree is generated to handle this request. The entire batch request is interpreted and compiled, and if an error occurs in this step,

SQL Server will give you a hint to compile/interpret the error, the request will be terminated and will not be executed, the task and worker (worker) will be released.

The freed worker will continue to process the next suspended task.

SQL language and TSQL (called tsql,oracle plsql in SQL Server) language is an advanced descriptive language

When a SQL statement is complicated, imagine that a SELECT statement is accompanied by multiple join

1 use [Gposdb] 2 GO 3 SELECT * FROM [dbo]. [Ct_append] As a 4 INNER JOIN  5 [dbo].[ Ct_fuelingdata] as B  6 on A.[vc_a_cardno]=b.[vc_fd_cardno] 7 INNER JOIN 8 [dbo].[ Ct_dis_fuelingdata] as D 9 on a.[vc_a_cardno]=d.[vc_fd_cardno]10 INNER JOIN one [dbo]. [Ct_inhousecard] As E12 on e.[vc_ic_cardno]=d.[vc_fd_cardno]13 INNER JOIN14 [dbo]. [Ct_outercard] As F15 on F.[vc_oc_cardno]=a.[vc_a_cardno]


A compiled TSQL batch does not produce executable code (executable code, like an executable binary EXE file),

This is more like a local CPU directive, or even a CLI directive like C # or a Java JVM bytecode

However, there is an execution plan (query plans) for accessing table data, which describes how to access tables and indexes,

How to search and locate the row data inside the table, how to do the data operation according to SQL statements in SQL batch processing.

For example, an execution plan would describe a data access path-"Access the index idx1on the T-table , and locate the row record with the keyword ' k ',

Finally, both column A and column B columns are returned.

Also: Developers often make a common mistake.

In a TSQL statement to write a lot of conditional selection, usually these conditions will be used in a WHERE clause with or

For example:[email protected] OR @parameter is NULL

This is something that must be avoided for developers.

At this point, the compilation must come up with a generic execution plan that will produce optimal results regardless of any parameters being placed into the execution plan.

Parameterization in TSQL (Dynamic Search Conditions)

For example, the following SQL statement:

1 SET STATISTICS profile ON2 GO3 INSERT into [dbo]. [Systempara] ([Paravalue], [Name], [Description]) 4 VALUES  (' 2 ',---Paravalue-varchar (5)           ' 3 ',--Name-varchar () 6           ' 4 '  --Description-varchar (7)           

When you open the Set STATISTICS profile on switch, you will see in the Argument column and the Definedvalues column

SQL Server assigns the value entered 2,3,4 to the three variables in expr1004,expr1005,expr1006

And do some type conversions, Expr1004=convert_implicit (VARCHAR (50), [@1],0)

2 This value will be entered into the @1 variable and then assigned to Expr1004 by the type conversion.

Recordno This column is also, through the Getidentity ((277576027), (), NULL) function to obtain self-increment

and assign it to Expr1003.

So, in SQL Server's execution plan, you can imagine the following

1 INSERT into [dbo]. [Systempara] ([Recordno], [Paravalue], [Name], [Description]) 2 VALUES (expr1003,expr1004,expr1005,expr1006)

Assigns the actual value to the @1,@2,@3,@4 and assigns the value to expr1003,expr1004,expr1005,expr1006 by type conversion

expr1003= type conversion (@1)

expr1004= type conversion (@2)

expr1005= type conversion (@3)

expr1006= type conversion (@4)

Why does SQL Server not use the following execution plan directly?

1 INSERT into [dbo]. [Systempara] ([Recordno], [Paravalue], [Name], [Description]) 2 VALUES (1,2,3,4)

Also type conversion, parameter generation in this trouble, SQL Server is not sick???

This involves performing plan reuse, and if you use the above execution plan, the compilation time is very fast, but

If the value I inserted is: 9,8,6,7

1 INSERT into [dbo]. [Systempara] ([Recordno], [Paravalue], [Name], [Description]) 2 VALUES (9,8,6,7)

SQL Server cannot reuse the last execution plan and rebuild the execution plan, you say such efficiency ...

Optimization (optimization)

Just talking about choosing a data access Path (execution plan), now continue to the next step in the life cycle of a request: optimizing

In SQL Server, optimization means selecting the best data access path from multiple selection criteria.

Consider that if you have a simple join query involving two tables, each table has an additional index.

There are 4 optional execution options to access the data in the table

Because there are so many alternatives, query complexity is already high, and if the index in the table continues to increase, query complexity may grow exponentially

Plus join joins are three ways to join:nested loops join, merge join, hash join

As you can imagine, how important it is to optimize this noun in SQL Server, SQL Server uses a query optimizer to estimate how much time is spent in the middle, io,cpu

The query optimizer considers a variety of execution scenarios, and SQL Server makes every effort to evaluate it based on the cost of each execution scenario, and then chooses as much as possible one of the lowest-cost

Implementation of the programme. SQL Server first calculates how much overhead each of the various execution scenarios requires under the existing table data volume. In order to elect a least-cost execution plan,

SQL Server needs to know the amount of data for each table to be joined and the distribution of the data for each field in the table, which relies on statistical information ,

Because statistics are meant to be used to count these data. Another factor to consider is the CPU and memory consumption required for each execution scenario

Combined with all of these factors, Sqlsrver will work out a cost value in each of the execution scenarios.

SQL Server will select an execution plan with the lowest cost in these execution plans as execution

Let's see, SQL Server has to consider all the above factors, and it takes time to think about it, so why SQL Server

The execution plan needs to be cached in memory in order to continue using the execution plan in the future in order to save compilation time

The same requests go into SQL Server in the future, and these requests can find a compiled and optimized execution plan in the cache

They'll be able to skip the optimization phase of the query optimizer

It is important to note that the same request comes in SQL Server, regardless of whether there are any reusable execution plans in the cache, SQL Server needs

The SQL statement in the request is parsed , so I said above: just to save compilation time and not to save parsing/compiling time

Interpreting and compiling modules (modular)

Execution (execution)

Once the query optimizer chooses an execution plan, the request is ready to execute. The execution plan is translated into an actual execution tree

Each tree node is an operator, and all operators implement an abstract interface with 3 methods, open (), next (), Close ()

If you are a C # programmer or a Java programmer, it is not difficult to understand what is an interface, what is a method, what an abstract interface

There are relevant information on MSDN:Showplan logical operator and physical operator reference

A query plan is a tree (the execution tree) that consists of physical operators


logical operators
The logical operators describe the relational algebra operations used to process statements. In other words, a logical operator conceptually describes what needs to be done.


Physical operators
The physical operator implements the operation described by the logical operator. Each physical operator is an object or routine that performs an action. For example, some physical operators can access columns or rows in a table, index, or view. Other physical operators perform other operations, such as calculations, aggregations, data integrity checks, or joins. Physical operators have the overhead associated with them.
The physical operator initializes, collects data, and then shuts down. Specifically, the physical operator can respond to the following three method invocations:
Init (): The init () method initializes the physical operator and sets all required data structures. Although a physical operator typically receives only one Init () call, it can also receive many calls.
GetNext (): The GetNext () method causes the physical operator to obtain the first or subsequent row of the data. Physical operators can not receive GetNext () calls, or they can receive many calls.
The close (): Close () method causes the physical operator to perform some cleanup operations and then close. A physical operator receives only one Close () call.
The GetNext () method returns a data row whose number of calls is displayed as actualrows in the Showplan output generated using the set STATISTICS profile on or set STATISTICS XML on. For more information about these set options, see Set STATISTICS profile (Transact-SQL) and set STATISTICS XML (Transact-SQL).

The operator in this article actually refers to the physical operator: three methods refer to open () =init (), Next () =getnext (), Close () =close ()

Each physical operator is a three method that calls itself

During SQL Server execution of the request, the root node of the execution tree is repeatedly called open () and then repeated calls to next () until the value of false is returned

Finally call Close (). The operator of the root node of the tree calls the same operator of his child node in turn, and the child node calls the same operator of his child node in turn

has been called down. Leaf nodes in a tree are generally the physical operators that read table data or table indexes. The middle node of the execution tree is generally a number of operators that implement different data operations

For example: Filter table data, join joins, sort data. Those queries that use parallelism use a special operator called Exchange oprators (Exchange operation)

The exchange operator operator uses multithreading (tasks = workers) during execution, invoking each thread to execute the subtree's execution plan.

The output of these operators is then aggregated and the typical (multi-producer "-" one consumer pattern) is used in this process.

Refer to this article for Exchange oprators (Exchange Operations):bitmap operators in parallel processing and execution plans in SQL Server 2000

We can see the execution tree using SET STATISTICS profile, here are some of the names of the columns, not to mention in more detail, there is a lot of information on the Internet

NodeId: Tree node

Parent: Father Node

Physicalop: physical operator

LogicalOp: Logical operators

The execution model of this execution tree is not only applied to query, insert, delete, update execution is also executed using the execution tree.

Inserting records, deleting records, and updating records have the appropriate operators

A situation in which an execution tree has no subtrees

A situation in which an execution tree has subtrees

If the execution tree has a subtree, he executes it from the leaf node of the subtree and executes to the root node of the tree.

In particular, these operators also have a stop-and-continue behavior feature, meaning that unless their child node operators have absorbed all of their input, they can generate input

For example: the sort operator, which does not return any results when the next () function is initially called because his child nodes have not yet read all the data,

This is the time to stop executing the next () function (each operator will call the next function when it is created) until his child has read all the data.

To sort (continue calling the next () function), take out the result set and sort

If the data is already cached in memory, SQL Server does not need to go to the disk to fetch data, directly in the memory of the data, in the memory of this space,

SQL Server official term is called: Buffer pool

In memory, the official term for SQL Server is to cache the space in the execution plan: plan cache

Module of Execution (modular)

Results (Results)

After execution, Sqlerver returns the result set to the client application

When executing to the root node of the execution tree, the root node is usually responsible for writing the result set to the network buffer (buffers)

These result sets are then sent back to the client. A complete result set has not been created, and some of the results are first stored in intermediate storage (memory or disk)

Then piecemeal to the client, such as a SQL statement query results need to return 10 records, 3 records have been generated, can be returned to the client

SQL Server first places these 3 records in intermediate storage (memory or disk), or it can be called a network buffer, and so on the client to fetch the 3 records, and so on.

When the result set is returned to the client, SQL Server uses the network flow Control protocol.

If the client is not actively taking these result sets (for example, calling Sqldatareader.read ()). Will eventually cause the network flow control component to block

The result set send side and suspends execution of the query.

Only the network flow control component coordinates and alleviates the need for network resources (the network is not blocked), the query recovers, and the result set continues to be generated

Don't know if you've ever met a wait type: Async_network_io waiting

, the client two waits, in the Sqlsrver query will show the async_network_io type of wait

Interestingly, the output parameter returns, and the value of the output parameter is inserted into the network data stream that is returned to the client's result set.

When the request is complete, the output parameter value can only be written to the result set at the end of the query execution, which is why the output parameter value

The value of the output parameter can be checked only if all result sets are returned

Memory to be assigned during query execution (query execution Memories Grant)

Some operators require a fixed amount of memory to perform their work. Sort operator in order to sort requires memory to store data entered into the sort operator

Hash join and hash aggregation must build large hash tables to perform their work. The execution plan knows how much memory is required for those outstanding operators

Depending on the type of operator, the estimated row record, the operator must handle the size of the field that the statistic provides to his table.

The total memory required by those operators in the execution plan is often referred to as memory-giving .

Imagine that when very many concurrent queries are executed, because of the large number of expensive operators (these operators generally require a lot of memory, so called expensive)

Need to request memory, at the same time they can run out of computer memory.

To prevent this from happening, SQL Server uses something called a " resource semaphore ." This will ensure that the total memory allocation of the query being executed will not exceed

The sum of memory in the current computer. When the total memory allocation is running out of available memory on the current server, the queries that are executing must wait for those to be executed quickly.

Queries to release the memory they have.

You can query sys.dm_exec_query_memory_grants this DMV view to get the current memory allocation (requested memory, allocated memory)

When a query must wait for memory to be assigned/allocated, you can see the execution Warnings event type in SQL Profiler

Execution Warnings Event type indicates a memory allocation warning during SQL statement or stored procedure execution

This event type is capable of monitoring certain queries that must wait for one second or more of memory, or for queries that fail to get memory

In SQL Profiler, some memory-related event types

Exchange Spill Event Types

Sort Warnings Event Type: Insufficient memory required when sorting

Hash Warning Event Type

Related statements

1 SELECT * FROM Sys.dm_exec_query_resource_semaphores2 3 4 select * FROM sys.dm_exec_query_memory_grants

1 SELECT [session_id], 2 [request_id], 3 [start_time], 4 [status], 5 [command], 6 [Wait_type], 7 [text_size], 8 [language] , 9 [transaction_isolation_level],10 [row_count],11 [granted_query_memory],12] [executing_managed_code]13 from Sys. [Dm_exec_requests]

How do I take advantage of this information (how can I use all this information)

The above information may help you resolve performance troubleshooting problems (performance issues)

Once you understand that your client is sending multiple requests to sqlserver,sqlserver in creating multiple tasks (Task) to process

You send him a request, the performance of the puzzle can be easily solved: many times, your task is not executing (occupied CPU) is waiting

Each time you wait, SQL Server relies on internal waiting statistics to collect waiting information (wait for what and how long).

It's a great way to use the collected statistics to solve performance bottlenecks.

Enclosed are two complete graphs

Summarize

The text seems to be missing scheduler.

Scheduler

For each logical cpu,sqlserver there will be a scheduler corresponding to the CPU object on the SQL plane,

Only the worker who gets scheduler ownership can run on this logical CPU.

The translation has ended ~

If it is helpful to you, I hope to give you a recommendation O (∩_∩) o

Related articles:

SQL Server Performance Issues-Wait Resource_semaphore

How to know how long the TSQL statement has been running

SQL Server connection encryption (1)--SQL Server Connection encyption

If there is a wrong place, welcome everyone to shoot brick O (∩_∩) o

-----------------------------------------------------------------------------------------

2013-10-26 Supplement

About Time Statistics

1 SET STATISTICS time on 2 GO
1 SQL Server parse and compile time: 2    CPU time = 0 milliseconds, occupied time = 58 milliseconds. 3 4 SQL Server Execution time: 5    CPU time = 0 milliseconds, occupied time = 1 milliseconds.

I think the data statistics that SQL Server displays should be the insertion of time statistics code at the beginning and end of the parsing and compiling modules and executing modules.

To make a count of the time used.


About Time Statistics range

The red circle part of the figure I think is where the SQL Server team inserts the time statistics code

Parse and compile time: Regardless of the plan cache, from the start of the command parser to the left of the query optimizer end

Execution time: starting from the query executor, leave the query executor end

Some people will think that it should be the result set is stored in the network buffer or the result set real to the client's hand is the end of execution time

-------------------------------------------------------------------------

But I don't think so. The result set has been generated before reaching the network buffer , indicating that the query execution is complete.

Query execution complete meaning: All the results have been generated, not to say that the customer to 10 records, Sir, 3 Records, the 3 records into the network buffer to the client to take away

This statistic is unscientific, it should be 10 records have been generated (as standard), and before transmission to the network buffer

The client does not have to take away and when the result set arrives at the client, SQL Server does not need to be concerned because of various situations, such as: Network blocking

This cannot be counted as the execution time of SQL Server.

It is not possible for the SQL Server team to write the time statistics code on the client side, and the client is not part of the Sqlserver,sqlserver team.

Time statistics code written in the client's application ah???

I wonder what your opinion is???

Takes you to understand how SQL Server executes a query.

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.