Understanding how SQL Server executes queries (1/3)

Source: Internet
Author: User
Tags bulk insert ole

General diagram of query execution:

Each section is described in detail according to the process of the general diagram:

1. Requests (Request)

SQL Server is the platform for the C/s architecture. The only way to interact with it is to send a request that contains a database command. The communication protocol before the application and database is called the TDS (Tabular Data Stream) protocol. Applications can use the following types of clients that implement the TDS protocol:

    • The CLR managed SqlClient,
    • OLE DB,
    • Odbc
    • Jdbc
    • PHP Driver for SQL Server
    • Open-Source FreeTDS

TDS requests fall into the following categories:

    • Batches request (Batch requests)

This request includes only T-SQL text, contains no parameters, but can contain local variables. Execute SqlCommand.ExecuteReader () on SqlCommand object with empty argument list in SqlClient, ExecuteNonQuery (), ExecuteScalar () or ExecuteXmlReader (), it will be a batch request. The Sql:batchstarting event is observed through profile.

    • RPC Request (Remote Procedure call request)

It contains the procedure identifier (Procedure Identifier, used for) and arbitrary mathematical parameters. Different process identifiers represent different system stored procedures. This is the type of request when executing a SqlCommand object with a non-null argument list. The Rpc:starting event can be observed through profile.

    • Bulk load requests (Bulk load request)

Bulk loading is a special type of request used by BULK INSERT operations . For example, bcp tool, OLE DB IRowsetFastLoad interface, and SqlBulkCopy class. It is the only request that can begin execution without the need to complete the packet delivery in the TDS protocol. Once executed, you can use the data in the data flow for the insert operation.

2. Tasks (Task)

When the full request reaches the database engine, SQL Server creates a task to process the request. The request can be observed through sys.dm_exec_requests. A task represents a complete request, not a part of the request statement. Similarly, a new task is not created for some of the statements in the request. Some of the statements in the request are executed in parallel, and the task generates sub-task processing parallelism. When the client takes the data from the result set returned by all requests, the task is completed.

Task conditions can be observed through sys.dm_os_tasks.

3. Working process (Workers)

The initial state of the task created according to the new request is pending. At this stage, SQL Server does not know the content of the request. The task needs to execute the request, and the engine assigns the worker to execute it. ( is assigned, not created )

Workers is the thread pool for SQL Server. A certain number of workers are initialized during SQL Server startup. The maximum number of threads can be configured as needed through the Max_worker_threads parameter. Only the worker executes the code. When there is no idle worker, the task becomes the pending state. When a worker completes a task, it becomes available to select the task execution of the pending state.

In the SQL batch request, theworker selects a task and executes each statement in the batch request . Obviously, the statement in the batch request is executed serially. The previous one is completed before the next is started. Some statements in a batch are executed in parallel, which is done by task creation Sub-task. Each sub-task will experience the same process as a task (such as waiting for the available worker to pick it up and execute it).

You can view the worker and its status through Sys.dm_os_workers.

4. Parsing and Compiling (parsing & compilation)

When a task starts executing, it first needs to figure out the specific content of the request. This phase of SQL Server parses the T-SQL text in the request and generates an abstract syntax tree that represents the request (abstract syntax tree). The entire request will be parsed and compiled. If an error is generated at this stage, a compilation error is returned, and the task and worker are released as a result.

Compiling T-SQL does not produce executable code like local CPU instructions, but it also produces something like bytecode. It produces a query plan. The query plan describes the path to the data access and how to access the object.

5. Optimization (optimization)

Optimization is the best one to choose from a number of query plans. SQL Server employs a cost-based optimizer. It estimates the cost of all possible (most) query plans and chooses the one with the lowest cost. The cost is primarily calculated by calculating the size of the data that the query plan needs to read (the value). To know the size of the data, SQL Server needs to know the size of each table and the distribution of column values (through statistics data). The cost also takes into account CPU and memory usage. The data is then combined into a cost value by a formula, and then the execution plan with the lowest cost value is selected.

The optimization process consumes time and CPU, so when a query plan is eventually generated, it is cached in the plan cache for reuse.

6. Implementation (execution)

Once the optimizer has selected an execution plan, the request can begin execution. The execution plan is converted into the actual execution tree. Each node in the tree is an operator. All operators implement three abstract interfaces: Open (), next (), and Close (). Loop execution involves invoking the open () of the root node, and then calling next () progressively until it returns false, and then calls Close ().

Leaf-level nodes are usually some physical data access operators (access to actual data and indexes), and intermediate nodes are often operators that implement data manipulation such as data filtering, sorting, and linking. Parallel execution has a special operator: the exchange operator. The exchange operator issues multiple threads, each executing a query plan subtree, and then aggregating the output of all child threads using the Multiple-producers-one-consumer method.

The data modification operation also applies to this execution mode.

Some operators are very simple, such as top (N). When it is called next (), it will go to call the child node next () and record the data. When executed n times repeatedly, it returns false and terminates the invocation of the child node and the iteration of the corresponding branch subtree.

Some operators are very complex, such as the nested loop operator. This requires tracking the position of the inner and outer child node loop iterations, calling the outer node's next (), the value rewind the node and constantly calling the inner node's next () until the matching value is found.

Some operators need to wait until the output data of all its child nodes is acquired to produce their own output data. This behavior is also called Stop-and-go. Like the sort operator, it calls Netxt () for the first time, does not return data, and needs to wait until all the data is returned and sorted to return the data.

A HASH join is a very complex and stop-and-go type of operator. In order to construct the hash table, it calls next () of the build side (build side) node until false is returned. Then call the probe side (probe side) next () until it finds a matching value in the hash table, and then returns. Repeat the operation of the probe side until next () returns false.

7. Return results (Results)

Once the query starts executing, it can start returning data to the client program. When the execution tree begins to produce the returned data, the topmost operator is responsible for writing the data to the network cache and sending it to the client. The results returned from execution are not cached anywhere, but are returned to the client as soon as they are generated.

Obviously, returning data over the network to the client is constrained by the network traffic Control Protocol. If the client fails to take the returned data in time, it eventually blocks the sending behavior of the data sender and causes the query execution to be suspended. When the data reception ability of the client is normal, the sending behavior and query execution of the sender are reset, and the returned result data is normally produced.

The output value of the outputs parameter can be written to the data flow only after the execution plan has completed. So it can only be read after all the returned results have been taken away by the client.

Summarize:

1. This is a translation and the plan is divided into 3 parts. The use of learning, non-verbatim translation, many are combined with their own understanding of the translation, compared with the original content, there are some additions and deletions.

2. Original address: Understanding how SQL Server executes a query

Understanding how SQL Server executes queries (1/3)

Related Article

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.