Some summary of SQL Server index "go"

Source: Internet
Author: User
Tags create index filegroup generator prev

1.1.1 Summary
If we want to optimize the database, we can mainly use the following five methods to optimize the database system.

1. Computer Hardware tuning
2. Application Tuning
3. Database Index Optimization
4. SQL statement Optimization
5. Transaction Handling Tuning

In this blog post, we will want to tell you about the index type and usage in the database, this article takes SQL Server as an example, for other technology platform friends also have reference value, just replace the corresponding code on the line!

Indexes enable the database engine to perform faster, targeted data retrieval, rather than simply full table scans.

In order to use a valid index, we have to understand the composition of the index, and we know that adding an index to the data table necessarily requires creating and maintaining an index table, so we want to globally measure whether adding an index can improve the query performance of the database system.

At the physical level, the database consists of data files that can form filegroups and then be stored on disk. Each file contains many extents, each with a size of 64K consisting of eight physically contiguous pages (one page 8K), we know that the page is the basic unit of the data store in the SQL Server database. The disk space allocated for the data file (. mdf or. ndf) in the database can be logically divided into pages (numbered 0 through n consecutively).

The types that are stored in the page are: data, indexes, and overflows.

Files and Filegroups
In SQL Server, the file that holds the data is managed through the logical object of the filegroup.

1.1.2 Body

At the physical level, the database consists of data files that can form filegroups and then be stored on disk. Each file contains many extents, each with a size of 64K consisting of eight physically contiguous pages (one page 8K), we know that the page is the basic unit of the data store in the SQL Server database. The disk space allocated for the data file (. mdf or. ndf) in the database can be logically divided into pages (numbered 0 through n consecutively).

The types that are stored in the page are: data, indexes, and overflows.

Files and Filegroups
In SQL Server, the file that holds the data is managed through the logical object of the filegroup.

Figure 1 Database File organization

At the top level is our database, because the database is composed of one or more filegroups, and the filegroup is composed of one or more files?? Logical group, so we can spread the filegroups across different disks, allowing user data to span as many devices as possible, multiple I/O running, avoiding I/O competition, thus balancing I/O loads and overcoming access bottlenecks.

Zones and Pages
2, the file is composed of the district, and the District is composed of eight physically contiguous pages, because the size of the area is 64K, so each time an additional zone file increases 64K.

Figure 2 File composition

The data types that are saved in the page are: Table data, index data, overflow data, allocation mappings, page free space, index allocations, and so on, as shown in:

Page type

Content

Data

When the text in row is set to ON, contains data rows for all data except text, ntext, image, nvarchar (max), varchar (max), varbinary (max), and XML data.

Index

The index entry.

Text/image

Large Object data types: text, ntext, image, nvarchar (max), varchar (max), varbinary (max), and XML data. Variable-length data type columns when data rows exceed 8 KB: varchar, nvarchar, varbinary, and sql_variant

Global Allocation map, Shared Global Allocation map

Information about whether the zone is assigned.

Page Free Space

Information about page assignments and the available space for the page.

Index Allocation Map

Information about the extents used by the table or index in each allocation unit.

Bulk Changed Map

Information about the extents modified by bulk operations in each allocation unit since the last BACKUP LOG statement.

Differential Changed Map

Information about the extents in each allocation unit that were changed since the last BACKUP DATABASE statement.

Data types saved in Table 1 page

On the data page, the data rows are placed in order immediately following the header (header), which contains the identity value, such as the page number or object ID of the object data, the data row holds the actual data, and finally, the end of the page is the row offset table, and for each row in the page, each row offset table contains an entry. Each entry records the distance of the first byte of the corresponding row from the page header, and the order of entries in the row offset table is reversed from the order of the rows in the page.

Figure 3 Data page

basic structure of the index
"Index" provides the speed of the query "This is the most basic explanation of the index, and next we will give you a more in-depth understanding of the index by introducing the composition of the index.

An index is a unique structure in a database, and because it holds database information, we need to allocate disk space and maintain index tables. Creating an index does not alter the data in the table, it simply creates a new data structure pointing to the datasheet; For example, when we use a dictionary to look up the word, we first need to know to query the starting letter of the word, and then go to the Directory page, and then find the exact page of the word, then our directory is the index table, and the

Of course, indexes are more complex than dictionary catalogs because the database must handle operations such as insertions, deletions, and updates, which will cause the index to change.

Leaf node

Assuming that the data on our disk is physically ordered, the database will inevitably cause changes in the data when it is inserted, deleted, and updated, and if we want to keep the data contiguous and orderly, then we need to move the physical location of the data, which will increase the disk I/O and make the entire database run very slowly The primary purpose of using an index is to keep the data logically organized so that the data is independent of the physically ordered storage.

In order to achieve logical order of the data, the index uses the data structure of the doubly linked list to maintain the logical sequence, if you want to insert a new node in two nodes, you only need to modify the predecessor and successor of the node, and do not need to modify the physical location of the new node.

The doubly linked list (doubly linked list), also known as a doubly linked list, is one of the linked lists, which has two pointers in each data node, pointing directly to successive and direct precursors respectively. So, starting from any node in a doubly linked list, it is easy to access its predecessor and successor nodes.

In theory, the time complexity of removing an element from a doubly linked list is O (1), and if you want to delete an element with a given keyword, then the worst case time complexity is O (n).

During the removal process, we only need to connect the front and back nodes of the node that will be deleted, and then set the front and back nodes of the node to be deleted to null.

Copy CodeThe code is as follows:
Pseudo code
Node.prev.next=node.next;
Node.next.prev=node.prev;
Node.prev=node.next=null;




Figure 4 The leaf nodes of the index and the corresponding table data

As shown in 4, the Index leaf node contains the index value and the corresponding RID (ROWID), and the leaf node is connected in an orderly way through a doubly linked list, while our main data table differs from the index leaf node, the data in the table is stored out of order, they are not all stored in the same block, and there is no connection between the blocks.

In general, the index holds the physical address value of the specific data.

type of index

We know that there are two types of indexes: Clustered indexes and nonclustered indexes.
Clustered index: Physical storage is sorted by index.
Nonclustered indexes: Physical storage is not sorted by index.

Clustered Index

The data pages of a clustered index are stored physically and sequentially, the data pages are the leaf nodes of the clustered index, and the data pages are connected in a doubly linked list, and the actual data is stored in the data page. When we add an index to a table, the data in the table is sorted according to the index.
Suppose we have a table T_pet, which contains four fields: Animal,name,sex and age, and uses animal as the index column, the SQL code is as follows:

Copy CodeThe code is as follows:
-----------------------------------------------------------
----Create t_pet table in tempdb.
-----------------------------------------------------------
Use tempdb
CREATE TABLE T_pet
(
Animal VARCHAR (20),
[Name] VARCHAR (20),
Sex CHAR (1),
Age INT
)
CREATE UNIQUE CLUSTERED INDEX t_petonanimal1_clteridx on T_pet (animal)


-----------------------------------------------------------
----Insert data into data table.
-----------------------------------------------------------

Copy CodeThe code is as follows:
DECLARE @i int
SET @i=0
while (@i<1000000)
BEGIN
INSERT into T_pet (
Animal
[Name],
Sex
Age
)
SELECT [Dbo].random_string (one) animal,
[Dbo].random_string (one) [name],
' F ' sex,
Cast (rand () as int) age
SET @[email protected]+1
END
INSERT into T_pet VALUES (' Aardark ', ' Hello ', ' F ', 1)
INSERT into T_pet VALUES (' Cat ', ' Kitty ', ' F ', 2)
INSERT into T_pet VALUES (' Horse ', ' Ma ', ' F ', 1)
INSERT into T_pet VALUES (' Turtles ', ' SiSi ', ' F ', 4)
INSERT into T_pet VALUES (' Dog ', ' tomma ', ' F ', 2)
INSERT into T_pet VALUES (' Donkey ', ' YoYo ', ' F ', 3)



Figure 5 Clustered Index

As shown in 5, the first and second layers from left to right are index pages, the third layer is the data page (leaf node), the data page is connected by a doubly linked list, and the data in the page is sorted by index; Suppose we want to find the name of Xnnbqba animal Ifcey, Here we use animal as the index of the table, so the database is first looked up based on the index, and when the index value animal = ' Ifcey is found, then the data page (leaf node) of the index is searched for the specific data. The specific query statements are as follows:

Copy CodeThe code is as follows:
SET STATISTICS profile on
SET STATISTICS time on

SELECT animal, [name], sex, age
From T_pet
WHERE animal = ' Ifcey '

SET STATISTICS Profile OFF
SET STATISTICS Time OFF


When we execute the SQL query plan, we put the mouse pointer on the "clustered index lookup", which appears as information, we can see an important information Logical operation--clustered Index SEEK,SQL query is directly based on the clustered index to get records , the fastest query.

Figure 6 Query plan

From the results of the query, we found that the query step was only 2 steps, first by clustered index seek to quickly find the indexes Ifcey, and then query the index of the leaf node (data page) to obtain data.

Query execution Time: CPU time = 0 milliseconds, elapsed time = 1 milliseconds.

Figure 7 Query Results

Now that we have deleted the index in the table and re-executed the query plan, we can see that the logical operation has become a table Scan, and the query speed is quite slow because there are 1 million rows of data in the table.

Figure 8 Query plan

From the query results, we found that the query step becomes 3 steps, first through the table scan to find animal = ' ifcey ', when executing the query, SQL Server automatically parses the SQL statement, and it estimates that our query is more time-consuming, So the database makes concurrent operations to speed up the query.

Query execution Time: CPU time = 329 milliseconds, elapsed time = 182 milliseconds.

Figure 9 Query Results

With the above clustered index and no comparison, we found a difference in query performance if the index database was used first to look up the index instead of the aimless full table traversal.

Nonclustered indexes

In the absence of a clustered index, the data pages in the table are stored in heap (heap), which is a table without a clustered index, and the heap store in SQL Server stores the new rows of data in the last page.

A nonclustered index is a physical store that is not sorted by index, and the leaf node (index leaf pages) of a nonclustered index contains a pointer or clustered index that points to a specific row of data, and no connection between the data pages is a relatively separate page.

Suppose we have a table T_pet, which contains four fields: Animal,name,sex and age, and uses animal as a non-indexed column, the SQL code is as follows:

Copy CodeThe code is as follows:
-----------------------------------------------------------
----Create T_pet table in tempdb with nonclustered INDEX.
-----------------------------------------------------------
Use tempdb
CREATE TABLE T_pet
(
Animal VARCHAR (20),
[Name] VARCHAR (20),
Sex CHAR (1),
Age INT
)
CREATE UNIQUE nonclustered INDEX t_petonanimal1_nonclteridx on T_pet (animal)





Figure 10 Nonclustered indexes

Next we want to query the table of animal = ' Cat ' pet information, the specific SQL code is as follows:

Copy CodeThe code is as follows:
SET STATISTICS profile on
SET STATISTICS time on

SELECT animal, [name], sex, age
From T_pet
WHERE animal = ' Cat '

SET STATISTICS Profile OFF
SET STATISTICS Time OFF


As shown, we found two steps to the far right of the query plan: RID and index lookups. Because these two lookups are slower relative to the clustered index lookup (Clustered index Seek).

Figure 11 Query plan

SQL Server first looks up the index value and then finds the data row based on the RID until it finds the results that match the query criteria.

Query execution Time: CPU time = 0 milliseconds, elapsed time = 1 milliseconds

Figure 12 Query Results

Heap Table non-clustered index

Because the heap is a table that does not contain a clustered index, the leaf nodes of the nonclustered index will contain pointers to specific data rows.

Taking the previous T_pet table as an example, assuming that T_pet uses the animal column as a nonclustered index, its heap table nonclustered index structure looks like this:

Figure 13 Heap Table nonclustered index

By, we find that the nonclustered index is connected by a doubly linked list, and the leaf node contains pointers to specific rows of data.

If we are looking for animal = ' Dog ' information, first we traverse the first level index, then the database determines that the dog belongs to the cat-scoped index, then traverses the second-level index, then finds the saved pointer information in the dog index, obtains the data from the corresponding data page according to the pointer information, Next, we'll show you through specific examples.

Now we create the table employees and then add the heap table nonclustered index to the table, with the following SQL code:

Copy CodeThe code is as follows:
Use tempdb
----creates a sample table.
CREATE TABLE Employees (
employee_id NUMERIC not NULL,
First_Name VARCHAR (+) not NULL,
Last_Name VARCHAR () not NULL,
Date_of_birth DATETIME,
Phone_number VARCHAR (+) not NULL,
Junk CHAR (1000),
CONSTRAINT employees_pk PRIMARY KEY nonclustered (employee_id)
);


Go now we look for employee information for employee_id = 29976.

Copy CodeThe code is as follows:
SELECT *
From Employees
WHERE employee_id = 29976


The query plan looks like this:

Figure 14 Query plan

First, the index value employee_id = ' 29976 ' is searched, and then the data rows that match the criteria are found based on the RID, so the query efficiency of the heap table index is not as good as the clustered table, and next we will describe the nonclustered index of the clustered table.

Clustered Table Nonclustered index

When a clustered index exists on a table, the leaf node of any nonclustered index no longer contains a pointer value, but rather contains the index value of the clustered index.

Take the previous T_pet table as an example, assuming that T_pet uses the animal column as a nonclustered index, then its index table nonclustered index structure looks like this:

Figure 15 Index Table nonclustered index

Through, we find that the nonclustered index is connected by a doubly linked list, and the leaf node contains the index value of the index table.

If we're looking for animal = ' Dog ' information, first we traverse the first level index, then the database determines that the dog is a cat-scoped index, then traverses the second-level index, then finds the saved index value in the dog index, then gets the data from the corresponding data page based on the index value.

Next we modify the previous Employees table, we first delete the previous heap table nonclustered index, and then increase the index table nonclustered index, the specific SQL code is as follows:

Copy CodeThe code is as follows:
ALTER TABLE Employees
DROP CONSTRAINT EMPLOYEES_PK

ALTER TABLE Employees
ADD CONSTRAINT employees_pk PRIMARY KEY CLUSTERED (employee_id)
GO

SELECT * FROM Employees
WHERE employee_id=29976

Figure 16 Query plan

Validity of the index
Every time SQL Server executes a query, it first checks to see if the query has an execution plan, and if not, to generate an execution plan, what is the execution plan? In short, it can help SQL Server develop an optimal query plan. (Please refer to the query plan here)

Below we will illustrate the use of indexes in SQL Server using specific examples, first we define a table Testindex, which contains three fields Testindex,bitvalue and filler, with the following SQL code:

Copy CodeThe code is as follows:
-----------------------------------------------------------
----Index Usefulness sample
-----------------------------------------------------------

CREATE TABLE Testindex
(
Testindex int identity (constraint) Pktestindex primary key,
Bitvalue bit,
Filler char (+) NOT NULL default (replicate (' A ', 2000))
)

CREATE INDEX xtestindex_bitvalue on Testindex (Bitvalue)
GO

INSERT into Testindex (bitvalue)
VALUES (0)
GO 20000--runs current batch 20000 times.

INSERT into Testindex (bitvalue)
VALUES (1)
GO--puts rows into table with value 1


We then queried the data rows of Bitvalue = 0 in the table, and the data in table Bitvalue = 0 had 2000 rows.

Copy CodeThe code is as follows:
SELECT *
From Testindex
WHERE bitvalue = 0


Figure 17 Query plan

Now we query the data row for bitvalue = 1.

SELECT *from Testindexwhere bitvalue = 1

Figure 18 Query plan

Now that we have noticed different data queries on the same table, what exactly is the reason for executing a very different query plan?

We can see the detailed usage of the indexes in the table by using DBCC SHOW_STATISTICS, with the following SQL code:

Copy CodeThe code is as follows:
UPDATE STATISTICS Dbo.testindex
DBCC show_statistics (' Dbo.testindex ', ' Xtestindex_bitvalue ')
With histogram



Figure 19 Histogram

Through the histogram above, we know that SQL Server estimates Bitvalue = 0 rows of data row has about 19989 rows, while Bitvalue = 1 estimated about 21;sql server optimizer based on the data volume estimates, take different execution plans, thus reaching the optimal query performance, Because Bitvalue = 0 data is large, SQL Server can only provide a scan of the clustered index to get the corresponding data row, while Bitvalue = 1 The actual data row only 10 rows, SQL Server first through the key to find Bitvalue = 1 data rows, The nested loops are then joined to the clustered index to obtain the remaining data rows.

Summarize the full instance code:

Copy CodeThe code is as follows:
-- =============================================
--Author:jkhuang
--Create date:04/20/2012
--Description:create Sample for Clustered and
--Nonclustered index.
-- =============================================

-----------------------------------------------------------
----Create T_pet table in tempdb with nonclustered INDEX.
-----------------------------------------------------------
Use tempdb
CREATE TABLE T_pet
(
Animal VARCHAR (20),
[Name] VARCHAR (20),
Sex CHAR (1),
Age INT
)
CREATE UNIQUE nonclustered INDEX t_petonanimal1_nonclteridx on T_pet (animal)
CREATE UNIQUE CLUSTERED INDEX t_petonanimal1_clteridx on T_pet (animal)
-----------------------------------------------------------
----Insert data into data table.
-----------------------------------------------------------
DECLARE @i int
SET @i=0
while (@i<1000000)
BEGIN
INSERT into T_pet (
Animal
[Name],
Sex
Age
)
SELECT [Dbo].random_string (one) animal,
[Dbo].random_string (one) [name],
' F ' sex,
Cast (rand () as int) age
SET @[email protected]+1
END
INSERT into T_pet VALUES (' Aardark ', ' Hello ', ' F ', 1)
INSERT into T_pet VALUES (' Cat ', ' Kitty ', ' F ', 2)
INSERT into T_pet VALUES (' Horse ', ' Ma ', ' F ', 1)
INSERT into T_pet VALUES (' Turtles ', ' SiSi ', ' F ', 4)
INSERT into T_pet VALUES (' Dog ', ' tomma ', ' F ', 2)
INSERT into T_pet VALUES (' Donkey ', ' YoYo ', ' F ', 3)

SET STATISTICS profile on
SET STATISTICS time on
SELECT animal, [name], sex, age
From T_pet
WHERE animal = ' Cat '
SET STATISTICS Profile OFF
SET STATISTICS Time OFF

-----------------------------------------------------------
----Create Employees table in tempdb.
-----------------------------------------------------------
CREATE TABLE Employees (

employee_id NUMERIC not NULL,
First_Name VARCHAR (+) not NULL,
Last_Name VARCHAR () not NULL,
Date_of_birth DATETIME,
Phone_number VARCHAR (+) not NULL,
Junk CHAR (1000),
--PK constraint defaults to clustered
CONSTRAINT employees_pk PRIMARY KEY (employee_id)
);
GO

-----------------------------------------------------------
----Insert data into data table.
-----------------------------------------------------------
CREATE VIEW Rand_helper as SELECT Rnd=rand ();
GO
----generates random string function.
CREATE FUNCTION random_string (@maxlen int) RETURNS VARCHAR (255)
As BEGIN
DECLARE @rv VARCHAR (255)
DECLARE @loop int
DECLARE @len int
SET @len = (SELECT CAST (RND * (@maxlen-3) as INT) +3
From Rand_helper)
SET @rv = ' '
SET @loop = 0
While @loop < @len BEGIN
SET @rv = @rv
+ CHAR (CAST ((SELECT rnd
From Rand_helper) * as INT) +97)
IF @loop = 0 BEGIN
SET @rv = UPPER (@rv)
END
SET @loop = @loop +1;
END
RETURN @rv
END
GO
----generates random date function.
CREATE FUNCTION random_date (@mindaysago int, @maxdaysago int)
RETURNS VARCHAR (255)
As BEGIN
DECLARE @rv datetime
SET @rv = (SELECT GetDate ()
-Rnd * (@[email protected])
-@mindaysago
From Rand_helper)
RETURN @rv
END
GO
----generates random int function.
CREATE FUNCTION random_int (@min int, @max int) RETURNS int
As BEGIN
DECLARE @rv INT
SET @rv = (SELECT rnd * (@max) + @min
From Rand_helper)
RETURN @rv
END
GO
----inserts data into Employees table.
With generator (n) as
(
Select 1
UNION ALL
Select N + 1 from generator
where N < 30000
)
INSERT into Employees (employee_id
, first_name, last_name
, Date_of_birth, Phone_number, junk)
Select N employee_id
, [Dbo].random_string (one) first_name
, [Dbo].random_string (one) last_name
, [Dbo].random_date (20*365, 60*365) DOB
, ' N/a ' phone
, ' junk ' junk
From generator
OPTION (Maxrecursion 30000)
-----------------------------------------------------------
----Index Usefulness sample
-----------------------------------------------------------
CREATE TABLE Testindex
(
Testindex int identity (constraint) Pktestindex primary key,
Bitvalue bit,
Filler char (+) NOT NULL default (replicate (' A ', 2000))
)
CREATE INDEX xtestindex_bitvalue on Testindex (Bitvalue)
GO
INSERT into Testindex (bitvalue)
VALUES (0)
GO 20000--runs current batch 20000 times.
INSERT into Testindex (bitvalue)
VALUES (1)
GO--puts rows into table with value 1
SELECT Filler
From Testindex
WHERE Bitvalue = 1
UPDATE STATISTICS Dbo.testindex
DBCC show_statistics (' Dbo.testindex ', ' Xtestindex_bitvalue ')
With histogram original address: http://www.jb51.net/article/30971.htm

Some summary of SQL Server index "go"

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.