Database index: What do you need to know?

Last Update:2013-12-15 Source: Internet

Author: User

Tags mysql index

Developer on Alibaba Coud: Build your first app with APIs, SDKs, and tutorials on the Alibaba Cloud. Read more ＞

1. Database Data Storage

1.1 file:

Once a database is created, two files are generated:

DataBaseName. mdf: master file, which is the final place where data in the database is stored.

DataBaseName. ldf: a log file, a series of log records generated by data operations.

1.2 partition:

The basic storage unit that allocates space for tables and indexes in a given file. One partition occupies 64 KB and consists of 8 consecutive pages. If a partition is full but a new record needs to be saved, this record will occupy the space of the entire new partition.

Page 1:

One allocation unit in the partition. This is where the actual data rows are finally stored. Page is used to store data rows.

SQL Server has multiple types of pages:

Data, Index, BLOB, GAM (Global Allocation Map), SGAM, PFS (Page Free Space), IAM (Index Allocation Map), BCM (Bulk Changed Map), and so on.

Recommended by BKJIA database channel: MySQL index: a double-edged sword in database performance

2. Index

2.1.1 Index

An index is a disk structure associated with a table or view, which can accelerate the speed of retrieving rows from a table or view. An index contains keys generated by one or more columns in a table or view. These keys are stored in a structure B tree), so that SQL Server can quickly and effectively find the rows associated with the key value.

In layman's terms, indexes are related to tables or views to accelerate retrieval. Indexes occupy storage space. data is stored in the form of B trees Through indexes. Therefore, the query speed is also accelerated.

2.1.2 clustered Index

The clustered index sorts and stores the data rows according to the key values of the data rows in the table or view. The index definition contains clustered index columns. Each table can have only one clustered index, because data rows can only be sorted in one order. Data rows in the table are stored in order only when the table contains clustered indexes. If a table has a clustered index, the table is called a clustered table. If a table has no clustered index, its data rows are stored in a unordered structure called a heap.

In layman's terms, the clustered index page stores actual data. Each table can only create a unique clustered index.

If a clustered index is created, data in the table is stored as a B-tree.

For example, the word orchestration of an English dictionary. The English dictionary word is A, B, C, D .... X, Y, and Z are arranged sequentially. If we look for Good words, we first locate G and then locate o-d. the final result of finding Good is where good actually exists.

Creating a clustered index requires at least 120% additional space for the table to store copies of the table and the intermediate index page.

2.1.3 non-clustered Index

Non-clustered indexes have a structure independent of data rows. A non-clustered index contains a non-clustered index key value, and each key value item has a pointer to a data row containing the key value.

A pointer from an index row in a non-clustered index to a data row is called a row locator. The structure of the row locator depends on whether the data page is stored in the heap or clustering table. For a heap, the row positioner is a pointer to a row. For clustered tables, the row locator is the clustered index key.

In general, non-clustered index pages store actual data, but actual data addresses. A table can have multiple non-clustered indexes. In SQL Server2005, a maximum of 249 non-clustered indexes can be created for each table, while in SQL server2008, a maximum of 999 non-clustered indexes can be created.

Understanding of Non-clustered indexes, that is, the "radicals" lookup method of Xinhua Dictionary. When you encounter a word you do not know and do not know its pronunciation, you cannot find the word you want to query according to the method just now, you need to find the word you are looking for based on the "radicals", and then directly go to a page based on the page number after the word to find the word you are looking. However, the words you find in combination with the "radical directory" and "word checking table" are not really the sorting method of the text. For example, you can query the word "Zhang, we can see that the page number of the "Zhang" in the word checking table after the department head is 672, and the "Zhang" in the word checking table is "Chi", but the page number is 63, under "Zhang" is the word "", and the page is 390 pages. Obviously, these words are not really in the upper and lower sides of the word "Zhang, the continuous "Chi, Zhang, and "words you see are actually their sorting in the non-clustered index, which is the ing of words in the dictionary body in the non-clustered index. We can find the words you need in this way, but it requires two steps: first find the results in the directory, and then flip to the page number you need. We refer to this directory as a directory, and the text as a non-clustered index ".

2.1.4 covering indexes:

A covered index is a non-clustered index that contains all the information required for search. This index is faster because the index page contains the necessary data for search, you do not need to access the data page. If a non-clustered index contains result data, the query speed is faster than that of the clustered index.

However, it takes up a large amount of space to cover a large number of index items. In addition, the update operation will change the index value. Therefore, if the potential overwrite query is not commonly used or is not critical, increasing the index Overwrite will reduce the performance.

2.1.5 primary key and Index

Primary Key: A table usually has one or more columns that uniquely identify the values of each row in the table. Such a column or multiple columns are called the primary key of the table, which is used to force the Object Integrity of the table. When creating or modifying a table, you can create a primary key by defining the primary key constraint. It is a unique index.

Below is a simple comparison table

	Primary Key	Clustered Index
Purpose	Forced table entity integrity	Sorts data rows for easy Query
Number of tables in a table	A table has at most one primary key.	A table can have at most one clustered index.
Can multiple fields be defined?	One primary key can be defined by multiple fields	One index can be defined by multiple fields.

Whether null data rows are allowed	If the data in the data column to be created is null, a primary key cannot be created. The specified primary key constraint column during table creation is implicitly converted to not null.	Columns that do not limit the creation of clustered indexes must not be null. That is, the column data can be null. See the last comparison.
Must the data be unique?	Data must be unique	Data can be unique or not unique. See the UNIQUE settings you have defined for this index. In this case, we need to look at the following comparison. Although your data columns may not be unique, the system will generate a unique column that you cannot see for you)

Created Logic	When the database creates a primary key, it automatically creates a unique index. If no clustered index exists in the table and the non-clustered index is not specified when the primary key is created, a unique clustered index is created when the primary key is created.	If the UNIQUE attribute is not used to create a clustered index, the database engine automatically adds a four-byte uniqueifier column to the table. If necessary, the database engine automatically adds a uniqueifier value to the row to make each key unique. This column and column value are used internally and cannot be viewed or accessed by users.

2.2 index Storage Structure

2.1.1 full table scan and index Scan

Full table scan and index scan are the only two ways for SQL Server databases to retrieve data. In addition, there is no third method for SQL Server to retrieve data.

Full table Scan

The most direct retrieval method. When SQL Server performs a table scan, it starts scanning from the header until the entire table ends. When a qualified record is found, the record is stored in the result set. This is a quick method for tables with small data volumes. If no index is created for the table, SQL server retrieves the data in this way.

Index Scanning

If an index is created for a table, the SQL Server optimizer selects the optimal index from the available index based on the query conditions before the query. When a record is retrieved, the B tree is traversed. When a qualified record is found, the record is stored in the result set. Therefore, using indexes to retrieve a table with a large amount of data can significantly improve the performance compared with the full table scan.

2.1.2 B-Tree

2.2.3 clustered Index

Leaf nodes store actual data. The index entry points are stored in master-> sys. indexes.

2.2.4 non-clustered Index

2.4.1 Non-clustered index on heap on the heap)

Similar to clustered indexes.

Differences:

Leaf nodes store not actual data, but pointer to actual data. The retrieval speed is very close to that of clustered indexes. Compared with clustered indexes, the retrieval speed is actually only one step from retrieving the actual data based on the pointer.

2.4.2 non-clustered indexes on Clustered tables

3. Manage Indexes

3.1 create

 
 
  
  CREATE [UNIQUE] [CLUSTERED|NONCLUSTERED]  
  
   
  
  INDEX <index name> ON <table or view name>(<column name> [ASC|DESC] [,...n])  
  
  INCLUDE (<column name> [, ...n])  
  
  [WITH 
  
  [PAD_INDEX = { ON | OFF }]  
  
  [[,] FILLFACTOR = <fillfactor>]  
  
  [[,] IGNORE_DUP_KEY = { ON | OFF }]  
  
  [[,] DROP_EXISTING = { ON | OFF }]  
  
  [[,] STATISTICS_NORECOMPUTE = { ON | OFF }]  
  
  [[,] SORT_IN_TEMPDB = { ON | OFF }]  
  
  [[,] ONLINE = { ON | OFF }  
  
  [[,] ALLOW_ROW_LOCKS = { ON | OFF }  
  
  [[,] ALLOW_PAGE_LOCKS = { ON | OFF }  
  
  [[,] MAXDOP = <maximum degree of parallelism>  
  
  ]  
  
  [ON {<filegroup> | <partition scheme name> | DEFAULT }]

3.2 modify

 
 
  
  ALTER INDEX { <name of index> | ALL }  
  
  ON <table or view name>  
  
  { REBUILD  
  
  [ [ WITH (  
  
  [ PAD_INDEX = { ON | OFF } ]  
  
  | [[,] FILLFACTOR = <fillfactor>  
  
  | [[,] SORT_IN_TEMPDB = { ON | OFF } ]  
  
  | [[,] IGNORE_DUP_KEY = { ON | OFF } ]  
  
  | [[,] STATISTICS_NORECOMPUTE = { ON | OFF } ]  
  
  | [[,] ONLINE = { ON | OFF } ]  
  
  | [[,] ALLOW_ROW_LOCKS = { ON | OFF } ]  
  
  | [[,] ALLOW_PAGE_LOCKS = { ON | OFF } ]  
  
  | [[,] MAXDOP = <max degree of parallelism>  
  
  ) ]  
  
  | [ PARTITION = <partition number>  
  
  [ WITH ( <partition rebuild index option>  
  
  [ ,...n ] ) ] ] ]  
  
  | DISABLE  
  
  | REORGANIZE  
  
  [ PARTITION = <partition number> ]  
  
  [ WITH ( LOB_COMPACTION = { ON | OFF } ) ]  
  
  | SET ([ ALLOW_ROW_LOCKS= { ON | OFF } ]  
  
  | [[,] ALLOW_PAGE_LOCKS = { ON | OFF } ]  
  
  | [[,] IGNORE_DUP_KEY = { ON | OFF } ]  
  
  | [[,] STATISTICS_NORECOMPUTE = { ON | OFF } ]  
  
  )  
  
  } [ ; ]

3.3 Delete

 
 
  
  DROP INDEX <table name>.<index name>

4. Should I pay attention to 10 items when using indexes?

1) clustered indexes are generally faster than non-clustered indexes.

2) check whether there is sufficient space to create an index. Indexes occupy space, and the database size is about 1.2 times on average.

3) create a clustered index on fields that are often used for queries or aggregation conditions. Such query conditions include between,>, <, group by, max, min, and count.

4) do not create clustered indexes on columns that are often inserted as unordered fields. Inserting data rows involves paging. rebuild indexes consume a lot of time. Refer to "An example of improper use of clustered indexes" at the end of this article ".

5) Create an index on the unique field of the value height. Indexes such as gender fields cannot be created.

6) This index only works when the first column of the index is included in the query condition.

For example, if we use a radicals + radicals to query Chinese characters, the radicals must first be included in the query conditions. Only the radicals must be located first, and then combined with the radicals, in order to make full use of the rapid effect of searching by radicals and radicals.

7) delete unused indexes. Especially for data tables that are frequently deleted and modified, you must consider how to optimize indexes.

Original article title: database index

Link: http://www.cnblogs.com/Ring1981/archive/2010/09/15/1826997.html

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

Get Started for Free

Sales Support

1 on 1 presale consultation

Chat Contact Sales
After-Sales Support

24/7 Technical Support 6 Free Tickets per Quarter Faster Response

Open a Ticket
Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.

Learn More