Why a database should be indexed

Source: Internet
Author: User
This post summarizes my earlier study notes on indexes:

First, why does an index increase speed? When a database executes a SQL statement, its default approach is a full table scan: it checks every row against the search criteria and adds each match to the result set. If a field is indexed, the query first looks up the value in the index to locate the matching rows, which greatly reduces the number of rows that must be examined and can significantly speed up the query.

Should every field be indexed, then? Here are a few counterexamples:

1. If every query needs all the records in the table, a full table scan is unavoidable and an index adds nothing.

2. For a field with few distinct values, such as "sex", which holds a large number of duplicate values, an index makes little sense.

3. For a table with few records, an index brings no speedup and only wastes storage space, since the index itself must be stored.

A further serious drawback is that every UPDATE/INSERT/DELETE must also update the indexes on the affected fields.
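The difference between a full table scan and an index lookup can be observed directly. The following is a minimal sketch using Python's built-in sqlite3 module as a stand-in database; the table people, the index people_lname, and the helper query_plan are hypothetical names chosen for illustration, and other databases report their plans differently.

```python
import sqlite3

def query_plan(conn, sql):
    """Return the 'detail' text of EXPLAIN QUERY PLAN for sql as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    # The human-readable detail is the last column of each plan row.
    return " | ".join(row[-1] for row in rows)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (id INTEGER PRIMARY KEY, lname TEXT, sex TEXT)")
conn.executemany(
    "INSERT INTO people (lname, sex) VALUES (?, ?)",
    [(f"name{i}", "F" if i % 2 else "M") for i in range(1000)],
)

lookup = "SELECT * FROM people WHERE lname = 'name500'"

# Without an index, the only option is to scan the whole table.
plan_without_index = query_plan(conn, lookup)

# With an index on lname, the planner can search the index instead.
conn.execute("CREATE INDEX people_lname ON people (lname)")
plan_with_index = query_plan(conn, lookup)

print(plan_without_index)
print(plan_with_index)
```

The first plan should report a scan of the table and the second should name people_lname; the exact wording varies between SQLite versions.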

So when is the right time to add an index? Let's look at an example from the MySQL manual, which contains this SQL statement:

SELECT c.companyid, c.companyname FROM Companies c, User u WHERE c.companyid = u.fk_companyid AND c.numemployees >= 0 AND c.companyname LIKE '%i%' AND u.groupid IN (SELECT g.groupid FROM Groups g WHERE g.grouplabel = 'Executive')

This statement involves three tables and includes several kinds of search criteria, such as a range comparison and a LIKE match. Without indexes, MySQL needs to scan 77721876 rows. After indexes are added on the two fields companyid and grouplabel, only 134 rows need to be scanned. In MySQL you can view the number of rows scanned with EXPLAIN SELECT. For joins and complex search conditions like these, the performance gained from indexes matters far more than the disk space they occupy.

So how is an index implemented? Most database vendors implement indexes on top of one data structure, the B-tree, because the B-tree is well suited to organizing dynamic lookup tables on disks and other direct-access storage devices. It is defined as follows: a B-tree of order m (m >= 3) is an m-way tree that satisfies the following conditions:

1. Each node holds the fields (j, P0, K1, P1, K2, P2, ..., Kj, Pj), where j is the number of keys, each Ki is a key, and each Pi is a child pointer.

2. All leaf nodes are on the same level, and the number of levels equals the height h of the tree.

3. The number of keys j in each non-root node satisfies ⌈m/2⌉ - 1 <= j <= m - 1.

4. If the tree is not empty, the root has at least 1 key; if the root is not a leaf, it has at least 2 subtrees and at most m subtrees.
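The node layout and the key-count rules above can be sketched in a few lines of Python. This is only an illustration under assumptions: the names Node, search, and key_count_ok are hypothetical, the tree is built by hand rather than by insertion, and real database engines store nodes as disk pages, not Python objects.

```python
import math
from dataclasses import dataclass, field
from typing import List

@dataclass
class Node:
    keys: List[str]                                        # K1..Kj, kept sorted
    children: List["Node"] = field(default_factory=list)   # P0..Pj, empty for a leaf

def search(node, key):
    """Return True if key is stored in the subtree rooted at node."""
    i = 0
    while i < len(node.keys) and key > node.keys[i]:
        i += 1
    if i < len(node.keys) and node.keys[i] == key:
        return True
    if not node.children:            # reached a leaf without finding the key
        return False
    return search(node.children[i], key)

def key_count_ok(node, m, is_root=True):
    """Check the key-count bounds of conditions 3 and 4 on every node."""
    low = 1 if is_root else math.ceil(m / 2) - 1
    if not (low <= len(node.keys) <= m - 1):
        return False
    # An internal node with j keys must have exactly j + 1 children.
    if node.children and len(node.children) != len(node.keys) + 1:
        return False
    return all(key_count_ok(child, m, is_root=False) for child in node.children)

# A hand-built order-4 tree over a few letters (illustrative data only).
root = Node(["G", "P"], [
    Node(["A", "C", "E"]),
    Node(["H", "K", "M"]),
    Node(["R", "T", "X"]),
])

print(search(root, "K"), search(root, "Z"), key_count_ok(root, 4))
```

Each call to search descends one level, so the number of nodes visited is bounded by the height of the tree.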

As an example, a B-tree can be built over the 26 letters of the English alphabet:

[Figure: example B-tree holding the 26 letters]

In such a B-tree, finding a letter only requires walking down from the root, so the cost is proportional to the height of the tree (O(log n) for n keys) rather than to the number of keys. With large amounts of data, this structure greatly increases query speed.

However, there is another data structure whose lookups are even faster than a B-tree's: the hash table. It is defined as follows. Let U be the set of all possible keys and K the set of keys actually stored, where |K| is much smaller than |U|. The hash method uses a hash function h to map U onto the indexes of a table T[0, m-1]: h takes a key in U as its argument, and its result is the storage address of the corresponding node. A lookup can therefore be completed in O(1) time.

The hash table has a flaw, however: the hash collision, in which two keys produce the same result under the hash function. Let m be the length of the hash table and n the number of nodes filled; n/m is the load factor of the hash table, and the larger the load factor, the greater the chance of a collision.
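The definitions above (table length m, filled nodes n, load factor n/m, collisions) can be made concrete with a small sketch. The text does not say how collisions are resolved, so chaining is used here purely as an assumption; the class ChainedHashTable and its methods are hypothetical names.

```python
class ChainedHashTable:
    """Hash table T[0..m-1] that resolves collisions by chaining (an assumed strategy)."""

    def __init__(self, m):
        self.m = m                              # table length
        self.n = 0                              # number of stored keys
        self.slots = [[] for _ in range(m)]     # one bucket (list) per slot

    def _h(self, key):
        # Hash function h: maps a key to a slot index in [0, m-1].
        return hash(key) % self.m

    def put(self, key, value):
        bucket = self.slots[self._h(key)]
        for i, (existing, _) in enumerate(bucket):
            if existing == key:
                bucket[i] = (key, value)        # overwrite an existing key
                return
        bucket.append((key, value))
        self.n += 1

    def get(self, key):
        for existing, value in self.slots[self._h(key)]:
            if existing == key:
                return value
        return None

    def load_factor(self):
        return self.n / self.m

    def collisions(self):
        # Every key beyond the first in a bucket collided with an earlier key.
        return sum(len(bucket) - 1 for bucket in self.slots if bucket)

table = ChainedHashTable(m=8)
for key in (10, 18, 26):        # 10 % 8 == 18 % 8 == 26 % 8 == 2
    table.put(key, f"row-{key}")

print(table.get(18), table.load_factor(), table.collisions())
```

The three integer keys were picked so that they all land in the same slot, which shows that collisions can occur even at a low load factor; a lookup then has to walk the bucket instead of finishing in a single step.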
Because of this flaw, databases do not use a hash table as the default index implementation. MySQL states that it will try to convert the disk-based B-tree index into a suitable hash index, depending on the form of the queries being executed, in order to improve search speed further. I expect other database vendors have similar strategies; after all, in the database market, search speed and management security are important points of competition.


Introduction to Basic concepts:


Index

Use indexes to quickly access specific information in a database table. An index is a structure that sorts the values of one or more columns in a database table, such as the Last Name (lname) column of the employee table. If you want to find a specific employee by last name, the index helps you get that information faster than if you must search all the rows in the table.

The index provides pointers to the data values stored in the specified columns of the table, and then sorts those pointers according to the sort order you specify. The database uses the index in the same way that you use an index in a book: it searches the index to find a specific value, and then follows the pointer to the row that contains the value.

In a database diagram, you can create, edit, or delete each index type in the Indexes/Keys property page of the selected table. When you save the table that the index is attached to, or if you save the diagram that contains the table, the index is saved in the database. For more information, see Creating Indexes.

Note that not all databases use the index in the same way. For more information, see Database Server considerations, or consult your database documentation.

As a general rule, you need to create an index on a table only if you frequently query the data in the indexed column. Indexes consume disk space and slow down adding, deleting, and updating rows. In most cases, the speed advantage an index gives to data retrieval far outweighs these costs.

Indexed Columns

You can create indexes based on single or multiple columns in a database table. A multiple-column index allows you to distinguish between rows where one column may have the same value.

Indexing is also helpful if you frequently search for two or more columns, or sort by two or more columns. For example, if you often set a criterion for both the first and last names in the same query, it would make sense to create a multiple-column index on both columns.

To determine whether an index is useful:

    • Check the query's WHERE and JOIN clauses. Each column that appears in either clause is a candidate for an index.
    • Test the new index to check its impact on the performance of running queries.
    • Consider the number of indexes that have already been created on the table. It is best to avoid having many indexes on a single table.
    • Check the definitions of the indexes that have already been created on the table. It is best to avoid overlapping indexes that contain shared columns.
    • Check the number of unique data values in a column and compare it to the number of rows in the table. The result of the comparison is the selectivity of the column, which helps determine whether the column is suitable for indexing and, if so, which type of index.
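The selectivity check in the last item is simple arithmetic: the number of distinct values divided by the number of rows. A minimal sketch, where the function name selectivity and the sample columns are made up for illustration:

```python
def selectivity(values):
    """Ratio of distinct values to total rows; 1.0 means every value is unique."""
    values = list(values)
    if not values:
        return 0.0
    return len(set(values)) / len(values)

# Illustrative columns of a 1000-row table (made-up data).
sex_column = ["M", "F"] * 500       # only 2 distinct values
id_column = list(range(1000))       # every value distinct

print(selectivity(sex_column))      # 0.002
print(selectivity(id_column))       # 1.0
```

A value close to 1.0 suggests the column is a good candidate for an index, while a value close to 0 (as with "sex") suggests an index would filter out very few rows.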

Index Type

Depending on the functionality of your database, you can create three types of index in the Database Designer: a unique index, a primary key index, and a clustered index. For more information about the indexing capabilities supported by the database, see your database documentation.

Tip: Although a unique index helps locate information, it is recommended that you use a primary key or unique constraint instead for the best performance results.

Unique index

A unique index is one that does not allow any two rows to have the same index value.

Most databases do not allow a newly created unique index to be saved with a table when duplicate key values exist in existing data. The database may also prevent the addition of new data that will create duplicate key values in the table. For example, if a unique index is created on the employee's last name (lname) in the employee table, no two employees will have the same surname.
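Both behaviours described above can be tried out with a short sketch. It uses Python's built-in sqlite3 module as a stand-in database; the table and index names are hypothetical, and other databases raise different errors.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, lname TEXT)")
conn.execute("CREATE UNIQUE INDEX employee_lname ON employee (lname)")
conn.execute("INSERT INTO employee (lname) VALUES ('Smith')")

# Case 1: adding a row that would duplicate an indexed value is rejected.
try:
    conn.execute("INSERT INTO employee (lname) VALUES ('Smith')")
    duplicate_insert_rejected = False
except sqlite3.IntegrityError:
    duplicate_insert_rejected = True

# Case 2: a unique index cannot be created over data that already has duplicates.
conn.execute("CREATE TABLE staff (lname TEXT)")
conn.executemany("INSERT INTO staff (lname) VALUES (?)", [("Lee",), ("Lee",)])
try:
    conn.execute("CREATE UNIQUE INDEX staff_lname ON staff (lname)")
    index_on_duplicates_rejected = False
except sqlite3.DatabaseError:
    index_on_duplicates_rejected = True

print(duplicate_insert_rejected, index_on_duplicates_rejected)
```

In practice a last name is a poor choice for a unique index, since real employees do share surnames; the column is used here only because it matches the example in the text.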

Primary key Index

Database tables often have a column or combination of columns whose values uniquely identify each row in the table. This column, or combination of columns, is called the primary key of the table.

Defining a primary key for a table in a database diagram automatically creates a primary key index, which is a specific type of unique index. The index requires that each value in the primary key is unique. When a primary key index is used in a query, it also allows for fast access to the data.

Clustered index

In a clustered index, the physical order of the rows in the table is the same as the logical (indexed) order of the key values. A table can contain only one clustered index.

If an index is not a clustered index, the physical order of the rows in the table does not match the logical order of the key values. Clustered indexes typically provide faster data access than nonclustered indexes.


How to build indexes, and precautions to take

The most common scenario is to create an index for a field that appears in the WHERE clause. To make the explanation easier, let's start with a table like the following.

CREATE TABLE MyTable (
    id SERIAL PRIMARY KEY,
    category_id INT NOT NULL DEFAULT 0,
    user_id INT NOT NULL DEFAULT 0,
    adddate INT NOT NULL DEFAULT 0
);

If you use a statement similar to the following in your query:

SELECT * FROM MyTable WHERE category_id=1;

The most direct response is to create a simple index for category_id:

CREATE INDEX Mytable_categoryid
    ON MyTable (category_id);

OK. What if there is more than one condition? For example:

SELECT * FROM MyTable WHERE category_id=1 AND user_id=2;

Your first reaction may be to create another index for user_id. That is not the best way. Instead, build a single index that covers multiple columns:

CREATE INDEX Mytable_categoryid_userid ON MyTable (category_id, user_id);

Have you noticed my naming habit? I use the form "tablename_field1name_field2name". You'll soon see why.

Now that the right fields are indexed, you may still wonder: does the database actually use these indexes? In most databases this is easy to test; just use the EXPLAIN command:

EXPLAIN
SELECT * FROM MyTable
WHERE category_id=1 AND user_id=2;

This is what Postgres 7.1 returns (exactly as I expected):

NOTICE:  QUERY PLAN:

Index Scan using Mytable_categoryid_userid on MyTable  (cost=0.00..2.02 rows=1 width=16)

EXPLAIN

This is Postgres output. You can see that the database used an index for the query (a good start), and that it used the second index I created. This is the benefit of the naming scheme above: you know immediately that it is using the appropriate index.
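The same kind of check can be reproduced locally without a Postgres server. The sketch below is an assumption-laden stand-in: it uses Python's built-in sqlite3 module and SQLite's EXPLAIN QUERY PLAN, whose output format differs from the Postgres output above, and the helper query_plan is a hypothetical name.

```python
import sqlite3

def query_plan(conn, sql):
    """Return the 'detail' text of EXPLAIN QUERY PLAN for sql as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[-1] for row in rows)

conn = sqlite3.connect(":memory:")
# SQLite version of the table from the text (SERIAL replaced by INTEGER PRIMARY KEY).
conn.execute("""
    CREATE TABLE mytable (
        id INTEGER PRIMARY KEY,
        category_id INT NOT NULL DEFAULT 0,
        user_id INT NOT NULL DEFAULT 0,
        adddate INT NOT NULL DEFAULT 0
    )
""")
conn.execute(
    "CREATE INDEX mytable_categoryid_userid ON mytable (category_id, user_id)"
)

plan = query_plan(
    conn, "SELECT * FROM mytable WHERE category_id = 1 AND user_id = 2"
)
print(plan)
```

Because the index name encodes the table and its columns, a single glance at the printed plan shows which index was chosen and that both conditions are answered from it.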

Next, something a little more complicated: what if there is an ORDER BY clause? Believe it or not, most databases can benefit from an index when ORDER BY is used.

SELECT * FROM MyTable
WHERE category_id=1 AND user_id=2
ORDER BY adddate DESC;

It's simple: just as you index the fields in the WHERE clause, also include the field from the ORDER BY clause in the index:

CREATE INDEX Mytable_categoryid_userid_adddate
    ON MyTable (category_id, user_id, adddate);

NOTICE:  "Mytable_categoryid_userid_adddate" will be truncated to "Mytable_categoryid_userid_addda"

CREATE

EXPLAIN SELECT * FROM MyTable
WHERE category_id=1 AND user_id=2
ORDER BY adddate DESC;

NOTICE:  QUERY PLAN:

Sort  (cost=2.03..2.03 rows=1 width=16)
  ->  Index Scan using Mytable_categoryid_userid_addda on MyTable  (cost=0.00..2.02 rows=1 width=16)

EXPLAIN

Look at the EXPLAIN output: the database performs an extra sort that we did not ask for, and now we know where the performance is lost. It seems the database is a little too optimistic about its own operation, so let's give it a bit more of a hint.

To skip the sort step we don't need another index, just a slight change to the query. For Postgres, we give the database an extra hint by adding the fields from the WHERE clause to the ORDER BY clause. This is purely a technical device and not logically necessary, since no sorting actually happens on the other two fields, but once they are added Postgres knows what it should do.

EXPLAIN SELECT * FROM MyTable
WHERE category_id=1 AND user_id=2
ORDER BY category_id DESC, user_id DESC, adddate DESC;

NOTICE:  QUERY PLAN:

Index Scan Backward using Mytable_categoryid_userid_addda on MyTable  (cost=0.00..2.02 rows=1 width=16)

EXPLAIN

Now the database uses the index we expected, and it is smart enough to read the index backward to avoid any sort.
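Whether a sort step appears in the plan can also be checked in other databases. The sketch below uses Python's built-in sqlite3 module as a stand-in, with a hypothetical helper query_plan. Note that this is not the Postgres 7.1 behaviour described above: SQLite's planner is expected to drop the sort without the extra ORDER BY hint once adddate is part of the index.

```python
import sqlite3

def query_plan(conn, sql):
    """Return the 'detail' text of EXPLAIN QUERY PLAN for sql as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[-1] for row in rows)

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE mytable (
        id INTEGER PRIMARY KEY,
        category_id INT NOT NULL DEFAULT 0,
        user_id INT NOT NULL DEFAULT 0,
        adddate INT NOT NULL DEFAULT 0
    )
""")

query = (
    "SELECT * FROM mytable WHERE category_id = 1 AND user_id = 2 "
    "ORDER BY adddate DESC"
)

# Index on the WHERE columns only: rows are found via the index,
# but they still have to be sorted by adddate afterwards.
conn.execute(
    "CREATE INDEX mytable_categoryid_userid ON mytable (category_id, user_id)"
)
plan_two_columns = query_plan(conn, query)

# Index that also contains the ORDER BY column: the index order can be reused.
conn.execute("DROP INDEX mytable_categoryid_userid")
conn.execute(
    "CREATE INDEX mytable_categoryid_userid_adddate "
    "ON mytable (category_id, user_id, adddate)"
)
plan_three_columns = query_plan(conn, query)

print(plan_two_columns)
print(plan_three_columns)
```

SQLite reports a separate sort as a temporary B-tree step in the plan, so comparing the two printed plans shows whether the three-column index removed it.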

That was fairly detailed, but if your database is huge and you serve millions of page requests a day, you will benefit a great deal. But what about more complex queries, such as joins across several tables, especially when the fields in the WHERE conditions come from more than one table? I usually try to avoid these, because the database has to combine the rows of every table and then eliminate the ones that don't qualify, which can be very costly.

If you can't avoid it, look at each table being joined, build indexes using the strategy above, and then use the EXPLAIN command to verify that the indexes you expect are used. If they are, fine. If not, you may want to create temporary tables to combine the data and index them appropriately.

Note that creating too many indexes slows down updates and inserts, because every index must be updated as well. For a table that is updated and inserted into frequently, there is no need to build an index just for one rarely used WHERE clause. For small tables the cost of sorting is low, so extra indexes are unnecessary there too.

The above covers only the basics, and there is much more to learn. EXPLAIN alone cannot tell us whether an approach is optimal. Every database has its own optimizer which, though it may not be perfect, compares the candidate strategies at query time to see which is faster. In some cases using an index is not faster, for example when the index is stored in non-contiguous space, which increases the disk read load. Which option is best should therefore be determined by testing in the actual environment.

At the beginning, while the table is small, there is no need for indexes; my advice is to add them when they are needed. There are also commands for optimizing tables, such as OPTIMIZE TABLE in MySQL.

To sum up, you should now have a basic idea of how to set up proper indexes for a database.
