Let's start the tutorial with a simple example that explains why we need database indexes. Suppose we have a database table named Employee with three columns (fields): Employee_Name, Employee_Age, and Employee_Address. Suppose also that the Employee table holds thousands of rows.
Now suppose we want to retrieve from this table the information for every employee named 'Jesus'. We decide to use the following query:
SELECT * FROM Employee WHERE Employee_Name = 'Jesus'
What happens if the table doesn't have an index?
Once we run this query, what happens while the database looks for employees named Jesus? The database has to look at every row in the Employee table and check whether the employee's name (Employee_Name) is 'Jesus'. Because we want the information for every employee named Jesus, the search cannot stop at the first matching row: there may be other matching rows. The database therefore has to go through the table row by row until the last one, which means checking thousands of rows to find the employees named Jesus. This is called a full table scan.
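As a rough illustration of a full table scan, here is a minimal Python sketch. The in-memory list of tuples is only a hypothetical stand-in for the Employee table, and the sample rows are invented for the example; a real database reads rows from disk pages, not from a Python list.

```python
# Hypothetical stand-in for the Employee table: one tuple per row
# (Employee_Name, Employee_Age, Employee_Address). The data is invented.
employees = [
    ("Maria", 34, "12 Oak St"),
    ("Jesus", 41, "7 Pine Ave"),
    ("Aaron", 29, "3 Elm Rd"),
    ("Jesus", 25, "90 Lake Dr"),
]

def full_table_scan(rows, name):
    """Return all rows whose name matches, plus how many rows were examined."""
    matches = []
    examined = 0
    for row in rows:
        examined += 1  # every row is checked, even after a match is found
        if row[0] == name:
            matches.append(row)
    return matches, examined

matches, examined = full_table_scan(employees, "Jesus")
print(len(matches), "matching rows found after examining", examined, "rows")
```

The number of rows examined always equals the size of the table, no matter how few rows match.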
How does a database index improve performance?
You might think that a full table scan is a lot of work for something so simple. Shouldn't the database be smarter? It is like reading the whole table from beginning to end: slow and not at all sleek. As you may have guessed from the title of this article, this is where indexes come in handy. The whole point of an index is to speed up searches by reducing the number of records/rows in a table that need to be examined.
What is an index?
An index is a data structure (most commonly a B-tree) that stores the values of a specific column in a table. An index is created on a column of a table. The key points to remember are that an index contains the values of a table column and that those values are stored in a data structure. Remember this: an index is a data structure.
What kind of data structure is an index?
The B-tree is the most commonly used data structure for indexes because it is time-efficient: lookups, deletions, and insertions can all be done in logarithmic time. Another important reason is that the data stored in a B-tree is sorted. The relational database management system (RDBMS) typically decides which data structure is used for an index. In some RDBMSs, however, you can specify the data structure when you create the index.
How does a hash table index work?
A hash table is another data structure you may see used for indexes; such indexes are commonly called hash indexes. Hash indexes are used because hash tables are extremely efficient at looking up a value. Queries that compare strings for equality can therefore retrieve values very quickly with a hash index. For example, the query we discussed earlier (SELECT * FROM Employee WHERE Employee_Name = 'Jesus') could benefit from a hash index created on the Employee_Name column. A hash index uses the column value as the key of the hash table, and the value mapped to that key is a pointer to the corresponding row in the table. Because a hash table can basically be thought of as an associative array, a typical entry looks like "Jesus => 0x28939", where 0x28939 is a reference to the table row in memory that contains Jesus. Looking up "Jesus" in a hash index and getting back a reference to the row is much faster than scanning the entire table for rows whose value is "Jesus".
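The idea of a hash index can be sketched with a Python dict. This is a simplified model under stated assumptions: the list of tuples stands in for the table, row positions stand in for row pointers such as 0x28939, and the sample data is invented.

```python
from collections import defaultdict

# Hypothetical stand-in for the Employee table (invented sample rows).
employees = [
    ("Maria", 34, "12 Oak St"),
    ("Jesus", 41, "7 Pine Ave"),
    ("Aaron", 29, "3 Elm Rd"),
    ("Jesus", 25, "90 Lake Dr"),
]

# Build the hash index: column value -> list of row positions ("pointers").
name_hash_index = defaultdict(list)
for row_id, row in enumerate(employees):
    name_hash_index[row[0]].append(row_id)

def hash_lookup(name):
    """Equality lookup: hash the key, then follow the pointers to the rows."""
    return [employees[row_id] for row_id in name_hash_index.get(name, [])]

print(hash_lookup("Jesus"))
```

The lookup touches only the matching rows instead of walking the whole table.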
Disadvantages of Hash indexes
A hash table is not a sorted data structure, and there are many types of queries that a hash index cannot help with. Suppose, for example, that you want to find all employees younger than 40. How could you do that with a hash index? You can't, because a hash table is only good for looking up key-value pairs, that is, for queries that test for equality (such as WHERE Employee_Name = 'Jesus'). The key-value mapping of a hash table also implies that its keys are not stored in any particular order. This is why hash indexes are usually not the default data structure for database indexes: as an index structure, they are not as flexible as B-trees.
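To make the difference concrete, the following sketch answers "younger than 40" in two ways. It is an illustration only: a sorted Python list searched with bisect stands in for an ordered index such as a B-tree, and the ages and row positions are invented.

```python
import bisect

# (Employee_Age, row position) pairs; invented sample data.
ages = [(34, 0), (41, 1), (29, 2), (25, 3)]

# Hash-style index: age -> row positions. The keys have no useful order.
age_hash_index = {}
for age, row_id in ages:
    age_hash_index.setdefault(age, []).append(row_id)

def younger_than_hash(limit):
    """A range predicate forces a visit to every key in the hash index."""
    return sorted(r for age, rows in age_hash_index.items() if age < limit for r in rows)

# Ordered index: the same entries kept sorted by age.
sorted_age_index = sorted(ages)

def younger_than_sorted(limit):
    """Binary-search the cut-off point; everything before it qualifies."""
    end = bisect.bisect_left(sorted_age_index, (limit,))
    return [row_id for _, row_id in sorted_age_index[:end]]

print(younger_than_hash(40), younger_than_sorted(40))
```

Both functions return the same rows, but only the ordered index can locate the range without inspecting every entry.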
What other types of indexes are there?
Indexes that use an R-tree as their data structure are commonly used for spatial problems. For example, for a query such as "find all the Starbucks within two miles of me", an R-tree index on the table makes the query more efficient.
Another type of index is the bitmap index, which works well on columns that contain Boolean values (true and false) with many occurrences of each value; these are basically columns with low selectivity.
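A bitmap index can be pictured as one bit per row for each distinct value. The sketch below is a toy model that uses a Python integer as the bitmap; the is_manager column is hypothetical and not part of the Employee table described earlier.

```python
# Hypothetical Boolean column with one value per row (invented data).
is_manager = [True, False, False, True, False, True]

def build_bitmap(values, wanted):
    """Set bit i when row i holds the wanted value."""
    bitmap = 0
    for row_id, value in enumerate(values):
        if value == wanted:
            bitmap |= 1 << row_id
    return bitmap

def rows_in(bitmap):
    """Decode a bitmap back into the list of row positions it marks."""
    return [i for i in range(bitmap.bit_length()) if (bitmap >> i) & 1]

true_bitmap = build_bitmap(is_manager, True)
false_bitmap = build_bitmap(is_manager, False)

print(rows_in(true_bitmap), rows_in(false_bitmap))
```

Because each distinct value needs its own bitmap, the structure stays compact only when the column has few distinct values.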
How does an index improve performance?
Because an index is basically a data structure that stores column values, looking up those values becomes much faster. And if the index uses the most common data structure, the B-tree, the data in it is also sorted. Having the column values sorted can be a major performance gain. The reasons are explained below.
Suppose we create a B-tree index on the Employee_Name column. When we run the earlier SQL to find employees named 'Jesus', the database no longer needs to scan the whole table. Instead, it performs an index lookup, which works because the index is sorted alphabetically. Because the index is sorted, finding a name is much faster: all the names that start with 'J' sit next to each other in the index. Another important point is that the index also stores a pointer to the corresponding row in the table, so that the values of the other columns can be retrieved.
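The effect of a sorted index can be approximated in a few lines of Python. This is not a B-tree: a sorted list searched with the bisect module is used as a simplified stand-in that shares the two properties discussed here (ordering and logarithmic lookup). The table contents are invented, and row positions play the role of row pointers.

```python
import bisect

# Hypothetical stand-in for the Employee table (invented sample rows).
employees = [
    ("Maria", 34, "12 Oak St"),
    ("Jesus", 41, "7 Pine Ave"),
    ("Aaron", 29, "3 Elm Rd"),
    ("Jesus", 25, "90 Lake Dr"),
]

# Index entries are (column value, row position), kept sorted by value.
name_index = sorted((row[0], row_id) for row_id, row in enumerate(employees))

def index_lookup(name):
    """Binary-search to the first entry for name, then read adjacent entries."""
    pos = bisect.bisect_left(name_index, (name,))
    results = []
    while pos < len(name_index) and name_index[pos][0] == name:
        row_id = name_index[pos][1]        # the "pointer" stored in the index
        results.append(employees[row_id])  # follow it to get the other columns
        pos += 1
    return results

print(index_lookup("Jesus"))
```

Equal names are adjacent in the index, so the lookup stops as soon as it reaches a different name.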
What exactly is stored in the database index?
You now know that a database index is created on a column of a table and stores all the values of that column. It is important to understand, however, that a database index does not store the values of the other columns (fields) of the table. For example, if we create an index on the Employee_Name column, the values of the Employee_Age and Employee_Address columns are not stored in that index. If the index did store all the other columns, it would simply be a copy of the whole table, which would take up too much space and be very inefficient.
An index also stores a pointer to the table row
Once the database finds a column value in the index, how does it find the other values of the same record? It's simple: a database index also stores a pointer to the corresponding row in the table. A pointer is simply a reference to the location on disk where the data for that row is stored. So, in addition to the column value, the index stores a pointer to the row data. An entry (or node) for the Employee_Name column in the index can thus be pictured as ("Jesus", 0x82829), where 0x82829 is the address on disk of the row data that contains "Jesus". Without this reference you would only have access to the single value ("Jesus"), which would be pointless, because you could not get the other values of that row, such as the employee's address and age.
How does the database know when to use the index?
When this SQL (SELECT * FROM Employee WHERE Employee_Name = 'Jesus') runs, the database checks whether there is an index on the column being queried. Assuming an index does exist on the Employee_Name column, the database then has to decide whether it actually makes sense to use the index for this query, because in some scenarios using an index is less efficient than a full table scan. To learn more about these scenarios, please read the article on selectivity in SQL.
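One way to watch this decision is to ask the database for its query plan. The sketch below uses SQLite through Python's sqlite3 module as an assumption of convenience; the article itself does not name a specific database, the sample rows are invented, and the exact plan wording varies between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Employee (Employee_Name TEXT, Employee_Age INTEGER, Employee_Address TEXT)"
)
conn.executemany(
    "INSERT INTO Employee VALUES (?, ?, ?)",
    [("Maria", 34, "12 Oak St"), ("Jesus", 41, "7 Pine Ave"), ("Jesus", 25, "90 Lake Dr")],
)

query = "SELECT * FROM Employee WHERE Employee_Name = 'Jesus'"

def plan(sql):
    """Return the human-readable detail column of SQLite's query plan."""
    return " ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

plan_before = plan(query)  # no index yet: expect a scan of the table
conn.execute("CREATE INDEX name_index ON Employee (Employee_Name)")
plan_after = plan(query)   # the planner can now choose name_index

print(plan_before)
print(plan_after)
```

Comparing the two printed plans shows whether the planner switched from scanning the table to searching the index.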
Can you force the database to use the index?
Normally you don't tell the database when to use an index; the database decides for itself. It is worth noting, however, that in most databases (such as Oracle and MySQL) you can actually specify which index you want the query to use.
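The syntax for this differs from one database to another (MySQL, for example, offers index hints such as FORCE INDEX, and Oracle uses optimizer hints written in comments). As a runnable sketch, the example below uses SQLite instead, which is an assumption made only so the code can be executed with Python's standard library; its INDEXED BY and NOT INDEXED clauses play a similar role.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Employee (Employee_Name TEXT, Employee_Age INTEGER, Employee_Address TEXT)"
)
conn.execute("CREATE INDEX name_index ON Employee (Employee_Name)")

def plan(sql):
    """Return the human-readable detail column of SQLite's query plan."""
    return " ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

# Require the planner to use name_index (SQLite raises an error if it cannot).
forced = plan(
    "SELECT * FROM Employee INDEXED BY name_index WHERE Employee_Name = 'Jesus'"
)
# Forbid the use of any index for this table reference.
unindexed = plan(
    "SELECT * FROM Employee NOT INDEXED WHERE Employee_Name = 'Jesus'"
)

print(forced)
print(unindexed)
```

Overriding the planner is usually a last resort, since the database's own choice adapts as the data changes.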
How to create an index using SQL:
In the previous example, the SQL that created the index on the Employee_Name column is as follows:
CREATE INDEX name_index ON Employee (Employee_Name)
How to create a multi-column index
We can also create an index on two columns of the Employee table, with SQL as follows:
CREATE INDEX name_index ON Employee (Employee_Name, Employee_Age)
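To try both statements, the following sketch runs them against an in-memory SQLite database and then asks SQLite which columns each index covers. SQLite is an assumption made for the sake of a runnable example. Index names must be unique within a database, so the two-column index is given the hypothetical name name_age_index here instead of reusing name_index.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Employee (Employee_Name TEXT, Employee_Age INTEGER, Employee_Address TEXT)"
)

# Single-column index, as in the first statement.
conn.execute("CREATE INDEX name_index ON Employee (Employee_Name)")
# Two-column index; name_age_index is a hypothetical name chosen to avoid a clash.
conn.execute("CREATE INDEX name_age_index ON Employee (Employee_Name, Employee_Age)")

def indexed_columns(index_name):
    """List the columns of an index in key order, using PRAGMA index_info."""
    rows = conn.execute(f"PRAGMA index_info({index_name})").fetchall()
    # Each row is (position in index, column number in table, column name).
    return [name for _, _, name in sorted(rows)]

print(indexed_columns("name_index"))
print(indexed_columns("name_age_index"))
```

The column order given in CREATE INDEX is preserved in the index, with Employee_Name as the leading key of the two-column index.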
What's the best analogy for database indexing?
A good analogy is to think of a database index as the index of a book. Suppose you have a book about dogs and you want to find the section on golden retrievers. Why flip through the whole book, which is the equivalent of a full table scan in database terms, when you can go to the index at the back and see which pages cover golden retrievers? And just as a book index contains page numbers, a database index contains pointers to the rows that hold the values your SQL is looking for.
What is the cost of using a database index?
So what are the drawbacks of database indexes? First, an index takes up space, and the larger your table, the more space the index occupies. Second, there is a performance cost, mainly on write operations: whenever you add, delete, or update rows in the table, the same operation has to be applied to the index. Remember: an index on a column (or columns) must always hold the latest data for that column.
The basic rule is to create an index on a column only if that column is used very frequently in queries.
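The write cost can be seen in a toy model. The class below is a hypothetical illustration, not how any particular database is implemented: it keeps a table as a Python list plus one sorted index on the name column, and every insert or name change has to touch both structures.

```python
import bisect

class IndexedTable:
    """Toy table with one sorted index on the name column (illustrative only)."""

    def __init__(self):
        self.rows = []        # row storage: (name, age, address)
        self.name_index = []  # sorted (name, row position) entries

    def insert(self, name, age, address):
        self.rows.append((name, age, address))
        # Extra work caused by the index: add a matching index entry.
        bisect.insort(self.name_index, (name, len(self.rows) - 1))

    def update_name(self, row_id, new_name):
        old_name, age, address = self.rows[row_id]
        self.rows[row_id] = (new_name, age, address)
        # Extra work caused by the index: drop the stale entry, add the new one.
        self.name_index.remove((old_name, row_id))
        bisect.insort(self.name_index, (new_name, row_id))

table = IndexedTable()
table.insert("Maria", 34, "12 Oak St")
table.insert("Jesus", 41, "7 Pine Ave")
table.update_name(0, "Aaron")
print(table.name_index)
```

Each additional index on the table would add another structure that every write has to keep in sync.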
Original link:
http://www.programmerinterview.com/index.php/database-sql/what-is-an-index/
For more index-related knowledge, please refer to the link:
http://www.programmerinterview.com/index.php/database-sql/selectivity-in-sql-databases/
http://www.programmerinterview.com/index.php/database-sql/cardinality-in-sql/
http://use-the-index-luke.com/sql/preface#
Principles of Database Index optimization, working mechanism of indexes