Following the book on SQL tuning (SQL tuning or SQL optimization) in the essentials of high-performance SQL tuning and case resolution, we talk today about one of the most important core mechanisms of SQL tuning technology and half of the work-index (index) in every major relational repository. We know that "high-performance SQL Tuning essentials and Case Resolution" also repeatedly emphasized the importance of index to SQL tuning, but in the previous article also talked about, only look at the case and solve the specific method of the problem, and do not grasp the basic knowledge of SQL tuning, it is useless, we must know it, more to know its why, To achieve mastery, ingenious, and then the SQL tuning technology to master the point of perfection. Because "high-performance SQL Tuning essentials and Case Resolution" is mainly based on Oracle database, the SQL tuning technology is explained, in the previous article, I also said that the relational database on the analysis of SQL tuning and problem solving ideas, methods and procedures, almost exactly the same, just specific commands, Methods and forms differ. So, today we have a brief description of the indexing mechanism of the main relational libraries to help readers understand and master the content and essence of the "high-performance SQL Tuning essentials and Case Resolution", and hope that other students will benefit a lot. However, there is more than one type of index in a relational library, and there are differences between different relational libraries and different versions of the same relational library, so we will only cover the most widely used B*tree index.
1, Oracle B*tree Index: The organization Chart of the B*tree index in Oracle is described in detail in the essentials of high-performance SQL tuning and case resolution, and is no longer mentioned here, but it should be emphasized that b*tree non-leaf-level blocks in Oracle (non-leaf Level block), only the key value of the indexed column (the column value of a single row index or a combination of column values for a multicolumn index) is stored, the maximum value and the pointer to the next primary leaf level block, or a non-leaf level block (pointer), where the pointer is the file number + block number of the block. In the B*tree block of Oracle, the key value of the indexed column is stored +rowid (the file number + block number + slot number of the data row) so that the data rows in the table can easily be found through the index by the key and ROWID values in the B*tree. The data row in the segment in Oracle is the heap structure (heap), as described in "High-performance SQL Tuning essentials and case resolution." It is worth mentioning that Oracle also has a special table organization structure, that is, the Index organization table (IOT), although the application of Oracle is not many, but in other relational libraries, the application is very wide, but the name and details are different, specifically, continue to see the following content.
2, MySQL b*tree index: You know, MySQL database is a plug-in database, which is the data storage engine can be easily pluggable, therefore, MySQL also has a variety of storage engines exist simultaneously. Since this article is not dedicated to the MySQL storage engine, let's take the example of the most widely used InnoDB for illustration. The B*tree index in InnoDB (also known as key in MySQL) differs from Oracle in that it can be divided into clustered indexes (clustered index or primary key index) and non-clustered indexes depending on the organization structure (secondary index), when a table is created in InnoDB, a clustered index is created for the table's primary key, and if a primary key is not specified, a unique non-null index is selected as the primary key, and if no unique non-null index exists, an implicit primary key is automatically created, in short, the primary key of the table is not available. The data in the InnoDB table is stored in the cluster index of the table, specifically, the cluster index is a b*tree structure, except that the leaf level page, in addition to the key value of the cluster index, contains all the data column values in the table, so the data for the InnoDB table is ordered. In the InnoDB table, the non-clustered index, whose leaf level block does not contain the physical address of the data row (similar to the rowid of the data row in Oracle), but rather contains the key values in the cluster index on the table, so InnoDB, looking for data through a nonclustered index, typically undergoes two key-value lookups, first on a non-clustered index , you find the key value of the cluster index, and then look again on the cluster index to find the data row that you are actually looking for. It is emphasized that the cluster index of a table in InnoDB is not an option. Its organizational structure is similar to the Index organization table in Oracle.
3.SQL Server B*tree index: SQL Server database, as one of Microsoft's blockbuster products, sometimes also known as MSSQL, the current domestic and foreign markets have a considerable share. The B*tree index in MSSQL, similar to that in MySQL, is also divided into clustered indexes (clustered index) and non-clustered indexes (nonclustered index), but unlike the B*tree index in MySQL, The cluster index on the table in MSSQL is not mandatory, that is, you create a table, the table is a heap table, that its corresponding segments are organized and stored in the form of heaps, heap concept, "high-performance SQL Tuning essentials and Case resolution," a detailed discussion in the book, here no longer repeat. On the heap table in MSSQL, if you create a clustered index, the data in that table is moved to the leaf level page of the cluster index, sorted by the key value of the cluster index, organized and stored, the original heap no longer exists, and if you choose not to create a clustered index on the table, The table will always be in the form of a heap. And the non-clustered index in MSSQL, because the existence of the cluster index, its internal organization and mechanism are divided into two cases, when there is a clustered index on the table, the leaf page of the non-clustered index stores the key value of the cluster index, that is, there is no pointer to the physical address of the relevant data row, and if there is no cluster index A non-clustered index leaf-level page stores a pointer to the physical address of the related data row in the table (this pointer is called the RID in MSSQL). Regardless of whether a clustered index exists on the table, MSSQL finds the behavior of data rows through a non-clustered index as a bookmark lookup (bookmark lookup), although in MSSQL, the primary key is generally used and created as a clustered index on the table. We can see that the cluster index in MSSQL is similar to the organizational structure and the Index organization table in Oracle and the clustered index in MySQL.
4.Postgresql b*tree Index: Postgresql database, as one of the most powerful open source relational libraries, known as the free version of Oracle, but in terms of its application characteristics, row locks and MVCC and other aspects of the specific internal implementation, there are really many similarities with Oracle, It can also be said to be the most similar relational library in relation to the Coudas family, and Oracle. We also said that the PostgreSQL database is very similar to Oracle in many ways, so the organization and application of the B*tree index is no exception, and the B*tree index in the PostgreSQL database is similar to that in Oracle, Also does not have MySQL and MSSQL database in the cluster index one says. It is worth mentioning that database objects such as tables and indexes in PostgreSQL are organized and stored as separate files, 8k size data pages are also different from those in Oracle, and the MVCC in PostgreSQL differs greatly from the implementation mechanism in Oracle because The pre-image data of the data in the PostgreSQL table is stored together with the current version and needs to be purged periodically through vaccum. It is also worth mentioning that because there is no version information for the data in the B*tree index in PostgreSQL, the index overlay technique for the SQL statement does not exist, and the table action is required to confirm the version information of the specific data row, regardless of whether the indexed column can overwrite all columns of the SQL statement, although The SCAN-ONLY scans operation was introduced in the postgresql9.2 version, although in some cases it was possible to circumvent the table operation, but the VM file needed to access the table, and more crucially, if the corresponding bit (bit) in the VM file was the unset state, or the table was required, In view of its institutional limitations, the actual value of the technology is not so big, perhaps to write less read more business, will have a certain value.
Above, only the common mainstream relational library indexing mechanism to make a brief statement to help readers understand the "high-performance SQL tuning essentials and Case Analysis" of the book content, but also hope that other students benefit.
"Essentials of high-performance SQL tuning and Case Resolution" a book on mainstream relational database SQL tuning (SQL tuning or SQL Optimization) core mechanism-index