One, The Structure of the Hash Index
A hash index consists of a set of buckets. A hash function maps each index key to a hash value, and the key is placed in the bucket that corresponds to that value; each bucket corresponds to a different hash value. SQL Server provides the hash function that maps index keys to buckets. The hash function is deterministic: for the same index key it always produces the same hash value, so the key always maps to the same bucket.
A hash index consists of a collection of buckets organized in an array. A hash function maps index keys to corresponding buckets in the hash index. The following figure shows three index keys that are mapped to three different buckets in the hash index. For illustration purposes, the hash function is named f(x).
The hashing function used for hash indexes has the following characteristics:
- SQL Server has one hash function, which is used for all hash indexes.
- The hash function is deterministic. The same index key is always mapped to the same bucket in the hash index.
- Multiple index keys may be mapped to the same hash bucket.
- The hash function is balanced, meaning that the distribution of index key values over hash buckets typically follows a Poisson distribution.
A Poisson distribution is not an even distribution. Index key values are not evenly distributed in the hash buckets. For example, a Poisson distribution of n distinct index keys over n hash buckets results in approximately one third empty buckets, one third of the buckets containing one index key, and the other third containing two index keys. A small number of buckets will contain more than two keys.
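The "one third" figures are rough. As a minimal sketch, assuming bucket occupancy follows an ideal Poisson distribution whose mean is the number of keys divided by the number of buckets, the expected fractions can be computed directly. The function names here are illustrative and are not part of SQL Server.

```python
import math


def poisson_pmf(k: int, lam: float) -> float:
    """Probability that a bucket holds exactly k keys when the mean is lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def bucket_fill_fractions(keys: int, buckets: int) -> dict:
    """Expected fraction of buckets holding 0, 1, 2, or more than 2 keys."""
    lam = keys / buckets  # average number of keys per bucket
    p0, p1, p2 = (poisson_pmf(k, lam) for k in range(3))
    return {"empty": p0, "one": p1, "two": p2, "more": 1.0 - p0 - p1 - p2}


# n distinct keys over n buckets gives a mean of exactly 1 key per bucket.
fractions = bucket_fill_fractions(keys=1000, buckets=1000)
for label, value in fractions.items():
    print(f"{label:>5}: {value:.1%}")
```

With a mean of one key per bucket this gives about 37% empty buckets, 37% with one key, 18% with two keys, and 8% with more than two, so a noticeable share of keys sits in chains even when the bucket count equals the number of keys.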
Different index keys can produce the same hash value and therefore map to the same bucket; this is a hash collision. When several index keys map to the same bucket, their rows form a linked list. The longer the list, the poorer the lookup performance. Giving the hash index a sufficient number of buckets reduces hash collisions to some extent and improves the seek performance of the index.
If two index keys are mapped to the same hash bucket, there is a hash collision. A large number of hash collisions can have a performance impact on read operations.
The in-memory hash index structure consists of an array of memory pointers. Each bucket maps to an offset in this array. Each bucket in the array points to the first row in that hash bucket. Each row in the bucket points to the next row, thus resulting in a chain of rows for each hash bucket, as illustrated in the following figure.
The figure has three buckets with rows. The second bucket from the top contains three red rows. The fourth bucket contains a single blue row. The bottom bucket contains two green rows. These could be different versions of the same row.
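The bucket array and row chains described above can be modeled in a few lines. This is only a sketch of the idea, not SQL Server's implementation: the class names are hypothetical, and `zlib.crc32` stands in for the undocumented hash function f(x).

```python
import zlib
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Row:
    key: str
    payload: str
    next: Optional["Row"] = None  # next row in the same bucket's chain


class HashIndex:
    def __init__(self, bucket_count: int) -> None:
        # The index is an array of pointers; None marks an empty bucket.
        self.buckets: List[Optional[Row]] = [None] * bucket_count

    def _bucket_of(self, key: str) -> int:
        # Stand-in for f(x): deterministic, so a key always lands in the same bucket.
        return zlib.crc32(key.encode("utf-8")) % len(self.buckets)

    def insert(self, key: str, payload: str) -> None:
        slot = self._bucket_of(key)
        # The new row becomes the head of the chain and points to the old head.
        self.buckets[slot] = Row(key, payload, self.buckets[slot])

    def seek(self, key: str) -> List[str]:
        row = self.buckets[self._bucket_of(key)]
        matches = []
        while row is not None:
            # Colliding keys share the chain, so each row's key is compared.
            if row.key == key:
                matches.append(row.payload)
            row = row.next
        return matches


index = HashIndex(bucket_count=4)
index.insert("a", "version 1")
index.insert("a", "version 2")
index.insert("b", "other row")
print(index.seek("a"))
```

A seek costs one hash computation plus a walk along a single chain, which is why long chains (from collisions or from many duplicate keys) slow reads down.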
Two, Memory-Optimized Indexes
In SQL Server 2014, tables fall into two categories depending on whether they reside in memory: memory-optimized tables and disk-based tables. Indexes fall into the same two categories: memory-optimized indexes and disk-based indexes. A disk-based index is a B-tree structure and is either clustered or nonclustered.
Unique features of memory-optimized indexes:
- Memory-optimized indexes must be created in the CREATE TABLE statement; the CREATE INDEX statement cannot create them.
- Memory-optimized indexes exist only in memory; the index structure is not persisted to disk.
- Memory-optimized indexes are covering. Each index node (any node of a hash index, or a leaf node of a nonclustered index) holds the memory address of a data row, so the index can reach all of the table's columns, much like a disk-based clustered index. The difference is that a memory-optimized index is physically separate from the structure of the memory-optimized table, whereas a disk-based clustered index and its table are the same physical storage.
An index created on a memory-optimized table is called a memory-optimized index. There are two types: the nonclustered hash index and the nonclustered index.
1, Features of the memory-optimized nonclustered index
- A memory-optimized nonclustered index is a B-tree structure whose leaf nodes contain the memory addresses of the data rows.
- A memory-optimized nonclustered index is ordered: it is sorted by index key when the index is created. The order is one-way, so the index can only be searched in the sort direction defined by the index key. For example, if the index is defined as (C1 DESC), it cannot be used to retrieve rows in (C1 ASC) order.
- The order of the index key columns matters. If the filter condition does not include the first column of the index key, the index cannot be used for a seek. A seek is possible only when the filter condition covers the leading index columns; missing trailing columns do not prevent it. For example, with index key (C1, C2, C3, C4), a filter on (C1) or on (C1, C2) can use the index.
- A memory-optimized nonclustered index is suited to range queries and inequality predicates on a memory-optimized table.
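The leading-column and range behavior of an ordered index can be sketched with a sorted list and binary search. This is a simplified model under the assumption that index entries are kept sorted ascending by key tuple; it is not the structure SQL Server actually uses, and `OrderedIndex` is a hypothetical name.

```python
import bisect


class OrderedIndex:
    """Toy ordered index: entries are kept sorted ascending by key tuple."""

    def __init__(self, rows, key_columns):
        self.rows = rows
        self.key_columns = key_columns
        # Each entry is (key tuple, row position); the position stands in for a row pointer.
        self.entries = sorted(
            (tuple(row[c] for c in key_columns), pos) for pos, row in enumerate(rows)
        )
        self.keys = [key for key, _ in self.entries]

    def seek_prefix(self, prefix):
        """Equality seek on the leading key columns given in prefix."""
        prefix = tuple(prefix)
        start = bisect.bisect_left(self.keys, prefix)
        found = []
        for key, pos in self.entries[start:]:
            if key[:len(prefix)] != prefix:
                break  # past the last matching entry
            found.append(self.rows[pos])
        return found

    def range_seek(self, low, high):
        """Rows whose first key column lies in [low, high], in index order."""
        start = bisect.bisect_left(self.keys, (low,))
        found = []
        for key, pos in self.entries[start:]:
            if key[0] > high:
                break
            found.append(self.rows[pos])
        return found


rows = [
    {"c1": 3, "c2": 1},
    {"c1": 1, "c2": 2},
    {"c1": 2, "c2": 5},
    {"c1": 2, "c2": 4},
]
index = OrderedIndex(rows, ["c1", "c2"])
print(index.seek_prefix((2,)))   # equality on the leading column only
print(index.range_seek(2, 3))    # range predicate on the leading column
```

A filter on c2 alone has no starting position in the sorted order, so binary search cannot help and every entry would have to be examined. The model also returns rows only in ascending key order, which mirrors the one-way property described above.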
2, Features of the hash index
- A hash index uses a hash table to organize the index structure; each node holds a pointer directly to the memory address of a data row.
- A hash index is unordered and is suited to index seeks.
- All hash index key columns must appear in the filter condition for SQL Server to use a hash index seek to find the matching rows. If any index column is missing, SQL Server performs a full scan to find the rows that match. The reason is that when a hash index is created on N columns, SQL Server computes the hash value over all N columns and maps it to a bucket, so the bucket can be located only when all N column values are known.
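The need for every key column follows from how the bucket is chosen. The sketch below assumes a hash computed over the full tuple of key values, with `zlib.crc32` as a stand-in hash function and plain Python lists in place of row chains; `CompositeHashIndex` and `lookup` are hypothetical names used only for illustration.

```python
import zlib


def bucket_for(key_values, bucket_count):
    # The hash is computed over ALL key column values together.
    data = repr(tuple(key_values)).encode("utf-8")
    return zlib.crc32(data) % bucket_count


class CompositeHashIndex:
    def __init__(self, rows, key_columns, bucket_count):
        self.rows = rows
        self.key_columns = key_columns
        self.buckets = [[] for _ in range(bucket_count)]
        for row in rows:
            key = [row[c] for c in key_columns]
            self.buckets[bucket_for(key, bucket_count)].append(row)

    def lookup(self, predicates):
        """Return (access method, matching rows) for equality predicates."""
        if all(c in predicates for c in self.key_columns):
            # Full key available: hash it and inspect a single bucket.
            key = [predicates[c] for c in self.key_columns]
            candidates = self.buckets[bucket_for(key, len(self.buckets))]
            method = "seek"
        else:
            # Partial key: no hash can be computed, so every row is examined.
            candidates = self.rows
            method = "scan"
        matches = [
            row for row in candidates
            if all(row[c] == v for c, v in predicates.items())
        ]
        return method, matches


rows = [{"c1": 1, "c2": "a"}, {"c1": 1, "c2": "b"}, {"c1": 2, "c2": "a"}]
index = CompositeHashIndex(rows, ["c1", "c2"], bucket_count=8)
print(index.lookup({"c1": 1, "c2": "b"}))  # both key columns supplied
print(index.lookup({"c1": 1}))             # only the first key column supplied
```

With only c1 supplied there is no way to know which bucket the rows for (1, "a") and (1, "b") went to, because each full key hashes independently.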
Three, Hash Table
A hash index stores index keys and values in a hash table that is built in memory, which gives fast access to the data in a memory-optimized table. A hash table consists mainly of a hash function, a set of buckets, and chains of elements. The advantage of a hash table for lookups is that it needs no sorting, and lookup speed is largely independent of the amount of data.
Refer to "Hash and bucket in the Linux kernel":
A Hashtable is a collection of key/value pairs. A Hashtable object consists of hash buckets that contain the elements of the collection; a bucket is a virtual subgroup of the elements within the Hashtable. A hash table consisting of 5 buckets with 7 elements:
A hash function is an algorithm that returns a numeric hash code based on an index key. The index key is the value of some property of the object being stored. When an object is added to a Hashtable, it is stored in the bucket associated with the hash code that matches the object's hash code. When a value is searched for in a Hashtable, the hash code is generated for that value and the bucket associated with that hash code is searched. For example, "student" and "teacher" are placed in different buckets, while "dog" and "god" are placed in the same bucket. Retrieving an element from a Hashtable therefore performs best when index keys are unique.
An explanation of buckets, in the original English:
Hash table lookup operations are often O(n/m) (where n is the number of objects in the table and m is the number of buckets), which is close to O(1), especially when the hash function has spread the hashed objects evenly through the hash table, and there are more hash buckets than objects to be stored.
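The effect of the bucket count m on lookup cost can be checked with a small experiment. This sketch assumes SHA-256 as a stand-in hash function (chosen only because it spreads keys well and is deterministic) and measures how many rows a lookup of an existing key walks past when it traverses its whole chain.

```python
import hashlib


def bucket_of(key: str, bucket_count: int) -> int:
    # Stand-in hash function, used purely for illustration.
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % bucket_count


def chain_lengths(keys, bucket_count):
    """Number of keys that land in each bucket."""
    lengths = [0] * bucket_count
    for key in keys:
        lengths[bucket_of(key, bucket_count)] += 1
    return lengths


def average_rows_examined(keys, bucket_count):
    """Average chain length seen when looking up a key that is in the table."""
    lengths = chain_lengths(keys, bucket_count)
    # A chain of length L is walked once for each of the L keys stored in it.
    return sum(length * length for length in lengths) / len(keys)


keys = [f"key-{i}" for i in range(1000)]
for m in (16, 1024, 4096):
    print(m, len(keys) / m, round(average_rows_examined(keys, m), 2))
```

With few buckets the cost tracks n/m; once m exceeds n, most chains hold a single key and the cost approaches a constant, which is the "close to O(1)" case.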
Four, Hash Index Key Columns
Hash indexes require values for all index key columns in order to compute the hash value and locate the corresponding rows in the hash table. Therefore, if a query includes equality predicates for only a subset of the index keys in the WHERE clause, SQL Server cannot use an index seek to locate the rows corresponding to the predicates in the WHERE clause.
In contrast, ordered indexes like the disk-based nonclustered indexes and the memory-optimized nonclustered indexes support index seek on a subset of the index key columns, as long as they are the leading columns in the index.
The hash index requires a key (to hash) to seek into the index. If an index key consists of two columns and you only provide the first column, SQL Server does not have a complete key to hash. This will result in an index scan query plan. Usage determines which columns should be indexed.
For a nonclustered memory-optimized index, the full key is not required to perform an index seek. However, given the column order of the index key, a scan will occur if a value is supplied for a column that comes after a missing column.
Appendix:
Refer to Guidelines for Using Indexes on Memory-Optimized Tables:
SELECT c1, c2 FROM t WHERE c1 = 1;
If there is no index on column c1, SQL Server will need to scan the entire table t, and then filter on the rows that satisfy the condition c1=1. However, if t has an index on column c1, SQL Server can seek directly on the value 1 and retrieve the rows.
When searching for records that have a specific value, or range of values, for one or more columns in the table, SQL Server can use an index on those columns to quickly locate the corresponding records. Both disk-based and memory-optimized tables benefit from indexes. There are, however, certain differences between index structures that need to be considered when using memory-optimized tables. (Indexes on memory-optimized tables are referred to as memory-optimized indexes.) Some of the key differences are:
- Memory-optimized indexes must be created with CREATE TABLE (Transact-SQL). Disk-based indexes can be created with CREATE TABLE and CREATE INDEX.
- Memory-optimized indexes exist only in memory. Index structures are not persisted to disk and index operations are not logged in the transaction log. The index structure is created when the memory-optimized table is created in memory, both during CREATE TABLE and during database startup.
- Memory-optimized indexes are inherently covering. Covering means that all columns are virtually included in the index and bookmark lookups are not needed for memory-optimized tables. Rather than a reference to the primary key, memory-optimized indexes simply contain a memory pointer to the actual row in the table data structure.
- Fragmentation and fillfactor do not apply to memory-optimized indexes. In disk-based indexes, fragmentation refers to pages in the B-tree being written to disk out of order. Memory-optimized indexes are not written to or read from disk. Fillfactor in disk-based B-tree indexes refers to the degree to which the physical page structures are filled with actual data. The memory-optimized index structures do not have fixed-size pages.
There are two types of memory-optimized indexes:
- Nonclustered hash indexes, which are made for point lookups.
- Nonclustered indexes, which are made for range scans and ordered scans.
With a hash index, data is accessed through an in-memory hash table. Hash indexes do not have pages and are always a fixed size. However, a hash index can have empty hash buckets, which result in limited wasted space. The values returned from a query using a hash index are not sorted. Hash indexes are optimized for index seeks on equality predicates and also support full index scans.
Nonclustered indexes (not hash indexes) support everything that hash indexes support, plus seek operations on inequality predicates such as greater than or less than, as well as sort order. Rows can be retrieved according to the order specified with index creation. If the sort order of the index matches the sort order required for a particular query, for example if the index key matches the ORDER BY clause, there is no need to sort the rows as part of the query execution. Memory-optimized nonclustered indexes are unidirectional; they do not support retrieving rows in a sort order that is the reverse of the sort order of the index. For example, for an index specified as (C1 ASC), it is not possible to scan the index in reverse order, as (C1 DESC).
Each index consumes memory. Hash indexes consume a fixed amount of memory, which is a function of the bucket count. For nonclustered indexes, memory consumption is a function of the row count and the size of the index key columns, with some additional overhead depending on the workload. Memory for memory-optimized indexes is in addition to and separate from the memory used to store rows in memory-optimized tables.
Duplicate key values always share the same hash bucket. If a hash index contains many duplicate key values, the resulting long hash chains will harm performance. Hash collisions, which occur in any hash index, will further reduce performance in this scenario. For that reason, if the number of unique index keys is much smaller than the row count, you can reduce the risk of hash collisions by making the bucket count much larger (at least eight times the number of unique index keys; see Determining the Correct Bucket Count for Hash Indexes for more information) or you can eliminate hash collisions entirely by using a nonclustered index.
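The sizing guidance can be turned into a quick estimate. The sketch below rests on two assumptions that come from the SQL Server documentation rather than from this text: the requested bucket count is rounded up to the next power of two, and each bucket is an 8-byte pointer. The function names are illustrative.

```python
def round_up_to_power_of_two(n: int) -> int:
    """Smallest power of two that is greater than or equal to n."""
    return 1 if n <= 1 else 1 << (n - 1).bit_length()


def suggest_bucket_count(unique_keys: int, multiplier: int = 8) -> int:
    # multiplier=8 follows the "at least eight times" guidance for
    # indexes with many duplicate key values; it is a tunable assumption.
    return round_up_to_power_of_two(unique_keys * multiplier)


def hash_index_bytes(bucket_count: int, bytes_per_bucket: int = 8) -> int:
    # Assumption: one 8-byte pointer per bucket, independent of the row count.
    return bucket_count * bytes_per_bucket


buckets = suggest_bucket_count(unique_keys=100_000)
print(buckets, hash_index_bytes(buckets))
```

For 100,000 unique keys this suggests 1,048,576 buckets, or about 8 MB for the bucket array, which shows why oversizing is affordable in many cases but is still a fixed cost paid even when the table is empty.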
Reference Documentation:
Hash Indexes
Guidelines for Using Indexes on Memory-Optimized Tables
Troubleshooting Common Performance Problems with Memory-Optimized Hash Indexes
Hash and bucket in the Linux kernel
Learn Hash Index