Detailed description of SQL Server2014 hash index principles, server2014
When a key-value pair is passed to a hash function, after calculation by the hash function, according to the results, the key-value pairs will be placed in the appropriate hash buckets (hash bucket ).
Example
We assume that modulo 10 (% 10) is a hash function. If the key of the key-value pair is 1525 and is passed to the hash function, 1525 is stored in the fifth bucket.
Because 5 as 1525% 10 = 5.
Similarly, 537 is stored in the seventh bucket, 2982 is stored in the second bucket, and so on.
Similarly, in the hash index, the hash index column is passed to the hash function for matching (similar to the HashMap operation in java). After successful matching,
The index column is stored in the matched hash bucket table, which contains the actual data row pointer, and then searches for the corresponding data row based on the actual data row pointer.
To sum up, the SQL Server engine needs to do the following to find a row of data or process a where clause:
1. Generate a suitable hash function based on the parameters in the where condition.
2. Match the index column to match the corresponding hash bucket. Finding the corresponding hash bucket means that the corresponding row pointer is also found)
3. Read data
The hash index is simpler than the B-tree index because it does not need to traverse B-tree, so the access speed is faster.
Examples of Hash Functions and corresponding syntax
CREATE TABLE dbo.HK_tbl ( [ID] INT IDENTITY(1, 1) NOT NULL PRIMARY KEY NONCLUSTERED HASH WITH ( BUCKET_COUNT = 100000 ) , [Data] char(32) COLLATE Latin1_General_100_BIN2 NULL , [dt] datetime NOT NULL, ) WITH ( MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_AND_DATA);
In SQL Server 2014, hash indexes cannot be added after the memory optimization table is created. However, in SQL Server 2016, hash indexes can be added after the table is created. However
Adding a hash index is an offline operation.
Number of buckets with a hash Index
(BUCKET_COUNT = 100000) defines the number of buckets that can be used by the hash index. This BUCKET is fixed and the number of buckets is specified by the user,
Instead of the number of buckets generated by SQL Server during query execution. The number of buckets is always rounded to the power of 2 (1024,204 8, 4096 etc ..)
In fact, the hash index of SQL Server is similar to the MySQL adaptive hash index principle, in order to get rid of the limitations of B tree, so that the search efficiency is faster
How does a relational database work this article also describes the principle of hash join. You can take a look at it.