Database System Concepts 14: Static Hashing

Source: Internet
Author: User

One drawback of sequential file organization is that locating a record requires going through an index structure or running a binary search, both of which cost extra I/O operations. A file organization based on hashing avoids accessing an index structure, and hashing also provides a way to build indexes.
In hashing, a bucket is a unit of storage that can hold one or more records. If K denotes the set of all search keys and B the set of all bucket addresses, a hash function h is a function from K to B.
To insert a record with search key Ki, compute h(Ki) to obtain the bucket address; if the bucket has room, the record is stored there.
To look up Ki, compute h(Ki) in the same way to find the bucket. Because a bucket usually holds more than one record, and different search keys can hash to the same bucket, the records inside the bucket must then be checked against the search key.
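The insert and lookup steps can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the bucket count, the bucket capacity, the modulo hash function and the sample records are all made up for the example, and in-memory lists stand in for disk blocks.

```python
# Sketch: a fixed set of buckets addressed by a hash function h: K -> B.
NUM_BUCKETS = 4        # assumed value; fixed at design time in static hashing
BUCKET_CAPACITY = 2    # assumed number of records that fit in one bucket

buckets = [[] for _ in range(NUM_BUCKETS)]


def h(key):
    """Hypothetical hash function: map an integer search key to a bucket address."""
    return key % NUM_BUCKETS


def insert(key, record):
    """Store the record in bucket h(key) if it has room; report whether it fit."""
    bucket = buckets[h(key)]
    if len(bucket) >= BUCKET_CAPACITY:
        return False   # bucket is full: some overflow handling would be needed
    bucket.append((key, record))
    return True


def lookup(key):
    """Go to bucket h(key), then compare search keys inside the bucket."""
    return [record for k, record in buckets[h(key)] if k == key]


insert(5, "Alice")
insert(9, "Bob")       # 9 % 4 == 1, so this lands in the same bucket as key 5
print(lookup(9))       # the bucket holds two records, but only one matches key 9
```

Keys 5 and 9 share a bucket, which is why lookup has to compare the search key of every record it finds there.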
Hashing serves two purposes. In a hash file organization, the hash function directly yields the disk block that holds the record. In a hash index organization, the search keys and their associated pointers are organized into a hash file structure.
a) Hash function
Choosing the hash function carefully matters, because a poor choice can map most records to a small number of buckets. A good hash function distributes keys uniformly and randomly: each bucket should receive about the same number of records, and the assignment should not depend on any ordering of the search keys themselves.
A typical hash function computes on the binary representation of the search key, for example by adding up the bits or bytes of the key and taking the result modulo the number of buckets. A well-designed hash function gives stable lookup performance that is largely independent of the number of records.
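As a rough illustration of "compute on the binary representation, then take the modulus", the sketch below adds up the bytes of a string key and reduces the sum modulo the number of buckets. The function name, the bucket count of 8 and the sample keys are assumptions chosen for the example, not values from the source.

```python
from collections import Counter


def h(key, num_buckets):
    """Add up the bytes of the key's binary representation, then take the modulus."""
    return sum(key.encode("utf-8")) % num_buckets


# Hypothetical search keys, used only to look at how they spread over the buckets.
keys = ["A-101", "A-110", "A-215", "A-217", "A-222", "A-305"]
bucket_counts = Counter(h(k, 8) for k in keys)

for bucket, count in sorted(bucket_counts.items()):
    print(f"bucket {bucket}: {count} key(s)")
```

A plain byte sum ignores the order of the characters, so keys such as "ab" and "ba" always collide. It is shown here because it is easy to follow; a function that mixes the bits more thoroughly would come closer to the uniform, random distribution described above.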
b) Bucket overflow handling
If a record maps to a bucket that has no free space, a bucket overflow occurs. Overflow has two main causes. First, there may be too few buckets for the growing number of records. Second, some buckets may receive far more records than others, either because the hash function is poorly chosen or because many records share the same search key; since each bucket can hold only a limited number of records, those buckets fill up first. This second situation is known as bucket skew.
To reduce the chance of overflow, the number of buckets can be chosen with some margin, at the cost of wasted space. Overflow that still occurs can be handled with overflow buckets: when a record must go into a bucket that is already full, a new overflow bucket is allocated to receive it. The overflow buckets and the original bucket are chained together in a linked list, a technique called overflow chaining. A lookup must therefore check whether the bucket has overflow buckets and, if so, search them as well.
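Overflow chaining can be sketched as follows. The class names, the modulo hash on integer keys and the tiny capacities are assumptions made for the example, and Python objects stand in for disk blocks.

```python
class Bucket:
    """A bucket with fixed capacity and a link to an optional overflow bucket."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.records = []
        self.overflow = None   # next bucket in the overflow chain, if any


class StaticHashFile:
    """Hypothetical static hash file: the number of primary buckets never changes."""

    def __init__(self, num_buckets, capacity):
        self.capacity = capacity
        self.buckets = [Bucket(capacity) for _ in range(num_buckets)]

    def _primary(self, key):
        return self.buckets[key % len(self.buckets)]

    def insert(self, key, record):
        bucket = self._primary(key)
        # Walk the chain until a bucket with free space is found.
        while len(bucket.records) >= bucket.capacity:
            if bucket.overflow is None:
                bucket.overflow = Bucket(self.capacity)   # add a new overflow bucket
            bucket = bucket.overflow
        bucket.records.append((key, record))

    def lookup(self, key):
        matches = []
        bucket = self._primary(key)
        while bucket is not None:   # probe the primary bucket, then each overflow bucket
            matches.extend(record for k, record in bucket.records if k == key)
            bucket = bucket.overflow
        return matches

    def chain_length(self, key):
        """Number of buckets (primary plus overflow) a lookup for this key visits."""
        length, bucket = 0, self._primary(key)
        while bucket is not None:
            length += 1
            bucket = bucket.overflow
        return length


f = StaticHashFile(num_buckets=2, capacity=2)
for k in (1, 3, 5):                 # all odd keys map to bucket 1
    f.insert(k, f"rec{k}")
print(f.lookup(5), f.chain_length(5))
```

The third odd key no longer fits in the primary bucket, so it ends up in an overflow bucket and every lookup in that chain has to visit two buckets instead of one.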
The drawback of static hashing is that the number of buckets must be fixed at design time. As the number of records grows or shrinks, the number of buckets cannot follow, which leads either to overflow or to wasted space.
c) Hash Index
Hashing can be used for index structures as well as for file organization. To build a hash index, the hash function is applied to the search key to determine the corresponding bucket, and the search key is stored in that bucket together with its associated pointer to the record.
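A minimal sketch of a hash index, under the assumption that the data file is a plain list and that a list position plays the role of a record pointer. The sample records, the bucket count and the byte-sum hash function are invented for the example.

```python
# Data file: records in arbitrary order; the list position stands in for a disk pointer.
data_file = [
    ("A-217", "Brighton", 750),
    ("A-101", "Downtown", 500),
    ("A-305", "Round Hill", 350),
    ("A-215", "Mianus", 700),
]

NUM_BUCKETS = 4   # assumed value, fixed when the index is created


def h(key):
    """Hypothetical hash function on the search key (here: the account number)."""
    return sum(key.encode("utf-8")) % NUM_BUCKETS


# Each index bucket stores (search key, pointer) pairs rather than whole records.
index = [[] for _ in range(NUM_BUCKETS)]
for pointer, record in enumerate(data_file):
    search_key = record[0]
    index[h(search_key)].append((search_key, pointer))


def find(key):
    """Hash to the index bucket, match the search key, then follow the pointers."""
    return [data_file[pointer] for k, pointer in index[h(key)] if k == key]


print(find("A-101"))
```

The records themselves stay where they are in the data file; only the key and pointer pairs are arranged by the hash function.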


Learning materials: Database System Concepts, by Abraham Silberschatz, Henry F. Korth, and S. Sudarshan
