How does a relational database work? (Series, Part 6): Hash table

Source: Internet
Author: User
Tags database join

Finally, let's introduce one more important data structure: the hash table. It is useful when you need to look up values quickly, and understanding hash tables will help us understand a common database join method, the hash join. Databases also often use this data structure to store internal information, such as the lock table or the buffer pool (both described in later chapters).

A hash table lets you quickly find an element by its key. To build a hash table, you need to define:

    • a key for each element;
    • a hash function for the key: the hash value of a key gives the location of the element (this location is usually called a bucket);
    • a comparison function for the key: once you have found the right bucket, you use it to find the right element inside the bucket.
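
The three ingredients above can be sketched as a small chained hash table in Python. This is a minimal illustration under stated assumptions, not how any particular database implements it: the class name `ChainedHashTable`, its `put`/`get` methods, and the choice of storing each bucket as a Python list of (key, value) pairs are all hypothetical.

```python
from typing import Callable, Generic, List, Optional, Tuple, TypeVar

K = TypeVar("K")
V = TypeVar("V")


class ChainedHashTable(Generic[K, V]):
    """Hypothetical toy hash table built from the three ingredients:
    a key, a hash function that picks a bucket, and a comparison
    function used to find the element inside that bucket."""

    def __init__(self, num_buckets: int,
                 hash_fn: Callable[[K], int],
                 eq_fn: Callable[[K, K], bool]) -> None:
        self.num_buckets = num_buckets
        self.hash_fn = hash_fn
        self.eq_fn = eq_fn
        # Each bucket is a list of (key, value) pairs.
        self.buckets: List[List[Tuple[K, V]]] = [[] for _ in range(num_buckets)]

    def _bucket(self, key: K) -> List[Tuple[K, V]]:
        # The hash value, reduced to the number of buckets, selects the bucket.
        return self.buckets[self.hash_fn(key) % self.num_buckets]

    def put(self, key: K, value: V) -> None:
        bucket = self._bucket(key)
        for i, (k, _) in enumerate(bucket):
            if self.eq_fn(k, key):  # key already present: overwrite its value
                bucket[i] = (key, value)
                return
        bucket.append((key, value))

    def get(self, key: K) -> Optional[V]:
        # Only the single bucket chosen by the hash is scanned.
        for k, v in self._bucket(key):
            if self.eq_fn(k, key):
                return v
        return None


# Usage: 10 buckets, identity hash (so the bucket is key % 10), integer equality.
table = ChainedHashTable(10, hash_fn=lambda k: k, eq_fn=lambda a, b: a == b)
table.put(78, "row 78")
print(table.get(78))  # row 78
print(table.get(59))  # None
```

Keeping the hash and comparison functions as parameters mirrors the list above: changing either one changes the table's behavior without touching the bucket logic.
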
A simple example

Let's look at a dummy example:

This hash table has 10 buckets, and the hash function takes the key modulo 10, that is, the last digit of each key:

    • if the last digit is 0, the element goes into bucket 0;
    • if the last digit is 1, the element goes into bucket 1;
    • if the last digit is 2, the element goes into bucket 2;
    • ...
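
The bucket rule above can be written out directly. In this sketch the helper names `bucket_of` and `build_buckets` are hypothetical; the keys 78, 99, 9, 79 and 29 come from the example below, while 120 and 41 are made up just to fill other buckets.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def bucket_of(key: int, num_buckets: int = 10) -> int:
    # The toy hash function: key modulo 10, i.e. the last digit of a non-negative key.
    return key % num_buckets


def build_buckets(keys: Iterable[int], num_buckets: int = 10) -> Dict[int, List[int]]:
    # Group keys by bucket, preserving insertion order inside each bucket.
    buckets: Dict[int, List[int]] = defaultdict(list)
    for key in keys:
        buckets[bucket_of(key, num_buckets)].append(key)
    return dict(buckets)


print(build_buckets([78, 99, 9, 79, 29, 120, 41]))
# {8: [78], 9: [99, 9, 79, 29], 0: [120], 1: [41]}
```
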

The comparison function simply checks whether two integers are equal. Suppose we want to find 78:

    • the hash table computes the hash value of 78, which is 8;
    • it looks in bucket 8, and the first element it finds is 78;
    • it returns 78;
    • the whole search costs 2 operations: one to compute the hash value and one to find the element in the bucket.

Now suppose we want to find 59:

    • the hash table computes the hash value of 59, which is 9;
    • it looks in bucket 9, and the first element is 99; since 99 != 59, this is not the element we are looking for;
    • applying the same logic, it checks 9, 79, ..., and finally 29;
    • element 59 does not exist;
    • the whole search costs 7 operations.
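
Both searches can be reproduced with a small lookup that counts its own work. The counting convention (one operation for the hash, plus one per element compared) is an assumption that matches the 2-operation figure for 78. Because the article elides part of bucket 9 ("9, 79, ..., 29"), the bucket contents here are only a hypothetical subset, so the miss costs fewer operations than the article's 7; the function name `lookup` is also hypothetical.

```python
from typing import Dict, List, Tuple


def lookup(buckets: Dict[int, List[int]], key: int,
           num_buckets: int = 10) -> Tuple[bool, int]:
    """Search for `key` and return (found, operations).

    Assumed counting convention: 1 operation to compute the hash,
    plus 1 operation per element compared inside the bucket."""
    ops = 1  # compute the hash value
    for element in buckets.get(key % num_buckets, []):
        ops += 1  # one comparison
        if element == key:
            return True, ops
    return False, ops


# Hypothetical contents: bucket 9 holds only a subset of the figure's elements.
buckets = {8: [78], 9: [99, 9, 79, 29]}
print(lookup(buckets, 78))  # (True, 2)
print(lookup(buckets, 59))  # (False, 5)
```

The cost of a miss grows with the length of the bucket, which is why the next section focuses on keeping buckets short.
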
What makes a good hash function

As you can see, the cost of a search depends on the value you are looking for: different values cost different numbers of operations.
If the hash function in the previous example is replaced by the key modulo 1 000 000 (that is, its last 6 digits), the second search costs only 1 operation, because bucket 000059 is empty. The real difficulty is finding a hash function that keeps the number of elements in each bucket as small as possible. (Translator's note: this is usually called reducing hash collisions.)
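
The effect of switching from modulo 10 to modulo 1 000 000 can be checked on a handful of keys. This is a sketch: the helpers `build` and `miss_cost` are hypothetical, the key list is a hypothetical subset of the example's keys, and the cost uses the same assumed convention (one operation for the hash plus one per element compared, for a key that is absent).

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def build(keys: Iterable[int], modulus: int) -> Dict[int, List[int]]:
    # Distribute keys into buckets using "key modulo `modulus`" as the hash.
    buckets: Dict[int, List[int]] = defaultdict(list)
    for k in keys:
        buckets[k % modulus].append(k)
    return buckets


def miss_cost(buckets: Dict[int, List[int]], key: int, modulus: int) -> int:
    # Cost of searching for an absent key: 1 hash + 1 comparison per element.
    return 1 + len(buckets.get(key % modulus, []))


keys = [78, 99, 9, 79, 29]  # hypothetical subset of the example's keys
for modulus in (10, 1_000_000):
    buckets = build(keys, modulus)
    largest = max(len(b) for b in buckets.values())
    print(modulus, "largest bucket:", largest, "cost to miss 59:", miss_cost(buckets, 59, modulus))
# 10 largest bucket: 4 cost to miss 59: 5
# 1000000 largest bucket: 1 cost to miss 59: 1
```

With these small keys, the larger modulus gives every key its own bucket, so fewer collisions translate directly into cheaper searches.
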

In the two examples above, a good hash function is easy to find. It is much harder when the key is one of the following:

    • a single string, such as a person's last name;
    • two strings, such as a person's surname + first name;
    • two strings and a date, such as a person's surname + first name + date of birth.
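
For string and composite keys, one common approach is to serialize the fields into bytes and feed them to a general-purpose string hash. The sketch below uses 64-bit FNV-1a, a swapped-in choice (the article does not name a hash function), and the helper names `fnv1a_64` and `person_key_hash` plus the 1024-bucket count are hypothetical. A separator byte keeps ("ab", "c") and ("a", "bc") from producing identical input. Python's built-in `hash()` is not used because it is randomized per process for strings.

```python
from datetime import date

FNV_OFFSET = 0xCBF29CE484222325  # 64-bit FNV offset basis
FNV_PRIME = 0x100000001B3        # 64-bit FNV prime
MASK64 = (1 << 64) - 1


def fnv1a_64(data: bytes) -> int:
    """64-bit FNV-1a: XOR each byte into the state, then multiply by the prime."""
    h = FNV_OFFSET
    for byte in data:
        h ^= byte
        h = (h * FNV_PRIME) & MASK64
    return h


def person_key_hash(surname: str, first_name: str, birth: date) -> int:
    # Join the fields with the ASCII unit separator so field boundaries
    # are part of the hashed bytes.
    raw = "\x1f".join([surname, first_name, birth.isoformat()])
    return fnv1a_64(raw.encode("utf-8"))


NUM_BUCKETS = 1024  # hypothetical bucket count
bucket = person_key_hash("Dupont", "Marie", date(1980, 5, 17)) % NUM_BUCKETS
print(bucket)
```

Whether such a function is "good enough" still depends on the real distribution of names and dates; the code only shows how a multi-field key can be reduced to a bucket number.
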

With a good enough hash function, a search in a hash table takes O(1) time on average.

Comparison of arrays and hash tables

Why not just use an array? That's a good question!

    • With a hash-based database table, you can load only some of the buckets into memory and leave the others on disk;
    • An array must occupy one contiguous block of memory, and when a table stored as a two-dimensional array is large, it is hard to find that much contiguous space;
    • With a hash-based database table, you can choose any key you like, for example country + last name.
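
Python's built-in `dict` is itself a hash table, so the "any key you like" point can be shown with a tuple acting as a composite key (country, last name). The rows and the name `index` below are hypothetical; the sketch only illustrates the idea and is not a database index implementation.

```python
# Hypothetical rows: (country, last_name, city)
rows = [
    ("France", "Dupont", "Lyon"),
    ("France", "Martin", "Paris"),
    ("Canada", "Dupont", "Montreal"),
]

# A tuple is hashable, so (country, last_name) works directly as a key.
# With an array, this pair would first have to be mapped to an integer position.
index = {(country, last_name): city for country, last_name, city in rows}

print(index[("Canada", "Dupont")])     # Montreal
print(("Canada", "Martin") in index)   # False
```
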

For more information, see another article I wrote about the Java HashMap, an efficient hash table implementation; you don't need to know Java to follow it.

In the next chapter, we'll start with a holistic view of the database.
