Hash
Basic definition
1. Hashing: determine where a data item is stored from the value of the item itself. Each storage location in the hash table is called a slot.
2. Hash function: the function that converts a data item into a slot number.
3. Slot number: the storage location for the data item, as returned by the hash function.
Several commonly used hash functions:
Remainder method:
Method:
Divide the data item by the size of the hash table and use the remainder as the slot number.
In fact, the remainder step appears in some form in every hash function: the slot number returned must fall within the range of the hash table, so the final result is generally taken modulo the table size.
Data search:
Apply the same hash function to the search item, then check whether the slot with the returned slot number holds that data item.
Drawback:
Conflicts (also called collisions) can occur: two different data items get the same slot number after the remainder is taken.
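A minimal sketch of the remainder method, the search step, and a conflict. The table size of 11, the sample values, and the helper names `remainder_hash`, `put`, and `contains` are illustrative assumptions, not part of the original text.

```python
def remainder_hash(item, table_size):
    """Remainder method: the slot number is the item modulo the table size."""
    return item % table_size


def put(table, item):
    """Store item in its slot; return False if another item already holds it."""
    slot = remainder_hash(item, len(table))
    if table[slot] is not None and table[slot] != item:
        return False  # conflict: a different item already occupies this slot
    table[slot] = item
    return True


def contains(table, item):
    """Search: hash the item again and look only at the slot it maps to."""
    return table[remainder_hash(item, len(table))] == item


table = [None] * 11
for value in (54, 26, 93, 17, 77, 31):
    put(table, value)

print(table)                # [77, None, None, None, 26, 93, 17, None, None, 31, 54]
print(contains(table, 93))  # True
print(contains(table, 20))  # False
print(put(table, 44))       # False: 44 % 11 == 0, the slot already held by 77
```

This sketch only reports a conflict; it does not resolve it.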
Perfect hash function:
1. Definition:
Given a set of data items, a hash function that maps each data item to a different slot is called a perfect hash function. For a fixed set of data, a perfect hash function can always be designed.
2. Drawback:
If the set of data changes frequently, it is hard to keep a hash function perfect. Some conflicts will occur, but they are not fatal and can be handled properly.
3. Ways to approach a perfect hash function:
① Make the hash table large enough that every possible data item occupies its own slot. (Not practical when the range of possible items is large.)
② Settle for second best. A good hash function should have these characteristics:
Few conflicts (approximately perfect),
Low computational cost (small additional overhead),
Data items spread evenly across the table (saves space)
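To make the idea concrete, here is a small check of whether a hash function is perfect for one fixed set of items. The function name `is_perfect`, the sample values, and the table size are assumptions chosen for illustration.

```python
def is_perfect(hash_function, items, table_size):
    """Return True if every item in the fixed set lands in a different slot."""
    slots = [hash_function(item, table_size) for item in items]
    return len(set(slots)) == len(slots)


def remainder_hash(item, table_size):
    return item % table_size


fixed_items = [54, 26, 93, 17, 77, 31]
print(is_perfect(remainder_hash, fixed_items, 11))         # True
print(is_perfect(remainder_hash, fixed_items + [44], 11))  # False: 44 and 77 share slot 0
```

Adding a single item is enough to break perfection here, which is why a changing data set usually has to tolerate some conflicts.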
One of the applications of hashing
Blockchain
Introduction to blockchain: A blockchain is a distributed database. Every node connected through the network stores the full data of the entire database, and data stored at any node is synchronized to the others.
Its essential feature is decentralization: there is no control center or coordinating node, all nodes are equal, and no single node can control the others.
A blockchain is composed of blocks, each divided into a head and a body.
The block header records metadata and the information that links the block to the previous one:
generation time and the hash value of the previous block (head + body).
Why a blockchain cannot be modified:
Because hash values are tamper-evident, any change to the data of a block inevitably changes its hash value. To keep that block linked into the chain, all subsequent blocks would have to be modified as well.
Because of the "proof of work" mechanism, a modification on this scale is practically infeasible unless you control 51% of the computing power of the entire network.
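The linking of blocks through hash values can be sketched in a few lines. This is a toy model under stated assumptions: the block layout, the helper names `block_hash`, `build_chain`, and `is_valid`, and the use of SHA-256 over a JSON serialization are all choices made for illustration, and proof of work is left out entirely.

```python
import hashlib
import json


def block_hash(block):
    """SHA-256 over the whole block (head + body), serialized deterministically."""
    encoded = json.dumps(block, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


def build_chain(bodies):
    """Link blocks by storing the previous block's hash in each block header."""
    chain = []
    prev_hash = "0" * 64  # placeholder for the first block, which has no predecessor
    for index, body in enumerate(bodies):
        block = {
            "head": {"time": index, "prev_hash": prev_hash},  # toy timestamp
            "body": body,
        }
        chain.append(block)
        prev_hash = block_hash(block)
    return chain


def is_valid(chain):
    """Check that every header still matches the hash of the block before it."""
    return all(
        chain[i]["head"]["prev_hash"] == block_hash(chain[i - 1])
        for i in range(1, len(chain))
    )


chain = build_chain(["a pays b 5", "b pays c 2", "c pays a 1"])
print(is_valid(chain))  # True

chain[0]["body"] = "a pays b 500"  # tamper with an early block
print(is_valid(chain))  # False: the next header no longer matches the changed block
```

To hide the change, the header of every later block would have to be recomputed, which is the large-scale modification that proof of work makes so expensive.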
Hash function design
1. Folding method:
The data item is divided into segments of equal length (the last segment may be shorter), the segments are added together, and the remainder of the sum modulo the hash table size is the hash value.
Sometimes the folding method also includes a step that reverses every other segment before adding.
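A minimal sketch of the folding method, with the optional reversal step. The function name `folding_hash`, its parameters, the two-digit segment length, and the sample number are assumptions for illustration.

```python
def folding_hash(digits, table_size, segment_length=2, reverse_alternate=False):
    """Folding method applied to a string of digits."""
    # Split into segments of segment_length digits (the last one may be shorter).
    segments = [
        digits[i:i + segment_length]
        for i in range(0, len(digits), segment_length)
    ]
    if reverse_alternate:
        # Reverse every other segment (the 2nd, 4th, ...) before adding.
        segments = [
            seg[::-1] if index % 2 == 1 else seg
            for index, seg in enumerate(segments)
        ]
    total = sum(int(seg) for seg in segments)
    return total % table_size


# 43 + 65 + 55 + 46 + 01 = 210, and 210 % 11 = 1
print(folding_hash("4365554601", 11))                          # 1
# 43 + 56 + 55 + 64 + 01 = 219, and 219 % 11 = 10
print(folding_hash("4365554601", 11, reverse_alternate=True))  # 10
```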
2. Mid-square method (slightly more computation):
First square the data item, then take the middle two digits of the square, and finally take the remainder modulo the hash table size.
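A short sketch of the mid-square method. The text does not say what to do when the square has an odd number of digits or fewer than two, so the handling of those cases below is an assumption, as is the function name `mid_square_hash`.

```python
def mid_square_hash(item, table_size):
    """Mid-square method: square, take the middle two digits, take the remainder."""
    squared = str(item * item)
    # With an odd number of digits there is no exact middle pair; this sketch
    # leans left. Squares shorter than two digits are used as they are.
    start = max((len(squared) - 2) // 2, 0)
    middle = int(squared[start:start + 2])
    return middle % table_size


# 44 * 44 = 1936, the middle two digits are 93, and 93 % 11 = 5
print(mid_square_hash(44, 11))  # 5
```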
3. Non-numeric items:
Non-numeric data items can also be hashed: treat each character of the string as its ASCII code, add these integers up, and take the remainder modulo the hash table size.
Note:
Such a hash function returns the same hash value for all anagrams. To prevent this, each character's position in the string can be used as a weighting factor, multiplied by its ord value.
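The anagram problem and the position-weighted fix can be seen side by side in a small sketch. The function names and the 1-based position weights are assumptions for illustration; other weighting schemes would work as well.

```python
def string_hash(text, table_size):
    """Sum the character codes and take the remainder."""
    return sum(ord(ch) for ch in text) % table_size


def weighted_string_hash(text, table_size):
    """Weight each character code by its 1-based position before summing."""
    total = sum(position * ord(ch) for position, ch in enumerate(text, start=1))
    return total % table_size


# "cat" and "act" are anagrams: same characters, same plain sum (312).
print(string_hash("cat", 11), string_hash("act", 11))                    # 4 4
# With position weights the sums differ (641 and 643).
print(weighted_string_hash("cat", 11), weighted_string_hash("act", 11))  # 3 5
```

Weighting reduces conflicts between anagrams but does not rule them out, since different weighted sums can still share a remainder.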
4. Digit analysis method:
For a given set of keys, analyze how often each digit occurs at each position across all keys, and select the positions with the most even distribution to form the hash value.
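One possible way to carry out this analysis is sketched below. It assumes that all keys are digit strings of the same length, and it measures how even a position is by the count of its most frequent digit, which is only one of several reasonable measures. The sample keys and the function names are made up for illustration.

```python
from collections import Counter


def choose_positions(keys, how_many):
    """Pick the digit positions whose digits are spread most evenly over the keys."""
    length = len(keys[0])

    def unevenness(position):
        counts = Counter(key[position] for key in keys)
        # Count of the most frequent digit: lower means a more even spread.
        return max(counts.values())

    ranked = sorted(range(length), key=unevenness)
    return sorted(ranked[:how_many])


def digit_analysis_hash(key, positions):
    """Form the hash value from the digits at the chosen positions."""
    return int("".join(key[p] for p in positions))


keys = ["81346532", "81372242", "81387422", "81301367", "81322817", "81338967"]
positions = choose_positions(keys, 2)
print(positions)                                              # [3, 5]
print([digit_analysis_hash(key, positions) for key in keys])  # [45, 72, 84, 3, 28, 39]
```

The first three positions are identical in every key, so they carry no information and are never chosen. With two positions selected, the hash values fall between 0 and 99, which suits a table of 100 slots.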
Basic principles of hash function design
The hash function must not be too complicated, otherwise it becomes a computational burden on both storing and searching.
The hash values should be distributed as evenly as possible.
Conflict resolution:
1. Resolving conflicts:
A systematic method is needed for placing the second of two conflicting data items in the hash table.
2. Solution:
Open addressing: find an open, empty slot in which to store the item. The simplest way is to start at the conflicting slot and scan forward through the following slots until an empty one is found. If the end of the hash table is reached without finding one, continue scanning from the beginning.
This slot-by-slot scan is called "linear probing" in open addressing.
3. Disadvantage:
Linear probing tends to cause clustering: conflicting items pile up in neighboring slots, which in turn causes more conflicts for other items.
4. Improvement:
Replace slot-by-slot probing with skip probing (rehashing), which steps forward by a fixed interval greater than 1.
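Both the one-by-one scan (linear probing) and the skipping variant can be sketched with a single `skip` parameter. The function names, the table size of 11, the sample values, and the skip of 3 are assumptions for illustration. The sketch also assumes that the skip and the table size share no common factor, so that the probe sequence can reach every slot.

```python
def probe_insert(table, item, skip=1):
    """Open addressing: store item in the first empty slot of its probe sequence.

    skip=1 is the one-by-one scan; a larger skip jumps ahead by that interval.
    """
    size = len(table)
    slot = item % size
    for _ in range(size):
        if table[slot] is None or table[slot] == item:
            table[slot] = item
            return slot
        slot = (slot + skip) % size  # wrap around past the end of the table
    raise RuntimeError("no empty slot found on the probe sequence")


def probe_search(table, item, skip=1):
    """Follow the same probe sequence until the item or an empty slot is found."""
    size = len(table)
    slot = item % size
    for _ in range(size):
        if table[slot] is None:
            return False
        if table[slot] == item:
            return True
        slot = (slot + skip) % size
    return False


items = [54, 26, 93, 17, 77, 31, 44, 55, 20]

linear = [None] * 11
for item in items:
    probe_insert(linear, item)
print(linear)    # [77, 44, 55, 20, 26, 93, 17, None, None, 31, 54]

skipping = [None] * 11
for item in items:
    probe_insert(skipping, item, skip=3)
print(skipping)  # [77, 55, None, 44, 26, 93, 17, 20, None, 31, 54]

print(probe_search(linear, 20), probe_search(linear, 21))  # True False
```

With the one-by-one scan, the three items that conflict at or near slot 0 end up packed into slots 1 to 3. With a skip of 3 the same items are spread across slots 1, 3, and 7. A search has to use the same skip that was used for insertion.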