I. Differences from sets and dictionaries:
In sets and dictionaries implemented as a linear list, binary search tree, AVL tree, or B-tree, the position of an element has no direct correspondence with its key code, so a search has to proceed through a series of key comparisons.
In a hash table, the position of an element corresponds directly to its key code. A hash function bridges the two: it maps the key code to an address in the table, so an element can be found faster and with fewer comparisons.
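The contrast can be illustrated with a minimal sketch. It assumes integer key codes, a table of 11 addresses, and the hash function key % 11; the key set is chosen so that no two keys share an address. The function names are hypothetical.

```python
def sequential_search(items, key):
    """Return (index, comparisons) for a linear scan; index is -1 if absent."""
    comparisons = 0
    for i, item in enumerate(items):
        comparisons += 1
        if item == key:
            return i, comparisons
    return -1, comparisons


def build_table(keys, m):
    """Place each key at address key % m (this sketch assumes no conflicts)."""
    table = [None] * m
    for k in keys:
        addr = k % m
        assert table[addr] is None, "this sketch assumes a conflict-free key set"
        table[addr] = k
    return table


def hash_search(table, key):
    """Return (address, comparisons): one address computation, one comparison."""
    addr = key % len(table)
    return (addr if table[addr] == key else -1), 1


keys = [23, 7, 41, 15, 32]
table = build_table(keys, 11)
print(sequential_search(keys, 32))  # (4, 5)
print(hash_search(table, 32))       # (10, 1)
```

The linear list needs five comparisons to reach 32, while the hash table computes address 10 and needs a single comparison.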
II. Hash
1. Properties of the hash:
A hash function is a compressing map. The set of key codes is much larger than the set of hash addresses, so different key codes will inevitably be mapped to the same hash address; this is a conflict. Handling conflicts always costs some search efficiency. It is therefore especially important to construct a hash function that distributes addresses evenly, that is, one for which every address is equally likely to be produced.
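A small sketch of why conflicts are unavoidable, assuming integer key codes and the hash function key % 13 (both chosen only for illustration). The helper name addresses is hypothetical.

```python
from collections import defaultdict


def addresses(keys, m):
    """Group keys by the address that key % m assigns to them."""
    buckets = defaultdict(list)
    for k in keys:
        buckets[k % m].append(k)
    return dict(buckets)


# 12, 25 and 38 all leave remainder 12 when divided by 13, so they conflict.
clustered = addresses([12, 25, 38, 3], 13)
# These four keys land on four different addresses.
spread = addresses([12, 26, 40, 3], 13)

print(clustered)  # {12: [12, 25, 38], 3: [3]}
print(spread)     # {12: [12], 0: [26], 1: [40], 3: [3]}
```

With the same function and table size, the first key set piles three keys onto one address while the second spreads out evenly, which is the behaviour an evenly distributing hash function aims for on typical input.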
2. Hash function:
The domain of the function must include every key code to be stored. If the hash table allows m addresses, the range must lie within 0 to m-1. The address values calculated by the hash function should be evenly distributed over the hash table.
3. Hash function classification
A. Division-remainder method. The divisor should be a prime number that is not greater than m but as close to m as possible.
B. Digit analysis method.
C. Mid-square method.
D. Folding method.
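The four methods can be sketched as follows. This is one possible reading of each method under stated assumptions: key codes are non-negative integers handled as decimal digits, the digit positions used for digit analysis are supplied by the caller (in practice they would be picked by inspecting the key set), and folding uses simple shift folding. All function names are hypothetical.

```python
def largest_prime_at_most(m):
    """Return the largest prime p with p <= m (m must be at least 2)."""
    def is_prime(n):
        if n < 2:
            return False
        i = 2
        while i * i <= n:
            if n % i == 0:
                return False
            i += 1
        return True

    p = m
    while not is_prime(p):
        p -= 1
    return p


def division_hash(key, m):
    """Division-remainder method: key mod p, p the largest prime <= m."""
    return key % largest_prime_at_most(m)


def digit_analysis_hash(key, positions, width):
    """Digit analysis: keep only the chosen digit positions (0 = leftmost)
    of the key written as a zero-padded decimal string of `width` digits."""
    digits = str(key).zfill(width)
    return int("".join(digits[p] for p in positions))


def mid_square_hash(key, r):
    """Mid-square method: square the key and take r digits from the middle."""
    sq = str(key * key).zfill(r)
    start = (len(sq) - r) // 2
    return int(sq[start:start + r])


def folding_hash(key, r, m):
    """Shift folding: cut the decimal digits into pieces of r digits,
    add the pieces, and reduce the sum into the range 0..m-1."""
    s = str(key)
    total = sum(int(s[i:i + r]) for i in range(0, len(s), r))
    return total % m


print(division_hash(1234, 100))                      # 70  (1234 mod 97)
print(digit_analysis_hash(13572468, (5, 6, 7), 8))   # 468
print(mid_square_hash(1234, 3))                      # 227 (from 1522756)
print(folding_hash(123456789, 3, 1000))              # 368 (123 + 456 + 789 = 1368)
```

Each function returns a value inside the address range it is given: below m for division and folding, and below 10 to the power of the number of digits kept for the other two.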
4. The closed hash method for conflict handling: the hash table has m addresses, which are treated as m buckets. In general a bucket can hold several elements, and the key codes that fall into the same bucket are synonyms of each other. Because a bucket holds only a few elements, it is usually searched sequentially. The closed hash method stores the buckets in an array and limits each bucket to one element. Resolving a conflict therefore comes down to how the next empty bucket is found:
A. Linear probing method. If the bucket at the hashed address is empty, the new element is inserted directly. If it is not empty, the following buckets are examined in sequence until an empty one is found.
Two concepts must be made clear: 1. The average search length for successful searches. Finding an element repeats the probes that were made to find an empty bucket when it was inserted, so each element costs at least one probe; the counts are averaged over all elements in the table.
2. The average search length for unsuccessful searches. Starting from an address in the hash table, count the number of probes needed to reach the next empty bucket. This must be calculated for every address, and the counts are averaged over all addresses.
B. Quadratic probing method.
C. Double hashing. Besides the hash function, a second function (a rehash function) is needed. When the hash function produces a conflict, the rehash function is applied and its value is added to the hash address to give a new address. If that address also conflicts, the rehash value is added again, and so on until an empty bucket is found.
Advantage: because empty buckets are not searched for one by one, this helps to avoid "accumulation" (clustering) and thus improves search efficiency.
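The three probing methods and the two average search lengths can be sketched together. This is an illustration under assumptions, not a reference implementation: key codes are integers, the hash function is key % m, quadratic probing uses the simple offsets 0, 1, 4, 9, ..., and the rehash function 1 + key % (m - 1) is one common choice that is not taken from the text. The table size 11 and the key set are made up for the example.

```python
def linear_probe(key, i, m):
    """i-th probe address (i = 0, 1, ...): step to the next bucket each time."""
    return (key % m + i) % m


def quadratic_probe(key, i, m):
    """i-th probe address using offsets 0, 1, 4, 9, ... (one simple variant)."""
    return (key % m + i * i) % m


def double_hash_probe(key, i, m):
    """i-th probe address: add the rehash value once per conflict.
    The rehash function 1 + key % (m - 1) is an assumed, common choice."""
    step = 1 + key % (m - 1)
    return (key % m + i * step) % m


class ClosedHashTable:
    """One key per bucket; `probe` decides where to look after a conflict."""

    def __init__(self, m, probe):
        self.m = m
        self.probe = probe
        self.slots = [None] * m

    def insert(self, key):
        """Store key in the first empty bucket of its probe sequence.
        Returns the number of probes used."""
        for i in range(self.m):
            addr = self.probe(key, i, self.m)
            if self.slots[addr] is None or self.slots[addr] == key:
                self.slots[addr] = key
                return i + 1
        raise RuntimeError("no empty bucket reached")

    def search(self, key):
        """Return (found, probes), following the same probe sequence."""
        for i in range(self.m):
            addr = self.probe(key, i, self.m)
            if self.slots[addr] is None:
                return False, i + 1
            if self.slots[addr] == key:
                return True, i + 1
        return False, self.m


def asl_success(table, keys):
    """Average number of probes needed to find each stored key."""
    return sum(table.search(k)[1] for k in keys) / len(keys)


def asl_unsuccess_linear(table):
    """Linear probing only: from every address, count probes up to and
    including the next empty bucket (assumes at least one bucket is empty)."""
    total = 0
    for start in range(table.m):
        i = 0
        while table.slots[(start + i) % table.m] is not None:
            i += 1
        total += i + 1
    return total / table.m


keys = [22, 41, 53, 46, 30, 13, 1, 67]
table = ClosedHashTable(11, linear_probe)
for k in keys:
    table.insert(k)

print(asl_success(table, keys))                # 1.75
print(round(asl_unsuccess_linear(table), 2))   # 4.27
for probe in (linear_probe, quadratic_probe, double_hash_probe):
    print([probe(67, i, 11) for i in range(4)])
# [1, 2, 3, 4]
# [1, 2, 5, 10]
# [1, 9, 6, 3]
```

With linear probing the eight keys need 14 probes in total, giving 14/8 = 1.75 for successful searches, and the eleven start addresses need 47 probes in total, giving 47/11 for unsuccessful ones. The last three lines show the first four probe addresses for key 67 under each method: linear probing walks through neighbouring buckets, while the other two jump away from the occupied run.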
5. The open hash method for conflict handling
The difference from the closed hash method: each bucket in the hash table can hold several elements. The elements of a bucket are linked into a singly linked list, which is called a synonym sub-list.
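A minimal sketch of a table whose buckets are singly linked lists, assuming integer key codes and the hash function key % m. The class and method names are hypothetical, and new keys are placed at the front of a list only because that is the simplest choice.

```python
class Node:
    """One element of a bucket's singly linked list."""

    def __init__(self, key, next_node=None):
        self.key = key
        self.next = next_node


class ChainedHashTable:
    def __init__(self, m):
        self.m = m
        self.heads = [None] * m  # one list head per bucket

    def search(self, key):
        """Return (found, comparisons) after a sequential scan of one bucket."""
        comparisons = 0
        node = self.heads[key % self.m]
        while node is not None:
            comparisons += 1
            if node.key == key:
                return True, comparisons
            node = node.next
        return False, comparisons

    def insert(self, key):
        """Add key at the front of its bucket's list unless already present."""
        if self.search(key)[0]:
            return
        addr = key % self.m
        self.heads[addr] = Node(key, self.heads[addr])

    def bucket(self, addr):
        """Return the keys stored in one bucket, in list order."""
        out, node = [], self.heads[addr]
        while node is not None:
            out.append(node.key)
            node = node.next
        return out


table = ChainedHashTable(11)
for k in [22, 41, 53, 46, 30, 13, 1, 67]:
    table.insert(k)

print(table.bucket(8))   # [30, 41]
print(table.bucket(1))   # [67, 1]
print(table.search(41))  # (True, 2)
```

Keys that hash to the same address stay together in one short list, so a search never touches buckets that belong to other addresses.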
Advantage: although extra storage is needed for the pointers, the closed hash method must keep a large amount of space free to maintain search efficiency. For example, with the quadratic probing method the load factor must not be greater than 0.5. The open hash method therefore saves more space.
Load Factor: α = number of elements in the table/length of the hash table
α indicates how full the hash table is. Since the table length is fixed, α is proportional to the number of elements entered into the table. The larger α is, the more elements the table holds and the more likely a conflict becomes; the smaller α is, the fewer elements the table holds and the less likely a conflict becomes.
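The effect of α can be seen on a small made-up example. The sketch assumes integer key codes and the hash function key % m, and it counts a conflict whenever a key's address was already produced by an earlier key; the function names are hypothetical. The numbers hold for this particular key set only.

```python
def load_factor(n, m):
    """alpha = number of elements in the table / length of the hash table."""
    return n / m


def home_conflicts(keys, m):
    """Count keys whose address key % m was already taken by an earlier key."""
    used = set()
    conflicts = 0
    for k in keys:
        addr = k % m
        if addr in used:
            conflicts += 1
        used.add(addr)
    return conflicts


keys = [22, 41, 53, 46, 30, 13, 1, 67]
for m in (11, 23):
    print(m, round(load_factor(len(keys), m), 2), home_conflicts(keys, m))
# 11 0.73 3
# 23 0.35 1
```

The same eight keys produce three conflicts at α ≈ 0.73 and one conflict at α ≈ 0.35.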
III. Efficiency analysis of hashing
1. The open hash method performs better than the closed hash method.
2. Among hash functions, the division-remainder method performs better than the others, and the folding method performs worst.