Preface
A HashTable, also called a hash table, is a data structure for fast lookups. The usual approach is to design a hash function H(x) for the data to be recorded and then locate the data with this function. In a closed hash (also known as open addressing), the data is stored directly at subscript H(x) of an array. In an open hash (also known as separate chaining), it is stored in the linked list that hangs off slot H(x) of a pointer array. In OI, some Pascal users rely on closed hashing to avoid linked lists, which I consider a rather poor choice; the reason is explained later. So this article only covers open hashing.
Hash Table organization:
First, we need to choose a hash function H(x), where x is the object to be recorded. H(x) determines which chain the object's record is stored in.
We also need a pointer array that stores the head pointer of each chain. And since we use linked lists, we need a class/struct to serve as the basic node of the list.
General implementation of hash tables:
First, the basic element of the linked list:
template <class T>
struct t_node
{
public:
    T key;
    // other info
    t_node *next;
};
Then there is the skeleton of the hashtable class (I encapsulated it as a class here):
template <class T>
class hashtable
{
public:
    hashtable();
    int hash(const T &sr);
    void insert(const T &sr);
    t_node<T> *find(const T &sr);
    // add more functions
private:
    t_node<T> *ht[t_size]; // you should define t_size as a constant beforehand
    // add more things
};
Next is the constructor:
template <class T>
hashtable<T>::hashtable()
{
    memset(ht, 0, sizeof(ht)); // needs <cstring>; sets every chain head to null
}
Skipping the hash function for now, here is the insert function:
template <class T>
void hashtable<T>::insert(const T &sr)
{
    int loc = hash(sr);
    if (ht[loc] == 0)
    {
        // This slot is empty: start a new linked list here.
        ht[loc] = new t_node<T>();
        ht[loc]->key = sr;
    }
    else
    {
        t_node<T> *now = ht[loc];
        while (true)
        {
            if (now->key == sr)
            {
                // The element already exists.
                return;
            }
            else if (now->next == 0)
            {
                // The element is not in the chain: append it at the tail.
                now->next = new t_node<T>();
                now->next->key = sr;
                return;
            }
            else now = now->next;
        }
    }
}
Then the find function:
template <class T>
t_node<T> *hashtable<T>::find(const T &sr)
{
    int loc = hash(sr);
    if (ht[loc] == 0)
    {
        // This slot is empty: return a null pointer.
        return 0;
    }
    else
    {
        t_node<T> *now = ht[loc];
        while (true)
        {
            if (now->key == sr)
            {
                // Found
                return now;
            }
            else if (now->next == 0)
            {
                // Walked the whole chain and still did not find it.
                return 0;
            }
            else now = now->next; // move on to the next element of the chain
        }
    }
}
Of course, you can adapt all of this to your specific situation. If you want to push efficiency to the limit, you can change the key in t_node to a pointer and replace new with your own memory allocation function.
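One way to read the "replace new" suggestion is sketched below: nodes are handed out from a preallocated pool, so no call to new happens during insertion. This is only a sketch under assumptions of mine: the names pool_table, node, T_SIZE and POOL_SIZE are hypothetical, keys are plain int, and the hash is simply x % T_SIZE. Unlike the insert above, it pushes new nodes to the front of the chain.

```cpp
#include <cassert>

// Hypothetical sizes, chosen only for this sketch.
const int T_SIZE = 1009;     // number of chains (a prime)
const int POOL_SIZE = 10000; // maximum number of nodes ever inserted

struct node
{
    int key;
    node *next;
};

class pool_table
{
public:
    pool_table() : used(0)
    {
        for (int i = 0; i < T_SIZE; ++i) ht[i] = 0;
    }

    // Returns true if key was inserted, false if it was already present.
    bool insert(int key)
    {
        int loc = hash(key);
        for (node *now = ht[loc]; now != 0; now = now->next)
            if (now->key == key) return false;
        node *fresh = alloc();
        fresh->key = key;
        fresh->next = ht[loc]; // push to the front of the chain
        ht[loc] = fresh;
        return true;
    }

    // Returns the node holding key, or a null pointer if it is absent.
    node *find(int key)
    {
        for (node *now = ht[hash(key)]; now != 0; now = now->next)
            if (now->key == key) return now;
        return 0;
    }

private:
    // x % T_SIZE, shifted so that negative keys also land in [0, T_SIZE).
    int hash(int key) const
    {
        int r = key % T_SIZE;
        return r < 0 ? r + T_SIZE : r;
    }

    // Stand-in for new: hand out the next unused slot of the pool.
    node *alloc()
    {
        assert(used < POOL_SIZE);
        return &pool[used++];
    }

    node *ht[T_SIZE];
    node pool[POOL_SIZE];
    int used;
};
```

The trade-off is that the pool has a fixed capacity and nodes are never returned to it, which is usually acceptable when the number of insertions is known in advance.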
The simplest hash function:
In fact, the simplest hash function of all is H(x) = x: if the recorded object is an integer, the integer itself is used as the subscript (a char can also be treated as an integer). This is just an array, but it can also be seen as a hash table.
The second simplest is H(x) = 1: every element, whatever it is, goes under the same subscript. This is just a linked list, but it can also be seen as a hash table.
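As a small illustration of the H(x) = x case, here is a sketch of a set of characters where the character value itself is the subscript. The name char_set is made up for this example.

```cpp
#include <cstring>

// H(x) = x: the character value itself is the subscript.
struct char_set
{
    bool seen[256];

    char_set() { std::memset(seen, 0, sizeof(seen)); }

    // The cast keeps the index non-negative even where char is signed.
    void insert(char c) { seen[static_cast<unsigned char>(c)] = true; }
    bool find(char c) const { return seen[static_cast<unsigned char>(c)]; }
};
```

There are no conflicts at all here, because every possible key has its own slot; that is only affordable when the key range is small.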
Hash functions for large integers:
When the recorded object is a large integer, H(x) = x would need an array far bigger than we can afford, so this is where the design of the hash function matters. There are many ways to design one; the most widely used is H(x) = x % k, where k is usually a prime number.
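A minimal sketch of H(x) = x % k might look like this. The constant K and the function name hash_mod are my own choices for illustration; the extra step for negative values is needed because in C++ the result of % takes the sign of the left operand.

```cpp
// Hypothetical table size: a prime, as suggested above.
const long long K = 1000003;

// H(x) = x % K, adjusted so that negative x also maps into [0, K).
int hash_mod(long long x)
{
    long long r = x % K; // since C++11, r has the same sign as x
    if (r < 0) r += K;
    return static_cast<int>(r);
}
```

With this adjustment, hash_mod(-1) gives 1000002 rather than a negative subscript.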
General hash functions:
We may also need to record something like a class or struct. In that case, we can pick a few key member variables and combine them with some arithmetic to determine the subscript.
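For example, the sketch below folds two key fields of a struct into a single subscript. Everything in it is hypothetical: the student type, the choice of name and age as the key variables, the multiplier 31, and the table size are all assumptions for illustration, not a recommended design.

```cpp
#include <cstddef>
#include <string>

// Hypothetical record type; name and age are the "key variables".
struct student
{
    std::string name;
    int age;
    double score; // not part of the key, so the hash ignores it
};

const unsigned int STUDENT_TABLE_SIZE = 10007; // hypothetical prime table size

// Fold every key field into one unsigned value, then reduce it to a subscript.
unsigned int hash_student(const student &s)
{
    unsigned int h = 0;
    for (std::size_t i = 0; i < s.name.size(); ++i)
        h = h * 31u + static_cast<unsigned char>(s.name[i]);
    h = h * 31u + static_cast<unsigned int>(s.age);
    return h % STUDENT_TABLE_SIZE; // unsigned overflow above simply wraps around
}
```

Two records that agree on name and age get the same subscript regardless of score, so the equality test used when walking the chain should compare the same fields.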
Conflict handling:
Even a very good hash function can hardly avoid conflicts. A conflict means H(a) = H(b) for two different objects a and b. Open hashing handles this by hanging a linked list off each array slot, so conflicting elements are simply attached to the end of the list. Closed hashing has no linked list; it generally either rehashes repeatedly with Hn(x) or moves on to H(x) + a (a = 1, 2, 3, ...). This makes the hash table messy, one conflict can cause further conflicts, and the required size of the hash array is hard to estimate. Therefore, I do not recommend closed hashing.
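To see how one conflict can cause another in a closed hash, here is a rough sketch of the H(x) + a scheme with a deliberately tiny table. The names closed_table, SLOTS and EMPTY are hypothetical, and keys are assumed to be non-negative.

```cpp
const int SLOTS = 11;  // hypothetical, deliberately tiny
const int EMPTY = -1;  // marks a free slot; assumes keys are non-negative

struct closed_table
{
    int slot[SLOTS];

    closed_table()
    {
        for (int i = 0; i < SLOTS; ++i) slot[i] = EMPTY;
    }

    // Inserts key and returns the subscript it landed on, or -1 if the table is full.
    int insert(int key)
    {
        for (int a = 0; a < SLOTS; ++a)
        {
            int loc = (key % SLOTS + a) % SLOTS; // H(x) + a, wrapped around
            if (slot[loc] == EMPTY || slot[loc] == key)
            {
                slot[loc] = key;
                return loc;
            }
        }
        return -1;
    }
};
```

Inserting 0, 11 and 22 (all with H = 0) fills slots 0, 1 and 2. A later insert of 1, whose own H is 1 and which conflicts with none of them under H, is pushed all the way to slot 3.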
By the way, a good hash function keeps conflicts to a minimum and stays balanced, so that the lengths of the chains are as even as possible. Designing a good hash function relies on long-term accumulated experience; it is not something learned in a day.
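A quick way to check how balanced a candidate function is: count the chain lengths for a sample of keys. The helper below does this for H(x) = x % k; the name longest_chain is hypothetical, and it assumes non-negative keys and k > 0.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Length of the longest chain when keys are hashed with H(x) = x % k.
// Assumes every key is non-negative and k > 0.
int longest_chain(const std::vector<int> &keys, int k)
{
    std::vector<int> len(k, 0);
    for (std::size_t i = 0; i < keys.size(); ++i)
        ++len[keys[i] % k];
    return *std::max_element(len.begin(), len.end());
}
```

For the 1000 keys 0, 10, 20, ..., 9990, this reports a longest chain of 10 with k = 1000 (only 100 of the slots are ever used) but 1 with the prime k = 1009, which is one reason a prime k is the usual choice.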
The essence of hash table:
The essence of hashing is to combine the strengths of arrays and linked lists. Array access is O(1) and linked-list insertion is O(1), but array insertion and linked-list access are both expensive, and that is why the hash table came about. We can apply this idea in many places. This is the key point I wanted to make, but my own skill is limited and I do not know how to express it well. I will tidy up the code later.
Author: "Rain logs-RainCode"