First, let's take a look at the basic concepts.
A hash table is a table that stores key-value pairs. The reason it is not simply called a map (a structure that stores key-value pairs is usually called a map) is that it has one distinguishing feature: it maps the key directly to a location in the table, so access is very fast. The mapping function is called a hash function.
1) For a key, f(key) is its storage location, where f is the hash function.
2) If key1 != key2 but f(key1) = f(key2), the phenomenon is called a collision. Collisions are inevitable because the set of possible keys is unbounded while the table capacity is always limited (see the last question). The goal is that any key is hashed to each address in the table with equal probability. Such a hash function is called a uniform hash function.
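To make the definition of a collision concrete, here is a minimal sketch. The class name CollisionDemo and the function f are illustrative assumptions, not part of any library; f simply reduces String.hashCode() to a slot in a table of 11 entries.

```java
// Illustrative sketch: CollisionDemo and f are hypothetical names, not library code.
class CollisionDemo {
    // f(key): maps a key to a slot in a table with the given capacity.
    static int f(String key, int capacity) {
        // Clear the sign bit so the remainder is never negative.
        return (key.hashCode() & 0x7FFFFFFF) % capacity;
    }

    public static void main(String[] args) {
        int capacity = 11;
        // "Aa" and "BB" are different keys with the same String.hashCode() (2112).
        System.out.println(f("Aa", capacity) == f("BB", capacity)); // true: a collision
    }
}
```

The two keys are different, yet they land in the same slot, which is exactly the situation a conflict resolution strategy has to handle.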
Common hash functions
* Direct addressing: the key itself, or a linear function of the key, is used as the hash address. That is, H(key) = key or H(key) = a · key + b, where a and b are constants (such a hash function is called a self function)
* Digit analysis
* Mid-square method
* Folding method
* Random number method
* Division method (taking the remainder): the remainder obtained by dividing the key by a number p that is not greater than the table length m is the hash address. That is, H(key) = key mod p, p <= m. The modulo can be applied to the key directly, or after folding or squaring it. The choice of p is very important. Generally a prime number or m is used. If p is chosen badly, synonyms (different keys with the same address) are easily produced.
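The effect of the choice of p in the division method can be seen with a small experiment. This is only a sketch: DivisionHash and its methods are hypothetical names, and the key set (multiples of 10) is chosen on purpose to make a bad p look bad.

```java
// Illustrative sketch of the division method; DivisionHash is a hypothetical name.
import java.util.HashSet;
import java.util.Set;

class DivisionHash {
    // H(key) = key mod p, for non-negative keys and p <= table length m.
    static int hash(int key, int p) {
        return key % p;
    }

    // Counts how many distinct addresses the keys 0, 10, 20, ..., 90 occupy.
    static int distinctAddresses(int p) {
        Set<Integer> used = new HashSet<>();
        for (int key = 0; key < 100; key += 10) {
            used.add(hash(key, p));
        }
        return used.size();
    }

    public static void main(String[] args) {
        System.out.println(distinctAddresses(10)); // 1: every key lands on address 0
        System.out.println(distinctAddresses(11)); // 10: a prime p spreads the keys out
    }
}
```

With p = 10 all ten keys are synonyms of each other, while the prime p = 11 gives each key its own address.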
As you can imagine, when the number of entries in a table approaches the size of the table, the probability of a collision increases significantly. Therefore, when the ratio "number of entries / table capacity" reaches a certain value, the table capacity needs to be expanded. This ratio is called the load factor.
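As a rough illustration of how a load factor turns into a resize point, the sketch below computes the entry count at which a table would grow. LoadFactorCheck and threshold are hypothetical names, and the capacities 11 and 16 are just example values.

```java
// Illustrative sketch; LoadFactorCheck and threshold are hypothetical names.
class LoadFactorCheck {
    // Number of entries at which a table of the given capacity should grow.
    static int threshold(int capacity, float loadFactor) {
        return (int) (capacity * loadFactor);
    }

    public static void main(String[] args) {
        System.out.println(threshold(11, 0.75f)); // 8
        System.out.println(threshold(16, 0.75f)); // 12
    }
}
```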
There are two main methods to resolve conflicts:
* Separate chaining: elements that hash to the same address are connected in a linked list. It is also called the zipper method.
* Open addressing: if an address is already taken, look for another address nearby. Variants include linear probing, quadratic probing, and double hashing.
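Open addressing with linear probing can be sketched in a few lines. LinearProbingSet is a hypothetical class written only for illustration: it stores keys without values, never resizes, and does not support removal.

```java
// Minimal open-addressing sketch with linear probing; LinearProbingSet is a
// hypothetical class written for illustration (no resizing, no removal).
class LinearProbingSet {
    private final String[] slots;

    LinearProbingSet(int capacity) {
        slots = new String[capacity];
    }

    // Returns the slot index where the key is stored, or -1 if the table is full.
    int add(String key) {
        int start = (key.hashCode() & 0x7FFFFFFF) % slots.length;
        for (int i = 0; i < slots.length; i++) {
            int index = (start + i) % slots.length; // try the next address on conflict
            if (slots[index] == null) {
                slots[index] = key;
                return index;
            }
            if (slots[index].equals(key)) {
                return index; // already present
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        LinearProbingSet set = new LinearProbingSet(11);
        System.out.println(set.add("Aa")); // 0: its home slot
        System.out.println(set.add("BB")); // 1: same hash code, so it moves one slot on
    }
}
```

Quadratic probing would replace the offset i with i * i, and double hashing would derive the step from a second hash function. Separate chaining instead keeps a linked list per slot.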
Next, let's take a look at Java's Hashtable implementation.
java.util.Hashtable is essentially an array whose elements are singly linked lists of key-value pairs.
private transient Entry[] table; // array of entries
private static class Entry<K, V> implements Map.Entry<K, V> {
    int hash;
    K key;
    V value;
    Entry<K, V> next; // each entry is a node in a singly linked list
    ...
}
We can use the constructor that specifies the array size and load factor, or use the default constructor. The default array size is 11 and the default load factor is 0.75.
public Hashtable(int initialCapacity, float loadFactor) { ... }
public Hashtable() {
    this(11, 0.75f);
}
When the array is expanded, its size becomes oldCapacity * 2 + 1. Of course, this does not guarantee that the array size is always a prime number.
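The growth rule is easy to trace by hand. The sketch below (HashtableGrowth is a hypothetical helper, not JDK code) applies the "times 2 plus 1" rule starting from the default capacity of 11 and checks each size for primality.

```java
// Illustrative sketch; HashtableGrowth is a hypothetical helper, not JDK code.
class HashtableGrowth {
    // Capacity after one expansion, following the oldCapacity * 2 + 1 rule.
    static int nextCapacity(int oldCapacity) {
        return oldCapacity * 2 + 1;
    }

    static boolean isPrime(int n) {
        if (n < 2) return false;
        for (int d = 2; (long) d * d <= n; d++) {
            if (n % d == 0) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        int capacity = 11;
        for (int i = 0; i < 4; i++) {
            System.out.println(capacity + " prime? " + isPrime(capacity));
            capacity = nextCapacity(capacity);
        }
    }
}
```

The sequence is 11, 23, 47, 95. The first three are prime, but 95 = 5 × 19 is not.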
Let's take a look at the element insertion method, put:
public synchronized V put(K key, V value) {
    // Make sure the value is not null
    if (value == null) {
        throw new NullPointerException();
    }

    // Makes sure the key is not already in the hashtable.
    Entry tab[] = table;
    int hash = key.hashCode();
    int index = (hash & 0x7FFFFFFF) % tab.length;
    for (Entry<K, V> e = tab[index]; e != null; e = e.next) {
        if ((e.hash == hash) && e.key.equals(key)) {
            V old = e.value;
            e.value = value;
            return old;
        }
    }
    ... // the excerpt stops here; the code that adds a new entry is not shown
}
In Java, the Object class has several methods, one of which is hashCode(). This means that every object in Java has this method, and an object's hash code can be obtained by calling it. Hashtable takes the remainder of the hash code divided by the table length as the address, and uses a linked list at positions where keys collide.
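One detail in the index computation is the mask 0x7FFFFFFF. hashCode() may return a negative int, and in Java the % operator keeps the sign of the dividend, so the sign bit is cleared first. The sketch below isolates that expression. HashtableIndex is a hypothetical wrapper class, not JDK code.

```java
// Illustrative sketch; HashtableIndex is a hypothetical wrapper, not JDK code.
class HashtableIndex {
    // Same expression as in the put method above.
    static int indexFor(Object key, int tableLength) {
        return (key.hashCode() & 0x7FFFFFFF) % tableLength;
    }

    public static void main(String[] args) {
        Integer key = -7; // Integer.hashCode() is the int value itself
        System.out.println(key.hashCode() % 11); // -7: not a valid array index
        System.out.println(indexFor(key, 11));   // 6: always within 0..10
    }
}
```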
HashMap has almost the same functionality as Hashtable. However, the initial size of the HashMap array is 16 rather than 11, and when the array is expanded its size doubles. The default load factor is also 0.75. The put method is as follows:
public V put(K key, V value) {
    if (key == null)
        return putForNullKey(value);
    int hash = hash(key.hashCode());
    int i = indexFor(hash, table.length);
    for (Entry<K, V> e = table[i]; e != null; e = e.next) {
        Object k;
        if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
            V oldValue = e.value;
            e.value = value;
            e.recordAccess(this);
            return oldValue;
        }
    }

    modCount++;
    addEntry(hash, key, value, i);
    return null;
}

/**
 * Applies a supplemental hash function to a given hashCode, which
 * defends against poor quality hash functions. This is critical
 * because HashMap uses power-of-two length hash tables, that
 * otherwise encounter collisions for hashCodes that do not differ
 * in lower bits. Note: Null keys always map to hash 0, thus index 0.
 */
static int hash(int h) {
    // This function ensures that hashCodes that differ only by
    // constant multiples at each bit position have a bounded
    // number of collisions (approximately 8 at default load factor).
    h ^= (h >>> 20) ^ (h >>> 12);
    return h ^ (h >>> 7) ^ (h >>> 4);
}

/**
 * Returns index for hash code h.
 */
static int indexFor(int h, int length) {
    return h & (length-1);
}
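The reason for the supplemental hash can be checked with two hash codes that differ only in their upper bits. The sketch below copies hash and indexFor from the listing above into a hypothetical class, PowerOfTwoIndex. The input values 0x10000 and 0x20000 are arbitrary examples.

```java
// Illustrative sketch; PowerOfTwoIndex is a hypothetical class name.
class PowerOfTwoIndex {
    // Copied from the listing above.
    static int hash(int h) {
        h ^= (h >>> 20) ^ (h >>> 12);
        return h ^ (h >>> 7) ^ (h >>> 4);
    }

    // Copied from the listing above; length must be a power of two.
    static int indexFor(int h, int length) {
        return h & (length - 1);
    }

    public static void main(String[] args) {
        int length = 16;
        int a = 0x10000, b = 0x20000; // differ only in the upper bits
        // Without mixing, both land in slot 0.
        System.out.println(indexFor(a, length) + " " + indexFor(b, length)); // 0 0
        // After mixing, the upper bits influence the slot.
        System.out.println(indexFor(hash(a), length) + " " + indexFor(hash(b), length)); // 1 2
    }
}
```

Because the table length is a power of two, h & (length - 1) selects the low bits of h, which for non-negative h equals h % length but avoids a division.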
Let's look at hash tables in other open-source Java libraries.
There are currently several open-source Java collection implementations, each with a different purpose and focus. The following analysis of hash tables in these open-source frameworks mainly covers the following aspects: default load factor and expansion strategy, hash function, and conflict resolution method.
1. Trove: the Trove library provides an efficient set of basic collection classes.
The inheritance chain of gnu.trove.set.hash.THashMap is THashMap -> TObjectHash -> THash. Internally, keys and values are held in two separate arrays. Conflicts are resolved with open addressing, which needs more free space to work well, so the default load factor is 0.5 instead of 0.75. The code is explained step by step below.
Default initialization: the load factor is 0.5, and the array size is rounded to a prime number, so it is always prime.
/** the load above which rehashing occurs. */
public static final float DEFAULT_LOAD_FACTOR = 0.5f;

protected int setUp( int initialCapacity ) {
    int capacity;
    capacity = PrimeFinder.nextPrime( initialCapacity );
    computeMaxSize( capacity );
    computeNextAutoCompactionAmount( initialCapacity );
    return capacity;
}
Next, let's look at the put method. insertKey(T key) implements the hashing: the hash code modulo the array length gives the index. It first checks whether that position is occupied. If it is, the conflict is resolved with double hashing, which is the insertKeyRehash() method in the code.
public V put(K key, V value) {
    // insertKey() inserts the key if a slot if found and returns the index
    int index = insertKey(key);
    return doPut(value, index);
}

protected int insertKey(T key) {
    consumeFreeSlot = false;
    if (key == null)
        return insertKeyForNull();
    final int hash = hash(key) & 0x7fffffff;
    int index = hash % _set.length;
    Object cur = _set[index];
    if (cur == FREE) {
        consumeFreeSlot = true;
        _set[index] = key; // insert value
        return index; // empty, all done
    }
    if (cur == key || equals(key, cur)) {
        return -index - 1; // already stored
    }
    return insertKeyRehash(key, index, hash, cur);
}
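The body of insertKeyRehash() is not shown in the excerpt, so the following is only a generic sketch of double hashing and should not be read as Trove's code. DoubleHashProbe, findSlot, and the particular second hash (1 + hash % (length - 2)) are assumptions made for illustration.

```java
// Generic double-hashing sketch; DoubleHashProbe and its methods are hypothetical
// and are not taken from Trove's insertKeyRehash().
class DoubleHashProbe {
    // Returns the slot that already holds the key, or the first free slot on its
    // probe sequence, or -1 if neither is found.
    // Assumes slots.length is a prime greater than 2, so that any step size in
    // [1, length - 2] eventually visits every slot.
    static int findSlot(Object[] slots, Object key) {
        int length = slots.length;
        int hash = key.hashCode() & 0x7fffffff;
        int index = hash % length;            // first hash: home position
        int step = 1 + (hash % (length - 2)); // second hash: probe distance, never 0
        for (int i = 0; i < length; i++) {
            if (slots[index] == null || slots[index].equals(key)) {
                return index;
            }
            index = (index + step) % length;
        }
        return -1;
    }

    public static void main(String[] args) {
        Object[] slots = new Object[11];
        slots[findSlot(slots, "Aa")] = "Aa"; // home slot 0
        int second = findSlot(slots, "BB");  // collides with "Aa", so it jumps by step
        slots[second] = "BB";
        System.out.println(second);          // 7
    }
}
```

Unlike linear probing, the jump distance depends on the key, so keys that share a home slot but have different hash codes follow different probe sequences. A prime table length is what guarantees that every step size cycles through the whole table.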
2. Javolution: provides Java solutions for real-time, embedded, and high-performance systems.
In Javolution, the hash table is javolution.util.FastMap, whose entries form a doubly linked list. The default initial size is 16, and the size doubles on expansion. The load factor is not explicitly defined, but we can see from the following statement that it is 0.5.
if (map._entryCount + map._nullCount > (entries.length >> 1)) { // Table more than half empty.
    map.resizeTable(_isShared);
}
Let's look at the put function. What is notable is how the index and slot are obtained: entirely by shifting and masking the key hash, which takes care of both index calculation and collision handling.
private final Object put(Object key, Object value, int keyHash,
        boolean concurrent, boolean noReplace, boolean returnEntry) {
    final FastMap map = getSubMap(keyHash);
    final Entry[] entries = map._entries; // Atomic.
    final int mask = entries.length - 1;
    int slot = -1;
    for (int i = keyHash >> map._keyShift;; i++) {
        Entry entry = entries[i & mask];
        if (entry == null) {
            slot = slot < 0 ? i & mask : slot;
            break;
        } else if (entry == Entry.NULL) {
            slot = slot < 0 ? i & mask : slot;
        } else if ((key == entry._key) || ((keyHash == entry._keyHash)
                && (_isDirectKeyComparator ? key.equals(entry._key)
                        : _keyComparator.areEqual(key, entry._key)))) {
            if (noReplace) {
                return returnEntry ? entry : entry._value;
            }
            Object prevValue = entry._value;
            entry._value = value;
            return returnEntry ? entry : prevValue;
        }
    }
    ...
}
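The probing loop above can be reduced to its core idea: start at the shifted hash, advance by one on each conflict, and let the mask wrap the counter around the table. The sketch below is a simplified stand-in, not Javolution code. MaskedProbe and firstFreeSlot are hypothetical names, and a boolean array replaces the entry table.

```java
// Simplified sketch of shift-and-mask probing; MaskedProbe is a hypothetical
// class and does not reproduce FastMap's sub-maps or Entry.NULL handling.
class MaskedProbe {
    // Returns the first free slot on the probe sequence, or -1 if the table is full.
    // occupied.length must be a power of two so that (length - 1) works as a bit mask.
    static int firstFreeSlot(boolean[] occupied, int keyHash, int keyShift) {
        int mask = occupied.length - 1;
        for (int i = keyHash >> keyShift, n = 0; n < occupied.length; i++, n++) {
            if (!occupied[i & mask]) {
                return i & mask; // i may run past the end; the mask wraps it around
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        boolean[] occupied = new boolean[16];
        occupied[15] = true;
        // Hash 15 starts at slot 15, finds it taken, and wraps to slot 0.
        System.out.println(firstFreeSlot(occupied, 15, 0)); // 0
    }
}
```

Because i & mask is never negative, this works for negative hash values without an explicit sign fix.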