Hashing is a common technique for storing data so that it can be inserted and retrieved quickly. The data structure that hashing uses is called a hash table. Although a hash table lets you insert, delete, and retrieve data quickly, it is a poor fit for operations that depend on ordering, such as finding the maximum or minimum value. For such operations, other data structures are more suitable. The .NET Framework library provides a very useful class for working with hash tables: the Hashtable class.
A. Hashing. A hash table is built around an array. Although the array can be enlarged later if needed, its elements are numbered from 0 up to some predefined size. Each data item is stored in the array according to some piece of the data, called the key. To store an element in the hash table, a so-called hash function maps the key to a number in the range 0 to the table size minus 1. Ideally, the hash function would store every key in its own array cell. However, because the number of possible keys is unlimited and the array size is finite, the realistic goal of a hash function is to distribute the keys as evenly as possible across the cells of the array.
1. Choosing a hash function. The choice of hash function depends on the data type of the key. If the key is an integer, the simplest function returns the key modulo the array size. In some cases this works poorly, for example when the keys all end in 0 and the array size is 10: every key then maps to cell 0. This is one of the reasons why the array size should always be a prime number. If the keys are random integers, the modulo function distributes them more evenly. When the keys are strings, a simple hash function can sum the character codes of the key and take that sum modulo the array size. The following program illustrates how such a hash function works:
using System;

class chapter12
{
    static void Main()
    {
        string[] names = new string[99];
        string[] someNames = new string[] { "David", "Jennifer", "Donnie", "Mayo", "Raymond",
                                            "Bernica", "Mike", "Clayton", "Beata", "Michael" };
        for (int i = 0; i < someNames.Length; i++)
        {
            string name = someNames[i];
            int hashVal = SimpleHash(name, names);
            // A collision silently overwrites whatever name was stored here before.
            names[hashVal] = name;
        }
        ShowDistrib(names);
    }

    // Sums the character codes of s and maps the total onto the array.
    static int SimpleHash(string s, string[] arr)
    {
        int tot = 0;
        foreach (char c in s)
            tot += (int)c;
        // Use the full array length so that every cell can be reached.
        return tot % arr.Length;
    }

    // Prints every occupied cell together with its index.
    static void ShowDistrib(string[] arr)
    {
        for (int i = 0; i < arr.Length; i++)
            if (arr[i] != null)
                Console.WriteLine(i + " " + arr[i]);
    }
}
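The remark about integer keys that all end in 0 can be checked with a small sketch. The class name ModuloDemo, the helper CountUsedCells, and the sample keys are illustrative assumptions and not part of the original program. The sketch hashes the same ten non-negative keys into a table of size 10 and into a table of prime size 11, and counts how many distinct cells are used.

```csharp
using System;

class ModuloDemo
{
    // Simplest integer hash: the key modulo the table size.
    // Assumes non-negative keys.
    public static int ModHash(int key, int tableSize)
    {
        return key % tableSize;
    }

    // Counts how many distinct cells the keys land in.
    public static int CountUsedCells(int[] keys, int tableSize)
    {
        bool[] used = new bool[tableSize];
        int count = 0;
        foreach (int key in keys)
        {
            int cell = ModHash(key, tableSize);
            if (!used[cell])
            {
                used[cell] = true;
                count++;
            }
        }
        return count;
    }

    static void Main()
    {
        // Hypothetical sample keys, all ending in 0.
        int[] keys = { 10, 20, 30, 40, 50, 60, 70, 80, 90, 100 };
        Console.WriteLine("Size 10: " + CountUsedCells(keys, 10) + " cell(s) used");  // 1
        Console.WriteLine("Size 11: " + CountUsedCells(keys, 11) + " cell(s) used");  // 10
    }
}
```

With size 10 every key collides in cell 0, while the prime size 11 spreads the same keys over ten different cells.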
2. Searching for data in a hash table. To search for data in a hash table, compute the hash value of the key and then access the corresponding element in the array. It is that simple. Here is the function:
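// BetterHash is a string hash function defined elsewhere; it is not listed here.
static bool InHash(string s, string[] arr)
{
    int hval = BetterHash(s, arr);
    // The key is present only if its home cell holds exactly this string.
    return arr[hval] == s;
}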
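InHash relies on a BetterHash function that the text does not list. The following is a minimal sketch of what such a function might look like, assuming Horner's rule with the multiplier 37 (the same constant that appears in the BucketHash class later on) and a reduction modulo the array length at every step. The class name HashSearchDemo and this particular BetterHash body are assumptions, not the original author's code.

```csharp
using System;

class HashSearchDemo
{
    // Assumed implementation: Horner's rule with multiplier 37.
    // Reducing at every step keeps the running total small and non-negative.
    public static int BetterHash(string s, string[] arr)
    {
        long tot = 0;
        foreach (char c in s)
            tot = (37 * tot + c) % arr.Length;
        return (int)tot;
    }

    // Same lookup as in the text: hash the key, then compare the home cell.
    public static bool InHash(string s, string[] arr)
    {
        return arr[BetterHash(s, arr)] == s;
    }

    static void Main()
    {
        string[] names = new string[101];   // prime table size
        names[BetterHash("David", names)] = "David";
        Console.WriteLine(InHash("David", names));    // True
        Console.WriteLine(InHash("Mallory", names));  // False
    }
}
```

Note that this lookup only inspects the home cell, so it works as long as collisions are not resolved by moving items elsewhere.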
3. Resolving collisions. Collisions are inevitable when working with hash tables: the hash value computed for a key may point to a cell that already stores another key. This is called a collision. Several techniques can be used when a collision occurs. They include bucket hashing, open addressing, and double hashing.
(1). Bucket hashing. A bucket is a simple data structure stored in a hash table element that can hold multiple data items. In most implementations this structure is an array, but the implementation here uses an ArrayList, which grows on demand, so there is no need to worry about running out of space. In the end, this also makes the implementation more efficient. To insert an item, use the hash function to determine which ArrayList should hold it, then check whether the item is already in that ArrayList. If it is, do nothing. If it is not, call the Add method to add the item to the ArrayList. To remove an item from the hash table, first compute the hash value of the item and go to the corresponding ArrayList. Then check that the item is in the ArrayList and, if it is, remove it. As follows:
using System.Collections;

public class BucketHash
{
    private const int SIZE = 101;   // prime table size
    private ArrayList[] data;

    public BucketHash()
    {
        data = new ArrayList[SIZE];
        for (int i = 0; i < SIZE; i++)
            data[i] = new ArrayList(4);
    }

    // Maps a string to a bucket index in the range 0 to SIZE - 1.
    public int Hash(string s)
    {
        long tot = 0;
        foreach (char c in s)
            tot += 37 * tot + (int)c;
        tot = tot % SIZE;
        if (tot < 0)                // the total can go negative if it overflowed
            tot += SIZE;
        return (int)tot;
    }

    // Adds the item to its bucket unless it is already there.
    public void Insert(string item)
    {
        int hash_value = Hash(item);
        if (!data[hash_value].Contains(item))
            data[hash_value].Add(item);
    }

    // Removes the item from its bucket if it is present.
    public void Remove(string item)
    {
        int hash_value = Hash(item);
        if (data[hash_value].Contains(item))
            data[hash_value].Remove(item);
    }
}
When using bucket hashing, the most important thing is to keep the number of elements in each ArrayList as small as possible. This minimizes the extra work that has to be done when items are added to or removed from the hash table. In the preceding code, the size of each ArrayList is kept small by setting its initial capacity in the constructor call (4 in the code above). Whenever an ArrayList fills up, its capacity doubles. With a good hash function, though, the ArrayLists should not grow very large. The ratio of the number of elements in the hash table to the table size is called the load factor. Studies show that a hash table performs best when the load factor is 1.0, that is, when the table size equals the number of elements.
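The load factor and the capacity growth described above can be observed with a short sketch. The names LoadFactorDemo and LoadFactor are hypothetical and introduced only for illustration. The exact growth policy of ArrayList is an implementation detail, so the capacities are printed without a stated expected value; on common .NET runtimes the capacity should double once a full list receives another item.

```csharp
using System;
using System.Collections;

class LoadFactorDemo
{
    // Load factor = number of stored elements / table size.
    public static double LoadFactor(int elementCount, int tableSize)
    {
        return (double)elementCount / tableSize;
    }

    static void Main()
    {
        // 50 elements in a table of 101 buckets gives a load factor below 1.0.
        Console.WriteLine("Load factor: " + LoadFactor(50, 101));

        // A bucket created the same way as in BucketHash.
        ArrayList bucket = new ArrayList(4);
        Console.WriteLine("Capacity when empty: " + bucket.Capacity);
        for (int i = 0; i < 5; i++)
            bucket.Add(i);      // the fifth Add exceeds the initial capacity
        Console.WriteLine("Capacity after 5 items: " + bucket.Capacity);
    }
}
```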
(2). Open addressing. Open addressing searches the hash array for an empty cell in which to place the data item. If the first cell tried is full, the next cell is tried, and so on until an empty cell is found. This section looks at two open addressing strategies: linear probing and quadratic probing. Linear probing uses a linear function to determine which array cell to try, which means that cells are tried in sequence until an empty one is found. The problem with linear probing is that data items in adjacent cells tend to cluster, which makes later searches for an empty cell longer and less efficient. Quadratic probing addresses the clustering problem by using a quadratic function to determine which cell to try next.
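The two probing strategies can be compared with a minimal sketch. The class ProbeTable, its character-sum hash, and the choice of the square of the attempt number as the quadratic offset are assumptions made for illustration; the text does not specify which quadratic function to use. The three sample keys are anagrams, so they share the same home cell and force collisions.

```csharp
using System;

class ProbeTable
{
    private string[] cells;

    public ProbeTable(int size)
    {
        cells = new string[size];
    }

    // Character-sum hash: gives the home cell of a key.
    private int Hash(string key)
    {
        int tot = 0;
        foreach (char c in key)
            tot += c;
        return tot % cells.Length;
    }

    // Returns the cell that was used, or -1 if no empty cell was found.
    public int Insert(string key, bool quadratic)
    {
        int home = Hash(key);
        for (int attempt = 0; attempt < cells.Length; attempt++)
        {
            // Linear: home, home+1, home+2, ...  Quadratic: home, home+1, home+4, ...
            int offset = quadratic ? attempt * attempt : attempt;
            int cell = (home + offset) % cells.Length;
            if (cells[cell] == null || cells[cell] == key)
            {
                cells[cell] = key;
                return cell;
            }
        }
        return -1;
    }

    static void Main()
    {
        string[] keys = { "abc", "acb", "bca" };   // anagrams share home cell 8
        ProbeTable linear = new ProbeTable(11);
        ProbeTable quadratic = new ProbeTable(11);
        foreach (string key in keys)
            Console.WriteLine(key + ": linear " + linear.Insert(key, false)
                                  + ", quadratic " + quadratic.Insert(key, true));
        // abc: linear 8, quadratic 8
        // acb: linear 9, quadratic 9
        // bca: linear 10, quadratic 1
    }
}
```

Linear probing packs the colliding keys into the neighboring cells 8, 9, and 10, while the quadratic offsets move the third key away from that run.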
(3). Double hashing. Double hashing is an interesting collision resolution strategy, but it has been shown that quadratic probing usually achieves better performance.
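The text does not show how double hashing works, so here is a minimal sketch under stated assumptions: a first hash function picks the home cell, and a second hash function picks the step size used to jump to the next cell after a collision. The class DoubleHashTable, both hash functions, and the sample keys are hypothetical. The table size is assumed to be prime, so that any step between 1 and size - 1 eventually visits every cell.

```csharp
using System;

class DoubleHashTable
{
    private string[] cells;

    // primeSize is assumed to be a prime number of at least 3.
    public DoubleHashTable(int primeSize)
    {
        cells = new string[primeSize];
    }

    // First hash: the character sum gives the home cell.
    private int Home(string key)
    {
        int tot = 0;
        foreach (char c in key)
            tot += c;
        return tot % cells.Length;
    }

    // Second hash: Horner's rule with multiplier 37, mapped to 1..size-1
    // so that the step is never 0.
    private int Step(string key)
    {
        int tot = 0;
        foreach (char c in key)
            tot = (37 * tot + c) % (cells.Length - 1);
        return 1 + tot;
    }

    // Returns the cell that was used, or -1 if the table is full.
    public int Insert(string key)
    {
        int home = Home(key);
        int step = Step(key);
        for (int attempt = 0; attempt < cells.Length; attempt++)
        {
            int cell = (home + attempt * step) % cells.Length;
            if (cells[cell] == null || cells[cell] == key)
            {
                cells[cell] = key;
                return cell;
            }
        }
        return -1;
    }

    static void Main()
    {
        DoubleHashTable table = new DoubleHashTable(11);
        Console.WriteLine(table.Insert("abc"));   // 8 (home cell)
        Console.WriteLine(table.Insert("acb"));   // 2 (home 8 is taken, step 5)
        Console.WriteLine(table.Insert("bca"));   // 0 (home 8 is taken, step 3)
    }
}
```

Because the step depends on the key, keys that share a home cell usually follow different probe sequences.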
B. The Hashtable class. The Hashtable class is a special type of dictionary object that stores key-value pairs, where the values are stored based on the hash code derived from the key. You can specify a hash function for the data type of the key or use the built-in one (which will be discussed later). The Hashtable class is very efficient and should be used in place of a custom implementation wherever possible.
1. Using Hashtable. The Hashtable class is part of the System.Collections namespace, so you must import System.Collections at the beginning of the program. A Hashtable object can be instantiated in any of the following ways:
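// Three alternative declarations; use one of them.
Hashtable symbols = new Hashtable();            // default capacity and load factor
Hashtable symbols = new Hashtable(50);          // initial capacity of 50
Hashtable symbols = new Hashtable(25, 0.5F);    // initial capacity of 25; the load factor must be between 0.1 and 1.0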
You can use the Add method to add a key-value pair to the hash table. This method takes two arguments: the key and the value associated with the key. The key is added to the hash table after its hash value has been computed. As follows:
Hashtable symbols = new Hashtable(25);
symbols.Add("salary", 100000);
symbols.Add("name", "David Durr");
symbols.Add("age", 43);
symbols.Add("dept", "Information Technology");
You can also use an indexer to add elements to the hash table. To do this, write an assignment statement that assigns a value to the key specified as the index (much like an array index). If the key does not exist yet, a new element is added to the hash table. If the key already exists, the new value overwrites the existing one. As follows:
symbols["sex"] = "Male";   // new key: an element is added
symbols["age"] = 44;       // existing key: the stored value is overwritten
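The difference between Add and the indexer shows up when the key already exists: Add throws an ArgumentException, whereas the indexer overwrites the value. A short sketch, with the class name IndexerDemo and the helper BuildSymbols introduced here only for illustration:

```csharp
using System;
using System.Collections;

class IndexerDemo
{
    public static Hashtable BuildSymbols()
    {
        Hashtable symbols = new Hashtable();
        symbols.Add("age", 43);
        symbols["age"] = 44;        // key exists: the value is overwritten
        symbols["sex"] = "Male";    // key is new: an element is added
        return symbols;
    }

    static void Main()
    {
        Hashtable symbols = BuildSymbols();
        Console.WriteLine(symbols["age"]);              // 44
        Console.WriteLine(symbols.Count);               // 2
        Console.WriteLine(symbols["salary"] == null);   // True: a missing key reads as null

        try
        {
            symbols.Add("age", 45);                     // duplicate key
        }
        catch (ArgumentException)
        {
            Console.WriteLine("Add rejects a duplicate key");
        }
    }
}
```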
The Hashtable class has two very useful members for retrieving the keys and values from a hash table: Keys and Values. These properties return collections that you can walk with a foreach loop or other techniques to examine the keys and values. Note that the order in which a Hashtable returns its entries is not specified. The following program demonstrates them:
using System;
using System.Collections;

class chapter12
{
    static void Main()
    {
        Hashtable symbols = new Hashtable(25);
        symbols.Add("salary", 100000);
        symbols.Add("name", "David Durr");
        symbols.Add("age", 45);
        symbols.Add("dept", "Information Technology");
        symbols["sex"] = "Male";

        Console.WriteLine("The keys are: ");
        foreach (Object key in symbols.Keys)
            Console.WriteLine(key);

        Console.WriteLine();
        Console.WriteLine("The values are: ");
        foreach (Object value in symbols.Values)
            Console.WriteLine(value);
    }
}
2. Useful members of the Hashtable class.
(1) The Count property stores the number of elements in the hash table. It returns an integer.
(2) The Clear method immediately removes all elements from the hash table.
(3) The Remove method takes a key and removes both the specified key and its associated value.
(4) The ContainsKey and ContainsValue methods check whether a key or a value is in the hash table.
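The members listed above can be exercised together in a few lines. This is a small sketch; the class name HashtableMembersDemo and the helper BuildSymbols are illustrative, and the sample entries reuse the data from the earlier examples.

```csharp
using System;
using System.Collections;

class HashtableMembersDemo
{
    public static Hashtable BuildSymbols()
    {
        Hashtable symbols = new Hashtable();
        symbols.Add("name", "David Durr");
        symbols.Add("age", 43);
        symbols.Add("dept", "Information Technology");
        return symbols;
    }

    static void Main()
    {
        Hashtable symbols = BuildSymbols();
        Console.WriteLine(symbols.Count);                // 3
        Console.WriteLine(symbols.ContainsKey("age"));   // True
        Console.WriteLine(symbols.ContainsValue(43));    // True

        symbols.Remove("age");                           // removes the key and its value
        Console.WriteLine(symbols.ContainsKey("age"));   // False
        Console.WriteLine(symbols.Count);                // 2

        symbols.Clear();                                 // removes everything
        Console.WriteLine(symbols.Count);                // 0
    }
}
```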
3. A Hashtable application. The program first reads a series of terms and definitions from a text file. This is coded in the subroutine BuildGlossary. The text file has the structure word,definition, with a comma separating each word from its definition. Each entry in this glossary is a single word, but the glossary could just as easily handle phrases, which is why a comma rather than a space is used as the delimiter. This structure also allows the word to be used as the key, which is the proper way to build this hash table. Another subroutine, DisplayWords, displays the words in a list box so that the user can select one to get its definition. Since the words are the keys, the Keys property can be used to return just the words from the hash table. The user can then see which words have definitions. Clicking a word in the list box retrieves its definition: the program uses the indexer (the Item property) to look up the definition and displays it in the text box.
using System;
using System.Collections;
using System.IO;
using System.Windows.Forms;

namespace WindowsApplication3
{
    // lstWords, txtDefinition, and InitializeComponent come from the
    // designer-generated half of this partial class.
    public partial class Form1 : Form
    {
        private Hashtable glossary = new Hashtable();

        public Form1()
        {
            InitializeComponent();
        }

        private void Form1_Load(object sender, EventArgs e)
        {
            BuildGlossary(glossary);
            DisplayWords(glossary);
        }

        // Reads word,definition lines from the text file into the hash table.
        private void BuildGlossary(Hashtable g)
        {
            char[] delimiter = new char[] { ',' };
            using (StreamReader inFile = File.OpenText(@"c:\words.txt"))
            {
                string line;
                while ((line = inFile.ReadLine()) != null)
                {
                    string[] words = line.Split(delimiter);
                    g.Add(words[0], words[1]);
                }
            }
        }

        // Lists every key (word) of the hash table in the list box.
        private void DisplayWords(Hashtable g)
        {
            Object[] words = new Object[g.Count];
            g.Keys.CopyTo(words, 0);
            foreach (Object word in words)
                lstWords.Items.Add(word);
        }

        // Shows the definition of the word selected in the list box.
        private void lstWords_SelectedIndexChanged(object sender, EventArgs e)
        {
            Object word = lstWords.SelectedItem;
            if (word != null)
                txtDefinition.Text = glossary[word].ToString();
        }
    }
}
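The parsing step can also be tried without a form or a file. The following console sketch is an assumption-laden variant of BuildGlossary, not the original code: the class GlossaryDemo, the method Parse, and the sample lines are hypothetical, and the lines are split at the first comma only, so that a definition may itself contain commas.

```csharp
using System;
using System.Collections;

class GlossaryDemo
{
    // Builds a glossary from "word,definition" lines.
    public static Hashtable Parse(string[] lines)
    {
        Hashtable glossary = new Hashtable();
        char[] delimiter = new char[] { ',' };
        foreach (string line in lines)
        {
            // Split into at most two parts: the word and the rest of the line.
            string[] parts = line.Split(delimiter, 2);
            if (parts.Length == 2)                      // skip malformed lines
                glossary[parts[0].Trim()] = parts[1].Trim();
        }
        return glossary;
    }

    static void Main()
    {
        // Hypothetical sample lines standing in for the contents of the text file.
        string[] lines = {
            "hash,a function that maps a key to an array index",
            "bucket,a list of items that share one cell, such as an ArrayList",
            "not a valid line"
        };
        Hashtable glossary = Parse(lines);
        Console.WriteLine(glossary.Count);       // 2
        Console.WriteLine(glossary["bucket"]);   // a list of items that share one cell, such as an ArrayList
    }
}
```

Using the indexer instead of Add means that a word listed twice keeps its last definition rather than raising an exception.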