Using Hashtables in Java


Hashtables offer applications a useful way to achieve the best performance.
Hashtables are no longer a new concept in computing. They were invented to speed up processing on computers that were very slow by today's standards, and they let you search a large number of data entries and quickly find a particular one. Although modern machines are thousands of times faster, hashtables are still a useful way to get the best performance out of an application.

Imagine that you have a data file containing about one thousand records, such as the customer records of a small business, and a program that reads the records into memory for processing. Each record contains a unique five-digit customer ID number, the customer's name, address, and account balance. Assume that the records are not sorted by customer ID number. If the program wants to use the customer ID number as the "key" to find a particular customer record, the only way to search is to examine the records one after another. Sometimes it will find the record it needs quickly; at other times it will have almost reached the last record before finding it. To search 1,000 records, you need to examine 500.5 ((1000 + 1)/2) records on average to find any given record. If you need to look up data often, you need a faster way to find a record.
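The sequential search described above can be sketched as follows. This is only an illustration: the Customer class, its fields, and the generated ID numbers are assumptions made for the example, not part of any real data file.

```java
import java.util.ArrayList;
import java.util.List;

class LinearSearchDemo {
    // Hypothetical record type; the field names are illustrative only.
    static class Customer {
        final int id;
        final String name;

        Customer(int id, String name) {
            this.id = id;
            this.name = name;
        }
    }

    // Returns how many records were examined before the ID was found,
    // or -1 if the ID is not in the list.
    static int comparisonsToFind(List<Customer> records, int id) {
        int comparisons = 0;
        for (Customer c : records) {
            comparisons++;
            if (c.id == id) {
                return comparisons;
            }
        }
        return -1;
    }

    // Looks up every record exactly once and averages the cost.
    static double averageComparisons(int recordCount) {
        List<Customer> records = new ArrayList<>();
        for (int i = 0; i < recordCount; i++) {
            records.add(new Customer(10000 + i, "Customer " + i));
        }
        long total = 0;
        for (Customer c : records) {
            total += comparisonsToFind(records, c.id);
        }
        return (double) total / recordCount;
    }

    public static void main(String[] args) {
        System.out.println(averageComparisons(1000)); // prints 500.5
    }
}
```

The search never relies on the order of the records, so the average cost is the same however the file happens to be arranged.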

One way to speed up the search is to divide the records into several groups, so that instead of searching one large list you search one of several short lists. For our numeric customer ID numbers, you can create 10 lists: one for the ID numbers starting with 0, one for the ID numbers starting with 1, and so on. To find customer ID 38016, you only need to search the list for numbers starting with 3. With 1,000 records, the average length of each list is 100 (1,000 records divided among 10 lists), so the average number of comparisons needed to find a record drops to about 50 (see figure 1).
Of course, this method works well only if about one tenth of the customer numbers start with 0, another tenth start with 1, and so on. If 90% of the customer numbers start with 0, that list will hold 900 records, and each search of it requires 450 comparisons on average. Moreover, 90% of the searches the program performs will be for numbers starting with 0, so the average number of comparisons is far higher than the simple arithmetic suggests.
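To put rough numbers on this, the following sketch computes the expected number of comparisons per lookup for a given set of list sizes. It assumes that every record is equally likely to be requested and that each list is searched sequentially; the exact split of the skewed case (900 records in one list, the other 100 spread over nine lists) is also an assumption chosen for illustration.

```java
class SkewedListsDemo {
    // Expected comparisons for one lookup, assuming each record is equally
    // likely to be requested and each list is scanned from the start.
    static double expectedComparisons(int[] listSizes) {
        int total = 0;
        for (int size : listSizes) {
            total += size;
        }
        double expected = 0.0;
        for (int size : listSizes) {
            double chanceOfThisList = (double) size / total;
            double averageScan = (size + 1) / 2.0;
            expected += chanceOfThisList * averageScan;
        }
        return expected;
    }

    public static void main(String[] args) {
        int[] even = {100, 100, 100, 100, 100, 100, 100, 100, 100, 100};
        // Assumed skewed split: 900 IDs start with 0, the rest are spread out.
        int[] skewed = {900, 12, 11, 11, 11, 11, 11, 11, 11, 11};
        System.out.println("even:   " + expectedComparisons(even));
        System.out.println("skewed: " + expectedComparisons(skewed));
    }
}
```

Under these assumptions the even split costs about 50 comparisons per lookup, while the skewed split costs a little over 400, most of the way back to the cost of a single unsorted list.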
It would be better if we could distribute the records so that each list holds about the same number of records, regardless of how the digits in the key values are distributed. We need a way to scramble the customer numbers so that the results are spread more evenly. For example, we can multiply each digit of the number by a large number (a different one for each digit position), add the results to produce a total, divide the total by 10, and use the remainder as the index value (records with the same remainder go into the same list). When reading a record, the program runs this hash function on the customer number to determine which list the record belongs in. When you need to look up a record, the program applies the same hash function to the customer number used as the "key", so that it searches the correct list. A data structure like this is called a hash table (hashtable).
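A minimal sketch of such a hash function follows. The multipliers in WEIGHTS are arbitrary values picked for this example (the text does not specify any), and the class and method names are hypothetical.

```java
class DigitHashDemo {
    // Assumed multipliers, one per digit position. They are arbitrary odd
    // numbers that do not end in 5, so none shares a factor with 10.
    static final int[] WEIGHTS = {7919, 6271, 4363, 2749, 1013};
    static final int LIST_COUNT = 10;

    // Maps a five-digit customer ID to a list index from 0 to 9.
    static int listIndex(int customerId) {
        int sum = 0;
        int remaining = customerId;
        for (int position = 0; position < WEIGHTS.length; position++) {
            int digit = remaining % 10;          // lowest digit first
            sum += digit * WEIGHTS[position];
            remaining /= 10;
        }
        return sum % LIST_COUNT;
    }

    // Counts how many of the IDs firstId .. firstId+count-1 land in each list.
    static int[] listSizes(int firstId, int count) {
        int[] sizes = new int[LIST_COUNT];
        for (int id = firstId; id < firstId + count; id++) {
            sizes[listIndex(id)]++;
        }
        return sizes;
    }

    public static void main(String[] args) {
        // IDs 00000 to 00999 all start with 0, the worst case for the
        // first-digit scheme.
        int[] sizes = listSizes(0, 1000);
        for (int i = 0; i < sizes.length; i++) {
            System.out.println("list " + i + ": " + sizes[i] + " records");
        }
    }
}
```

With these particular weights, the 1,000 consecutive IDs that all begin with 0 are spread evenly across the ten lists. Because the total is reduced modulo 10, only the last digit of each weight actually matters here, which is one reason real hash tables usually avoid a list count as small and as round as 10.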

Hashtables in Java
Java contains two classes, java.util.Hashtable and java.util.HashMap, that provide a general-purpose hashtable mechanism. The two classes are very similar and provide largely the same public interface, but they do have some important differences, which I will discuss later.
Hashtable and HashMap objects let you associate a key with a value and enter the key/value pair into the table using the put() method. You can then call the get() method with the key as its parameter to retrieve the value. The key and the value can be any objects, as long as two basic requirements are met. Note that because keys and values must be objects, primitive types must be converted to objects, for example with the Integer(int) constructor.
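As a small illustration of put() and get(), the sketch below stores an account balance under a customer ID. The class and method names are hypothetical. It uses generics and Integer.valueOf(), which are available in current Java versions, rather than the raw types and constructors of older releases.

```java
import java.util.HashMap;
import java.util.Map;

class PutGetDemo {
    // Stores a balance under a customer ID and reads it back.
    static double storeAndFetch(int customerId, double balance) {
        Map<Integer, Double> balances = new HashMap<>();
        // valueOf() wraps each primitive in an object; current Java versions
        // would also do this automatically (autoboxing).
        balances.put(Integer.valueOf(customerId), Double.valueOf(balance));
        Double found = balances.get(Integer.valueOf(customerId));
        return found.doubleValue();
    }

    public static void main(String[] args) {
        System.out.println(storeAndFetch(38016, 125.50));
    }
}
```

The key passed to get() is a different Integer object from the one passed to put(), yet the lookup succeeds because the two objects are equal and produce the same hash code.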
To use objects of a particular class as keys, the class must provide two methods: equals() and hashCode(). Both methods are defined in java.lang.Object, so every class inherits them. However, the Object implementations of these methods are generally of little use, so you usually need to override them yourself. The equals() method compares its object with another object and returns true if the two objects represent the same information. It also checks that the two objects belong to the same class. Object.equals() returns true only if the two references refer to the very same object, which explains why it is generally unsuitable. In most cases you need a method that compares the objects field by field, so that different objects representing the same data are considered equal.
The hashCode() method produces an int value by running a hash function on the object's contents. Hashtable and HashMap use this value to determine which bucket (or list) a key/value pair belongs in. As an example, consider the String class, which has its own implementations of these two methods. String.equals() compares two String objects character by character and returns true if the strings are the same:
String myName = "Einstein";
// The following test is
// always true
if (myName.equals("Einstein"))
{ ...
String.hashCode() runs a hash function on a string. The numeric code of each character in the string is multiplied by a power of 31 that depends on the character's position in the string, and the results are added to produce a total. This process may seem complicated, but it ensures a better distribution of values. It also shows how far you can go when developing your own hashCode() method to make the results as distinct as possible.
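The calculation can be reproduced by hand. The sketch below recomputes the hash following the formula given in the String documentation, s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1], and compares the result with the built-in method. The class and method names are made up for the example.

```java
class StringHashDemo {
    // Recomputes the documented String hash using int arithmetic.
    // Multiplying the running total by 31 at each step gives earlier
    // characters higher powers of 31.
    static int manualHash(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = 31 * h + s.charAt(i);
        }
        return h;
    }

    public static void main(String[] args) {
        String name = "Einstein";
        System.out.println(manualHash(name) == name.hashCode()); // prints true
    }
}
```

For a short string the arithmetic is easy to check: "ab" gives 97*31 + 98 = 3105.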
For example, suppose I want to use a hashtable to implement a book catalog, with the book's ISBN number as the search key. I can use the String class to hold the ISBN, since it comes with ready-made equals() and hashCode() methods. Key/value pairs are added to the hashtable with the put() method.
The put() method accepts two parameters, both of type Object. The first parameter is the key and the second is the value. The put() method calls the key's hashCode() method and divides the result by the number of lists in the table. The remainder is used as the index that determines which list the record is added to. Note that keys are unique within the table. If you call put() with a key that already exists, the matching entry is modified to refer to the new value, and the old value is returned (when the key does not yet exist in the table, put() returns null). To read a value from the table, we pass the search key to the get() method. It returns an Object that must be cast to the correct type:
BookRecord br = (BookRecord) isbnTable.get("0-345-40946-9");
System.out.println("Author: " + br.author + " Title: " + br.title);

Another useful method is remove(). It is used in almost the same way as get(): it deletes the entry from the table and returns the entry's value to the caller.
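The three methods can be seen together in a small book catalog. This is a sketch only: the BookTableDemo wrapper, the BookRecord fields, and the author and title strings are placeholders invented for the example. With a typed Hashtable<String, BookRecord>, no cast is needed on the value returned by get().

```java
import java.util.Hashtable;

class BookTableDemo {
    // Hypothetical record type; author and title are placeholder fields.
    static class BookRecord {
        final String author;
        final String title;

        BookRecord(String author, String title) {
            this.author = author;
            this.title = title;
        }
    }

    private final Hashtable<String, BookRecord> isbnTable = new Hashtable<>();

    // Adds or replaces a record; returns the record stored before, or null.
    BookRecord add(String isbn, BookRecord record) {
        return isbnTable.put(isbn, record);
    }

    // Returns the record for the ISBN, or null if there is none.
    BookRecord find(String isbn) {
        return isbnTable.get(isbn);
    }

    // Deletes the entry and returns the record it held, or null.
    BookRecord discard(String isbn) {
        return isbnTable.remove(isbn);
    }

    public static void main(String[] args) {
        BookTableDemo directory = new BookTableDemo();
        String isbn = "0-345-40946-9"; // ISBN string taken from the text above
        BookRecord first = new BookRecord("Example Author", "Example Title");
        BookRecord revised = new BookRecord("Example Author", "Example Title, 2nd ed.");

        System.out.println(directory.add(isbn, first) == null);    // true: key was new
        System.out.println(directory.add(isbn, revised) == first); // true: old value returned
        System.out.println(directory.find(isbn).title);
        System.out.println(directory.discard(isbn) == revised);    // true: removed value returned
        System.out.println(directory.find(isbn) == null);          // true: entry is gone
    }
}
```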

Your Own Class
If you want to use a primitive type as a key, you must create an object of the corresponding wrapper type. For example, to use an integer key, use the constructor Integer(int) to create an object from the integer. All the wrapper classes, such as Integer, Float, and Boolean, treat primitive values as objects and override the equals() and hashCode() methods, so they can be used as keys. Many other classes provided in the JDK do the same (even the Hashtable and HashMap classes implement their own equals() and hashCode() methods), but you should check the documentation before using objects of any class as hashtable keys. It is also worth checking the class's source to see how equals() and hashCode() are implemented. For example, Byte, Character, Short, and Integer all return the integer value they represent as the hash code. This may or may not suit your needs.

If you want to create a hashtable that uses objects of a class you have defined as keys, make sure that the class's equals() and hashCode() methods provide useful values. First, look at the class you are extending to determine whether its implementations meet your needs. If they do not, you should override the methods.

The basic design constraint for any equals() method is that it should return true if the object passed to it belongs to the same class and its data fields hold values representing the same data. You should also make sure that your code returns false if a null argument is passed to the method:
public boolean equals(Object o) {
    if ((o == null) || !(o instanceof MyClass)) {
        return false;
    }

    // Now compare data fields...

There are also some rules to keep in mind when designing a hashCode() method. First, the method must return the same value for a particular object no matter how many times it is called (provided, of course, that the object's contents do not change between calls, something to avoid while an object is in use as a hashtable key). Second, if two objects are equal according to your equals() method, they must also produce the same hash code. The third rule is more a guideline than a requirement: you should try to design the method so that it produces different results for different object contents. It does not matter if different objects occasionally produce the same hash code. However, if the method can only return values from 1 to 10, only 10 lists will ever be used, no matter how many lists the hashtable has.
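One way these rules might be applied is sketched below. CustomerKey and its two fields are hypothetical, chosen only to show a key class whose equals() and hashCode() agree with each other; the fields are final so the hash code cannot change while the object is in a table.

```java
import java.util.Objects;

// Hypothetical key class; the field names are illustrative only.
final class CustomerKey {
    private final int regionCode;
    private final String accountNumber;

    CustomerKey(int regionCode, String accountNumber) {
        this.regionCode = regionCode;
        this.accountNumber = Objects.requireNonNull(accountNumber);
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) {
            return true;                       // same object
        }
        if (!(o instanceof CustomerKey)) {
            return false;                      // also covers o == null
        }
        CustomerKey other = (CustomerKey) o;
        return regionCode == other.regionCode
                && accountNumber.equals(other.accountNumber);
    }

    @Override
    public int hashCode() {
        // Built only from the fields equals() compares, so equal objects
        // always produce equal hash codes.
        return 31 * regionCode + accountNumber.hashCode();
    }

    public static void main(String[] args) {
        CustomerKey a = new CustomerKey(7, "38016");
        CustomerKey b = new CustomerKey(7, "38016");
        System.out.println(a.equals(b));                  // true
        System.out.println(a.hashCode() == b.hashCode()); // true
        System.out.println(a.equals(null));               // false
    }
}
```

The instanceof test returns false for null, so a separate null check is optional; keeping it, as in the fragment above, does no harm.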

Another factor to keep in mind when designing equals() and hashCode() is performance. Every call to put() or get() involves a call to hashCode() to find the correct list, and when get() scans the list for the key, it calls equals() on each element in the list. Implement these methods so that they run as quickly and efficiently as possible, especially if you plan to make your class publicly available, because other users may want to use it in high-performance applications where execution speed matters.

Hashtable Performance
The main factor affecting a hashtable's efficiency is the average length of the lists in the table, because the average search time is directly related to that length. Obviously, to reduce the average length you must increase the number of lists in the hashtable. You get the best search efficiency when there are so many lists that most or all of them contain only one record. However, this may be going too far. If your hashtable has far more lists than data entries, you are spending memory needlessly, and in some cases that is not acceptable.
In our earlier example, we knew in advance how many records we had: 1,000. Knowing that, we could decide how many lists our hashtable should contain to strike the best compromise between search speed and memory usage. In many cases, however, you do not know in advance how many records you will have to process; the file the data is read from may keep growing, or the number of records may vary greatly from day to day.
The Hashtable and HashMap classes solve this problem by expanding the table dynamically as the number of entries grows. Both classes have constructors that accept the initial number of lists in the table and a load factor as parameters:
public Hashtable(int initialCapacity, float loadFactor)

public HashMap(int initialCapacity, float loadFactor)

The table calculates a threshold by multiplying these two numbers. Each time a new entry is added to the hash table, its count is updated, and when the count exceeds the threshold, the table is rehashed. (The number of lists is increased to twice the previous number plus one, and all entries are moved to their correct lists.) The default Hashtable constructor sets the initial capacity to 11 and the load factor to 0.75, so the threshold is 8. When the ninth record is added to the table, the hash table is resized to have 23 lists, and the new threshold is 17 (the integer part of 23*0.75). As you can see, the load factor is an upper limit on the average number of entries per list, which means that, by default, very few lists contain more than one record. Compare this with our original example, in which 1,000 records were distributed among 10 lists. With the default values, the table would expand to contain more than 1,500 lists. But you can control this. If the number of lists multiplied by the load factor is greater than the number of entries you process, the table will never be rehashed. So we could do the following:
// Table will not rehash until it
// has 1,100 entries (10*110):
Hashtable myHashtable = new Hashtable(10, 110.0F);
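The growth arithmetic can be followed with a short simulation. Note that this sketch does not inspect a real Hashtable; it only models the rule as the text describes it (threshold = capacity times load factor, new capacity = old capacity times two plus one), and the class and method names are invented for the example.

```java
class RehashGrowthDemo {
    // Returns the number of lists the modeled table would have after the
    // given number of entries has been added one at a time.
    static int capacityAfter(int entries, int initialCapacity, float loadFactor) {
        int capacity = initialCapacity;
        int threshold = (int) (capacity * loadFactor);
        for (int count = 0; count < entries; count++) {
            // 'count' entries are already stored; one more is about to be added.
            if (count >= threshold) {
                capacity = capacity * 2 + 1;
                threshold = (int) (capacity * loadFactor);
            }
        }
        return capacity;
    }

    public static void main(String[] args) {
        System.out.println(capacityAfter(8, 11, 0.75f));      // 11: still under the threshold
        System.out.println(capacityAfter(9, 11, 0.75f));      // 23: ninth entry triggers a rehash
        System.out.println(capacityAfter(1000, 11, 0.75f));   // 1535: more than 1,500 lists
        System.out.println(capacityAfter(1000, 10, 110.0f));  // 10: never rehashed
    }
}
```

Under this model the default table grows through 23, 47, 95, 191, 383, and 767 lists before reaching 1,535, which means seven rehash operations while loading 1,000 records.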

You probably will not want to do this unless you need to save the memory taken up by empty lists and do not mind the extra search time. That may be the case in an embedded system. However, the approach can be useful because rehashing takes a lot of computing time, and this guarantees that it never happens. Note that although calling put() can make the table grow (the number of lists increases), calling remove() does not have the opposite effect. So if you have a large table and delete most of its entries, you are left with a large but mostly empty table.
Hashtable and HashMap
There are three important differences between the Hashtable and HashMap classes. The first is mainly historical: Hashtable is based on the obsolete Dictionary class, while HashMap is an implementation of the Map interface introduced in Java 1.2.
Perhaps the most important difference is that Hashtable's methods are synchronized, while HashMap's are not. This means that you can use a Hashtable in a multithreaded application without taking any special action, whereas you must provide external synchronization for a HashMap. A convenient way to do that is to use the static synchronizedMap() method of the Collections class, which creates a thread-safe Map object and returns it as a wrapper. The wrapper's methods give you synchronized access to the underlying HashMap. The downside of Hashtable is that you cannot turn off its synchronization when you do not need it (for example, in a single-threaded application), and synchronization adds considerable processing overhead. The third difference is that only HashMap lets you use null as the key or value of a table entry. Only one entry in a HashMap can have a null key, but any number of entries can have null values. This means that get() returns null both when the search key is not found in the table and when the key is found but its value is null. If necessary, use the containsKey() method to distinguish between the two cases.
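The second and third differences can be illustrated briefly. In this sketch the helper names are hypothetical; Collections.synchronizedMap(), containsKey(), and the NullPointerException thrown by Hashtable.put() for a null value are standard library behavior.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Hashtable;
import java.util.Map;

class MapDifferencesDemo {
    // Wraps a HashMap so that every method call on the result is synchronized.
    static Map<String, String> newSynchronizedMap() {
        return Collections.synchronizedMap(new HashMap<String, String>());
    }

    // True only if the key is present and mapped to null, as opposed to
    // being absent from the map altogether.
    static boolean isMappedToNull(Map<String, String> map, String key) {
        return map.containsKey(key) && map.get(key) == null;
    }

    // Hashtable rejects null values (and null keys) with a NullPointerException.
    static boolean hashtableAcceptsNullValue() {
        try {
            new Hashtable<String, String>().put("key", null);
            return true;
        } catch (NullPointerException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        Map<String, String> map = newSynchronizedMap();
        map.put("present", null);
        System.out.println(map.get("present"));               // null
        System.out.println(map.get("missing"));               // null as well
        System.out.println(isMappedToNull(map, "present"));   // true
        System.out.println(isMappedToNull(map, "missing"));   // false
        System.out.println(hashtableAcceptsNullValue());      // false
    }
}
```

The synchronized wrapper protects individual method calls only; code that iterates over the map still has to synchronize on the wrapper itself for the duration of the loop.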

The usual recommendation is to use Hashtable when synchronization is required and HashMap otherwise. However, because HashMap can be synchronized when necessary, has more features than Hashtable, and is not based on an obsolete class, some people argue that HashMap is preferable to Hashtable in all cases.

About Properties
Sometimes you may want to use a hashtable to map key strings to value strings. The environment strings in DOS, Windows, and Unix are examples: the key string PATH, for instance, is mapped to the value string C:\WINDOWS;C:\WINDOWS\SYSTEM. A hashtable is a simple way to represent these, but Java provides another way.
The java.util.Properties class is a subclass of Hashtable designed for String keys and values. A Properties object is used much like a Hashtable, but the class adds two time-saving methods that you should know about. The store() method saves the contents of a Properties object to a file in a readable form. The load() method does the opposite: it reads the file and sets up the Properties object to contain the keys and values. Note that because Properties extends Hashtable, you can use the superclass's put() method to add keys and values that are not String objects. This is not advisable. In addition, if you call store() on a Properties object that contains non-String objects, store() will fail. As an alternative to put() and get(), you should use setProperty() and getProperty(), which take String parameters. I hope you now know how to use hashtables to speed up your processing.
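A short sketch of the Properties methods mentioned above follows. To keep the example self-contained it writes to an in-memory StringWriter instead of a file, using the store(Writer, String) and load(Reader) overloads; the class name, the helper method, and the comment string are assumptions made for the example.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.util.Properties;

class PropertiesDemo {
    // Saves the properties as text, then loads that text into a new object.
    static Properties roundTrip(Properties original) {
        try {
            StringWriter out = new StringWriter();
            original.store(out, "example settings"); // the comment text is arbitrary
            Properties copy = new Properties();
            copy.load(new StringReader(out.toString()));
            return copy;
        } catch (IOException e) {
            // Not expected for in-memory streams; rethrown unchecked.
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Properties settings = new Properties();
        settings.setProperty("PATH", "C:\\WINDOWS;C:\\WINDOWS\\SYSTEM");
        Properties restored = roundTrip(settings);
        System.out.println(restored.getProperty("PATH"));
    }
}
```

The stored text escapes the backslashes in the value, and load() reverses the escaping, so the restored value matches the original string.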

About the Author:
Pete Ford has been engaged in software development for more than 20 years. He mainly studies embedded systems and turnkey systems. He lives and works in Dallas, Texas. You can contact him via p_ford@mindspring.com.
