Hashtable for Java Study Notes

Last Update:2018-12-03 Source: Internet

Author: User

Tags key string rehash


Hash tables offer a useful way to optimize application performance.

Hash tables are not a new concept in computing. They were designed to speed up processing on computers that were very slow by today's standards, and they let you search through many data entries and quickly find a particular one. Although modern machines are thousands of times faster, hash tables are still a useful way to get the best performance out of an application.

Imagine that you have a data file containing about one thousand records, for example the customer records of a small business, and a program that reads the records into memory for processing. Each record contains a unique five-digit customer ID number, the customer name, address, and account balance. Assume that the records are not sorted by customer ID number. If the program wants to use the customer number as the "key" to find a particular customer record, the only way to search is to examine each record in turn. Sometimes it will find the record quickly; at other times it will have checked almost every record before finding the one it needs. To find a record among 1,000, you need to examine an average of 500.5 records ((1000 + 1)/2). If you need to look up data often, you need a faster way to find a record.
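The 500.5 figure can be checked with a short sketch. The class name, method names, and the generated customer IDs below are illustrative assumptions, not part of the original program.

```java
class LinearSearchDemo {
    // Scans the array from the start and returns how many records were
    // examined before the key was found, or -1 if the key is absent.
    static int comparisonsToFind(int[] ids, int key) {
        for (int i = 0; i < ids.length; i++) {
            if (ids[i] == key) {
                return i + 1;
            }
        }
        return -1;
    }

    // Average number of comparisons when every stored ID is searched once.
    static double averageComparisons(int[] ids) {
        long total = 0;
        for (int id : ids) {
            total += comparisonsToFind(ids, id);
        }
        return (double) total / ids.length;
    }

    // 1,000 distinct five-digit IDs (hypothetical sample data).
    static int[] sampleIds() {
        int[] ids = new int[1000];
        for (int i = 0; i < ids.length; i++) {
            ids[i] = 10000 + i * 7;
        }
        return ids;
    }

    public static void main(String[] args) {
        System.out.println(averageComparisons(sampleIds())); // prints 500.5
    }
}
```

The order of the records does not change the average: each position from 1 to 1,000 is hit exactly once, so the mean is (1000 + 1)/2.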

One way to speed up the search is to divide the records into several segments, so that instead of searching one large list you search one of several short lists. For our numeric customer IDs, we could create 10 lists: IDs starting with 0 form one list, IDs starting with 1 form another, and so on. To find customer ID 38016, you only need to search the list for IDs starting with 3. With 1,000 records, the average length of each list is 100 (1,000 records divided among 10 lists), so the average number of comparisons needed to find a record drops to about 50 (see Figure 1).

Of course, this method works well only if about one tenth of the customer numbers start with 0, another tenth start with 1, and so on. If 90% of the customer numbers start with 0, that list will contain 900 records, and each search of it will need 450 comparisons on average. Moreover, about 90% of the searches the program performs will be for numbers starting with 0, so the average number of comparisons is far higher than the simple arithmetic above suggests.
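A minimal sketch of the ten-list scheme, assuming IDs are held as int values between 0 and 99999 (so the ID 00907 is the int 907). The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class LeadingDigitBuckets {
    // The list index for a five-digit ID is simply its first digit (0-9).
    static int bucketOf(int customerId) {
        return customerId / 10000;
    }

    // Distributes the IDs over ten lists according to their first digit.
    static List<List<Integer>> build(int[] ids) {
        List<List<Integer>> buckets = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            buckets.add(new ArrayList<>());
        }
        for (int id : ids) {
            buckets.get(bucketOf(id)).add(id);
        }
        return buckets;
    }

    // Only the one list selected by the first digit is scanned.
    static boolean contains(List<List<Integer>> buckets, int id) {
        return buckets.get(bucketOf(id)).contains(id);
    }

    public static void main(String[] args) {
        int[] ids = {38016, 30412, 12345, 907, 99999};
        List<List<Integer>> buckets = build(ids);
        System.out.println(bucketOf(38016));          // prints 3
        System.out.println(contains(buckets, 38016)); // prints true
        System.out.println(contains(buckets, 38017)); // prints false
    }
}
```

If most IDs share a first digit, most records end up in a single list, which is exactly the weakness described above.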

It would be better if we could distribute the records among our lists so that each list holds about the same number of records, regardless of how the digits are distributed in the key values. We need a way to mix up the customer number so that the results are distributed more evenly. For example, we could multiply each digit of the number by a large number (a different one for each digit position), add the results to produce a total, divide the total by 10, and use the remainder as the index value. When reading in a record, the program runs this hash function on the customer number to determine which list the record belongs to. When a query is needed, it runs the same hash function on the customer number used as the "key", so that it searches the correct list. A data structure like this is called a hash table (hashtable).
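The digit-weighting idea can be sketched as follows. The multipliers are arbitrary values chosen for illustration (the text does not specify any), and the class and method names are assumptions.

```java
import java.util.Arrays;

class DigitHash {
    // Hypothetical per-position multipliers; the text only asks for a
    // different large number for each digit position.
    private static final int[] WEIGHTS = {7919, 6151, 3079, 1543, 769};

    // Multiplies each digit by the weight for its position, sums the
    // products, and uses the remainder modulo the list count as the index.
    static int hash(String customerId, int listCount) {
        int sum = 0;
        for (int i = 0; i < customerId.length(); i++) {
            int digit = customerId.charAt(i) - '0';
            sum += digit * WEIGHTS[i % WEIGHTS.length];
        }
        return sum % listCount;
    }

    // Counts how the 1,000 IDs 00000..00999 (all starting with 0)
    // are spread over the lists.
    static int[] distribution(int listCount) {
        int[] counts = new int[listCount];
        for (int n = 0; n < 1000; n++) {
            counts[hash(String.format("%05d", n), listCount)]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(hash("38016", 10)); // prints 2
        // Every one of the ten lists receives exactly 100 of the IDs.
        System.out.println(Arrays.toString(distribution(10)));
    }
}
```

With the first-digit scheme, all 1,000 of these IDs would land in list 0; with these particular weights they are spread evenly. Other weights or other key sets will not necessarily divide this neatly.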

Hash tables in Java
Java contains two classes, java.util.Hashtable and java.util.HashMap, which provide a general-purpose hash table mechanism. The two classes are very similar and generally offer the same public interface, but they do have some important differences, which I will discuss later.

The Hashtable and HashMap objects let you associate a key with a value and use the put() method to enter the key/value pair into the table. You can then call the get() method, passing the key as the parameter, to retrieve the value. As long as two basic requirements are met, the key and value can be any objects. Note that because keys and values must be objects, primitive types must be converted to objects by using methods such as Integer(int).
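A small sketch of put() and get() with a wrapped primitive key. It uses generics and Integer.valueOf(int), which in current Java versions is preferred over the Integer(int) constructor; the class name and the sample names are made up for illustration.

```java
import java.util.Hashtable;

class PutGetDemo {
    // Builds a small table mapping customer IDs to names (sample data).
    static Hashtable<Integer, String> build() {
        Hashtable<Integer, String> names = new Hashtable<>();
        // Explicit conversion of the primitive int to an Integer object.
        names.put(Integer.valueOf(38016), "Ada");
        // Since Java 5, autoboxing performs the same conversion implicitly.
        names.put(12345, "Grace");
        return names;
    }

    public static void main(String[] args) {
        Hashtable<Integer, String> names = build();
        System.out.println(names.get(38016)); // prints Ada
        System.out.println(names.get(55555)); // prints null (no such key)
    }
}
```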

To use objects of a particular class as keys, the class must provide two methods: equals() and hashCode(). Both methods are defined in java.lang.Object, so every class inherits them. However, the implementations in the Object class are generally of little use for this purpose, so you usually need to override both methods yourself.

The equals() method compares its object with another object and returns true if the two objects represent the same information. The method also checks that the two objects belong to the same class. Object.equals() returns true only if the two references refer to the very same object, which is why the inherited method is generally not suitable. In most cases, you need a method that compares the objects field by field, so that different objects representing the same data are considered equal.

The hashCode() method applies a hash function to the object's contents to produce an int value. Hashtable and HashMap use this value to determine which bucket (or list) a key/value pair belongs in.

For example, we can look at the String class, which has its own implementations of these two methods. String.equals() compares two String objects character by character and returns true if the strings are the same:
    String myName = "Einstein";
    // The following test is
    // always true
    if (myName.equals("Einstein"))
    {...

String.hashCode() runs a hash function on a string. The numeric code of each character in the string is multiplied by a power of 31 that depends on the character's position in the string, and the results are added to produce a total. This process may seem complicated, but it ensures a better distribution of values. It also shows how far you can go when developing your own hashCode() method to make the results as distinct as possible.
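The calculation described above can be reproduced by hand. The sketch below follows the formula documented for String.hashCode(), s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1], evaluated with int arithmetic; the class and method names are illustrative.

```java
class StringHashDemo {
    // Evaluates the documented polynomial one character at a time:
    // multiplying the running total by 31 before adding each character
    // gives earlier characters higher powers of 31.
    static int manualHash(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = 31 * h + s.charAt(i);
        }
        return h;
    }

    public static void main(String[] args) {
        String name = "Einstein";
        System.out.println(manualHash(name) == name.hashCode()); // prints true
    }
}
```

Different strings can still share a hash code, which is why the table also calls equals() when it searches a list.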

For example, suppose I want to use a Hashtable to implement a book catalog, with the ISBN of each book as the search key. I can use the String class to carry the key, since it already provides the equals() and hashCode() methods (see Listing 1). We can use put() to add key/value pairs to the Hashtable (see Listing 2).

The put() method accepts two parameters, both of type Object. The first parameter is the key and the second is the value. The put() method calls the key's hashCode() method and divides the result by the number of lists in the table, using the remainder as the index that determines which list the record is added to. Note that keys are unique in the table. If you call put() with a key that already exists, the matching entry is modified so that it refers to the new value, and the old value is returned (when the key does not yet exist in the table, put() returns null).
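The return values of put() and the index calculation can be sketched like this. The indexFor() helper is a simplified illustration of the remainder step, not the library's actual code, and the class name and stored values are placeholders.

```java
import java.util.Hashtable;

class PutReturnDemo {
    // Illustrative index calculation: clear the sign bit so the
    // remainder is never negative, then divide by the list count.
    static int indexFor(Object key, int listCount) {
        return (key.hashCode() & 0x7FFFFFFF) % listCount;
    }

    // Calls put() twice with the same key and returns both results.
    static String[] putTwice() {
        Hashtable<String, String> table = new Hashtable<>();
        String first = table.put("0-345-40946-9", "First value");
        String second = table.put("0-345-40946-9", "Second value");
        return new String[] {first, second};
    }

    public static void main(String[] args) {
        String[] results = putTwice();
        System.out.println(results[0]); // prints null: the key was new
        System.out.println(results[1]); // prints First value: the replaced value
        System.out.println(indexFor("0-345-40946-9", 11)); // an index from 0 to 10
    }
}
```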

To read a value from the table, we pass the search key to the get() method. It returns an Object, which must be cast back to the correct type. For example:
    BookRecord br =
        (BookRecord) isbnTable.get(
            "0-345-40946-9");
    System.out.println(
        "Author: " + br.author
        + " Title: " + br.title);

Another useful method is remove(). It is used in almost the same way as get(): it deletes the entry from the table and returns its value to the caller.
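The following sketch puts put(), get(), and remove() together for the book catalog. The listings referred to in the text are not reproduced here, so the BookRecord class below is an assumption: only its author and title fields appear in the text, and the record contents are placeholders.

```java
import java.util.Hashtable;

class BookTableDemo {
    // Hypothetical record type; only the two fields used in the text.
    static class BookRecord {
        final String author;
        final String title;

        BookRecord(String author, String title) {
            this.author = author;
            this.title = title;
        }
    }

    // Builds a catalog keyed by ISBN with one placeholder entry.
    static Hashtable<String, BookRecord> buildTable() {
        Hashtable<String, BookRecord> isbnTable = new Hashtable<>();
        isbnTable.put("0-345-40946-9",
                new BookRecord("Example Author", "Example Title"));
        return isbnTable;
    }

    public static void main(String[] args) {
        Hashtable<String, BookRecord> isbnTable = buildTable();

        // With generics, get() needs no cast.
        BookRecord br = isbnTable.get("0-345-40946-9");
        System.out.println("Author: " + br.author + " Title: " + br.title);

        // remove() deletes the entry and hands back the stored value.
        BookRecord removed = isbnTable.remove("0-345-40946-9");
        System.out.println(removed == br);                    // prints true
        System.out.println(isbnTable.get("0-345-40946-9"));   // prints null
    }
}
```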

Your own classes
If you want to use a primitive type as a key, you must create an object of the corresponding wrapper type. For example, to use an integer key, you should use the constructor Integer(int) to create an object from the integer. All the wrapper classes, such as Integer, Float, and Boolean, treat the primitive values as objects, and they override the equals() and hashCode() methods, so they can be used as keys. Many other classes provided in the JDK do the same (even the Hashtable and HashMap classes implement their own equals() and hashCode() methods), but before you use objects of any class as Hashtable keys, check the documentation. It is also worth checking the class's source to see how equals() and hashCode() are implemented. For example, Byte, Character, Short, and Integer return the integer value they represent as the hash code. This may or may not suit your needs.

Using hash tables in Java

If you want to create a Hashtable that uses objects of a class you define as keys, you should make sure that the equals() and hashCode() methods of that class provide useful values. First, check the class you extend to determine whether its implementations meet your needs. If not, you should override the methods.

The basic design constraint for any equals() method is that it should return true if the object passed to it belongs to the same class and its data fields hold the same values. You should also make sure that your code returns false if a null argument is passed to the method:
    public boolean equals(Object o)
    {
        if ((o == null)
            || !(o instanceof MyClass))
        {
            return false;
        }

        // Now compare data fields...

In addition, a few rules should be kept in mind when designing a hashCode() method. First, the method must return the same value for a particular object no matter how many times it is called (as long as the object's contents do not change between calls, of course; changing the contents should be avoided while an object is in use as a Hashtable key). Second, if two objects are equal according to your equals() method, they must also produce the same hash code. The third point is more of a guideline than a rule: you should try to design the method so that it produces different results for different object contents. It does not matter if different objects occasionally produce the same hash code. However, if the method can only return values from 1 to 10, only 10 lists can ever be used, regardless of the number of lists in the Hashtable.
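One way to satisfy these rules is sketched below. CustomerKey and its two fields are hypothetical; the point is only that equals() and hashCode() use the same fields and that the key is immutable.

```java
import java.util.Hashtable;
import java.util.Objects;

class CustomerKey {
    // Immutable fields, so the hash code cannot change while the
    // object is in use as a key (hypothetical example fields).
    private final int region;
    private final String code;

    CustomerKey(int region, String code) {
        this.region = region;
        this.code = Objects.requireNonNull(code);
    }

    @Override
    public boolean equals(Object o) {
        if (o == this) {
            return true;
        }
        // instanceof is false for null, so null arguments return false.
        if (!(o instanceof CustomerKey)) {
            return false;
        }
        CustomerKey other = (CustomerKey) o;
        return region == other.region && code.equals(other.code);
    }

    @Override
    public int hashCode() {
        // Built from exactly the fields that equals() compares,
        // so equal objects always produce equal hash codes.
        return 31 * region + code.hashCode();
    }

    public static void main(String[] args) {
        Hashtable<CustomerKey, String> table = new Hashtable<>();
        table.put(new CustomerKey(1, "A7"), "stored value");
        // A different object with the same contents finds the entry.
        System.out.println(table.get(new CustomerKey(1, "A7"))); // prints stored value
    }
}
```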

Another factor to keep in mind when designing equals() and hashCode() is performance. Every call to put() or get() involves a call to hashCode() to find the correct list. When get() scans the list to find the key, it calls equals() on each element in the list. Implement these methods so that they run as quickly and efficiently as possible, especially if you plan to make your class publicly available, because other users may want to use it in applications where execution speed is critical.

Hashtable Performance
The main factor affecting Hashtable efficiency is the average length of the lists in the table, because the average search time is directly related to that average length. Clearly, to reduce the average length you must increase the number of lists in the Hashtable. You get the best search efficiency when the number of lists is so large that most or all of the lists contain only one record. However, this may be going too far. If your Hashtable has many more lists than data entries, you are spending memory that you do not need to spend, and in some cases that is not acceptable.
In our earlier example, we knew in advance that we had 1,000 records. Knowing this, we can decide how many lists our Hashtable should contain to reach the best compromise between search speed and memory usage. In many cases, however, you do not know in advance how many records you will have to process; the file the data is read from may keep growing, or the number of records may vary greatly over the course of the day.

The Hashtable and HashMap classes solve this problem by dynamically expanding the table as the number of entries grows. Both classes have constructors that accept an initial number of lists for the table and a load factor:
    public Hashtable(
        int initialCapacity,
        float loadFactor)

    public HashMap(
        int initialCapacity,
        float loadFactor)

These two numbers are multiplied to compute a threshold. Each time a new entry is added to the hash table, the count is updated. When the count exceeds the threshold, the table is rehashed. (The number of lists is increased to twice the previous number plus 1, and all entries are moved to their correct lists.) The default Hashtable constructor sets the initial capacity to 11 and the load factor to 0.75, so the threshold is 8. When the ninth record is added to the table, the hash table is resized to have 23 lists, and the new threshold is 17 (the integer part of 23*0.75). As you can see, the load factor is an upper bound on the average number of entries per list, which means that, by default, very few lists contain more than one record. Compare this with our original example, in which we had 1,000 records distributed among 10 lists. With the default values, the table would be expanded to contain more than 1,500 lists. But you can control this. If the number of lists multiplied by the load factor is greater than the number of entries you process, the table will never be rehashed. So we can do something like the following:
    // Table will not rehash until it
    // has 1,100 entries (10*110):
    Hashtable myHashtable =
        new Hashtable(10, 110.0F);

You probably do not want to do this unless you want to save the memory taken by empty lists and do not mind the additional search time, which may be the case in an embedded system. However, the approach can be useful because rehashing takes a lot of computing time, and this guarantees that it will never happen.

Note that although calling put() can make the table grow (the number of lists is increased), calling remove() does not have the opposite effect. So if you have a large table and delete most of its entries, you will be left with a large but mostly empty table.
Hashtable and HashMap
There are three important differences between the Hashtable and HashMap classes. The first is mainly historical: Hashtable is based on the old Dictionary class, while HashMap is an implementation of the Map interface introduced in Java 1.2.

Perhaps the most important difference is that the methods of Hashtable are synchronized, while those of HashMap are not. This means that although you can use a Hashtable in a multithreaded application without taking any special action, you must provide external synchronization for a HashMap. A convenient way to do this is to use the static synchronizedMap() method of the Collections class, which creates a thread-safe Map object and returns it as a wrapper. The methods of this wrapper object give you synchronized access to the underlying HashMap. The downside of Hashtable is that you cannot turn off its synchronization when you do not need it (for example, in a single-threaded application), and synchronization adds a lot of processing overhead.
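A minimal sketch of the wrapper approach, assuming several threads that each insert their own range of keys. The class name, thread count, and key ranges are illustrative.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

class SynchronizedMapDemo {
    // Starts the given number of threads, each adding perThread distinct
    // keys to one shared map, and returns the final size of the map.
    static int countConcurrently(int threads, int perThread) {
        // The wrapper synchronizes every call on the underlying HashMap.
        Map<Integer, Integer> map = Collections.synchronizedMap(new HashMap<>());
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            final int base = t * perThread;
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    map.put(base + i, i);
                }
            });
            workers[t].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return map.size();
    }

    public static void main(String[] args) {
        System.out.println(countConcurrently(4, 1000)); // prints 4000
    }
}
```

Individual calls are synchronized by the wrapper, but iterating over the map still requires the caller to synchronize on the wrapper object manually.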

The third difference is that only HashMap lets you use null as the key or the value of a table entry. Only one entry in a HashMap can have a null key, but any number of entries can have null values. This means that get() returns null both when the search key is not found in the table and when the key is found but its value is null. If necessary, use the containsKey() method to tell the two cases apart.
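The two cases can be told apart as in the sketch below, which also shows that Hashtable rejects null values. The class name, method names, and sample keys are made up for illustration.

```java
import java.util.HashMap;
import java.util.Hashtable;
import java.util.Map;

class NullHandlingDemo {
    // get() alone cannot distinguish a missing key from a null value,
    // so containsKey() is checked first.
    static String describe(Map<String, String> map, String key) {
        if (map.containsKey(key)) {
            return map.get(key) == null ? "present, null value" : "present";
        }
        return "absent";
    }

    // Hashtable.put() throws NullPointerException for a null key or value.
    static boolean hashtableRejectsNull() {
        try {
            new Hashtable<String, String>().put("key", null);
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        Map<String, String> map = new HashMap<>();
        map.put(null, "value for the null key");
        map.put("nickname", null);

        System.out.println(describe(map, "nickname")); // prints present, null value
        System.out.println(describe(map, "missing"));  // prints absent
        System.out.println(describe(map, null));       // prints present
        System.out.println(hashtableRejectsNull());    // prints true
    }
}
```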

The general recommendation is to use Hashtable when synchronization is required and HashMap otherwise. However, because HashMap can be synchronized when necessary, has more features than Hashtable, and is not based on an old class, some people think that HashMap is preferable to Hashtable in all cases.

About Properties
Sometimes you may want to use a Hashtable to map key strings to value strings. The environment strings in DOS, Windows, and Unix are examples: the key string PATH, for instance, is mapped to the value string C:\WINDOWS;C:\WINDOWS\SYSTEM. A Hashtable is a simple way to represent these, but Java provides another way.

The java.util.Properties class is a subclass of Hashtable designed for string keys and values. A Properties object is used much like a Hashtable, but the class adds two time-saving methods that you should know about.

The store() method saves the contents of a Properties object to a file in a readable form. The load() method does the opposite: it reads the file and sets up the Properties object to contain the keys and values.

Note that because Properties extends Hashtable, you can use the put() method of the superclass to add keys and values that are not String objects. This is not advisable. Moreover, if you call store() on a Properties object that contains non-String objects, store() will fail. Instead of put() and get(), you should use setProperty() and getProperty(), which take String parameters.
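A short round-trip sketch using setProperty(), store(), load(), and getProperty(). To stay self-contained it writes to an in-memory StringWriter rather than a file; the class name and the comment string passed to store() are illustrative choices.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.util.Properties;

class PropertiesDemo {
    // Stores the properties as text and loads them into a new object.
    static Properties roundTrip(Properties original) {
        try {
            StringWriter out = new StringWriter();
            original.store(out, "demo settings");

            Properties copy = new Properties();
            copy.load(new StringReader(out.toString()));
            return copy;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Properties env = new Properties();
        // setProperty() accepts only String keys and values.
        env.setProperty("PATH", "C:\\WINDOWS;C:\\WINDOWS\\SYSTEM");

        Properties copy = roundTrip(env);
        System.out.println(copy.getProperty("PATH"));
        System.out.println(copy.getProperty("TEMP")); // prints null (not set)
    }
}
```

To persist the settings on disk, the same store() and load() calls can be given a file-based stream, writer, or reader instead.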

I hope you now know how to use hash tables to speed up your processing.

