Java HashMap and Hashtable

Source: Internet
Author: User
Tags key string rehash

Reprinted from:

Http://zztu.javaeye.com/blog/173964

This blog is reproduced from:

Http://user.qzone.qq.com/33658071/blog/1193889677


Hashtables provide a useful way to optimize application performance.
Hashtables are not a new concept in computing. They were developed to speed up processing on machines that were very slow by today's standards, and they let you quickly find one particular entry among many data entries. Although modern machines are thousands of times faster, hashtables are still a useful way to get the best performance out of an application.
Imagine that you have a data file containing about one thousand records, for example the customer records of a small business, and a program that reads the records into memory for processing. Each record contains a unique five-digit customer ID number, the customer name, address, and account balance. Assume the records are not sorted by customer ID. If the program wants to use the customer number as the "key" to find a particular customer record, the only way is to examine each record in turn. Sometimes it will find the record quickly; sometimes it will have to go almost to the last record before finding the one it needs. To find a record among 1,000, the program has to check (1,000 + 1)/2 = 500.5 records on average. If you search the data often, you need a faster way to find a record.
One way to speed up the search is to divide the records into several segments, so that instead of searching one large list you search one of several short lists. For our numeric customer IDs, you could create 10 lists: IDs starting with 0 form one list, IDs starting with 1 form another, and so on. To find customer ID 38016, you only need to search the list of IDs starting with 3. With 1,000 records, the average length of each list is 100 (1,000 records divided among 10 lists), so the average number of comparisons needed to find a record drops to about 50 (see figure 1).
Of course, this method works well only if about one tenth of the customer numbers start with 0, another tenth start with 1, and so on. If 90% of the customer numbers start with 0, that list will hold 900 records and each search of it will need 450 comparisons on average. Moreover, 90% of the searches the program performs will be for numbers starting with 0, so the average number of comparisons is far higher than the simple arithmetic above suggests.
It would be better if we could distribute the records so that each list holds about the same number of records, regardless of how the digits in the key values are distributed. We need a way to scramble the customer number so that the results are spread more evenly. For example, we could multiply each digit of the number by a large number (a different one for each digit position), add the results to produce a total, divide the total by 10, and use the remainder as the index value. A calculation like this is called a hash function. When reading a record, the program runs the hash function on the customer number to determine which list the record belongs in. When you need to look a record up, you run the same hash function on the customer number used as the "key", so that you search the correct list. A data structure like this is called a hash table (hashtable).
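The digit-weighting idea described above can be sketched in Java as follows. This is only an illustration under stated assumptions: the class name CustomerBuckets, the method bucketFor, and the multipliers in WEIGHTS are all hypothetical (the article does not give concrete numbers), and the IDs are assumed to contain digits only.

```java
import java.util.ArrayList;
import java.util.List;

class CustomerBuckets {
    // Number of lists, as in the article's example.
    static final int LIST_COUNT = 10;

    // Arbitrary large multipliers, one per digit position (an assumption;
    // the article does not say which numbers to use).
    static final int[] WEIGHTS = {7919, 104729, 1299709, 15485863, 32452843};

    // Hash function: weight each digit by its position, sum the products,
    // and use the remainder after dividing by the number of lists.
    static int bucketFor(String customerId) {
        long total = 0;
        for (int i = 0; i < customerId.length(); i++) {
            int digit = customerId.charAt(i) - '0'; // assumes digits only
            total += (long) digit * WEIGHTS[i % WEIGHTS.length];
        }
        return (int) (total % LIST_COUNT);
    }

    public static void main(String[] args) {
        List<List<String>> lists = new ArrayList<List<String>>();
        for (int i = 0; i < LIST_COUNT; i++) {
            lists.add(new ArrayList<String>());
        }
        // Storing: the hash of the ID decides which list gets the record.
        String[] ids = {"38016", "00042", "00043", "00917"};
        for (String id : ids) {
            lists.get(bucketFor(id)).add(id);
        }
        // Searching: the same hash points at the only list worth scanning.
        String wanted = "38016";
        System.out.println(lists.get(bucketFor(wanted)).contains(wanted));
    }
}
```

The same function is used for storing and for searching, which is what guarantees that a lookup scans the list the record was placed in.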
Hashtables in Java
Java contains two classes, java.util.Hashtable and java.util.HashMap, that provide a general-purpose hashtable mechanism. The two classes are very similar and generally offer the same public interface, but they do have some important differences, which I will discuss later.
Hashtable and HashMap objects let you associate a key with a value and enter the key/value pair into the table using the put() method. You can then retrieve the value by calling the get() method with the key as the parameter. The key and value can be any objects, as long as two basic requirements are met. Note that because keys and values must be objects, primitive types must be converted to objects using methods such as Integer(int).
To use objects of a particular class as keys, the class must provide two methods: equals() and hashCode(). Both methods are defined in java.lang.Object, so every class inherits them. However, the implementations in the Object class are rarely what you need, so you usually have to override the two methods yourself.
The equals() method compares its object with another object and returns true if the two objects represent the same information. The method should also check that the two objects belong to the same class. Object.equals() returns true only if the two references refer to the very same object, which explains why it is generally unsuitable. In most cases you need a field-by-field comparison, so that different objects representing the same data are considered equal.
The hashCode() method runs a hash function on the contents of the object to produce an int value. Hashtable and HashMap use this value to determine which bucket (or list) the key/value pair belongs in.
As an example, we can look at the String class, which has its own implementations of both methods. String.equals() compares two String objects character by character and returns true if the strings are the same:
String myName = "Einstein";
// The following test is
// always true
if (myName.equals("Einstein"))
{ ...
String.hashCode() runs a hash function on the string. The numeric code of each character is multiplied by a power of 31, with the power depending on the character's position in the string, and the results are added to give a total. This process may seem complicated, but it ensures a better distribution of values. It also shows how far you can go when developing your own hashCode() method to make identical results for different keys unlikely (although no hash function can guarantee unique results).
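The calculation can be reproduced by hand. The sketch below recomputes the hash with the formula documented for String.hashCode(), s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1], and compares it with the library result; the class name StringHashDemo and the method manualHash are hypothetical names used only for this illustration.

```java
class StringHashDemo {
    // Evaluates s[0]*31^(n-1) + ... + s[n-1] using Horner's rule.
    // int overflow is intentional and matches String.hashCode().
    static int manualHash(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = 31 * h + s.charAt(i);
        }
        return h;
    }

    public static void main(String[] args) {
        String myName = "Einstein";
        System.out.println(manualHash(myName) == myName.hashCode()); // prints true
    }
}
```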
For example, suppose I want to use a Hashtable to implement a book catalog, using each book's ISBN as the search key. I can use the String class to hold the ISBN, since it already provides the equals() and hashCode() methods (see list 1). We can then use the put() method to add key/value pairs to the Hashtable (see list 2).
The put() method accepts two parameters, both of type Object. The first is the key and the second is the value. put() calls the key's hashCode() method and divides the result by the number of lists in the table, using the remainder as the index that determines which list the record is added to. Note that keys are unique within the table: if you call put() with a key that already exists, the matching entry is modified to refer to the new value, and the old value is returned (when the key does not yet exist in the table, put() returns null).
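The return value of put() is easy to observe in a small program. The sketch below is an assumption-laden illustration rather than the article's own listing: the BookRecord class with author and title fields is reconstructed from the names used in the text, the author and title strings are placeholders, and generics are used instead of the raw types and casts of the article's era.

```java
import java.util.Hashtable;

class IsbnTableDemo {
    // Hypothetical value class; only the field names come from the article.
    static class BookRecord {
        final String author;
        final String title;

        BookRecord(String author, String title) {
            this.author = author;
            this.title = title;
        }
    }

    // Builds a table with one entry keyed by ISBN.
    static Hashtable<String, BookRecord> buildTable() {
        Hashtable<String, BookRecord> isbnTable = new Hashtable<String, BookRecord>();
        isbnTable.put("0-345-40946-9", new BookRecord("Example Author", "Example Title"));
        return isbnTable;
    }

    public static void main(String[] args) {
        Hashtable<String, BookRecord> isbnTable = buildTable();

        // A new key: put() returns null because nothing was replaced.
        BookRecord previous = isbnTable.put(
            "0-000-00000-0", new BookRecord("Another Author", "Another Title"));
        System.out.println(previous == null); // prints true

        // An existing key: the entry is updated and the old value comes back.
        BookRecord replaced = isbnTable.put(
            "0-345-40946-9", new BookRecord("Example Author", "Revised Title"));
        System.out.println(replaced.title); // prints Example Title
        System.out.println(isbnTable.size()); // prints 2
    }
}
```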
To read a value from the table, we pass the search key to the get() method. It returns an Object, which must be cast to the correct type:
BookRecord br =
    (BookRecord) isbnTable.get(
        "0-345-40946-9");
System.out.println(
    "Author: " + br.author
    + " Title: " + br.title);
Another useful method is remove(). It is used almost exactly like get(): it deletes the entry from the table and returns its value to the caller.
Your own class
If you want to use a primitive type as a key, you must create an object of the corresponding wrapper type. For example, to use an integer key, use the constructor Integer(int) to create an object from the int (in Java 5 and later, autoboxing or Integer.valueOf(int) does this for you). All the wrapper classes, such as Integer, Float, and Boolean, treat the primitive value as an object, and they override equals() and hashCode(), so they can be used as keys. Many other classes in the JDK do the same (even Hashtable and HashMap implement their own equals() and hashCode() methods), but you should check the documentation before using objects of any class as Hashtable keys. It is also worth checking the class's source to see how equals() and hashCode() are implemented. For example, Byte, Character, Short, and Integer all return the integer value they represent as the hash code. This may or may not suit your needs.
Using Hashtables in Java
If you want to create a Hashtable that uses objects of a class you define as keys, you should make sure that the class's equals() and hashCode() methods return useful values. First, look at the class you are extending to determine whether its implementations meet your needs. If not, you should override the methods.
The basic design constraint for any equals() method is that it must return true if the object passed to it belongs to the same class and its data fields hold values representing the same data. You should also make sure that your code returns false if a null argument is passed to the method:
public boolean equals(Object o)
{
    if ((o == null)
        || !(o instanceof MyClass))
    {
        return false;
    }
    // Now compare data fields...
In addition, there are some rules to keep in mind when designing a hashCode() method. First, the method must return the same value for a particular object no matter how many times it is called (as long as the object's contents do not change between calls, of course, and changing them should be avoided while the object is in use as a Hashtable key). Second, if two objects are equal according to your equals() method, they must also produce the same hash code. The third rule is more of a guideline than a requirement: try to design the method so that it produces different results for different object contents. It does not matter if different objects occasionally produce the same hash code. However, if the method can only return values from 1 to 10, only 10 lists will ever be used, no matter how many lists the Hashtable has.
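A key class that follows these rules might look like the sketch below. It is one possible design, not a prescribed one: the class CustomerKey and its id and region fields are hypothetical, the fields are final so that the hash code cannot change while the object is used as a key, and java.util.Objects (available since Java 7) is used to combine the fields.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

class CustomerKeyDemo {
    // Hypothetical immutable key class with two data fields.
    static final class CustomerKey {
        private final int id;
        private final String region;

        CustomerKey(int id, String region) {
            this.id = id;
            this.region = region;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) {
                return true;
            }
            // instanceof is false for null, so null arguments return false.
            if (!(o instanceof CustomerKey)) {
                return false;
            }
            CustomerKey other = (CustomerKey) o;
            // Field-by-field comparison.
            return id == other.id && Objects.equals(region, other.region);
        }

        @Override
        public int hashCode() {
            // Built from the same fields as equals(), so equal objects
            // always produce equal hash codes.
            return Objects.hash(id, region);
        }
    }

    public static void main(String[] args) {
        Map<CustomerKey, String> names = new HashMap<CustomerKey, String>();
        names.put(new CustomerKey(38016, "north"), "Example Customer");
        // A different object with the same contents finds the entry.
        System.out.println(names.get(new CustomerKey(38016, "north")));
    }
}
```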
Another factor to keep in mind when designing equals() and hashCode() is performance. Every call to put() or get() involves a call to hashCode() to find the correct list, and when get() scans the list to find the key, it calls equals() on each element in the list. Implement these methods to run as quickly and efficiently as possible, especially if you plan to make your class publicly available, because other users may want to use it in high-performance applications where execution speed matters.
Hashtable Performance
The main factor affecting Hashtable performance is the average length of the lists in the table, because the average search time is directly related to that length. Obviously, to reduce the average length you must increase the number of lists in the Hashtable. You get the best search efficiency when the number of lists is so large that most or all of them contain only one record. However, this may be going too far. If the Hashtable has many more lists than data entries, you are spending memory that you do not need to spend, and in some cases that is unacceptable.
In our earlier example, we knew in advance how many records we had: 1,000. Knowing this, we could decide how many lists the Hashtable should contain to reach the best compromise between search speed and memory efficiency. In many cases, however, you do not know in advance how many records you will have to process; the file the data is read from may keep growing, or the number of records may vary greatly over the course of a day.
The Hashtable and HashMap classes deal with this problem by expanding the table dynamically as the number of entries grows. Both classes have constructors that accept the initial number of lists in the table and a load factor:
public Hashtable(
    int initialCapacity,
    float loadFactor)
public HashMap(
    int initialCapacity,
    float loadFactor)
The two numbers are multiplied to calculate a threshold. Each time a new entry is added to a hash table, a count is updated, and when the count exceeds the threshold, the table is rehashed. (The number of lists is increased to twice the previous number plus one, and all entries are moved to their correct lists.) The default constructor sets the initial capacity to 11 and the load factor to 0.75, so the threshold is 8. When the ninth record is added, the hash table is resized to 23 lists, and the new threshold is 17 (the integer part of 23*0.75). These figures describe Hashtable; HashMap in current JDKs instead starts with 16 lists and doubles the table when it grows. As you can see, the load factor is an upper limit on the average number of entries per list, which means that by default very few lists hold more than one record. Compare this with our original example, in which 1,000 records were distributed among 10 lists. With the defaults, the table would be expanded to more than 1,500 lists. But you can control this. If the number of lists multiplied by the load factor is greater than the number of entries you will process, the table will never be rehashed, so we could do the following:
// Table will not rehash until it
// has 1,100 entries (10*110):
Hashtable myHashtable =
    new Hashtable(10, 110.0f);
You probably do not want to do this unless you need to save the memory that empty lists would occupy and do not mind the extra search time. This might be the case in an embedded system. However, the approach can be useful because rehashing takes a lot of computing time, and this guarantees that it never happens.
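The growth arithmetic above can be checked with a few lines of code. The sketch below only models the rule described in the text (threshold is the integer part of capacity times load factor, and a rehash grows the table to twice its size plus one); it does not inspect a real Hashtable, and capacityNeeded is a hypothetical helper name.

```java
class RehashArithmetic {
    // Returns the number of lists the modeled table ends up with after
    // 'entries' insertions, starting from the given capacity and load factor.
    static int capacityNeeded(int initialCapacity, float loadFactor, int entries) {
        int capacity = initialCapacity;
        // A rehash is triggered while the threshold is below the entry count.
        while ((int) (capacity * loadFactor) < entries) {
            capacity = capacity * 2 + 1;
        }
        return capacity;
    }

    public static void main(String[] args) {
        // Defaults from the text: 11 lists, load factor 0.75.
        System.out.println(capacityNeeded(11, 0.75f, 9));      // prints 23
        System.out.println(capacityNeeded(11, 0.75f, 1000));   // prints 1535
        // The 10-list table with load factor 110 holds 1,100 entries as is.
        System.out.println(capacityNeeded(10, 110.0f, 1100));  // prints 10
    }
}
```

With the defaults the modeled table passes through 11, 23, 47, 95, 191, 383, 767, and 1,535 lists on the way to 1,000 entries, which matches the "more than 1,500 lists" figure in the text.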
Note: Although calling put() can cause the table to grow (the number of lists increases), calling remove() does not have the opposite effect. So if you have a large table and delete most of its entries, you are left with a large but mostly empty table.
Hashtable and HashMap
There are three important differences between the Hashtable and HashMap classes. The first is mainly historical: Hashtable is based on the obsolete Dictionary class, while HashMap is an implementation of the Map interface introduced in Java 1.2.
Perhaps the most important difference is that Hashtable's methods are synchronized, while HashMap's are not. This means that although you can use a Hashtable in a multi-threaded application without taking any special action, you must provide external synchronization for a HashMap. A convenient way to do this is the static synchronizedMap() method of the Collections class, which creates a thread-safe Map object and returns it as a wrapper. The wrapper's methods give you synchronized access to the underlying HashMap. The drawback of Hashtable is that you cannot turn synchronization off when you do not need it (for example, in a single-threaded application), and synchronization adds considerable processing overhead.
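A minimal sketch of the wrapper in use is shown below. The class name SynchronizedMapDemo and the helper fillConcurrently are hypothetical; the only library call being illustrated is Collections.synchronizedMap(). Each thread writes its own range of keys, so the final size is predictable when the map is properly synchronized.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

class SynchronizedMapDemo {
    // Starts 'threads' workers that each put 'perThread' distinct keys,
    // waits for them, and returns the resulting map size.
    static int fillConcurrently(final Map<Integer, Integer> map,
                                int threads, final int perThread) {
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            final int base = t * perThread;
            workers[t] = new Thread(new Runnable() {
                public void run() {
                    for (int i = 0; i < perThread; i++) {
                        map.put(base + i, i);
                    }
                }
            });
            workers[t].start();
        }
        try {
            for (Thread worker : workers) {
                worker.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
        return map.size();
    }

    public static void main(String[] args) {
        // Wrap a plain HashMap; all access must go through the wrapper.
        Map<Integer, Integer> safe =
            Collections.synchronizedMap(new HashMap<Integer, Integer>());
        System.out.println(fillConcurrently(safe, 4, 1000)); // prints 4000
    }
}
```

Note that the wrapper synchronizes individual method calls only; the Collections documentation requires you to synchronize on the wrapper yourself while iterating over any of its views.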
The third difference is that only HashMap lets you use null as the key or the value of a table entry. Only one entry in a HashMap can have a null key, but any number of entries can have null values. This means that get() returns null both when the search key is not found in the table and when the key is found but its value is null. If necessary, use the containsKey() method to distinguish the two cases.
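The ambiguity of a null result from get(), and the way containsKey() resolves it, can be sketched as follows. The class NullKeyDemo and the helper describe are hypothetical names for this illustration.

```java
import java.util.HashMap;
import java.util.Hashtable;
import java.util.Map;

class NullKeyDemo {
    // Distinguishes "key absent" from "key present with a null value".
    static String describe(Map<String, String> map, String key) {
        if (map.containsKey(key)) {
            return map.get(key) == null ? "present with null value" : "present";
        }
        return "absent";
    }

    public static void main(String[] args) {
        Map<String, String> map = new HashMap<String, String>();
        map.put(null, "value stored under the null key");
        map.put("empty", null);

        System.out.println(describe(map, "empty"));   // prints present with null value
        System.out.println(describe(map, "missing")); // prints absent
        System.out.println(describe(map, null));      // prints present

        // Hashtable rejects null keys and null values.
        try {
            new Hashtable<String, String>().put("key", null);
        } catch (NullPointerException e) {
            System.out.println("Hashtable does not accept null");
        }
    }
}
```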
Some sources suggest using Hashtable when synchronization is required and HashMap otherwise. However, since a HashMap can be synchronized when necessary, has more features than Hashtable, and is not based on an obsolete class, some people think HashMap should be preferred over Hashtable in all cases.
About Properties
Sometimes you may want to use a Hashtable to map key strings to value strings. The environment strings in DOS, Windows, and Unix are examples: the key string PATH is mapped to the value string C:/windows;C:/Windows/system, for instance. Hashtables are a simple way to represent these, but Java provides another way.
The java.util.Properties class is a subclass of Hashtable designed for String keys and values. A Properties object is used much like a Hashtable, but the class adds two time-saving methods that you should know about.
The store() method saves the contents of a Properties object to a file in a readable form. The load() method does the opposite: it reads the file and sets up the Properties object to contain the keys and values.
Note: Because Properties extends Hashtable, you can use the superclass's put() method to add keys and values that are not String objects. This is not advisable. In addition, if you call store() on a Properties object that contains non-String objects, store() will fail. Instead of put() and get(), you should use setProperty() and getProperty(), which take String parameters.
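A round trip through store() and load() can be sketched as below. To keep the example self-contained it writes to an in-memory StringWriter instead of a file, using the Writer and Reader overloads available since Java 6; the class PropertiesDemo and the helper roundTrip are hypothetical names.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.util.Properties;

class PropertiesDemo {
    // Stores the properties as text, then loads the text into a new object.
    static Properties roundTrip(Properties original) {
        try {
            StringWriter out = new StringWriter();
            original.store(out, "example settings"); // second argument is a comment line
            Properties copy = new Properties();
            copy.load(new StringReader(out.toString()));
            return copy;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        // setProperty() only accepts Strings, unlike the inherited put().
        props.setProperty("PATH", "C:/windows;C:/Windows/system");

        Properties copy = roundTrip(props);
        System.out.println(copy.getProperty("PATH"));
        // getProperty() can supply a default for keys that are not present.
        System.out.println(copy.getProperty("HOME", "not set")); // prints not set
    }
}
```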
I hope you now know how to use hashtables to speed up your processing.
The Map interface does not extend Collection, so it has no add() method.
PS:
HashMap hashes its keys.
The put() method adds an element, and the get(key) method retrieves it.
keySet() returns a view of the keys in the map. The return type is Set, and it can be traversed with an iterator.
values() returns a view of the values in the map. The return type is Collection, and it can be traversed with an iterator.
entrySet() returns a view of the mappings in the map. The return type is Set, and it can be traversed with an iterator.
Note that each element of this set is of type Map.Entry, and each element has getKey() and getValue() methods to obtain the key and the value respectively.
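The three views can be traversed as in the sketch below. The class MapViewsDemo and the helper sumValues are hypothetical names; the iteration order of a HashMap is unspecified, so the printed order may differ between runs and JDK versions.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

class MapViewsDemo {
    // Adds up all values by walking the entry view.
    static int sumValues(Map<String, Integer> map) {
        int total = 0;
        for (Map.Entry<String, Integer> entry : map.entrySet()) {
            total += entry.getValue();
        }
        return total;
    }

    public static void main(String[] args) {
        Map<String, Integer> stock = new HashMap<String, Integer>();
        stock.put("apples", 3);
        stock.put("pears", 5);

        // keySet(): explicit iterator over the keys.
        Iterator<String> keys = stock.keySet().iterator();
        while (keys.hasNext()) {
            System.out.println("key: " + keys.next());
        }

        // values(): a Collection of the values.
        for (Integer count : stock.values()) {
            System.out.println("value: " + count);
        }

        // entrySet(): each element is a Map.Entry with getKey()/getValue().
        for (Map.Entry<String, Integer> entry : stock.entrySet()) {
            System.out.println(entry.getKey() + " -> " + entry.getValue());
        }

        System.out.println(sumValues(stock)); // prints 8
    }
}
```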
2. TreeMap
TreeMap sorts its entries by key.
Comparison between HashMap and TreeMap
As with the Set classes, HashMap is generally faster than TreeMap, so use TreeMap only when sorted order is needed.
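The difference in ordering can be seen by copying a HashMap into a TreeMap. In the sketch below, SortedKeysDemo and sortedKeys are hypothetical names; String keys are sorted by their natural (lexicographic) order.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class SortedKeysDemo {
    // Copies the map into a TreeMap and returns the keys in sorted order.
    static List<String> sortedKeys(Map<String, Integer> unsorted) {
        TreeMap<String, Integer> sorted = new TreeMap<String, Integer>(unsorted);
        return new ArrayList<String>(sorted.keySet());
    }

    public static void main(String[] args) {
        Map<String, Integer> prices = new HashMap<String, Integer>();
        prices.put("pear", 5);
        prices.put("apple", 3);
        prices.put("fig", 9);
        // HashMap makes no promise about order; TreeMap iterates by key.
        System.out.println(sortedKeys(prices)); // prints [apple, fig, pear]
    }
}
```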
1. Vector: use ArrayList instead of Vector.
2. Hashtable: use HashMap instead of Hashtable, for the same reason.
3. Stack: use LinkedList instead of Stack, mainly because Stack inherits methods such as elementAt(index) from Vector, which breaks the stack abstraction. A LinkedList can be used to simulate both a stack and a queue.
For more on the reasons, see the excerpt below from tutorial1.5.
All the methods in Vector are synchronized, which slows it down. In a multi-threaded environment, wrap an ArrayList with Collections.synchronizedList(list).
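The replacements listed above can be sketched as follows. The class ListReplacementsDemo and its two helper methods are hypothetical names; the stack and queue behavior comes from LinkedList's addFirst(), addLast(), and removeFirst() methods.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedList;
import java.util.List;

class ListReplacementsDemo {
    // Stack behavior: push and pop at the same end (last in, first out).
    static String stackOrder() {
        LinkedList<String> stack = new LinkedList<String>();
        stack.addFirst("a");
        stack.addFirst("b");
        stack.addFirst("c");
        StringBuilder popped = new StringBuilder();
        while (!stack.isEmpty()) {
            popped.append(stack.removeFirst());
        }
        return popped.toString();
    }

    // Queue behavior: add at the tail, remove from the head (first in, first out).
    static String queueOrder() {
        LinkedList<String> queue = new LinkedList<String>();
        queue.addLast("a");
        queue.addLast("b");
        queue.addLast("c");
        StringBuilder served = new StringBuilder();
        while (!queue.isEmpty()) {
            served.append(queue.removeFirst());
        }
        return served.toString();
    }

    public static void main(String[] args) {
        System.out.println(stackOrder()); // prints cba
        System.out.println(queueOrder()); // prints abc

        // Instead of Vector: an ArrayList wrapped for multi-threaded use.
        List<String> shared = Collections.synchronizedList(new ArrayList<String>());
        shared.add("safe to add from several threads");
        System.out.println(shared.size()); // prints 1
    }
}
```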
1. Hash
2. Hash table. The basic idea of the hash algorithm is:
Take the key of a node as the independent variable, compute the corresponding function value through some functional relationship (the hash function), and use that value as the address at which the node is stored in the hash table.
8. When a hash table becomes too full, it must be rehashed: a new hash table is created, all elements are moved into it, and the original table is discarded. In Java, the load factor determines when a hash table is rehashed. For example, with a load factor of 0.75, the table is rehashed once the number of elements reaches 75% of the number of positions.
9. The higher the load factor (the closer to 1.0), the more efficiently memory is used, but the longer it takes to find an element. The lower the load factor (the closer to 0.0), the shorter the search time, but the more memory is wasted.
10. The default load factor of the HashSet class is 0.75.
Zztu:
1. HashMap and Hashtable have different origins: HashMap implements the newer Map interface, while Hashtable derives from the older Dictionary class (it was later retrofitted to implement Map as well);
2. Hashtable is synchronized, but HashMap is not. In multi-threaded code, use Collections.synchronizedMap to synchronize a HashMap. In general, HashMap is more efficient than Hashtable.
3. HashMap allows null as a key and as a value.
