Learn the basics of one of the most common collection types and how to optimize your map for data specific to your application.
Related downloads: Jack's HashMap test; Oracle JDeveloper
The collection classes in java.util are among the most commonly used classes in Java. The most common collection types are List and Map. Concrete List implementations include ArrayList and Vector, which are variable-size lists well suited to building, storing, and manipulating lists of elements of any object type. A List is the right choice when elements are accessed by numeric index.
A Map provides a more general way of storing elements. The Map collection type stores pairs of elements (called "keys" and "values"), where each key maps to a value. Conceptually, you can think of a List as a Map with numeric keys. In practice, apart from both being defined in java.util, List and Map have no direct relationship. This article focuses on the Maps included with the core Java distribution, and also shows how to adopt or implement a specialized Map that better suits your application-specific data.
Understanding the Map interface and its methods
The Java core classes include many predefined Map classes. Before looking at the implementations, let's start with the Map interface itself to see what all implementations have in common. The Map interface defines four kinds of methods, and every Map provides them. We begin with two general methods (Table 1).
Table 1: Overridden methods. These two Object methods are overridden so that Map objects are compared correctly for equality.
equals(Object o) | Compares the specified object with this Map for equality
hashCode() | Returns the hash code for this Map
Building a Map
Map defines several mutator methods for inserting and deleting elements (Table 2).
Table 2: Map update methods. These methods change the contents of a Map.
clear() | Removes all mappings from the Map
remove(Object key) | Removes the key and its associated value from the Map
put(Object key, Object value) | Associates the specified value with the specified key
putAll(Map t) | Copies all mappings from the specified Map to this Map
The existence of putAll() is not unusual, although you may notice that, even ignoring the cost of building the Map that is passed to it, putAll() is generally no more efficient than a large number of put() calls. This is because putAll() has to iterate over the elements of the Map passed in, in addition to running the same algorithm that put() uses to add each key-value pair. Note, however, that putAll() can size the Map correctly before adding all the elements, so if you have not sized the Map yourself (a topic covered briefly later in this article), putAll() may be more efficient than expected.
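As a rough illustration of the update methods in Table 2, the following sketch exercises put(), putAll(), remove(), and clear(). The class name, method name, and sample data are invented for this example, and the code uses generics, which are newer than the 1.4-era code shown elsewhere in this article.

```java
import java.util.HashMap;
import java.util.Map;

class MapUpdateDemo {
    // Copies every mapping of the source into a new map with a single putAll() call.
    static Map<String, Integer> copyOf(Map<String, Integer> source) {
        Map<String, Integer> target = new HashMap<>();
        target.putAll(source);
        return target;
    }

    public static void main(String[] args) {
        Map<String, Integer> stock = new HashMap<>();
        stock.put("apples", 3);
        Integer previous = stock.put("apples", 5); // put() returns the value it replaced
        stock.put("pears", 7);
        System.out.println("previous value: " + previous); // prints previous value: 3

        Map<String, Integer> copy = copyOf(stock);
        copy.remove("pears");                      // removes the key and its value from the copy only
        System.out.println(stock.size() + " " + copy.size()); // prints 2 1

        copy.clear();                              // removes all remaining mappings
        System.out.println(copy.isEmpty());        // prints true
    }
}
```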
Viewing a Map
There is no direct way to iterate over the elements of a Map. Whether you are querying a Map to see which elements satisfy a particular condition or iterating over all of its elements for any other reason, you first need to obtain a view of the Map. There are three possible views (see Table 3):
- All key-value pairs: see entrySet()
- All keys: see keySet()
- All values: see values()
The first two views are returned as Set objects, and the third as a Collection object. In either case, the problem does not end there, because you cannot iterate directly over a Collection or Set object. To iterate, you must obtain an Iterator object. So iterating over the elements of a Map requires some rather cumbersome code:
    Iterator keyValuePairs = aMap.entrySet().iterator();
    Iterator keys = aMap.keySet().iterator();
    Iterator values = aMap.values().iterator();
It is worth noting that these objects (Set, Collection, and Iterator) are actually views of the underlying Map rather than copies of all its elements, which makes them highly efficient to use. By contrast, the toArray() method of a Collection or Set creates an array object containing all the elements of the Map, so it is inefficient unless you really need the elements in an array.
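The difference between a view and a toArray() copy can be seen in a small sketch like the one below. The class and method names are made up for illustration, and generics are used for brevity.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

class ViewVersusArrayDemo {
    // Returns {size of the key view, length of the key array} after the map has changed.
    static int[] sizesAfterRemoval() {
        Map<String, Integer> map = new HashMap<>();
        map.put("one", 1);
        map.put("two", 2);
        map.put("three", 3);

        Set<String> keyView = map.keySet();       // live view of the map, nothing is copied
        Object[] keySnapshot = keyView.toArray(); // new array holding all three keys

        map.remove("two");                        // change the underlying map

        return new int[] { keyView.size(), keySnapshot.length };
    }

    public static void main(String[] args) {
        int[] sizes = sizesAfterRemoval();
        System.out.println("view: " + sizes[0] + ", array: " + sizes[1]); // prints view: 2, array: 3
    }
}
```

The view reflects the removal because it reads from the Map itself, while the array keeps the elements it was given when it was created.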
I ran a small test (included in the accompanying file) that used a HashMap and compared the cost of iterating over the Map's elements using the following two methods:
    int mapsize = aMap.size();

    // Method 1: iterate with an Iterator
    Iterator keyValuePairs1 = aMap.entrySet().iterator();
    for (int i = 0; i < mapsize; i++) {
      Map.Entry entry = (Map.Entry) keyValuePairs1.next();
      Object key = entry.getKey();
      Object value = entry.getValue();
      ...
    }

    // Method 2: copy the entries into an array with toArray(), then iterate over the array
    Object[] keyValuePairs2 = aMap.entrySet().toArray();
    for (int i = 0; i < mapsize; i++) {
      Map.Entry entry = (Map.Entry) keyValuePairs2[i];
      Object key = entry.getKey();
      Object value = entry.getValue();
      ...
    }
Profiling in Oracle JDeveloper: Oracle JDeveloper includes an embedded profiler that measures memory and execution time, so you can quickly identify bottlenecks in your code. I used JDeveloper's execution profiler to profile HashMap's containsKey() and containsValue() methods and soon found that containsValue() was much slower than containsKey() (in fact, several orders of magnitude slower). See Figure 1 and Figure 2, along with the classes in the accompanying file.
The test took two measurements: one timed only the iteration over the elements, and the other also included the overhead of creating the array with the toArray() call. The first measurement (ignoring the time taken to create the array) showed that iterating over an array already created by toArray() is about 30%-60% faster than using the Iterator. But once the cost of creating the array with toArray() is included, using the Iterator is actually 10%-20% faster. So if for some reason you need an array of the collection's elements anyway, use that array to iterate over the elements. But if you do not need the intermediate array, do not create it; iterate with the Iterator instead.
Table 3: Map methods that return views. Using the objects returned by these methods, you can traverse the elements of a Map and also delete elements from it.
entrySet() | Returns a Set view of the mappings contained in the Map. Each element in the Set is a Map.Entry object, whose key and value can be accessed with its getKey() and getValue() methods (it also has a setValue() method)
keySet() | Returns a Set view of the keys contained in the Map. Deleting an element from the Set also deletes the corresponding mapping (key and value) from the Map
values() | Returns a Collection view of the values contained in the Map. Deleting an element from the Collection also deletes the corresponding mapping (key and value) from the Map
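To show how changes made through a view write through to the Map, here is a minimal sketch that deletes and updates mappings while iterating over entrySet(). The class name, method name, and filtering rule are hypothetical choices for this example.

```java
import java.util.Iterator;
import java.util.Map;
import java.util.TreeMap;

class ViewUpdateDemo {
    // Removes mappings with negative values and doubles the rest, all through the entry view.
    static Map<String, Integer> dropNegativesAndDouble(Map<String, Integer> map) {
        Iterator<Map.Entry<String, Integer>> it = map.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Integer> entry = it.next();
            if (entry.getValue() < 0) {
                it.remove();                          // deletes the mapping from the map itself
            } else {
                entry.setValue(entry.getValue() * 2); // writes the new value through to the map
            }
        }
        return map;
    }

    public static void main(String[] args) {
        Map<String, Integer> map = new TreeMap<>();   // TreeMap only to get a stable print order
        map.put("a", 1);
        map.put("b", -2);
        map.put("c", 3);
        System.out.println(dropNegativesAndDouble(map)); // prints {a=2, c=6}
    }
}
```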
Accessing elements
The Map access methods are listed in Table 4. Maps are generally suited to access by key rather than by value. Nothing in the Map definition guarantees this, but you can usually expect it to hold. For example, you can expect the containsKey() method to be as fast as the get() method. The containsValue() method, on the other hand, probably needs to scan the values in the Map, so it may be slower.
Table 4: Map access and test methods. These methods retrieve information about the contents of a Map but do not change them.
get(Object key) | Returns the value associated with the specified key
containsKey(Object key) | Returns true if the Map contains a mapping for the specified key
containsValue(Object value) | Returns true if the Map maps one or more keys to the specified value
isEmpty() | Returns true if the Map contains no key-value mappings
size() | Returns the number of key-value mappings in the Map
Tests of the time taken to check every element in a HashMap with containsKey() and with containsValue() show that containsValue() takes much longer; in fact, several orders of magnitude longer! (See Figure 1 and Figure 2, as well as the accompanying file.) So if containsValue() is a performance problem in your application, it will show up quickly and can easily be identified by profiling the application. In that case, I am sure you can devise an efficient alternative that provides the same functionality as containsValue(). If you cannot think of one, a workable solution is to create a second Map that uses all the values of the first Map as its keys. A containsValue() call on the first Map then becomes a more efficient containsKey() call on the second Map.
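The second-Map workaround might be sketched as follows. The invert() helper and the sample data are assumptions made for this example, and the sketch ignores the work of keeping the two Maps in step when the first one changes later.

```java
import java.util.HashMap;
import java.util.Map;

class ValueIndexDemo {
    // Builds a second map whose keys are the values of the first map.
    // If several keys share a value, only one of those keys is kept as the mapped value.
    static <K, V> Map<V, K> invert(Map<K, V> map) {
        Map<V, K> inverse = new HashMap<>();
        for (Map.Entry<K, V> entry : map.entrySet()) {
            inverse.put(entry.getValue(), entry.getKey());
        }
        return inverse;
    }

    public static void main(String[] args) {
        Map<String, String> capitals = new HashMap<>();
        capitals.put("France", "Paris");
        capitals.put("Japan", "Tokyo");

        Map<String, String> byCapital = invert(capitals);

        // A scan of the values on the first map becomes a hash lookup on the second
        System.out.println(capitals.containsValue("Tokyo")); // prints true
        System.out.println(byCapital.containsKey("Tokyo"));  // prints true
        System.out.println(byCapital.containsKey("Oslo"));   // prints false
    }
}
```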
Figure 1: Creating and running the Map test class with JDeveloper
Figure 2: Using the execution profiler in JDeveloper to monitor performance and isolate bottlenecks in the application
Core Maps
Java comes with a variety of Map classes. They can be grouped into three types:
- General-purpose Maps, used to manage mappings in an application and typically implemented in the java.util package
  - HashMap
  - Hashtable
  - Properties
  - LinkedHashMap
  - IdentityHashMap
  - TreeMap
  - WeakHashMap
  - ConcurrentHashMap
- Special-purpose Maps, which you do not usually create yourself but access through some other class
  - java.util.jar.Attributes
  - javax.print.attribute.standard.PrinterStateReasons
  - java.security.Provider
  - java.awt.RenderingHints
  - javax.swing.UIDefaults
- An abstract class to help you implement your own Map classes
  - AbstractMap
Hashing internals: hash mapping techniques
Almost all general-purpose Maps use hash mapping. It is a very simple mechanism for mapping elements into an array, and you should understand how it works in order to get the most out of a Map.
A hash map structure consists of an internal array that stores the elements. Because storage is an array, there must be an indexing mechanism that determines where in the array any given key goes. Specifically, the mechanism must produce an integer index that is smaller than the size of the array. This mechanism is called a hash function. In Java's hash-based Maps, the hash function converts an object into an integer that fits the internal array. You do not have to hunt for an easy-to-use hash function: every object has a hashCode() method that returns an integer. To map that value into the array, simply convert it to a positive value and take the remainder after dividing by the array size. Here is a simple Java hash function that works for any object:
    int hashvalue = Math.abs(key.hashCode()) % table.length;
(The % binary operator, called modulo, divides the value on its left by the value on its right and returns the remainder as an integer.)
In fact, before release 1.4, this was the hash function used by the various hash-based Map classes. But if you look at the code, you will see
    int hashvalue = (key.hashCode() & 0x7FFFFFFF) % table.length;
which is effectively the same function, using a faster mechanism to obtain a positive value. In version 1.4, the HashMap class implementation uses a different and more complex hash function based on Doug Lea's util.concurrent package (I will come back to Doug Lea's classes in more detail later).
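The masking form of the hash function can be tried out with a small sketch such as the following. The indexFor() helper and the table length of 11 are assumptions for this example. The sketch also shows one edge case in which the two forms differ: Math.abs() cannot turn Integer.MIN_VALUE into a positive number, whereas the mask always clears the sign bit.

```java
class BucketIndexDemo {
    // Clears the sign bit, then reduces the hash to a valid index into the table.
    static int indexFor(int hash, int tableLength) {
        return (hash & 0x7FFFFFFF) % tableLength;
    }

    public static void main(String[] args) {
        int tableLength = 11;

        // An ordinary key lands somewhere in the range 0..tableLength-1
        System.out.println(indexFor("key".hashCode(), tableLength));

        // Math.abs(Integer.MIN_VALUE) is still negative, so this would be an invalid index
        System.out.println(Math.abs(Integer.MIN_VALUE) % tableLength < 0); // prints true

        // The mask turns Integer.MIN_VALUE into 0, which is a valid index
        System.out.println(indexFor(Integer.MIN_VALUE, tableLength));      // prints 0
    }
}
```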
Figure 3: How hashing works
The figure illustrates the basic principle of hash mapping, but we have not yet covered it in full. Our hash function maps any object to an array location, but what happens if two different keys map to the same location? This is bound to happen. In hashing terminology, it is called a collision. A Map handles collisions by placing a linked list at the index location and simply adding elements to that list. So the basic put() method for a hash-based Map might look like this:
    public Object put(Object key, Object value) {
      // Our internal array is an array of Entry objects
      // Entry[] table;

      // Get the hash code and map it to an index
      int hash = key.hashCode();
      int index = (hash & 0x7FFFFFFF) % table.length;

      // Loop through the linked list at table[index] to find out whether
      // we already have this key; if so, overwrite its value
      for (Entry e = table[index]; e != null; e = e.next) {
        // Must check that the keys are equal, because different key objects
        // may have the same hash
        if ((e.hash == hash) && e.key.equals(key)) {
          // Same key: overwrite the value and return the old value
          Object old = e.value;
          e.value = value;
          return old;
        }
      }

      // Still here, so this is a new key: just add a new Entry.
      // An Entry object contains the key object, the value object, an integer hash,
      // and a next pointer to the next Entry in the list.

      // Create a new Entry that points to the head of the previous list,
      // and insert this new Entry into the table
      Entry e = new Entry(hash, key, value, table[index]);
      table[index] = e;
      return null;
    }
If you look at the source code of the various hash-based Maps, you will see that this is basically how they work. There are further matters to consider, such as handling null keys and values and resizing the internal array. The put() method defined here also contains the algorithm for the corresponding get(), because insertion includes searching the items at the mapped index to find out whether the key already exists. (That is, get() uses the same algorithm as put(), without the insertion and overwrite code.) Using a linked list is not the only way to resolve collisions; some hash maps use an alternative "open addressing" scheme, which this article does not cover.
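To make the relationship between put() and get() concrete, here is a minimal, self-contained sketch of a chained hash map. It is an illustration only, not the JDK's implementation: the class name and the fixed table length of 11 are assumptions, and it leaves out null keys and resizing, as the text above notes.

```java
class ChainedMapSketch {
    // One node of a bucket's linked list
    private static final class Entry {
        final int hash;
        final Object key;
        Object value;
        final Entry next;

        Entry(int hash, Object key, Object value, Entry next) {
            this.hash = hash;
            this.key = key;
            this.value = value;
            this.next = next;
        }
    }

    private final Entry[] table = new Entry[11];

    private int indexFor(int hash) {
        return (hash & 0x7FFFFFFF) % table.length;
    }

    public Object put(Object key, Object value) {
        int hash = key.hashCode();
        int index = indexFor(hash);
        for (Entry e = table[index]; e != null; e = e.next) {
            if (e.hash == hash && e.key.equals(key)) {
                Object old = e.value;   // existing key: overwrite and return the old value
                e.value = value;
                return old;
            }
        }
        table[index] = new Entry(hash, key, value, table[index]); // new key: add at the list head
        return null;
    }

    // Same search as put(), without the insertion and overwrite code
    public Object get(Object key) {
        int hash = key.hashCode();
        for (Entry e = table[indexFor(hash)]; e != null; e = e.next) {
            if (e.hash == hash && e.key.equals(key)) {
                return e.value;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        ChainedMapSketch map = new ChainedMapSketch();
        map.put("a", "first");
        map.put("a", "second");
        System.out.println(map.get("a"));       // prints second
        System.out.println(map.get("missing")); // prints null
    }
}
```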
Optimizing HashMap
If the internal array of a hash map has only one element, all items map to that one array location and form one long linked list. This is inefficient, because updates and accesses then require a linear search of the linked list, which is much slower than when each array index in the Map holds only one object. The time to access or update a linked list is linearly related to the size of the list, whereas accessing or updating a single element in an array through the hash function is independent of the array's size. In asymptotic (Big-O) notation, the former is O(n) and the latter is O(1). It therefore makes sense to use a larger array rather than letting too many items cluster in too few array locations.
Resizing a Map implementation
In hashing terminology, each location in the internal array is called a bucket, and the number of available buckets (that is, the size of the internal array) is called the capacity. So that a Map object can handle any number of items efficiently, Map implementations can resize themselves. But resizing is expensive. It requires reinserting all elements into the new array, because a different array size means that objects now map to different index values. Keys that previously collided may no longer collide, while other keys that did not collide before may now do so. Clearly, if you presize the Map large enough, you can reduce or even eliminate resizing, which is likely to improve speed significantly.
I used the 1.4.2 JVM to run a simple test that populates a HashMap with a large number of items (more than 1 million). Table 5 shows the results, with all times normalized to the presized server-mode time (the test is in the associated file). For the presized Map, the client-mode and server-mode JVMs take almost the same time (once the JIT compilation phase is discounted). Using the default-sized Map, however, causes multiple resize operations, which are expensive: it takes over 50% longer in server mode and almost three times as long in client mode!
Table 5: Time required to populate a presized HashMap compared with a default-sized HashMap
              | Client mode | Server mode
Presized      | 100%        | 100%
Default size  | 294%        | 157%
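A comparison along these lines could be sketched as below. This is not the test from the associated file: the class name, the use of System.nanoTime(), and the capacity calculation are assumptions for illustration, and a single untuned run like this gives only a rough indication, since JIT compilation and garbage collection affect the timings.

```java
import java.util.HashMap;
import java.util.Map;

class PresizeTimingSketch {
    // Fills the given map with n Integer keys and returns the elapsed time in nanoseconds.
    static long fill(Map<Integer, Integer> map, int n) {
        long start = System.nanoTime();
        for (int i = 0; i < n; i++) {
            map.put(i, i);
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        int n = 1_000_000;
        // A capacity large enough that n entries should not trigger a resize
        // at the default load factor of 0.75
        int capacity = (int) Math.ceil(n / 0.75);

        long defaultSized = fill(new HashMap<>(), n);
        long presized = fill(new HashMap<>(capacity), n);

        System.out.println("default size: " + defaultSized / 1_000_000 + " ms");
        System.out.println("presized:     " + presized / 1_000_000 + " ms");
    }
}
```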
Using load factors
To decide when to resize, a hash-based Map does not count the depth of the linked list in each bucket; instead it uses an extra parameter to make a rough estimate of how densely the buckets are filled. This parameter, called the load factor, indicates how full the Map is allowed to get before it resizes. The relationship between the load factor, the number of items (the Map size), and the capacity is straightforward:
- The Map is resized when (Map size) exceeds (load factor) x (capacity)
For example, with a default load factor of 0.75 and a default capacity of 11, 11 x 0.75 = 8.25, which is rounded down to 8 elements. So once this Map holds 8 items, adding more causes it to resize itself to a larger capacity. Conversely, to calculate an initial capacity that avoids resizing, divide the number of items to be added by the load factor and round up. For example:
- For 100 items with a load factor of 0.75, set the capacity to 100/0.75 = 133.33, rounded up to 134 (or to 135 to use an odd number)
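The two calculations above can be written out as a short sketch. The helper names are invented for this example; the only library call relied on is the HashMap(int initialCapacity, float loadFactor) constructor.

```java
import java.util.HashMap;
import java.util.Map;

class CapacityCalc {
    // Number of items a map of the given capacity holds before it resizes:
    // load factor times capacity, rounded down.
    static int threshold(int capacity, double loadFactor) {
        return (int) (capacity * loadFactor);
    }

    // Smallest capacity that holds expectedItems without resizing:
    // item count divided by load factor, rounded up.
    static int initialCapacity(int expectedItems, double loadFactor) {
        return (int) Math.ceil(expectedItems / loadFactor);
    }

    public static void main(String[] args) {
        System.out.println(threshold(11, 0.75));        // prints 8
        System.out.println(initialCapacity(100, 0.75)); // prints 134

        // Presize a map for 100 items at a load factor of 0.75
        Map<String, String> presized = new HashMap<>(initialCapacity(100, 0.75), 0.75f);
        System.out.println(presized.isEmpty());         // prints true
    }
}
```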
An odd number of buckets lets the Map run more efficiently by reducing the number of collisions. Ideally the capacity would be a prime number, although the tests I ran (in the associated file) do not show that primes always give better efficiency. Some Maps from version 1.4 onward, such as HashMap and LinkedHashMap (but not Hashtable or IdentityHashMap), use a hash function that requires a power-of-2 capacity, but these Maps compute the next highest power of 2 themselves, so you do not have to calculate it.
The load factor itself is a tuning tradeoff between space and time. A smaller load factor takes up more space but reduces the likelihood of collisions, which speeds up access and updates. Using a load factor greater than 0.75 is probably unwise, and using one greater than 1.0 is definitely unwise, because that is bound to cause collisions. A load factor below 0.50 brings little benefit, but as long as you size the Map effectively, a small load factor should cost you only memory, not performance. However, a smaller load factor also means more frequent resizing if you do not presize the Map, which reduces performance, so keep this in mind when adjusting the load factor.
Choosing the right Map
Which Map should you use? Does it need to be synchronized? These are probably the two most important questions you face in getting the best performance for your application. With general-purpose Maps, sizing the Map and choosing a load factor are the only other Map tuning options.
Here is a simple way to get the best Map performance:
- Declare all your Map variables as Map rather than as any specific implementation; that is, do not declare them as HashMap, Hashtable, or any other Map implementation class.
    Map criticalMap = new HashMap(); // good
    HashMap criticalMap = new HashMap(); // poor
  This lets you replace any particular Map instance by changing just one line of code.
- Download Doug Lea's util.concurrent package (http://gee.cs.oswego.edu/dl/classes/EDU/oswego/cs/dl/util/concurrent/intro.html). Use ConcurrentHashMap as your default Map. When you move to version 1.5, use java.util.concurrent.ConcurrentHashMap as your default Map. Do not wrap ConcurrentHashMap in a synchronized wrapper, even if it will be used by multiple threads. Use the default size and load factor.
- Profile your application. If you find that a Map is causing a bottleneck, analyze why and change some or all of the following for that Map: the Map class, the Map size, the load factor, and the implementation of the key object's equals() method. A specialized Map essentially means writing a custom Map implementation for a special purpose; otherwise, a general-purpose Map will let you reach your performance goals.
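One way to apply the first two recommendations together is sketched below. The factory method and class name are hypothetical, and the sketch assumes the java.util.concurrent.ConcurrentHashMap class that ships with version 1.5 and later rather than the downloadable util.concurrent package.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class MapDeclarationDemo {
    // Only this method names a concrete class; callers see just the Map interface,
    // so switching to another implementation means editing this one line.
    static Map<String, String> newCriticalMap() {
        return new ConcurrentHashMap<>(); // default size and load factor
    }

    public static void main(String[] args) {
        Map<String, String> criticalMap = newCriticalMap();
        criticalMap.put("mode", "server");
        System.out.println(criticalMap.get("mode")); // prints server
    }
}
```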
Map selection
Perhaps you expected something more complicated; was that really too easy? OK, let's take it slowly. First, which Map should you use? The answer is simple: do not choose any particular Map for your design unless the design actually needs a specific type of Map. A specific Map implementation is not normally needed at design time. You probably know you need a Map, but not which one. And that is exactly the point of using the Map interface. Leave the choice of Map implementation until you need to make it: if you declare your variables as "Map" everywhere, changing the implementation of any particular Map in your application takes only a one-line change, which makes it a very cheap tuning option. Should you use a default Map implementation? I will come to that shortly.
Map synchronization
What difference does synchronization make? (To get synchronization, you can either use a synchronized Map, or use Collections.synchronizedMap() to convert an unsynchronized Map into a synchronized one. The latter technique uses a "synchronized wrapper.") This is an unusually complex choice that depends entirely on how you access and update the Map from multiple concurrent threads, and it also involves maintenance considerations. For example, what if you start out without concurrent updates to a particular Map but later change it to be updated concurrently? In that case, it is easy to start with an unsynchronized Map and then forget to change it to a synchronized Map when you later add concurrently updating threads to the application. This leaves your application prone to crashes (one of the worst kinds of bug to identify and track down). But if you make synchronization the default, multithreaded applications end up executing serially, with the dreadful performance that comes with it. It seems we need some kind of decision tree to help us make the right choice.
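For reference, a synchronized wrapper is created as in the sketch below. The class and method names are made up for this example. One detail worth knowing is that individual calls such as put() and get() are synchronized by the wrapper, but iterating over one of its views still requires you to synchronize on the wrapper yourself, as the Collections.synchronizedMap() documentation states.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

class SynchronizedWrapperDemo {
    // Wraps an unsynchronized HashMap so that every method call is synchronized.
    static Map<String, Integer> newSynchronizedMap() {
        return Collections.synchronizedMap(new HashMap<String, Integer>());
    }

    // Sums the values; the iteration holds the wrapper's lock for its whole duration.
    static int sum(Map<String, Integer> syncMap) {
        int total = 0;
        synchronized (syncMap) {
            for (int value : syncMap.values()) {
                total += value;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = newSynchronizedMap();
        counts.put("a", 1);
        counts.put("b", 2);
        System.out.println(sum(counts)); // prints 3
    }
}
```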
Doug Lea is a professor in the Computer Science Department at the State University of New York at Oswego. He created a set of public domain packages (collectively known as util.concurrent) containing many utility classes that simplify high-performance parallel programming. These classes include two Maps, ConcurrentReaderHashMap and ConcurrentHashMap. These Map implementations are thread-safe, do not require synchronization for concurrent access or updates, and are suitable for most situations where a Map is needed. They are also far more scalable than synchronized Maps (such as Hashtable) or Maps in synchronized wrappers, and they carry little performance penalty compared with HashMap. The util.concurrent package forms the basis of JSR 166, which has developed the concurrency utilities to be included in Java version 1.5; Java 1.5 will include these Maps in a new java.util.concurrent package.
All of this means that you do not need a decision tree to decide between a synchronized and an unsynchronized Map: just use ConcurrentHashMap. Of course, there are some cases where ConcurrentHashMap is not appropriate. But these cases are rare and should be handled case by case. That is what profiling is for.
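As a small illustration of concurrent updates without external synchronization, the following sketch has several threads increment the same key. It assumes java.util.concurrent.ConcurrentHashMap and the merge() method, which was added to Map in Java 8 and is therefore newer than the versions discussed in this article; the class and method names are invented for the example.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class ConcurrentCountDemo {
    // Starts the given number of threads, each incrementing the same key perThread times,
    // and returns the final count once all threads have finished.
    static int countConcurrently(int threads, int perThread) {
        Map<String, Integer> hits = new ConcurrentHashMap<>();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    hits.merge("page", 1, Integer::sum); // atomic on ConcurrentHashMap
                }
            });
            workers[t].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for workers", e);
            }
        }
        return hits.getOrDefault("page", 0);
    }

    public static void main(String[] args) {
        System.out.println(countConcurrently(4, 10_000)); // prints 40000
    }
}
```

With a plain HashMap in place of ConcurrentHashMap, the same code would not be safe, and updates could be lost.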
Conclusion
Oracle JDeveloper makes it very easy to create test classes that compare the performance of various Maps. More importantly, a well-integrated profiler lets you quickly and easily identify performance bottlenecks during development; a profiler integrated into the IDE tends to be used more often, which helps build a successful project. Now that you have a profiler and understand the basics of the general-purpose Maps and their performance, you can start running your own tests to find out whether your application has Map bottlenecks and where you need to change the Maps you use.