Java Collection API
Java 7 provides at least 58 collection classes and implementations of various types, and choosing the appropriate one for each scenario matters: a program's performance depends heavily on the collection types it uses.
The first thing to consider when choosing a collection type is the algorithm and how the program will use the collection. This is really a data-structure question, independent of the language used.
For example, LinkedList is a poor fit when searches are frequent. If you need to fetch an element from the collection in O(1) time, use HashMap. If the elements must be kept in sorted order, use TreeMap instead of trying to sort the collection yourself. If you want to access elements by index, consider ArrayList; but if you frequently insert elements into the middle of the list, do not choose ArrayList. And so on.
Beyond these considerations, which apply to every language, Java adds some of its own when selecting a collection.
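The choices above can be illustrated with a minimal sketch. The class name `CollectionChoice`, its helper methods and the sample data are hypothetical and exist only for this example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class CollectionChoice {
    // Keyed lookup: HashMap.get runs in O(1) time on average.
    static Integer lookup(Map<String, Integer> ages, String name) {
        return ages.get(name);
    }

    // Sorted order: TreeMap keeps its keys sorted, so no manual sort is needed.
    static List<String> sortedKeys(Map<String, Integer> ages) {
        return new ArrayList<>(new TreeMap<>(ages).keySet());
    }

    public static void main(String[] args) {
        Map<String, Integer> ages = new HashMap<>();
        ages.put("carol", 41);
        ages.put("alice", 30);
        ages.put("bob", 25);

        System.out.println(lookup(ages, "bob"));  // 25
        System.out.println(sortedKeys(ages));     // [alice, bob, carol]

        // Indexed access: ArrayList.get(i) is O(1).
        List<String> names = sortedKeys(ages);
        System.out.println(names.get(1));         // bob
    }
}
```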
Synchronized or unsynchronized
Almost all Java collections are unsynchronized. The exceptions are Hashtable, Vector and their related synchronized collections.
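When a shared list really does need synchronization, one option is to wrap an unsynchronized collection with `Collections.synchronizedList`. The sketch below is illustrative only; the class `SynchronizedWrapperDemo` and the method `fillFromTwoThreads` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class SynchronizedWrapperDemo {
    // Two threads append to the same list; returns the final size.
    static int fillFromTwoThreads(final List<Integer> list, final int perThread) {
        Runnable task = new Runnable() {
            @Override
            public void run() {
                for (int i = 0; i < perThread; i++) {
                    list.add(i);
                }
            }
        };
        Thread a = new Thread(task);
        Thread b = new Thread(task);
        a.start();
        b.start();
        try {
            a.join();
            b.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return list.size();
    }

    public static void main(String[] args) {
        // The wrapper synchronizes every call, so no add is lost.
        List<Integer> safe = Collections.synchronizedList(new ArrayList<Integer>());
        System.out.println(fillFromTwoThreads(safe, 10000)); // 20000

        // Passing a plain ArrayList here instead would be a data race:
        // elements may be lost or an exception may be thrown.
    }
}
```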
History
Before Java 1.2, Hashtable and Vector were the only collection types; the Java Collections Framework did not yet exist. Java was a new language and most developers did not yet understand threading well, so the Java designers wanted to keep the language as simple as possible and help developers avoid thread-related problems. As a result, these collection classes were designed to be synchronized, which ensures thread safety.
However, in early versions of Java, synchronization carried a serious performance penalty, even when it was uncontended. So later versions of Java took a very different approach: all collection types are unsynchronized by default.
How much does calling a synchronized method cost when there is no contention? The following table compares a CAS-based method, a synchronized method and an unsynchronized method, each called 500 million times in an uncontended environment:
Mode | Total time | Time per operation | Percentage of baseline
CAS operation | 6.6s | 13ns | 169.2%
Synchronized method | 11.8s | 23.6ns | 302.6%
Unsynchronized method | 3.9s | 7.8ns | 100%
Per operation, the difference is not large. It becomes visible when an application runs for a long time and the methods are called very frequently. Whether you use the relatively modern CAS operations or traditional synchronized methods, both are noticeably slower than the unsynchronized approach in an uncontended environment. It is therefore worth reviewing carefully whether each method declared as synchronized, and each synchronized block, is really needed.
So in an uncontended environment, Vector takes about three times as long as ArrayList (302.6% vs 100%), which makes ArrayList about two times faster. Likewise, ConcurrentHashMap takes about 1.7 times as long as HashMap (169.2% vs 100%), which makes HashMap about 0.7 times faster.
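The three kinds of method being compared might look like the following sketch. The class `UncontendedCounters` and the helper `timeNanos` are hypothetical, and this is not the benchmark that produced the numbers above. A naive timing loop like this is strongly affected by JIT compilation and warm-up, so its absolute numbers should not be trusted; a harness such as JMH is the usual choice for real measurements.

```java
import java.util.concurrent.atomic.AtomicLong;

class UncontendedCounters {
    private long plainCount;
    private long syncCount;
    private final AtomicLong casCount = new AtomicLong();

    void incrementPlain() { plainCount++; }                      // unsynchronized
    synchronized void incrementSynchronized() { syncCount++; }   // traditional lock
    void incrementCas() { casCount.incrementAndGet(); }          // CAS-based

    long plainCount() { return plainCount; }
    synchronized long syncCount() { return syncCount; }
    long casCount() { return casCount.get(); }

    // Runs the task 'calls' times on the current thread only (no contention).
    static long timeNanos(Runnable task, int calls) {
        long start = System.nanoTime();
        for (int i = 0; i < calls; i++) {
            task.run();
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        final UncontendedCounters c = new UncontendedCounters();
        int calls = 50000000; // assumption: fewer calls than in the table, to keep the run short
        long plain = timeNanos(new Runnable() { public void run() { c.incrementPlain(); } }, calls);
        long sync = timeNanos(new Runnable() { public void run() { c.incrementSynchronized(); } }, calls);
        long cas = timeNanos(new Runnable() { public void run() { c.incrementCas(); } }, calls);
        System.out.println("unsynchronized: " + plain / 1000000 + " ms");
        System.out.println("synchronized:   " + sync / 1000000 + " ms");
        System.out.println("CAS:            " + cas / 1000000 + " ms");
    }
}
```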
Collection capacity (collection sizing)
Collection types in Java store their elements in one of several ways:
- In an array, as ArrayList and HashMap do
- In a custom type, such as the Node type that LinkedList uses to represent a single element
- In a combination of arrays and custom types, such as HashMap, which stores its elements in an array whose element type is HashMap$Entry
How can you tell whether a collection type stores its elements in an array? Look at its constructors: if one of them takes an integer initial capacity as a parameter, the type uses an array internally.
For array-backed collections, specifying an accurate capacity when you create them leads to better performance. For example, the default capacity of an ArrayList's array is 10, so when the 11th element is added, ArrayList has to do the following:
- Calculate the capacity of the expanded array
- Create the new array
- Copy all elements of the current array into the new array
The second and third steps have the larger impact on performance.
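The three steps can be sketched with a small growable array. `GrowableIntArray` is a hypothetical class written for this example, not the JDK's ArrayList code; it grows by half of the current capacity, the rule this article describes for ArrayList.

```java
import java.util.Arrays;

class GrowableIntArray {
    private int[] elements;
    private int size;
    private int resizeCount;

    GrowableIntArray(int initialCapacity) {
        elements = new int[Math.max(1, initialCapacity)];
    }

    void add(int value) {
        if (size == elements.length) {
            // Step 1: calculate the capacity of the expanded array.
            int newCapacity = elements.length + (elements.length >> 1);
            if (newCapacity == elements.length) {
                newCapacity++; // a capacity of 1 would otherwise never grow
            }
            // Steps 2 and 3: allocate the new array and copy the old elements.
            elements = Arrays.copyOf(elements, newCapacity);
            resizeCount++;
        }
        elements[size++] = value;
    }

    int capacity() { return elements.length; }
    int resizeCount() { return resizeCount; }

    public static void main(String[] args) {
        GrowableIntArray defaultSized = new GrowableIntArray(10);
        GrowableIntArray presized = new GrowableIntArray(1000);
        for (int i = 0; i < 1000; i++) {
            defaultSized.add(i);
            presized.add(i);
        }
        System.out.println(defaultSized.resizeCount()); // 12
        System.out.println(presized.resizeCount());     // 0
    }
}
```

With a real ArrayList, the equivalent of the presized case is `new ArrayList<>(expectedSize)`.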
Other types, such as HashMap, use a more complex algorithm when expanding their internal array, but in essence they follow the same three steps.
On each expansion, the capacity grows by half of the current capacity. For example, an ArrayList that starts with a capacity of 10 is expanded to an array of 15 elements, then 22, then 33, and so on. With this scheme, the array is about 83.3% full on average. Consequently, when the array is already very large, each expansion can waste a lot of memory and increase GC pressure, and that is before counting the cost of copying the array.
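The arithmetic behind these figures can be checked with a short sketch. `GrowthSequence` is a hypothetical helper. The 83.3% figure is read here as the midpoint between an array that has just grown (two thirds full) and one that is about to grow (completely full); that reading is an assumption, not something the text spells out.

```java
import java.util.ArrayList;
import java.util.List;

class GrowthSequence {
    // Capacities produced by growing by half each time (integer arithmetic).
    static List<Integer> capacities(int initial, int steps) {
        List<Integer> result = new ArrayList<Integer>(steps);
        int capacity = initial;
        for (int i = 0; i < steps; i++) {
            result.add(capacity);
            capacity += capacity >> 1;
        }
        return result;
    }

    // Midpoint between "just grown" (2/3 full) and "about to grow" (full).
    static double averageUtilization() {
        return (2.0 / 3.0 + 1.0) / 2.0;
    }

    public static void main(String[] args) {
        System.out.println(capacities(10, 4)); // [10, 15, 22, 33]
        System.out.printf("%.1f%%%n", averageUtilization() * 100);
    }
}
```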
Capacity expansion in non-collection types
Besides collection types, many other types use an internal array to store their data. Typical examples are ByteArrayOutputStream, StringBuilder and StringBuffer. Their constructors also accept a size parameter that specifies the initial capacity, so estimating a reasonable initial capacity when you use them gives better performance.
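A minimal sketch of presizing these types follows. The class `PresizedBuffers` and its methods are hypothetical; the constructors `StringBuilder(int)` and `ByteArrayOutputStream(int)` are the standard ones that take an initial capacity.

```java
import java.io.ByteArrayOutputStream;

class PresizedBuffers {
    // Sums the final length first so the builder never has to grow.
    static String joinLines(String[] lines) {
        int total = 0;
        for (String line : lines) {
            total += line.length() + 1; // +1 for the newline
        }
        StringBuilder sb = new StringBuilder(total);
        for (String line : lines) {
            sb.append(line).append('\n');
        }
        return sb.toString();
    }

    // The output size is known up front, so it is passed as the initial capacity.
    static byte[] repeat(byte value, int count) {
        ByteArrayOutputStream out = new ByteArrayOutputStream(count);
        for (int i = 0; i < count; i++) {
            out.write(value);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        System.out.print(joinLines(new String[] {"first", "second"}));
        System.out.println(repeat((byte) 7, 3).length); // 3
    }
}
```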
Collections and memory efficiency
There is another, more extreme case with array-based collections: the collection holds very few elements. The array is then mostly empty, which wastes memory and adds GC pressure. There are two ways to handle this scenario:
- Specify capacity when creating a collection (through constructors)
- Consider using a separate object for the representation
When developers are asked which algorithm sorts an array fastest, many answer "quicksort". A good developer first asks how large the array is. If the array is small enough, insertion sort is the fastest way to sort it. In fact, quicksort implementations switch to insertion sort once a sub-array falls below a certain size. The Arrays.sort() method in the JDK assumes that insertion sort performs better when the array has fewer than 47 elements.
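The idea of switching to insertion sort below a threshold can be sketched as follows. `HybridSort` is a hypothetical, simplified quicksort written for this example; it is not the JDK's Arrays.sort implementation and only borrows the threshold of 47 mentioned above.

```java
class HybridSort {
    static final int INSERTION_SORT_THRESHOLD = 47;

    static void sort(int[] a) {
        sort(a, 0, a.length - 1);
    }

    private static void sort(int[] a, int lo, int hi) {
        // Small ranges: insertion sort has less overhead than partitioning.
        if (hi - lo + 1 < INSERTION_SORT_THRESHOLD) {
            insertionSort(a, lo, hi);
            return;
        }
        int p = partition(a, lo, hi);
        sort(a, lo, p);
        sort(a, p + 1, hi);
    }

    private static void insertionSort(int[] a, int lo, int hi) {
        for (int i = lo + 1; i <= hi; i++) {
            int key = a[i];
            int j = i - 1;
            while (j >= lo && a[j] > key) {
                a[j + 1] = a[j];
                j--;
            }
            a[j + 1] = key;
        }
    }

    // Hoare partition around the middle element; returns the split index.
    private static int partition(int[] a, int lo, int hi) {
        int pivot = a[lo + (hi - lo) / 2];
        int i = lo - 1;
        int j = hi + 1;
        while (true) {
            do { i++; } while (a[i] < pivot);
            do { j--; } while (a[j] > pivot);
            if (i >= j) {
                return j;
            }
            int tmp = a[i];
            a[i] = a[j];
            a[j] = tmp;
        }
    }

    public static void main(String[] args) {
        int[] data = {5, 3, 9, 1, 7};
        sort(data);
        System.out.println(java.util.Arrays.toString(data)); // [1, 3, 5, 7, 9]
    }
}
```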
Deferred initialization of collections in JDK 7u40
In many applications, collections never use all of the space allocated to them. JDK 7u40 therefore introduced an optimization to the ArrayList and HashMap implementations: if no capacity is specified when an instance is created, the internal array is not allocated at that point. It is created only when the collection is first used. This is a typical use of deferred (lazy) initialization. Extensive testing showed that this optimization improves performance in most applications.
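The deferred-initialization idea can be sketched with a small list that shares one empty array until the first element arrives. `LazyIntList` is a hypothetical class for illustration; it does not reproduce the JDK's actual code.

```java
import java.util.Arrays;

class LazyIntList {
    private static final int DEFAULT_CAPACITY = 10;
    // Shared by all empty instances, so creating a list allocates no array.
    private static final int[] EMPTY = {};

    private int[] elements = EMPTY;
    private int size;

    void add(int value) {
        if (elements.length == 0) {
            // First use: allocate the backing array only now.
            elements = new int[DEFAULT_CAPACITY];
        } else if (size == elements.length) {
            elements = Arrays.copyOf(elements, size + (size >> 1));
        }
        elements[size++] = value;
    }

    int capacity() { return elements.length; }
    int size() { return size; }

    public static void main(String[] args) {
        LazyIntList list = new LazyIntList();
        System.out.println(list.capacity()); // 0
        list.add(42);
        System.out.println(list.capacity()); // 10
    }
}
```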
Summary
- Choose the collection type that best fits the need, and check whether the scenario really requires a synchronized collection.
- For array-based collections, specifying the capacity when creating them is important and leads to better performance.