"Go" Java Learning---Java core data structure (List,map,set) usage tips and optimizations

Last Update:2018-09-01 Source: Internet

Author: User

"Original" https://www.toutiao.com/i6594587397101453827/

Java core data structures (List, Map, Set): usage tips and optimizations

The JDK provides implementations of the major data structures, such as List, Map, and Set. List and Set extend the java.util.Collection interface, while Map is a separate interface. All of them are located in the java.util package.

1. List interface

The three most important implementations of the List interface are ArrayList, Vector, and LinkedList. Their class diagram is as follows:

All three lists derive from AbstractList (LinkedList does so through AbstractSequentialList). AbstractList directly implements the List interface and extends AbstractCollection.

ArrayList and Vector are backed by an array. ArrayList can be thought of as a wrapper around operations on an internal array: adding, deleting, and inserting elements, and growing or reallocating the array. Operating on an ArrayList or a Vector is equivalent to operating on its internal object array.

ArrayList and Vector use almost the same algorithms; their only real difference is support for multithreading. ArrayList does not synchronize any of its methods, so it is not thread-safe. Most of Vector's methods are synchronized, which makes it thread-safe. The performance characteristics of ArrayList and Vector are therefore comparable.

LinkedList uses a circular doubly linked list (as in the JDK 6 source that this description follows; from JDK 7 on, the class keeps separate first and last references instead). A LinkedList is made up of a chain of table entries. Each entry always contains three parts: the element content, the predecessor entry, and the successor entry.

Source of the LinkedList table entry:

Whether or not the LinkedList is empty, the list has a header entry that marks both the beginning and the end of the list. Its successor is the first element of the list, and its predecessor is the last element of the list.
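The header-entry layout described above can be sketched as follows. This is only an illustration of the idea, not the JDK source: SketchList, Entry, and the method names are hypothetical, and only appending and reading the two ends are implemented.

```java
import java.util.NoSuchElementException;

// Illustrative sketch: a circular doubly linked list with a header sentinel.
// SketchList and Entry are hypothetical names, not JDK classes.
class SketchList<E> {
    private static final class Entry<E> {
        E element;          // element content
        Entry<E> next;      // successor entry
        Entry<E> previous;  // predecessor entry

        Entry(E element) {
            this.element = element;
        }
    }

    // The header exists even when the list is empty.
    private final Entry<E> header = new Entry<>(null);
    private int size;

    SketchList() {
        // An empty list: the header is its own successor and predecessor.
        header.next = header;
        header.previous = header;
    }

    // Append by linking a new entry between the current last entry and the header.
    void addLast(E e) {
        Entry<E> entry = new Entry<>(e);
        entry.previous = header.previous;
        entry.next = header;
        header.previous.next = entry;
        header.previous = entry;
        size++;
    }

    // The header's successor is the first element.
    E getFirst() {
        if (size == 0) throw new NoSuchElementException();
        return header.next.element;
    }

    // The header's predecessor is the last element.
    E getLast() {
        if (size == 0) throw new NoSuchElementException();
        return header.previous.element;
    }

    int size() {
        return size;
    }
}
```

Because the header closes the ring, appending needs no special case for an empty list.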

The following compares ArrayList and LinkedList.

1. Adding elements to the end of the list

For ArrayList, the add() operation is very efficient as long as the current capacity is large enough.

Expansion is required only when the ArrayList needs more capacity than the current array provides. Expansion copies the whole array, but the copy is ultimately performed by System.arraycopy(), so add() remains quite efficient.

Because LinkedList uses a linked-list structure, it does not need to maintain a capacity. This is an advantage over ArrayList; however, every added element requires creating a new entry object and performing more assignments. When add() is called very frequently, this can have some impact on performance.

2. Inserting elements anywhere in the list

ArrayList is an array-based implementation, and an array is a contiguous block of memory, so every insertion at a position other than the end shifts all later elements with an array copy. A large number of array copies can lead to poor performance.

LinkedList is based on a linked list, so once the position has been located, inserting at any position costs about the same as appending at the end. (Locating a position by index still requires walking the list.) Therefore, if an application frequently inserts elements at arbitrary positions, particularly near the head, consider using LinkedList instead of ArrayList.
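A minimal sketch of the head-insertion case follows. The class and method names are hypothetical; both list types produce the same result, and the difference is that ArrayList shifts every existing element on each call while LinkedList only relinks entries.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

// Illustrative sketch: repeated insertion at index 0 on either list type.
class HeadInsertSketch {
    // Inserts 0..count-1 at the head, so the last value inserted ends up first.
    static List<Integer> insertAtHead(List<Integer> list, int count) {
        for (int i = 0; i < count; i++) {
            list.add(0, i);
        }
        return list;
    }

    public static void main(String[] args) {
        System.out.println(insertAtHead(new ArrayList<>(), 5));
        System.out.println(insertAtHead(new LinkedList<>(), 5));
    }
}
```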

3. Deleting an element at any position

For ArrayList, the array has to be reorganized every time remove() is called. The closer the element is to the front of the list, the higher the cost; the closer it is to the end, the lower the cost.

In LinkedList, the first step is to locate the element to delete by looping. If the element is in the first half of the list, the search runs from the front; if it is in the second half, the search runs from the back. Removing an element near the middle therefore requires traversing half the list, which is inefficient.
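When several elements are removed in one pass, removing through the iterator avoids repeating the positional search for each removal on a LinkedList. A minimal sketch, with hypothetical class and method names:

```java
import java.util.Iterator;
import java.util.LinkedList;
import java.util.List;

// Illustrative sketch: remove elements while iterating instead of by index.
class IteratorRemoveSketch {
    // Removes every even value from the list in a single traversal.
    static void removeEvens(List<Integer> list) {
        for (Iterator<Integer> it = list.iterator(); it.hasNext();) {
            if (it.next() % 2 == 0) {
                it.remove(); // removes the element just returned by next()
            }
        }
    }

    public static void main(String[] args) {
        List<Integer> list = new LinkedList<>();
        for (int i = 1; i <= 6; i++) {
            list.add(i);
        }
        removeEvens(list);
        System.out.println(list);
    }
}
```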

4. Capacity Parameters

Capacity parameters are performance parameters specific to array-based lists such as ArrayList and Vector; they specify the size of the initial array.

Setting the capacity sensibly reduces the number of array expansions and improves performance.

The default initial array size of ArrayList is 10.

    private static final int DEFAULT_CAPACITY = 10;
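When the number of elements is known in advance, the capacity can be passed to the constructor so the backing array is allocated once. A minimal sketch; the class and method names are hypothetical:

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch: size the backing array once instead of growing it repeatedly.
class PresizedListSketch {
    // Builds a list of the integers 0..n-1 with an initial capacity of n.
    static List<Integer> fill(int n) {
        List<Integer> list = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            list.add(i);
        }
        return list;
    }

    public static void main(String[] args) {
        System.out.println(fill(1000).size());
    }
}
```

For a list that already exists, ArrayList.ensureCapacity(int) serves the same purpose before a large batch of add() calls.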

5. Traversing the list

Three commonly used list traversal methods: foreach operations, iterators, and for loops.

With foreach, decompiling the class shows that the compiler turns the loop into iterator calls. However, foreach performs one extra assignment compared with a hand-written iterator loop, so it is slightly slower than using the iterator directly.

Traversing by index with a for loop performs very well on ArrayList and is the fastest of the three, but it performs very poorly on LinkedList and should be avoided there, because every random access on a LinkedList walks the list.
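The three traversal styles look like this side by side. This sketch only shows the forms; it makes no timing claims, and the class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Illustrative sketch: foreach, explicit iterator, and indexed for loop.
class TraversalSketch {
    static long sumForEach(List<Integer> list) {
        long sum = 0;
        for (int value : list) { // compiled into iterator calls
            sum += value;
        }
        return sum;
    }

    static long sumIterator(List<Integer> list) {
        long sum = 0;
        for (Iterator<Integer> it = list.iterator(); it.hasNext();) {
            sum += it.next();
        }
        return sum;
    }

    static long sumIndexed(List<Integer> list) {
        long sum = 0;
        int size = list.size();
        for (int i = 0; i < size; i++) { // get(i) walks the list on a LinkedList
            sum += list.get(i);
        }
        return sum;
    }

    public static void main(String[] args) {
        List<Integer> list = Arrays.asList(1, 2, 3, 4);
        System.out.println(sumForEach(list) + " " + sumIterator(list) + " " + sumIndexed(list));
    }
}
```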

2. Map Interface

Map is a very common data structure. The main implementations of the Map interface are Hashtable, HashMap, LinkedHashMap, and TreeMap; Hashtable also has a subclass, Properties.

Hashtable and HashMap differ in three ways. First, most of Hashtable's methods are synchronized and HashMap's are not, so Hashtable is thread-safe and HashMap is not. Second, Hashtable does not allow null as a key or a value, while HashMap does. Third, they differ internally in how they hash the key and how they map the hash value to an array index.
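The null-handling difference can be checked directly. A minimal sketch, with hypothetical class and method names:

```java
import java.util.HashMap;
import java.util.Hashtable;
import java.util.Map;

// Illustrative sketch: HashMap accepts a null key, Hashtable rejects it.
class NullKeySketch {
    // Returns true if the map accepts a null key, false if it throws.
    static boolean acceptsNullKey(Map<String, String> map) {
        try {
            map.put(null, "value");
            return true;
        } catch (NullPointerException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(acceptsNullKey(new HashMap<>()));
        System.out.println(acceptsNullKey(new Hashtable<>()));
    }
}
```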

Because HashMap is so widely used, this article takes HashMap as the example and explains how it is implemented.

1. How HashMap is implemented

Simply put, HashMap runs the key through a hash algorithm to obtain a hash value, maps that value to a memory address, and then fetches the data for the key directly from that address. The underlying data structure of HashMap is an array, and the so-called memory address is an index into that array.

In pseudocode, this can be represented as:

    object[key_hash] = value;

2. Hash conflicts

When two elements to be stored, 1 and 2, hash to the same address, how does HashMap make sure that both are stored?

The bottom layer of HashMap is an array, but the elements of the array are not plain values; each one is an object of an entry class. Each entry holds several fields: key, value, next, and hash. Note the next field, which points to another entry. When a put() operation collides, the new entry takes the slot in the array and, so that the old entry is not lost, its next is pointed at the old entry. This allows several entries to be stored in one array slot, so HashMap is in effect an array of linked lists.

When a get() operation is performed, if the array slot it lands on holds no linked list (the entry's next is null), that entry is returned directly, provided its key matches. If the slot holds a linked list, the list is traversed and the matching entry is found with the equals() method of the key object.
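The array-of-linked-lists idea can be sketched in a few dozen lines. This is a simplified illustration of the description above, not the HashMap source: ChainedMapSketch, Entry, and indexFor are hypothetical names, the table never grows, and the capacity is assumed to be positive.

```java
import java.util.Objects;

// Illustrative sketch: a fixed-size hash table that chains colliding entries.
class ChainedMapSketch<K, V> {
    private static final class Entry<K, V> {
        final K key;
        V value;
        final int hash;
        Entry<K, V> next; // next entry in the same array slot, or null

        Entry(K key, V value, int hash, Entry<K, V> next) {
            this.key = key;
            this.value = value;
            this.hash = hash;
            this.next = next;
        }
    }

    private final Entry<K, V>[] table;

    @SuppressWarnings("unchecked")
    ChainedMapSketch(int capacity) {
        table = (Entry<K, V>[]) new Entry[capacity];
    }

    // Maps a hash value to an array index (an assumption of this sketch).
    private int indexFor(int hash) {
        return (hash & 0x7fffffff) % table.length;
    }

    void put(K key, V value) {
        int hash = Objects.hashCode(key);
        int i = indexFor(hash);
        for (Entry<K, V> e = table[i]; e != null; e = e.next) {
            if (e.hash == hash && Objects.equals(e.key, key)) {
                e.value = value; // same key: overwrite the value
                return;
            }
        }
        // Collision or empty slot: the new entry takes the slot and points at the old one.
        table[i] = new Entry<>(key, value, hash, table[i]);
    }

    V get(K key) {
        int hash = Objects.hashCode(key);
        for (Entry<K, V> e = table[indexFor(hash)]; e != null; e = e.next) {
            if (e.hash == hash && Objects.equals(e.key, key)) {
                return e.value;
            }
        }
        return null;
    }
}
```

The strings "Aa" and "BB" have the same hashCode(), so they make a convenient pair for exercising the collision path.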

3. Capacity Parameters

Like ArrayList, any array-based structure has to be expanded when the array runs out of space. Reorganizing the array is time-consuming, so some tuning is worthwhile.

HashMap provides two constructors that accept an initial size:

    HashMap(int initialCapacity)

Constructs an empty HashMap with the specified initial capacity and the default load factor (0.75).

    HashMap(int initialCapacity, float loadFactor)

Constructs an empty HashMap with the specified initial capacity and load factor.

As the size of its internal array, HashMap uses the smallest power of 2 that is greater than or equal to initialCapacity.

The load factor, also called the fill ratio, is a floating-point number, normally between 0 and 1.

Load factor = number of actual elements / total size of the internal array

The role of the load factor is to determine the threshold of the HashMap.

Threshold = total array capacity × load factor

When the number of entries in the HashMap exceeds the threshold, the array is expanded; each expansion doubles the size of the array.

By default, the initial size of HashMap is 16 and the load factor is 0.75.

    static final int DEFAULT_INITIAL_CAPACITY = 1 << 4; // aka 16

    static final float DEFAULT_LOAD_FACTOR = 0.75f;
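The two calculations above, rounding the requested capacity up to a power of 2 and deriving the threshold, can be worked through as follows. This is a sketch of the arithmetic only, not the JDK's own code; the class and method names are hypothetical, and the upper bound of 1 << 30 is an assumption added to keep the loop from overflowing.

```java
// Illustrative sketch: the capacity and threshold arithmetic described above.
class CapacitySketch {
    // Smallest power of 2 that is >= initialCapacity, capped at 1 << 30.
    static int roundUpToPowerOfTwo(int initialCapacity) {
        int size = 1;
        while (size < initialCapacity && size < (1 << 30)) {
            size <<= 1;
        }
        return size;
    }

    // Threshold = total array capacity x load factor.
    static int threshold(int capacity, float loadFactor) {
        return (int) (capacity * loadFactor);
    }

    public static void main(String[] args) {
        int capacity = roundUpToPowerOfTwo(10);
        System.out.println(capacity + " " + threshold(capacity, 0.75f));
    }
}
```

With the defaults, the array has 16 slots and the threshold is 12, so the array doubles to 32 once the map holds more than 12 entries.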

4. LinkedHashMap

LinkedHashMap extends HashMap, so it keeps HashMap's good characteristics. On top of that, LinkedHashMap adds an internal linked list that records the order of the elements. LinkedHashMap can therefore be understood simply as a HashMap that maintains an ordering of its elements.

LinkedHashMap provides two kinds of ordering: the first is the order in which the elements were inserted, and the second is the order of most recent access.

    LinkedHashMap(int initialCapacity, float loadFactor, boolean accessOrder)

Constructs an empty LinkedHashMap instance with the specified initial capacity, load factor, and ordering mode.

When accessOrder is true, entries are ordered by the time each element was last accessed; when accessOrder is false, they are kept in insertion order. The default is false.

Internally, LinkedHashMap defines LinkedHashMap.Entry by extending HashMap.Entry, adding before and after fields that record the predecessor and successor of each entry and form a circular linked list.
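The two ordering modes can be compared with a small experiment. A minimal sketch; the class and method names and the sample keys are made up for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch: insertion order versus access order in LinkedHashMap.
class AccessOrderSketch {
    // Inserts a, b, c, reads a, then returns the keys in iteration order.
    static List<String> keysAfterAccess(boolean accessOrder) {
        Map<String, Integer> map = new LinkedHashMap<>(16, 0.75f, accessOrder);
        map.put("a", 1);
        map.put("b", 2);
        map.put("c", 3);
        map.get("a"); // in access-order mode this moves "a" to the end
        return new ArrayList<>(map.keySet());
    }

    public static void main(String[] args) {
        System.out.println(keysAfterAccess(false)); // [a, b, c]
        System.out.println(keysAfterAccess(true));  // [b, c, a]
    }
}
```

Access order is what makes LinkedHashMap a common base for simple least-recently-used caches.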

5. TreeMap

TreeMap can be understood simply as a Map implementation that keeps its entries sorted. Unlike LinkedHashMap, which orders entries by the order in which they were added or accessed, TreeMap orders entries by their keys. The ordering of the keys can be specified in two ways:

(1) Pass a Comparator to the TreeMap constructor:

    TreeMap(Comparator<? super K> comparator)

(2) Use a key type that implements the Comparable interface.

TreeMap is implemented internally as a red-black tree, a kind of balanced search tree that is not covered in detail here.

TreeMap's other ordering-related methods are as follows:

    subMap(K fromKey, K toKey)

Returns a view of the portion of this map whose keys range from fromKey (inclusive) to toKey (exclusive).

    tailMap(K fromKey)

Returns a view of the portion of this map whose keys are greater than or equal to fromKey.

    firstKey()

Returns the first (lowest) key currently in this map.

    headMap(K toKey)

Returns a view of the portion of this map whose keys are strictly less than toKey.

A simple example is as follows:
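A minimal sketch of the range-view methods, using made-up keys and values; the class and method names are hypothetical.

```java
import java.util.TreeMap;

// Illustrative sketch: range views on a TreeMap with Integer keys.
class TreeMapViewsSketch {
    // Builds a small sample map; keys are inserted out of order on purpose.
    static TreeMap<Integer, String> sample() {
        TreeMap<Integer, String> map = new TreeMap<>();
        map.put(3, "three");
        map.put(1, "one");
        map.put(5, "five");
        map.put(2, "two");
        map.put(4, "four");
        return map;
    }

    public static void main(String[] args) {
        TreeMap<Integer, String> map = sample();
        System.out.println(map.firstKey());            // 1
        System.out.println(map.subMap(2, 4).keySet()); // [2, 3]
        System.out.println(map.headMap(3).keySet());   // [1, 2]
        System.out.println(map.tailMap(3).keySet());   // [3, 4, 5]
    }
}
```

The returned maps are views: changes made through them are reflected in the original TreeMap.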

3. Set interface

Set adds no operations beyond those of the Collection interface; the elements in a Set cannot be duplicated.

The most important implementations are HashSet, LinkedHashSet, and TreeSet. They are not described one by one here, because each of these Set implementations is just a wrapper around the corresponding Map.
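Because each Set wraps the corresponding Map, it inherits that Map's ordering behavior. A minimal sketch with made-up values; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Illustrative sketch: duplicate handling and iteration order of the Set implementations.
class SetOrderSketch {
    // Adds b, a, c, a to the set and returns the elements in iteration order.
    static List<String> order(Set<String> set) {
        Collections.addAll(set, "b", "a", "c", "a");
        return new ArrayList<>(set);
    }

    public static void main(String[] args) {
        System.out.println(order(new LinkedHashSet<>())); // [b, a, c]
        System.out.println(order(new TreeSet<>()));       // [a, b, c]
        System.out.println(order(new HashSet<>()).size()); // 3
    }
}
```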

4. Optimize the collection access code

1. Hoist repeated calls out of the loop

For example, when a for loop is used to iterate over a collection:

    for (int i = 0; i < collection.size(); i++) {
        // ...
    }

Obviously, size() is called on every iteration, and it returns the same value each time. Hoisting all such code out of the loop improves loop performance. The previous code can therefore be transformed into:

    int size = collection.size();
    for (int i = 0; i < size; i++) {
        // ...
    }

The more elements there are, the more worthwhile this becomes.

2. Eliminate identical operations

Suppose there is an operation like the following:

    int size = list.size();
    for (int i = 0; i < size; i++) {
        if (list.get(i) == 1 || list.get(i) == 2 || list.get(i) == 3) {
            // ...
        }
    }

Although the return value of get(i) differs from one iteration to the next, it is the same within a single iteration, so the repeated calls can be extracted:

    int size = list.size();
    int k = 0;
    for (int i = 0; i < size; i++) {
        // Call get(i) once per iteration and reuse the result.
        if ((k = list.get(i)) == 1 || k == 2 || k == 3) {
            // ...
        }
    }

3. Reduce method calls

A method call consumes stack space and other system resources. Where possible, accessing the internal elements directly instead of calling the corresponding interface method is more efficient.

Assuming the code above is part of a subclass of Vector, it can be rewritten as:

    int size = this.elementCount;
    Object k = null;
    for (int i = 0; i < size; i++) {
        // Strings are compared with equals(), not ==.
        if ("1".equals(k = elementData[i]) || "2".equals(k) || "3".equals(k)) {
            // ...
        }
    }

As you can see, the original size() and get() calls are replaced by direct access to the underlying fields, which helps performance.
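A complete version of that idea might look like the following. It is a sketch under stated assumptions: CountingVector and countMatches are hypothetical names, the loop body simply counts matches, and the method is synchronized to match Vector's own locking. The protected fields elementCount and elementData are inherited from Vector.

```java
import java.util.Vector;

// Illustrative sketch: a Vector subclass that reads its internal fields directly.
class CountingVector extends Vector<String> {
    private static final long serialVersionUID = 1L;

    // Counts elements equal to "1", "2", or "3" without calling size() or get().
    synchronized int countMatches() {
        int size = this.elementCount;
        int count = 0;
        for (int i = 0; i < size; i++) {
            Object k = elementData[i];
            if ("1".equals(k) || "2".equals(k) || "3".equals(k)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        CountingVector v = new CountingVector();
        v.add("1");
        v.add("x");
        v.add("3");
        System.out.println(v.countMatches());
    }
}
```

This trades encapsulation for speed, so it is only worth considering in code that is measurably hot.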

5. RandomAccess interface

RandomAccess is a marker interface that declares no methods of its own. Any object that implements RandomAccess can be considered to support fast random access. The primary purpose of the interface is to identify the List implementations that support fast random access.

In the JDK, every array-based List implementation implements RandomAccess, while the linked-list-based implementation does not. This is easy to understand: only an array can be accessed quickly at random (for example, through object[5] or object[6]), whereas random access to a linked list has to traverse the list.

In practice, list instanceof RandomAccess can be used to check whether an object implements the RandomAccess interface, and the result decides whether to access it by index or through an iterator.

In an application that needs random access to a List by index, avoid LinkedList; ArrayList and Vector are good choices.
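The instanceof check described above can be sketched as a method that picks its traversal strategy at run time. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedList;
import java.util.List;
import java.util.RandomAccess;

// Illustrative sketch: choose index access or an iterator based on RandomAccess.
class RandomAccessSketch {
    static long sum(List<Integer> list) {
        long total = 0;
        if (list instanceof RandomAccess) {
            // Array-backed list: index access is cheap.
            for (int i = 0, n = list.size(); i < n; i++) {
                total += list.get(i);
            }
        } else {
            // Linked list: use the iterator to avoid walking the list on every get(i).
            for (int value : list) {
                total += value;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(sum(new ArrayList<>(Arrays.asList(1, 2, 3))));
        System.out.println(sum(new LinkedList<>(Arrays.asList(1, 2, 3))));
    }
}
```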

Reference

Java Program Performance optimization (Ge Yi)

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
