Java Core Data Structure Summary

Source: Internet
Author: User


The JDK provides a set of core data structures, such as List, Set, and Map. List and Set extend the java.util.Collection interface; Map is a separate interface in the same java.util package.

  • List Interface

List has three main implementations: ArrayList, Vector, and LinkedList. ArrayList and Vector are backed by arrays and encapsulate operations on the internal array. LinkedList is built on a doubly linked list, which in older JDK versions (JDK 6 and earlier) is circular. A linked list is made up of a series of nodes (table items). Each node consists of three parts: the element content, a reference to the predecessor node, and a reference to the successor node.

Figure: structure of a LinkedList node

Figure: connections between LinkedList nodes

  

In that circular design, the list always has a header node, whether or not the list is empty. The header marks both the beginning and the end of the list: its successor is the first element of the list, and its predecessor is the last element. Since JDK 7, LinkedList instead keeps separate first and last references and is no longer circular, which is the version the LinkedList source code later in this section comes from.

The following compares the two kinds of List implementation, array-based and linked-list-based:

1. Adding elements to the end of the list:

The source code in ArrayList is as follows:

    public boolean add(E e) {
        ensureCapacityInternal(size + 1);  // Increments modCount!!
        elementData[size++] = e;
        return true;
    }

The performance of add() depends on ensureCapacityInternal(), which ends up calling grow() when the backing array is too small:

    private void grow(int minCapacity) {
        // overflow-conscious code
        int oldCapacity = elementData.length;
        int newCapacity = oldCapacity + (oldCapacity >> 1);
        if (newCapacity - minCapacity < 0)
            newCapacity = minCapacity;
        if (newCapacity - MAX_ARRAY_SIZE > 0)
            newCapacity = hugeCapacity(minCapacity);
        // minCapacity is usually close to size, so this is a win:
        elementData = Arrays.copyOf(elementData, newCapacity);
    }

As you can see, when ArrayList needs more capacity than the current array provides, it grows the array to roughly 1.5 times its old capacity. Growing requires copying every existing element into the new array. The copy is done by Arrays.copyOf(), which in turn calls System.arraycopy(), so it is quite efficient.
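The growth rule can be traced with a small sketch. The class and method names here (GrowthSketch, nextCapacity, capacitySteps) are illustrative assumptions and not part of the JDK; only the oldCapacity + (oldCapacity >> 1) formula is taken from the source above, and the overflow handling is left out.

```java
import java.util.ArrayList;
import java.util.List;

class GrowthSketch {
    // Same formula as grow(): the new capacity is the old capacity plus half of it.
    // Overflow handling (MAX_ARRAY_SIZE, hugeCapacity) is deliberately omitted.
    static int nextCapacity(int oldCapacity, int minCapacity) {
        int newCapacity = oldCapacity + (oldCapacity >> 1);
        return Math.max(newCapacity, minCapacity);
    }

    // Capacities the backing array passes through while growing from
    // startCapacity until it can hold targetSize elements.
    static List<Integer> capacitySteps(int startCapacity, int targetSize) {
        List<Integer> steps = new ArrayList<>();
        int capacity = startCapacity;
        steps.add(capacity);
        while (capacity < targetSize) {
            capacity = nextCapacity(capacity, capacity + 1);
            steps.add(capacity);
        }
        return steps;
    }

    public static void main(String[] args) {
        // Starting from a capacity of 10, reaching 100 elements takes six resizes.
        System.out.println(capacitySteps(10, 100));
    }
}
```

Each step in the returned list corresponds to one full copy of the array, which is why the number of resizes matters more than the cost of any single append.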

The add() method in the LinkedList source code:

    public boolean add(E e) {
        linkLast(e);
        return true;
    }

The linkLast() method is as follows:

    void linkLast(E e) {
        final Node<E> l = last;
        final Node<E> newNode = new Node<>(l, e, null);
        last = newNode;
        if (l == null)
            first = newNode;
        else
            l.next = newNode;
        size++;
        modCount++;
    }

Because LinkedList is based on a linked list, it does not need to maintain a capacity. However, every added element requires creating a new Node object and assigning several references, and when add() is called frequently this allocation has a measurable cost. The author's performance tests show that appending to the end of an ArrayList is faster than appending to a LinkedList. Because the array is contiguous and an append triggers a resize only when space runs out, the append operation is efficient in most cases.

2. Adding an element at any position in the list:

The List interface also provides the void add(int index, E element) method to insert an element at any position. ArrayList and LinkedList differ in how they implement it. Because ArrayList is based on an array, and an array is a contiguous block of memory, inserting an element at an arbitrary position requires shifting every element after that position, which is relatively inefficient.

ArrayList source code implementation:

    public void add(int index, E element) {
        rangeCheckForAdd(index);
        ensureCapacityInternal(size + 1);  // Increments modCount!!
        System.arraycopy(elementData, index, elementData, index + 1,
                         size - index);
        elementData[index] = element;
        size++;
    }

We can see that every insertion performs an array copy, and a large number of copies hurts performance. The closer the insertion point is to the front of the array, the more elements must be shifted and the higher the cost. Inserting near the end of the list whenever possible therefore improves the performance of this method.
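The cost difference between front and back insertion can be seen in a minimal sketch that applies the same System.arraycopy() shift to a plain int array. ArrayInsertSketch and insertAt are hypothetical names used only for illustration; the helper assumes the array already has a free slot and does no range checking.

```java
import java.util.Arrays;

class ArrayInsertSketch {
    // Inserts value at index in an array that holds `size` live elements and
    // has at least one free slot. Returns how many elements had to be shifted.
    static int insertAt(int[] data, int size, int index, int value) {
        int moved = size - index;
        System.arraycopy(data, index, data, index + 1, moved);
        data[index] = value;
        return moved;
    }

    public static void main(String[] args) {
        int[] front = {1, 2, 3, 4, 0};
        int movedFront = insertAt(front, 4, 0, 9);  // shifts all four elements
        System.out.println(Arrays.toString(front) + " moved=" + movedFront);

        int[] back = {1, 2, 3, 4, 0};
        int movedBack = insertAt(back, 4, 4, 9);    // shifts nothing
        System.out.println(Arrays.toString(back) + " moved=" + movedBack);
    }
}
```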

The LinkedList source code:

    public void add(int index, E element) {
        checkPositionIndex(index);

        if (index == size)
            linkLast(element);
        else
            linkBefore(element, node(index));
    }

    void linkBefore(E e, Node<E> succ) {
        // assert succ != null;
        final Node<E> pred = succ.prev;
        final Node<E> newNode = new Node<>(pred, e, succ);
        succ.prev = newNode;
        if (pred == null)
            first = newNode;
        else
            pred.next = newNode;
        size++;
        modCount++;
    }

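For LinkedList, linking a new node costs the same at the end of the list as at any other position, because no existing elements are shifted. Note, however, that add(int index, E element) first calls node(index) to locate the position, which walks the list from the nearer end, so inserting near the middle of a long list still takes time proportional to the distance walked. If an application frequently inserts elements at arbitrary positions, especially near the front, consider using LinkedList instead of ArrayList.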
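When several insertions happen during a single pass over the list, a ListIterator lets the new node be linked at the current cursor position without a positional lookup for each insertion. The following is a minimal sketch of that idea; IteratorInsertSketch and insertBefore are hypothetical names, and the code uses only the standard java.util.ListIterator methods.

```java
import java.util.Arrays;
import java.util.LinkedList;
import java.util.List;
import java.util.ListIterator;

class IteratorInsertSketch {
    // Inserts `marker` before every element equal to `target` in one pass.
    // ListIterator.add() inserts at the cursor, so no index lookup is
    // repeated for each insertion.
    static void insertBefore(List<String> list, String target, String marker) {
        ListIterator<String> it = list.listIterator();
        while (it.hasNext()) {
            if (it.next().equals(target)) {
                it.previous();   // step back so the cursor sits before target
                it.add(marker);  // the new element lands before target
                it.next();       // move past target again
            }
        }
    }

    public static void main(String[] args) {
        List<String> list = new LinkedList<>(Arrays.asList("a", "b", "c", "b"));
        insertBefore(list, "b", "x");
        System.out.println(list);  // [a, x, b, c, x, b]
    }
}
```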

3. Deleting an element at any position:

The List interface also provides the remove(int index) method to delete the element at any position. In ArrayList, remove() behaves like add(int index, E element): removing an element from an arbitrary position requires an array copy.

The source code of the remove () method of ArrayList is as follows:

    public E remove(int index) {
        rangeCheck(index);

        modCount++;
        E oldValue = elementData(index);

        int numMoved = size - index - 1;
        if (numMoved > 0)
            System.arraycopy(elementData, index+1, elementData, index,
                             numMoved);
        elementData[--size] = null; // clear to let GC do its work

        return oldValue;
    }

Every deletion in ArrayList, other than removing the last element, shifts the elements that follow the deleted one. The closer the deleted element is to the front of the array, the more elements are shifted and the higher the cost.

Source code of the remove() method of LinkedList:

    public E remove(int index) {
        checkElementIndex(index);
        return unlink(node(index));
    }

    E unlink(Node<E> x) {
        // assert x != null;
        final E element = x.item;
        final Node<E> next = x.next;
        final Node<E> prev = x.prev;

        if (prev == null) {
            first = next;
        } else {
            prev.next = next;
            x.prev = null;
        }

        if (next == null) {
            last = prev;
        } else {
            next.prev = prev;
            x.next = null;
        }

        x.item = null;
        size--;
        modCount++;
        return element;
    }

    Node<E> node(int index) {
        // assert isElementIndex(index);

        if (index < (size >> 1)) {
            Node<E> x = first;
            for (int i = 0; i < index; i++)
                x = x.next;
            return x;
        } else {
            Node<E> x = last;
            for (int i = size - 1; i > index; i--)
                x = x.prev;
            return x;
        }
    }

LinkedList first locates the element to delete with a loop. If the element is in the first half of the list, it searches from the front toward the back; if it is in the second half, it searches from the back toward the front. Removing an element near the middle therefore requires traversing almost half the list. Removal is efficient near either end of the list but slow in the middle.
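The traversal cost described above can be made concrete by counting link hops. This sketch mirrors the front-half and back-half split of node(index) but only does the arithmetic; NodeLookupSketch and hops are hypothetical names, not JDK methods.

```java
class NodeLookupSketch {
    // Number of link hops node(index) performs for a list of the given size,
    // following the same split as the LinkedList source.
    static int hops(int index, int size) {
        if (index < (size >> 1)) {
            return index;             // walk forward from first
        } else {
            return size - 1 - index;  // walk backward from last
        }
    }

    public static void main(String[] args) {
        int size = 1000;
        System.out.println("first:  " + hops(0, size));
        System.out.println("middle: " + hops(size / 2, size));
        System.out.println("last:   " + hops(size - 1, size));
    }
}
```

For a list of 1000 elements the ends need no hops at all, while index 500 needs 499, which matches the observation that removal is cheapest at either end.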

4. Capacity parameter:

The capacity parameter is a performance parameter specific to array-based lists such as ArrayList and Vector. It specifies the initial size of the backing array. When the number of stored elements exceeds the array's capacity, the list grows, which means an array copy. Choosing a suitable initial capacity therefore reduces the number of resizes and improves performance.
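As a brief illustration, the initial capacity can be passed to the ArrayList constructor, or reserved later with ensureCapacity(). The class name CapacitySketch and the helper fill are assumptions made for this example; the expected element count would come from the application.

```java
import java.util.ArrayList;
import java.util.List;

class CapacitySketch {
    // Builds a list of `expected` elements. Because the initial capacity is
    // given up front, the backing array does not need to grow while filling.
    static List<Integer> fill(int expected) {
        List<Integer> list = new ArrayList<>(expected);
        for (int i = 0; i < expected; i++) {
            list.add(i);
        }
        return list;
    }

    public static void main(String[] args) {
        System.out.println(fill(100_000).size());

        // For a list that already exists, capacity can be reserved in one step
        // before a large batch of additions.
        ArrayList<Integer> existing = new ArrayList<>();
        existing.ensureCapacity(100_000);
        for (int i = 0; i < 100_000; i++) {
            existing.add(i);
        }
        System.out.println(existing.size());
    }
}
```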

5. Traversing the list:

Since JDK 1.5, there are at least three ways to traverse a list: the for-each loop, an explicit iterator, and an indexed for loop. The author's tests show that for-each performs slightly worse overall than an explicit iterator (the for-each loop is compiled into iterator calls, so any difference is small). With an indexed for loop, ArrayList performs best while LinkedList performs poorly, because every random access on a LinkedList requires a traversal of the list.

ArrayList is array-based, so random access is fast and is the preferred way to use it. LinkedList is based on a linked list, so random access is slow and should be avoided.
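The three traversal styles look as follows. This is only a sketch of the syntax with hypothetical method names; it makes no claim about timing, which depends on the list implementation as described above.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

class TraversalSketch {
    // for-each loop: the compiler turns this into iterator calls.
    static long sumForEach(List<Integer> list) {
        long total = 0;
        for (int value : list) {
            total += value;
        }
        return total;
    }

    // Explicit iterator.
    static long sumIterator(List<Integer> list) {
        long total = 0;
        for (Iterator<Integer> it = list.iterator(); it.hasNext();) {
            total += it.next();
        }
        return total;
    }

    // Indexed for loop: each get(i) is a random access, which is cheap on
    // ArrayList but requires a walk on LinkedList.
    static long sumIndexed(List<Integer> list) {
        long total = 0;
        for (int i = 0; i < list.size(); i++) {
            total += list.get(i);
        }
        return total;
    }

    public static void main(String[] args) {
        List<Integer> list = Arrays.asList(1, 2, 3, 4);
        System.out.println(sumForEach(list) + " " + sumIterator(list) + " " + sumIndexed(list));
    }
}
```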

  • Map Interface

The main implementations of the Map interface are HashMap, Hashtable, LinkedHashMap, and TreeMap. There is also the Properties class, which is a subclass of Hashtable.

1. HashMap and Hashtable

First, the differences between HashMap and Hashtable. Most methods of Hashtable are synchronized, while those of HashMap are not, so HashMap is not thread-safe. Second, Hashtable does not allow null as a key or a value, while HashMap does. Third, their internal algorithms differ: they hash keys and map hash values to array indexes in different ways.
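The null-handling difference is easy to observe. The sketch below uses a hypothetical helper, acceptsNulls, that tries to store a null key and a null value and reports whether the map accepted them.

```java
import java.util.HashMap;
import java.util.Hashtable;
import java.util.Map;

class NullKeySketch {
    // Returns true if the map accepts both a null key and a null value.
    static boolean acceptsNulls(Map<String, String> map) {
        try {
            map.put(null, "value");
            map.put("key", null);
            return true;
        } catch (NullPointerException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("HashMap:   " + acceptsNulls(new HashMap<>()));
        System.out.println("Hashtable: " + acceptsNulls(new Hashtable<>()));
    }
}
```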

HashMap applies a hash function to the key and maps the resulting hash value to a memory address, so that it can fetch the data for the key directly. Internally HashMap uses an array, so the "memory address" is simply an index into that array.

Any discussion of HashMap must mention hash collisions. A collision occurs when two elements that need to be stored, say element 1 and element 2, are hashed to the same memory address.

  

HashMap is backed by an array, but the elements of the array are Entry objects rather than plain values.

HashMap maintains an Entry array internally. Each Entry holds a key, a value, a next reference, and a hash; next points to another Entry. The put() method shows what happens on a collision: the new Entry is placed at the computed index, taking the place of the Entry that was there, and so that the old Entry is not lost, the new Entry's next is pointed at it. In this way a single array slot stores multiple entries as a chain. This describes the JDK 7 source shown below; later JDK versions changed some of these details.

Source code of the put() operation of HashMap:

    public V put(K key, V value) {
        if (table == EMPTY_TABLE) {
            inflateTable(threshold);
        }
        if (key == null)
            return putForNullKey(value);
        int hash = hash(key);
        int i = indexFor(hash, table.length);
        for (Entry<K,V> e = table[i]; e != null; e = e.next) {
            Object k;
            if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
                V oldValue = e.value;  // get the old value
                e.value = value;
                e.recordAccess(this);
                return oldValue;       // return the old value
            }
        }

        modCount++;
        addEntry(hash, key, value, i); // add the current entry at position i
        return null;
    }

    void addEntry(int hash, K key, V value, int bucketIndex) {
        if ((size >= threshold) && (null != table[bucketIndex])) {
            resize(2 * table.length);
            hash = (null != key) ? hash(key) : 0;
            bucketIndex = indexFor(hash, table.length);
        }

        createEntry(hash, key, value, bucketIndex);
    }

    void createEntry(int hash, K key, V value, int bucketIndex) {
        Entry<K,V> e = table[bucketIndex];
        // place the new entry at position i and point its next to the old entry
        table[bucketIndex] = new Entry<>(hash, key, value, e);
        size++;
    }

With this implementation, as long as hashCode() and hash() are implemented well enough to keep collisions to a minimum, an operation on a HashMap is roughly equivalent to a random access into an array and performs well. If they are implemented poorly and many keys collide, however, the HashMap degrades into a handful of linked lists and performs poorly.
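The degenerate case can be reproduced with a key type whose hashCode() is deliberately constant. BadKey below is a hypothetical class written only for this sketch. The map still behaves correctly, because equals() distinguishes the keys, but every entry lands in the same bucket, so lookups can no longer jump straight to the right slot.

```java
import java.util.HashMap;
import java.util.Map;

class CollisionSketch {
    // Hypothetical key type: every instance reports the same hash code,
    // so all keys collide in a single bucket.
    static final class BadKey {
        final String name;

        BadKey(String name) {
            this.name = name;
        }

        @Override
        public int hashCode() {
            return 1;
        }

        @Override
        public boolean equals(Object other) {
            return other instanceof BadKey && ((BadKey) other).name.equals(name);
        }
    }

    // Stores n entries whose keys all collide.
    static Map<BadKey, Integer> build(int n) {
        Map<BadKey, Integer> map = new HashMap<>();
        for (int i = 0; i < n; i++) {
            map.put(new BadKey("k" + i), i);
        }
        return map;
    }

    public static void main(String[] args) {
        Map<BadKey, Integer> map = build(100);
        System.out.println(map.size());
        System.out.println(map.get(new BadKey("k42")));
    }
}
```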

2. Capacity parameters:

Because HashMap and Hashtable are backed by arrays, they must grow the array when it runs out of space. Growing means copying the existing entries into a new, larger array, which has a significant performance cost.

HashMap constructor:

    public HashMap(int initialCapacity)
    public HashMap(int initialCapacity, float loadFactor)

initialCapacity specifies the initial capacity of the HashMap. loadFactor is the load factor (number of elements / capacity of the internal array). HashMap also defines a threshold, which is the product of the current array capacity and the load factor. When the number of entries exceeds the threshold, the array is expanded.
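The relationship between capacity, load factor, and threshold is simple arithmetic, sketched below. ThresholdSketch, threshold, and capacityFor are hypothetical names; the real HashMap additionally rounds the capacity up to a power of two, which this sketch does not model.

```java
import java.util.HashMap;
import java.util.Map;

class ThresholdSketch {
    // threshold = capacity * loadFactor, as described above.
    static int threshold(int capacity, float loadFactor) {
        return (int) (capacity * loadFactor);
    }

    // Hypothetical helper: an initial capacity large enough that `expected`
    // entries do not push the map past its threshold.
    static int capacityFor(int expected, float loadFactor) {
        return (int) Math.ceil(expected / loadFactor);
    }

    public static void main(String[] args) {
        // With the default capacity of 16 and load factor of 0.75,
        // the threshold is 12 entries.
        System.out.println(threshold(16, 0.75f));

        // Pre-size a map for roughly 100 entries to avoid resizing while filling it.
        Map<String, Integer> map = new HashMap<>(capacityFor(100, 0.75f), 0.75f);
        for (int i = 0; i < 100; i++) {
            map.put("key" + i, i);
        }
        System.out.println(map.size());
    }
}
```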

In addition, the performance of HashMap depends to some extent on the implementation of hashCode(). A good hashCode() implementation minimizes collisions and improves the access speed of HashMap.

3. LinkedHashMap

A notable drawback of HashMap is that it is unordered: the order in which entries come back during traversal has no relation to the order in which they were put in. To preserve the order of insertion, use LinkedHashMap.

LinkedHashMap inherits from HashMap, so it shares HashMap's good performance. On top of HashMap, LinkedHashMap adds a linked list that records the order of the elements. LinkedHashMap supports two kinds of order: the order in which elements were inserted, and the order in which they were most recently accessed.

    public LinkedHashMap(int initialCapacity,
                         float loadFactor,
                         boolean accessOrder)

If accessOrder is true, entries are ordered by when they were last accessed, with the least recently accessed entry first. If accessOrder is false, entries are kept in insertion order.
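The two orderings can be compared with a short sketch. AccessOrderSketch and keysAfterAccess are names invented for this example; the three-argument constructor is the one shown above.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class AccessOrderSketch {
    // Inserts a, b, c, reads "a", and returns the resulting key order.
    static List<String> keysAfterAccess(boolean accessOrder) {
        Map<String, Integer> map = new LinkedHashMap<>(16, 0.75f, accessOrder);
        map.put("a", 1);
        map.put("b", 2);
        map.put("c", 3);
        map.get("a");  // in access order, this moves "a" to the end
        return new ArrayList<>(map.keySet());
    }

    public static void main(String[] args) {
        System.out.println(keysAfterAccess(false));  // [a, b, c]
        System.out.println(keysAfterAccess(true));   // [b, c, a]
    }
}
```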

4. TreeMap

TreeMap keeps its entries sorted. Unlike LinkedHashMap, which orders entries by insertion or access, TreeMap orders them by the keys themselves, using either the keys' natural ordering (Comparable) or a Comparator.

TreeMap sorts by key. There are two ways to specify how keys are ordered:

1: Pass a Comparator to the TreeMap constructor.

TreeMap(Comparator<? super K> comparator);

2: Use a key that implements the Comparable interface.

TreeMap is implemented internally as a red-black tree, which is a self-balancing binary search tree. Its average-case (statistical) performance is better than that of a strictly balanced binary tree.
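Both ways of ordering keys can be shown side by side. In this sketch the class and method names are hypothetical; passing null as the comparator makes TreeMap fall back to the natural ordering of the keys.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.TreeMap;

class TreeMapOrderSketch {
    // Returns the keys in the order TreeMap iterates them.
    // A null comparator means the keys' natural (Comparable) ordering is used.
    static List<String> keys(Comparator<String> comparator) {
        TreeMap<String, Integer> map = new TreeMap<>(comparator);
        map.put("pear", 1);
        map.put("apple", 2);
        map.put("fig", 3);
        return new ArrayList<>(map.keySet());
    }

    public static void main(String[] args) {
        System.out.println(keys(null));                                   // [apple, fig, pear]
        System.out.println(keys(Comparator.reverseOrder()));              // [pear, fig, apple]
        System.out.println(keys(Comparator.comparingInt(String::length))); // [fig, pear, apple]
    }
}
```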

  • Set Interface

A Set cannot contain duplicate elements. The main implementations are HashSet, LinkedHashSet, and TreeSet. Looking at these implementation classes, you will find that each of them is a wrapper around the corresponding Map (HashMap, LinkedHashMap, and TreeMap respectively).
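Because each Set wraps the corresponding Map, it inherits that Map's ordering behaviour. The following sketch, with hypothetical helper names, removes duplicates from a list in three ways.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.TreeSet;

class SetSketch {
    // HashSet: duplicates removed, iteration order unspecified.
    static int distinctCount(List<Integer> input) {
        return new HashSet<>(input).size();
    }

    // LinkedHashSet: duplicates removed, first-seen order preserved.
    static List<Integer> dedupeKeepOrder(List<Integer> input) {
        return new ArrayList<>(new LinkedHashSet<>(input));
    }

    // TreeSet: duplicates removed, elements sorted.
    static List<Integer> dedupeSorted(List<Integer> input) {
        return new ArrayList<>(new TreeSet<>(input));
    }

    public static void main(String[] args) {
        List<Integer> input = Arrays.asList(3, 1, 3, 2, 1);
        System.out.println(distinctCount(input));    // 3
        System.out.println(dedupeKeepOrder(input));  // [3, 1, 2]
        System.out.println(dedupeSorted(input));     // [1, 2, 3]
    }
}
```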

Set features:

  • Optimization suggestions for collection operations

1. Hoist code that is called repeatedly out of the loop. For example, in for (int i = 0; i < list.size(); i++), the call to list.size() can be moved out of the loop.

2. Eliminate repeated identical operations.

3. Reduce method calls. Each method call consumes stack space, which costs performance.
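The first suggestion can be sketched as follows. LoopHoistSketch and sum are hypothetical names, and the sketch assumes the list is not modified while the loop runs, since otherwise the cached size would be stale.

```java
import java.util.Arrays;
import java.util.List;

class LoopHoistSketch {
    static long sum(List<Integer> list) {
        long total = 0;
        int size = list.size();  // evaluated once instead of on every iteration
        for (int i = 0; i < size; i++) {
            total += list.get(i);
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(sum(Arrays.asList(1, 2, 3, 4)));
    }
}
```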

  • RandomAccess Interface

  The RandomAccess interface is a marker interface and declares no methods. Its main purpose is to mark List implementations that support fast random access. For example, code can check whether a list implements RandomAccess and choose a different traversal strategy accordingly to improve performance.
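A minimal sketch of that check is shown below, assuming a hypothetical helper named sum. It uses an indexed loop for lists that implement RandomAccess (such as ArrayList) and an iterator for the rest (such as LinkedList).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.LinkedList;
import java.util.List;
import java.util.RandomAccess;

class RandomAccessSketch {
    static long sum(List<Integer> list) {
        long total = 0;
        if (list instanceof RandomAccess) {
            // Fast random access: an indexed loop is fine.
            for (int i = 0, n = list.size(); i < n; i++) {
                total += list.get(i);
            }
        } else {
            // No fast random access: walk the list with an iterator.
            for (Iterator<Integer> it = list.iterator(); it.hasNext();) {
                total += it.next();
            }
        }
        return total;
    }

    public static void main(String[] args) {
        List<Integer> values = Arrays.asList(1, 2, 3, 4);
        System.out.println(sum(new ArrayList<>(values)));
        System.out.println(sum(new LinkedList<>(values)));
    }
}
```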

  
