Collection types
Set: Elements cannot be duplicated. A Set in general does not guarantee any order and offers no positional access, so you cannot get an element by index. (TreeSet keeps its elements sorted, but still has no index-based access.) TreeSet and HashSet are implementation classes.
List: Elements can be duplicated and are kept in order, so you can access an element by its position. ArrayList and LinkedList are implementation classes.
Map: Holds key-value pairs. Keys cannot be duplicated. A Map in general does not guarantee storage order (HashMap does not; TreeMap keeps its keys sorted). HashMap and TreeMap are implementation classes.
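The three behaviours described above can be seen in a few lines. This is only a small sketch; the class name CollectionTraitsDemo and its helper methods are made up for illustration and are not part of the JDK.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class CollectionTraitsDemo {
    /** Number of distinct values: a Set silently drops duplicates. */
    static int distinctCount(List<String> values) {
        Set<String> unique = new HashSet<>(values);
        return unique.size();
    }

    /** A TreeSet iterates in sorted order, whatever the insertion order was. */
    static List<String> sorted(List<String> values) {
        return new ArrayList<>(new TreeSet<>(values));
    }

    /** Putting an existing key replaces its value; the map keeps one entry per key. */
    static Map<String, Integer> lastValuePerKey() {
        Map<String, Integer> map = new HashMap<>();
        map.put("a", 1);
        map.put("a", 2); // same key: value 1 is replaced by 2
        map.put("b", 3);
        return map;
    }

    public static void main(String[] args) {
        List<String> values = Arrays.asList("red", "green", "red", "blue");
        System.out.println(values.get(2));              // red (a List allows access by position)
        System.out.println(distinctCount(values));      // 3
        System.out.println(sorted(values));             // [blue, green, red]
        System.out.println(lastValuePerKey().get("a")); // 2
    }
}
```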
How to choose?
In fact, choosing between Set, List and Map depends on the shape of your data. If the data you are going to store has no duplicates and does not need an order, you can use a Set. If the data needs to be kept in order, you can use a List. If you have key-value pairs that associate two different objects, or you identify an object by an identifier, you can use a Map.
As an example:
A collection of colors is best kept in a Set.
The members of a team are best kept in a List, because the order in which they appear matters.
A collection of Web sessions is best kept in a Map: the unique session ID is a convenient key for looking up the actual session object.
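The three examples above might look like this in code. It is only a sketch: the Session class, the session IDs and the team roles are all hypothetical and exist just for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class ChoosingCollections {
    /** Hypothetical session object; only the user name is stored. */
    static final class Session {
        final String user;
        Session(String user) { this.user = user; }
    }

    /** Colors: no duplicates, order irrelevant, so a Set fits. */
    static Set<String> colors() {
        Set<String> colors = new HashSet<>();
        colors.add("red");
        colors.add("green");
        colors.add("red"); // ignored: already present
        return colors;
    }

    /** Team members: order of appearance matters, so a List fits. */
    static List<String> lineup() {
        List<String> lineup = new ArrayList<>();
        lineup.add("captain");
        lineup.add("striker");
        lineup.add("keeper");
        return lineup;
    }

    /** Sessions: looked up by a unique ID, so a Map fits. */
    static Map<String, Session> sessions() {
        Map<String, Session> sessions = new HashMap<>();
        sessions.put("id-1", new Session("alice"));
        sessions.put("id-2", new Session("bob"));
        return sessions;
    }
}
```

Here colors() ends up with two elements, lineup().get(1) is the second member added, and sessions().get("id-2") returns the session stored under that ID.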
When we choose which collection to use, our main concern is the speed of the collection:
- Speed of accessing elements
- Speed of adding a new element
- Speed of removing an element
- Speed of iteration
In addition, there is the question of consistency. Some implementations guarantee the same access speed for every element, while in others the speed varies. All of these speeds depend on the specific implementation of the collection.
Linked list implementations
Specific classes: LinkedList, LinkedHashSet (a hash table whose entries are additionally linked together in insertion order)
Internal principle: Each node holds an element and a pointer to the next node (in LinkedList each node also points to the previous node). For example:
- Adding an element at the second position is very simple. Point the first node's pointer at the newly added node, and point the new node's pointer at what used to be the second node. This is very fast! Nothing in the original collection needs to be copied or moved.
- Removing an element works the same way: to remove the second node, point the first node's pointer at the node that follows it.
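The pointer rewiring in the two bullets above can be sketched with a hand-written singly linked node. This is an illustration only, not the JDK's LinkedList (which is doubly linked); the names LinkedNodesDemo, insertAfter, removeAfter and render are made up for the example.

```java
class LinkedNodesDemo {
    /** One node: an element plus a pointer to the next node. */
    static final class Node<E> {
        E item;
        Node<E> next;
        Node(E item, Node<E> next) {
            this.item = item;
            this.next = next;
        }
    }

    /** Inserts item right after prev by rewiring two pointers; nothing is copied or shifted. */
    static <E> Node<E> insertAfter(Node<E> prev, E item) {
        Node<E> added = new Node<>(item, prev.next); // new node points at the old successor
        prev.next = added;                           // predecessor now points at the new node
        return added;
    }

    /** Unlinks the node that follows prev, if there is one. */
    static <E> void removeAfter(Node<E> prev) {
        if (prev.next != null) {
            prev.next = prev.next.next;
        }
    }

    /** Renders the chain as "A -> B -> C" for inspection. */
    static <E> String render(Node<E> head) {
        StringBuilder sb = new StringBuilder();
        for (Node<E> n = head; n != null; n = n.next) {
            if (n != head) {
                sb.append(" -> ");
            }
            sb.append(n.item);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Node<String> head = new Node<>("A", new Node<>("C", null));
        insertAfter(head, "B");
        System.out.println(render(head)); // A -> B -> C
        removeAfter(head);
        System.out.println(render(head)); // A -> C
    }
}
```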
Accessing an element of the collection by position, however, is slow. First look at the source code of LinkedList:
    /**
     * Returns the element at the specified index position
     */
    Node<E> node(int index) {
        // assert isElementIndex(index);
        if (index < (size >> 1)) {
            Node<E> x = first;
            for (int i = 0; i < index; i++)
                x = x.next;
            return x;
        } else {
            Node<E> x = last;
            for (int i = size - 1; i > index; i--)
                x = x.prev;
            return x;
        }
    }
To get the element at a given index, the method first shifts size right by one bit (that is, halves it) and compares the index with the result to decide whether the position is closer to the first node or to the last node. It then walks from that end, node by node, until it reaches the requested one.
For example, with 50 elements in the collection, getting the 20th element walks forward from the first element until it reaches the 20th; getting the 40th element walks backward from the last element until it reaches the 40th.
So collections backed by a linked list access elements by position slowly, and the speed is not consistent: it depends on how far the position is from the nearer end. Also bear in mind that adding or removing at an index calls the node method above to find the node at that position first, so that traversal affects the speed of those operations too.
So when you need fast add/remove and do not care much about positional access time, a linked-list-based collection such as LinkedList is more appropriate. It is a good choice if your collection will see a lot of add/remove operations, especially at the ends or through an iterator, where no traversal is needed.
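To put numbers on the 50-element example, the small helper below mirrors the branch in the node method and returns how many links are followed. The class and method names are hypothetical, and indexes are 0-based as in Java, so the 20th element is index 19 and the 40th is index 39.

```java
class NodeLookupCost {
    /**
     * Number of next/prev links followed to reach index in a list of the
     * given size, using the same "start from the nearer end" rule as node(index).
     */
    static int hops(int size, int index) {
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException("index " + index + ", size " + size);
        }
        if (index < (size >> 1)) {
            return index;            // walk forward from the first node
        }
        return size - 1 - index;     // walk backward from the last node
    }

    public static void main(String[] args) {
        System.out.println(hops(50, 19)); // 19 (20th element, reached from the front)
        System.out.println(hops(50, 39)); // 10 (40th element, reached from the back)
    }
}
```

The cost is highest for positions near the middle and zero at either end, which is the inconsistency in access speed mentioned above.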
Array implementations
Specific class: ArrayList
ArrayList is the most commonly used array-based implementation among the collection classes (Vector and ArrayDeque are also backed by arrays).
In an ArrayList, when we add a new element at the fourth position, every element from the fourth position onward (including the fourth) is moved back by one, and then the new element is written into the fourth position. This is slow, and the time is not constant: it depends on how many elements have to be moved. By the same token, removing an element moves all the elements after it forward by one.
There is a worse situation. The array inside an ArrayList has a fixed length (the initial capacity can be set in the constructor; the default capacity is 10). When adding an element would exceed that capacity, the list has to create a larger array and copy every element of the current array into it. That copy takes time proportional to the number of elements, so an add that triggers it is much slower than a normal add.
However, ArrayList accesses elements fast. An array occupies a contiguous block of memory, so the location of any element reference can be computed directly from its index, without traversing the collection. The time spent is also consistent, whatever the index.
So ArrayList is a good choice if you have elements that are rarely modified and need to be accessed quickly.
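The shifting described above can be reproduced on a plain int array with System.arraycopy. This is a simplified sketch, not ArrayList's actual code; ArrayShiftDemo, insertAt and removeAt are hypothetical names, and the caller is assumed to track how many slots are in use.

```java
import java.util.Arrays;

class ArrayShiftDemo {
    /**
     * Inserts value at index in an array whose first size slots are in use.
     * Requires a free slot (size < data.length). Returns how many elements were shifted.
     */
    static int insertAt(int[] data, int size, int index, int value) {
        if (index < 0 || index > size || size >= data.length) {
            throw new IllegalArgumentException("bad index or no free slot");
        }
        int moved = size - index;
        System.arraycopy(data, index, data, index + 1, moved); // shift the tail back by one
        data[index] = value;
        return moved;
    }

    /** Removes the element at index and clears the freed last slot. Returns how many elements were shifted. */
    static int removeAt(int[] data, int size, int index) {
        if (index < 0 || index >= size) {
            throw new IllegalArgumentException("bad index");
        }
        int moved = size - index - 1;
        System.arraycopy(data, index + 1, data, index, moved); // shift the tail forward by one
        data[size - 1] = 0;
        return moved;
    }

    public static void main(String[] args) {
        int[] data = {1, 2, 3, 4, 5, 0, 0, 0};                 // 5 slots in use
        System.out.println(insertAt(data, 5, 3, 99));          // 2 (elements 4 and 5 were shifted)
        System.out.println(Arrays.toString(data));             // [1, 2, 3, 99, 4, 5, 0, 0]
    }
}
```

The number of shifted elements grows the closer the position is to the front, which is why the cost of an insert or a remove is not constant.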
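To see how often the backing array gets replaced, here is a toy growable array that counts its own resizes. It is a sketch under assumptions: GrowableIntArray is a made-up class, and the growth rule (old capacity plus half of it) is modelled on what OpenJDK's ArrayList does, not copied from it.

```java
import java.util.Arrays;

class GrowableIntArray {
    private int[] data;
    private int size;
    private int resizes;

    GrowableIntArray(int initialCapacity) {
        data = new int[initialCapacity];
    }

    void add(int value) {
        if (size == data.length) {
            // Assumed growth rule: roughly 1.5x, and always at least one extra slot.
            int newCapacity = Math.max(data.length + (data.length >> 1), size + 1);
            data = Arrays.copyOf(data, newCapacity); // allocates a new array and copies every element
            resizes++;
        }
        data[size++] = value;
    }

    int resizes() {
        return resizes;
    }

    /** Adds count elements to a fresh array and reports how many times it had to grow. */
    static int resizesFor(int initialCapacity, int count) {
        GrowableIntArray array = new GrowableIntArray(initialCapacity);
        for (int i = 0; i < count; i++) {
            array.add(i);
        }
        return array.resizes();
    }

    public static void main(String[] args) {
        System.out.println(resizesFor(10, 100));  // 6 (capacities 15, 22, 33, 49, 73, 109)
        System.out.println(resizesFor(100, 100)); // 0
    }
}
```

With a real ArrayList the same effect is obtained by passing the expected size to the constructor, as in new ArrayList<>(100), or by calling ensureCapacity before a bulk add.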
Hash implementations
Specific classes: HashSet, HashMap
HashSet is based on HashMap, so I'll just explain HashMap here.
How HashMap works: A HashMap is built on an array (called the Entry table), and each slot in the array holds a linked list (called a bucket). When a new HashMap is created, this array is initialized.
    public V put(K key, V value) {
        ...
        int hash = hash(key);
        int i = indexFor(hash, table.length);
        for (Entry<K,V> e = table[i]; e != null; e = e.next) {
            Object k;
            if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
                V oldValue = e.value;
                e.value = value;
                e.recordAccess(this);
                return oldValue;
            }
        }
        ...
        addEntry(hash, key, value, i);
        return null;
    }
Above is part of the put method of HashMap (JDK 7). When we put an element into the HashMap, it first recomputes a hash value from the key, and that hash value determines the element's position (i.e. its index) in the array. It then goes to the bucket at that index and traverses it (that is, the whole linked list). If the key being added already exists in the bucket, the existing value is replaced with the new value and the old value is returned. Otherwise the new element is placed at the head of the chain, so the element that was added first ends up at the tail.
In general, the program first determines where the Entry is stored in the Entry table, based on the key of the element being put (through the hash algorithm). If the key of the new Entry and the key of an existing Entry are equal according to equals, the value of the new Entry overwrites the value of the existing Entry, but the key is not overwritten. If the two keys are not equal according to equals, the new Entry forms a chain with the Entries already in the bucket, and the new Entry sits at the head of that chain (that is, the head of the bucket).
Similarly, when getting an element from a HashMap, the hash value computed from the key is first used to determine where the Entry is stored in the Entry table, and then the equals method of the key is used to find the desired element in the linked list at that position.
Summary: Underneath, HashMap uses an Entry[] array to hold all Entry objects (each containing the hash value, the key-value pair, and a pointer to the next Entry). When an Entry needs to be stored, its position in the array is determined by the hash algorithm, and its position in the linked list at that array slot is determined by the equals method. When an Entry needs to be fetched, the hash algorithm is used to find its position in the array, and the Entry is then retrieved from the linked list at that position using the equals method.
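The array-of-chains layout summarized above fits in a few dozen lines. The following is a deliberately reduced sketch, not the JDK's HashMap: TinyChainedMap is a hypothetical class with a fixed 16-slot table, no resizing, no null keys, and it uses key.hashCode() directly where the real HashMap applies extra bit mixing.

```java
import java.util.Objects;

class TinyChainedMap<K, V> {
    /** One link of a bucket chain: cached hash, key, value and a pointer to the next entry. */
    private static final class Entry<K, V> {
        final int hash;
        final K key;
        V value;
        Entry<K, V> next;

        Entry(int hash, K key, V value, Entry<K, V> next) {
            this.hash = hash;
            this.key = key;
            this.value = value;
            this.next = next;
        }
    }

    @SuppressWarnings("unchecked")
    private final Entry<K, V>[] table = (Entry<K, V>[]) new Entry[16];
    private int size;

    /** Stores the pair; returns the previous value for an equal key, or null if there was none. */
    V put(K key, V value) {
        Objects.requireNonNull(key, "null keys are not supported in this sketch");
        int hash = key.hashCode();
        int i = hash & (table.length - 1);           // the hash picks the bucket
        for (Entry<K, V> e = table[i]; e != null; e = e.next) {
            if (e.hash == hash && e.key.equals(key)) { // equals picks the entry inside the bucket
                V old = e.value;
                e.value = value;                       // the value is overwritten, the key is kept
                return old;
            }
        }
        table[i] = new Entry<>(hash, key, value, table[i]); // new entry becomes the chain head
        size++;
        return null;
    }

    /** Looks the key up in its bucket; returns null when it is absent. */
    V get(Object key) {
        Objects.requireNonNull(key, "null keys are not supported in this sketch");
        int hash = key.hashCode();
        for (Entry<K, V> e = table[hash & (table.length - 1)]; e != null; e = e.next) {
            if (e.hash == hash && e.key.equals(key)) {
                return e.value;
            }
        }
        return null;
    }

    int size() {
        return size;
    }
}
```

The strings "Aa" and "BB" have the same hashCode in Java, so putting both sends them to the same bucket, and get still tells them apart through equals.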
HashMap's rehashing
When there are more and more elements in the HashMap, the probability of hash collisions becomes higher, because the length of the array is fixed. So, to keep lookups efficient, the HashMap array has to be expanded. The capacity automatically doubles, and when it does, the index of every existing Entry is recalculated, the order of the Entries within a chain is reversed (for Entries that stay in the same chain, in the JDK 7 implementation), and the index of the newly added Entry is recalculated as well. This is the most expensive step for performance, and the process is called rehashing.
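Why every index has to be recalculated follows from how the index is derived: the hash is masked with capacity - 1, as indexFor does in the put code shown earlier. A small sketch, where RehashIndexDemo is a hypothetical wrapper class and the hash values are arbitrary:

```java
class RehashIndexDemo {
    /** Bucket index for a hash in a table whose capacity is a power of two. */
    static int indexFor(int hash, int capacity) {
        return hash & (capacity - 1);
    }

    public static void main(String[] args) {
        // Hashes 5 and 21 share bucket 5 while the capacity is 16 ...
        System.out.println(indexFor(5, 16));  // 5
        System.out.println(indexFor(21, 16)); // 5
        // ... and are split apart once the capacity doubles to 32.
        System.out.println(indexFor(5, 32));  // 5
        System.out.println(indexFor(21, 32)); // 21
    }
}
```

After doubling, an entry either keeps its old index or moves to the old index plus the old capacity, because the mask gains exactly one extra bit.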
So when is the HashMap enlarged? When the number of elements exceeds array size * loadFactor. The default loadFactor is 0.75, which is a compromise value. By default the array size is 16, so when the number of elements exceeds 16 * 0.75 = 12, the array is expanded to 2 * 16 = 32, that is, doubled, and the position of every element in the array is recalculated. This is a very performance-intensive operation, so if we can predict the number of elements in a HashMap, presetting the capacity accordingly effectively improves its performance.
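The arithmetic above, and one way to preset the capacity, can be written down as follows. The helper names are hypothetical; only the HashMap(int initialCapacity) constructor is real JDK API, and the sizing formula assumes the default load factor of 0.75.

```java
import java.util.HashMap;
import java.util.Map;

class HashMapSizing {
    /** Element count above which a table of the given capacity is expanded. */
    static int threshold(int capacity, float loadFactor) {
        return (int) (capacity * loadFactor);
    }

    /** Initial capacity that holds expectedEntries without a resize, assuming load factor 0.75. */
    static int initialCapacityFor(int expectedEntries) {
        return (int) Math.ceil(expectedEntries / 0.75);
    }

    /** Creates a HashMap presized for the expected number of entries. */
    static <K, V> Map<K, V> presized(int expectedEntries) {
        return new HashMap<>(initialCapacityFor(expectedEntries));
    }

    public static void main(String[] args) {
        System.out.println(threshold(16, 0.75f));     // 12
        System.out.println(threshold(32, 0.75f));     // 24
        System.out.println(initialCapacityFor(1000)); // 1334
    }
}
```

HashMap rounds the requested capacity up to a power of two internally, so asking for 1334 gives a table of 2048 slots, which is large enough for 1000 entries without rehashing.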
Copyright notice: This is an original article by the blogger and may not be reproduced without the blogger's permission.