Java Collection Framework Essentials Overview (Core knowledge of Java Collection)

Directory

What collection classes are there?
- Set class
- Queue class
- List class
- Map class
HashMap implementation principle, whether it is thread-safe, and how to make it thread-safe
- The implementation principle of HashMap
- HashMap thread-safety issues

This article mainly refers to:

"Crazy Java Handout (Lite)" by Li Gang
Analysis of the HashMap Implementation Principle

What collection classes are there?

(Figure: one diagram summarizing the collection classes; the image is not included here.)

Set, Queue, and List all extend Collection, which is the root interface of most collection classes. Map is a separate interface with its own hierarchy.

Set class

HashSet: Stores elements using a hash algorithm, so access and lookup performance is good. It is not synchronized, so when two or more threads modify a HashSet at the same time, your code must synchronize access. Elements may be null. When an element is added, HashSet calls the object's hashCode() method to get its hash value and uses that value to decide where the object is stored. If two elements are equal according to equals() but their hashCode() values differ, HashSet stores them in different locations. HashSet therefore treats two elements as equal only when equals() returns true and their hashCode() values are equal. (So when you override equals() in a class, you should also override hashCode(), following the rule that if two objects are equal according to equals(), their hashCode() values must also be equal. Otherwise, when equals() returns true but the hashCode() values differ, HashSet stores the two objects at different positions in the hash table and both additions succeed, which violates the rule that a set contains no duplicates.)
LinkedHashSet: A subclass of HashSet. Unlike HashSet, it uses a linked list to maintain element order, so iteration returns elements in the order in which they were inserted. Because it has to maintain that order, its performance is slightly lower than HashSet's.
TreeSet: An implementation of the SortedSet interface. It keeps its elements in sorted order. Two ordering modes are supported: natural ordering and custom ordering. With natural ordering, TreeSet calls the elements' compareTo method to compare them, which requires the element class to implement the Comparable interface and its compareTo(Object obj) method. Common Java classes such as BigDecimal, BigInteger, all numeric wrapper classes, Character, Boolean, String, Date, and Time already implement Comparable. With custom ordering, you pass a Comparator (for example, an anonymous class) to the TreeSet constructor, and its compare method is used instead.
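The rules above can be sketched in a few lines. This is only an illustration under stated assumptions: the classes BrokenId and GoodId and the class name SetDemo are hypothetical and do not come from the original article. BrokenId deliberately returns a different hashCode() for every instance to show what happens when equals() and hashCode() disagree.

```java
import java.util.Comparator;
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;

class SetDemo {
    /** Hypothetical class: equals() compares ids, but hashCode() differs per instance. */
    static final class BrokenId {
        private static int nextHash = 0;
        private final int id;
        private final int hash = nextHash++; // deliberately inconsistent with equals()
        BrokenId(int id) { this.id = id; }
        @Override public boolean equals(Object o) {
            return o instanceof BrokenId && ((BrokenId) o).id == id;
        }
        @Override public int hashCode() { return hash; }
    }

    /** Hypothetical class that keeps equals() and hashCode() consistent. */
    static final class GoodId {
        private final int id;
        GoodId(int id) { this.id = id; }
        @Override public boolean equals(Object o) {
            return o instanceof GoodId && ((GoodId) o).id == id;
        }
        @Override public int hashCode() { return Integer.hashCode(id); }
    }

    // Two "equal" objects with different hash codes are both stored.
    static int brokenSetSize() {
        Set<BrokenId> set = new HashSet<>();
        set.add(new BrokenId(1));
        set.add(new BrokenId(1));
        return set.size(); // 2
    }

    // With a consistent hashCode(), the duplicate is rejected.
    static int goodSetSize() {
        Set<GoodId> set = new HashSet<>();
        set.add(new GoodId(1));
        set.add(new GoodId(1));
        return set.size(); // 1
    }

    // Natural ordering: String implements Comparable.
    static String naturalOrder() {
        TreeSet<String> set = new TreeSet<>();
        set.add("pear");
        set.add("apple");
        set.add("fig");
        return set.toString(); // [apple, fig, pear]
    }

    // Custom ordering: a Comparator passed to the constructor orders by length.
    static String byLength() {
        TreeSet<String> set = new TreeSet<>(new Comparator<String>() {
            @Override public int compare(String a, String b) {
                return Integer.compare(a.length(), b.length());
            }
        });
        set.add("pear");
        set.add("apple");
        set.add("fig");
        return set.toString(); // [fig, pear, apple]
    }

    public static void main(String[] args) {
        System.out.println(brokenSetSize() + " " + goodSetSize());
        System.out.println(naturalOrder() + " " + byLength());
    }
}
```

Note that a TreeSet with a custom Comparator treats elements that compare as 0 as duplicates, so two strings of the same length would not both be kept by byLength().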

Queue class

Models the queue data structure. A queue is a first-in, first-out (FIFO) container: elements are added at the tail and removed from the head.

ArrayDeque: An implementation of the Deque interface. Deque is a sub-interface of Queue that represents a double-ended queue and defines methods for manipulating elements at both ends. As the name suggests, ArrayDeque is a double-ended queue backed by an array. (Like ArrayList, it stores its elements in a dynamically reallocated Object[] array; when the number of elements exceeds the array's capacity, a new, larger Object array is allocated to hold them.) Because it provides push and pop methods, ArrayDeque can also be used as a stack.
PriorityQueue: A fairly standard queue implementation, except that elements are ordered by their size rather than by the order in which they were added. When peek or poll is called, the element returned is the smallest one, which is not necessarily the first one added. In this respect PriorityQueue breaks the basic FIFO rule of queues.
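The difference between the two queue classes is easiest to see by running them. The following is a minimal sketch; the class name QueueDemo and its method names are made up for illustration.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.PriorityQueue;
import java.util.Queue;

class QueueDemo {
    // ArrayDeque as a stack: push and pop work on the head (last in, first out).
    static String stackOrder() {
        Deque<Integer> stack = new ArrayDeque<>();
        stack.push(1);
        stack.push(2);
        stack.push(3);
        StringBuilder out = new StringBuilder();
        while (!stack.isEmpty()) {
            out.append(stack.pop());
        }
        return out.toString(); // "321"
    }

    // ArrayDeque as a queue: offer adds at the tail, poll removes from the head (FIFO).
    static String fifoOrder() {
        Queue<Integer> queue = new ArrayDeque<>();
        queue.offer(1);
        queue.offer(2);
        queue.offer(3);
        StringBuilder out = new StringBuilder();
        while (!queue.isEmpty()) {
            out.append(queue.poll());
        }
        return out.toString(); // "123"
    }

    // PriorityQueue: poll returns the smallest element, regardless of insertion order.
    static String priorityOrder() {
        Queue<Integer> queue = new PriorityQueue<>();
        queue.offer(3);
        queue.offer(1);
        queue.offer(2);
        StringBuilder out = new StringBuilder();
        while (!queue.isEmpty()) {
            out.append(queue.poll());
        }
        return out.toString(); // "123"
    }

    public static void main(String[] args) {
        System.out.println(stackOrder() + " " + fifoOrder() + " " + priorityOrder());
    }
}
```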

List class

The linear list interface.

ArrayList: Stores its elements internally in an array. It is not thread-safe.
Vector: Also stores its elements in an array. It is synchronized, but the synchronization mechanism is poorly implemented, so its performance is poor in every respect.
Stack: A subclass of Vector that models a stack. Like Vector, it is thread-safe but performs poorly, so it should be used sparingly. If you need a stack, consider ArrayDeque instead.
Fixed-length list: The Arrays utility class provides the asList method, which converts an array or a given number of objects into a List. The result is not an instance of ArrayList or Vector but of Arrays' inner class ArrayList. It is a fixed-length list: you can iterate over and replace its elements, but you cannot add or remove elements.
LinkedList: A rather special collection class that implements both the Deque interface and the List interface (so elements can be accessed by index), which means it can be used as a queue and as a stack. Elements are stored internally in a linked list, so random access is slow, but inserting and deleting elements is fast.
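The fixed-length behavior of Arrays.asList is a common source of surprises, so here is a small sketch of it. The class name FixedListDemo and its methods are hypothetical names chosen for this example.

```java
import java.util.Arrays;
import java.util.List;

class FixedListDemo {
    // Adding to the list returned by Arrays.asList throws UnsupportedOperationException.
    static boolean addIsRejected() {
        List<String> fixed = Arrays.asList("a", "b", "c");
        try {
            fixed.add("d");
            return false;
        } catch (UnsupportedOperationException e) {
            return true;
        }
    }

    // Replacing an existing element is allowed because the length does not change.
    static String replaceIsAllowed() {
        List<String> fixed = Arrays.asList("a", "b", "c");
        fixed.set(0, "z");
        return fixed.toString(); // [z, b, c]
    }

    public static void main(String[] args) {
        System.out.println(addIsRejected() + " " + replaceIsAllowed());
    }
}
```

If a resizable list is needed, copying the result into a new ArrayList is the usual way out.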

If multiple threads need to access the elements of a List at the same time, consider using the Collections utility class to wrap it as a thread-safe collection.

Map class

Stores data as key-value pairs.

Map and Set are closely related: if you treat each value in a Map as an attachment to its key, you can look at the Map as if it were a Set of keys. The Java source confirms this: Map was implemented first, and Set was then implemented by wrapping a Map whose values are ignored placeholders.

Hashtable: The name itself shows that this is an old class, since it does not follow the convention of capitalizing the first letter of each word in a class name.
HashMap: Hashtable and HashMap are both typical Map implementations, and their relationship is similar to that between Vector and ArrayList. Hashtable is the older implementation and is thread-safe, so it performs worse than HashMap. Hashtable does not allow null as a key or a value (it throws NullPointerException), whereas HashMap does. As with HashSet, an object used as a key must implement hashCode and equals. Judging whether two values are equal is simpler: it is enough that comparing the two objects with equals returns true.
LinkedHashMap: A subclass of HashMap that maintains the order of key-value pairs with a doubly linked list, so its performance is slightly lower than HashMap's.
Properties: A subclass of Hashtable whose keys and values are both of type String. As the name suggests, it is particularly handy for working with properties files: you can write the key-value pairs in the map to a properties file, or load the "name = value" entries of a properties file into the map.
TreeMap: An implementation of the SortedMap interface. It is a red-black tree in which each key-value pair is a node, and nodes are ordered by key when pairs are stored. Like TreeSet, it supports both natural ordering and custom ordering.
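The differences described in this list can be checked with a short sketch. The class name MapDemo and its helper methods are assumptions made for illustration only.

```java
import java.util.HashMap;
import java.util.Hashtable;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

class MapDemo {
    // HashMap accepts null as both key and value.
    static boolean hashMapAcceptsNull() {
        Map<String, String> map = new HashMap<>();
        map.put(null, null);
        return map.containsKey(null);
    }

    // Hashtable throws NullPointerException for a null key.
    static boolean hashtableRejectsNull() {
        Map<String, String> table = new Hashtable<>();
        try {
            table.put(null, "v");
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    // Inserts the same three keys in the order b, a, c and reports the iteration order.
    private static String keysOf(Map<String, Integer> map) {
        map.put("b", 2);
        map.put("a", 1);
        map.put("c", 3);
        return map.keySet().toString();
    }

    // LinkedHashMap keeps insertion order.
    static String linkedOrder() {
        return keysOf(new LinkedHashMap<String, Integer>()); // [b, a, c]
    }

    // TreeMap keeps keys sorted (natural ordering of String here).
    static String treeOrder() {
        return keysOf(new TreeMap<String, Integer>()); // [a, b, c]
    }

    public static void main(String[] args) {
        System.out.println(hashMapAcceptsNull() + " " + hashtableRejectsNull());
        System.out.println(linkedOrder() + " " + treeOrder());
    }
}
```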

HashMap implementation principle, whether it is thread-safe, and how to make it thread-safe

The implementation principle of HashMap

This section is based on the article "Analysis of the HashMap Implementation Principle".

Data structure of HashMap

Arrays and linked lists can both be used to store data, but they sit at two extremes.

Array

An array occupies a contiguous block of memory, so its space cost is high. Access by index, however, is fast, with a time complexity of O(1). An array is characterized by easy addressing but difficult insertion and deletion.

Linked list

A linked list's storage is scattered and its memory use is relatively loose, so its space cost is small, but lookup is slow, with a time complexity of up to O(N). A linked list is characterized by difficult addressing but easy insertion and deletion.

Hash table

Can we combine the strengths of both into a data structure that is easy to address and also easy to insert into and delete from? The answer is yes: the hash table. A hash table makes lookups convenient without taking up too much memory, and it is easy to use.

There are several ways to implement a hash table. The one explained next is the most commonly used, the chaining method (separate chaining), which can be understood as an "array of linked lists".

In this scheme the hash table consists of an array plus linked lists. Take an array of length 16 in which each slot stores the head node of a linked list. By what rule are elements placed in the array? In general the slot is obtained by hash(key) % len, that is, the hash value of the element's key modulo the array length. For example, with an array of length 16: 12 % 16 = 12, 28 % 16 = 12, 108 % 16 = 12, and 140 % 16 = 12. So 12, 28, 108, and 140 are all stored in the slot with index 12.
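The "array of linked lists" idea and the 12, 28, 108, 140 example can be reproduced with a short sketch. It assumes, for simplicity, that the key itself is used as its hash value and that keys are non-negative; the class name ChainedBucketsDemo is hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

class ChainedBucketsDemo {
    static final int LEN = 16; // array length used in the example above

    // Assumption: the key is its own hash value and is non-negative.
    static int indexOf(int key) {
        return key % LEN;
    }

    // Builds an array of linked lists and appends each key to its bucket.
    static List<LinkedList<Integer>> build(int... keys) {
        List<LinkedList<Integer>> buckets = new ArrayList<>();
        for (int i = 0; i < LEN; i++) {
            buckets.add(new LinkedList<Integer>());
        }
        for (int key : keys) {
            buckets.get(indexOf(key)).add(key);
        }
        return buckets;
    }

    // Contents of the bucket with index 12 for the keys from the example.
    static String bucketTwelve() {
        return build(12, 28, 108, 140).get(12).toString(); // [12, 28, 108, 140]
    }

    public static void main(String[] args) {
        System.out.println(bucketTwelve());
    }
}
```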

HashMap is actually built on a linear array, so the container that holds its data can be understood as a linear array. This may seem confusing: how can a linear array store and retrieve data by key-value pairs? HashMap does some extra work to make this possible.

First, HashMap implements a static inner class Entry whose important fields are key, value, and next. From key and value it is clear that Entry is the basic bean that holds one HashMap key-value pair. The linear array mentioned above is an Entry[], and the contents of the map are all stored in that Entry[].

    /**
     * The table, resized as necessary. Length MUST always be a power of two.
     */
    transient Entry[] table;

Access implementation of HashMap

Since it is a linear array, how can it be accessed randomly by key? HashMap uses a small algorithm here, which is roughly implemented as follows:

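    // Simplified sketch; table is the Entry[] array
    // On put:
    int hash = key.hashCode(); // hashCode is not detailed here; just treat each key's hash as a fixed int
    int index = Math.floorMod(hash, table.length); // floorMod keeps the index non-negative
    table[index] = value;
    // On get:
    int hash = key.hashCode();
    int index = Math.floorMod(hash, table.length);
    return table[index];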

1) put

Question: If two keys get the same index through hash % Entry[].length, is there a risk that one overwrites the other?
Here HashMap uses a chained data structure. As mentioned above, the Entry class has a next field that refers to the next Entry. For example, the first key-value pair A comes in, and hashing its key gives index = 0, so entry[0] = A. Later, key-value pair B comes in and its index is also 0. What now? HashMap does this: B.next = A, entry[0] = B. If C comes in and its index is also 0, then C.next = B, entry[0] = C. So the slot at index = 0 actually holds the three key-value pairs A, B, and C, linked together through the next field. There is therefore no risk of overwriting. It also means that the array slot always holds the most recently inserted element. By now the general implementation of HashMap should be clear.

    public V put(K key, V value) {
        if (key == null)
            return putForNullKey(value); // null is always placed in the first list of the array
        int hash = hash(key.hashCode());
        int i = indexFor(hash, table.length);
        // Traverse the list
        for (Entry<K,V> e = table[i]; e != null; e = e.next) {
            Object k;
            // If the key already exists in the linked list, replace its value with the new one
            if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
                V oldValue = e.value;
                e.value = value;
                e.recordAccess(this);
                return oldValue;
            }
        }
        modCount++;
        addEntry(hash, key, value, i);
        return null;
    }

    void addEntry(int hash, K key, V value, int bucketIndex) {
        Entry<K,V> e = table[bucketIndex];
        table[bucketIndex] = new Entry<K,V>(hash, key, value, e); // the parameter e becomes Entry.next
        // If size exceeds threshold, expand the table and rehash
        if (size++ >= threshold)
            resize(2 * table.length);
    }

HashMap also includes some optimizations, which are described here as well. For example, the length of Entry[] is fixed, so as more and more data is put into the map, the chain at a given index grows very long. Will that hurt performance? HashMap has a load factor: as the map's size grows, Entry[] is expanded according to a fixed rule.

2) get

    public V get(Object key) {
        if (key == null)
            return getForNullKey();
        int hash = hash(key.hashCode());
        // Locate the array slot first, then traverse the linked list at that slot
        for (Entry<K,V> e = table[indexFor(hash, table.length)];
             e != null;
             e = e.next) {
            Object k;
            if (e.hash == hash && ((k = e.key) == key || key.equals(k)))
                return e.value;
        }
        return null;
    }

3) Access to the null key
A null key is always stored in the first slot (index 0) of the Entry[] array.

    private V putForNullKey(V value) {
        for (Entry<K,V> e = table[0]; e != null; e = e.next) {
            if (e.key == null) {
                V oldValue = e.value;
                e.value = value;
                e.recordAccess(this);
                return oldValue;
            }
        }
        modCount++;
        addEntry(0, null, value, 0);
        return null;
    }

    private V getForNullKey() {
        for (Entry<K,V> e = table[0]; e != null; e = e.next) {
            if (e.key == null)
                return e.value;
        }
        return null;
    }

4) Determining the array index: hashCode % table.length
When HashMap is accessed, it needs to calculate which slot of the Entry[] array the current key corresponds to, that is, the array index. The algorithm is as follows:

    /**
     * Returns index for hash code h.
     */
    static int indexFor(int h, int length) {
        return h & (length-1);
    }

The bitwise AND here is equivalent to the modulo (remainder) operation. (This is a neat use of bit arithmetic: a bitwise AND is cheaper than a division, and when n is a power of two, x mod n = x & (n - 1), so the bitwise AND replaces the modulo operation.)
This also means that two keys with the same array index do not necessarily have the same hashCode.
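The equivalence between the bitwise AND and the modulo operation only holds when the length is a power of two. The following sketch checks this by brute force over a small range; the class name IndexForDemo and the method matchesModulo are hypothetical, while indexFor mirrors the method shown above.

```java
class IndexForDemo {
    // Same computation as the indexFor method shown above.
    static int indexFor(int h, int length) {
        return h & (length - 1);
    }

    // Compares the mask with the non-negative remainder for hashes in [-1000, 1000].
    static boolean matchesModulo(int length) {
        for (int h = -1000; h <= 1000; h++) {
            if (indexFor(h, length) != Math.floorMod(h, length)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(matchesModulo(16)); // true: 16 is a power of two
        System.out.println(matchesModulo(10)); // false: 10 is not a power of two
    }
}
```

For example, with length 10 the mask is 9 (binary 1001), so a hash of 2 maps to index 0 instead of 2.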

5) Table Initial Size

    public HashMap(int initialCapacity, float loadFactor) {
        .....
        // Find a power of 2 >= initialCapacity
        int capacity = 1;
        while (capacity < initialCapacity)
            capacity <<= 1;
        this.loadFactor = loadFactor;
        threshold = (int)(capacity * loadFactor);
        table = new Entry[capacity];
        init();
    }

Note that the initial size of the table is not the initialCapacity passed to the constructor, but the smallest power of two that is >= initialCapacity.

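Why is it designed this way? Because indexFor computes h & (length - 1), which is only equivalent to the modulo operation when length is a power of two.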
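To make the rounding concrete, here is a small sketch of the same loop as in the constructor, together with the threshold calculation. The class name CapacityDemo and its method names are hypothetical, and the load factor 0.75 used in the example is an assumption chosen for illustration (it is the default in the JDK, but the excerpt above does not show it).

```java
class CapacityDemo {
    // Same loop as in the constructor above: smallest power of two >= initialCapacity.
    static int roundUpToPowerOfTwo(int initialCapacity) {
        int capacity = 1;
        while (capacity < initialCapacity) {
            capacity <<= 1;
        }
        return capacity;
    }

    // The table is resized once the number of entries reaches this value.
    static int threshold(int capacity, float loadFactor) {
        return (int) (capacity * loadFactor);
    }

    public static void main(String[] args) {
        System.out.println(roundUpToPowerOfTwo(10)); // 16
        System.out.println(roundUpToPowerOfTwo(17)); // 32
        System.out.println(threshold(16, 0.75f));    // 12
    }
}
```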

Solution to hash conflict

Open addressing (linear probing, quadratic probing, pseudo-random probing)
Rehashing (applying another hash function)
Separate chaining (the chain address method)
A public overflow area
HashMap in Java uses separate chaining.

The rehash process

When the number of entries in the hash table exceeds the threshold, the table must be resized. If the capacity has already reached its maximum possible value, the method sets the threshold to Integer.MAX_VALUE and returns. Otherwise a new table is created and the entries of the original table are transferred to it.

    /**
     * Rehashes the contents of this map into a new array with a
     * larger capacity. This method is called automatically when the
     * number of keys in this map reaches its threshold.
     *
     * If current capacity is MAXIMUM_CAPACITY, this method does not
     * resize the map, but sets threshold to Integer.MAX_VALUE.
     * This has the effect of preventing future calls.
     *
     * @param newCapacity the new capacity, MUST be a power of two;
     *        must be greater than current capacity unless current
     *        capacity is MAXIMUM_CAPACITY (in which case value
     *        is irrelevant).
     */
    void resize(int newCapacity) {
        Entry[] oldTable = table;
        int oldCapacity = oldTable.length;
        if (oldCapacity == MAXIMUM_CAPACITY) {
            threshold = Integer.MAX_VALUE;
            return;
        }

        Entry[] newTable = new Entry[newCapacity];
        transfer(newTable);
        table = newTable;
        threshold = (int)(newCapacity * loadFactor);
    }

    /**
     * Transfers all entries from the current table to newTable.
     */
    void transfer(Entry[] newTable) {
        Entry[] src = table;
        int newCapacity = newTable.length;
        for (int j = 0; j < src.length; j++) {
            Entry<K,V> e = src[j];
            if (e != null) {
                src[j] = null;
                do {
                    Entry<K,V> next = e.next;
                    // Recalculate index
                    int i = indexFor(e.hash, newCapacity);
                    e.next = newTable[i];
                    newTable[i] = e;
                    e = next;
                } while (e != null);
            }
        }
    }
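To tie put, get, head insertion, and resizing together, here is a heavily simplified sketch of a chained hash map. It is not the JDK implementation: MiniHashMap and Node are hypothetical names, the initial capacity of 4 and load factor of 0.75 are arbitrary choices, the supplemental hash() function is skipped, and there is no MAXIMUM_CAPACITY check.

```java
class MiniHashMap<K, V> {
    /** One key-value pair plus a link to the next pair in the same bucket. */
    private static final class Node<K, V> {
        final int hash;
        final K key;
        V value;
        Node<K, V> next;
        Node(int hash, K key, V value, Node<K, V> next) {
            this.hash = hash;
            this.key = key;
            this.value = value;
            this.next = next;
        }
    }

    private final float loadFactor = 0.75f; // assumed value for this sketch
    private Node<K, V>[] table;
    private int size;
    private int threshold;

    @SuppressWarnings("unchecked")
    MiniHashMap() {
        table = (Node<K, V>[]) new Node[4]; // length must stay a power of two
        threshold = (int) (table.length * loadFactor);
    }

    private static int indexFor(int h, int length) {
        return h & (length - 1);
    }

    private static int hashOf(Object key) {
        return key == null ? 0 : key.hashCode(); // a null key lands in bucket 0
    }

    V put(K key, V value) {
        int hash = hashOf(key);
        int i = indexFor(hash, table.length);
        for (Node<K, V> e = table[i]; e != null; e = e.next) {
            if (e.hash == hash && (e.key == key || (key != null && key.equals(e.key)))) {
                V old = e.value; // key already present: replace the value
                e.value = value;
                return old;
            }
        }
        table[i] = new Node<>(hash, key, value, table[i]); // new node becomes the head
        if (size++ >= threshold) {
            resize(2 * table.length);
        }
        return null;
    }

    V get(Object key) {
        int hash = hashOf(key);
        for (Node<K, V> e = table[indexFor(hash, table.length)]; e != null; e = e.next) {
            if (e.hash == hash && (e.key == key || (key != null && key.equals(e.key)))) {
                return e.value;
            }
        }
        return null;
    }

    @SuppressWarnings("unchecked")
    private void resize(int newCapacity) {
        Node<K, V>[] newTable = (Node<K, V>[]) new Node[newCapacity];
        for (Node<K, V> head : table) {
            Node<K, V> e = head;
            while (e != null) {
                Node<K, V> next = e.next;
                int i = indexFor(e.hash, newCapacity); // recalculate the index
                e.next = newTable[i];
                newTable[i] = e;
                e = next;
            }
        }
        table = newTable;
        threshold = (int) (newCapacity * loadFactor);
    }

    int size() { return size; }

    int capacity() { return table.length; }
}
```

A quick way to try it is to put a few hundred Integer keys, read them back with get, and observe that capacity() stays a power of two as the table grows.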

Brief summary

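HashSet and its subclasses use a hash algorithm to decide where elements are stored in the set and to control the size of the set. HashMap, Hashtable, and their subclasses use a hash algorithm to decide where keys are stored in the map and to grow the size of the key set. A position in the hash table that can hold an element is called a bucket. Normally each bucket holds one element, which gives the best performance: the hash algorithm computes the bucket's location from the hashCode value, and the element is then taken from that bucket. But the state of a hash table is "open": when a hash collision occurs, a single bucket stores several elements, which are kept in a linked list and must be searched in order.

The hash tables of HashSet and HashMap both have the following properties:
- Capacity: the number of buckets in the hash table.
- Initial capacity: the number of buckets when the hash table is created.
- Size: the number of records currently in the hash table.
- Load factor: load factor = size / capacity, a value between 0 and 1. A load factor of 0 means an empty hash table and 0.5 means a half-full one, so a lightly loaded hash table has few collisions and is well suited to insertion and lookup.

In addition, the hash table has a load limit. When the load factor reaches this value, the hash table automatically multiplies its capacity and redistributes the existing objects into the new buckets, which is called rehashing.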

HashMap Thread-safety issues

HashMap itself is not thread-safe (HashSet, TreeSet, ArrayList, ArrayDeque, and LinkedList are not thread-safe either). It can be wrapped as a synchronized collection using the static methods provided by the Collections class.

    Collection c = Collections.synchronizedCollection(new ArrayList());
    List list = Collections.synchronizedList(new ArrayList());
    Set s = Collections.synchronizedSet(new HashSet());
    Map m = Collections.synchronizedMap(new HashMap());
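As a rough illustration of how such a wrapper is used, the sketch below fills a synchronized map from two threads and then iterates over it. The class name SynchronizedMapDemo and its methods are hypothetical. One detail worth knowing: individual calls such as put are synchronized by the wrapper, but iteration over the map's views still has to be guarded manually by synchronizing on the wrapper.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

class SynchronizedMapDemo {
    // Two threads insert disjoint key ranges into the same synchronized map.
    static Map<Integer, Integer> fillFromTwoThreads() {
        final Map<Integer, Integer> map =
                Collections.synchronizedMap(new HashMap<Integer, Integer>());
        Thread first = new Thread(() -> {
            for (int i = 0; i < 1000; i++) {
                map.put(i, 1);
            }
        });
        Thread second = new Thread(() -> {
            for (int i = 1000; i < 2000; i++) {
                map.put(i, 1);
            }
        });
        first.start();
        second.start();
        try {
            first.join();
            second.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
        return map;
    }

    // Iteration is not atomic, so it is wrapped in a block synchronized on the map.
    static int sumValues(Map<Integer, Integer> syncMap) {
        int sum = 0;
        synchronized (syncMap) {
            for (int value : syncMap.values()) {
                sum += value;
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        Map<Integer, Integer> map = fillFromTwoThreads();
        System.out.println(map.size() + " " + sumValues(map)); // 2000 2000
    }
}
```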


This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
