Java Container Study Notes (2): Summary of the Set Interface and Its Implementation Classes


In Java Container Study Notes (1), I outlined the basic concepts and interface hierarchy of Collection, and summarized the implementation and usage of one important subinterface, List, and its implementation classes.

This article summarizes the usage of the Set interface and its implementation classes: HashSet (unordered, no duplicates), LinkedHashSet (kept in insertion order, no duplicates), TreeSet (kept in sorted order by a red-black tree, no duplicates), EnumSet, ConcurrentSkipListSet (from the java.util.concurrent package), and CopyOnWriteArraySet (from the java.util.concurrent package).

 

2. Set interface and its implementation classes
List of methods in the Set interface:

Both Set and List store a collection of individual elements, but a Set does not allow repeated elements (this depends mainly on the equals method).

The parent interfaces of Set are Collection and Iterable. Its subinterfaces are SortedSet and NavigableSet (NavigableSet extends SortedSet).

Important classes that implement the Set interface include HashSet (unordered, no duplicates), LinkedHashSet (kept in insertion order, no duplicates), TreeSet (kept in sorted order by a red-black tree, no duplicates), EnumSet, ConcurrentSkipListSet (from the java.util.concurrent package), and CopyOnWriteArraySet (from the java.util.concurrent package).

The Set interface adds no instance methods of its own; all of them come from the parent interface. (Static factory methods such as Set.of arrived only in Java 9.) Unlike List, it provides no positional (index-based) access methods. In mathematics, a set has three properties: definiteness, distinctness, and unorderedness.
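As a minimal sketch of the difference between the two interfaces, the following compares how many elements a List and a Set keep when the same values are added. The class and method names (SetVersusList, sizeAsList, sizeAsSet) are made up for this illustration.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class SetVersusList {
    // Counts how many of the given values a List keeps.
    static int sizeAsList(String... values) {
        List<String> list = new ArrayList<>();
        for (String v : values) {
            list.add(v); // a List keeps every element, duplicates included
        }
        return list.size();
    }

    // Counts how many of the given values a Set keeps.
    static int sizeAsSet(String... values) {
        Set<String> set = new HashSet<>();
        for (String v : values) {
            set.add(v); // add returns false when an equal element is already present
        }
        return set.size();
    }

    public static void main(String[] args) {
        System.out.println(sizeAsList("a", "b", "a")); // 3
        System.out.println(sizeAsSet("a", "b", "a"));  // 2
        // A List offers get(int index); Set has no positional accessor,
        // so its elements are reached through an iterator or a for-each loop.
    }
}
```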

Ø Features, implementation mechanism, and usage of HashSet

A) Features of HashSet:

The elements stored in a HashSet are unordered. The underlying layer is a HashMap: the key is the element being stored, and the value is a shared constant of type Object named PRESENT. Because a hash function locates the storage position directly, access is very fast; when the elements are spread evenly over a large enough table, access time reaches O(1) on average. Once you understand how HashMap is implemented, the implementation of HashSet is very simple.
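The idea of a set that stores its elements as map keys with one shared dummy value can be sketched as follows. MapBackedSet is a hypothetical, heavily simplified class written for this note; it is not the JDK source, and only the name PRESENT is taken from the description above.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified stand-in for HashSet; not the JDK implementation.
class MapBackedSet<E> {
    // One shared dummy value stored against every key.
    private static final Object PRESENT = new Object();
    private final Map<E, Object> map = new HashMap<>();

    // put returns the previous value, so null means the key was not there before.
    public boolean add(E e) {
        return map.put(e, PRESENT) == null;
    }

    public boolean contains(E e) {
        return map.containsKey(e);
    }

    // remove returns the old value, which is PRESENT only if the key existed.
    public boolean remove(E e) {
        return map.remove(e) == PRESENT;
    }

    public int size() {
        return map.size();
    }

    public static void main(String[] args) {
        MapBackedSet<String> set = new MapBackedSet<>();
        System.out.println(set.add("x")); // true
        System.out.println(set.add("x")); // false, already present
        System.out.println(set.size());   // 1
    }
}
```

All the set operations reduce to single map operations, which is why the cost characteristics of the set are those of the underlying map.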

B) Implementation mechanism of HashSet:

First, you need to understand how a hash table works. When the data volume is large, the hash function will produce the same result for different elements (a collision). The original figure, which is not preserved here, showed how such elements are stored.

A HashSet has a load factor. In the figure, 4 of a total of 11 positions are occupied, so about 36% of the space is used.

In the default implementation of HashSet, the initial capacity is 16 and the load factor is 0.75. That is, once about 75% of the space is used, the table is rehashed: a new array twice the length of the old one is created, the entries are moved into it, and the old array is discarded. The capacity stops growing at MAXIMUM_CAPACITY (2^30); at that point the threshold is set to Integer.MAX_VALUE so that no further resize is triggered.

The higher the load factor, the lower the memory overhead, but the longer the element search time.

The lower the load factor, the higher the memory overhead, but the shorter the element search time.
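The arithmetic behind the load factor can be sketched as below. This is an illustrative model only: the helper names threshold and capacityFor are made up, and the exact moment at which the real HashMap resizes differs slightly between JDK versions.

```java
class ResizeThresholds {
    // Number of entries a table of the given capacity holds before a resize is due.
    static int threshold(int capacity, float loadFactor) {
        return (int) (capacity * loadFactor);
    }

    // Table capacity after storing `entries` elements, doubling each time
    // the entry count exceeds the current threshold.
    static int capacityFor(int entries, int initialCapacity, float loadFactor) {
        int capacity = initialCapacity;
        while (entries > threshold(capacity, loadFactor)) {
            capacity *= 2;
        }
        return capacity;
    }

    public static void main(String[] args) {
        System.out.println(threshold(16, 0.75f));         // 12
        System.out.println(capacityFor(13, 16, 0.75f));   // 32
        System.out.println(capacityFor(100, 16, 0.75f));  // 256
        System.out.println(capacityFor(100, 16, 1.0f));   // 128
    }
}
```

For the same 100 elements, a load factor of 1.0 ends with a table half the size of the one produced by 0.75, at the price of fuller buckets and therefore longer chains to search.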

It can be seen that elements whose hash values map to the same position are stored at that position and chained together in a linked list.

(When an interviewer asked me this question, my answer was "rehash". In fact, I did not know how HashSet is actually implemented. I only remembered rehashing from my data structures course: when the hash table fills up, a new table must be created to keep access fast, because once many values land in a single position, lookup degenerates into a linked-list scan of about n/2 comparisons on average, which is O(n). But I did not mention how the rehash proceeds or how elements with the same hash value are stored.)

To show that HashSet really is implemented this way in Java, the source code of two important JDK methods is attached below. (The source is from HashMap, because HashSet is implemented on top of HashMap.)

    /**
     * Rehashes the contents of this map into a new array with a
     * larger capacity.  This method is called automatically when the
     * number of keys in this map reaches its threshold.
     *
     * If current capacity is MAXIMUM_CAPACITY, this method does not
     * resize the map, but sets threshold to Integer.MAX_VALUE.
     * This has the effect of preventing future calls.
     *
     * @param newCapacity the new capacity, MUST be a power of two;
     *        must be greater than current capacity unless current
     *        capacity is MAXIMUM_CAPACITY (in which case value
     *        is irrelevant).
     */
    void resize(int newCapacity) {
        Entry[] oldTable = table;
        int oldCapacity = oldTable.length;
        if (oldCapacity == MAXIMUM_CAPACITY) {
            threshold = Integer.MAX_VALUE;
            return;
        }

        Entry[] newTable = new Entry[newCapacity];
        transfer(newTable);
        table = newTable;
        threshold = (int)(newCapacity * loadFactor);
    }

    /**
     * Transfers all entries from current table to newTable.
     */
    void transfer(Entry[] newTable) {
        Entry[] src = table;
        int newCapacity = newTable.length;
        for (int j = 0; j < src.length; j++) {
            Entry<K,V> e = src[j];
            if (e != null) {
                src[j] = null;
                do {
                    Entry<K,V> next = e.next;
                    int i = indexFor(e.hash, newCapacity);
                    e.next = newTable[i];
                    newTable[i] = e;
                    e = next;
                } while (e != null);
            }
        }
    }
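The transfer method relies on indexFor to pick the new bucket of each entry. In the JDK 6/7 HashMap that helper masks the hash with the table length minus one, which is why the capacity must be a power of two. The sketch below reproduces only that formula; the class name BucketIndex and the sample hash values are chosen for illustration.

```java
class BucketIndex {
    // Same formula as the indexFor helper in the JDK 6/7 HashMap;
    // it only works as intended when length is a power of two.
    static int indexFor(int hash, int length) {
        return hash & (length - 1);
    }

    public static void main(String[] args) {
        int[] hashes = {5, 21, 37};
        for (int capacity : new int[] {16, 32}) {
            StringBuilder line = new StringBuilder("capacity " + capacity + ":");
            for (int h : hashes) {
                line.append(' ').append(indexFor(h, capacity));
            }
            System.out.println(line);
        }
        // capacity 16: 5 5 5   (all three collide and form one chain)
        // capacity 32: 5 21 5  (doubling the table splits the chain)
    }
}
```

After a resize, an entry either stays at its old index or moves up by exactly the old capacity, so long chains tend to be split in two.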

HashSet defines five constructors, four of which are public; detailed descriptions can be found in the API documentation. Because HashSet is implemented on top of HashMap, we only care about the key we put in; the value is a constant of type Object. Accordingly, the iterator method iterates over the keySet of the HashMap.

C) How to use HashSet:

From the features and implementation of HashSet, we can see that there is little reason not to choose it when we need to exclude duplicate data and do not care about the order of the elements. In addition, HashSet permits a null element.

So how does a HashSet ensure that its elements are not repeated? The following is an example:

    import java.util.HashSet;
    import java.util.Iterator;

    public class ExampleForHashSet {
        public static void main(String[] args) {
            HashSet<Name> hs = new HashSet<Name>();
            hs.add(new Name("Wang", "Wu"));
            hs.add(new Name("Zhang", "San"));
            hs.add(new Name("Wang", "San"));
            hs.add(new Name("Zhang", "Wu"));
            // This statement prints 2
            System.out.println(hs.size());
            Iterator<Name> it = hs.iterator();
            // The loop prints two lines, Zhang:San and Wang:Wu
            // (HashSet does not guarantee the iteration order)
            while (it.hasNext()) {
                System.out.println(it.next());
            }
        }
    }

    class Name {
        String first;
        String last;

        public Name(String first, String last) {
            this.first = first;
            this.last = last;
        }

        @Override
        public boolean equals(Object o) {
            if (null == o) {
                return false;
            }
            if (this == o) {
                return true;
            }
            if (o instanceof Name) {
                Name name = (Name) o;
                // In this example, two names are equal if first is the same
                if (this.first.equals(name.first)) {
                    return true;
                }
            }
            return false;
        }

        @Override
        public int hashCode() {
            int prime = 31;
            int result = 1;
            // The implementation of hashCode must correspond to that of equals
            return prime * result + first.hashCode();
        }

        @Override
        public String toString() {
            return first + ":" + last;
        }
    }

The example above is explained briefly as follows.

As mentioned, the elements of a HashSet cannot be repeated. But what exactly counts as a repeated element?

The example implements a simple Name class that overrides the equals and hashCode methods. Is repetition decided by equals alone, so that two elements are duplicates whenever equals returns true? Not quite. Suppose we rewrite the hashCode method and change its return statement to

    return prime * result + first.hashCode() + last.hashCode();

The size of the HashSet then becomes 4, even though Name("Wang", "Wu") and Name("Wang", "San") still compare as equal with the equals method.

    Name n1 = new Name("W", "x");
    Name n2 = new Name("W", "Y");
    System.out.println(n1.equals(n2));

That is to say, the code above prints true.

So can we conclude the following? When two hash codes are the same, HashSet goes on to check the return value of equals; if it is true, the elements are the same, which is the repetition mentioned above. When the hash codes differ, the elements are treated as distinct.

It follows that the language itself does not tie the return values of equals and hashCode together: objects that are equal under equals can still return different hash codes. We therefore have to establish the association ourselves by implementing hashCode consistently with equals, so that whenever equals returns true the two hash codes are the same.
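One way to keep the two methods associated is to build both from exactly the same fields. The sketch below is a hypothetical variant of the Name class (called FullName here) in which equals compares both fields and hashCode is derived from both; it is one possible design, not the only correct one.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

// Hypothetical variant of Name: equals and hashCode use the same two fields.
final class FullName {
    private final String first;
    private final String last;

    FullName(String first, String last) {
        this.first = first;
        this.last = last;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) {
            return true;
        }
        if (!(o instanceof FullName)) {
            return false; // also covers o == null
        }
        FullName other = (FullName) o;
        return first.equals(other.first) && last.equals(other.last);
    }

    @Override
    public int hashCode() {
        // Built from exactly the fields that equals compares.
        return Objects.hash(first, last);
    }

    // Adds the four names from the example plus one repeat.
    static Set<FullName> sample() {
        Set<FullName> names = new HashSet<>();
        names.add(new FullName("Wang", "Wu"));
        names.add(new FullName("Zhang", "San"));
        names.add(new FullName("Wang", "San"));
        names.add(new FullName("Zhang", "Wu"));
        names.add(new FullName("Wang", "Wu")); // rejected: equal to the first element
        return names;
    }

    public static void main(String[] args) {
        System.out.println(sample().size()); // 4
    }
}
```

With this pairing, two objects that are equal always land in the same bucket, so the set can find and reject the repeat.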

Ø Features, implementation mechanism, and usage of LinkedHashSet

A) Features of LinkedHashSet:

LinkedHashSet guarantees that elements are iterated in insertion order. It inherits from HashSet and adds no new methods of its own.

B) Implementation mechanism of LinkedHashSet:

LinkedHashSet inherits from HashSet. During construction, it uses a package-private constructor of HashSet whose third parameter is ignored:

    /**
     * Constructs a new, empty linked hash set.  (This package private
     * constructor is only used by LinkedHashSet.) The backing
     * HashMap instance is a LinkedHashMap with the specified initial
     * capacity and the specified load factor.
     *
     * @param      initialCapacity   the initial capacity of the hash map
     * @param      loadFactor        the load factor of the hash map
     * @param      dummy             ignored (distinguishes this
     *             constructor from other int, float constructor.)
     * @throws     IllegalArgumentException if the initial capacity is less
     *             than zero, or if the load factor is nonpositive
     */
    HashSet(int initialCapacity, float loadFactor, boolean dummy) {
        map = new LinkedHashMap<E,Object>(initialCapacity, loadFactor);
    }

From the JDK code above, we can see that LinkedHashSet is backed by a LinkedHashMap.

The implementation is therefore quite simple. The dummy parameter carries no value; it exists only to give this constructor a signature different from the public (int, float) one, so we never pass it ourselves. Which constructor runs decides whether a HashSet or a LinkedHashSet is being constructed.

C) How to use LinkedHashSet:

Because LinkedHashSet inherits from HashSet and provides no additional methods, it is used in essentially the same way as HashSet. The only question is which one to choose; we select the data structure that meets our needs.
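A minimal sketch of the insertion-order guarantee follows. The class name InsertionOrderDemo and the sample values are chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class InsertionOrderDemo {
    // Returns the distinct values in the order in which they were first added.
    static List<String> insertionOrder(String... values) {
        Set<String> set = new LinkedHashSet<>();
        for (String v : values) {
            set.add(v); // re-adding an existing element does not change its position
        }
        return new ArrayList<>(set);
    }

    public static void main(String[] args) {
        System.out.println(insertionOrder("pear", "apple", "pear", "fig"));
        // [pear, apple, fig]
    }
}
```

Swapping LinkedHashSet for HashSet in this method would still remove the duplicate, but the resulting order would no longer be predictable.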

Ø Features, implementation mechanism, and usage of CopyOnWriteArraySet

A) Features of CopyOnWriteArraySet:

CopyOnWriteArraySet is a class in the java.util.concurrent package that extends AbstractSet. It is backed by a CopyOnWriteArrayList, combines the characteristics of Set and ArrayList, and is thread-safe.

B) Implementation mechanism of CopyOnWriteArraySet:

The implementation uses copy-on-write, with a reentrant lock for thread synchronization. Internally it holds a CopyOnWriteArrayList instance; when an element is added, it calls the addIfAbsent method of CopyOnWriteArrayList to ensure that no duplicate is stored. The rest of the implementation is similar to CopyOnWriteArrayList.

C) How to use CopyOnWriteArraySet:

This is again a question of choice. HashSet is also backed by an array and offers efficient access, close to O(1) when the load factor is small, but it is not thread-safe. CopyOnWriteArraySet can be used when a set is needed in a multithreaded environment. Because every write copies the backing array and contains scans it linearly, it suits small sets that are read far more often than they are modified. Of course, it is not the only way to achieve thread safety.
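One visible consequence of copy-on-write is that an iterator works on a snapshot of the array taken when it was created. The single-threaded sketch below illustrates this; the class name SnapshotIterationDemo and the helper iterateWhileAdding are made up for the example.

```java
import java.util.Set;
import java.util.concurrent.CopyOnWriteArraySet;

class SnapshotIterationDemo {
    // Adds a new element on every step of the loop and
    // returns how many elements the loop actually visited.
    static int iterateWhileAdding(Set<Integer> set) {
        int visited = 0;
        for (Integer value : set) {
            // With a HashSet this would typically fail with
            // ConcurrentModificationException on the next step.
            set.add(value + 100);
            visited++;
        }
        return visited;
    }

    public static void main(String[] args) {
        Set<Integer> set = new CopyOnWriteArraySet<>();
        set.add(1);
        set.add(2);
        set.add(3);
        System.out.println(iterateWhileAdding(set)); // 3, the snapshot had three elements
        System.out.println(set.size());              // 6, the additions are visible afterwards
        System.out.println(set.add(1));              // false, duplicates are still rejected
    }
}
```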

Ø Features, implementation mechanism, and usage of TreeSet

A) Features of TreeSet:

The elements in a TreeSet are kept in sorted order and cannot be repeated.

B) Implementation mechanism of TreeSet:

How does TreeSet keep its elements sorted and free of duplicates?

First, TreeSet is backed by a TreeMap. As with HashSet, each element is stored as a key, and the value is a constant of type Object.

The JDK source code contains the following comment:

    /**
     * Constructs a new, empty tree set, sorted according to the
     * natural ordering of its elements.  All elements inserted into
     * the set must implement the {@link Comparable} interface.
     * Furthermore, all such elements must be <i>mutually
     * comparable</i>: {@code e1.compareTo(e2)} must not throw a
     * {@code ClassCastException} for any elements {@code e1} and
     * {@code e2} in the set.  If the user attempts to add an element
     * to the set that violates this constraint (for example, the user
     * attempts to add a string element to a set whose elements are
     * integers), the {@code add} call will throw a
     * {@code ClassCastException}.
     */

From the comment, we can see that the key to avoiding duplicates is not the hashCode and equals methods but compareTo. That is, with this no-argument constructor the elements to be added must implement the Comparable interface (alternatively, a Comparator can be passed to another TreeSet constructor).
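The following sketch shows that TreeSet consults compareTo rather than equals or hashCode when deciding what is a duplicate. The Word class is hypothetical and deliberately compares by length only, which makes its ordering inconsistent with equals; that is done here purely to make the effect visible.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

class TreeSetOrderingDemo {
    // Hypothetical element type ordered by text length only.
    // It does not override equals or hashCode at all.
    static final class Word implements Comparable<Word> {
        final String text;

        Word(String text) {
            this.text = text;
        }

        @Override
        public int compareTo(Word other) {
            return Integer.compare(text.length(), other.text.length());
        }
    }

    // Adds the texts to a TreeSet and returns what the set kept, in sorted order.
    static List<String> sortedTexts(String... texts) {
        TreeSet<Word> set = new TreeSet<>();
        for (String t : texts) {
            set.add(new Word(t)); // rejected when compareTo returns 0 for an existing element
        }
        List<String> result = new ArrayList<>();
        for (Word w : set) {
            result.add(w.text);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(sortedTexts("pear", "fig", "kiwi", "banana"));
        // [fig, pear, banana]: "kiwi" has the same length as "pear" and is dropped
    }
}
```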

C) How to use TreeSet:

While summarizing the usage of HashSet we worked through an example; using TreeSet is likewise a question of choice. The deciding criterion is whether we need the elements kept in sorted order (sorted by the return value of compareTo, not by insertion order). (I am not an expert, just a beginner. Criticism is welcome.)

Ø Features, implementation mechanism, and usage of ConcurrentSkipListSet

A) Features of ConcurrentSkipListSet:

First of all, I have to say that the name of this class looked very strange to me, just like CopyOnWriteArrayList. I thought that was a long name too, but once I looked up the meaning of copy-on-write I was no longer surprised, and I could even guess its implementation mechanism.

So what does "concurrent skip" mean? Concurrent skipping?

Like most other concurrent collection implementations, this class does not allow null elements, because it cannot reliably distinguish null parameters and return values from nonexistent elements.

B) Implementation mechanism of ConcurrentSkipListSet:

ConcurrentSkipListSet is backed by a ConcurrentSkipListMap. As for what concurrent skipping actually means, I cannot summarize it for the time being.

C) How to use ConcurrentSkipListSet:

(Not yet summarized.)
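Pending a proper summary, here is a minimal sketch based only on documented behavior: the set keeps its elements sorted and rejects null. The class name SkipListSetDemo and its helper methods are made up for this illustration, and the example is single-threaded, so it does not exercise the concurrency guarantees.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentSkipListSet;

class SkipListSetDemo {
    // Returns the distinct values in ascending order.
    static List<Integer> sorted(int... values) {
        ConcurrentSkipListSet<Integer> set = new ConcurrentSkipListSet<>();
        for (int v : values) {
            set.add(v);
        }
        return new ArrayList<>(set);
    }

    // Reports whether adding null is refused with a NullPointerException.
    static boolean rejectsNull() {
        try {
            new ConcurrentSkipListSet<String>().add(null);
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(sorted(5, 1, 3, 1)); // [1, 3, 5]
        System.out.println(rejectsNull());      // true
    }
}
```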

 

This blog post summarizes what I have learned along the way. I am just a newbie without a mentor, so your advice is appreciated! If you find an error, please comment below or send mail to:
Bluesky_taotao@163.com
