In-depth analysis of common Java data structures (Vector, ArrayList, List, and Map)

Source: Internet
Author: User
Tags concurrentmodificationexception
I came across this article on common Java data structures online; it analyzes them thoroughly.
Linear lists, linked lists, and hash tables are common data structures. For Java development, the JDK provides a set of classes that implement these basic data structures. They live in the java.util package. This article tries to explain, through simple descriptions, what each class does and how to use it correctly.

Collection
├ List
│ ├ LinkedList
│ ├ ArrayList
│ └ Vector
│   └ Stack
└ Set
Map
├ Hashtable
├ HashMap
└ WeakHashMap

Collection Interface

Collection is the most basic collection interface. A Collection represents a group of objects, known as its elements. Some collections allow duplicate elements and others do not; some are ordered and others are not. The JDK does not provide any class that implements Collection directly; it provides classes that implement more specific subinterfaces, such as List and Set.
By convention, every class that implements Collection should provide two standard constructors: a no-argument constructor that creates an empty collection, and a constructor that takes a Collection argument and creates a new collection with the same elements as the one passed in. The second constructor lets you copy a collection.
How do you traverse every element in a collection? Whatever its actual type, a Collection supports the iterator() method, which returns an Iterator that visits the elements one by one. Typical usage:
    Iterator it = collection.iterator(); // obtain an iterator
    while (it.hasNext()) {
        Object obj = it.next(); // obtain the next element
    }
Two interfaces derived from Collection are List and Set.
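The copy constructor and the iterator loop described above can be sketched together as follows. This is a minimal illustration, not code from the original article; the class name CollectionBasics and the method copyAndJoin are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Iterator;

class CollectionBasics {
    // Copies a collection with the conventional "copy" constructor,
    // then walks the copy with an Iterator and concatenates its elements.
    static String copyAndJoin(Collection<String> source) {
        Collection<String> copy = new ArrayList<>(source); // same elements as source
        StringBuilder sb = new StringBuilder();
        Iterator<String> it = copy.iterator();
        while (it.hasNext()) {
            sb.append(it.next());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(copyAndJoin(Arrays.asList("a", "b", "c"))); // prints "abc"
    }
}
```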

List Interface

List is an ordered collection that gives precise control over where each element is inserted. You can access elements by index (the element's position in the list, like an array subscript), much as you would with a Java array.
Unlike Set, described below, a List allows duplicate elements.
In addition to the iterator() method required by Collection, List provides a listIterator() method that returns a ListIterator. Compared with the standard Iterator interface, ListIterator adds methods such as add() and set(), so you can insert, remove, and replace elements, and it can traverse the list forward or backward.
Common classes that implement List include LinkedList, ArrayList, Vector, and Stack.
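As a rough sketch of what ListIterator adds over Iterator, the example below edits a list while walking forward and then walks it backward. The class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.ListIterator;

class ListIteratorDemo {
    // Walks forward, editing the list through the iterator, then walks backward.
    static String editAndReverse() {
        List<String> list = new ArrayList<>(Arrays.asList("a", "b", "c"));
        ListIterator<String> it = list.listIterator();
        while (it.hasNext()) {
            String s = it.next();
            if (s.equals("a")) {
                it.remove();   // delete the element just returned
            } else if (s.equals("b")) {
                it.set("B");   // replace the element just returned
            } else {
                it.add("d");   // insert right after the element just returned
            }
        }
        // The list is now [B, c, d] and the cursor sits at the end,
        // so the same iterator can traverse backward.
        StringBuilder sb = new StringBuilder();
        while (it.hasPrevious()) {
            sb.append(it.previous());
        }
        return sb.toString(); // "dcB"
    }
}
```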

LinkedList class

LinkedList implements the List interface and allows null elements. In addition, it provides extra get, remove, and insert methods that operate at the head or tail of the list. These operations let a LinkedList be used as a stack, a queue, or a double-ended queue (deque).
Note that LinkedList is not synchronized. If multiple threads access a LinkedList at the same time, they must synchronize access themselves. One solution is to wrap the list in a synchronized list when it is created:
    List list = Collections.synchronizedList(new LinkedList(...));
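The head and tail operations mentioned above might be used like this. It is a small sketch with hypothetical class and method names; it relies only on addFirst, addLast, removeFirst, and Collections.synchronizedList from java.util.

```java
import java.util.Collections;
import java.util.LinkedList;
import java.util.List;

class LinkedListDemo {
    // Uses the head of the list as the top of a stack (last in, first out).
    static int stackOrder() {
        LinkedList<Integer> stack = new LinkedList<>();
        stack.addFirst(1);
        stack.addFirst(2);
        return stack.removeFirst(); // 2, the most recently added element
    }

    // Adds at the tail and removes from the head (first in, first out).
    static int queueOrder() {
        LinkedList<Integer> queue = new LinkedList<>();
        queue.addLast(1);
        queue.addLast(2);
        return queue.removeFirst(); // 1, the oldest element
    }

    // Wraps a LinkedList so that each individual method call is synchronized.
    static List<Integer> newSynchronizedList() {
        return Collections.synchronizedList(new LinkedList<Integer>());
    }
}
```

Note that the synchronized wrapper guards single method calls only; compound actions such as iteration still need an explicit synchronized block on the returned list.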

ArrayList class

ArrayList implements a resizable array. It allows all elements, including null.
The size, isEmpty, get, and set methods run in constant time. The add method runs in amortized constant time, that is, adding n elements takes O(n) time. The other methods run in linear time.
Each ArrayList instance has a capacity, the size of the array used to store its elements. The capacity grows automatically as elements are added, but the growth policy is not specified. Before inserting a large number of elements, you can call ensureCapacity to enlarge the ArrayList up front and make insertion more efficient.
Like LinkedList, ArrayList is unsynchronized.
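A minimal sketch of the ensureCapacity advice follows; the class name and the fill method are illustrative only. Calling ensureCapacity does not change the contents or size of the list, it only reserves room in the backing array.

```java
import java.util.ArrayList;

class EnsureCapacityDemo {
    // Reserves room for n elements once, then appends them.
    static int fill(int n) {
        ArrayList<Integer> list = new ArrayList<>();
        list.ensureCapacity(n); // one allocation up front instead of repeated regrowth
        for (int i = 0; i < n; i++) {
            list.add(i);
        }
        return list.size();
    }
}
```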

Vector

Vector is very similar to ArrayList, but Vector is synchronized. An Iterator created by a Vector has the same interface as one created by an ArrayList, and it is fail-fast in the same way: if the Vector is structurally modified after the iterator is created (for example, another thread adds or removes elements), other than through the iterator itself, the next call on the iterator throws ConcurrentModificationException. Synchronized methods alone do not prevent this, so code that iterates over a shared Vector must either hold the Vector's lock for the whole iteration or be prepared to handle this exception.
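The exception does not require a second thread; any structural change made behind the iterator's back triggers it. The sketch below (hypothetical names) shows the failure in a single thread and one way to iterate safely by holding the Vector's lock.

```java
import java.util.ConcurrentModificationException;
import java.util.Iterator;
import java.util.Vector;

class FailFastDemo {
    // Modifies the Vector directly while an iterator is in use.
    static boolean modifyWhileIterating() {
        Vector<String> v = new Vector<>();
        v.add("a");
        v.add("b");
        v.add("c");
        Iterator<String> it = v.iterator();
        it.next();
        v.add("d"); // structural change made outside the iterator
        try {
            it.next();
            return false;
        } catch (ConcurrentModificationException e) {
            return true; // the iterator detected the change
        }
    }

    // Holds the Vector's own lock for the whole traversal, so other threads
    // that use the Vector's synchronized methods cannot modify it meanwhile.
    static int safeCount(Vector<String> v) {
        int n = 0;
        synchronized (v) {
            for (String s : v) {
                n++;
            }
        }
        return n;
    }
}
```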

Stack

Stack extends Vector to implement a last-in, first-out (LIFO) stack. It adds five methods that let a Vector be used as a stack: the basic push and pop methods; peek, which returns the element at the top of the stack without removing it; empty, which tests whether the stack is empty; and search, which reports the position of an element in the stack. A newly created Stack is empty.
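The five methods can be exercised in a few lines. This is an illustrative sketch with a hypothetical class name; note that search counts positions from the top of the stack, starting at 1.

```java
import java.util.Stack;

class StackDemo {
    static String run() {
        Stack<String> stack = new Stack<>();
        boolean emptyAtStart = stack.empty(); // true: a new Stack has no elements
        stack.push("a");
        stack.push("b");
        stack.push("c");
        String top = stack.peek();            // "c", the stack is unchanged
        int posOfA = stack.search("a");       // 3: 1-based distance from the top
        String popped = stack.pop();          // "c", now removed
        return emptyAtStart + " " + top + " " + posOfA + " " + popped + " " + stack.size();
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints "true c 3 c 2"
    }
}
```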

Set Interface

Set is a Collection that contains no duplicate elements: no two elements e1 and e2 in a Set satisfy e1.equals(e2), and a Set contains at most one null element.
Accordingly, Set constructors carry an extra constraint: the set they create must contain no duplicates, even if the collection passed in does.
Note: be careful when storing mutable objects in a Set. If an element changes state in a way that affects equals comparisons while it is in the set, the behavior of the set is not specified.
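Both points can be seen with a HashSet, used here only as one convenient Set implementation (the article itself does not single it out). The class and method names are hypothetical. In the second method, the element's hash code changes after insertion, so the set can no longer find it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class SetDemo {
    // Duplicates (including a repeated null) collapse to a single element.
    static int distinctCount() {
        Set<String> set = new HashSet<>(Arrays.asList("a", "b", "a", null, null));
        return set.size(); // 3: "a", "b", and null
    }

    // Mutating an element after it was added breaks lookups.
    static boolean foundAfterMutation() {
        Set<List<String>> set = new HashSet<>();
        List<String> element = new ArrayList<>();
        element.add("x");
        set.add(element);
        element.add("y"); // changes equals() and hashCode() of a stored element
        return set.contains(element); // false: the set looks under the new hash code
    }
}
```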

Map Interface

Note that Map does not extend the Collection interface. A Map provides a mapping from keys to values. A Map cannot contain duplicate keys, and each key maps to at most one value. The Map interface provides three collection views: the contents of a Map can be viewed as a set of keys, a collection of values, or a set of key-value mappings.
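The three views correspond to keySet(), values(), and entrySet(). The sketch below uses TreeMap purely so that the output order is predictable; that choice, like the class name, is an assumption made for the example and is not part of the original article.

```java
import java.util.Collection;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

class MapViewsDemo {
    static String describe() {
        Map<String, Integer> map = new TreeMap<>(); // sorted keys give a stable order
        map.put("one", 1);
        map.put("two", 2);
        map.put("one", 10); // same key: the old value is replaced, not duplicated

        Set<String> keys = map.keySet();            // view 1: the keys
        Collection<Integer> values = map.values();  // view 2: the values
        StringBuilder sb = new StringBuilder();
        sb.append(keys).append(' ').append(values).append(' ');
        for (Map.Entry<String, Integer> e : map.entrySet()) { // view 3: key-value mappings
            sb.append(e.getKey()).append('=').append(e.getValue()).append(';');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(describe()); // prints "[one, two] [10, 2] one=10;two=2;"
    }
}
```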

Hashtable class

Hashtable implements the Map interface as a hash table of key-value mappings. Any non-null object can be used as a key or a value.
Use put(key, value) to add data and get(key) to retrieve it. Assuming the hash function spreads keys well, both of these basic operations run in constant time.
Hashtable uses two parameters, initial capacity and load factor, to tune performance. In general, the default load factor of 0.75 offers a good trade-off between time and space. A higher load factor saves space but increases lookup time, which affects operations such as get and put.
A simple Hashtable example follows: it stores 1, 2, and 3 under the keys "one", "two", and "three":
    Hashtable<String, Integer> numbers = new Hashtable<>();
    numbers.put("one", 1);
    numbers.put("two", 2);
    numbers.put("three", 3);
To retrieve a number, such as 2, use the corresponding key:
    Integer n = numbers.get("two");
    System.out.println("two = " + n);
Because the position of a key is determined by computing its hash code, any object used as a key must implement the hashCode and equals methods. Both are inherited from the root class Object, so be careful when you use a custom class as a key. By the hashCode contract, if two objects are equal, that is, obj1.equals(obj2) is true, their hash codes must be equal; if two objects are not equal, their hash codes need not differ. When two unequal objects have the same hash code, the result is called a collision. Collisions increase the time cost of hash table operations, so define hashCode() to give distinct objects distinct values as far as possible, which keeps the hash table fast.
If equal objects have different hash codes, hash table operations produce unexpected results (for example, get returns null when a value was expected). To avoid this problem, remember one rule: override equals and hashCode together, never just one of them.
Hashtable is synchronized.
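A custom key class that follows the rule above might look like the following sketch. Point is a hypothetical class invented for this example, and Objects.hash is just one convenient way to combine fields into a hash code.

```java
import java.util.Hashtable;
import java.util.Objects;

class KeyDemo {
    // Hypothetical key class that overrides equals and hashCode together.
    static final class Point {
        final int x;
        final int y;

        Point(int x, int y) {
            this.x = x;
            this.y = y;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof Point)) {
                return false;
            }
            Point p = (Point) o;
            return x == p.x && y == p.y;
        }

        @Override
        public int hashCode() {
            return Objects.hash(x, y); // equal points always get equal hash codes
        }
    }

    // Looks up a value with a key that is equal to, but not the same object as,
    // the key used in put().
    static String lookup() {
        Hashtable<Point, String> table = new Hashtable<>();
        table.put(new Point(1, 2), "found");
        return table.get(new Point(1, 2));
    }
}
```

If hashCode were left out, the two Point objects would inherit identity-based hash codes from Object, and the lookup would most likely return null even though the keys are equal.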

HashMap class

HashMap is similar to Hashtable. The differences are that HashMap is unsynchronized and permits null, both as a value and as a key. However, when a HashMap is viewed as a Collection (for example, through the Collection returned by values()), iterating over it takes time proportional to the capacity of the HashMap plus its size. So if iteration performance matters, do not set the initial capacity too high or the load factor too low.
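The difference in null handling can be checked directly. In this sketch (hypothetical class name), HashMap stores a null key and a null value, while Hashtable rejects a null key with a NullPointerException.

```java
import java.util.HashMap;
import java.util.Hashtable;
import java.util.Map;

class NullKeyDemo {
    static String hashMapAcceptsNull() {
        Map<String, String> map = new HashMap<>();
        map.put(null, "value for null key"); // null key is allowed
        map.put("k", null);                  // null value is allowed
        // containsKey distinguishes "mapped to null" from "not present".
        return map.get(null) + "|" + map.containsKey("k") + "|" + map.get("k");
    }

    static boolean hashtableRejectsNull() {
        Map<String, String> table = new Hashtable<>();
        try {
            table.put(null, "x");
            return false;
        } catch (NullPointerException e) {
            return true; // Hashtable does not accept null keys or values
        }
    }
}
```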

WeakHashMap class

WeakHashMap is a variant of HashMap that holds its keys through weak references. Once a key is no longer referenced from outside the map, the garbage collector can reclaim it, and its entry is removed from the map.

Summary:

If you need stack or queue operations, consider LinkedList. If elements must be inserted and removed quickly, use LinkedList; if you need fast random access to elements, use ArrayList.
In a single-threaded program, or when a collection is accessed from only one thread, unsynchronized classes are more efficient. If multiple threads may operate on the same collection at the same time, use synchronized classes.
Pay special attention when working with hash tables: a class used as a key must override equals and hashCode correctly.
Prefer returning the interface type rather than the concrete type. For example, return List rather than ArrayList; if you later need to replace ArrayList with LinkedList, client code does not have to change. This is programming to an abstraction.
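The last point can be illustrated with a short sketch. Names and load are hypothetical; the idea is only that the method's declared return type is List, so the concrete class stays an implementation detail.

```java
import java.util.ArrayList;
import java.util.List;

class Names {
    // Callers see only List; swapping ArrayList for LinkedList inside this
    // method would not require any change in client code.
    static List<String> load() {
        List<String> names = new ArrayList<>();
        names.add("alice");
        names.add("bob");
        return names;
    }

    public static void main(String[] args) {
        List<String> names = Names.load(); // client code depends on List only
        System.out.println(names.size());  // prints 2
    }
}
```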

Appendix

Synchronization: Vector is synchronized. The methods that access its contents are synchronized, so individual operations on a Vector are thread-safe. ArrayList is unsynchronized, so an ArrayList is not thread-safe. Because synchronization reduces execution efficiency, ArrayList is the better choice when you do not need a thread-safe collection; it avoids the unnecessary overhead of synchronization.

Data growth: Internally, both ArrayList and Vector store their elements in an array. When you add an element and the element count would exceed the current length of the internal array, the array must be enlarged. By default, Vector doubles the length of the array, while ArrayList (in the standard implementation) increases it by 50%, so the collection usually ends up occupying more space than it strictly needs. If you are going to store a large amount of data, set the initial capacity up front to avoid unnecessary reallocation; Vector has a slight advantage here because it also lets you set the capacity increment.
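Vector exposes its current capacity through capacity(), so its growth policy can be observed directly (ArrayList has no public equivalent). The following is a small sketch with hypothetical names, assuming the documented Vector behavior of doubling when no capacity increment is set.

```java
import java.util.Vector;

class GrowthDemo {
    // Adds count elements and reports the resulting capacity of the internal array.
    static int capacityAfterAdding(Vector<Integer> v, int count) {
        for (int i = 0; i < count; i++) {
            v.add(i);
        }
        return v.capacity();
    }

    public static void main(String[] args) {
        // Default policy: the capacity doubles when the array is full.
        System.out.println(capacityAfterAdding(new Vector<Integer>(10), 11));    // prints 20
        // Explicit capacity increment of 5: the capacity grows by 5 instead.
        System.out.println(capacityAfterAdding(new Vector<Integer>(10, 5), 11)); // prints 15
    }
}
```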

Usage patterns: In ArrayList and Vector, finding the element at a given position (by index) or adding or removing an element at the end takes constant time, O(1). Adding or removing an element anywhere else takes time that grows linearly, O(n-i), where n is the number of elements in the collection and i is the index at which the element is added or removed. Why? Because the element at index i and all elements after it must be shifted. What does all this mean?
It means that if you only look up elements by position, or add and remove elements at the end of the collection, either Vector or ArrayList will do. For other operations, you are better off choosing another collection class. For example, LinkedList takes the same time, O(1), to add or remove an element at any position once that position has been reached, but indexing an element is slower, O(i), where i is the index position. ArrayList is also easier to use, because you can simply use indexes instead of creating Iterator objects. LinkedList also creates an object for every inserted element, and you should be aware of that extra overhead.
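To make the trade-off concrete, the sketch below inserts elements in the middle of a list during a single pass with a ListIterator. The method name and its parameters are hypothetical. The code works on any List, but with a LinkedList each insertion at the cursor is O(1), whereas with an ArrayList each one shifts the remaining elements.

```java
import java.util.Arrays;
import java.util.LinkedList;
import java.util.List;
import java.util.ListIterator;

class PositionalInsertDemo {
    // Inserts value immediately after every element equal to marker.
    static List<Integer> insertAfterEach(List<Integer> list, int marker, int value) {
        ListIterator<Integer> it = list.listIterator();
        while (it.hasNext()) {
            if (it.next() == marker) {
                it.add(value); // inserted at the cursor; no index lookup needed
            }
        }
        return list;
    }

    public static void main(String[] args) {
        List<Integer> list = new LinkedList<>(Arrays.asList(1, 2, 1));
        System.out.println(insertAfterEach(list, 1, 9)); // prints [1, 9, 2, 1, 9]
    }
}
```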
Finally, in Practical Java, Peter Haggar recommends using a plain array instead of Vector or ArrayList, especially in programs with demanding performance requirements. Using an array avoids synchronization, extra method calls, and unnecessary reallocation of space.
