This article analyzes the ways to traverse Java collections: their implementation principles, algorithmic performance, and applicable scenarios.
Overview
Java provides a collections framework that defines abstract data types such as List and Set. Each type has different underlying implementations, such as ArrayList and LinkedList for List.
Java also provides several ways to traverse a collection. Developers need to understand the characteristics, applicability, and performance of each traversal method on each underlying implementation. The rest of this article analyzes these in detail.
How are data elements stored in memory?
Data elements are stored in memory in two ways:
1. Sequential storage, random access (direct access):
Adjacent data elements are stored at adjacent memory addresses, so the whole block of memory is contiguous. The address of an element can be computed directly from its position and read directly, so reading the element at a given position takes O(1) on average. Normally only array-based collections have this property. In Java, ArrayList is the representative example.
2. Chained storage, sequential access:
Elements need not be adjacent in memory; each element stores the memory address of the next one. The address of an element cannot be computed from its position, so elements can only be read in order, and reading the element at a given position takes O(n) on average. The typical example is the linked list.
In Java, LinkedList is the representative example.
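The difference between the two storage schemes can be illustrated with a small sketch. The class StorageAccessSketch, its Node type, and the method names are hypothetical and exist only for this illustration; they are not JDK classes.

```java
// Hypothetical illustration class; not part of the JDK.
class StorageAccessSketch {

    // A minimal singly linked node, assumed here only for illustration.
    static final class Node {
        final int value;
        Node next;

        Node(int value) {
            this.value = value;
        }
    }

    // Sequential storage: the slot is reached directly from the index.
    static int readFromArray(int[] data, int index) {
        return data[index];
    }

    // Chained storage: follow the next links from the head, one hop per position.
    static int readFromChain(Node head, int index) {
        Node current = head;
        for (int i = 0; i < index; i++) {
            current = current.next;
        }
        return current.value;
    }

    // Builds a chain holding the same values as the array, in the same order.
    static Node chainOf(int[] data) {
        Node head = null;
        for (int i = data.length - 1; i >= 0; i--) {
            Node node = new Node(data[i]);
            node.next = head;
            head = node;
        }
        return head;
    }
}
```

Both read methods return the same element, but readFromChain needs index hops to get there, while readFromArray needs none.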
What traversal methods are provided in Java?
1. Traditional for loop traversal, based on counter:
This traversal keeps a counter outside the collection, reads the element at each position in turn, and stops after reading the last element. In essence it reads elements by position. It is also the most primitive way to traverse a collection.
Syntax:
for (int i = 0; i < list.size(); i++) {
    list.get(i);
}
2. Iterator traversal, Iterator:
Iterator is originally an object-oriented design pattern. Its purpose is to hide the characteristics of different collections behind a single, uniform traversal interface. As an OO language, Java naturally supports the Iterator pattern in its Collections.
Syntax:
Iterator iterator = list.iterator();
while (iterator.hasNext()) {
    iterator.next();
}
3. foreach loop traversal:
It hides the explicitly declared Iterator and counter.
Advantage: the code is concise and less error-prone.
Disadvantage: it only supports simple traversal; you cannot modify the collection (delete or replace elements) during traversal.
Syntax:
for (ElementType element : list) {}
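As a minimal sketch, the three traversal styles can be placed side by side on the same list. The class name TraversalStyles and its method names are made up for this example.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical demo class; the three methods differ only in traversal style.
class TraversalStyles {

    // 1. Counter-based for loop: reads each element by position.
    static int sumWithCounter(List<Integer> list) {
        int sum = 0;
        for (int i = 0; i < list.size(); i++) {
            sum += list.get(i);
        }
        return sum;
    }

    // 2. Explicit Iterator: asks the collection for its own traversal object.
    static int sumWithIterator(List<Integer> list) {
        int sum = 0;
        Iterator<Integer> iterator = list.iterator();
        while (iterator.hasNext()) {
            sum += iterator.next();
        }
        return sum;
    }

    // 3. foreach: the same traversal with the Iterator hidden by the compiler.
    static int sumWithForeach(List<Integer> list) {
        int sum = 0;
        for (int value : list) {
            sum += value;
        }
        return sum;
    }

    public static void main(String[] args) {
        List<Integer> list = Arrays.asList(1, 2, 3, 4);
        System.out.println(sumWithCounter(list));
        System.out.println(sumWithIterator(list));
        System.out.println(sumWithForeach(list));
    }
}
```

All three methods visit the same elements in the same order; they differ only in how the position is tracked.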
What is the implementation principle of each Traversal method?
1. Traditional for loop traversal, based on counter:
This traversal keeps a counter outside the collection, reads the element at each position in turn, and stops after reading the last element. In essence it reads elements by position.
2. Iterator traversal, Iterator:
Each concrete collection generally needs its own Iterator implementation. Compared with the traditional for loop, Iterator does away with the explicit traversal counter. An Iterator over a sequential-storage collection can therefore access data directly by position, while an Iterator over a chained-storage collection normally keeps the current traversal position and moves the pointer forward or backward from there.
3. foreach loop traversal:
Decompiled bytecode shows that foreach over a collection uses an Iterator internally; the Java compiler simply generates that code for us.
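One way to observe this without reading bytecode is to traverse a custom Iterable with foreach and count how often its iterator is used. The class CountingRange and its counter fields are hypothetical names introduced for this sketch.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical Iterable over 0..limit-1 that records how it is traversed.
class CountingRange implements Iterable<Integer> {
    private final int limit;
    int iteratorCalls = 0; // how many times iterator() was requested
    int nextCalls = 0;     // how many elements were handed out through next()

    CountingRange(int limit) {
        this.limit = limit;
    }

    @Override
    public Iterator<Integer> iterator() {
        iteratorCalls++;
        return new Iterator<Integer>() {
            private int current = 0;

            @Override
            public boolean hasNext() {
                return current < limit;
            }

            @Override
            public Integer next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                nextCalls++;
                return current++;
            }
        };
    }

    // The foreach loop below never mentions an Iterator explicitly.
    static int sum(CountingRange range) {
        int total = 0;
        for (int value : range) {
            total += value;
        }
        return total;
    }
}
```

After calling sum on a CountingRange of 4 elements, iteratorCalls is 1 and nextCalls is 4, which is consistent with the compiler expanding foreach into iterator(), hasNext(), and next() calls.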
What is the performance of different traversal methods for different storage methods?
1. Traditional for loop traversal, based on counter:
This method reads elements by position. For sequential storage, reading the element at a given position takes O(1) on average, so traversing the whole collection takes O(n). For chained storage, reading the element at a given position takes O(n) on average, so traversing the whole collection takes O(n^2).
Code for reading by position in ArrayList: the element is read directly by index.
transient Object[] elementData;

public E get(int index) {
    rangeCheck(index);
    return elementData(index);
}

E elementData(int index) {
    return (E) elementData[index];
}
Code for reading by position in LinkedList: every read has to walk the links starting from element 0. In practice there is a small internal optimization: the search starts from the head or the tail, whichever is closer to the requested index.
transient int size = 0;
transient Node<E> first;
transient Node<E> last;

public E get(int index) {
    checkElementIndex(index);
    return node(index).item;
}

Node<E> node(int index) {
    if (index < (size >> 1)) {
        // The position is in the first half of the list: search from the head.
        Node<E> x = first;
        for (int i = 0; i < index; i++)
            x = x.next;
        return x;
    } else {
        // The position is in the second half of the list: search from the tail.
        Node<E> x = last;
        for (int i = size - 1; i > index; i--)
            x = x.prev;
        return x;
    }
}
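To make the quadratic cost concrete, the following sketch counts how many links are followed when a linked list is traversed with a counter-based for loop. HopCountingList is a hypothetical, stripped-down doubly linked list written for this illustration; it mimics the head-or-tail lookup described above but is not the JDK implementation.

```java
// Hypothetical doubly linked list that counts link traversals ("hops").
class HopCountingList {

    private static final class Node {
        final int item;
        Node prev;
        Node next;

        Node(int item) {
            this.item = item;
        }
    }

    private Node first;
    private Node last;
    private int size;
    long hops; // total number of next/prev links followed by get()

    void add(int item) {
        Node node = new Node(item);
        if (last == null) {
            first = node;
        } else {
            last.next = node;
            node.prev = last;
        }
        last = node;
        size++;
    }

    int size() {
        return size;
    }

    // Positional read: start from whichever end is closer to the index.
    int get(int index) {
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException("index " + index);
        }
        Node x;
        if (index < (size >> 1)) {
            x = first;
            for (int i = 0; i < index; i++) {
                x = x.next;
                hops++;
            }
        } else {
            x = last;
            for (int i = size - 1; i > index; i--) {
                x = x.prev;
                hops++;
            }
        }
        return x.item;
    }

    // Traverses a list of n elements by index and reports the hops needed.
    static long hopsForIndexedTraversal(int n) {
        HopCountingList list = new HopCountingList();
        for (int i = 0; i < n; i++) {
            list.add(i);
        }
        for (int i = 0; i < list.size(); i++) {
            list.get(i);
        }
        return list.hops;
    }
}
```

With this sketch, hopsForIndexedTraversal(1000) returns 249500 and hopsForIndexedTraversal(2000) returns 999000. Doubling the list size roughly quadruples the work, which is the O(n^2) behavior described above; the head-or-tail optimization only improves the constant factor.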
2. Iterator traversal, Iterator:
For a RandomAccess collection, Iterator traversal offers little benefit and slightly increases running time because of some extra operations. For a sequential-access collection, however, it matters a great deal: the Iterator keeps the current traversal position internally, so reading the next element never has to restart from the first element of the collection; it only moves the pointer one position forward. This reduces the time complexity of traversing the whole collection to O(n).
(Only LinkedList is used as the example here.) The iterator in LinkedList keeps the current traversal position and then moves the pointer:
Code:
public E next() {
    checkForComodification();
    if (!hasNext())
        throw new NoSuchElementException();
    lastReturned = next;
    next = next.next;
    nextIndex++;
    return lastReturned.item;
}

public E previous() {
    checkForComodification();
    if (!hasPrevious())
        throw new NoSuchElementException();
    lastReturned = next = (next == null) ? last : next.prev;
    nextIndex--;
    return lastReturned.item;
}
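From the caller's side, these two methods are reached through the ListIterator returned by LinkedList.listIterator(). The sketch below walks a list forward and then backward; the class name ListIteratorWalk and the method forwardThenBackward are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;
import java.util.ListIterator;

// Hypothetical demo: each step moves the iterator's pointer by one node.
class ListIteratorWalk {

    static List<String> forwardThenBackward(List<String> source) {
        List<String> linked = new LinkedList<>(source);
        List<String> visited = new ArrayList<>();
        ListIterator<String> iterator = linked.listIterator();

        // Forward pass: next() returns the current node and advances.
        while (iterator.hasNext()) {
            visited.add(iterator.next());
        }
        // Backward pass: previous() steps back from the end of the list.
        while (iterator.hasPrevious()) {
            visited.add(iterator.previous());
        }
        return visited;
    }
}
```

For the input a, b, c the visited order is a, b, c, c, b, a. Every call moves one link from the remembered position, so each full pass costs O(n).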
3. foreach loop traversal:
Analysis of the Java bytecode shows that foreach is also implemented through an Iterator; the only difference is that the Java compiler generates the Iterator code, so we do not have to write it by hand. In the listings that follow, the foreach version performs a type-cast check (checkcast) on every element because each element is assigned to a typed loop variable, whereas the hand-written Iterator loop discards the result of next(). It can therefore be marginally slower than that Iterator loop. The time complexity is the same as for Iterator.
Bytecode when using Iterator (constant-pool indices, local-variable slots, and branch targets are missing from this listing):
Code:
    new #                 // class java/util/ArrayList
    dup
    invokespecial #       // Method java/util/ArrayList."<init>":()V
    astore_
    aload_
    invokeinterface #,    // InterfaceMethod java/util/List.iterator:()Ljava/util/Iterator;
    astore_
    goto
    aload_
    invokeinterface #,    // InterfaceMethod java/util/Iterator.next:()Ljava/lang/Object;
    pop
    aload_
    invokeinterface #,    // InterfaceMethod java/util/Iterator.hasNext:()Z
    ifne
    return
Bytecode when using foreach (constant-pool indices, local-variable slots, and branch targets are missing from this listing):
Code:
    new #                 // class java/util/ArrayList
    dup
    invokespecial #       // Method java/util/ArrayList."<init>":()V
    astore_
    aload_
    invokeinterface #,    // InterfaceMethod java/util/List.iterator:()Ljava/util/Iterator;
    astore_
    goto
    aload_
    invokeinterface #,    // InterfaceMethod java/util/Iterator.next:()Ljava/lang/Object;
    checkcast #           // class loop/Model
    astore_
    aload_
    invokeinterface #,    // InterfaceMethod java/util/Iterator.hasNext:()Z
    ifne
    return
What are the applicable scenarios of different traversal methods?
1. Traditional for loop traversal, based on counter:
Sequential storage: read performance is high. Suitable for traversing sequential-storage collections.
Chained storage: the time complexity is too high. Not suitable for traversing chained-storage collections.
2. Iterator traversal, Iterator:
Sequential storage: if you are not too concerned about running time, this method is recommended; the code is cleaner and it avoids off-by-one errors.
Chained storage: highly significant. The average time complexity of a full traversal drops to O(n), which is very attractive, so this traversal method is recommended.
3. foreach loop traversal:
foreach only makes the code more concise. Its drawback is that you cannot modify the collection (for example, delete elements) during traversal, so it is unsuitable in some cases. It is implemented on top of Iterator, and because of the type-cast it can be marginally slower than using Iterator directly, although the time complexity is the same. To choose, follow the guidance given for the two methods above.
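The restriction on modifying a collection during foreach can be seen in a short sketch. The class RemoveWhileTraversing and its method names are hypothetical. The failure shown for foreach relies on the fail-fast check in ArrayList's iterator, which the JDK documents as best-effort, so treat the second method as an illustration rather than a guarantee.

```java
import java.util.ArrayList;
import java.util.ConcurrentModificationException;
import java.util.Iterator;
import java.util.List;

// Hypothetical demo of deleting elements while traversing.
class RemoveWhileTraversing {

    // Explicit Iterator: remove() deletes the element just returned by next().
    static List<Integer> removeEvensWithIterator(List<Integer> input) {
        List<Integer> list = new ArrayList<>(input);
        Iterator<Integer> iterator = list.iterator();
        while (iterator.hasNext()) {
            if (iterator.next() % 2 == 0) {
                iterator.remove();
            }
        }
        return list;
    }

    // foreach: removing through the list itself invalidates the hidden Iterator.
    static boolean foreachRemovalFails(List<Integer> input) {
        List<Integer> list = new ArrayList<>(input);
        try {
            for (Integer value : list) {
                if (value % 2 == 0) {
                    list.remove(value); // remove(Object), not remove(int)
                }
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }
}
```

For the input 1, 2, 3, 4 the Iterator version returns 1, 3, while the foreach version fails with ConcurrentModificationException on the step after the first removal.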
What are Java best practices?
The Java collections framework provides the RandomAccess interface, which declares no methods and serves only as a marker. List implementations use it to indicate whether they support random access.
A collection that implements this interface supports random access: reading an element by position takes O(1) on average. ArrayList is an example.
A collection that does not implement it does not support random access. LinkedList is an example.
The JDK developers evidently anticipated this problem. The recommended practice is therefore to check whether a list supports random access, that is, list instanceof RandomAccess, before traversing it.
For example:
if (list instanceof RandomAccess) {
    // use the traditional for loop
} else {
    // use Iterator or foreach
}
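A fuller version of this check might look like the following sketch. The class AdaptiveTraversal and the methods strategyFor and sum are hypothetical names; only RandomAccess, List, and the collection classes come from the JDK.

```java
import java.util.List;
import java.util.RandomAccess;

// Hypothetical helper that picks the traversal style from the list type.
class AdaptiveTraversal {

    // Reports which strategy would be chosen, for inspection or logging.
    static String strategyFor(List<?> list) {
        return (list instanceof RandomAccess) ? "indexed-for" : "iterator";
    }

    static long sum(List<Integer> list) {
        long total = 0;
        if (list instanceof RandomAccess) {
            // O(1) positional reads: a counter-based loop is appropriate.
            for (int i = 0, n = list.size(); i < n; i++) {
                total += list.get(i);
            }
        } else {
            // Sequential access: let the Iterator behind foreach keep the position.
            for (int value : list) {
                total += value;
            }
        }
        return total;
    }
}
```

Passing an ArrayList selects the indexed loop and passing a LinkedList selects the iterator; both return the same sum.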
That concludes this analysis of Java collection traversal methods (implementation principles, algorithm performance, and applicable scenarios). I hope it helps!