Repost: Thinking about the Java Collections Framework

Source: Internet
Author: User

By jungleford.

The Java Collections Framework (JCF) is familiar to Java programmers. The similar concept in C++ is the Standard Template Library (STL), which encapsulates common data structures and related algorithms. Some time ago, I saw a question about the Java Collections Framework on a J2SE forum and gave a brief explanation of some of the concepts. Since this is a tool that Java beginners use constantly, I have written up the text below. I mainly draw on a tutorial on IBM developerWorks, which may explain things more clearly. This is a condensed version; for more details, see "The Collections Framework" in the JDK documentation.

Source of the problem

Collections: object containers and data structures
Let's recall what we face in program design. Data comes in two kinds of types: primitive and composite. The usual way to organize a composite type is a class. Unlike primitive values, class objects are usually allocated dynamically: creating an object with new places it in heap memory, which we do constantly when writing object-oriented programs. We rarely deal with just a single primitive value or object, though. What is the usual way to organize many of them? Yes, an array, an old concept in programming. The advantages of arrays are obvious: operations such as accessing an element by its index are easy. The disadvantages are just as obvious: the size is fixed and cannot grow dynamically (and a strongly typed language such as Java checks array bounds strictly), and inserting or deleting elements is difficult. So arrays are not a convenient tool for every collection problem, and we need new tools from the study of data structures. Note in particular that an array is a linear, ordered data structure.
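To make the two array drawbacks concrete, here is a minimal Java sketch; the class `ArrayLimits` and its methods are hypothetical names for illustration. Inserting into an array means allocating a new, larger array and shifting the tail with `System.arraycopy`, and reading past the end throws `ArrayIndexOutOfBoundsException`.

```java
import java.util.Arrays;

class ArrayLimits {
    // An array cannot grow, so inserting means allocating length + 1
    // slots and shifting every element after the insertion point.
    static int[] insertAt(int[] a, int index, int value) {
        int[] result = new int[a.length + 1];
        System.arraycopy(a, 0, result, 0, index);                        // copy the head
        result[index] = value;                                           // place the new element
        System.arraycopy(a, index, result, index + 1, a.length - index); // shift the tail right
        return result;
    }

    // Java checks every array access; reading past the end throws.
    static boolean isOutOfBounds(int[] a, int i) {
        try {
            int ignored = a[i];
            return false;
        } catch (ArrayIndexOutOfBoundsException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        int[] a = {1, 2, 4};
        System.out.println(Arrays.toString(insertAt(a, 2, 3))); // [1, 2, 3, 4]
        System.out.println(isOutOfBounds(a, 3));                // true
    }
}
```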
The mathematical basis of data structures is set theory. From the OO perspective, a collection is also an object, but a special one: an object container (note: we do not discuss collections of primitive types here, because primitive types are stored and allocated in essentially different ways from objects). A fundamental question in set theory is this: given an element, a set must be able to answer whether the element belongs to it. Another question is just as important: if an element belongs to a set, its position in the set should be unique, or at least uniquely identifiable. There are other questions too, such as searching, traversing, and sorting; these depend on the specific type of collection and are discussed later.

Unordered sets, ordered sets, and mappings
Among the kinds of collections, the set we learned about in high school is one, called the "unordered set". All elements of such a set are on an equal footing, with no order among them, so an unordered set never allows identical elements. Otherwise, when you asked for that element, you would not know which copy to take, which violates the "unique identification" principle above.
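A short sketch of these "unordered set" rules in JCF terms, assuming `HashSet` as the implementation (the class `UnorderedSetDemo` and its helper are hypothetical names): `contains()` answers the membership question, and `add()` returns `false` instead of storing a duplicate.

```java
import java.util.HashSet;
import java.util.Set;

class UnorderedSetDemo {
    // Adds each word to a Set and counts how many add() calls were
    // rejected because the element was already present.
    static int countRejected(String... words) {
        Set<String> set = new HashSet<>();
        int rejected = 0;
        for (String w : words) {
            if (!set.add(w)) {
                rejected++;
            }
        }
        return rejected;
    }

    public static void main(String[] args) {
        Set<String> fruits = new HashSet<>();
        fruits.add("apple");
        fruits.add("pear");
        System.out.println(fruits.add("apple"));     // false: duplicate rejected
        System.out.println(fruits.size());           // 2
        System.out.println(fruits.contains("pear")); // true: membership test
        System.out.println(fruits.contains("plum")); // false
    }
}
```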
In college, we meet another kind of set, the "ordered set" (or "linear list"), as distinct from the non-linear data structures such as "trees" and "graphs" we encounter later. If you studied computer science and learned about "algebraic structures" in discrete mathematics, you will know that an "ordered set" is really a kind of "binary relation", namely an order relation (strictly speaking a total order, that is, a partial order in which any two elements are comparable). An ordered set can contain identical elements: two identical elements can have different sequence numbers, so the sequence number still uniquely identifies an element. An array is an ordered set. Another characteristic of an ordered set is that the order of any two elements can always be determined.
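By contrast, a JCF `List` keeps duplicates and uses the index as the unique identifier. A minimal sketch, with `OrderedSetDemo` and `build` as hypothetical names:

```java
import java.util.ArrayList;
import java.util.List;

class OrderedSetDemo {
    // Builds a List from the given items; duplicates are kept, in order.
    static List<String> build(String... items) {
        List<String> list = new ArrayList<>();
        for (String item : items) {
            list.add(item);
        }
        return list;
    }

    public static void main(String[] args) {
        List<String> list = build("A", "B", "A");
        System.out.println(list.size());           // 3: the duplicate "A" is kept
        System.out.println(list.indexOf("A"));     // 0: first occurrence
        System.out.println(list.lastIndexOf("A")); // 2: same value, different index
        System.out.println(list.get(1));           // B: an index identifies exactly one element
    }
}
```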
Is there a third possibility besides unordered and ordered sets? It also appears in our high school algebra textbook: the "mapping". Is a mapping a set? In fact, ever since Cantor, set theory has held that "everything is a set" (though this very assertion later landed set theory in an embarrassing position; if you are interested, look at some of the results of Russell or Gödel, or search for "set theory paradoxes"). A mapping is really a set of "element pairs": f(a) = b, f(c) = d, ... is equivalent to the (unordered) set {(a, b), (c, d), ...}. In a mapping, each pair can be seen as (preimage, image), in other words (key, value). That is why we can draw beautiful function graphs on the Cartesian coordinate plane: in set theory, a function (mapping) is a set of points on the two-dimensional plane, see? In the same way, we can understand the "ordered set" above. The order relation a > b > c > d > ... (if you are not familiar with order relations, just think of it as an array x[1] = a, x[2] = b, x[3] = c, x[4] = d, ...) is equivalent to the unordered set {(1, a), (2, b), (3, c), (4, d), ...}. Therefore, all sets are equivalent to unordered sets! So high school only taught us one kind of set, haha......
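The claim that an ordered set is equivalent to a set of (index, element) pairs can be sketched directly: the hypothetical helper `toIndexMap` below turns a list into a `Map` from 1-based index to element, and `entrySet()` exposes the pairs. `LinkedHashMap` is an assumption chosen only so the printed order matches the list.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class ListAsMap {
    // Rebuilds a list as the mapping index -> element, i.e. the set of
    // pairs {(1, A), (2, B), ...}; indices start at 1 to match the text.
    static Map<Integer, String> toIndexMap(List<String> list) {
        Map<Integer, String> map = new LinkedHashMap<>(); // keeps insertion order for printing
        for (int i = 0; i < list.size(); i++) {
            map.put(i + 1, list.get(i));
        }
        return map;
    }

    public static void main(String[] args) {
        Map<Integer, String> m = toIndexMap(Arrays.asList("A", "B", "C", "D"));
        System.out.println(m);            // {1=A, 2=B, 3=C, 4=D}
        System.out.println(m.entrySet()); // [1=A, 2=B, 3=C, 4=D]: the (key, value) pairs
    }
}
```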

The JCF family

Okay, well, we all know this isn't a math class. Why haven't we reached the topic after so much rambling? There's no sign of JCF yet! Don't worry, dear readers; here is the explanation. In fact, the concepts above are very important for understanding JCF.
JCF is a fairly large family, as its class hierarchy diagram shows. The following figure (Figure 1) is taken from the famous Thinking in Java:


Figure 1 JCF hierarchy

Wow, there are so many interfaces and classes that you don't know where to start. What we really need to remember is a very simple structure (Figure 2):


Figure 2

This figure looks much more comfortable, right? But what does it explain, and how does it help us grasp the whole of JCF? We put the Collection interface at the top, meaning that Collection is the "ancestor" of the entire JCF family: almost every JCF member derives from this interface or is closely related to it. Collection provides the API for common collection operations, including insertion (add()), deletion (remove()), membership testing (contains()), traversal (iterator()), and so on. This is where the earlier "rambling" pays off: the Set interface embodies the "unordered set", which does not allow duplicate elements; the List interface represents the "ordered set"; and the Map interface is the "mapping" (earlier Java versions did not call it Map, as we will see later). In fact, the Map.Entry interface represents an "element pair", and the entrySet() method of Map returns a set made up of these "element pairs". Notice that Set and List both derive from the "ancestor" Collection, but Map does not. After all, operations on pairs of elements differ from operations on single elements. However, if you carefully compare the source code of Collection and Map, and of their direct descendants AbstractCollection and AbstractMap, you will find many similarities, so we can still regard Map as a close relative of Collection, standing on a par with Set and List.
With "unordered sets", "ordered sets", and "mappings", we can define various abstract data structures, such as the vectors, linked lists, and stacks shown in Figure 1, as well as hash tables, balanced binary trees, and so on. But all we need to remember is Figure 2; for the other members, just check the API documentation when you need them. That said, beginners will find some classes, such as Vector, ArrayList, and HashMap, the easiest to start with.
You may not yet be familiar with some concepts, for example what a "legacy collection" is, how Hashtable, HashMap, and TreeMap differ and relate to each other, or how to implement fast traversal, element search, or sorting for a specific collection. That's fine; we will go through them one by one below.
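The sketch below exercises the `Collection` operations named above through the interface type only, and walks a `Map` through `entrySet()` and `Map.Entry`. The class and method names are hypothetical; `TreeMap` is used only so that entries print in key order.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Iterator;
import java.util.Map;
import java.util.TreeMap;

class CoreApiDemo {
    // Uses only the Collection interface: remove(), then traverse with iterator().
    static int sumAfterRemoving(Collection<Integer> c, Integer toRemove) {
        c.remove(toRemove);
        int sum = 0;
        Iterator<Integer> it = c.iterator();
        while (it.hasNext()) {
            sum += it.next();
        }
        return sum;
    }

    // Walks a Map through its set of Map.Entry "element pairs".
    static String describe(Map<String, Integer> map) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, Integer> e : map.entrySet()) {
            sb.append(e.getKey()).append("->").append(e.getValue()).append(';');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Collection<Integer> c = new ArrayList<>();
        c.add(1);
        c.add(2);
        c.add(3);
        System.out.println(c.contains(2));          // true
        System.out.println(sumAfterRemoving(c, 2)); // 4
        Map<String, Integer> ages = new TreeMap<>();
        ages.put("bob", 30);
        ages.put("alice", 25);
        System.out.println(describe(ages));         // alice->25;bob->30;
    }
}
```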

Details: goals and efficiency

Knowing the JCF hierarchy is not enough; what matters is performing concrete operations on the objects a collection holds. When we studied data structures, the teacher always asked us to work out the time complexity of algorithms, and you may have grown impatient with O(f(n)), but algorithm efficiency really is an important factor.

1. Focus: traversal vs. search
A collection has two main uses: finding out which elements it contains, and finding a specific element that meets some condition. The corresponding algorithms are usually called "traversal" and "search". Don't think these never come up in daily life! For example, on CCTV's game show Lucky 52, host Li Yong asks contestants to name the exact price of a PDA. What do they do? "2000" "High" "1000" "Low" "1500" "Low"... until they get it right. Many people choose this strategy, whether or not they are computer professionals, whether or not they understand "data structures" or "binary search", let alone know that, without any preprocessing, no comparison-based search beats O(log n). We use this method naturally, whatever our line of work, unless we happen to be extraordinarily lucky, haha......
Having wandered off topic a bit, back to the point: traversal and modification seem to be in conflict. A data structure that inserts and deletes elements efficiently is usually not the best at traversal. Therefore, JCF provides two kinds of data structures tailored to the user's goals: hash tables (HashSet and HashMap) and balanced binary trees (TreeSet and TreeMap). Because sorted order is a special requirement, SortedSet and SortedMap are subinterfaces of Set and Map, and TreeSet and TreeMap implement them, respectively. Anyone familiar with data structures knows that insertion, deletion, and lookup in a hash table are very fast, with constant expected time O(1); insertion and deletion in a balanced binary tree are more expensive (O(log n)), but traversing it in sorted order and sorting are fast. The choice depends on what the user cares about most. Because converting between collection types is easy, a common approach is to build a collection with a hash table and then convert it into the corresponding tree collection for traversal, getting the best of both.
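The "high/low" game is a binary search over the price range. A minimal sketch, assuming integer prices between 1 and 2000 (the range and the class `PriceGuess` are assumptions for illustration; the guesses start from the midpoint rather than from the contestant's 2000): each guess halves the remaining range, so no secret price needs more than 11 guesses, which is the O(log n) behavior.

```java
class PriceGuess {
    // Binary ("half-interval") search for a secret price in [low, high];
    // each wrong guess is answered "high" or "low" as in the game.
    // Returns the number of guesses needed.
    static int guesses(int secret, int low, int high) {
        int count = 0;
        while (low <= high) {
            int guess = low + (high - low) / 2; // middle of the remaining range
            count++;
            if (guess == secret) {
                return count;
            } else if (guess > secret) {
                high = guess - 1;               // "high": discard the upper half
            } else {
                low = guess + 1;                // "low": discard the lower half
            }
        }
        throw new IllegalArgumentException("secret not in range");
    }

    public static void main(String[] args) {
        System.out.println(guesses(1500, 1, 2000)); // 2: guesses 1000, then 1500
    }
}
```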

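import java.util.*;

class SetTraversal {
    public static void main(String[] args) {
        Set<String> set1 = new HashSet<String>();     // hash table: fast insertion
        set1.add("elem1");                            // construct the set by inserting elements
        set1.add("elem2");
        set1.add("elem3");
        Set<String> set2 = new TreeSet<String>(set1); // copy into a balanced tree for sorted traversal
        Iterator<String> all = set2.iterator();
        while (all.hasNext())
        {   // traverse the set in sorted order
            String elem = all.next();
            System.out.println(elem);                 // ... process elem ...
        }
    }
}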
 
2. Legacy implementations vs. new implementations
"Legacy implementations" is a JCF term. Its exact meaning is not very clear, but it can be taken to mean the prototype collection classes that appeared in versions before Java 2 (JDK 1.2). From Java 2 onward, JCF matured and became robust, and the new implementations introduced new classes to replace the old members. However, many of the old classes still capture the essence of classic data structures and offer certain safety guarantees (such as synchronization), so they remain in use.

Enumeration vs. Iterator
Enumeration is the traditional tool for traversing a collection; the new JCF uses Iterator instead. Iterator also traverses, and in addition has a remove() method that deletes the current element.
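A side-by-side sketch of the two traversal tools; `TraversalDemo` and its methods are hypothetical names. `Vector.elements()` returns the legacy `Enumeration`, while `Iterator.remove()` deletes the element most recently returned by `next()`.

```java
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.Iterator;
import java.util.List;
import java.util.Vector;

class TraversalDemo {
    // Legacy style: an Enumeration can only read elements.
    static int sumWithEnumeration(Vector<Integer> v) {
        int sum = 0;
        Enumeration<Integer> e = v.elements();
        while (e.hasMoreElements()) {
            sum += e.nextElement();
        }
        return sum;
    }

    // New style: an Iterator can also remove the current element safely.
    static void removeEvens(List<Integer> list) {
        Iterator<Integer> it = list.iterator();
        while (it.hasNext()) {
            if (it.next() % 2 == 0) {
                it.remove(); // removes the element last returned by next()
            }
        }
    }

    public static void main(String[] args) {
        Vector<Integer> v = new Vector<>();
        for (int i = 1; i <= 4; i++) {
            v.add(i);
        }
        System.out.println(sumWithEnumeration(v)); // 10
        List<Integer> list = new ArrayList<>(v);
        removeEvens(list);
        System.out.println(list);                  // [1, 3]
    }
}
```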

Dictionary vs. Map
Dictionary is an abstract class that its documentation marks as obsolete. It provided mapping functionality in early versions and has been completely superseded by Map. One difference between them: Dictionary does not allow null keys or values, whereas Map leaves this to its implementations, and some of them accept null keys and values. This directly affects their descendants, Hashtable and HashMap.
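The null-handling difference can be observed directly. In this sketch (hypothetical class `NullKeyDemo`), `Hashtable`, used through its `Dictionary` supertype, throws `NullPointerException` on a null value, whereas `HashMap` accepts a null key and null values.

```java
import java.util.Dictionary;
import java.util.HashMap;
import java.util.Hashtable;
import java.util.Map;

class NullKeyDemo {
    // Hashtable (a Dictionary) rejects null values with NullPointerException.
    static boolean dictionaryAcceptsNullValue() {
        Dictionary<String, String> dict = new Hashtable<>();
        try {
            dict.put("key", null);
            return true;
        } catch (NullPointerException e) {
            return false;
        }
    }

    // HashMap permits one null key and any number of null values.
    static boolean hashMapAcceptsNulls() {
        Map<String, String> map = new HashMap<>();
        map.put(null, "value for the null key");
        map.put("key", null);
        return map.containsKey(null) && map.containsKey("key") && map.get("key") == null;
    }

    public static void main(String[] args) {
        System.out.println(dictionaryAcceptsNullValue()); // false
        System.out.println(hashMapAcceptsNulls());        // true
    }
}
```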

Vector vs. ArrayList
Vector and ArrayList are how arrays appear in JCF. Remember the array shortcomings mentioned earlier? Vector and ArrayList are arrays that can grow dynamically. The main difference between them is that Vector is synchronized (thread-safe) while ArrayList is not. Because synchronization has a cost, ArrayList access is generally more efficient than Vector access. Synchronization is discussed further below.
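Both classes grow past their initial capacity without any manual copying. A small sketch; `GrowableArrays.fill` is a hypothetical helper, and the initial capacity of 10 is an arbitrary choice:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Vector;

class GrowableArrays {
    // Appends 0..n-1 and returns the resulting size; the backing array
    // is reallocated internally whenever the capacity is exceeded.
    static int fill(List<Integer> list, int n) {
        for (int i = 0; i < n; i++) {
            list.add(i);
        }
        return list.size();
    }

    public static void main(String[] args) {
        List<Integer> vector = new Vector<>(10);       // initial capacity 10
        List<Integer> arrayList = new ArrayList<>(10); // initial capacity 10
        System.out.println(fill(vector, 1000));        // 1000
        System.out.println(fill(arrayList, 1000));     // 1000
    }
}
```

The calls are identical because both classes implement `List`; the difference lies in Vector's synchronized methods, which this single-threaded sketch does not exercise.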

Hashtable vs. HashMap
Hashtable is a subclass of Dictionary and belongs to the legacy implementations; HashMap implements Map and is a new implementation. Besides the difference in whether keys and values may be null, mentioned above, they also differ in synchronization: Hashtable is synchronized, but HashMap is not. But don't look down on Hashtable just because it belongs to the "old generation": we often use its well-known subclass, Properties.
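As an example of Properties, here is a minimal sketch; the keys `host`, `port`, and `timeout` and the class `PropertiesDemo` are made up for illustration. `getProperty(key, default)` returns the default when the key is missing, and a `Properties` object can be assigned to a `Hashtable<Object, Object>` variable because it is one.

```java
import java.util.Hashtable;
import java.util.Properties;

class PropertiesDemo {
    // Builds a small configuration; setProperty stores String keys and values.
    static Properties config() {
        Properties props = new Properties();
        props.setProperty("host", "localhost");
        props.setProperty("port", "8080");
        return props;
    }

    public static void main(String[] args) {
        Properties p = config();
        System.out.println(p.getProperty("host"));          // localhost
        System.out.println(p.getProperty("timeout", "30")); // 30: default for a missing key
        Hashtable<Object, Object> asTable = p;              // Properties is a Hashtable subclass
        System.out.println(asTable.size());                 // 2
    }
}
```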

3. Synchronized vs. unsynchronized
From the descriptions above, we may get the impression that legacy implementations are synchronized while the new implementations are not. Synchronization matters because multiple threads may operate on the same collection: for example, if one thread is traversing a collection while another thread inserts or deletes elements, the result of the first thread's traversal is unpredictable. In that situation, rather than continuing with unpredictable results, the collection's iterator throws a ConcurrentModificationException; JCF calls this mechanism "fail-fast". Comparing the source code of Vector and ArrayList shows that many of Vector's methods are declared with the synchronized keyword, while ArrayList's are not.
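Fail-fast behavior does not even need a second thread to show up. In this sketch (hypothetical class `FailFastDemo`), the list is modified directly while a for-each loop iterates over it, and the iterator throws `ConcurrentModificationException` on its next step, for `Vector` as well as `ArrayList`. The Javadoc describes this check as best-effort, so it is a debugging aid rather than a thread-safety guarantee.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.ConcurrentModificationException;
import java.util.List;
import java.util.Vector;

class FailFastDemo {
    // Adds an element behind the iterator's back when "a" is seen;
    // returns true if the iteration failed fast.
    static boolean failsFast(List<String> list) {
        try {
            for (String s : list) {
                if (s.equals("a")) {
                    list.add("x"); // structural change outside the iterator
                }
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(failsFast(new ArrayList<String>(Arrays.asList("a", "b", "c")))); // true
        System.out.println(failsFast(new Vector<String>(Arrays.asList("a", "b", "c"))));    // true
    }
}
```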

4. Easily forgotten tools: Collections and Arrays
In the lower right corner of Figure 1 there are two classes, Collections (note: not Collection!) and Arrays. They are powerful tools in JCF but are often overlooked by beginners. According to the JCF documentation, these two classes provide wrapper implementations, data structure algorithms, and array-related utilities.
Surely you haven't forgotten the classic algorithms mentioned above, such as binary search and sorting. The Collections class provides a variety of static methods that make these chores from data structures class easy:

binarySearch: binary search (the list must already be sorted).
sort: sorting. The JDK uses a modified merge sort, which guarantees O(n log n) time and is stable.
reverse: reverses a list, a classic data structures exercise!
rotate: rotates the elements of a list by a given distance, wrapping around at the end. Wow, this feature is so cool!
swap: swaps the positions of two elements in a list.
......
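The algorithms above, applied in sequence to one list; the sample values and the class `AlgorithmsDemo` are illustrative assumptions. Note that `binarySearch` only gives a meaningful result on a sorted list.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class AlgorithmsDemo {
    static String demo() {
        List<Integer> list = new ArrayList<>(Arrays.asList(5, 3, 1, 4, 2));
        Collections.sort(list);                      // [1, 2, 3, 4, 5]
        int pos = Collections.binarySearch(list, 4); // 3 (requires a sorted list)
        Collections.reverse(list);                   // [5, 4, 3, 2, 1]
        Collections.rotate(list, 1);                 // [1, 5, 4, 3, 2]: shift right by 1, wrapping around
        Collections.swap(list, 0, 4);                // [2, 5, 4, 3, 1]
        return pos + " " + list;
    }

    public static void main(String[] args) {
        System.out.println(demo()); // 3 [2, 5, 4, 3, 1]
    }
}
```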

Another important feature of Collections is its wrappers, methods that convert a collection into a special kind of collection:

unmodifiableXxx: converts to a read-only collection, where Xxx stands for one of six basic interfaces: Collection, List, Map, Set, SortedMap, and SortedSet. Inserting into or deleting from a read-only collection throws an UnsupportedOperationException.
synchronizedXxx: converts to a synchronized collection.
singleton: creates a collection with only one element. singleton produces a single-element Set, while singletonList and singletonMap produce a single-element List and Map, respectively.
Empty collections: represented by the static fields EMPTY_SET, EMPTY_LIST, and EMPTY_MAP of Collections.
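A sketch of these wrappers (the class name `WrapperDemo` is hypothetical). The read-only view rejects `add()`, and iterating a `synchronizedList` still requires holding the list's lock, as the `Collections.synchronizedList` Javadoc advises.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Set;

class WrapperDemo {
    // Returns true if the read-only view throws on add().
    static boolean readOnlyRejectsAdd() {
        List<String> base = new ArrayList<>();
        base.add("x");
        List<String> readOnly = Collections.unmodifiableList(base);
        try {
            readOnly.add("y");
            return false;
        } catch (UnsupportedOperationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(readOnlyRejectsAdd());             // true
        List<String> synced = Collections.synchronizedList(new ArrayList<String>());
        synced.add("shared between threads");
        synchronized (synced) { // iteration still needs the lock held manually
            for (String s : synced) {
                System.out.println(s);
            }
        }
        Set<String> one = Collections.singleton("only");
        Map<String, Integer> pair = Collections.singletonMap("k", 1);
        System.out.println(one.size() + " " + pair.get("k")); // 1 1
        System.out.println(Collections.EMPTY_LIST.isEmpty()); // true
    }
}
```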

In addition, we know that the toArray() method of Collection converts a collection into an object array; conversely, we can just as easily convert an object array into a list (don't tell me you add the elements one by one): Arrays.asList().
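A sketch of the round trip between arrays and lists; `ArrayBridge` and its methods are hypothetical names. `Arrays.asList` returns a fixed-size view backed by the array, so writes go through to the array and `add()` is unsupported.

```java
import java.util.Arrays;
import java.util.List;

class ArrayBridge {
    // Wraps the array in a fixed-size List view (no copying).
    static List<String> fromArray(String[] arr) {
        return Arrays.asList(arr);
    }

    // The view cannot change size, so add() throws.
    static boolean addIsRejected(List<String> fixedSize) {
        try {
            fixedSize.add("extra");
            return false;
        } catch (UnsupportedOperationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        String[] arr = {"a", "b", "c"};
        List<String> list = fromArray(arr);
        list.set(0, "z");                        // writes through to the backing array
        System.out.println(arr[0]);              // z
        System.out.println(addIsRejected(list)); // true
        Object[] back = list.toArray();          // Collection.toArray() gives an Object[]
        System.out.println(back.length);         // 3
    }
}
```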

5. Generics
One important property of JCF as we have known it so far is that every object added to a collection loses its own type on the surface and looks like a plain Object, unless you cast it back to its original type. This is natural: a collection, as an object container, holds all kinds of objects, not just objects of one specific type. With J2SE 5.0, JCF introduced generics. For example, a common task is converting a collection into an array of a specific type. Although Collection has a toArray() method, unfortunately all the elements of the resulting array are of type Object. We usually use a for loop to cast each element, which works but looks clumsy. With generics, we can specify the expected type up front and then use toArray to obtain an array whose elements are all of that type. I am not very familiar with 5.0; for details, refer to the JCF documentation of J2SE 5.0.
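With generics the element type is known at compile time, and the generic signature `<T> T[] toArray(T[] a)` takes the runtime array type from its argument. A minimal sketch, with `GenericToArray` as a hypothetical name:

```java
import java.util.ArrayList;
import java.util.List;

class GenericToArray {
    // The array argument fixes the runtime type: the result is a real String[].
    static String[] toStringArray(List<String> list) {
        return list.toArray(new String[0]);
    }

    public static void main(String[] args) {
        List<String> names = new ArrayList<>(); // the element type is declared up front
        names.add("Ann");
        names.add("Bob");
        String first = names.get(0);            // no cast needed
        String[] arr = toStringArray(names);
        System.out.println(first + " " + arr.length + " " + arr.getClass().getSimpleName()); // Ann 2 String[]
    }
}
```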

Summary

Here I have briefly walked through some of the main concepts of JCF. Java veterans may find it boring, and newcomers may feel they are reviewing high school math and college data structures, haha. This is just a small example, but it shows that basic knowledge is quite instructive for practical applications. Masters of mathematics find it a beautiful, artistic thing, and the West has long set mathematics apart from the other natural sciences, regarding it as closer to philosophy; someone like me, still struggling to find a job all day, can't reach that level, sigh......

