Reprint: Java Collection class data structure analysis

Source: Internet
Author: User

Arrays are the most commonly used data structure. An array has a fixed length, is accessed by index, and holds elements that are all of the same type. Common scenarios for arrays include storing employee records read from a database in an EmployeeDetail[] array, converting a string into a byte array for easier manipulation and processing, and so on. Where possible, encapsulate the array in a class so that the data cannot be corrupted by incorrect operations elsewhere in the code. The same advice applies to the other data structures discussed below.
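A minimal sketch of the encapsulation advice above. `FixedScores` and `ArrayDemo` are hypothetical names chosen for illustration: the raw array stays private, copies are made on the way in and out, and writes go through a method that can validate them. The last lines show the string-to-byte-array conversion using the standard `String.getBytes(Charset)` method.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical wrapper: the raw array is private, so callers can only use
// the operations exposed here and cannot corrupt the data by accident.
final class FixedScores {
    private final int[] scores;

    FixedScores(int[] initial) {
        // Defensive copy: later changes to the caller's array do not leak in.
        this.scores = Arrays.copyOf(initial, initial.length);
    }

    int length() { return scores.length; }

    int get(int index) { return scores[index]; } // O(1) index-based access

    void set(int index, int value) {
        if (value < 0) {
            throw new IllegalArgumentException("score must be non-negative");
        }
        scores[index] = value;
    }

    int[] toArray() {
        // Return a copy so the internal array is never exposed.
        return Arrays.copyOf(scores, scores.length);
    }
}

class ArrayDemo {
    public static void main(String[] args) {
        FixedScores s = new FixedScores(new int[] {70, 85, 90});
        s.set(1, 88);
        System.out.println(Arrays.toString(s.toArray())); // [70, 88, 90]

        // Converting a string into a byte array for low-level processing.
        byte[] bytes = "abc".getBytes(StandardCharsets.UTF_8);
        System.out.println(bytes.length); // 3
    }
}
```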

Lists are similar to arrays, except that their size can change. A list is typically backed by a fixed-size array that is automatically resized when needed. A list can contain duplicate elements. Common scenarios include adding a new line item to an order, removing all expired items from a product list, and so on. Initializing a list to a suitable size reduces the number of times it has to be resized.
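The two list scenarios above can be sketched with `java.util.ArrayList`. The `Product` record and `ListDemo` class are hypothetical, and the sketch assumes Java 16 or later for `record` support. The initial capacity passed to the `ArrayList` constructor pre-sizes the backing array, and `removeIf` drops every expired item in one pass.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;

// Hypothetical product type used only for this illustration.
record Product(String name, LocalDate expiry) {}

class ListDemo {
    // Removes every product whose expiry date is before 'today'.
    static void removeExpired(List<Product> products, LocalDate today) {
        products.removeIf(p -> p.expiry().isBefore(today));
    }

    public static void main(String[] args) {
        // Pre-size the backing array to avoid repeated resizing.
        List<Product> products = new ArrayList<>(100);
        products.add(new Product("milk", LocalDate.of(2020, 1, 1)));
        products.add(new Product("rice", LocalDate.of(2030, 1, 1)));
        products.add(new Product("rice", LocalDate.of(2030, 1, 1))); // duplicates are allowed
        removeExpired(products, LocalDate.of(2024, 6, 1));
        System.out.println(products.size()); // 2
    }
}
```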

A set is similar to a list, but it cannot contain duplicate elements. Use a set when you need to store distinct elements.
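A short illustration of the no-duplicates rule, assuming a hypothetical `SetDemo` helper. It uses `java.util.LinkedHashSet`, which also keeps first-insertion order; `Set.add` returns `false` when the element is already present.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class SetDemo {
    // Returns the distinct elements, keeping the order in which they first appear.
    static Set<String> distinct(List<String> items) {
        return new LinkedHashSet<>(items);
    }

    public static void main(String[] args) {
        Set<String> tags = distinct(List.of("java", "list", "java", "set"));
        System.out.println(tags);             // [java, list, set]
        System.out.println(tags.add("list")); // false: already present
    }
}
```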

A stack allows only the most recently inserted element to be accessed, that is, last in, first out (LIFO). Once the top element is removed, the element below it becomes accessible, and so on. The LIFO behavior is enforced by restricting access to the peek(), push(), and pop() methods. This structure is useful in many scenarios, such as parsing a mathematical expression like 4+2, tracking method calls and exceptions in the order they appear on the call stack, and checking whether the parentheses and braces in source code are balanced.
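The bracket-matching use case can be sketched as follows. `BracketChecker` is a hypothetical name, and the sketch uses `java.util.ArrayDeque` through the `Deque` interface, which the Java documentation recommends over the legacy `Stack` class. It assumes Java 14 or later for the arrow-style `switch`.

```java
import java.util.ArrayDeque;
import java.util.Deque;

class BracketChecker {
    // Returns true if every (, [ and { is closed by the matching bracket
    // in last-opened, first-closed order.
    static boolean isBalanced(String code) {
        Deque<Character> stack = new ArrayDeque<>(); // used as a LIFO stack
        for (char c : code.toCharArray()) {
            switch (c) {
                case '(', '[', '{' -> stack.push(c); // remember the opener
                case ')', ']', '}' -> {
                    if (stack.isEmpty()) return false; // closer with no opener
                    char open = stack.pop();           // most recent opener
                    if ((c == ')' && open != '(')
                            || (c == ']' && open != '[')
                            || (c == '}' && open != '{')) {
                        return false;
                    }
                }
                default -> { } // ignore all other characters
            }
        }
        return stack.isEmpty(); // leftover openers mean the input is unbalanced
    }
}
```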

The LIFO mechanism implemented by stacks is useful in many other places as well, for example expression evaluation and parsing, validating and parsing XML, the undo action in a text editor, and the browsing history in a web browser.

A queue is similar to a stack, except that the first element inserted is also the first element removed, that is, first in, first out (FIFO). The FIFO behavior is enforced by restricting access to the peek(), offer(), and poll() methods. For example, the waiting lines at a bus stop, a bank, or a supermarket can all be modeled as queues. Blocking queues shared between threads are another typical example.
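A minimal FIFO sketch using the `Queue` interface backed by `java.util.ArrayDeque`; `QueueDemo` and `serveAll` are hypothetical names. For the thread-blocking case, `java.util.concurrent.BlockingQueue` adds `put()` and `take()`, which block when the queue is full or empty respectively.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

class QueueDemo {
    // Removes elements in arrival order until the queue is empty.
    static List<String> serveAll(Queue<String> queue) {
        List<String> served = new ArrayList<>();
        while (!queue.isEmpty()) {
            served.add(queue.poll()); // poll() removes the head (oldest element)
        }
        return served;
    }

    public static void main(String[] args) {
        Queue<String> line = new ArrayDeque<>();
        line.offer("Alice"); // offer() adds at the tail
        line.offer("Bob");
        line.offer("Carol");
        System.out.println(line.peek());    // Alice (head, not removed)
        System.out.println(serveAll(line)); // [Alice, Bob, Carol]
    }
}
```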

A linked list is a data structure made up of nodes, where each node contains data and a reference to the next node; in a doubly linked list, each node also holds a reference to the previous node. Stacks and queues can be implemented with singly or doubly linked lists, because elements can be inserted and deleted efficiently at the ends of the list. Linked lists are also used in scenarios where nodes are frequently inserted into and deleted from the middle of the list. However, the Apache Commons Collections library provides TreeList, which is a good replacement for LinkedList: it uses only slightly more memory, but performs much better. From this point of view, LinkedList is rarely a good choice.
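To make the node-and-reference idea concrete, here is a minimal stack built on a singly linked list. `LinkedStack` is a hypothetical teaching class; in real code `java.util.ArrayDeque` or `java.util.LinkedList` already provide this behavior. Both `push` and `pop` only touch the head node, which is why they run in O(1).

```java
import java.util.NoSuchElementException;

// Minimal singly linked stack: each node holds data and a reference to the next node.
final class LinkedStack<T> {
    private static final class Node<T> {
        final T data;
        final Node<T> next;

        Node(T data, Node<T> next) {
            this.data = data;
            this.next = next;
        }
    }

    private Node<T> head; // top of the stack
    private int size;

    // O(1): the new node becomes the head and points at the old head.
    void push(T value) {
        head = new Node<>(value, head);
        size++;
    }

    // O(1): unlink the head node and return its data.
    T pop() {
        if (head == null) throw new NoSuchElementException("stack is empty");
        T value = head.data;
        head = head.next;
        size--;
        return value;
    }

    T peek() {
        if (head == null) throw new NoSuchElementException("stack is empty");
        return head.data;
    }

    int size() { return size; }

    boolean isEmpty() { return head == null; }
}
```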

Here are a few examples of the collection classes and data structures in Java:

ArrayList is a good general-purpose List implementation. ArrayList is faster than TreeList for most operations, except inserting and deleting elements in the middle of the list. TreeList uses a tree structure internally to ensure that all insertions and deletions are O(log n). This allows TreeList to perform much better than both ArrayList and LinkedList when elements are frequently inserted and deleted.

HashMap is a key-value mapping data structure whose access time is nearly constant. It is backed by an array of buckets: a hash function maps each key to a bucket, and a collision-resolution strategy handles keys that hash to the same bucket. For example, employee information can be stored with the employee ID as the key, and property names and values read from a properties file can be stored as key/value pairs. Giving a HashMap a suitable initial capacity reduces the number of times it has to be resized.
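A sketch of both HashMap scenarios above; `HashMapDemo`, `directory`, and `loadSettings` are hypothetical names. The capacity calculation assumes the default load factor of 0.75, so a map expecting `expectedSize` entries is created large enough not to resize. `java.util.Properties` (itself a hash-table-based map) parses `key=value` lines via `load(Reader)`.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

class HashMapDemo {
    // Size the table so expectedSize entries fit under the default
    // load factor of 0.75 without triggering a resize.
    static Map<Integer, String> directory(int expectedSize) {
        return new HashMap<>((int) Math.ceil(expectedSize / 0.75));
    }

    // Parses key=value lines, as they would appear in a .properties file.
    static Properties loadSettings(String text) throws IOException {
        Properties props = new Properties();
        props.load(new StringReader(text));
        return props;
    }

    public static void main(String[] args) throws IOException {
        Map<Integer, String> employees = directory(100);
        employees.put(1001, "Alice"); // employee ID -> name
        employees.put(1002, "Bob");
        System.out.println(employees.get(1002)); // Bob

        Properties settings = loadSettings("db.host=localhost\ndb.port=5432\n");
        System.out.println(settings.getProperty("db.port")); // 5432
    }
}
```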

A tree is a data structure composed of nodes. Each node contains a data element and may have one or more child nodes, and every node except the root points to a parent node. Trees can represent hierarchical or ordering relationships between data elements; common scenarios include the hierarchy of employees in an organization and the hierarchy of XML elements. If each node has at most two child nodes, the tree is called a binary tree. The binary tree is a very common tree structure because, when it is kept balanced, its structure makes inserting and deleting nodes very efficient. The edges of the tree represent the paths from one node to another.
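A minimal binary search tree illustrates the "at most two children" structure. `IntBst` is a hypothetical class for illustration only. Each insert or lookup walks a single branch, so the cost is proportional to the tree's height: O(log n) when the tree is balanced, but O(n) in the worst case (for example, inserting keys in sorted order). That worst case is why library implementations such as TreeMap use a self-balancing red-black tree instead.

```java
// Minimal (unbalanced) binary search tree of ints; each node has at most two children.
final class IntBst {
    private static final class Node {
        final int key;
        Node left;  // keys smaller than 'key'
        Node right; // keys larger than 'key'

        Node(int key) { this.key = key; }
    }

    private Node root;

    // Walks down one branch per level; returns false if the key already exists.
    boolean insert(int key) {
        if (root == null) {
            root = new Node(key);
            return true;
        }
        Node cur = root;
        while (true) {
            if (key == cur.key) return false; // no duplicates
            if (key < cur.key) {
                if (cur.left == null) { cur.left = new Node(key); return true; }
                cur = cur.left;
            } else {
                if (cur.right == null) { cur.right = new Node(key); return true; }
                cur = cur.right;
            }
        }
    }

    boolean contains(int key) {
        Node cur = root;
        while (cur != null) {
            if (key == cur.key) return true;
            cur = key < cur.key ? cur.left : cur.right;
        }
        return false;
    }
}
```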

Java does not provide a general-purpose tree implementation, but one is easy to write in the following way: create a node class that holds an ArrayList of references to its child nodes.

package bigo;

import java.util.ArrayList;
import java.util.List;

// A general tree node: each node keeps a reference to its parent
// and a list of references to its children.
public class Node {
    private String name;
    private List<Node> children = new ArrayList<Node>();
    private Node parent;

    public Node(String name) {
        this.name = name;
    }

    public Node getParent() {
        return parent;
    }

    public void setParent(Node parent) {
        this.parent = parent;
    }

    public void addChild(Node child) {
        children.add(child);
    }

    public void removeChild(Node child) {
        children.remove(child);
    }

    @Override
    public String toString() {
        return name;
    }
}

Whenever the relationships between data elements can be expressed as the nodes and edges of a network, they can be represented by a graph. A tree is a special kind of graph in which every node has at most one parent. Unlike a tree, the shape of a graph is determined by the actual problem or by its abstraction. For example, the nodes (or vertices) of a graph can represent cities, while the edges represent routes between pairs of cities. To build a graph in Java, you need to decide how the data is stored and accessed, and the data structures described above are used for this as well. The Java API does not provide a graph implementation, but many third-party libraries are available, such as JUNG, JGraphT, and JDSL.
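One common way to store a graph with the structures described above is an adjacency list: a `HashMap` from each vertex to a `List` of its neighbors. `CityGraph`, `addRoute`, and `isReachable` are hypothetical names for this sketch, which treats routes as undirected and uses breadth-first search (with a queue) to test whether one city can be reached from another.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Queue;
import java.util.Set;

// Undirected graph stored as an adjacency list: city -> neighboring cities.
final class CityGraph {
    private final Map<String, List<String>> adjacency = new HashMap<>();

    // Adds an edge in both directions because routes are assumed two-way.
    void addRoute(String a, String b) {
        adjacency.computeIfAbsent(a, k -> new ArrayList<>()).add(b);
        adjacency.computeIfAbsent(b, k -> new ArrayList<>()).add(a);
    }

    // Breadth-first search: is there any route from 'from' to 'to'?
    boolean isReachable(String from, String to) {
        Set<String> visited = new HashSet<>();
        Queue<String> frontier = new ArrayDeque<>();
        frontier.offer(from);
        visited.add(from);
        while (!frontier.isEmpty()) {
            String city = frontier.poll();
            if (city.equals(to)) return true;
            for (String next : adjacency.getOrDefault(city, List.of())) {
                if (visited.add(next)) frontier.offer(next); // enqueue unseen cities once
            }
        }
        return false;
    }
}
```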

Q: What do you know about Big O notation? Can you give some examples for different data structures?

A: Big O notation describes the efficiency of an algorithm, in particular how its worst-case performance scales as the number of data elements grows. It can also describe other resource usage, such as memory consumption; sometimes you might choose a slower algorithm to reduce memory usage. Big O notation indicates how a program will behave with large amounts of data. However, the only practical way to measure a program's performance on large data sets is to run benchmarks with realistically large data, which can capture effects that Big O analysis ignores, such as paging when virtual memory is used heavily. Although benchmarks are more practical than Big O results, they are not available during the design phase, so Big O analysis is the most appropriate tool at that stage.

The performance of search, insert, and delete operations on various data structures can be described with the following complexity classes: constant O(1), logarithmic O(log n), linear O(n), quadratic O(n^2), polynomial O(n^c), exponential O(c^n), and factorial O(n!), where n is the number of elements in the data structure. There is often a trade-off between performance and memory consumption. Here are some examples.

Example 1: Finding an element in a HashMap takes constant time, O(1), on average. This is because the lookup uses a hash function, and the time to compute a hash value does not depend on the number of elements in the HashMap (assuming the hash function spreads keys evenly across buckets).

Example 2: Linear search in an array, a list, or a linked list has linear complexity, O(n), because the search may need to traverse the entire list. That is, if a list is twice as long, the search takes up to twice as long.
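A minimal linear search over an `int[]`, with `LinearSearch.indexOf` as a hypothetical name. In the worst case (target absent or last) the loop examines all n elements, which is where the O(n) bound comes from.

```java
class LinearSearch {
    // O(n): in the worst case every element is examined once.
    static int indexOf(int[] data, int target) {
        for (int i = 0; i < data.length; i++) {
            if (data[i] == target) return i;
        }
        return -1; // not found after scanning all n elements
    }
}
```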

Example 3: A sorting algorithm that compares every element of an array with the other elements has polynomial complexity, specifically quadratic, O(n^2), because it relies on nested for loops. Simple sorting algorithms such as bubble sort are examples of this.
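Bubble sort is used here as one illustrative quadratic algorithm, not as a recommendation; `BubbleSort.sort` is a hypothetical helper that also returns its comparison count so the n(n-1)/2 growth is visible. For 4 elements it makes 6 comparisons, for 100 elements 4,950.

```java
class BubbleSort {
    // Sorts 'a' in place and returns the number of comparisons made.
    // Pass p compares n-1-p neighboring pairs, so the total is
    // (n-1) + (n-2) + ... + 1 = n(n-1)/2, which is O(n^2).
    static long sort(int[] a) {
        long comparisons = 0;
        for (int pass = 0; pass < a.length - 1; pass++) {
            for (int i = 0; i < a.length - 1 - pass; i++) {
                comparisons++;
                if (a[i] > a[i + 1]) { // swap out-of-order neighbors
                    int tmp = a[i];
                    a[i] = a[i + 1];
                    a[i + 1] = tmp;
                }
            }
        }
        return comparisons;
    }
}
```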

Example 4: Binary search over a sorted array or array list has logarithmic complexity, O(log n). Searching for a node in a linked list, by contrast, is usually O(n). As the number of elements grows, the O(n) linked-list search falls further and further behind the O(log n) search over an array or array list. Logarithmic time means that if searching 10 elements takes x units of time, searching 100 elements takes at most about 2x units, and searching 10,000 elements takes at most about 4x units. If you plot this on a graph, you will see that the time grows much more slowly than n (the number of elements).
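An iterative binary search sketch over a sorted `int[]`; `BinarySearch.indexOf` is a hypothetical name, and the JDK already offers `java.util.Arrays.binarySearch` and `java.util.Collections.binarySearch` for production use. Each iteration halves the remaining range, giving the O(log n) bound.

```java
class BinarySearch {
    // Each iteration halves the remaining range, so at most about
    // log2(n) + 1 iterations are needed: O(log n). 'data' must be sorted.
    static int indexOf(int[] data, int target) {
        int lo = 0;
        int hi = data.length - 1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1; // unsigned shift avoids int overflow
            if (data[mid] < target) {
                lo = mid + 1;      // target is in the upper half
            } else if (data[mid] > target) {
                hi = mid - 1;      // target is in the lower half
            } else {
                return mid;
            }
        }
        return -1; // not present
    }
}
```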

Q: What are the differences in performance between HashMap and TreeMap? Which one do you prefer to use?

A: Operations on a balanced tree are O(log n). Java's TreeMap uses a red-black tree to keep its key/value pairs sorted by key. A red-black tree is a balanced binary search tree; keeping the tree balanced makes insertion, deletion, and lookup relatively fast, with O(log n) time complexity. However, TreeMap is not as fast as HashMap, whose operations take O(1) time on average. The advantage of TreeMap is that it keeps its entries ordered by key, which enables other useful operations.
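A few of the ordering-based operations that TreeMap offers and HashMap does not, shown on hypothetical sample data (`TreeMapDemo` and `sampleStock` are illustrative names). `firstKey`, `ceilingKey`, and `headMap` are standard `java.util.TreeMap` methods.

```java
import java.util.TreeMap;

class TreeMapDemo {
    static TreeMap<String, Integer> sampleStock() {
        TreeMap<String, Integer> stock = new TreeMap<>(); // red-black tree ordered by key
        stock.put("pear", 4);
        stock.put("apple", 10);
        stock.put("kiwi", 7);
        return stock;
    }

    public static void main(String[] args) {
        TreeMap<String, Integer> stock = sampleStock();
        System.out.println(stock.keySet());             // [apple, kiwi, pear]
        System.out.println(stock.firstKey());           // apple
        System.out.println(stock.ceilingKey("banana")); // kiwi (smallest key >= "banana")
        System.out.println(stock.headMap("kiwi"));      // {apple=10} (keys < "kiwi")
    }
}
```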

Q: How do you choose which one to use?

A: Whether to use an unordered HashSet and HashMap or an ordered TreeSet and TreeMap depends largely on your actual usage scenario, and partly on the size of the data and the runtime environment. One practical reason to choose an ordered collection: if inserts and updates are frequent and you also need fast, frequent ordered lookups, keeping the elements sorted at all times pays off. If sorting is needed only occasionally, for example in a batch program that produces a report, it can be more efficient to store the data unordered and call Collections.sort(...) when sorting is required than to keep it ordered all the time. This is a judgment call, and no one can give you a definitive answer. Complexity results such as O(n) only hold when n is sufficiently large; when n is small enough, even an O(n) algorithm can be faster than an O(log n) algorithm. In addition, an algorithm may run faster on an AMD processor than on an Intel processor, and if your system uses a swap area, you also need to take disk performance into account. The only reliable way to decide is to test and measure the program's performance and memory usage with realistically sized data, on the hardware you actually plan to use.
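A minimal sketch of the "store unordered, sort on demand" approach, assuming a hypothetical `ReportDemo.buildReport` helper for an occasional batch report. Appends to an `ArrayList` are amortized O(1), and the O(n log n) sort is paid only when the report is actually produced; the input list is copied so the caller's data is left untouched.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class ReportDemo {
    // Inserts elsewhere are cheap appends; order is imposed only when the
    // (infrequent) report is produced, at O(n log n) per sort.
    static List<String> buildReport(List<String> unorderedNames) {
        List<String> copy = new ArrayList<>(unorderedNames); // leave the caller's list as-is
        Collections.sort(copy); // natural (alphabetical) ordering
        return copy;
    }

    public static void main(String[] args) {
        List<String> names = new ArrayList<>();
        names.add("carol"); // stored in arrival order, not sorted
        names.add("alice");
        names.add("bob");
        System.out.println(buildReport(names)); // [alice, bob, carol]
    }
}
```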

Q: How do you choose between an unordered array and an ordered array?

A: The greatest advantage of an ordered array is that, for large n, searching for an element takes O(log n) time (with binary search), much less than the O(n) time required by an unordered array. The disadvantage of an ordered array is that insertion is expensive, usually O(n), because all values larger than the inserted element must be shifted back by one position. Inserting into an unordered array takes constant time when the element is appended at the end, so the insertion speed is independent of the number of elements.
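Insertion into an ordered array versus an unordered array can be sketched as follows. `OrderedArray` and `UnorderedArray` are hypothetical fixed-capacity classes, where `size` tracks how many slots are in use. The ordered version shifts larger elements right before inserting (O(n)) but can then search with `java.util.Arrays.binarySearch` (O(log n)); the unordered version appends in O(1) but must scan to search (O(n)).

```java
import java.util.Arrays;

// Keeps elements sorted: O(n) insert, O(log n) search.
final class OrderedArray {
    private final int[] data;
    private int size; // number of slots in use

    OrderedArray(int capacity) { data = new int[capacity]; }

    void insert(int value) {
        if (size == data.length) throw new IllegalStateException("full");
        int i = size - 1;
        while (i >= 0 && data[i] > value) { // shift larger values one slot right
            data[i + 1] = data[i];
            i--;
        }
        data[i + 1] = value;
        size++;
    }

    boolean contains(int value) {
        return Arrays.binarySearch(data, 0, size, value) >= 0; // only the used range
    }

    int[] toArray() { return Arrays.copyOf(data, size); }
}

// Keeps arrival order: O(1) insert, O(n) search.
final class UnorderedArray {
    private final int[] data;
    private int size; // number of slots in use

    UnorderedArray(int capacity) { data = new int[capacity]; }

    void insert(int value) {
        if (size == data.length) throw new IllegalStateException("full");
        data[size++] = value; // append at the end, no shifting
    }

    boolean contains(int value) {
        for (int i = 0; i < size; i++) {
            if (data[i] == value) return true; // no order to exploit, so scan
        }
        return false;
    }

    int[] toArray() { return Arrays.copyOf(data, size); }
}
```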

Q: What are the best practices for the Java collection framework?

A: Choose the appropriate data structure based on how it will actually be used: whether the size is fixed or needs to grow, whether duplicate elements are allowed, whether order must be preserved, whether traversal is forward-only or bidirectional, whether insertions happen at the end or at arbitrary positions, whether there are more inserts or more reads, whether concurrent access is required, whether modifications are allowed, whether the elements are all of the same type, and so on. In addition, consider factors such as multithreading, atomicity, memory usage, and performance as early as possible.

Do not assume that the number of elements in your collection will always stay small; it may grow over time. It is therefore best to give your collection a suitable initial size.

Program to interfaces rather than to implementations. For example, LinkedList may be the best choice in some cases, but ArrayList may later become more appropriate for performance reasons; if your code depends only on the List interface, switching requires changing only the line that creates the list.
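A small illustration of programming to the `List` interface; `InterfaceDemo.totalLength` is a hypothetical method. Because its parameter type is `List<String>`, it works unchanged whether the caller passes a `LinkedList` or an `ArrayList`.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

class InterfaceDemo {
    // Depends only on the List interface, so the implementation can change
    // (for example LinkedList -> ArrayList) without touching this method.
    static int totalLength(List<String> words) {
        int total = 0;
        for (String w : words) {
            total += w.length();
        }
        return total;
    }

    public static void main(String[] args) {
        List<String> a = new LinkedList<>(List.of("to", "be"));
        List<String> b = new ArrayList<>(List.of("to", "be"));
        System.out.println(totalLength(a) == totalLength(b)); // true
    }
}
```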
