"Do you choose the right data structure every time?" - Jeffery Zhao
ICollection<T> inherits from IEnumerable<T> and, on top of it, adds methods such as Add and Remove for modifying the contents of the collection. Stack<T> and Queue<T> implement IEnumerable<T> directly, without going through ICollection<T>.
Almost all other standard generic collections implement ICollection<T>. Its main descendants are IList<T>, IDictionary<TKey, TValue>, and LinkedList<T>.
Note that Stack<T> and Queue<T> do not implement ICollection<T>. ICollection<T> exposes Add, Remove, and similar methods, whereas a stack or a queue does not allow elements to be added or removed freely.
Stack<T>
When you need a last-in, first-out (LIFO) data structure, .NET provides Stack<T>. The Stack<T> class offers the Push and Pop methods for accessing the stack.
The elements stored in a Stack<T> can be pictured as a vertical pile. When a new element is pushed onto the stack (Push), it is placed on top of all the other elements. When an element is popped (Pop), it is removed from the top.
The default capacity of 10 belongs to the non-generic Stack class; Stack<T> starts out empty and allocates its internal array on the first Push (the exact size is an implementation detail). As with Queue<T>, the initial capacity of a Stack<T> can be specified in the constructor. The capacity grows automatically (it doubles) as needed, and it can be reduced with the TrimExcess method.
The most basic stack operations are adding items to the stack and removing items from it. Push adds an item to the stack, and Pop takes an item off it. Every pushed item goes on top of the stack, and Pop only ever takes the item at the top.
Another basic operation is looking at the item on top of the stack. Pop returns the top item, but it also removes that item from the stack. If you only want to look at the top item without removing it, C# provides an operation called Peek. Other languages and implementations may use a different name for it (such as Top).
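The three operations described above can be tried out with a short sketch like the following. The class name StackBasics and the method Run are made up for illustration; only Stack<T> and its Push, Peek, Pop, and Count members come from .NET.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical demo class; only Stack<T> itself is part of .NET.
public static class StackBasics
{
    // Pushes three items, peeks at the top, then pops everything.
    // Returns the items in the order they were popped.
    public static List<string> Run()
    {
        var stack = new Stack<string>();
        stack.Push("first");
        stack.Push("second");
        stack.Push("third");

        // Peek looks at the top item without removing it.
        string top = stack.Peek();
        Console.WriteLine(top + " is on top, Count = " + stack.Count);

        var popped = new List<string>();
        while (stack.Count > 0)
        {
            popped.Add(stack.Pop());   // Pop removes and returns the top item
        }
        return popped;                 // last in, first out
    }
}
```

Calling StackBasics.Run() returns the items in reverse order of insertion, which is the LIFO behavior the section describes.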
If the number of elements (Count) in a Stack<T> is less than its capacity, Push is O(1). If the capacity has to grow, that Push becomes O(n), because the existing elements must be copied into a larger array (averaged over many pushes, the cost is still O(1)). Pop is always O(1).
A stack is relatively simple to implement yourself; it can be built on top of a List<T>.
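As a rough illustration of that last remark, a stack backed by a List<T> might look like the sketch below. ListStack<T> is a hypothetical name, and this is not how the built-in Stack<T> is written; it only shows that the end of a list can serve as the top of a stack.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical ListStack<T>: a minimal LIFO stack that keeps its items in a List<T>.
public class ListStack<T>
{
    private readonly List<T> _items = new List<T>();

    public int Count => _items.Count;

    // Push appends to the end of the list, which acts as the top of the stack.
    public void Push(T item) => _items.Add(item);

    // Peek returns the top item without removing it.
    public T Peek()
    {
        if (_items.Count == 0)
            throw new InvalidOperationException("The stack is empty.");
        return _items[_items.Count - 1];
    }

    // Pop returns the top item and removes it from the list.
    public T Pop()
    {
        T top = Peek();
        _items.RemoveAt(_items.Count - 1);
        return top;
    }
}
```

Removing from the end of a List<T> does not shift any elements, so Pop stays cheap in this sketch.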
Stack<T> application example: testing for a palindrome
A palindrome is a string that reads the same forward and backward. For example, "dad", "madam", and "sees" are palindromes, and "hello" is not. One way to check whether a string is a palindrome is to use a stack. The general algorithm reads the string character by character and pushes each character onto the stack as it is read. This has the effect of storing the string in reverse.
The next step is to pop the characters off the stack one at a time and compare each with the corresponding character of the original string, starting from the beginning. If two characters ever differ, the string is not a palindrome and the check can stop. If every comparison matches, the string is a palindrome.
The implementation is simple, and the code is left as an exercise.
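For readers who want something to compare their own attempt against, the algorithm above might be sketched as follows. PalindromeChecker and IsPalindrome are hypothetical names, and folding characters to lower case is an added assumption so that capitalized words such as "Madam" also pass.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper; one possible solution to the exercise, not the only one.
public static class PalindromeChecker
{
    public static bool IsPalindrome(string text)
    {
        // First pass: push every character, so the stack holds the string in reverse.
        var stack = new Stack<char>();
        foreach (char c in text)
        {
            stack.Push(char.ToLowerInvariant(c));   // assumption: ignore letter case
        }

        // Second pass: pop characters and compare with the string read from the start.
        foreach (char c in text)
        {
            if (stack.Pop() != char.ToLowerInvariant(c))
            {
                return false;   // first mismatch ends the check
            }
        }
        return true;
    }
}
```

For example, PalindromeChecker.IsPalindrome("madam") is true and PalindromeChecker.IsPalindrome("hello") is false.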
Queue<T>
When you need a first-in, first-out (FIFO) data structure, .NET provides Queue<T>. The Queue<T> class offers the Enqueue and Dequeue methods for accessing the queue. Another main queue operation is viewing the item at the front. As with the corresponding operation in the Stack class, the Peek method is used for this. It simply returns the item and does not remove it from the queue.
Internally, Queue<T> keeps a circular array of T objects, and the head and tail variables index the front and the back of the queue within that array.
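The default initial capacity of 32 belongs to the non-generic Queue class; Queue<T> starts out empty and allocates its array when the first element is enqueued (the exact size is an implementation detail). In both cases you can specify the initial capacity through the constructor.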
The Enqueue method checks whether the Queue<T> has enough capacity to hold the new element. If so, it stores the element and increments the tail index. The tail is advanced with a modulo operation so that it wraps around instead of running past the end of the array. If the capacity is insufficient, the queue expands the array according to its growth factor.
In the non-generic Queue the growth factor defaults to 2.0, so the length of the internal array doubles, and a different factor can be passed to the constructor. Queue<T> has no growth-factor parameter; it also grows by roughly doubling. The capacity of a Queue<T> can be reduced with the TrimExcess method.
The Dequeue method reads the element at the head index, sets that slot to null (the default value), increments head, and returns the element.
Queues are implemented in much the same way as stacks.
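The circular-array mechanics described above can be sketched roughly as follows. RingQueue<T> is a hypothetical, simplified class and not the actual library source; the default capacity of 4 and the plain doubling in Grow are assumptions made for the example.

```csharp
using System;

// Hypothetical RingQueue<T>: a simplified circular-array queue for illustration only.
public class RingQueue<T>
{
    private T[] _array;
    private int _head;   // index of the next element to dequeue
    private int _tail;   // index where the next element will be stored
    private int _size;   // number of elements currently in the queue

    public RingQueue(int capacity = 4)   // 4 is an arbitrary default for this sketch
    {
        if (capacity < 1) throw new ArgumentOutOfRangeException(nameof(capacity));
        _array = new T[capacity];
    }

    public int Count => _size;

    public void Enqueue(T item)
    {
        if (_size == _array.Length) Grow();
        _array[_tail] = item;
        _tail = (_tail + 1) % _array.Length;   // modulo makes the tail wrap around
        _size++;
    }

    public T Dequeue()
    {
        if (_size == 0) throw new InvalidOperationException("The queue is empty.");
        T item = _array[_head];
        _array[_head] = default(T);            // clear the slot that was just read
        _head = (_head + 1) % _array.Length;
        _size--;
        return item;
    }

    // Doubles the array and copies the elements so that the head is at index 0 again.
    private void Grow()
    {
        var bigger = new T[_array.Length * 2];
        for (int i = 0; i < _size; i++)
        {
            bigger[i] = _array[(_head + i) % _array.Length];
        }
        _array = bigger;
        _head = 0;
        _tail = _size;
    }
}
```

Because head and tail chase each other around the array, neither Enqueue nor Dequeue ever shifts elements; only Grow copies them.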
To implement a priority queue, you only need to attach a priority to each queued item and require a priority to be specified on enqueue. On dequeue, the queue is scanned by priority, and the item with the highest priority that was enqueued earliest is removed.
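A direct, unoptimized reading of that description might look like the sketch below. SimplePriorityQueue<T> is a hypothetical name, and treating a larger integer as a higher priority is an assumption of this example.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical SimplePriorityQueue<T>: dequeues by scanning for the highest priority.
public class SimplePriorityQueue<T>
{
    // Each entry stores the priority as Key and the item as Value, in arrival order.
    private readonly List<KeyValuePair<int, T>> _items = new List<KeyValuePair<int, T>>();

    public int Count => _items.Count;

    // A priority must be given on enqueue; larger numbers mean higher priority (assumption).
    public void Enqueue(T item, int priority)
    {
        _items.Add(new KeyValuePair<int, T>(priority, item));
    }

    public T Dequeue()
    {
        if (_items.Count == 0) throw new InvalidOperationException("The queue is empty.");

        int best = 0;
        for (int i = 1; i < _items.Count; i++)
        {
            // Strictly greater, so the earliest item wins among equal priorities.
            if (_items[i].Key > _items[best].Key) best = i;
        }

        T item = _items[best].Value;
        _items.RemoveAt(best);
        return item;
    }
}
```

Dequeue is O(n) here because of the scan. Heap-based designs avoid that cost, and recent .NET versions ship a PriorityQueue<TElement, TPriority> class built that way.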
IList<T>
IList<T> is all about positioning: it provides an indexer, Insert and RemoveAt (like Add and Remove, but taking a position), and IndexOf.
Note that C# has no type named List, only IList, IList<T>, and List<T>. The third implements the second. The first is the non-generic version of the second. ArrayList implements the first.
The most common implementation of IList<T> is List<T>. It is not a linked list, however: internally it is backed by an array. The data structure that is implemented as a linked list is LinkedList<T>.
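The positional members of IList<T> mentioned above can be exercised with a small sketch like this one. ListPositioning and Run are hypothetical names; the variable is typed as IList<string> on purpose, to show that only interface members are used.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical demo class for the positional members of IList<T>.
public static class ListPositioning
{
    public static IList<string> Run()
    {
        IList<string> names = new List<string> { "a", "c" };

        names.Insert(1, "b");                // insert at position 1: a, b, c
        int position = names.IndexOf("c");   // search for the position of "c": 2
        names[position] = "C";               // the indexer reads and writes by position: a, b, C
        names.RemoveAt(0);                   // remove by position: b, C

        return names;
    }
}
```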
List<T>
In most cases this is the default choice of list. Internally, List<T> is backed by an array. It differs from an array in that its length is not fixed, but both are type-safe. So if you do not know the size of the collection in advance, choose List<T>.
Insert: O(n)
Delete: O(n)
Access a specific member by indexer: O(1)
Find: O(n)
Array
The Array type is rarely named directly; usually we declare an array with an element type followed by []. Although it may look odd, arrays actually implement IList<T>. The advantage of an array over a List<T> is that it wastes no space (provided you know the length beforehand).
There is no difference between the two declarations below. As far as the compiler is concerned, a and b are both of type System.Int32[].
var a = new int[10];       // the length used here is arbitrary
int[] b = new int[10];
Console.WriteLine(a.GetType());
Console.ReadKey();
When an array is created its length must be given, so initializing an array is very fast. The time complexity of array operations is exactly the same as for List<T>.
Insert: O(n)
Delete: O(n)
Access by indexer: O(1)
Find: O(n)
LinkedList<T>
This data structure is implemented internally as a doubly linked list. Note that the class implements ICollection<T> but not IList<T>, so you cannot access a linked list through an indexer. It is typically used when there are very many insertions and deletions at the head or tail but very few access operations (that is, when an indexer is not needed). If insertions and deletions mostly happen in the middle, the performance of the linked list is similar to that of an array.
In a linked list, each element points to the next element, and together they form a chain.
When creating a linked list, we only need to hold a reference to the head node; every other node can then be reached by following the next references one by one.
A linked list has the same O(n) lookup time as an array. Likewise, the asymptotic time to remove a node from a linked list is linear, O(n), because we still have to traverse from the head to find the node before we can delete it. The deletion itself is simple: the next pointer of the node to the left of the deleted node is made to point to the node on its right.
The asymptotic time to insert a new node depends on whether the list is kept sorted. If the list does not need to stay in order, insertion takes constant time, O(1), because the new node can simply be added at the head. If the sorted order must be maintained, you first have to find where the new node belongs by traversing from the head node by node, and the operation becomes O(n).
Doubly linked list, LinkedList<T>:
Insert: O(1) (at the head or tail), O(n) (elsewhere)
Delete: O(1) (at the head or tail), O(n) (elsewhere)
Access by indexer: no indexer (because IList<T> is not implemented)
Find: O(n)
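The costs listed above map onto LinkedList<T> members roughly as in the following sketch. LinkedListDemo and Run are hypothetical names; AddFirst, AddLast, Find, AddAfter, RemoveFirst, and RemoveLast are members of the .NET class.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical demo class; the list operations themselves are standard LinkedList<T> members.
public static class LinkedListDemo
{
    public static List<int> Run()
    {
        var list = new LinkedList<int>();
        list.AddLast(2);               // O(1) at the tail: 2
        list.AddLast(4);               // 2, 4
        list.AddFirst(1);              // O(1) at the head: 1, 2, 4

        // Find walks the list node by node, so it is O(n).
        var node = list.Find(2);
        if (node != null)
        {
            list.AddAfter(node, 3);    // O(1) once the node is known: 1, 2, 3, 4
        }

        list.RemoveFirst();            // O(1): 2, 3, 4
        list.RemoveLast();             // O(1): 2, 3

        return new List<int>(list);    // copy out, since LinkedList<T> has no indexer
    }
}
```

Inserting in the middle is only expensive because the position has to be found first; the relinking itself is constant time.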
Interview algorithm questions about linked lists come in many varieties. Implementing a singly or doubly linked list together with its main operations is excellent programming practice.
IDictionary<TKey, TValue> and Dictionary<TKey, TValue>
The Hashtable class is a loosely typed data structure: developers can use any type as a key or as an item. When .NET introduced generics, the type-safe Dictionary<TKey, TValue> class appeared. Dictionary<TKey, TValue> uses strong typing to constrain keys and items; when you create an instance, you must specify the types of both.
A dictionary stores key-value pairs and uses the key to locate the corresponding value directly. Lookup, insertion, and deletion are O(1). The implementation of the dictionary was covered earlier in this series; it works on a different principle from Hashtable, but its biggest advantage is that it is generic.
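A typical use of keyed lookup is counting occurrences, sketched below. DictionaryDemo and CountWord are hypothetical names chosen for the example; TryGetValue and the indexer are standard Dictionary<TKey, TValue> members.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical demo class: counts how often a word occurs using a dictionary.
public static class DictionaryDemo
{
    public static int CountWord(string[] words, string target)
    {
        // Both the key type and the value type must be given when the dictionary is created.
        var counts = new Dictionary<string, int>();

        foreach (string word in words)
        {
            // TryGetValue leaves current at 0 when the key is not present yet.
            counts.TryGetValue(word, out int current);
            counts[word] = current + 1;    // the indexer inserts or overwrites
        }

        return counts.TryGetValue(target, out int result) ? result : 0;
    }
}
```

Each lookup and each update goes straight to the entry for the key, which is where the O(1) figures come from.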
SortedList<TKey, TValue> and SortedDictionary<TKey, TValue>
SortedList<TKey, TValue> is essentially an array that is kept sorted at all times.
SortedDictionary<TKey, TValue> is a red-black tree that is kept ordered at all times. It differs from SortedList<TKey, TValue> in memory use and in the speed of insertions and deletions:
- SortedList<TKey, TValue> uses less memory than SortedDictionary<TKey, TValue>. Because SortedDictionary is a tree, every new member allocates a tree node on the heap.
- If many unsorted elements are inserted into each of the two classes, SortedDictionary<TKey, TValue> is faster, because each of its insertions is O(log n). SortedList<TKey, TValue> is only quick when the insertion happens at the end of its array. With unsorted input we cannot expect that; insertions generally land somewhere in the middle, the following elements have to be shifted, and the cost is O(n).
- If many already sorted elements are inserted into each of the two classes, every insertion into SortedList<TKey, TValue> lands at the end of the array and nothing has to be shifted (the cost is O(log n) for locating the position), so it is generally faster than SortedDictionary<TKey, TValue>.
Both data structures expose their keys and their values as separate collections. SortedList, however, exposes keys and values through collections that implement IList<T>, so entries can be accessed efficiently by their position in the sort order.
SortedList<string, string> books = new SortedList<string, string>();
books.Add("Aladdin", "[email protected]");
books["Aladdin"] = "haha_new";
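Access by sorted position, as described above, might be sketched like this. SortedListDemo and SmallestEntry are hypothetical names, the sample titles are made up, and passing StringComparer.Ordinal is an assumption made so that the ordering does not depend on the current culture.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical demo class: reads a SortedList entry by its position in key order.
public static class SortedListDemo
{
    public static string SmallestEntry()
    {
        var books = new SortedList<string, string>(StringComparer.Ordinal);
        books.Add("Zorro", "z");
        books.Add("Aladdin", "a");
        books.Add("Moby Dick", "m");

        // Keys and Values are IList<T> views, so index 0 is the entry with the smallest key.
        string firstKey = books.Keys[0];
        string firstValue = books.Values[0];

        return firstKey + "=" + firstValue;
    }
}
```

SortedDictionary<TKey, TValue> also exposes Keys and Values, but they are not indexable lists, so this kind of positional access is specific to SortedList.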
ISet<T>
This interface models a set in the mathematical sense. It provides the usual set operations (subset tests, intersection, union, difference, and so on). The members of a set are unique and never appear more than once.
HashSet<T> and SortedSet<T>
The former is a dictionary without values, and the latter is a SortedDictionary<TKey, TValue> without values.
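A short sketch of uniqueness and one set operation on these two classes follows. SetDemo and SortedIntersection are hypothetical names; Add and IntersectWith are members of the ISet<T> interface.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical demo class for HashSet<T> and SortedSet<T>.
public static class SetDemo
{
    public static List<int> SortedIntersection()
    {
        var evens = new HashSet<int> { 2, 4, 6, 8 };

        // Add returns false when the value is already present; the set is unchanged.
        bool added = evens.Add(4);
        Console.WriteLine("4 added again: " + added);

        // A SortedSet keeps its members in order regardless of insertion order.
        var small = new SortedSet<int> { 5, 1, 4, 2, 3 };

        // IntersectWith keeps only the members that are also in the other set.
        small.IntersectWith(evens);

        return new List<int>(small);
    }
}
```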
Derived types of IEnumerable<T>: summary
| Type | Access mode | Inherits from | Characteristics |
| --- | --- | --- | --- |
| IEnumerable<T> | By ElementAt | None | All generic collections implement this interface; has a non-generic version; provides iteration (via GetEnumerator); the basis of LINQ, many LINQ operators are extension methods on it |
| ICollection<T> | By ElementAt | IEnumerable<T> | Most generic collections implement this interface; has a non-generic version; provides the Count property; provides Add, Remove, and similar methods; can be converted to IQueryable<T> |
| LinkedList<T> | No indexer; use the Find method | ICollection<T> | Implemented internally as a linked list; does not implement IList<T>; no indexer |
| Dictionary<TKey, TValue> | Key-value pairs | IDictionary<TKey, TValue> | Generic version of Hashtable |
| IList<T> | Indexer | ICollection<T> | Implemented by some of the generic collections; provides an indexer |
| List<T> | Indexer | IList<T> | Implements IList<T> (and other interfaces); generic version of ArrayList; the most commonly used generic collection; if you do not need its richer features, consider IEnumerable<T> as the return type instead |
| IQueryable<T> | By ElementAt | IEnumerable<T> | Fetches already filtered data from the remote source; by contrast, IEnumerable<T> retrieves all the data first and filters it locally; the difference can be observed with SQL Profiler |
Note: there are several other important derived types, such as the concurrent collections; these are covered together with multithreaded synchronization.
How to choose a data structure
Choosing the right data structure for the situation improves the performance of your program. In an interview, being fluent in data structures leads the interviewer to see you as someone with a solid foundation who keeps program performance in mind and pays attention to detail, because most people do not take this area very seriously. Of course, beyond the data structures that C# implements there are all kinds of trees and graphs, but outside interviews for algorithm engineers that material generally does not come up.
Linear lists and linked lists (the most commonly used):
- Array (T[]): when the number of elements is fixed and an indexer is needed.
- Linked list (LinkedList<T>): when the number of elements is not fixed and there are many insertions and deletions at the head or tail of the list. Otherwise, use List<T>.
- Resizable array list (List<T>): when the number of elements is not fixed and an indexer is needed.
Stacks and queues (consider these only when modeling a stack or a queue):
- Stack (Stack<T>): when you need LIFO (last in, first out) behavior.
- Queue (Queue<T>): when you need FIFO (first in, first out) behavior.
Hashing (when large-scale lookup is required):
- Hash table (Dictionary<TKey, TValue>): when you need key-value pairs for fast insertion and lookup, and the elements have no particular order. With the generic dictionary available, we almost never need the non-generic Hashtable.
- Tree-based dictionary (SortedDictionary<TKey, TValue>): when you need key-value pairs for fast insertion and lookup, and the elements must always be sorted by key.
Sets (holding a set of unique values, or modeling set operations):
- Hash-based set (HashSet<T>): when you need to store a set of unique values and the elements have no particular order.
- Tree-based set (SortedSet<T>): when you need to store a set of unique values and the elements must always be sorted.
Time complexity of common data structure operations
These time complexities are not hard to understand and can easily be derived rather than memorized.
Reference: http://www.cnblogs.com/gaochundong/p/data_structures_and_asymptotic_analysis.html
http://blog.csdn.net/suifcd/article/details/42869341
| Data structure | Add | Find | Delete | GetByIndex |
| --- | --- | --- | --- | --- |
| Array (T[]) | O(n) | O(n) (compare one by one) | O(n) | O(1) |
| Linked list (LinkedList<T>) | O(1) at the head or tail, O(n) elsewhere | O(n) (node-by-node lookup) | O(1) at the head or tail, O(n) elsewhere | No indexer |
| List<T> (same as Array) | O(n) | O(n) | O(n) | O(1) |
| Stack (Stack<T>) | O(1) | Only the top of the stack is accessible | O(1), can only remove from the top of the stack | No indexer |
| Queue (Queue<T>) | O(1) | Only the front of the queue is accessible | O(1), can only remove from the front of the queue | No indexer |
| Dictionary<TKey, TValue> | O(1) (in general; it can take a little longer when there are hash collisions) | O(1) | O(1) | No indexer |
| Tree-based dictionary (SortedDictionary<TKey, TValue>) | O(log n) (insertion is slower because the order must be maintained) | O(log n) | O(log n) (deletion is slower because the order must be maintained) | No indexer |
| Hash-based set (HashSet<T>); a HashSet is a dictionary without values, so its complexity is the same as the dictionary's | O(1) | O(1) | O(1) | No indexer |
| Tree-based set (SortedSet<T>); a SortedSet is a SortedDictionary without values, so its complexity is the same | O(log n) | O(log n) | O(log n) | No indexer |
IEnumerable: summary
- IEnumerable and its generic version are the basis of all collections. They give a collection the ability to be iterated. Iteration takes elements one at a time, starting from the head of the collection, until all of them have been taken. Iteration only moves forward, never backward. IEnumerable is an implementation of the iterator pattern.
- The object that hands out the elements one at a time during iteration is usually called the iterator (in .NET, the enumerator).
- To implement the IEnumerable interface, a class must implement its only method, GetEnumerator.
- The GetEnumerator method returns an object of type IEnumerator. IEnumerator is another interface, so we also write a class that implements IEnumerator (including its two methods), create a new instance of that class with an array passed in as the source of the iteration, and return that instance from GetEnumerator.
- The IEnumerator interface has a Current property, and we need to implement its get accessor to return the current element.
- We add an int field to our IEnumerator class to record the current position. Its initial value is -1. The Reset method of IEnumerator sets this value back to -1. Reset is usually left unimplemented, which prevents the same enumerator from being iterated more than once.
- The MoveNext method of the IEnumerator interface advances the position by one and returns whether there is an element at the new position.
- The implementation of GetEnumerator can be simplified with yield. Under the hood, the compiler turns a yield method into a state machine, and each call returns a completely new object.
- Using foreach in C# implicitly calls the MoveNext method. You can follow the whole process of a foreach loop by looking at the IL.
- IEnumerable<T> is the foundation of all of LINQ, which is built on extension methods of IEnumerable<T>. Most C# data structures implement IEnumerable<T>.
- The derived types of the non-generic IEnumerable are generally not considered for use, because they are not generic.
- Dictionary, HashSet, and Hashtable are implemented quite differently.
- HashSet is a dictionary that does not contain values. Because a set must guarantee the uniqueness of its elements, a dictionary without values is a natural fit. Hashing is always a powerful tool for problems that involve checking an array for duplicates: https://www.zhihu.com/question/31201024
- One of the most important derived types of IEnumerable<T> is the IList<T> interface. It in turn has two main implementations, arrays and List<T>. Internally, List<T> is backed by an array rather than a linked list. LinkedList<T> is the C# linked list implementation, and it does not implement the IList<T> interface.
- Consider an array only when the number of elements in the collection is known and does not change.
- The advantage of a linked list is that inserting or deleting does not require shifting the rest of the table backward or forward. A doubly linked list makes insertion and deletion at the tail as fast as at the head.
- Consider using LinkedList<T> instead of List<T> when the number of elements is unknown and insertions or deletions are frequent.
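The enumerator mechanics summarized in the list above can be sketched as follows. ArrayEnumerator, Numbers, and WithYield are hypothetical names. The sketch uses the generic IEnumerator<int>, which also requires a Dispose method in addition to the members discussed above.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Hypothetical ArrayEnumerator: walks an int array by hand.
public class ArrayEnumerator : IEnumerator<int>
{
    private readonly int[] _source;
    private int _position = -1;              // starts before the first element

    public ArrayEnumerator(int[] source) { _source = source; }

    // The get accessor of Current returns the element at the current position.
    public int Current => _source[_position];
    object IEnumerator.Current => Current;

    // MoveNext advances by one and reports whether an element exists there.
    public bool MoveNext()
    {
        _position++;
        return _position < _source.Length;
    }

    public void Reset() { _position = -1; }
    public void Dispose() { }                // nothing to release in this sketch
}

// Hypothetical Numbers collection: hands out a new enumerator on every call.
public class Numbers : IEnumerable<int>
{
    private readonly int[] _data;

    public Numbers(params int[] data) { _data = data; }

    public IEnumerator<int> GetEnumerator() => new ArrayEnumerator(_data);
    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();

    // The same iteration written with yield; the compiler generates the state machine.
    public IEnumerable<int> WithYield()
    {
        foreach (int n in _data)
        {
            yield return n;
        }
    }
}
```

A loop such as foreach (int n in new Numbers(1, 2, 3)) calls GetEnumerator once and then alternates MoveNext and Current until MoveNext returns false.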
.NET interview question series [one] - derived classes of IEnumerable<T>