A collection is a group of independent data items that share common characteristics. With a collection, we can use the same calling code to process all of its elements instead of handling each item separately. .NET offers many collections: arrays, lists, queues, stacks, hash tables, and dictionaries (the System.Array class and the System.Collections namespace); DataSet and DataTable (System.Data); the generic versions (System.Collections.Generic and System.Collections.ObjectModel); and the efficient thread-safe collections introduced in .NET 4.0 (System.Collections.Concurrent).
Faced with so many collections, do you know the strengths of each one and which to use in a particular scenario? This article discusses the question in general terms, without digging into the underlying in-memory data structures, in the hope that it will be of some use.
Collection Interfaces
Before discussing the individual collections, let's look at what they have in common and at the inheritance hierarchy of the collection system.
The ICollection interface is the base interface of the classes in the System.Collections namespace, and ICollection<T> is the corresponding base interface for the generic collections. Collection classes implement them directly or indirectly.
ICollection extends IEnumerable, which provides enumeration. Note that ICollection also defines two members for synchronizing access across threads:
IsSynchronized: gets a value indicating whether access to the ICollection is synchronized (thread-safe).
SyncRoot: gets an object that can be used to synchronize access to the ICollection.
For example, we can access a collection in a thread-safe way as shown below. Some collections also provide a Synchronized method that returns a thread-safe wrapper around the collection.
ICollection myCollection = someCollection;
lock (myCollection.SyncRoot)
{
    // Insert your code here.
}
However, collections are not thread-safe by default. For scalable and efficient multi-threaded access to a collection, use a class in the System.Collections.Concurrent namespace.
The generic interfaces build on the non-generic ones: for example, ICollection<T> inherits both IEnumerable<T> and IEnumerable. Unlike ICollection, however, ICollection<T> does not define IsSynchronized or SyncRoot. To synchronize access to a generic collection, we must do the locking ourselves or use a class in the System.Collections.Concurrent namespace.
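As a minimal sketch of doing the locking ourselves, the following guards a List<T> with a private lock object. The class name SyncDemo and the helper FillWithLock are hypothetical names chosen for this illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class SyncDemo
{
    // Hypothetical helper: adds 'count' items to a shared List<int> from parallel workers.
    public static int FillWithLock(int count)
    {
        var items = new List<int>();
        var gate = new object();   // private lock object used in place of SyncRoot

        Parallel.For(0, count, i =>
        {
            lock (gate)            // only one thread mutates the list at a time
            {
                items.Add(i);
            }
        });
        return items.Count;
    }

    public static void Main()
    {
        Console.WriteLine(FillWithLock(1000));
    }
}
```

Without the lock, concurrent calls to Add on the same List<T> are not safe. A collection from System.Collections.Concurrent removes the need for the explicit lock.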
In addition, IList and IDictionary both inherit from ICollection. Each element of an IList implementer (such as Array, ArrayList, or List<T>) or of a plain ICollection implementer (such as Queue, ConcurrentQueue<T>, Stack, ConcurrentStack<T>, or LinkedList<T>) is a single value, whereas each element of an IDictionary implementer (such as the Hashtable and SortedList classes and the Dictionary<TKey, TValue> and SortedList<TKey, TValue> generic classes) is a key-value pair.
Next, we will discuss and compare some common collections.
Array
Array is not part of System.Collections, but it implements the IList interface. .NET supports multi-dimensional arrays, jagged arrays, and even one-dimensional arrays whose lower bound is not 0. By default, we recommend a one-dimensional array with a lower bound of 0; this common form is optimized and gives the best performance.
Unlike the collections in System.Collections, an Array has a fixed capacity. To increase the capacity, you must create a new Array with the required capacity, copy the elements from the old Array into it, and discard the old Array. When a collection under System.Collections reaches its current capacity, it grows automatically: memory is reallocated and the elements are copied from the old storage to the new. This reduces the code needed to use the collection, but performance can still suffer. To avoid the cost of repeated reallocations, set the initial capacity to the estimated size of the collection.
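The two approaches can be sketched as follows: growing an array by hand, and creating a List<T> with its capacity set up front. CapacityDemo and BuildPresized are hypothetical names used only for this example.

```csharp
using System;
using System.Collections.Generic;

public static class CapacityDemo
{
    // Hypothetical helper: builds a list of n integers with the capacity reserved once.
    public static List<int> BuildPresized(int n)
    {
        var list = new List<int>(n);   // initial capacity = estimated size
        for (int i = 0; i < n; i++)
        {
            list.Add(i);               // no reallocation while Count <= Capacity
        }
        return list;
    }

    public static void Main()
    {
        // Growing a fixed-size array by hand: allocate a new array, then copy.
        int[] small = { 1, 2, 3 };
        int[] bigger = new int[6];
        Array.Copy(small, bigger, small.Length);

        List<int> list = BuildPresized(1000);
        Console.WriteLine(list.Count + " items, capacity " + list.Capacity);
    }
}
```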
Collection classes under System.Collections
Collections of this kind support sorting, and most of them can be indexed. They handle memory management automatically and grow their capacity as needed.
ArrayList and List<T>: List<T> is the generic version of ArrayList. Like Array, both are accessed by index and each element stores a single value, but they offer richer functionality than Array and are easier to use. In terms of performance, the generic version is generally preferable because it avoids boxing and unboxing; the advantage disappears only when the element type is a reference type such as object, where no boxing occurs anyway. When the capacity does not need to be reallocated, List<T> performs very much like an array of the same type. In addition, ArrayList can easily produce a synchronized version (through ArrayList.Synchronized), whereas synchronization for Array and List<T> must be implemented by yourself.
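A small sketch of the difference, with hypothetical names (ListDemo, SumGeneric, SumNonGeneric): the ArrayList stores each int as a boxed object that must be cast back, while List<int> stores the values directly.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

public static class ListDemo
{
    public static int SumGeneric(List<int> values)
    {
        int sum = 0;
        foreach (int v in values) sum += v;          // elements are already int
        return sum;
    }

    public static int SumNonGeneric(ArrayList values)
    {
        int sum = 0;
        foreach (object o in values) sum += (int)o;  // each element is unboxed
        return sum;
    }

    public static void Main()
    {
        var typed = new List<int> { 1, 2, 3 };
        var untyped = new ArrayList { 1, 2, 3 };     // each int is boxed on Add

        // ArrayList.Synchronized returns a thread-safe wrapper around the list.
        ArrayList safe = ArrayList.Synchronized(untyped);

        Console.WriteLine(SumGeneric(typed));
        Console.WriteLine(SumNonGeneric(safe));
        Console.WriteLine(safe.IsSynchronized);
    }
}
```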
Hashtable and Dictionary collection types: each element of these collections is a key-value pair. Dictionary<TKey, TValue> is the generic version of Hashtable. A Hashtable object consists of buckets that hold the collection's elements. Each bucket is associated with a hash code that a hash function generates from the element's key, and a bucket can contain multiple elements. Because a lookup only has to search one bucket, these collections are faster than most others at searching for and retrieving data. As with the list types, Dictionary<TKey, TValue> generally performs better than Hashtable. For synchronized multi-threaded access, we recommend the ConcurrentDictionary<TKey, TValue> class.
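As an illustration of key-based lookup, here is a minimal word-count sketch using Dictionary<TKey, TValue> and, for the multi-threaded case, ConcurrentDictionary<TKey, TValue>. The class WordCount and its methods are hypothetical names for this example.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;

public static class WordCount
{
    public static Dictionary<string, int> Count(IEnumerable<string> words)
    {
        var counts = new Dictionary<string, int>();
        foreach (string w in words)
        {
            int current;
            counts.TryGetValue(w, out current);  // current is 0 when the key is missing
            counts[w] = current + 1;
        }
        return counts;
    }

    public static ConcurrentDictionary<string, int> CountConcurrent(IEnumerable<string> words)
    {
        var counts = new ConcurrentDictionary<string, int>();
        foreach (string w in words)
        {
            // Adds the key with value 1, or applies the update function to the existing value.
            counts.AddOrUpdate(w, 1, (key, old) => old + 1);
        }
        return counts;
    }

    public static void Main()
    {
        var words = new[] { "a", "b", "a" };
        Console.WriteLine(Count(words)["a"]);
        Console.WriteLine(CountConcurrent(words)["b"]);
    }
}
```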
Sorted collection types: the System.Collections.SortedList class, the System.Collections.Generic.SortedList<TKey, TValue> generic class, and the System.Collections.Generic.SortedDictionary<TKey, TValue> generic class all implement the IDictionary interface. The two generic classes also implement System.Collections.Generic.IDictionary<TKey, TValue>. As in a Hashtable, each element is a key-value pair, but these collections keep their elements sorted by key and do not offer the O(1) insertion and lookup of a hash table. Enumerating the non-generic class yields DictionaryEntry objects, while the two generic types yield KeyValuePair<TKey, TValue> objects. Their most important trait is that elements are ordered according to an implementation of System.Collections.IComparer (non-generic) or System.Collections.Generic.IComparer<T> (generic). SortedList lets us access elements by index as well as by key, whereas SortedDictionary supports access by key only; SortedList also uses less memory.
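The access difference can be sketched like this. SortedDemo and Build are hypothetical names; the keys are sorted with the default string comparer.

```csharp
using System;
using System.Collections.Generic;

public static class SortedDemo
{
    public static SortedList<string, int> Build()
    {
        var list = new SortedList<string, int>();
        list.Add("pear", 3);    // inserted out of order on purpose
        list.Add("apple", 1);
        list.Add("fig", 2);
        return list;
    }

    public static void Main()
    {
        SortedList<string, int> list = Build();

        // SortedList: access by position through Keys/Values, or by key.
        Console.WriteLine(list.Keys[0]);     // first key in sort order
        Console.WriteLine(list.Values[0]);   // value stored with that key
        Console.WriteLine(list["fig"]);

        // SortedDictionary: access by key only; enumeration follows key order.
        var dict = new SortedDictionary<string, int>(list);
        foreach (KeyValuePair<string, int> pair in dict)
        {
            Console.WriteLine(pair.Key + " = " + pair.Value);
        }
    }
}
```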
Queues and stacks: these need little discussion. They suit data that is stored temporarily and discarded after being read once. The difference between them is the order of access: a queue is first-in, first-out, and a stack is last-in, first-out. Each has non-generic, generic, and thread-safe versions: the System.Collections.Queue class, the System.Collections.Generic.Queue<T> class, and System.Collections.Concurrent.ConcurrentQueue<T>; and the System.Collections.Stack class, System.Collections.Generic.Stack<T>, and System.Collections.Concurrent.ConcurrentStack<T>.
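A short sketch of the two access orders, using hypothetical helpers DrainQueue and DrainStack that empty each collection and record the order in which items come out:

```csharp
using System;
using System.Collections.Generic;

public static class OrderDemo
{
    public static List<int> DrainQueue(IEnumerable<int> items)
    {
        var queue = new Queue<int>(items);
        var result = new List<int>();
        while (queue.Count > 0) result.Add(queue.Dequeue());  // oldest item first
        return result;
    }

    public static List<int> DrainStack(IEnumerable<int> items)
    {
        var stack = new Stack<int>(items);   // items are pushed in enumeration order
        var result = new List<int>();
        while (stack.Count > 0) result.Add(stack.Pop());      // newest item first
        return result;
    }

    public static void Main()
    {
        int[] input = { 1, 2, 3 };
        Console.WriteLine(string.Join(", ", DrainQueue(input)));  // 1, 2, 3
        Console.WriteLine(string.Join(", ", DrainStack(input)));  // 3, 2, 1
    }
}
```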
Sets: the two types in this group, HashSet<T> and SortedSet<T>, both implement the ISet<T> interface. These are the closest to sets in the mathematical sense and are used for mathematical set operations such as union and intersection. HashSet<T> is unordered and cannot contain duplicate elements; it can be thought of as a Dictionary<TKey, TValue> without values. It provides high-performance set operations based on hash keys. SortedSet<T> provides the same kind of operations on a sorted set. Note that LINQ also provides set operations as extension methods on any collection, but they always return a new IEnumerable<T>, whereas the set operations of a set type modify the current set and form a larger, more robust group of operations.
That is not everything .NET has to offer: there are also bit collections and specialized collections.
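The contrast between the two styles can be sketched as follows. SetDemo and its two methods are hypothetical names for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SetDemo
{
    // LINQ style: returns a new sequence and leaves 'set' unchanged.
    public static List<int> UnionCopy(HashSet<int> set, IEnumerable<int> other)
    {
        return set.Union(other).ToList();   // resolves to Enumerable.Union
    }

    // Set style: modifies 'set' in place.
    public static void UnionInPlace(HashSet<int> set, IEnumerable<int> other)
    {
        set.UnionWith(other);
    }

    public static void Main()
    {
        var set = new HashSet<int> { 1, 2, 3 };
        int[] other = { 3, 4 };

        List<int> copy = UnionCopy(set, other);
        Console.WriteLine(copy.Count + " in copy, " + set.Count + " in set");

        UnionInPlace(set, other);
        Console.WriteLine(set.Count + " in set after UnionWith");

        Console.WriteLine(set.Add(4));      // False: duplicates are rejected
    }
}
```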
Bit collections
Each element of these collections is a flag (a single bit) rather than an object. Two are available: BitVector32 and BitArray.
BitVector32 is a structure that stores exactly 32 bits. It can hold bit flags or small integers. Because it is a value type, it performs better.
BitArray is a reference type whose capacity is always the same as its count. You can add or remove elements by increasing or decreasing the Length property.
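A brief sketch of both types follows. BitDemo and its methods are hypothetical names; the calls themselves (the Length setter, BitVector32.CreateMask, and the Data property) belong to the framework types.

```csharp
using System;
using System.Collections;
using System.Collections.Specialized;

public static class BitDemo
{
    public static BitArray GrowBits()
    {
        var bits = new BitArray(4);   // four bits, all false
        bits[1] = true;
        bits.Length = 8;              // grow to eight bits; the new bits are false
        return bits;
    }

    public static int SetSecondFlag()
    {
        var flags = new BitVector32(0);
        int first = BitVector32.CreateMask();        // mask for bit 0
        int second = BitVector32.CreateMask(first);  // mask for bit 1
        flags[second] = true;
        return flags.Data;                           // all 32 bits as an int
    }

    public static void Main()
    {
        BitArray bits = GrowBits();
        Console.WriteLine(bits.Count);
        Console.WriteLine(SetSecondFlag());
    }
}
```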
Specialized collections
NameValueCollection is based on NameObjectCollectionBase; however, NameValueCollection accepts multiple values per key, while NameObjectCollectionBase accepts only one value per key.
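The multiple-values-per-key behavior can be sketched like this. MultiValueDemo and Build are hypothetical names, and the header-style keys are only sample data.

```csharp
using System;
using System.Collections.Specialized;

public static class MultiValueDemo
{
    public static NameValueCollection Build()
    {
        var headers = new NameValueCollection();
        headers.Add("accept", "text/html");
        headers.Add("accept", "application/json");  // same key, second value
        headers.Add("host", "example.com");
        return headers;
    }

    public static void Main()
    {
        NameValueCollection headers = Build();

        // The indexer joins all values for a key with commas.
        Console.WriteLine(headers["accept"]);

        // GetValues returns the values individually.
        foreach (string value in headers.GetValues("accept"))
        {
            Console.WriteLine(value);
        }
    }
}
```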
The System.Collections.Specialized namespace also contains some strongly typed collections, such as StringCollection and StringDictionary, which hold string values in a list and in a dictionary respectively.
The CollectionsUtil class provides static methods for creating case-insensitive Hashtable or SortedList instances.
Some collections change their implementation as they grow. For example, a HybridDictionary starts out as a ListDictionary and switches to a Hashtable once it becomes large.
In addition, KeyedCollection<TKey, TItem> is a hybrid between a list and a dictionary. It provides a way to store objects that contain their own keys, and it can create a lookup dictionary once the number of elements reaches a specified threshold.
ListDictionary: implements IDictionary using a singly linked list. It is recommended for collections that typically hold fewer than 10 items; with that few items it performs better than Hashtable.
LINQ to Objects
Any in-memory object that implements the System.Collections.IEnumerable or System.Collections.Generic.IEnumerable<T> interface can be queried with LINQ.
LINQ provides a common data access pattern. Compared with a standard foreach loop, it is generally more concise and readable, and it offers powerful filtering, sorting, and grouping capabilities.
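As a minimal illustration, the following query filters, projects, and sorts an in-memory sequence. LinqDemo and EvenSquaresDescending are hypothetical names chosen for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LinqDemo
{
    // Hypothetical helper: squares of the even numbers, largest first.
    public static List<int> EvenSquaresDescending(IEnumerable<int> numbers)
    {
        return numbers
            .Where(n => n % 2 == 0)        // filter
            .Select(n => n * n)            // project
            .OrderByDescending(n => n)     // sort
            .ToList();
    }

    public static void Main()
    {
        List<int> result = EvenSquaresDescending(new[] { 1, 2, 3, 4, 5, 6 });
        Console.WriteLine(string.Join(", ", result));  // 36, 16, 4
    }
}
```

The same result with foreach would need an explicit loop, a condition, a temporary list, and a separate sort call.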
How to choose
First, one clear rule: if a generic version exists, prefer it.
Before choosing, answer the following questions:
Do you need sequential access?
Should access be first-in-first-out, last-in-first-out, or random?
Do you need access by index or by key?
Does each element hold a single value or a key-value pair?
Does each key map to one value or to many values?
Are duplicates allowed?
Should elements be kept in insertion order, or sorted according to some rule?
Do you need fast searching and retrieval?