C#: HashSet and SortedSet explained and compared

Source: Internet
Author: User
Tags: foreach, data structures, hash

What is a set in C#?

A collection of non-repeating elements is called a "set". .NET 4 contains two sets (HashSet<T> and SortedSet<T>), both of which implement the ISet<T> interface. HashSet<T> is an unordered collection of non-repeating elements. SortedSet<T> is an ordered collection of non-repeating elements.

The ISet<T> interface provides methods to compute the union or intersection of sets, and to test whether a set is a superset or subset of another set.
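As a minimal sketch of the ISet<T> methods mentioned above, the following program works with both set types through the interface. The class name ISetDemo and the helper IsContainedIn are hypothetical names chosen for illustration only.

```csharp
using System;
using System.Collections.Generic;

public static class ISetDemo
{
    // Hypothetical helper: true when every element of 'small' also occurs in 'big'.
    public static bool IsContainedIn(ISet<int> small, ISet<int> big)
    {
        return small.IsSubsetOf(big);
    }

    public static void Main()
    {
        ISet<int> evens = new HashSet<int> { 2, 4 };
        ISet<int> digits = new SortedSet<int> { 1, 2, 3, 4, 5 };

        Console.WriteLine(IsContainedIn(evens, digits));       // True
        Console.WriteLine(digits.IsSupersetOf(evens));         // True
        Console.WriteLine(evens.Overlaps(new[] { 4, 8 }));     // True: 4 is shared
        Console.WriteLine(evens.SetEquals(new[] { 4, 2, 2 })); // True: order and duplicates are ignored

        evens.IntersectWith(new[] { 4, 8 });                   // evens now holds only 4
        Console.WriteLine(evens.Count);                        // 1
    }
}
```

Because every method takes an IEnumerable<T>, the second operand can be an array or a list as well as another set.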

Example:

    // Use HashSet: repeated elements are automatically removed but not sorted.
    var set = new HashSet<int>() { 5, 9, 2, 1, 2, 2, 3, 7, 4, 9, 9 };
    foreach (var item in set)
    {
        Console.WriteLine(item);
    }
    Console.ReadKey();


In the same code, replace HashSet with SortedSet:

    // Use SortedSet: repeated elements are automatically removed and sorted.
    var set = new SortedSet<int>() { 5, 9, 2, 1, 2, 2, 3, 7, 4, 9, 9 };
    foreach (var item in set)
    {
        Console.WriteLine(item);
    }
    Console.ReadKey();


Summary:

1. HashSet and SortedSet are mainly used to compute the intersection, union, and difference of two sets. A set contains a group of elements that do not repeat. The former is not sorted automatically; the latter sorts its elements automatically as they are added.

2. Neither of them can access an element from a specific location.

3. You can use the search function:

    set.Contains("value") // returns true or false

4. Operations on the set:

A. SymmetricExceptWith: keeps only the elements that are in either the object or the specified set, but not in both. It removes the intersection and keeps the remaining elements of the two sets. (symmetric difference)

B. UnionWith: keeps all elements that are in the object, in the specified set, or in both. (union)

C. ExceptWith: removes all elements in the specified set from the current HashSet<T> object. (difference)

D. IntersectWith: keeps only the elements that are in both the object and the specified set. (intersection)

5. A SortedSet object additionally offers the GetViewBetween method and the Max and Min properties.

6. In addition to SortedSet, the System.Collections.Generic namespace also provides two other sorted classes: SortedDictionary and SortedList.

 
Testing some built-in HashSet methods:

    using System;
    using System.Collections.Generic;

    namespace SetDemo
    {
        class Program
        {
            static void Main(string[] args)
            {
                HashSet<char> setA = new HashSet<char>();
                HashSet<char> setB = new HashSet<char>();

                setA.Add('A');
                setA.Add('B');
                setA.Add('C');
                setB.Add('C');
                setB.Add('D');
                setB.Add('E');

                Show("Initial content of setA: ", setA);
                Show("Initial content of setB: ", setB);

                // keep the elements that are in only one of setA and setB (symmetric difference)
                setA.SymmetricExceptWith(setB);
                Show("setA after symmetric difference with setB: ", setA);

                // keep all the elements of setA and setB (union)
                setA.UnionWith(setB);
                Show("setA after union with setB: ", setA);

                // remove the elements of setB from setA (difference)
                setA.ExceptWith(setB);
                Show("setA after subtracting setB: ", setA);

                Console.WriteLine();
                Console.Read();
            }

            static void Show(string msg, HashSet<char> set)
            {
                Console.Write(msg);
                foreach (char ch in set)
                    Console.Write(ch + " ");
                Console.WriteLine();
            }
        }
    }


Testing SortedSet methods:

    using System;
    using System.Collections.Generic;
    using System.Linq; // needed for the Max() and Min() extension methods (SortedSet<T> also has its own Max and Min properties)

    namespace SetDemo
    {
        class Program
        {
            static void Main(string[] args)
            {
                var set = new SortedSet<int>() { 5, 9, 2, 1, 2, 2, 3, 7, 4, 9, 9 };
                foreach (int element in set)
                    Console.WriteLine(string.Format("{0}", element));

                Console.WriteLine("Max: " + set.Max());
                Console.WriteLine("Min: " + set.Min());
                Console.Write("Values between 2 and 5: ");

                // get the elements whose values lie between 2 and 5
                var subSet = set.GetViewBetween(2, 5);
                foreach (int i in subSet)
                {
                    Console.Write(i + " ");
                }
                Console.WriteLine();
                Console.Read();
            }
        }
    }

The HashSet and SortedSet generic collection classes

Microsoft added the HashSet class in .NET 3.5 and the SortedSet class in .NET 4. This article introduces their features and compares their similarities and differences.

The HashSet and SortedSet generic classes of the .NET collection library both implement the System.Collections.Generic.ISet interface. Java, as early as version 1.2, already provided types with the same names for these two data structures [10], along with a stricter TreeSet (the items stored in it must all be of a consistent type; there were no generics at the time).


A Set is a set in the mathematical sense: its elements are stored in no particular order and duplicates are not allowed. Let's look at the following HashSet and SortedSet examples:

    var set = new HashSet<int>() { 5, 9, 2, 1, 2, 2, 3, 7, 4, 9, 9 };
    foreach (int element in set)
        Response.Write(string.Format("{0} ", element));


Execution result:


Figure 1 repeated elements are automatically removed


In the same code, change HashSet to SortedSet as follows:

    var set = new SortedSet<int>() { 5, 9, 2, 1, 2, 2, 3, 7, 4, 9, 9 };
    foreach (int element in set)
        Response.Write(string.Format("{0} ", element));


Execution result:


Figure 2 duplicate elements are automatically removed and sorted internally


We can see that neither the HashSet nor the SortedSet class allows duplicates. The former does not sort automatically, while the latter sorts elements as they are added, and SortedSet keeps the data in order as elements are inserted, deleted, and searched. So here is one more trick for everyday programming: if you want to filter out repeated elements, you can use either of these two Set classes, because a Set does not allow repeated elements. In the .NET 3.5 era, when SortedSet did not yet exist, we had to use HashSet to remove duplicates and then sort the result separately; in .NET 4, SortedSet removes duplicates and sorts in a single step.
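The two approaches to removing duplicates and sorting can be sketched side by side. The class name DedupSortDemo and the helpers WithHashSet and WithSortedSet are hypothetical names used only for this illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DedupSortDemo
{
    // .NET 3.5 style: remove duplicates with HashSet, then sort in a second step.
    public static List<int> WithHashSet(IEnumerable<int> source)
    {
        var unique = new List<int>(new HashSet<int>(source));
        unique.Sort();
        return unique;
    }

    // .NET 4 style: SortedSet removes duplicates and keeps the elements ordered.
    public static List<int> WithSortedSet(IEnumerable<int> source)
    {
        return new SortedSet<int>(source).ToList();
    }

    public static void Main()
    {
        int[] data = { 5, 9, 2, 1, 2, 2, 3, 7, 4, 9, 9 };
        Console.WriteLine(string.Join(", ", WithHashSet(data)));   // 1, 2, 3, 4, 5, 7, 9
        Console.WriteLine(string.Join(", ", WithSortedSet(data))); // 1, 2, 3, 4, 5, 7, 9
    }
}
```

Both helpers return the same list; the second simply needs no explicit Sort call.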

Of course, if your requirements differ (setting performance aside for the moment; it is discussed in the next section) and you do not want automatic sorting or duplicate removal, you can use the Sort method of the List class instead. In addition, the List structure keeps its elements in a defined sequence, and the SortedSet class is ordered as well, whereas the HashTable structure and the HashSet class store data in no particular order. You can also think of HashSet as a HashTable of key/value pairs that has only the keys and no values, because the HashSet class implements the HashTable data structure with keys only. Therefore, besides offering excellent performance, it stores elements unordered and does not allow duplicates (each key must be unique).
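A small sketch of the contrast drawn above: List.Sort keeps duplicates and allows access by position, while a HashSet behaves like the key half of a key/value table. The class name ListVersusSetDemo, the helper SortKeepingDuplicates, and the sample data are assumptions made for illustration.

```csharp
using System;
using System.Collections.Generic;

public static class ListVersusSetDemo
{
    // Sorts a copy of the input with List<T>.Sort; duplicates are kept.
    public static List<int> SortKeepingDuplicates(IEnumerable<int> source)
    {
        var list = new List<int>(source);
        list.Sort();
        return list;
    }

    public static void Main()
    {
        List<int> sorted = SortKeepingDuplicates(new[] { 3, 1, 3, 2 });
        Console.WriteLine(string.Join(", ", sorted)); // 1, 2, 3, 3
        Console.WriteLine(sorted[0]);                 // 1 (a list supports access by position)

        // A Dictionary stores key/value pairs; a HashSet stores only the keys.
        var ages = new Dictionary<string, int> { { "Ann", 30 }, { "Bob", 25 } };
        var names = new HashSet<string>(ages.Keys);
        Console.WriteLine(names.Contains("Ann"));     // True
        Console.WriteLine(names.Add("Ann"));          // False: the key is already present
    }
}
```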

In addition, a mathematical set has no limit on its number of elements, whereas a Set in .NET is limited by the maximum memory available to a variable [14], although the capacity of a .NET HashSet object grows automatically as elements are added [2].

------------------------------------------------------------------------

The following lists the features and performance comparison of the HashSet and SortedSet classes on the. NET platform respectively:


Features of HashSet [11]:

It implements the data structure known as a HashTable [6], [12], [13], with keys only and no values.
Its Contains method (which determines whether the HashSet object contains the specified element) is fast because it is a hash-based lookup.
(Its time complexity is close to O(1), because the HashSet class is implemented as a hash table.)
Its Add method (which adds the specified element to the HashSet object) is an O(1) operation if the number of elements is smaller than the capacity of the internal array. If the HashSet<T> object must be resized, the method becomes an O(n) operation. [4]
Its Add method ignores an item that already exists and returns false.
It is an unordered container: elements are stored in no particular order (this is the property of a set in mathematics) [2], [6], [12], [14]. The same is true of Java's HashSet.
It cannot store duplicate elements; a duplicate is silently ignored on insertion.
You cannot access an element by position.
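A few of the HashSet features listed above, shown as a minimal sketch. The class name HashSetFeaturesDemo and the helper CountRejected are hypothetical.

```csharp
using System;
using System.Collections.Generic;

public static class HashSetFeaturesDemo
{
    // Adds each value to the set and counts how many were rejected as duplicates.
    public static int CountRejected(HashSet<int> set, IEnumerable<int> values)
    {
        int rejected = 0;
        foreach (int value in values)
        {
            if (!set.Add(value)) // Add returns false when the value is already present
            {
                rejected++;
            }
        }
        return rejected;
    }

    public static void Main()
    {
        var set = new HashSet<int>();
        Console.WriteLine(CountRejected(set, new[] { 5, 9, 2, 5, 9, 9 })); // 3
        Console.WriteLine(set.Count);       // 3
        Console.WriteLine(set.Contains(2)); // True
        Console.WriteLine(set.Contains(7)); // False
        // set[0] would not compile: HashSet<T> has no indexer.
    }
}
```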


Features of SortedSet [6]:

It implements the data structure known as a "red-black tree" [6], [8], [13].
Its Contains method (which determines whether the SortedSet object contains the specified element) is fast. [6] attributes this to hash-based lookup, but a red-black tree has no hash table; the lookup walks the balanced tree, which takes O(log n) time.
Its Add method (which adds the specified element to the SortedSet object) is described in the msdn document [3] in the same terms as for HashSet: O(1) while the number of elements is smaller than the capacity of the internal array, and O(n) when the SortedSet<T> object must be resized. That description fits an array-backed hash table rather than a tree; see the discussion further on.
Its Add method ignores an item that already exists and returns false.
The elements it stores are ordered, even though it is also called a Set [1], [6]. The same is true of Java's SortedSet.
It cannot store duplicate elements; a duplicate is silently ignored on insertion.
You cannot access an element by position.
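The SortedSet features listed above can be sketched in the same way. The class name SortedSetFeaturesDemo and the helper InOrder are hypothetical; ElementAt is the LINQ extension method, used here only to show that access by position means walking the set.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SortedSetFeaturesDemo
{
    // Builds a SortedSet from the input and returns its elements in enumeration order.
    public static int[] InOrder(IEnumerable<int> values)
    {
        return new SortedSet<int>(values).ToArray();
    }

    public static void Main()
    {
        var set = new SortedSet<int> { 7, 3, 5 };
        Console.WriteLine(set.Add(3));              // False: 3 is already present
        Console.WriteLine(set.Add(1));              // True
        Console.WriteLine(string.Join(", ", set));  // 1, 3, 5, 7
        Console.WriteLine(set.Min + " " + set.Max); // 1 7

        // set[0] would not compile: SortedSet<T> has no indexer.
        // LINQ's ElementAt enumerates from the start instead.
        Console.WriteLine(set.ElementAt(1));        // 3

        Console.WriteLine(string.Join(", ", InOrder(new[] { 2, 2, 1 }))); // 1, 2
    }
}
```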

 
For more information, see [15]:

In data structure theory, the HashTable implemented by the HashSet class is a structure with fast "insert" and "search" operations: regardless of the number of items, its insert and search times are close to constant, that is, O(1), so it suits large numbers of elements. However, because a HashTable is built on an array, it is hard to expand, and performance drops when its space fills up. This is why, in the msdn document [4], Microsoft says the Add and Contains methods of HashSet are O(1) operations, while noting for Add that if the HashSet object must be resized, the method becomes an O(n) operation, where n is the number of elements. If we can predict the number of items in advance and do not need to browse the content in order, the HashTable structure and the HashSet class are a very suitable, high-performance choice.

In data structure theory, the "red-black tree" implemented by the SortedSet class is the most efficient binary tree for storing data in memory at run time. Its "insert" operation is relatively slow, however, and its "delete" operation is complicated as well (after a node is deleted, the correct red-black structure must be re-established). In return, inserting, deleting, or searching in a red-black tree takes no more than O(log n) time. The msdn document [3], however, says that the computing complexity of the Add and Contains methods of SortedSet is O(1); if the SortedSet class really is a red-black tree, it is doubtful that msdn is correct here [8].

------------------------------------------------------------------------

Here are some examples of using HashSet in .NET:


Example 1: Test the search function:

    var set = new HashSet<char>("I love programming");
    Response.Write(set.Contains('I')); // True
    Response.Write(set.Contains('z')); // False


In the preceding example, we can pass a string, even a long passage of text, directly to the HashSet<char> constructor because string implements the IEnumerable<char> interface; the HashSet class itself also implements IEnumerable<T>.


Example 2: Test the built-in set operations of HashSet:

SymmetricExceptWith: keeps only the elements that are in either the object or the specified set, but not in both.
UnionWith: keeps all elements that are in the object, in the specified set, or in both.
ExceptWith: removes all elements in the specified set from the current HashSet<T> object.
IntersectWith: keeps only the elements that are in both the object and the specified set.

    using System;
    using System.Collections.Generic;

    class HashSetDemo
    {
        static void Main()
        {
            HashSet<char> setA = new HashSet<char>();
            HashSet<char> setB = new HashSet<char>();

            setA.Add('A');
            setA.Add('B');
            setA.Add('C');
            setB.Add('C');
            setB.Add('D');
            setB.Add('E');

            Show("Initial content of setA: ", setA);
            Show("Initial content of setB: ", setB);

            // keep the elements that are in only one of setA and setB (symmetric difference)
            setA.SymmetricExceptWith(setB);
            Show("setA after symmetric difference with setB: ", setA);

            // keep all the elements of setA and setB (union)
            setA.UnionWith(setB);
            Show("setA after union with setB: ", setA);

            // remove the elements of setB from setA (difference)
            setA.ExceptWith(setB);
            Show("setA after subtracting setB: ", setA);

            Console.WriteLine();
            Console.Read();
        }

        static void Show(string msg, HashSet<char> set)
        {
            Console.Write(msg);
            foreach (char ch in set)
                Console.Write(ch + " ");
            Console.WriteLine();
        }
    }


Execution result:




Figure 3 Testing the SymmetricExceptWith, UnionWith, and ExceptWith methods

    // keep only the elements of setA that are also in setB (intersection)
    setA.IntersectWith(setB);
    Show("setA after intersect with setB: ", setA);


Execution result:




Figure 4 test the IntersectWith method


Because the set operation methods accept any IEnumerable<T>, and HashSet<T> itself implements the IEnumerable<T> interface, we can pass any other collection as a parameter to these methods, including an instance of another set class.
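As a brief sketch of this point, the set operations below receive a List, an array, another set type, and a string. The class name EnumerableArgumentDemo and the helper KeepAllowed are hypothetical names for illustration.

```csharp
using System;
using System.Collections.Generic;

public static class EnumerableArgumentDemo
{
    // Returns the distinct characters of 'text' that also occur in 'allowed'.
    public static HashSet<char> KeepAllowed(string text, IEnumerable<char> allowed)
    {
        var result = new HashSet<char>(text); // a string is an IEnumerable<char>
        result.IntersectWith(allowed);
        return result;
    }

    public static void Main()
    {
        var letters = new HashSet<char> { 'A', 'B', 'C' };
        letters.UnionWith(new List<char> { 'C', 'D' }); // a List<char>
        letters.ExceptWith(new[] { 'A' });              // a char[] array
        letters.UnionWith(new SortedSet<char> { 'E' }); // another set type
        Console.WriteLine(letters.Count);               // 4 (B, C, D, E)

        Console.WriteLine(KeepAllowed("banana", "abc").SetEquals("ab")); // True
    }
}
```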


In addition, LINQ has set operations similar to those in the preceding example, such as Intersect, Except, Union, and Distinct. If you are interested in comparing the two, refer to msdn or the online article [5]. The main difference is that a LINQ set operation always returns a new IEnumerable<T> sequence, while a HashSet<T> operation modifies the current set; HashSet also provides more set operators.
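The difference between the two styles can be seen in a minimal sketch. The class name LinqVersusSetDemo and the helper IntersectCopy are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LinqVersusSetDemo
{
    // LINQ: builds a new sequence and leaves both inputs untouched.
    public static List<int> IntersectCopy(IEnumerable<int> a, IEnumerable<int> b)
    {
        return a.Intersect(b).ToList();
    }

    public static void Main()
    {
        var setA = new HashSet<int> { 1, 2, 3 };
        var setB = new HashSet<int> { 2, 3, 4 };

        List<int> common = IntersectCopy(setA, setB);
        Console.WriteLine(common.Count); // 2
        Console.WriteLine(setA.Count);   // 3: setA is unchanged

        setA.IntersectWith(setB);        // HashSet: modifies setA in place
        Console.WriteLine(setA.Count);   // 2
    }
}
```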

------------------------------------------------------------------------


Next comes the SortedSet class, which was introduced only in .NET 4. In addition to the useful methods it shares with the HashSet class described above, such as SymmetricExceptWith, UnionWith, ExceptWith, and IntersectWith, it adds new easy-to-use members such as "GetViewBetween (a view of a range)", "Max (maximum value)", and "Min (minimum value)".


Here is an example of the three SortedSet methods:


Example 3: Test the GetViewBetween, Max, and Min methods:

    using System;
    using System.Collections.Generic;
    using System.Linq; // needed for the Max() and Min() extension methods

    var set = new SortedSet<int>() { 5, 9, 2, 1, 2, 2, 3, 7, 4, 9, 9 };
    foreach (int element in set)
        Response.Write(string.Format("{0} ", element));

    Response.Write("<p>");
    Response.Write("Max: " + set.Max() + "<br>");
    Response.Write("Min: " + set.Min() + "<br>");
    Response.Write("<br>Values between 2 and 5: <br>");

    // get the elements whose values lie between 2 and 5
    var subSet = set.GetViewBetween(2, 5);
    foreach (int i in subSet)
    {
        Response.Write(i + ",");
    }


Execution result:



Figure 5 test the exclusive GetViewBetween, Max, and Min methods of the SortedSet class


The GetViewBetween() method also works when the elements of a SortedSet are strings or characters.
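A short sketch of GetViewBetween on strings and characters follows. The class name ViewBetweenDemo, the helper WordsBetween, and the sample words are assumptions for illustration; StringComparer.Ordinal is passed explicitly so that the ordering does not depend on the current culture.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ViewBetweenDemo
{
    // Returns the words in the inclusive range [lower, upper] under ordinal comparison.
    public static string[] WordsBetween(IEnumerable<string> words, string lower, string upper)
    {
        var set = new SortedSet<string>(words, StringComparer.Ordinal);
        return set.GetViewBetween(lower, upper).ToArray();
    }

    public static void Main()
    {
        string[] fruits = { "pear", "apple", "fig", "cherry", "banana" };
        Console.WriteLine(string.Join(", ", WordsBetween(fruits, "b", "d"))); // banana, cherry

        // Characters: the distinct letters of a phrase, restricted to 'e'..'l'.
        var letters = new SortedSet<char>("hello world".Where(char.IsLetter));
        char[] middle = letters.GetViewBetween('e', 'l').ToArray();
        Console.WriteLine(new string(middle)); // ehl
    }
}
```

Both bounds are inclusive, so an element equal to the lower or upper value is part of the view.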
