LINQ Distinct is not enough!
Background: in a real project I needed to deduplicate a collection. The collection holds a reference type, and two elements count as duplicates when they have the same ID. LINQ's Distinct is not enough here. For a reference type that does not override Equals and GetHashCode, it compares references (object addresses), not contents. The test data is as follows:
class Person
{
    public int ID { get; set; }
    public string Name { get; set; }
}

List<Person> list = new List<Person>()
{
    new Person() { ID = 1, Name = "name1" },
    new Person() { ID = 1, Name = "name1" },
    new Person() { ID = 2, Name = "name2" },
    new Person() { ID = 3, Name = "name3" }
};
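The following minimal sketch reproduces the problem. It assumes the Person class and test data above; the class DistinctProblemDemo and the method CountAfterPlainDistinct are hypothetical names for illustration. Person overrides neither Equals nor GetHashCode, so the parameterless Distinct compares references and keeps all four objects:

```csharp
using System.Collections.Generic;
using System.Linq;

// Same shape as the article's Person: no Equals/GetHashCode overrides.
public class Person
{
    public int ID { get; set; }
    public string Name { get; set; } = "";
}

// Hypothetical demo class, not part of the original article.
public static class DistinctProblemDemo
{
    // Builds the article's test list and runs the parameterless Distinct.
    public static int CountAfterPlainDistinct()
    {
        var list = new List<Person>
        {
            new Person { ID = 1, Name = "name1" },
            new Person { ID = 1, Name = "name1" },
            new Person { ID = 2, Name = "name2" },
            new Person { ID = 3, Name = "name3" },
        };
        // Every element is a separate object, so reference equality removes nothing.
        return list.Distinct().Count();
    }
}
```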
We need to deduplicate by Person ID. Even without Distinct there is a way to do this: group the elements with GroupBy and take the first element of each group. For example:
list.GroupBy(x => x.ID).Select(x => x.FirstOrDefault()).ToList()
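Here is the GroupBy approach as a self-contained sketch. GroupByDedup and DistinctById are hypothetical names. GroupBy yields groups in the order their keys first appear in the source, and each group keeps source order. As a result, this keeps the first Person for each ID, in the original order:

```csharp
using System.Collections.Generic;
using System.Linq;

public class Person
{
    public int ID { get; set; }
    public string Name { get; set; } = "";
}

// Hypothetical helper wrapping the GroupBy-based deduplication.
public static class GroupByDedup
{
    public static List<Person> DistinctById(IEnumerable<Person> people) =>
        people.GroupBy(p => p.ID)       // one group per distinct ID
              .Select(g => g.First())   // groups are never empty, so First() is safe
              .ToList();
}
```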
GroupBy works, and since everything happens in memory it is fast enough. Here, though, we will look at other approaches and try to find the best one.
1. IEqualityComparer Interface
The Distinct extension method on IEnumerable<T> has two overloads:
public static IEnumerable<TSource> Distinct<TSource>(this IEnumerable<TSource> source);
public static IEnumerable<TSource> Distinct<TSource>(this IEnumerable<TSource> source, IEqualityComparer<TSource> comparer);
The second overload takes an IEqualityComparer<T> parameter. That interface is defined as follows:
// Type parameter T: the type of the objects to compare.
public interface IEqualityComparer<T>
{
    bool Equals(T x, T y);
    int GetHashCode(T obj);
}
By implementing this interface, we can write our own comparer and define our own comparison rules.
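As a concrete, non-generic starting point, here is a sketch of a comparer that compares Person objects by ID only. PersonIdComparer is a hypothetical name. Equals and GetHashCode must agree: objects that are equal must return the same hash code. That is why both methods use ID:

```csharp
using System.Collections.Generic;

public class Person
{
    public int ID { get; set; }
    public string Name { get; set; } = "";
}

// Hypothetical comparer: two Person objects are "equal" when their IDs match.
public sealed class PersonIdComparer : IEqualityComparer<Person>
{
    public bool Equals(Person x, Person y)
    {
        if (ReferenceEquals(x, y)) return true;   // same object, or both null
        if (x is null || y is null) return false; // exactly one is null
        return x.ID == y.ID;
    }

    // Equal IDs must yield equal hash codes, otherwise Distinct may miss duplicates.
    public int GetHashCode(Person obj) => obj.ID.GetHashCode();
}
```

With it, list.Distinct(new PersonIdComparer()) keeps one Person per ID. Its limitation is that it is hard-wired to Person and ID, which is what the delegate-based design below removes.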
There is a catch. The T in IEqualityComparer<T> is the type of the objects being compared, here Person. How does the comparer get at Person's ID property? More generally, for an arbitrary type, how does it know which property to compare? The answer is a delegate: the caller passes in a delegate that selects the property to compare. This mirrors the design of the LINQ extension methods themselves. Their parameters are delegates, so the rules are defined by the caller and merely invoked inside the method. Here is the final implementation:
// Inheriting from the EqualityComparer<T> class works the same way.
class CustomerEqualityComparer<T, V> : IEqualityComparer<T>
{
    private IEqualityComparer<V> comparer;
    private Func<T, V> selector;

    public CustomerEqualityComparer(Func<T, V> selector)
        : this(selector, EqualityComparer<V>.Default)
    {
    }

    public CustomerEqualityComparer(Func<T, V> selector, IEqualityComparer<V> comparer)
    {
        this.comparer = comparer;
        this.selector = selector;
    }

    // Two objects are equal when their selected keys are equal.
    public bool Equals(T x, T y)
    {
        return this.comparer.Equals(this.selector(x), this.selector(y));
    }

    // The hash code comes from the selected key, consistent with Equals.
    public int GetHashCode(T obj)
    {
        return this.comparer.GetHashCode(this.selector(obj));
    }
}
(Supplement 1) I did not include the extension method earlier, and some readers raised the question of case-insensitive string comparison. (The two constructors above already handle that.) The extension methods can be written as follows:
static class EnumerableExtention
{
    public static IEnumerable<TSource> Distinct<TSource, TKey>(this IEnumerable<TSource> source, Func<TSource, TKey> selector)
    {
        return source.Distinct(new CustomerEqualityComparer<TSource, TKey>(selector));
    }

    // From C# 4.0 on, the last parameter could be made optional (defaulting to null,
    // with EqualityComparer<TKey>.Default substituted when it is null), merging the two overloads into one.
    public static IEnumerable<TSource> Distinct<TSource, TKey>(this IEnumerable<TSource> source, Func<TSource, TKey> selector, IEqualityComparer<TKey> comparer)
    {
        return source.Distinct(new CustomerEqualityComparer<TSource, TKey>(selector, comparer));
    }
}
For example, to deduplicate by Person Name while ignoring case, you can write:
list.Distinct(x => x.Name, StringComparer.CurrentCultureIgnoreCase).ToList(); // StringComparer implements the IEqualityComparer<string> interface
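Putting the comparer and the extension method together, a self-contained sketch might look like the following. It keeps the article's CustomerEqualityComparer and EnumerableExtention names and merges the two extension overloads with an optional comparer parameter that defaults to null. Two choices here are assumptions for illustration: the null check in the constructor, and the use of StringComparer.OrdinalIgnoreCase instead of CurrentCultureIgnoreCase in the usage below, which keeps the result independent of the current culture.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Person
{
    public int ID { get; set; }
    public string Name { get; set; } = "";
}

// Compares two T values by a key extracted with selector, using comparer for the keys.
public class CustomerEqualityComparer<T, V> : IEqualityComparer<T>
{
    private readonly IEqualityComparer<V> comparer;
    private readonly Func<T, V> selector;

    public CustomerEqualityComparer(Func<T, V> selector)
        : this(selector, EqualityComparer<V>.Default) { }

    public CustomerEqualityComparer(Func<T, V> selector, IEqualityComparer<V> comparer)
    {
        this.selector = selector ?? throw new ArgumentNullException(nameof(selector));
        // Assumption: a null comparer falls back to the default comparer for V.
        this.comparer = comparer ?? EqualityComparer<V>.Default;
    }

    public bool Equals(T x, T y) => comparer.Equals(selector(x), selector(y));

    public int GetHashCode(T obj) => comparer.GetHashCode(selector(obj));
}

public static class EnumerableExtention
{
    // One overload with an optional comparer; null means EqualityComparer<TKey>.Default.
    public static IEnumerable<TSource> Distinct<TSource, TKey>(
        this IEnumerable<TSource> source,
        Func<TSource, TKey> selector,
        IEqualityComparer<TKey> comparer = null)
    {
        // Resolves to LINQ's Distinct(source, IEqualityComparer<TSource>).
        return source.Distinct(new CustomerEqualityComparer<TSource, TKey>(selector, comparer));
    }
}
```

Usage (hypothetical data): with people named "Tom", "tom" and "Ann", people.Distinct(p => p.Name, StringComparer.OrdinalIgnoreCase) keeps two of them, while people.Distinct(p => p.ID) keeps all three.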
2. Use a hash table

The drawback of the first approach is that it needs a new class in addition to the extension methods. Can it be done with just an extension method? Yes, with a Dictionary (or a HashSet, where one is available). The implementation looks like this:
public static IEnumerable<TSource> Distinct<TSource, TKey>(this IEnumerable<TSource> source, Func<TSource, TKey> selector)
{
    Dictionary<TKey, TSource> dic = new Dictionary<TKey, TSource>();
    foreach (var s in source)
    {
        TKey key = selector(s);
        // Keep only the first element seen for each key.
        if (!dic.ContainsKey(key))
            dic.Add(key, s);
    }
    return dic.Select(x => x.Value);
}
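Here is a variation on the hash-table idea, sketched under the assumption that HashSet<TKey> is available (.NET 3.5 and later). HashSet<TKey>.Add returns false when the key is already present, so one call both checks and records the key. yield return makes the method lazy and preserves source order, whereas the Dictionary version returns elements in the dictionary's enumeration order, which .NET documents as undefined. The class name HashSetDistinctExtensions and the optional comparer parameter are assumptions, not part of the original:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical class name; the method mirrors the article's Distinct(source, selector).
public static class HashSetDistinctExtensions
{
    public static IEnumerable<TSource> Distinct<TSource, TKey>(
        this IEnumerable<TSource> source,
        Func<TSource, TKey> selector,
        IEqualityComparer<TKey> comparer = null)
    {
        // Validate eagerly; the iterator below only runs when enumerated.
        if (source == null) throw new ArgumentNullException(nameof(source));
        if (selector == null) throw new ArgumentNullException(nameof(selector));
        return Iterate(source, selector, comparer);
    }

    private static IEnumerable<TSource> Iterate<TSource, TKey>(
        IEnumerable<TSource> source, Func<TSource, TKey> selector, IEqualityComparer<TKey> comparer)
    {
        // A null comparer makes HashSet use EqualityComparer<TKey>.Default.
        var seen = new HashSet<TKey>(comparer);
        foreach (var item in source)
        {
            // Add returns true only the first time a key is seen.
            if (seen.Add(selector(item)))
                yield return item;
        }
    }
}
```

On .NET 6 and later, System.Linq already ships Enumerable.DistinctBy, which takes a key selector and an optional IEqualityComparer<TKey>, so the built-in method can replace a hand-written one there.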
3. Override the object methods

Can we skip the extension method entirely? Yes. object is the base class of every type, and it has two virtual methods, Equals and GetHashCode. By default, .NET uses these two methods to compare objects. Does the parameterless Distinct rely on them as well? Let's override them in Person to implement our own comparison rule. Stepping through in the debugger shows that Distinct does call both methods. The code is as follows:
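class Person
{
    public int ID { get; set; }
    public string Name { get; set; }

    public override bool Equals(object obj)
    {
        // obj may be null or not a Person; treat both as "not equal" instead of throwing.
        Person p = obj as Person;
        return p != null && this.ID.Equals(p.ID);
    }

    // Equal IDs must produce equal hash codes.
    public override int GetHashCode()
    {
        return this.ID.GetHashCode();
    }
}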
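Below is a sketch of a slightly more defensive variant of the override approach. Implementing IEquatable<Person> lets EqualityComparer<Person>.Default, which the parameterless Distinct uses, call a strongly typed Equals. Routing Equals(object) through it means a null or non-Person argument returns false. One caveat: the hash code depends on the settable ID, so changing ID while the object is in a HashSet or Dictionary would make it impossible to find.

```csharp
using System;

public class Person : IEquatable<Person>
{
    public int ID { get; set; }
    public string Name { get; set; } = "";

    // Strongly typed equality: two people are equal when their IDs match.
    public bool Equals(Person other) => other != null && ID == other.ID;

    // Delegates to the typed overload; 'as' yields null for non-Person values.
    public override bool Equals(object obj) => Equals(obj as Person);

    // Consistent with Equals: equal IDs give equal hash codes.
    public override int GetHashCode() => ID.GetHashCode();
}
```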
My requirement was deduplication by ID, so the third approach is the most elegant. In other situations, the first two approaches are more general.