C#: Pitfalls of the IEnumerable Interface You May Not Know About (with sample code)
The importance of the IEnumerable enumerator interface can hardly be overstated; ten thousand words on it would not be too many. Almost all collections implement this interface, and the core of LINQ relies on it. A C-style for loop is tedious to write, and foreach is a lot smoother.
I like this interface, but I have also run into many doubts while using it. Perhaps you share the same confusion:
(1) What is the difference between IEnumerable and IEnumerator?
(2) Can an enumeration be accessed out of bounds, and what are the consequences of doing so? Why can't I change the contents of a collection while enumerating it?
(3) How is LINQ actually implemented? For example, Skip skips some elements; are those elements still accessed?
(4) What is the nature of IEnumerable?
(5) Does an IEnumerable enumeration form a closure? Do multiple enumeration processes interfere with each other? Can I dynamically change the elements of an enumeration while enumerating?
....
If you are interested, read on.
Before we begin, this article adopts a convention: an enumeration is an IEnumerable, an iteration is an IEnumerator, and anything already materialized (for example via ToList()) is a collection.
1. IEnumerable and IEnumerator
IEnumerable has only one abstract method, GetEnumerator(), while IEnumerator is the iterator that actually implements access to the collection. IEnumerator has one property, Current, and two methods, MoveNext and Reset.
A small question: this is just an accessor interface, so why have two interfaces that are so easy to confuse? One is called an enumerator, the other an iterator. Because:
(1) Implementing IEnumerator is dirty, tedious work: it adds two methods and a property, and those two methods are actually not easy to implement well (more on this later).
(2) It has to maintain the initial state, know how to MoveNext, how to end, and how to return to the state before the iteration, none of which is easy.
(3) Iteration is clearly not thread-safe. IEnumerable produces a new IEnumerator each time, so multiple iteration processes do not affect each other. The underlying collection must not be modified during the iteration, otherwise it is unsafe.
So when you implement IEnumerable with an iterator block (yield return), the compiler generates the IEnumerator for us. Moreover, in most cases you inherit from or wrap an existing collection and generally do not need to write MoveNext and Reset yourself. IEnumerable of course also has a generic version, IEnumerable<T>, which does not affect this discussion.
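To make the split between the two interfaces concrete, here is a minimal sketch of a hand-written pair. The types NumberRange and RangeEnumerator are hypothetical names invented for this illustration; they are not from the article's attachment. It shows that every GetEnumerator() call hands out a separate IEnumerator with its own position.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Hypothetical example type: enumerates 0, 1, ..., count - 1.
public sealed class NumberRange : IEnumerable<int>
{
    private readonly int _count;
    public NumberRange(int count) { _count = count; }

    // Every call hands out a fresh enumerator with its own position.
    public IEnumerator<int> GetEnumerator() { return new RangeEnumerator(_count); }
    IEnumerator IEnumerable.GetEnumerator() { return GetEnumerator(); }

    private sealed class RangeEnumerator : IEnumerator<int>
    {
        private readonly int _count;
        private int _index = -1; // positioned before the first element

        public RangeEnumerator(int count) { _count = count; }

        // Only meaningful after MoveNext has returned true.
        public int Current { get { return _index; } }
        object IEnumerator.Current { get { return Current; } }

        public bool MoveNext()
        {
            if (_index + 1 >= _count) return false;
            _index++;
            return true;
        }

        public void Reset() { _index = -1; }
        public void Dispose() { }
    }
}

public static class HandWrittenEnumeratorDemo
{
    // Advances two enumerators by different amounts and reports both positions.
    public static string Run()
    {
        var range = new NumberRange(3);
        using (IEnumerator<int> a = range.GetEnumerator())
        using (IEnumerator<int> b = range.GetEnumerator())
        {
            a.MoveNext();
            a.MoveNext();
            b.MoveNext();
            return a.Current + "," + b.Current;
        }
    }

    public static void Main()
    {
        Console.WriteLine(Run()); // prints "1,0"
    }
}
```

The two enumerators share the same NumberRange but not their state, which is why one enumeration does not disturb another.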
IEnumerable brings to mind a singly linked list, where C needs a pointer field to hold the information for the next node. So who stores this information for an IEnumerable? Does the process consume memory? Does it live in the program area or on the heap?
However, IEnumerable also has its shortcomings: it cannot go backward, it cannot jump (it can only step over elements one at a time), Reset is hard to implement, and there is no index access. Think about it: for an enumeration over a materialized collection, going straight back to element 0 is possible, but if the IEnumerable is a long chain of accesses, finding the original root is difficult. That is why the author of CLR via C# tells you that many Reset implementations are simply lies. It is enough to know this; don't rely on Reset too much.
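A small sketch of why relying on Reset is risky: the enumerator of a List<T> supports it, while the enumerator the C# compiler generates for an iterator block throws NotSupportedException. The class ResetDemo and its helper names are made up for this illustration.

```csharp
using System;
using System.Collections.Generic;

public static class ResetDemo
{
    // An iterator block; the compiler generates its enumerator.
    private static IEnumerable<int> Numbers()
    {
        yield return 1;
        yield return 2;
    }

    // Returns true if Reset() succeeds, false if it is not supported.
    public static bool ResetIsSupported(IEnumerator<int> e)
    {
        try
        {
            e.Reset();
            return true;
        }
        catch (NotSupportedException)
        {
            return false;
        }
    }

    public static bool ListSupportsReset()
    {
        IEnumerable<int> list = new List<int> { 1, 2 };
        using (IEnumerator<int> e = list.GetEnumerator())
        {
            return ResetIsSupported(e);
        }
    }

    public static bool IteratorSupportsReset()
    {
        using (IEnumerator<int> e = Numbers().GetEnumerator())
        {
            return ResetIsSupported(e);
        }
    }

    public static void Main()
    {
        Console.WriteLine(ListSupportsReset());     // prints "True"
        Console.WriteLine(IteratorSupportsReset()); // prints "False"
    }
}
```

If you need to start over, calling GetEnumerator() again (or simply running another foreach) is the more dependable route.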
2. Is there a difference between foreach and MoveNext?
The biggest feature of IEnumerable is that the process of access is handed over to the object being accessed. In C, control over an array lies entirely with the external code. This interface encapsulates the access process internally, which further strengthens encapsulation. For example:
    public class People    // defines a simple entity class
    {
        public string Name { get; set; }
        public int Age { get; set; }
    }

    public class PeopleList
    {
        private readonly List<People> peoples;

        public PeopleList()    // for convenience, the constructor inserts the elements
        {
            peoples = new List<People>();
            for (int i = 0; i < 5; i++)
            {
                // The constant is missing in the source; 30 is inferred from the
                // test results further down (ages 30 to 34).
                peoples.Add(new People { Name = "P" + i, Age = 30 + i });
            }
        }

        // The initial value is missing in the source; 31 is inferred from the
        // test results further down (P2, P3 and P4 are returned).
        public int OldAge = 31;

        public IEnumerable<People> OlderPeoples
        {
            get
            {
                foreach (People people in peoples)
                {
                    if (people.Age > OldAge)
                        yield return people;
                }
                yield break;
            }
        }
    }
The essence of IEnumerable is a state machine. It is somewhat similar to the concept of an event: it hands part of the implementation out to the caller and lets execution travel back and forth between pieces of code (think of interstellar travel), which is the basis of LINQ. But is such a cool iterator really as simple as we think?
In C, an array is an array: real memory. So what does an IEnumerable represent? If it is backed by a real collection (such as a List), no problem, that is real memory too. But what about the example above? The filter's yield return only returns elements, and the collection itself may never actually exist. If you decompile a simple enumerator that uses yield return, you will find that it is really a set of switch-case statements; the compiler does a lot of work for us behind the scenes.
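The switch-case shape mentioned above can be sketched by hand. The class below, TwoItemsStateMachine, is a hypothetical and simplified approximation of what a compiler might generate for an iterator body of `yield return "a"; yield return "b";`. The real generated code differs in naming and detail.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Hand-written approximation of a compiler-generated iterator for:
//     yield return "a";
//     yield return "b";
public sealed class TwoItemsStateMachine : IEnumerator<string>
{
    private int _state;      // 0 = not started, 1 = after "a", 2 = after "b", -1 = finished
    private string _current; // holds the most recently yielded value

    public string Current { get { return _current; } }
    object IEnumerator.Current { get { return _current; } }

    public bool MoveNext()
    {
        switch (_state)
        {
            case 0:
                _current = "a";
                _state = 1;
                return true;
            case 1:
                _current = "b";
                _state = 2;
                return true;
            default:
                _state = -1;
                return false; // note: _current is left untouched here
        }
    }

    public void Reset() { throw new NotSupportedException(); }
    public void Dispose() { }
}

public static class StateMachineDemo
{
    // Drives the state machine to completion and joins the yielded values.
    public static string Run()
    {
        var parts = new List<string>();
        using (var e = new TwoItemsStateMachine())
        {
            while (e.MoveNext()) parts.Add(e.Current);
        }
        return string.Join(",", parts);
    }

    public static void Main()
    {
        Console.WriteLine(Run()); // prints "a,b"
    }
}
```

No collection holding "a" and "b" ever exists; the only storage is the state field and the current value.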
A newly generated iterator, before MoveNext is called, is actually empty. Why? Why doesn't the iterator point directly at the head element?
(Thanks to a reader for the answer: just like the head pointer of a singly linked list in C, this makes it possible to represent an enumeration that contains no elements, which is more convenient to program against.)
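Another way to see why a fresh iterator has no current element is to log when the iterator body actually runs. In this sketch (LazyIteratorDemo and its log strings are invented for the illustration), nothing inside Numbers() executes until the first MoveNext.

```csharp
using System;
using System.Collections.Generic;

public static class LazyIteratorDemo
{
    private static readonly List<string> Log = new List<string>();

    // Iterator body with logging between the yields.
    private static IEnumerable<int> Numbers()
    {
        Log.Add("start");
        yield return 1;
        Log.Add("between");
        yield return 2;
        Log.Add("end");
    }

    // Records the order in which caller code and iterator code execute.
    public static string Run()
    {
        Log.Clear();
        IEnumerable<int> seq = Numbers();
        Log.Add("created");
        using (IEnumerator<int> e = seq.GetEnumerator())
        {
            Log.Add("got enumerator");
            while (e.MoveNext())
            {
                Log.Add("value " + e.Current);
            }
        }
        return string.Join(" | ", Log);
    }

    public static void Main()
    {
        // prints "created | got enumerator | start | value 1 | between | value 2 | end"
        Console.WriteLine(Run());
    }
}
```

"start" appears only after "got enumerator", so before the first MoveNext there is simply no element that Current could point at.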
Each step of foreach moves forward one element, and it stops when it reaches the end. Wait, are you sure it stops at the end? Let's do an experiment:
    public IEnumerable<People> Peoples1    // returns the collection directly
    {
        get { return peoples; }
    }

    public IEnumerable<People> Peoples2    // contains yield break
    {
        get
        {
            foreach (var people in peoples)
            {
                yield return people;
            }
            yield break;    // not actually necessary
        }
    }
The above are our two common approaches. Note the second implementation: ReSharper marks the yield break gray (redundant).
We write the following test code. The peopleList collection has only five elements, but we try to call MoveNext 8 times. You can change peopleList.Peoples1 to 2 or 3 and test each separately.
    var peopleList = new PeopleList();    // the constructor inserts five elements
    IEnumerator<People> e1 = peopleList.Peoples1.GetEnumerator();
    if (e1.Current == null)
    {
        Console.WriteLine("Current is empty after iterator generation");
    }
    int i = 0;
    while (i < 8)    // only five elements in total; see what happens to the iteration
    {
        e1.MoveNext();
        if (e1.Current == null)
        {
            Console.WriteLine("Empty after iteration {0}", i);
        }
        else
        {
            Console.WriteLine("{1} after iteration {0}", i, e1.Current.Name);
        }
        i++;
    }
Enumeration test results:

    // PeopleEnumerable1 (returns the collection directly)
    Current is empty after iterator generation
    P0 after iteration 0
    P1 after iteration 1
    P2 after iteration 2
    P3 after iteration 3
    P4 after iteration 4
    Empty after iteration 5
    Empty after iteration 6
    Empty after iteration 7

    // PeopleEnumerable2 (without yield break)
    Current is empty after iterator generation
    P0 after iteration 0
    P1 after iteration 1
    P2 after iteration 2
    P3 after iteration 3
    P4 after iteration 4
    P4 after iteration 5
    P4 after iteration 6
    P4 after iteration 7

    // PeopleEnumerable2 (with yield break)
    Current is empty after iterator generation
    P0 after iteration 0
    P1 after iteration 1
    P2 after iteration 2
    P3 after iteration 3
    P4 after iteration 4
    P4 after iteration 5
    P4 after iteration 6
    P4 after iteration 7
It's amazing: when the original collection is returned directly, Current is null after running past the end, but when the result comes from a yield-based MoveNext, whether or not there is a yield break, the last element is returned after running past the end! Maybe this is what we mentioned in section 1: the iterator only keeps its last state, and because it cannot move back, it repeats itself. So why doesn't the List collection do that? The question is left to everyone.
(Thanks to a reader for the answer: whether Current is null or the last element after running past the end is not explicitly specified; it depends on the specific .NET implementation. In the .NET Framework, Current is still the last element after running past the end.)
But you should not run past the end in the first place. It is reassuring that a standard foreach stops exactly when the enumeration is complete, but this does illustrate the difference between calling MoveNext by hand and using foreach, and foreach is clearly safer. Also note that you cannot place a yield return inside a try block that has a catch clause. Why? Because the yield pattern weaves together code and logic from different locations, so how would a try-catch be applied to every piece of code it touches? It's too complicated.
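The reason foreach never reads past the end is visible in what it roughly expands to: a using block around a while loop that checks the return value of MoveNext before touching Current. The sketch below compares the two forms; ForeachExpansionDemo and its method names are made up for this illustration.

```csharp
using System;
using System.Collections.Generic;

public static class ForeachExpansionDemo
{
    // The convenient form.
    public static int SumWithForeach(IEnumerable<int> source)
    {
        int sum = 0;
        foreach (int x in source)
        {
            sum += x;
        }
        return sum;
    }

    // Roughly what the compiler turns the loop above into.
    public static int SumByHand(IEnumerable<int> source)
    {
        int sum = 0;
        using (IEnumerator<int> e = source.GetEnumerator())
        {
            // Current is read only after MoveNext has returned true.
            while (e.MoveNext())
            {
                sum += e.Current;
            }
        }
        return sum;
    }

    public static void Main()
    {
        var data = new List<int> { 1, 2, 3 };
        Console.WriteLine(SumWithForeach(data)); // prints "6"
        Console.WriteLine(SumByHand(data));      // prints "6"
    }
}
```

The earlier experiment ignored the result of MoveNext, which is exactly the check that foreach performs for you.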
This property of enumeration is very helpful when handling large data. Because the enumerator keeps state, I can read a very large file one part at a time, sequentially, until the end of the file; since there is no need to materialize a collection, memory consumption is very low. The same is true for databases: reading a portion at a time handles many otherwise difficult situations.
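As a minimal sketch of this idea, the iterator below yields one line at a time from any TextReader, so only the current line needs to be in memory. LazyLinesDemo, ReadLines and CountNonEmpty are hypothetical names for this illustration, and a StringReader stands in for a large file so the example stays self-contained.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class LazyLinesDemo
{
    // Yields lines one by one; nothing is read until the caller asks for the next line.
    public static IEnumerable<string> ReadLines(TextReader reader)
    {
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            yield return line;
        }
    }

    // Consumes the lines sequentially without building a list of them.
    public static int CountNonEmpty(TextReader reader)
    {
        int count = 0;
        foreach (string line in ReadLines(reader))
        {
            if (line.Length > 0) count++;
        }
        return count;
    }

    public static void Main()
    {
        // A StringReader stands in for a very large file here.
        using (var reader = new StringReader("a\n\nb\nc"))
        {
            Console.WriteLine(CountNonEmpty(reader)); // prints "3"
        }
    }
}
```

For real files, the framework's File.ReadLines offers the same kind of deferred, line-by-line enumeration.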
3. Modify the enumerator parameters in the enumeration?
During enumeration, the collection cannot be modified. For example, if an element is inserted or deleted inside a foreach loop over a collection such as a List, a run-time exception is raised. An experienced programmer will tell you to use a for loop in that case. What is the essential difference between for and foreach?
What if, between calls to MoveNext, I suddenly change a parameter of the enumeration, so that it yields more or fewer elements? What happens?
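A short sketch of that difference, using a plain List<int> (the class ModifyDuringEnumerationDemo and its methods are invented for this illustration): removing inside foreach makes the list's enumerator throw InvalidOperationException on the next step, while an index-based for loop running backward removes safely.

```csharp
using System;
using System.Collections.Generic;

public static class ModifyDuringEnumerationDemo
{
    // Returns true if removing inside foreach raises InvalidOperationException.
    public static bool ForeachRemoveThrows()
    {
        var list = new List<int> { 1, 2, 3, 4 };
        try
        {
            foreach (int x in list)
            {
                if (x % 2 == 0) list.Remove(x); // invalidates the running enumerator
            }
            return false;
        }
        catch (InvalidOperationException)
        {
            return true;
        }
    }

    // A for loop uses indexes, not an enumerator; iterating backward
    // keeps the remaining indexes valid after each removal.
    public static List<int> RemoveEvensWithFor()
    {
        var list = new List<int> { 1, 2, 3, 4 };
        for (int i = list.Count - 1; i >= 0; i--)
        {
            if (list[i] % 2 == 0) list.RemoveAt(i);
        }
        return list;
    }

    public static void Main()
    {
        Console.WriteLine(ForeachRemoveThrows());                  // prints "True"
        Console.WriteLine(string.Join(",", RemoveEvensWithFor())); // prints "1,3"
    }
}
```

The for loop works because it asks the list for its Count and elements afresh on every step instead of holding an enumerator that must stay in sync with the list.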
    Console.WriteLine("Do not modify the OldAge parameter");
    foreach (var olderPeople in peopleList.OlderPeoples)
    {
        Console.WriteLine(olderPeople);
    }
    Console.WriteLine("Modify the OldAge parameter");
    i = 0;
    foreach (var olderPeople in peopleList.OlderPeoples)
    {
        Console.WriteLine(olderPeople);
        i++;
        // The new value is missing in the source; 33 is inferred from the
        // test results below (P3 is skipped, P4 is still returned).
        if (i == 1)
            peopleList.OldAge = 33;    // after enumerating only once, modify OldAge
    }
The test results are:
    Do not modify the OldAge parameter
    Id:2, Name:P2, Age:32
    Id:3, Name:P3, Age:33
    Id:4, Name:P4, Age:34
    Modify the OldAge parameter
    Id:2, Name:P2, Age:32
    Id:4, Name:P4, Age:34
As you can see, modifying the value that controls the enumeration while enumerating dynamically changes the behavior of the enumeration. The above is the case of changing a variable inside a yield structure. Trying again with an iterator and with a lambda expression (code omitted) gives:

    Modify the variable value in an iteration
    Id:2, Name:P2, Age:32
    Id:4, Name:P4, Age:34
    Modify the variable value in a lambda expression
    Id:2, Name:P2, Age:32
    Id:4, Name:P4, Age:34

So an externally modified variable can control the internal iteration process and dynamically change the "elements of the set". This is both good and bad. The behavior reflects the real, current state; but if the variable changes mid-iteration and the context changes with it while processing still assumes the earlier context, it will obviously lead to serious mistakes. The same goes for closures.
Therefore, if an enumeration needs to keep its original behavior when the context changes, you have to save a copy of the variable by hand.
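A minimal sketch of the "save a copy" advice, using a lambda passed to Where. CaptureDemo, threshold and fixedThreshold are names invented for this illustration. The query that captures the live variable changes behavior mid-enumeration; the query that captures a copy does not.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class CaptureDemo
{
    // Enumerates values greater than a threshold while the threshold
    // variable is changed during the enumeration.
    public static List<int> Run(bool snapshot)
    {
        var source = new List<int> { 1, 2, 3, 4 };
        int threshold = 1;
        int fixedThreshold = threshold; // manual copy taken before enumerating

        IEnumerable<int> query = snapshot
            ? source.Where(x => x > fixedThreshold) // sees only the copy
            : source.Where(x => x > threshold);     // sees later changes

        var result = new List<int>();
        foreach (int x in query)
        {
            result.Add(x);
            threshold = 3; // changed after the first element has been returned
        }
        return result;
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(",", Run(false))); // prints "2,4"
        Console.WriteLine(string.Join(",", Run(true)));  // prints "2,3,4"
    }
}
```

The first line shows the same pattern as the OldAge experiment: the element in the middle silently disappears once the captured variable changes.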
If you join two sets A and B with the Concat function, that is A-B, without materializing the result, and then modify the elements of set B during the phase that enumerates A, will there be an error? Why or why not?
For example, the following test code:
    List<People> peoples = new List<People>() { new People() { Name = "PA" } };
    Console.WriteLine("Concatenate a virtual enumeration A with collection B and modify the elements of B during the phase that enumerates A");
    var e8 = peopleList.PeopleEnumerable1.Concat(peoples);
    i = 0;
    foreach (var people in e8)
    {
        Console.WriteLine(people);
        i++;
        if (i == 1)
            peoples.Add(new People() { Name = "PB" });    // still in the PeopleEnumerable1 phase at this point
    }
If you want to know, run the experiment yourself (this example is in my attachment). I leave it for everyone to discuss.
4. More discussion of LINQ
You can insert any code in a yield block. This is what deferred (lazy) execution looks like: the code runs only when it needs to run. It is not hard to imagine how many of the LINQ functions are implemented. An interesting one is Concat, which connects two sets together, like this:
    public static IEnumerable<T> Concat<T>(this IEnumerable<T> source, IEnumerable<T> source2)
    {
        foreach (var r in source)
        {
            yield return r;
        }
        foreach (var r in source2)
        {
            yield return r;
        }
    }
Select and Where are easy to implement in the same way and are not discussed here.
How is Skip implemented? It skips a number of elements at the start of the collection. My guess:
    public static IEnumerable<T> Skip<T>(this IEnumerable<T> source, int count)
    {
        int t = 0;
        foreach (var r in source)
        {
            t++;
            if (t <= count)
                continue;
            yield return r;
        }
    }
So, has the skipped element ever been visited? Has its code been executed?
    Console.WriteLine("Will the elements skipped by Skip be accessed?");
    IEnumerable<People> e6 = peopleList.PeopleEnumerable1.Select(d =>
    {
        Console.WriteLine(d);
        return d;
    }).Skip(3);
    Console.WriteLine("Enumerate only, do nothing:");
    foreach (var r in e6) { }
    Console.WriteLine("Convert to an entity collection, enumerate again");
    IEnumerable<People> e7 = e6.ToList();
    foreach (var r in e7) { }
The test results are as follows:
    Enumerate only, do nothing:
    Id:0, Name:P0, Age:30
    Id:1, Name:P1, Age:31
    Id:2, Name:P2, Age:32
    Id:3, Name:P3, Age:33
    Id:4, Name:P4, Age:34
    Convert to an entity collection, enumerate again
    Id:0, Name:P0, Age:30
    Id:1, Name:P1, Age:31
    Id:2, Name:P2, Age:32
    Id:3, Name:P3, Age:33
    Id:4, Name:P4, Age:34
As you can see, Skip skips elements but still "accesses" them, and therefore still performs the extra operations, such as the lambda expression, whether the source is an enumerator or an entity collection. From this perspective, to optimize an expression, Skip and Take should appear as early as possible in a LINQ chain to reduce this extra work and its side effects.
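The cost of the ordering can be counted directly. The sketch below uses the guessed Skip from this section, renamed NaiveSkip (a hypothetical name) so that it does not clash with Enumerable.Skip, and counts how often the Select lambda runs in each order. The built-in Skip is deliberately not used, because the runtime's own implementation may be optimized for some sources and could give different counts.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SkipOrderDemo
{
    // The guessed Skip implementation from the text, under a different name.
    public static IEnumerable<T> NaiveSkip<T>(this IEnumerable<T> source, int count)
    {
        int seen = 0;
        foreach (T item in source)
        {
            seen++;
            if (seen <= count) continue;
            yield return item;
        }
    }

    // A lazy source of the numbers 0 to 4.
    private static IEnumerable<int> Numbers()
    {
        for (int i = 0; i < 5; i++)
        {
            yield return i;
        }
    }

    // Counts how many times the Select lambda runs for each ordering.
    public static int CountSelectorCalls(bool skipFirst)
    {
        int calls = 0;
        Func<int, int> selector = x => { calls++; return x * 10; };

        IEnumerable<int> query = skipFirst
            ? Numbers().NaiveSkip(3).Select(selector)
            : Numbers().Select(selector).NaiveSkip(3);

        foreach (int unused in query) { } // enumerate only, do nothing
        return calls;
    }

    public static void Main()
    {
        Console.WriteLine(CountSelectorCalls(false)); // Select then skip: prints "5"
        Console.WriteLine(CountSelectorCalls(true));  // skip then Select: prints "2"
    }
}
```

With the skip placed first, the lambda only runs for the two elements that are actually returned.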
For LINQ to SQL implementations, though, Skip is clearly given extra optimization. Can we also optimize our own implementation of Skip, so that the upper layers get the best possible Skip performance on massive data?
5. More questions about the IEnumerable enumeration
(1) How is the enumeration process paused? Is there a way to pause it? How do I cancel it?
(2) What is the implementation principle of PLINQ? Which characteristic of the IEnumerable interface does it change? Does it produce out-of-order enumeration? How does that kind of enumeration actually happen?
(3) IEnumerable implements a chain structure, which is the basis of LINQ, but what is the nature of this chain?
(4) Because IEnumerable represents state and deferral, it is not difficult to understand that many asynchronous operations are essentially IEnumerable. I was once asked in an interview about the essence of asynchrony. What would you say it is? Async is not multithreading! The exciting part of async is essentially the regrouping of code, because a long-running asynchronous operation is a state machine ... as in the CCR library. I will not expand on this here, because for now it exceeds my knowledge; next time.
(5) If you implemented the same enumerator and the same cool LINQ in C, could it be done without help from the compiler? Never mind lambdas; we would use function pointers.
(6) Writing MapReduce with IEnumerable? LINQ for MapReduce?
(7) How do you sort an IEnumerable? Materialize a collection and then sort it? If it is a very large virtual collection, how can that be optimized?