In .NET, the iterator pattern is encapsulated by IEnumerator, IEnumerable, and their generic counterparts. A class that implements IEnumerable can be iterated over. Calling its GetEnumerator method returns an implementation of IEnumerator, which is the iterator itself. An iterator is similar to a database cursor: it marks a position within a data sequence. An iterator can only move forward, and several iterators can operate on the same data sequence at the same time.
Support for iterators has been built into the language since C# 1 in the form of the foreach statement, which makes iterating over a collection more direct and simpler than a for loop. The compiler translates foreach into calls to the GetEnumerator and MoveNext methods and the Current property. If the iterator object implements IDisposable, it is disposed when the iteration finishes. Implementing an iterator in C# 1, however, is relatively cumbersome. C# 2 makes this work much simpler and saves a lot of effort.
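The translation of foreach described above can be sketched roughly as follows. This is a simplified illustration rather than the compiler's exact output; the class name ForeachExpansion and the method Iterate are hypothetical and exist only for this example.

```csharp
using System;
using System.Collections;

static class ForeachExpansion
{
    // Roughly equivalent to: foreach (object x in source) { visit(x); }
    public static void Iterate(IEnumerable source, Action<object> visit)
    {
        IEnumerator iterator = source.GetEnumerator();
        try
        {
            while (iterator.MoveNext())
            {
                object x = iterator.Current;
                visit(x);
            }
        }
        finally
        {
            // The non-generic IEnumerator does not extend IDisposable,
            // so the check for IDisposable happens at run time.
            IDisposable disposable = iterator as IDisposable;
            if (disposable != null)
            {
                disposable.Dispose();
            }
        }
    }
}
```

Calling ForeachExpansion.Iterate(values, Console.WriteLine) on an array should behave like a foreach loop that prints each element.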
Next, let's look at how to implement an iterator by hand and how C# 2 simplifies the job, and then go through several real-life examples of iterators.
1. C# 1: tedious manual implementation of the iterator
Suppose we need to implement a new collection type based on a ring buffer. We will implement the IEnumerable interface so that users can easily iterate over all the elements in the collection. We ignore other details and focus only on how to implement the iterator. The collection stores its values in an array and lets you choose the starting point of the iteration. For example, if the collection has five elements and the starting point is 2, the elements are returned in index order 2, 3, 4, 0 and finally 1.
To keep the demonstration simple, we provide a constructor that takes the values and the starting point. This allows us to traverse the collection in the following way:
    object[] values = { "a", "b", "c", "d", "e" };
    IterationSample collection = new IterationSample(values, 3);
    foreach (object x in collection)
    {
        Console.WriteLine(x);
    }
Since we set the starting point to 3, the output is d, e, a, b, c. Now let's look at how to implement the iterator for the IterationSample class:
    class IterationSample : IEnumerable
    {
        Object[] values;
        Int32 startingPoint;

        public IterationSample(Object[] values, Int32 startingPoint)
        {
            this.values = values;
            this.startingPoint = startingPoint;
        }

        public IEnumerator GetEnumerator()
        {
            throw new NotImplementedException();
        }
    }
We have not yet implemented GetEnumerator. How should its logic be written? First, the current state of the cursor has to be stored somewhere. The iterator pattern does not return all the data at once; the client requests one element at a time. This means we must keep track of how far through the collection the client has got. In C# 2 the compiler does most of this state-keeping work for us, but here we do it by hand.
Now let's look at what state needs to be saved and where it should live. Imagine we try to store the state in IterationSample itself, so that the class implements both IEnumerable and IEnumerator. At first glance this seems reasonable: after all, the data is in the right place, including the starting point, and GetEnumerator could simply return this. However, this approach has a serious problem. If GetEnumerator is called several times, it must return several independent iterators. For example, two nested foreach statements could be used to obtain all possible pairs of values, and the two iterations must not interfere with each other. This means that each call to GetEnumerator must return a separate iterator object. We could still implement this directly inside IterationSample, but the class would then have several responsibilities, which violates the single responsibility principle.
Therefore, we create another class to implement the iterator itself. We make it a nested class of IterationSample so that it can access the private members of the collection. The code is as follows:
    class IterationSampleEnumerator : IEnumerator
    {
        IterationSample parent; // the collection being iterated #1
        Int32 position;         // current cursor position #2

        internal IterationSampleEnumerator(IterationSample parent)
        {
            this.parent = parent;
            position = -1; // array indexes start at 0, so the cursor starts at -1, just before the first element #3
        }

        public bool MoveNext()
        {
            if (position != parent.values.Length) // advance the cursor only if it is not already past the last element #4
            {
                position++;
            }
            return position < parent.values.Length;
        }

        public object Current
        {
            get
            {
                if (position == -1 || position == parent.values.Length) // access before the first or after the last element is invalid #5
                {
                    throw new InvalidOperationException();
                }
                Int32 index = position + parent.startingPoint; // take the custom starting point into account #6
                index = index % parent.values.Length;
                return parent.values[index];
            }
        }

        public void Reset()
        {
            position = -1; // move the cursor back to -1 #7
        }
    }
Even for a simple iterator, this much code has to be written by hand. We need to record the collection being iterated (#1) and the current cursor position (#2). When an element is returned, its index in the array is computed from the current cursor position and the starting point of the collection (#6). On initialization, the position is set to just before the first element (#3), so MoveNext must be called once before the Current property is first read. When advancing the cursor, the condition at #4 ensures that MoveNext behaves correctly even if there are no elements to return on the very first call. Reading Current before the first element or after the last one is invalid (#5). When the iterator is reset, the cursor is moved back to just before the first element (#7).
Apart from combining the current cursor position with the custom starting point to return the correct value, the code above is straightforward. Now we only need to return an instance of the iterator class from the GetEnumerator method of IterationSample:
    public IEnumerator GetEnumerator()
    {
        return new IterationSampleEnumerator(this);
    }
Note that this is a relatively simple example: there is not much state to track, and we do not check whether the collection has changed during iteration. Even so, a simple iterator took this much code in C# 1. Using foreach on the framework's own IEnumerable collections is easy, but as soon as we write our own collection and want it to support iteration, we have to write all of this ourselves.
In C# 1, a simple iterator takes about 40 lines of code. Now let's look at how C# 2 improves this process.
2. C# 2: Use yield statements to simplify iteration
2.1 Introducing iterator blocks and the yield return statement
C# 2 makes iterators much easier: there is less code, and the code is more elegant. The following is the complete C# 2 implementation of the GetEnumerator method:
    public IEnumerator GetEnumerator()
    {
        for (int index = 0; index < this.values.Length; index++)
        {
            yield return values[(index + startingPoint) % values.Length];
        }
    }
These few lines fully implement everything the IterationSampleEnumerator class did. The method looks perfectly ordinary except for yield return. That statement tells the compiler that this is not a normal method but an iterator block. An iterator block can be used to implement a method or property whose return type is IEnumerable, IEnumerator, or one of their generic equivalents. If the non-generic interface is used, the yield type of the iterator block is object; otherwise it is the type argument of the generic interface. For example, if the method returns IEnumerable<string>, the yield type is string.
Apart from yield return, ordinary return statements are not allowed in an iterator block, and every yield return statement in the block must return a value compatible with the yield type. For example, if the method is declared to return IEnumerable<string>, yield return 1 is not allowed.
It should be emphasized that, although the code in an iterator block looks as if it executes sequentially, we are really asking the compiler to build a state machine for us. This is exactly the part we wrote by hand in C# 1: the caller wants only one value per call, so we need to remember where we were in the collection when the previous value was returned. When the compiler encounters an iterator block, it creates a nested class that implements the state machine. This class remembers exactly where the iterator is within the block, together with the local variables and parameters. It is somewhat similar to the class we wrote earlier, in that all the state that needs recording is stored in instance variables. To implement an iterator, this state machine has to do the following:
- It needs some initial state.
- When MoveNext is called, it runs code from the GetEnumerator method until the next value to return is ready.
- When the Current property is read, it returns the most recently yielded value.
- It needs to know when the iteration has finished, so that MoveNext can return false.
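To make the four requirements above concrete, here is a hand-written approximation of a state machine for the ring-buffer iterator block. It is only a sketch: the class name RingIteratorSketch, the field names and the state numbering are invented for illustration, and real compiler output is more involved.

```csharp
using System;
using System.Collections;

// Hand-written approximation of a state machine for the ring-buffer iterator block.
sealed class RingIteratorSketch : IEnumerator
{
    private readonly object[] values;
    private readonly int startingPoint;
    private int state;      // 0 = not started, 1 = paused inside the loop, -1 = finished
    private int index;      // the loop variable, hoisted into a field
    private object current; // the most recently yielded value

    public RingIteratorSketch(object[] values, int startingPoint)
    {
        this.values = values;
        this.startingPoint = startingPoint;
    }

    // Reading Current runs no iterator code; it only returns the stored value.
    public object Current { get { return current; } }

    public bool MoveNext()
    {
        switch (state)
        {
            case 0:          // first call: run the loop initializer
                index = 0;
                state = 1;
                break;
            case 1:          // resume just after "yield return"
                index++;
                break;
            default:         // already finished
                return false;
        }
        if (index < values.Length)
        {
            current = values[(index + startingPoint) % values.Length];
            return true;     // pause here until the next MoveNext call
        }
        state = -1;          // the loop has ended, so the iteration is over
        return false;
    }

    public void Reset()
    {
        throw new NotSupportedException();
    }
}
```

Each bullet maps onto a member: the constructor sets up the initial state, MoveNext advances to the next value, Current returns the stored value, and the state field records when the iteration has finished.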
Let's take a look at the execution sequence of the iterator.
2.2 Execution process of the iterator
The following code shows how an iterator executes. The iterator yields 0, 1, 2 and finally -1, and then finishes.
    using System;
    using System.Collections.Generic;

    class Program
    {
        static readonly string padding = new string(' ', 30);

        static IEnumerable<Int32> CreateEnumerable()
        {
            Console.WriteLine("{0} CreateEnumerable() method start", padding);
            for (int i = 0; i < 3; i++)
            {
                Console.WriteLine("{0} start yield {1}", padding, i);
                yield return i;
                Console.WriteLine("{0} yield ended", padding);
            }
            Console.WriteLine("{0} yielding last value", padding);
            yield return -1;
            Console.WriteLine("{0} CreateEnumerable() method ends", padding);
        }

        static void Main(string[] args)
        {
            IEnumerable<Int32> iterable = CreateEnumerable();
            IEnumerator<Int32> iterator = iterable.GetEnumerator();
            Console.WriteLine("Start iteration");
            while (true)
            {
                Console.WriteLine("Call the MoveNext method...");
                Boolean result = iterator.MoveNext();
                Console.WriteLine("{0} returned by the MoveNext method", result);
                if (!result)
                {
                    break;
                }
                Console.WriteLine("Get current value...");
                Console.WriteLine("The obtained current value is {0}", iterator.Current);
            }
            Console.ReadKey();
        }
    }
To show the details of the iteration, the code above uses a while loop; normally you would use foreach. Unlike the previous example, this iterator method returns IEnumerable rather than IEnumerator. Usually you return IEnumerator only when implementing the IEnumerable interface; when a method simply needs to return a sequence of values, use IEnumerable.
Running the program shows the following points:
- None of the code in CreateEnumerable runs until MoveNext is called for the first time.
- All the work is done when MoveNext is called; reading the Current property executes no code.
- The code stops executing at yield return and resumes just after it on the next call to MoveNext.
- There can be multiple yield return statements in one method.
- The code does not finish at the last yield return. The method ends only on the following call to MoveNext, which returns false.
This means that an iterator block is the wrong place for any code that must run immediately when the method is called, such as parameter validation. Validation placed inside an iterator block does not run until iteration starts. This is a common source of errors, and the resulting bugs are hard to find.
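The pitfall with parameter validation can be seen in a small sketch. The class DeferredValidation and its Repeat method are hypothetical names chosen for this illustration; the point is only that the null check sits inside the iterator block.

```csharp
using System;
using System.Collections.Generic;

static class DeferredValidation
{
    // The null check is part of the iterator block, so it runs only
    // when the caller asks for the first element, not when Repeat is called.
    public static IEnumerable<string> Repeat(string text, int count)
    {
        if (text == null)
        {
            throw new ArgumentNullException("text");
        }
        for (int i = 0; i < count; i++)
        {
            yield return text;
        }
    }

    // Returns true if the exception appears only once iteration starts.
    public static bool ExceptionIsDeferred()
    {
        IEnumerable<string> sequence = Repeat(null, 3); // no exception here
        try
        {
            foreach (string s in sequence) // the first MoveNext call runs the check
            {
            }
        }
        catch (ArgumentNullException)
        {
            return true;
        }
        return false;
    }
}
```

The call to Repeat(null, 3) returns normally, and the ArgumentNullException surfaces later, possibly far from the code that passed the bad argument.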
The following describes how to stop an iteration and the special way in which finally blocks execute.
2.3 Special execution process of the iterator
In an ordinary method, the return statement does two things: it returns a result to the caller, and it terminates the method, executing any appropriate finally blocks on the way out. In the example above, the yield return statement only leaves the method temporarily, and execution continues when MoveNext is called again. We have not written any finally blocks so far. How do we really exit the method, and how are finally blocks executed when we do? Let's start with a simple construct: the yield break statement.
End an iteration with yield break
We usually try to give a method a single exit point. Multiple exit points tend to make code hard to read, especially when try/catch/finally blocks are used to clean up resources and handle exceptions. The same concern applies to iterator blocks. However, if you want to stop iterating early, yield break does the job. It terminates the iteration immediately, so that the current call to MoveNext returns false.
The following code iterates from 1 to 100 but stops early if the time limit expires.
    // Requires: using System; using System.Collections.Generic; using System.Threading;
    static IEnumerable<Int32> CountWithTimeLimit(DateTime limit)
    {
        try
        {
            for (int i = 1; i <= 100; i++)
            {
                if (DateTime.Now >= limit)
                {
                    yield break;
                }
                yield return i;
            }
        }
        finally
        {
            Console.WriteLine("Stop iteration!");
            Console.ReadKey();
        }
    }

    static void Main(string[] args)
    {
        DateTime stop = DateTime.Now.AddSeconds(2);
        foreach (Int32 i in CountWithTimeLimit(stop))
        {
            Console.WriteLine("Return {0}", i);
            Thread.Sleep(300);
        }
    }
The output shows that the iteration ends normally once the time limit is reached: yield break behaves just like return in an ordinary method. Now let's look at when and how the finally block is executed.
Execution of finally blocks
Normally, a finally block runs whenever execution leaves the relevant scope. The finally blocks in iterator blocks behave a little differently from those in ordinary methods. As we have seen, yield return pauses the method rather than exiting it, so the finally block is not executed at that point.
When a yield break statement is reached, however, the finally block is executed, just as it would be for return in an ordinary method. In an iterator block, finally is typically used to release resources, often by way of a using statement.
In the example above, the finally block runs whether the iteration reaches 100, stops because the time limit expires, or ends because an exception is thrown. However, there are situations in which the finally block is not executed.
The statements in an iterator block run only when MoveNext is called. So what happens if MoveNext is never called, or is called a few times and then abandoned? Consider the following code:
    DateTime stop = DateTime.Now.AddSeconds(2);
    foreach (Int32 i in CountWithTimeLimit(stop))
    {
        if (i > 3)
        {
            Console.WriteLine("Returning");
            return;
        }
        Thread.Sleep(300);
    }
After the return statement inside the foreach loop, the finally block in CountWithTimeLimit is still executed. This is because foreach calls the Dispose method of the iterator returned by GetEnumerator. When Dispose is called on an iterator created from an iterator block before the iteration has finished, the state machine executes every finally block that encloses the point at which the code is currently paused. This is a little complicated, but the result is easy to state: as long as you iterate with foreach, the finally blocks in an iterator block run as expected. The following code, which iterates manually and never calls Dispose, confirms this:
    DateTime stop = DateTime.Now.AddSeconds(2);
    IEnumerable<Int32> iterable = CountWithTimeLimit(stop);
    IEnumerator<Int32> iterator = iterable.GetEnumerator();
    iterator.MoveNext();
    Console.WriteLine("{0} returned", iterator.Current);
    iterator.MoveNext();
    Console.WriteLine("{0} returned", iterator.Current);
    Console.ReadKey();
In the output of this code, "Stop iteration!" is never printed. If we call the iterator's Dispose method manually, the message does appear. It is rare to abandon an iterator before the iteration has finished, and rare to iterate manually instead of using foreach, but if you do, remember to wrap the iterator in a using statement so that Dispose is called and the finally block is executed.
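A minimal sketch of that advice follows. The Numbers method is a hypothetical stand-in for CountWithTimeLimit, and the Log list exists only so that the effect of the finally block can be observed.

```csharp
using System;
using System.Collections.Generic;

static class ManualIteration
{
    public static readonly List<string> Log = new List<string>();

    // Hypothetical iterator with a finally block, standing in for CountWithTimeLimit.
    public static IEnumerable<int> Numbers()
    {
        try
        {
            for (int i = 1; i <= 100; i++)
            {
                yield return i;
            }
        }
        finally
        {
            Log.Add("finally ran");
        }
    }

    // Reads only the first two values and then abandons the iteration.
    public static int SumOfFirstTwo()
    {
        using (IEnumerator<int> iterator = Numbers().GetEnumerator())
        {
            int sum = 0;
            for (int i = 0; i < 2 && iterator.MoveNext(); i++)
            {
                sum += iterator.Current;
            }
            return sum;
        } // Dispose is called here, which runs the pending finally block
    }
}
```

Without the using statement the iterator would stay paused at its second yield return and the finally block would never run.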
Next, let's take a look at some special behaviors of Microsoft's implementation of the iterator:
2.4 Special behaviors in iterator execution
If you compile an iterator block with the C# 2 compiler and then inspect the generated IL with ildasm or Reflector, you will find that the compiler has generated a nested type behind the scenes. For the example above, ildasm shows the two static methods from our code together with a class named <CountWithTimeLimit>d__0 that the compiler generated for us (the angle brackets are simply part of the class name and have nothing to do with generics). The IL shows which interfaces the class implements and which methods and fields it has. Its structure is similar to the iterator we implemented by hand.
The real logic is executed in the MoveNext method, which contains a large switch statement. Fortunately, developers do not need to understand these details, but a few aspects of how the iterator behaves are worth noting:
- Before MoveNext is called for the first time, the Current property returns the default value of the iterator's yield type. For example, if the method returns IEnumerable<Int32>, the initial value is 0. Always call MoveNext before reading Current.
- After MoveNext returns false, the Current property keeps returning the last value yielded.
- The Reset method throws an exception instead of resetting the iterator. The iterator we implemented by hand at the beginning of this article, by contrast, implements Reset correctly.
- The nested class generated by the compiler implements both the generic and non-generic versions of IEnumerator (and, where appropriate, the generic and non-generic versions of IEnumerable).
There is a good reason why Reset is not implemented properly: the compiler cannot know what logic would be needed to reset the iterator. Many people think Reset should never have been part of the interface, and many collections do not support it, so callers should not rely on it.
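The Reset behavior is easy to check. The sketch below uses hypothetical names (ResetBehaviour, OneTwoThree) and assumes Microsoft's C# compiler, whose generated iterators throw NotSupportedException from Reset.

```csharp
using System;
using System.Collections.Generic;

static class ResetBehaviour
{
    // A trivial iterator block that returns IEnumerator<int> directly.
    static IEnumerator<int> OneTwoThree()
    {
        yield return 1;
        yield return 2;
        yield return 3;
    }

    // Returns true if Reset on the compiler-generated iterator throws NotSupportedException.
    public static bool ResetThrows()
    {
        using (IEnumerator<int> iterator = OneTwoThree())
        {
            iterator.MoveNext();
            try
            {
                iterator.Reset();
                return false;
            }
            catch (NotSupportedException)
            {
                return true;
            }
        }
    }
}
```

If a sequence has to be traversed again, the usual approach is to call the iterator method (or GetEnumerator) again rather than relying on Reset.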
Implementing the extra interfaces does no harm. An iterator method that returns IEnumerable produces a class that implements five interfaces (including IDisposable), but as a developer you do not need to worry about this. It is unusual for one class to implement both IEnumerable and IEnumerator. The compiler takes care that the iterator always behaves correctly, while still creating only a single instance of the nested type in the common case where the collection is iterated on the same thread that created it.
The behavior of the Current property is slightly odd: it holds on to the last value yielded by the iterator, which can prevent that value from being garbage collected.
So automatically implemented iterators have a few minor quirks, but sensible developers will not run into any problems. They save a great deal of code, which makes iterators far more widely used than they were in C# 1. The following sections show how iterators simplify code in real development.
3. Examples of iterators in real development
3.1 Iterate over the dates in a time range
A for loop is commonly used when working with a range of dates. The code looks like this:
    for (DateTime day = timetable.StartDate; day <= timetable.EndDate; day = day.AddDays(1))
    {
        ……
    }
A for loop does the job but does not express the intent very well. This loop really means "every day in a time interval", which is exactly the kind of scenario foreach is meant for. Written as an iteration, the code reads much better:
    foreach (DateTime day in timetable.DateRange)
    {
        ……
    }
Implementing this in C# 1 would take some effort. In C# 2 it is easy: the Timetable class only needs one extra property:
    public IEnumerable<DateTime> DateRange
    {
        get
        {
            for (DateTime day = StartDate; day <= EndDate; day = day.AddDays(1))
            {
                yield return day;
            }
        }
    }
All this does is move the loop inside the Timetable class, but the encapsulation is better as a result. The DateRange property simply walks through each day in the time range, returning one day at a time. If the logic later becomes more complex, only one place needs to change. This small change greatly improves the readability of the code. As a next step, we could consider generalizing the range into a generic Range<T>.
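For completeness, here is a self-contained sketch of a Timetable class with the DateRange property. The article does not show the rest of the class, so the constructor and the way StartDate and EndDate are stored are assumptions made purely for illustration.

```csharp
using System;
using System.Collections.Generic;

// Minimal stand-in for the Timetable class; only the members needed here are shown.
class Timetable
{
    public DateTime StartDate { get; private set; }
    public DateTime EndDate { get; private set; }

    public Timetable(DateTime startDate, DateTime endDate)
    {
        StartDate = startDate;
        EndDate = endDate;
    }

    // Yields every day from StartDate to EndDate, both inclusive.
    public IEnumerable<DateTime> DateRange
    {
        get
        {
            for (DateTime day = StartDate; day <= EndDate; day = day.AddDays(1))
            {
                yield return day;
            }
        }
    }
}
```

A caller can then write foreach (DateTime day in timetable.DateRange) without knowing how the days are produced.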
3.2 Read each line of a file iteratively
When reading files, we often write such code:
    using (TextReader reader = File.OpenText(fileName))
    {
        String line;
        while ((line = reader.ReadLine()) != null)
        {
            ……
        }
    }
This process involves four steps:
- Obtain a TextReader
- Manage the lifetime of the TextReader
- Use TextReader.ReadLine to iterate over all the lines
- Process each line
This process can be improved in two ways. One is to use a delegate: write a helper method that takes the reader and a delegate as parameters, calls the delegate for each line, and closes the reader at the end. This is often used as an example of closures and delegates. The other improvement is more elegant and closer in spirit to LINQ. Instead of passing the logic in as a method parameter, we can use an iterator to return one line at a time, so that a foreach statement can be used. The code is as follows:
    static IEnumerable<String> ReadLines(String fileName)
    {
        using (TextReader reader = File.OpenText(fileName))
        {
            String line;
            while ((line = reader.ReadLine()) != null)
            {
                yield return line;
            }
        }
    }
In this way, you can use the following foreach method to read files:
    foreach (String line in ReadLines("test.txt"))
    {
        Console.WriteLine(line);
    }
The body of the method is the same as before, except that each line read is returned with yield return. The difference is in what happens when the iteration finishes. In the earlier version, we open the file, read one line at a time, and close the reader when reading ends. With the iterator, the using statement looks just the same, but it is less obvious when "reading ends".
This is why it matters that foreach calls the Dispose method of the iterator when the iteration finishes: it guarantees that the reader is released. A using statement in an iterator method behaves like a try/finally block; the finally block runs either when the end of the file is reached or when Dispose is called explicitly on the IEnumerator<string>. It is possible to obtain the IEnumerator<string> directly through ReadLines(fileName).GetEnumerator(). If you then iterate manually without calling Dispose, the resource leaks. Iteration is normally done with foreach, so this problem rarely occurs, but it is worth being aware of.
This method encapsulates the first three of the four steps, which may be too restrictive. Encapsulating the lifetime management and the line iteration is what we want, but if we need to read from a network stream, or use a particular encoding such as UTF-8, the first step has to be left to the caller. The obvious signature would be something like this:
static IEnumerable<String> ReadLines(TextReader reader)
This turns out to be a bad idea. We want to take ownership of the reader so that we can clean it up on behalf of the caller. The problem is that if something goes wrong before MoveNext() is called for the first time, we never get a chance to clean up, because none of our code has run. The IEnumerable<string> itself cannot be disposed, yet it would be holding state that needs cleaning up. Another problem arises if GetEnumerator is called twice: that should produce two independent iterators, but both would be using the same reader. One mitigation is to change the return type to IEnumerator<string>, but then the result cannot be used with foreach, and the resources still cannot be cleaned up if MoveNext is never called.
Fortunately, there is a way around these problems. Just as our code does not need to run immediately, we do not need the reader immediately either. What we need is a way of saying "when you need a TextReader, here is how to get one". .NET 3.5 has a delegate with exactly the right signature:
public delegate TResult Func<TResult>();
The delegate takes no parameters and returns a value of the type given by its type parameter. To obtain a TextReader we can use Func<TextReader>, here passed in as a parameter named provider. The body of the method becomes:
    using (TextReader reader = provider())
    {
        String line;
        while ((line = reader.ReadLine()) != null)
        {
            yield return line;
        }
    }
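Putting the pieces together, the whole helper might look like the sketch below. The class name LineReader and the two convenience overloads are assumptions added for illustration; only the body of the provider-based method comes from the fragment above.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

static class LineReader
{
    // Convenience overload (an assumption for this sketch): a UTF-8 file on disk.
    public static IEnumerable<string> ReadLines(string fileName)
    {
        return ReadLines(fileName, Encoding.UTF8);
    }

    // Convenience overload (an assumption for this sketch): a file with a given encoding.
    public static IEnumerable<string> ReadLines(string fileName, Encoding encoding)
    {
        return ReadLines(delegate { return new StreamReader(fileName, encoding); });
    }

    // Core method: the reader is created only when iteration starts,
    // and each call to GetEnumerator obtains its own reader from the provider.
    public static IEnumerable<string> ReadLines(Func<TextReader> provider)
    {
        using (TextReader reader = provider())
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                yield return line;
            }
        }
    }
}
```

Because the reader is requested inside the iterator block, nothing needs cleaning up if the sequence is never iterated, and iterating twice yields two independent readers. Any TextReader works, for example LineReader.ReadLines(() => new StringReader(text)) for in-memory text.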
3.3 Use iterator blocks and predicates to filter a collection lazily
LINQ lets you query multiple data sources, such as in-memory collections or databases, in a simple and powerful way. C# 2 has none of the language integration that LINQ relies on (query expressions, lambda expressions, and extension methods), but we can still achieve something similar.
A core feature of LINQ is filtering data with the Where method. Given a collection and a predicate delegate, the results are filtered lazily as they are iterated: each element that matches the predicate is returned as it is found. This is a bit like the List<T>.FindAll method, but LINQ supports lazy evaluation for any object that implements IEnumerable<T>. Although LINQ itself requires C# 3 or later, we can use what we already know to implement a version of LINQ's Where method. The code is as follows:
    public static IEnumerable<T> Where<T>(IEnumerable<T> source, Predicate<T> predicate)
    {
        if (source == null || predicate == null)
            throw new ArgumentNullException();
        return WhereImpl(source, predicate);
    }

    private static IEnumerable<T> WhereImpl<T>(IEnumerable<T> source, Predicate<T> predicate)
    {
        foreach (T item in source)
        {
            if (predicate(item))
                yield return item;
        }
    }

    IEnumerable<String> lines = ReadLines("FakeLinq.cs");
    Predicate<String> predicate = delegate(String line)
    {
        return line.StartsWith("using");
    };
In the code above, the implementation is split into two parts: argument validation and the actual logic. It looks a little odd, but it is necessary for proper error handling. If the two were combined into a single iterator method, a call such as Where<string>(null, null) would appear to succeed; at least, the expected exception would not be thrown at that point. This is because iterator blocks are evaluated lazily: none of the code in the method body runs until MoveNext is first called as the user starts iterating, as shown in section 2.2. We want arguments to be checked eagerly, because an exception that is delayed until some later point makes bugs hard to track down. The standard practice is the one shown above: split the method in two, with an ordinary method that validates the arguments and an iterator block that processes the data lazily.
The body of the iterator block is straightforward. Each element of the collection is tested with the predicate delegate; if it matches, it is yielded, and otherwise the loop moves on to the next element until it finds one that does match. Implementing this logic in C# 1 would be hard, especially in a generic form.
The last part of the code reads lines with the ReadLines method from the previous section and defines a predicate that matches lines starting with "using"; passing both to our Where method filters the file. The biggest difference between this and implementing the same logic with File.ReadAllLines and Array.FindAll<string> is that our approach is completely lazy and streaming: only one line is requested and held in memory at a time. For a small file this makes no difference, but for a large file, such as a log file several gigabytes in size, the advantage becomes obvious.
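The streaming behavior can be observed without a large file. In the sketch below, the Numbers method is a hypothetical stand-in for a very large input, and the ItemsProduced counter exists only to show how much of the source was actually consumed; Where and WhereImpl follow the same eager-validation and lazy-filtering split as above.

```csharp
using System;
using System.Collections.Generic;

static class LazyFilterDemo
{
    public static int ItemsProduced; // how many source items were actually generated

    // Hypothetical source standing in for a very large file.
    public static IEnumerable<int> Numbers(int count)
    {
        for (int i = 1; i <= count; i++)
        {
            ItemsProduced++;
            yield return i;
        }
    }

    // Ordinary method: validates its arguments immediately.
    public static IEnumerable<T> Where<T>(IEnumerable<T> source, Predicate<T> predicate)
    {
        if (source == null) throw new ArgumentNullException("source");
        if (predicate == null) throw new ArgumentNullException("predicate");
        return WhereImpl(source, predicate);
    }

    // Iterator block: filters lazily, one element at a time.
    private static IEnumerable<T> WhereImpl<T>(IEnumerable<T> source, Predicate<T> predicate)
    {
        foreach (T item in source)
        {
            if (predicate(item))
            {
                yield return item;
            }
        }
    }

    // Returns the first even number and stops consuming the source straight away.
    public static int FirstEven(int count)
    {
        foreach (int n in Where(Numbers(count), delegate(int x) { return x % 2 == 0; }))
        {
            return n;
        }
        throw new InvalidOperationException("no even number found");
    }
}
```

Asking for the first even number out of a million should generate only two source items, because the filter pulls elements from the source one at a time and stops as soon as the caller stops iterating.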
4. Summary
C# supports many design patterns only indirectly, in the sense that they are feasible to implement in the language; relatively few patterns are supported directly by language features aimed at a specific pattern. The foreach statement shows that C# 1 supported the iterator pattern directly for consumers, but it offered no real help with implementing the collections being iterated. Implementing IEnumerable correctly for a collection was time-consuming, error-prone, and tedious. In C# 2, the compiler does most of the work for us by implementing a state machine for the iteration.
This article also showed a LINQ-like feature: filtering collections. IEnumerable<T> is one of the most important interfaces in LINQ. If you ever implement your own LINQ operators on top of LINQ to Objects, you will come to appreciate how powerful this interface is and how useful the iterator blocks provided by the C# language are.
Finally, this article showed examples of using iterator blocks in real projects to make code more readable and more logical. We hope these examples help you understand iterators.