Y is for… Yield
Before we talk specifically about the yield keyword, let’s review a few constructs you probably use every day, namely collection classes like lists and arrays. We’re quite used to traversing these with a simple foreach loop, and what enables us to do so is that these types implement the System.Collections.IEnumerable interface.
IEnumerable is a rather simple interface that requires implementing a single method, GetEnumerator, which returns an object that implements another interface, IEnumerator. IEnumerator, in turn, declares two methods and a property:
MoveNext | called to advance to the next element in the sequence or collection; it returns false once the end of the collection has been passed
Current | obtains the System.Object at the current position in the collection
Reset | moves the enumerator back to its initial position, before the first element
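To make the roles of these members concrete, here is a rough sketch of what a foreach loop over a non-generic IEnumerable boils down to. The names EnumeratorWalkthrough and SumAll are made up for illustration, and the compiler's real expansion differs in its details, so treat this as an approximation rather than the exact generated code.

```csharp
using System;
using System.Collections;

public static class EnumeratorWalkthrough
{
    // Hypothetical helper: adds up the integers in any IEnumerable by
    // driving the enumerator by hand instead of using foreach.
    public static int SumAll(IEnumerable items)
    {
        int total = 0;
        IEnumerator e = items.GetEnumerator();
        try
        {
            // MoveNext must be called once before the first read of Current.
            while (e.MoveNext())
            {
                // Current is typed as object, so the boxed int is unboxed here.
                total += (int)e.Current;
            }
        }
        finally
        {
            // Like foreach, dispose the enumerator if it happens to be IDisposable.
            (e as IDisposable)?.Dispose();
        }
        return total;
    }
}
```

Calling EnumeratorWalkthrough.SumAll(new[] { 1, 3, 6 }) returns 10, the same total a foreach loop over that array would accumulate.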
Now, let’s say we wanted to build a class that returns the first n triangular numbers. One way to do so would be the following:
 1: public class TriNumbers : IEnumerable
 2: {
 3:     private Int32[] _nums;
 4:     public TriNumbers(Int32 n)
 5:     {
 6:         if (n <= 0)
 7:             throw new System.ArgumentOutOfRangeException();
 8:
 9:         _nums = new Int32[n];
10:         _nums[0] = 1;
11:         for (Int32 i = 1; i < n; i++)
12:             _nums[i] = _nums[i-1] + i + 1;
13:     }
14:
15:     public IEnumerator GetEnumerator()
16:     {
17:         return _nums.GetEnumerator();
18:     }
19: }
We can then iterate over it with code like this:
class Program
{
    static void Main(string[] args)
    {
        TriNumbers triNums = new TriNumbers(10);
        foreach (var val in triNums)
            Console.WriteLine(val);
        Console.ReadLine();
    }
}
In the implementation of TriNumbers, I sort of ‘cheated’ (line 17) by just deferring the iteration work to the enumerator I get for free from the underlying integer array, which is really what’s storing my sequence.
Not that cheating is wrong in this case, but it does require my array to be populated from the get-go, in the constructor. Initializing an array of ten elements is no big deal, but what if the argument to the constructor were much larger, and there were many instances of this class in memory? And what if I only end up iterating over the first couple of values? Clearly I’m using memory that isn’t really needed, and I’m spending time calculating every element of the sequence when I’m not even sure I need them all. If the calculation were CPU-intensive or required a series of database or service calls to build the list, a lot of cycles could be wasted populating data that the application might never request.
To make the class a little less wasteful up front, I can implement the IEnumerator interface myself and calculate each value only when it is requested, something like this:
public class TriNumbers : IEnumerable
{
    private readonly Int32 _limit;

    public TriNumbers(Int32 n)
    {
        if (n <= 0)
            throw new System.ArgumentOutOfRangeException();
        _limit = n;
    }

    // Return a fresh enumerator on every call so that separate (or nested)
    // foreach loops over the same TriNumbers do not share iteration state.
    public IEnumerator GetEnumerator()
    {
        return new TriNumbersEnumerator(_limit);
    }

    private class TriNumbersEnumerator : IEnumerator
    {
        private readonly Int32 _limit;
        private Int32 _index = 0;   // 0 means "positioned before the first element"
        private Int32 _value = 0;   // most recently calculated triangular number

        public TriNumbersEnumerator(Int32 limit)
        {
            _limit = limit;
        }

        public object Current
        {
            get
            {
                if (_index == 0)
                    throw new System.InvalidOperationException(
                        "Enumeration has not started. Call MoveNext.");
                else
                    return _value;
            }
        }

        public bool MoveNext()
        {
            // Once past the end, stay there and keep returning false.
            if (_index >= _limit)
                return false;

            // The next triangular number is the previous one plus the new index.
            _index++;
            _value += _index;
            return true;
        }

        public void Reset()
        {
            _index = 0;
            _value = 0;
        }
    }
}
That works, but there’s an easier way! Enter yield: the version below is not much longer than the original implementation, and it is certainly less wasteful of space and CPU cycles should the entire sequence not be enumerated.
 1: public class TriNumbers : IEnumerable
 2: {
 3:     private Int32 _limit = 0;
 4:     public TriNumbers(Int32 n)
 5:     {
 6:         if (n <= 0)
 7:             throw new System.ArgumentOutOfRangeException();
 8:         _limit = n;
 9:     }
10:
11:     public IEnumerator GetEnumerator()
12:     {
13:         Int32 val = 0;
14:         for (Int32 i = 0; i < _limit; i++)
15:         {
16:             val += i + 1;
17:             yield return val;
18:         }
19:     }
20: }
The magic is on line 17, with the yield return statement. Each time this statement is reached, the current value of val is handed back to the caller as the enumerator’s Current value (namely, what the foreach in the calling program will see). The current location and state of the GetEnumerator method are saved, so the next time the foreach asks for an element, execution resumes right after the yield return and we get the next value from the for loop in line 14.
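To see that values really are produced only on demand, here is a small sketch that adds a counter to a yield-based class. CountingTriNumbers, its Computed property, and LazyDemo are hypothetical names introduced just for this demonstration; they are not part of the class above.

```csharp
using System;
using System.Collections;

// Hypothetical variant of the yield-based class, with a counter added
// purely to observe how many elements are actually calculated.
public class CountingTriNumbers : IEnumerable
{
    private readonly int _limit;

    // Number of sequence elements calculated so far.
    public int Computed { get; private set; }

    public CountingTriNumbers(int n)
    {
        if (n <= 0)
            throw new ArgumentOutOfRangeException(nameof(n));
        _limit = n;
    }

    public IEnumerator GetEnumerator()
    {
        int val = 0;
        for (int i = 0; i < _limit; i++)
        {
            val += i + 1;
            Computed++;
            yield return val;   // execution pauses here until the next MoveNext
        }
    }
}

public static class LazyDemo
{
    // Reads only the first `take` values, then reports how many were calculated.
    public static int ComputedAfterTaking(int limit, int take)
    {
        var nums = new CountingTriNumbers(limit);
        int seen = 0;
        foreach (int v in nums)
        {
            if (++seen == take)
                break;   // stop early; the remaining values are never calculated
        }
        return nums.Computed;
    }
}
```

With these definitions, LazyDemo.ComputedAfterTaking(1000000, 3) returns 3: even though the limit is a million, only the three values that the loop actually consumed were ever calculated.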
While reading IL is not something I do often, it’s interesting to look at what gets generated when using the yield return construct. Using IL DASM, you can see a newobj instruction that instantiates a class called <GetEnumerator>d__0, essentially indicating that there is a new class being constructed under the covers.
IL DASM shows that this class implements the IEnumerator interface (the Current property and the Reset and MoveNext methods), so essentially yield return provides some "syntactic sugar" (more or less) for code similar to the TriNumbersEnumerator class that I wrote above.
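If you don’t have IL DASM handy, a bit of reflection gives a similar glimpse. The sketch below uses hypothetical names (YieldTriNumbers, GeneratedTypeProbe) and simply asks the object returned by GetEnumerator what type it is. The exact type name is a compiler implementation detail and may vary between compiler versions, so no particular name is assumed here.

```csharp
using System;
using System.Collections;

// Hypothetical stand-in for the yield-based class, repeated so the sketch is self-contained.
public class YieldTriNumbers : IEnumerable
{
    private readonly int _limit;

    public YieldTriNumbers(int n)
    {
        _limit = n;
    }

    public IEnumerator GetEnumerator()
    {
        int val = 0;
        for (int i = 0; i < _limit; i++)
        {
            val += i + 1;
            yield return val;
        }
    }
}

public static class GeneratedTypeProbe
{
    // Runtime type of the object that the iterator method hands back.
    public static Type EnumeratorType()
    {
        return new YieldTriNumbers(3).GetEnumerator().GetType();
    }

    // Human-readable summary: the hidden class name and whether it implements IEnumerator.
    public static string Describe()
    {
        Type t = EnumeratorType();
        return t.Name + " implements IEnumerator: "
            + typeof(IEnumerator).IsAssignableFrom(t);
    }
}
```

Printing GeneratedTypeProbe.Describe() shows the name of the hidden class the compiler generated, along with confirmation that it implements IEnumerator, even though no such class appears anywhere in the source.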
The real story is a tad deeper than that though, and Wes Dyer does a great job of looking more closely at the generated class in his blog post. And if you’d like to do all this in Visual Basic, which at the time of writing unfortunately doesn’t have iterators (yet?), take a look at Matthew Doig’s blog.