Optimized Use of Enumerables with Yield Return
Manipulating lists is routine in day-to-day development. Sometimes, though, a filter runs over a large amount of data, and the intermediate collections it builds can put pressure on the garbage collector (GC).
When the list has few elements the problem is hard to detect, but as the list grows the debt comes due. One way to reduce this pressure is to use yield return.
Yield Return
In an iterator method, yield return hands the next value of the sequence to the caller and suspends the method until the following value is requested. It works on top of the IEnumerable<T> and IEnumerator<T> interfaces, which are the .NET contracts for the iterator pattern; the compiler generates the class that implements them.
The iterator pattern is present in most general-purpose and object-oriented languages and is used to traverse a collection of objects. Separating the traversal this way decouples algorithms from the details of how the collection is stored and walked.
In the pattern's vocabulary, an object that implements Aggregate holds a collection of objects, and a ConcreteIterator object traverses that Aggregate.
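To make the contract concrete, here is a minimal sketch of what a foreach loop roughly does with a sequence: it asks the aggregate (IEnumerable<int>) for an iterator (IEnumerator<int>) and then drives it with MoveNext and Current. The names CountTo and SumByHand are hypothetical and exist only for this illustration.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical iterator method: the compiler turns it into a class that
// implements both IEnumerable<int> and IEnumerator<int>.
static IEnumerable<int> CountTo(int max)
{
    for (var i = 1; i <= max; i++)
        yield return i;
}

// Roughly what a foreach loop does behind the scenes.
static int SumByHand(IEnumerable<int> sequence)
{
    var sum = 0;
    using (IEnumerator<int> iterator = sequence.GetEnumerator())
    {
        while (iterator.MoveNext())   // advance to the next element, if any
            sum += iterator.Current;  // read the element just reached
    }
    return sum;
}

Console.WriteLine(SumByHand(CountTo(4))); // prints 10 (1 + 2 + 3 + 4)
```

The caller only ever sees MoveNext and Current, so it does not matter whether the values come from a list, a file or a computation.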
The Problem with Filters
Filtering is a typical list operation. In the example below, the even numbers from a range are added to a new list, which is then returned.
private IEnumerable<int> FilteringWithout(int iterations)
{
    var range = Enumerable.Range(1, iterations);
    // Temporary list that grows with the number of matches.
    var evenList = new List<int>();
    foreach (var number in range)
    {
        if (number % 2 == 0)
            evenList.Add(number);
    }
    return evenList;
}
The code is correct and works. If the list is small, it probably won’t put pressure on the GC. When the number of elements increases, however, things get interesting.
Using Yield Return
The example below shows yield return in practice. As its definition says, it returns the next element of the sequence. Because each element is handed to the caller as soon as it is found, the method no longer needs the allocations of the previous solution, where every result is added to a temporary list that is then returned.
This change in behavior greatly reduces the pressure on the GC.
private IEnumerable<int> FilterWith(int iterations)
{
    var range = Enumerable.Range(1, iterations);
    foreach (var number in range)
    {
        // Hand each match to the caller immediately; no temporary list.
        if (number % 2 == 0)
            yield return number;
    }
}
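A small sketch can show that an iterator method only does work when the caller asks for the next element. FilterWithCounter is a hypothetical variant of FilterWith with a counter added for this illustration; the counter is not part of the original example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var inspected = 0; // how many numbers the filter has looked at so far

// Same shape as FilterWith, plus the counter (an addition for this sketch).
IEnumerable<int> FilterWithCounter(int iterations)
{
    foreach (var number in Enumerable.Range(1, iterations))
    {
        inspected++;
        if (number % 2 == 0)
            yield return number;
    }
}

var evens = FilterWithCounter(1_000_000);  // nothing has run yet
var inspectedBeforeEnumeration = inspected;
var firstThree = evens.Take(3).ToList();   // pulls three matches, then stops

Console.WriteLine(inspectedBeforeEnumeration);     // 0
Console.WriteLine(string.Join(", ", firstThree));  // 2, 4, 6
Console.WriteLine(inspected);                      // far fewer than 1_000_000
```

Calling the method only creates the iterator object. The loop body runs while the sequence is being enumerated, and it stops as soon as the caller stops asking for elements.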
Benchmark with Yield Return
The examples above were run with BenchmarkDotNet, which makes it possible to observe how GC pressure changes as the list grows. At 10_000 iterations, the promotion of objects to Gen 1 is observed.
When the collection grows to 100_000 iterations, things get more interesting: the promotion of objects to Gen 2 is observed.
In a real scenario this increases the chance of a full GC collection. It is also worth noting that with yield return the memory allocation does not change as the list grows.
This is because the iterator returns one element at a time instead of accumulating them in a temporary list. Without yield return, the memory allocation at 100_000_000 iterations is 536MB.
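For a quick check without a benchmark harness, the sketch below compares the bytes allocated by the two approaches using GC.GetAllocatedBytesForCurrentThread. It is a rough approximation and not a replacement for the BenchmarkDotNet runs described above; the helper AllocatedBytes is hypothetical, and the two filters are copies of the earlier examples written as local functions.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Copy of FilteringWithout: collects the even numbers in a temporary list.
static IEnumerable<int> FilteringWithout(int iterations)
{
    var evenList = new List<int>();
    foreach (var number in Enumerable.Range(1, iterations))
    {
        if (number % 2 == 0)
            evenList.Add(number);
    }
    return evenList;
}

// Copy of FilterWith: streams the even numbers one at a time.
static IEnumerable<int> FilterWith(int iterations)
{
    foreach (var number in Enumerable.Range(1, iterations))
    {
        if (number % 2 == 0)
            yield return number;
    }
}

// Hypothetical helper: bytes allocated on this thread while consuming the sequence.
static long AllocatedBytes(Func<IEnumerable<int>> filter)
{
    var before = GC.GetAllocatedBytesForCurrentThread();
    long sum = 0;
    foreach (var number in filter())
        sum += number; // consume every element so the lazy version actually runs
    return GC.GetAllocatedBytesForCurrentThread() - before;
}

foreach (var iterations in new[] { 10_000, 100_000, 1_000_000 })
{
    var withList = AllocatedBytes(() => FilteringWithout(iterations));
    var withYield = AllocatedBytes(() => FilterWith(iterations));
    Console.WriteLine($"{iterations,9}: list = {withList} B, yield return = {withYield} B");
}
```

The exact numbers depend on the runtime, so none are shown here. The expected pattern is that the list version grows with the input while the yield return version stays roughly constant, since it only allocates the iterator objects.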
Real Scenarios
GC pressure from collection handling is very common in processes that work with data files, such as csv or txt. The most common implementations treat each line of the file as an object in a list.
The file is read, and each line creates an object that is added to a list. Later manipulations can then generate further intermediate lists. Using yield return when importing files lets the lines be read in blocks and processed as they arrive.
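As an illustration of the file scenario, here is a minimal sketch that yields one parsed row at a time instead of loading the whole file into a list. ReadRows, the sample data and the comma-only split are assumptions made for this example; real CSV files with quoted fields need a proper parser.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical reader: yields one parsed row at a time.
static IEnumerable<string[]> ReadRows(string path)
{
    using var reader = new StreamReader(path);
    for (var line = reader.ReadLine(); line != null; line = reader.ReadLine())
    {
        if (line.Length == 0)
            continue;                  // skip blank lines
        yield return line.Split(',');  // naive split: no quoted-field support
    }
}

// Demo input written to a temporary file so the sketch is runnable.
var path = Path.GetTempFileName();
File.WriteAllLines(path, new[] { "id,amount", "1,10", "2,32" });

var total = 0;
var isHeader = true;
foreach (var row in ReadRows(path))
{
    if (isHeader) { isHeader = false; continue; } // skip the header row
    total += int.Parse(row[1]);
}

Console.WriteLine(total); // prints 42 (10 + 32)
File.Delete(path);
```

Only the current row is alive at any point, so memory use does not depend on the file size. For plain line-by-line reading, File.ReadLines already streams lines lazily in the same way.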
Conclusion
Manipulating collections is part of a developer’s daily routine and deserves close attention. It is essential to consider the volume of objects that will be handled, because it can become a source of performance problems.
Whenever possible, return sequences with yield return to relieve pressure on the GC. Small changes in coding style can prevent such problems, save computational resources and deliver a more performant application.