Stack on the stack

ref struct have been introduced with C# 7.2 to give us some optimizations opportunities. So let's try to implement a ref struct Stack and see how it goes.

The simplest implementation I can think of is the following.

ref struct Stack<T>
{
    private readonly Span<T> storage;
    private int count;

    public Stack(Span<T> storage)
    {
        Debug.Assert(!storage.IsEmpty, "The storage cannot be empty");
        this.storage = storage;
    }

    public readonly int Count => count;

    public void Push(T value)
    {
        storage[count++] = value;
    }

    public readonly ref T Peek()
    {
        return ref storage[count - 1];
    }

    public ref T Pop()
    {
        return ref storage[--count];
    }
}
Pros
  • does not allocate
  • avoir if statements by letling the framework perform bound checks
Cons
  • does not grow
  • IndexOutOfRangeException thrown for overflow or underflow
  • does not clean the memory cell once the item has been popped, if T is a reference type

One way to fix the exception type issue is to catch and rethrow, as in the CompensatingStack implementation:

    public void Push(T value)
    {
        try
        {
            storage[count++] = value;
        }
        catch (IndexOutOfRangeException)
        {
            count = storage.Length;
            throw new InvalidOperationException("Capacity overflow");
        }
    }

or by using an if statement and throw, as in the DefensiveStack implementation:

    public void Push(T value)
    {
        if (count == storage.Length)
            throw new InvalidOperationException("Capacity overflow");

        storage[count++] = value;
    }

Of course, if we have an if statement, we could grows the storage instead, as in the GrowingStack implementation:

    public void Push(T value)
    {
        if ((uint)count < (uint)storage.Length)
        {
            storage[count++] = value;
        }
        else
        {
            // give a chance to inline the common Push method by moving the uncommon case in another method
            PushWithResize(value);
        }
    }

None of these implementation fixes the clean issue, which is, by the way, to let the garbage collector know that the popped item is no longer required.
To fix this efficiently, we should change the signature of Pop and return void, like c++' STL does.

    public void Pop()
    {
        storage[--count] = default!;
    }

So now, let's benchmark all these implementations by counting the number of items in a tree using a stack. There are different scenario to isolate the impact of memory allocation.


BenchmarkDotNet v0.13.7, Windows 11 (10.0.22000.2295/21H2/SunValley)
Intel Core i7-8700K CPU 3.70GHz (Coffee Lake), 1 CPU, 12 logical and 6 physical cores
.NET SDK 7.0.306
  [Host]     : .NET 7.0.9 (7.0.923.32018), X64 RyuJIT AVX2
  DefaultJob : .NET 7.0.9 (7.0.923.32018), X64 RyuJIT AVX2
Method Mean Error StdDev Ratio RatioSD
StandardStack 573.1 ns 3.05 ns 2.70 ns 1.00 0.00
StandardStackWithEnoughCapacity 540.6 ns 10.64 ns 13.84 ns 0.94 0.02
StandardStackWithSharedStack 564.8 ns 1.44 ns 1.28 ns 0.99 0.00
UsingCompensatingStack 770.7 ns 5.87 ns 5.49 ns 1.34 0.01
UsingDefensiveStack 416.0 ns 8.26 ns 10.15 ns 0.72 0.02
UsingUnguardedStack 386.7 ns 2.14 ns 2.00 ns 0.67 0.00
UsingUnguardedStackWithSharedBuffer 399.3 ns 1.08 ns 0.90 ns 0.70 0.00
UsingGrowingStack 460.0 ns 9.07 ns 8.48 ns 0.80 0.01

So, what do we learn?

  • Compensation has a negative impact on the performance.
  • The other optimization strategies are more or less the same with 20% improvement, more or less, the unguarded version being the fastest.

Because we were using reference types, we were not able to use stackalloc. Let check if it has an impact on the performance. So, let basically push and pop a fixed number of items N times, in order to have measures in µs instead of ns.

Method N Mean Error StdDev Ratio RatioSD
StandardStack 10000 1.992 ms 0.0359 ms 0.0336 ms 1.00 0.00
StandardStackWithEnoughCapacity 10000 2.043 ms 0.0400 ms 0.0374 ms 1.03 0.03
StandardStackWithSharedStack 10000 1.940 ms 0.0058 ms 0.0051 ms 0.98 0.02
UsingCompensatingStack 10000 2.668 ms 0.0102 ms 0.0095 ms 1.34 0.02
UsingDefensiveStack 10000 2.093 ms 0.0418 ms 0.0391 ms 1.05 0.03
UsingUnguardedStack 10000 1.564 ms 0.0052 ms 0.0046 ms 0.79 0.01
UsingGrowingStack 10000 1.885 ms 0.0097 ms 0.0086 ms 0.95 0.02

We end up having equivalent performance for almost all scenario, but we can definitively say that:

  • The unguarded scenario is the fastest.
  • The compensating scenario is the slowest.

After looking at the implementation of the standard Stack, it appears than we may be able to improve the performance for the growing stack implementation, but I would probably not invest time in this. The stack on the stack has such a specific use case, namely to optimize some tree or graph algorithm, that we may use the unguarded version but keep it an implementation detail.

References & Useful Links

Leave a comment

Please note that we won't show your email to others, or use it for sending unwanted emails. We will only use it to render your Gravatar image and to validate you as a real person.