Clean code & ReverseString
In some of his latest video, Nick Chapsas is reviewing various C# advises. The one we will be talking is about Keep it simple stupid, for which the following code is considered to be bad:
public string ReverseString(string input)
{
char[] charArray = input.ToCharArray();
Array.Reverse(charArray);
return new string(charArray);
}
As opposed to the following, considered to be good:
public string ReverseString(string input)
{
return new string(input.Reverse().ToArray());
}
It then started a discussion about which version is simpler.
Personally, I can read both as easily, and anyone who has been reading C# for a little while should.
So my point is that I do not care, as what makes the code simple is the fact that it is the implementation of a function for which the intent is clear.
The interesting part is when Nick Chapsas started to benchmark both versions and shows that the first one is faster and allocates less memory. In fact, before Linq optimizations in dotnet 6.0, the difference would have been even greater.
The second version is slower because it relies on the IEnumerable abstract. There are more tests and more allocation because of this.
Nick Chapsas also showed an even faster implementation using spans.
So, whenever I can, I am explicit in my implementation. For instance, when I declare a private list field I generally declare it as List<T>
, instead of IList<T>
, or worse, IEnumerable<T>
, because I then have access to functions specific to this data structure.
So here, I would have written the first version.
Also, someone in the comments pointed out that the code is incorrect because of the surrogates pair, and suggested to try the algorithm on a string with smileys.
So, this code is clean because it is a function, and therefore:
- we can easily understand what the code does by simply reading the function name
- we can easily test it
- we can easily benchmark it
- we can easily change the code for the faster version
- we can easily fix the implementation when the code is proven faulty.
The implementation of the reverse that handle grapheme clusters (text elements), such as diacritic characters and emoji is:
public static string Reverse(this string input)
{
return string.Create(input.Length, input, CopyRunesBackwards);
static void CopyRunesBackwards(Span<char> span, string state)
{
var offset = span.Length;
var enumerator = StringInfo.GetTextElementEnumerator(state);
while (enumerator.MoveNext())
{
var t = enumerator.GetTextElement().AsSpan();
offset -= t.Length;
t.CopyTo(span[offset..]);
}
Debug.Assert(0 == offset, "String not completly initialized");
}
}
Unfortunately, this implementation is slower and allocates more:
- the enumerator is a class
GetTextElement
returns a substring
. But you known how it is:
Make it work, make it right, make it fast.
So, getting inspiration from StringRuneEnumerator and TextElementEnumerator
, I implemented a GraphemeClusterEnumerator
. Coding it took more time than anticipated because it requires System.Text.Unicode.TextSegmentationUtility
, which is an internal class, so I had to grab some code from the dotnet/runtime repo in github.
The shape of the implementation is the same as above but using structs reduce the number of allocations.
Benchmark
The different implementations benchmark results are:
BenchmarkDotNet v0.13.8, Windows 11 (10.0.22000.2416/21H2/SunValley)
Intel Core i7-8700K CPU 3.70GHz (Coffee Lake), 1 CPU, 12 logical and 6 physical cores
.NET SDK 7.0.401
[Host] : .NET 7.0.11 (7.0.1123.42427), X64 RyuJIT AVX2
DefaultJob : .NET 7.0.11 (7.0.1123.42427), X64 RyuJIT AVX2
Method | Input | Mean | Error | StdDev | Median | Ratio | RatioSD | Allocated | Alloc Ratio |
---|---|---|---|---|---|---|---|---|---|
Reverse_Array | **** | 4.602 ns | 0.0906 ns | 0.0756 ns | 4.596 ns | 1.00 | 0.00 | - | NA |
Reverse_Linq | 41.157 ns | 0.5132 ns | 0.4549 ns | 41.166 ns | 8.94 | 0.16 | 80 B | NA | |
Reverse_Span | 3.306 ns | 0.1261 ns | 0.1808 ns | 3.209 ns | 0.74 | 0.04 | - | NA | |
Reverse_Text | 3.209 ns | 0.0914 ns | 0.0855 ns | 3.186 ns | 0.70 | 0.02 | - | NA | |
Reverse_Optimized | 3.307 ns | 0.0622 ns | 0.0552 ns | 3.286 ns | 0.72 | 0.02 | - | NA | |
Reverse_Array | 😀😟 | 20.203 ns | 0.4121 ns | 0.3441 ns | 20.070 ns | 1.00 | 0.00 | 64 B | 1.00 |
Reverse_Linq | 😀😟 | 85.967 ns | 0.7060 ns | 0.6258 ns | 86.001 ns | 4.26 | 0.08 | 144 B | 2.25 |
Reverse_Span | 😀😟 | 14.995 ns | 0.3527 ns | 0.3464 ns | 14.932 ns | 0.74 | 0.02 | 32 B | 0.50 |
Reverse_Text | 😀😟 | 103.386 ns | 2.0087 ns | 1.9728 ns | 103.014 ns | 5.11 | 0.12 | 144 B | 2.25 |
Reverse_Optimized | 😀😟 | 63.314 ns | 0.6747 ns | 0.6311 ns | 63.150 ns | 3.14 | 0.06 | 32 B | 0.50 |
Reverse_Array | 0123456789 | 24.129 ns | 0.4867 ns | 0.5794 ns | 23.844 ns | 1.00 | 0.00 | 96 B | 1.00 |
Reverse_Linq | 0123456789 | 163.535 ns | 2.7770 ns | 2.5976 ns | 163.151 ns | 6.75 | 0.23 | 288 B | 3.00 |
Reverse_Span | 0123456789 | 17.467 ns | 0.2907 ns | 0.2720 ns | 17.372 ns | 0.72 | 0.02 | 48 B | 0.50 |
Reverse_Text | 0123456789 | 401.356 ns | 4.9837 ns | 4.1616 ns | 399.765 ns | 16.52 | 0.47 | 336 B | 3.50 |
Reverse_Optimized | 0123456789 | 261.415 ns | 2.5037 ns | 2.2194 ns | 260.985 ns | 10.78 | 0.32 | 48 B | 0.50 |
Reverse_Array | abcde(...)vwxyz [26] | 26.849 ns | 0.5541 ns | 0.4912 ns | 26.583 ns | 1.00 | 0.00 | 160 B | 1.00 |
Reverse_Linq | abcde(...)vwxyz [26] | 278.826 ns | 4.8125 ns | 4.0186 ns | 278.790 ns | 10.38 | 0.27 | 464 B | 2.90 |
Reverse_Span | abcde(...)vwxyz [26] | 19.804 ns | 0.1992 ns | 0.1663 ns | 19.815 ns | 0.74 | 0.02 | 80 B | 0.50 |
Reverse_Text | abcde(...)vwxyz [26] | 992.318 ns | 10.9564 ns | 9.7126 ns | 991.909 ns | 36.97 | 0.65 | 752 B | 4.70 |
Reverse_Optimized | abcde(...)vwxyz [26] | 660.109 ns | 8.5522 ns | 6.6770 ns | 659.187 ns | 24.55 | 0.57 | 80 B | 0.50 |
Reverse_Array | grinn(...)ce 😟 [34] | 27.702 ns | 0.5980 ns | 0.6887 ns | 27.426 ns | 1.00 | 0.00 | 192 B | 1.00 |
Reverse_Linq | grinn(...)ce 😟 [34] | 340.099 ns | 2.1584 ns | 1.8024 ns | 340.646 ns | 12.21 | 0.39 | 584 B | 3.04 |
Reverse_Span | grinn(...)ce 😟 [34] | 19.777 ns | 0.3133 ns | 0.2931 ns | 19.653 ns | 0.71 | 0.02 | 96 B | 0.50 |
Reverse_Text | grinn(...)ce 😟 [34] | 1,257.443 ns | 20.1528 ns | 23.2079 ns | 1,255.964 ns | 45.41 | 1.24 | 928 B | 4.83 |
Reverse_Optimized | grinn(...)ce 😟 [34] | 882.679 ns | 11.5303 ns | 10.7855 ns | 878.964 ns | 31.77 | 0.92 | 96 B | 0.50 |
References & Useful Links
- "One Line of Code Means Clean Code!" - Code Cop #002
- Implementation of Enumerable.Reverse
- Optimized implementation of Enumerable.Reverse
- "Clean" Code, Horrible Performance
- How to reverse a string that contains surrogate pairs
- Character encoding in .NET — Grapheme clusters
- Rune Struct
A Rune instance represents a Unicode scalar value, which means any code point excluding the surrogate range (U+D800..U+DFFF).