Some thoughts on string concatenation in C#
I recently stumbled upon an article on CodeProject.com that made some claims about string concatenation in C# that contradicted what I believed to be true, and it got me thinking.
From the article:
string sentence = "The " + "dog " + "ate " + "the " + "cat " + "all " + "day " + "for " + "for " + "fun.";
“That innocent looking line of code actually takes up much more processing power and memory than it appears to. If strings were combined in the ideal way, you would expect that the sentence would be the only string created from this operation. However, since each string is combined to its neighbor in succession, it turns out that 7 other strings are also created (shown in gray in the diagram above). The total amount of unnecessary memory allocations created from this operation is equal to the following equation, where N is the number of strings you are combining…”
But I knew that couldn’t be true, so I fired up Visual Studio and wrote a few simple tests, compiled and then analyzed the generated CIL.
First test:
Original C#:
public string createStringOne()
{
    return "The " + "dog " + "ate " + "the " + "cat " + "all " + "day " + "for " + "for " + "fun.";
}
Generated CIL:
.method public hidebysig instance string createStringOne() cil managed
{
    .maxstack 8
    L_0000: ldstr "The dog ate the cat all day for for fun."
    L_0005: ret
}
Second test:
Original C#:
public string createStringTwo()
{
    return "The dog ate the cat all day for for fun.";
}
Generated CIL:
.method public hidebysig instance string createStringTwo() cil managed
{
    .maxstack 8
    L_0000: ldstr "The dog ate the cat all day for for fun."
    L_0005: ret
}
Third test:
Original C#:
public string createStringThree()
{
    var sb = new StringBuilder();
    sb.Append("The ");
    sb.Append("dog ");
    sb.Append("ate ");
    sb.Append("the ");
    sb.Append("cat ");
    sb.Append("all ");
    sb.Append("day ");
    sb.Append("for ");
    sb.Append("for ");
    sb.Append("fun.");
    return sb.ToString();
}
Generated CIL:
.method public hidebysig instance string createStringThree() cil managed
{
    .maxstack 2
    .locals init (
        [0] class [mscorlib]System.Text.StringBuilder sb)
    L_0000: newobj instance void [mscorlib]System.Text.StringBuilder::.ctor()
    L_0005: stloc.0
    L_0006: ldloc.0
    L_0007: ldstr "The "
    L_000c: callvirt instance class [mscorlib]System.Text.StringBuilder [mscorlib]System.Text.StringBuilder::Append(string)
    L_0011: pop
    L_0012: ldloc.0
    L_0013: ldstr "dog "
    L_0018: callvirt instance class [mscorlib]System.Text.StringBuilder [mscorlib]System.Text.StringBuilder::Append(string)
    L_001d: pop
    L_001e: ldloc.0
    L_001f: ldstr "ate "
    L_0024: callvirt instance class [mscorlib]System.Text.StringBuilder [mscorlib]System.Text.StringBuilder::Append(string)
    L_0029: pop
    L_002a: ldloc.0
    L_002b: ldstr "the "
    L_0030: callvirt instance class [mscorlib]System.Text.StringBuilder [mscorlib]System.Text.StringBuilder::Append(string)
    L_0035: pop
    L_0036: ldloc.0
    L_0037: ldstr "cat "
    L_003c: callvirt instance class [mscorlib]System.Text.StringBuilder [mscorlib]System.Text.StringBuilder::Append(string)
    L_0041: pop
    L_0042: ldloc.0
    L_0043: ldstr "all "
    L_0048: callvirt instance class [mscorlib]System.Text.StringBuilder [mscorlib]System.Text.StringBuilder::Append(string)
    L_004d: pop
    L_004e: ldloc.0
    L_004f: ldstr "day "
    L_0054: callvirt instance class [mscorlib]System.Text.StringBuilder [mscorlib]System.Text.StringBuilder::Append(string)
    L_0059: pop
    L_005a: ldloc.0
    L_005b: ldstr "for "
    L_0060: callvirt instance class [mscorlib]System.Text.StringBuilder [mscorlib]System.Text.StringBuilder::Append(string)
    L_0065: pop
    L_0066: ldloc.0
    L_0067: ldstr "for "
    L_006c: callvirt instance class [mscorlib]System.Text.StringBuilder [mscorlib]System.Text.StringBuilder::Append(string)
    L_0071: pop
    L_0072: ldloc.0
    L_0073: ldstr "fun."
    L_0078: callvirt instance class [mscorlib]System.Text.StringBuilder [mscorlib]System.Text.StringBuilder::Append(string)
    L_007d: pop
    L_007e: ldloc.0
    L_007f: callvirt instance string [mscorlib]System.Object::ToString()
    L_0084: ret
}
Fourth test:
Original C#:
public string createStringFour()
{
    return new StringBuilder("The dog ate the cat all day for for fun.").ToString();
}
Generated CIL:
.method public hidebysig instance string createStringFour() cil managed
{
    .maxstack 8
    L_0000: ldstr "The dog ate the cat all day for for fun."
    L_0005: newobj instance void [mscorlib]System.Text.StringBuilder::.ctor(string)
    L_000a: callvirt instance string [mscorlib]System.Object::ToString()
    L_000f: ret
}
Fifth test:
Original C#:
public string createStringFive()
{
    string s = "The ";
    s += "dog ";
    s += "ate ";
    s += "the ";
    s += "cat ";
    s += "all ";
    s += "day ";
    s += "for ";
    s += "for ";
    s += "fun.";
    return s;
}
Generated CIL:
.method public hidebysig instance string createStringFive() cil managed
{
    .maxstack 2
    .locals init (
        [0] string s)
    L_0000: ldstr "The "
    L_0005: stloc.0
    L_0006: ldloc.0
    L_0007: ldstr "dog "
    L_000c: call string [mscorlib]System.String::Concat(string, string)
    L_0011: stloc.0
    L_0012: ldloc.0
    L_0013: ldstr "ate "
    L_0018: call string [mscorlib]System.String::Concat(string, string)
    L_001d: stloc.0
    L_001e: ldloc.0
    L_001f: ldstr "the "
    L_0024: call string [mscorlib]System.String::Concat(string, string)
    L_0029: stloc.0
    L_002a: ldloc.0
    L_002b: ldstr "cat "
    L_0030: call string [mscorlib]System.String::Concat(string, string)
    L_0035: stloc.0
    L_0036: ldloc.0
    L_0037: ldstr "all "
    L_003c: call string [mscorlib]System.String::Concat(string, string)
    L_0041: stloc.0
    L_0042: ldloc.0
    L_0043: ldstr "day "
    L_0048: call string [mscorlib]System.String::Concat(string, string)
    L_004d: stloc.0
    L_004e: ldloc.0
    L_004f: ldstr "for "
    L_0054: call string [mscorlib]System.String::Concat(string, string)
    L_0059: stloc.0
    L_005a: ldloc.0
    L_005b: ldstr "for "
    L_0060: call string [mscorlib]System.String::Concat(string, string)
    L_0065: stloc.0
    L_0066: ldloc.0
    L_0067: ldstr "fun."
    L_006c: call string [mscorlib]System.String::Concat(string, string)
    L_0071: stloc.0
    L_0072: ldloc.0
    L_0073: ret
}
Results:
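So as you can see, the first two methods compile to exactly the same CIL! Building one large string from several smaller literals costs nothing at run time, provided it all happens in a single expression whose operands are compile-time constants: the compiler folds them into one literal. If the concatenation is spread over multiple statements, as in the fifth test, each step becomes its own String.Concat call, and it does in fact incur a performance and memory hit.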
The use of a StringBuilder in this case (where there are only a small handful of small strings) serves no purpose as far as performance is concerned.
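To make the difference between "one expression" and "several statements" concrete when the pieces are not literals, here is a minimal sketch. The class and method names (ConcatSketch, OneExpression, StepByStep, WithBuilder) are made up for this example, and the comments describe what the C# compiler typically emits, not a guarantee for every compiler version.

```csharp
using System;
using System.Text;

// Hypothetical helper class, for illustration only.
public static class ConcatSketch
{
    // One expression with non-constant operands: the compiler typically
    // emits a single String.Concat call for the whole chain, so no
    // intermediate strings are created.
    public static string OneExpression(string a, string b, string c, string d)
    {
        return a + b + c + d;
    }

    // One statement per append: each += becomes its own String.Concat call
    // and allocates a new string (this mirrors createStringFive).
    public static string StepByStep(string a, string b, string c, string d)
    {
        string s = a;
        s += b;
        s += c;
        s += d;
        return s;
    }

    // When the number of pieces is only known at run time (a loop, say),
    // a StringBuilder appends into a growable buffer and produces the
    // final string once, in ToString().
    public static string WithBuilder(params string[] parts)
    {
        var sb = new StringBuilder();
        foreach (string part in parts)
        {
            sb.Append(part);
        }
        return sb.ToString();
    }
}
```

All three return the same text. What differs is how many temporary strings are allocated along the way, which only starts to matter once the number or size of the pieces grows.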
But… how are they equal!?
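Compiler optimizations! The C# compiler is an incredible piece of software that can look at your code and “fix” it. Here, every operand in the first test is a string literal, so the whole expression is a constant expression that the compiler evaluates at compile time and replaces with a single literal. It is important when working on code optimizations like the one in the article mentioned above that developers check what the compiler actually generates, because the code they think is poorly written may not be the code that ends up running.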
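One way to see the folding without reading CIL is to compare object references. This is only a sketch: FoldingSketch and its members are hypothetical names, and the reference comparison relies on string literals from the same assembly being interned, which is the usual behaviour of the runtime rather than something this post has verified.

```csharp
using System;

// Hypothetical helper class, for illustration only.
public static class FoldingSketch
{
    private const string Prefix = "The ";

    // All operands are literals, so the compiler folds the expression
    // into the single literal "The dog ate." at compile time.
    public static string FoldedLiterals()
    {
        return "The " + "dog " + "ate.";
    }

    // A const field is a compile-time constant too, so this expression
    // is folded in the same way.
    public static string FoldedConst()
    {
        return Prefix + "dog " + "ate.";
    }

    // 'head' is only known at run time, so String.Concat has to run on
    // every call and returns a newly allocated string.
    public static string NotFolded(string head)
    {
        return head + "dog " + "ate.";
    }
}
```

With this in place, object.ReferenceEquals(FoldingSketch.FoldedLiterals(), FoldingSketch.FoldedConst()) should normally be true, since both methods load the same interned literal, while the string returned by NotFolded("The ") has the same value but is a separate object.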
September 9th, 2009 at 7:02 am
$ cat litstr.c
#include <stdio.h>
int main() {
    const char * str = "The " "dog " "ate " "the " "cat " "all " "day " "for " "fun.";
    printf("%s\n", str);
    return 0;
}
$ cat litstr.s
.file "litstr.c"
.section .rodata.str1.8,"aMS",@progbits,1
.align 8
.LC0:
.string "The dog ate the cat all day for fun."
.text
.p2align 4,,15
.globl main
.type main, @function
main:
.LFB3:
.cfi_startproc
subq $8, %rsp
.cfi_def_cfa_offset 16
movl $.LC0, %edi
call puts
xorl %eax, %eax
addq $8, %rsp
ret
.cfi_endproc
.LFE3:
.size main, .-main
.ident "GCC: (Gentoo 4.4.1 p1.0) 4.4.1"
.section .note.GNU-stack,"",@progbits
September 10th, 2009 at 8:37 pm
I think what wolf550e is trying to say is that the GCC compiler also does a whole bunch of compiler optimizations (certainly not just limited to strings).
Thanks wolf!