Strings in .NET Are Not Null Terminated
Unexpected Behavior
Take a look at the following function:
void DumpToConsole(string s)
{
    Console.WriteLine($"String contents: {s}");
    Console.WriteLine($"String length: {s.Length}");
}
Can you think of an input value that would output the following to the console?
String contents: a
String length: 10
Yes, the string could be padded with whitespace, but I was thinking of another way:
var chars = new char[10];
chars[0] = 'a';
var nullPaddedString = new String(chars);
DumpToConsole(nullPaddedString);
This might surprise you if you're not very familiar with .NET internals, especially if you have previous experience with C or other unmanaged languages, where strings are terminated by a null character. I know it surprised me. The behavior is documented, though:
In the .NET Framework, a String object can include embedded null characters, which count as a part of the string's length.
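A minimal sketch of what the documentation describes: a null character can appear anywhere in a string, not only at the end, and each one counts toward Length. The string literal is an arbitrary example of mine.

```csharp
using System;

// "\0" is the C# escape sequence for the null character (U+0000).
var s = "ab\0cd";

// All five characters count, including the embedded null.
Console.WriteLine(s.Length);        // 5

// The null character is an ordinary element that can be searched for
// (the char overload of IndexOf performs an ordinal search).
Console.WriteLine(s.IndexOf('\0')); // 2

// Characters after the null are still part of the string.
Console.WriteLine(s.Substring(3));  // cd
```

Nothing stops at the null character: it is just another UTF-16 code unit stored in the string.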
Comparing strings with trailing null characters can give even stranger results:
var regularString = "a";
Console.WriteLine(String.Compare(nullPaddedString, regularString, StringComparison.InvariantCulture));
Console.WriteLine(nullPaddedString.Length == regularString.Length);
Although the lengths differ, the culture-sensitive comparison treats the two strings as equal:
0
False
Again, documentation warns about that:
Null characters are ignored when performing culture-sensitive comparisons between two strings, including comparisons using the invariant culture. They are considered only for ordinal or case-insensitive ordinal comparisons.
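The ordinal comparisons mentioned in the documentation do take the null characters into account. Here is a short sketch that rebuilds the padded string from the earlier example so that it runs on its own. I print the culture-sensitive result without an expected value, because culture-sensitive comparisons can depend on the runtime's globalization configuration.

```csharp
using System;

var chars = new char[10];
chars[0] = 'a';
var nullPaddedString = new String(chars);
var regularString = "a";

// Culture-sensitive: null characters are ignored, as documented.
Console.WriteLine(String.Compare(nullPaddedString, regularString, StringComparison.InvariantCulture));

// Ordinal comparisons see all ten characters, so the strings differ.
Console.WriteLine(String.Equals(nullPaddedString, regularString, StringComparison.Ordinal));           // False
Console.WriteLine(String.Equals(nullPaddedString, regularString, StringComparison.OrdinalIgnoreCase)); // False

// The == operator also performs an ordinal comparison.
Console.WriteLine(nullPaddedString == regularString); // False

// Ordinally, the longer string with the same prefix sorts after the shorter one.
Console.WriteLine(String.CompareOrdinal(nullPaddedString, regularString) > 0); // True
```

If you want null-padded strings to be treated as different, an ordinal comparison is the safer choice.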
Sanitizing the Strings
Even though this behavior is documented, working with null-padded strings rarely makes sense. To avoid them, you should sanitize strings when you create them from character arrays that you don't have full control over:
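string CreateStringFromCharArray(char[] chars)
{
    // Treat the first null character as the terminator.
    // Array.IndexOf returns -1 if there is none, so use the whole array then.
    var length = Array.IndexOf(chars, '\0');
    return new String(chars, 0, length >= 0 ? length : chars.Length);
}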
The above method treats the first null character as the string terminator and ignores all the remaining characters in the array.
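On .NET Core 2.1 and later, the same rule can also be expressed with spans. This is a variant I'm swapping in for illustration rather than the method above: `CreateStringFromSpan` is a hypothetical name, and the sketch assumes that a buffer without any null character should be used in full.

```csharp
using System;

// Hypothetical helper: the first null character is treated as the terminator.
string CreateStringFromSpan(ReadOnlySpan<char> chars)
{
    int terminator = chars.IndexOf('\0');
    // No null character: use the whole span instead of failing.
    return new string(terminator >= 0 ? chars.Slice(0, terminator) : chars);
}

var buffer = new char[] { 'a', 'b', '\0', 'x' };
Console.WriteLine(CreateStringFromSpan(buffer));         // ab
Console.WriteLine(CreateStringFromSpan("abc".AsSpan())); // abc
```

A char[] converts implicitly to ReadOnlySpan<char>, so existing arrays can be passed in directly.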
Trimming the null characters from the string after it is created might seem like a more obvious approach:
string CreateStringFromCharArray(char[] chars)
{
    var s = new String(chars);
    return s.TrimEnd('\0');
}
It has two significant disadvantages, though:
- It will only trim trailing null characters. If the input character array had any null characters in the middle, they would remain in the resulting string. This might happen when the character array has been reused and contains old invalid data beyond the first null character.
- Since strings are immutable, two strings are created instead of one: the original one with the null characters and the final one without them. When many strings are created this way, this causes extra work for the garbage collector.
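The first disadvantage can be reproduced with a reused buffer. This sketch assumes the buffer first holds "hello" and is then overwritten with "hi" plus a single null terminator, so stale characters remain after the terminator. The function names `CreateUpToFirstNull` and `CreateWithTrimEnd` are made up for this example; they mirror the two approaches above.

```csharp
using System;

// Stops at the first null character (or uses the whole array if there is none).
string CreateUpToFirstNull(char[] chars)
{
    var length = Array.IndexOf(chars, '\0');
    return new String(chars, 0, length >= 0 ? length : chars.Length);
}

// Creates the full string first, then removes only the trailing nulls.
string CreateWithTrimEnd(char[] chars) => new String(chars).TrimEnd('\0');

var buffer = new char[8];

// First use: "hello" followed by nulls.
"hello".CopyTo(0, buffer, 0, 5);

// Reuse: "hi" and a terminator overwrite only the first three characters,
// leaving the stale "lo" from the previous contents in place.
"hi\0".CopyTo(0, buffer, 0, 3);

var terminated = CreateUpToFirstNull(buffer);
var trimmed = CreateWithTrimEnd(buffer);

Console.WriteLine(terminated.Length); // 2 ("hi")
Console.WriteLine(trimmed.Length);    // 5 ("hi", a null, and the stale "lo")
```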
Tooling
Visual Studio mostly handles strings with embedded null characters well: null characters are displayed in the watch windows, in the immediate window, and even in the new C# interactive window.
Printing such strings to the debug output doesn't work as expected, though: everything after the first null character in the string is cut off, including the newline that Debug.WriteLine() should append to the output. Hence, the following two lines of code:
Debug.WriteLine($"String contents: {s}");
Debug.WriteLine($"String length: {s.Length}");
would result in:
String contents: aString length: 10
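One possible workaround, assuming you only need the null characters to be visible while debugging, is to replace each of them with a printable placeholder before writing the string out. `MakeNullsVisible` is a hypothetical helper, and the two-character `\0` placeholder is an arbitrary choice. Keep in mind that Debug.WriteLine() calls are only compiled into Debug builds.

```csharp
using System;
using System.Diagnostics;

// Replace every null character with the two printable characters '\' and '0'.
// String.Replace(string, string) performs an ordinal search.
string MakeNullsVisible(string value) => value.Replace("\0", "\\0");

var chars = new char[10];
chars[0] = 'a';
var padded = new String(chars);

var visible = MakeNullsVisible(padded);
Debug.WriteLine($"String contents: {visible}");
Debug.WriteLine($"String length: {padded.Length}");

// Prints 'a' followed by nine \0 placeholders.
Console.WriteLine(visible);
```

The original string is left untouched, so its Length still reports the real number of characters.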
Other tools don't necessarily handle strings with null characters as well. For example, LINQPad simply ignores them everywhere, which can be quite misleading.
That's where I did my first tests when Goran brought this behavior to my attention.