UTF-8 string literals

Today I want to talk about a shiny new feature of C# 11: UTF-8 string literals. If you’re a programmer, you know that string handling is a big part of what we do, and in .NET a string literal has always given us a string, which is UTF-16 under the hood. That default isn’t going away, but with C# 11 we now have the option to write UTF-8 string literals as well.

The big advantage of UTF-8 string literals is that the compiler encodes the text to UTF-8 bytes for us and hands them back as a ReadOnlySpan<byte>, which is the natural type of a UTF-8 string literal. This means that we can work with them using the usual ReadOnlySpan<byte> methods, and avoid encoding a string to bytes at runtime.
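To make the "ReadOnlySpan<byte> methods" part concrete, here is a small sketch that picks apart a line of bytes without ever creating a string. It uses the u8 suffix, which is introduced just below. The request line and the variable names are made up for illustration; StartsWith, IndexOf, Slice and SequenceEqual are the standard span helpers.

```csharp
using System;

// Hypothetical input: a request line that is already UTF-8 bytes.
ReadOnlySpan<byte> request = "GET /index.html HTTP/1.1"u8;

// Compare against another UTF-8 literal, no decoding needed.
bool isGet = request.StartsWith("GET "u8);

// Cut out the path between the first and second space.
int firstSpace = request.IndexOf((byte)' ');
ReadOnlySpan<byte> rest = request.Slice(firstSpace + 1);
int secondSpace = rest.IndexOf((byte)' ');
ReadOnlySpan<byte> path = rest.Slice(0, secondSpace);

bool isIndex = path.SequenceEqual("/index.html"u8);

Console.WriteLine(isGet);   // True
Console.WriteLine(isIndex); // True
```

Slicing a span only creates a new view over the same bytes, so nothing here allocates.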

So, how do we use UTF-8 string literals in C# 11 and .NET 7? It’s as simple as adding the u8 suffix to your string literal:

// Old way: spell out the UTF-8 bytes of "Hello, World!" by hand
{
    ReadOnlySpan<byte> utf8String = new byte[] { 72, 101, 108, 108, 111, 44, 32, 87, 111, 114, 108, 100, 33 };
    string decodedString = System.Text.Encoding.UTF8.GetString(utf8String);
    Console.WriteLine(decodedString);
}
// Fancy new way: let the compiler produce the same bytes
{
    ReadOnlySpan<byte> utf8StringLiteral = "Hello, World!"u8;
    string decodedString = System.Text.Encoding.UTF8.GetString(utf8StringLiteral);
    Console.WriteLine(decodedString);
}

Just look at that syntactic sugar! One important thing to note about UTF-8 string literals is that they are not compile-time constants in the C# sense; the documentation calls them runtime constants. The bytes themselves are baked into the assembly at compile time, but a u8 expression can’t be used where the language demands a constant, such as a const declaration or the default value of an optional parameter. Separately, UTF-8 string literals can’t be combined with string interpolation: you can’t use the $ token and the u8 suffix on the same string expression.
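Here is a rough sketch of how you might work around both restrictions. The Header helper and the greeting text are hypothetical names picked for this example; the general idea is to return the literal from a method or property instead of declaring it const, and to encode interpolated text explicitly with Encoding.UTF8.GetBytes.

```csharp
using System;
using System.Text;

// const ReadOnlySpan<byte> Header = "HDR"u8;        // would not compile
// void Send(ReadOnlySpan<byte> data = "HDR"u8) { }  // would not compile: default value must be a constant

// Hypothetical helper: hand out the literal through a method instead of a const.
// In a class, a static property with the same body works too.
static ReadOnlySpan<byte> Header() => "HDR"u8;

// byte[] bad = $"Hello, {name}!"u8;                 // would not compile: $ and u8 don't mix
// For text that is only known at runtime, encode the interpolated string yourself.
string name = "World";
byte[] greeting = Encoding.UTF8.GetBytes($"Hello, {name}!");

bool same = greeting.AsSpan().SequenceEqual("Hello, World!"u8);

Console.WriteLine(Header().Length); // 3
Console.WriteLine(same);            // True
```

Note that GetBytes allocates a new array on every call, so this is a fallback for dynamic text rather than a replacement for the literal.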

Conclusion

In conclusion, UTF-8 string literals in C# 11 and .NET 7 are a great addition to the language. They let us write UTF-8 data directly in source code, without hand-written byte arrays or encoding calls at runtime. So, next time you need UTF-8 bytes in a project, for example when working with network protocols or files, consider a UTF-8 string literal instead of encoding a traditional string. Your future self will thank you.