Objects of type `string` are encoded internally as UTF-16. Byte arrays and spans can contain anything: strings encoded as UTF-8 (or whatever encoding you like), raw binary files, or random bytes. The new feature does exactly what the blog post says: if a string literal contains only UTF-8 characters and you assign it to a byte array or span, it gets encoded as UTF-8. It's just syntactic sugar.
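A minimal sketch of the "just syntactic sugar" point, assuming C# 11 and .NET 7 or later. In the version that shipped, as far as I know, you opt in with a `u8` suffix on the literal rather than by plain assignment. The literal and a run-time call to `Encoding.UTF8.GetBytes` should produce the same bytes:

```csharp
using System;
using System.Text;

// The u8 suffix (C# 11) asks the compiler to encode the literal as UTF-8.
// "\u00e9" is 'é', which needs two bytes in UTF-8.
ReadOnlySpan<byte> fromLiteral = "h\u00e9llo"u8;

// The same text encoded at run time by the standard-library UTF-8 encoder.
byte[] fromEncoder = Encoding.UTF8.GetBytes("h\u00e9llo");

// 'h', 'l', 'l', 'o' are one byte each; 'é' is 0xC3 0xA9, so 6 bytes in total.
Console.WriteLine(fromLiteral.Length);                      // 6
Console.WriteLine(fromLiteral.SequenceEqual(fromEncoder));  // True
Console.WriteLine(Convert.ToHexString(fromLiteral));        // 68C3A96C6C6F
```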
This is a post about C# 11 the language, not the framework or the runtime. It only tells you what the compiler does when it encounters a particular piece of syntax. Under the hood, the compiler probably uses one of the standard library's encoders to produce the bytes.
> if a string literal contains only UTF-8 characters and you assign it to a byte array or span, it gets encoded as UTF-8.
I write a fair amount of C# for my job, but I'm far from an expert in the language. As I read it, this statement is redundant, which makes me fairly sure it's trying to communicate something the authors thought was "obvious" and isn't.
* A string literal - so, realistically, some Unicode text, right? All the other encodings anybody actually uses can be transcoded to Unicode, so they're just Unicode with a different encoding.
* contains only UTF-8 characters - UTF-8 is an encoding of Unicode, so this just means Unicode again.
I'm guessing C# can actually put something that isn't Unicode in a string for some reason? But what that might be goes unexplained:
Can you... emit arbitrary bytes? But how, when your native encoding (UTF-16) isn't even byte-oriented? What would that mean?
Maybe you can emit the rare Unicode "non-characters" like U+FFFF? But you can express those just fine in UTF-8, so who cares?
Or perhaps it's as simple as C# letting you write literals that are sequences of 16-bit code units but aren't valid UTF-16?
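The last guess seems to be the right one. A C# `string` is a sequence of 16-bit UTF-16 code units, and a literal may contain an unpaired surrogate, which is not a Unicode scalar value and has no UTF-8 encoding. A noncharacter like U+FFFF, by contrast, encodes normally. The sketch below shows both at run time with the standard-library encoder. My understanding (not shown, since it wouldn't compile) is that the compiler rejects something like `"\uD800"u8` outright, which would explain the "contains only UTF-8 characters" wording:

```csharp
using System;
using System.Text;

// Nothing stops a C# string literal from holding a lone surrogate.
string nonCharacter = "\uFFFF";   // a Unicode noncharacter, but still a valid scalar value
string loneSurrogate = "\uD800";  // a high surrogate with no low surrogate after it

// U+FFFF has an ordinary three-byte UTF-8 encoding: EF BF BF.
string nonCharacterHex = Convert.ToHexString(Encoding.UTF8.GetBytes(nonCharacter));
Console.WriteLine(nonCharacterHex);  // EFBFBF

// Encoding.UTF8 silently substitutes U+FFFD (EF BF BD) for the lone surrogate...
string loneSurrogateHex = Convert.ToHexString(Encoding.UTF8.GetBytes(loneSurrogate));
Console.WriteLine(loneSurrogateHex);  // EFBFBD

// ...while a strict encoder refuses to encode it at all.
var strictUtf8 = new UTF8Encoding(encoderShouldEmitUTF8Identifier: false, throwOnInvalidBytes: true);
bool strictRejected = false;
try
{
    strictUtf8.GetBytes(loneSurrogate);
}
catch (EncoderFallbackException)
{
    strictRejected = true;
}
Console.WriteLine(strictRejected ? "cannot be represented in UTF-8" : "encoded");
```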
> The language will allow conversions between string constants and byte sequences where the text is converted into the equivalent UTF8 byte representation. Specifically the compiler will allow string_constant_to_UTF8_byte_representation_conversion - implicit conversions from string constants to byte[], Span<byte>, and ReadOnlySpan<byte>. A new bullet point will be added to the implicit conversions §10.2 section. This conversion is not a standard conversion §10.4.
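The quoted text is from the language proposal. As I understand it, the version that shipped in C# 11 dropped the implicit conversion in favor of an explicit `u8` suffix whose natural type is `ReadOnlySpan<byte>`, so reaching the other two target types takes an explicit copy. A sketch under that assumption (C# 11, .NET 7 or later):

```csharp
using System;

// byte[] b = "abc";  // proposal-era syntax; as I understand it, the shipped compiler rejects this

// The u8 literal's type is ReadOnlySpan<byte>; its bytes are read-only.
ReadOnlySpan<byte> readOnly = "abc"u8;

// byte[]: copy the bytes into a new heap array.
byte[] array = readOnly.ToArray();

// Span<byte>: copy into writable storage (here, stack memory) before mutating.
Span<byte> writable = stackalloc byte[3];
readOnly.CopyTo(writable);
writable[0] = (byte)'A';  // fine: this modifies our copy, not the literal

Console.WriteLine(readOnly.Length);    // 3
Console.WriteLine(array[0]);           // 97 ('a')
Console.WriteLine((char)writable[0]);  // A
```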