I've seen UTF-8 with a BOM when consuming data from strongly Windows-centric environments. Relatively uncommon, but it does happen. And it is very annoying!
A BOM is only a problem with strict syntaxes, and robots.txt isn't one of them. If the consumer simply ignores invalid or meaningless lines, you can sidestep issues from invisible characters by not putting anything meaningful on the first line of your file.
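For example, something like this (the paths are just illustrative) keeps anything meaningful off the first line, so a stray BOM can only mangle a comment the parser was going to ignore anyway:

    # robots.txt - this comment is the sacrificial first line
    User-agent: *
    Disallow: /private/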
Yes, it’s widely used. Many text editors insert a UTF-8 BOM as the first character in a text file to signal that the encoding is UTF-8. As a byte-order mark it’s technically pointless, since UTF-8 doesn’t depend on endianness, but UTF-8 has no other “magic number” to identify itself, so the convention is to use the BOM codepoint.
You can occasionally see it in git diffs as U+FEFF, or as the bytes EF BB BF if you open the file in a hex editor.
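If you need to cope with it programmatically, here's a minimal Python sketch (the filename is made up); Python's built-in "utf-8-sig" codec does the same stripping for you:

    import codecs

    # Read raw bytes and strip a leading UTF-8 BOM (EF BB BF) if present.
    with open("robots.txt", "rb") as f:        # hypothetical file
        data = f.read()

    if data.startswith(codecs.BOM_UTF8):       # codecs.BOM_UTF8 == b'\xef\xbb\xbf'
        data = data[len(codecs.BOM_UTF8):]

    text = data.decode("utf-8")
    # Shortcut: open("robots.txt", encoding="utf-8-sig") drops the BOM automatically.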
>since UTF-8 doesn’t have a “magic number” to identify itself, the convention is to use the BOM codepoint
Neither do any of the hundreds of other existing text encodings.
It's debatable how much of a magic number it was ever supposed to be anyway, considering that few people have insisted on having magic numbers in text files, and that you get the BOM at the beginning simply by naively converting a UCS-2/UTF-16 file codepoint by codepoint (and, conversely, you have to force it to be there if you ever do the conversion the other way around, because of course your conversion couldn't include that extra logic).
The nice thing about the BOM is you can't get it accidentally in an ASCII file - all three bytes of its UTF-8 encoding have the upper bit set, but ASCII characters always have that bit as zero. It makes an excellent magic number for that reason. It's probably just as unlikely to come up in other encodings that use the upper bit.
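You can sanity-check that claim in a couple of lines of plain Python (nothing project-specific assumed):

    # U+FEFF encoded as UTF-8 is the three bytes EF BB BF, all with the top bit set;
    # ASCII bytes are 0x00-0x7F, so their top bit is always clear.
    bom = "\ufeff".encode("utf-8")
    print(bom.hex())                                 # 'efbbbf'
    print(all(b & 0x80 for b in bom))                # True
    print(any(b & 0x80 for b in b"User-agent: *"))   # False: pure ASCII never sets it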
I'm wondering if it's widely used.