I find this criticism very bizarre. C++ and C have several different string types, as does Python. It comes down to the fact that what programmers think of as strings often aren't really strings, and vice versa.
To be clear, a 'string' is a sequence of characters or glyphs meant for human consumption, or produced by a human as input. A string is not a sequence of bytes. C and C++ get this really wrong: we pretend char* or std::string is actually a string, but they're not -- they're sequences of bytes. The best string type in C is wchar_t*, and the best in C++ is std::wstring, or even better a string from the ICU library. std::string completely glosses over the various encodings of real strings.
Python handles the string situation much better: str is for strings, and bytes is for bytes. A str can hold actual human characters, and a bytes object can't (unless you properly encode a str into bytes).
From this perspective, Haskell really has only one string type: Text. ByteString is actually just a sequence of bytes, and it's an accident of history that Roman characters fit into bytes.
String is essentially an iterator over human-readable characters (a lazy list of Char), akin to str in Python or a std::wstring iterator in C++.
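To make that concrete, here's a minimal sketch of the three types (assuming the text and bytestring packages; the literal and the counts are only illustrative):

    import qualified Data.Text as T
    import qualified Data.Text.Encoding as TE
    import qualified Data.ByteString as BS

    main :: IO ()
    main = do
      let t = T.pack "café"     -- Text: a sequence of Unicode characters
          s = "café" :: String  -- String: a plain list of Char
          b = TE.encodeUtf8 t   -- ByteString: bytes, obtained by choosing an encoding
      print (T.length t)        -- 4 characters
      print (length s)          -- 4 Chars
      print (BS.length b)       -- 5 bytes: 'é' takes two bytes in UTF-8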
You might find this a bizarre criticism, but in Java, Ruby, Elixir, and Rust, strings are all Unicode, with escape hatches for manually handling byte arrays (which could be C strings).
In that regard I still find it extremely puzzling that some languages / runtimes struggle to give you just two choices: byte arrays and UTF-8 strings. I really don't know what's so hard about it. As mentioned in another comment in this [now pretty big] sub-thread, this is one of the things I found off-putting in OCaml, which is otherwise an excellent language.
As for Haskell, it has been about 2 years since I last looked at it. I might be spoiled by the languages I enumerated, but I just found that the docs [at the time] lacked the nuance needed to let you make a choice immediately. Dunno. It's not only the strings, though; Haskell's community focuses so much on certain aspects that basic ergonomics get neglected.
I am willing to retry getting into it one day. Just shared my general impressions which are very likely outdated.
> You might find this a bizarre criticism, but in Java, Ruby, Elixir, and Rust, strings are all Unicode, with escape hatches for manually handling byte arrays (which could be C strings).
In Java, there is a 'String' class that is used for strings most of the time, but there is also char[], which would confuse many C programmers. It's only familiarity and ubiquity that make this choice seem 'obvious' to most devs.
> In that regard I still find it extremely puzzling that some languages / runtimes still struggle with only giving you two choices: byte arrays and UTF-8 strings.
To be clear, Haskell's String and Text are not UTF-8. They are containers that can hold any Unicode character, but the particular way they're stored internally is not really guaranteed. I believe Text is UTF-16 internally, and String, being a list of Char, effectively stores a full code point (UTF-32) per character.
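Here's a small sketch of what that means in practice (again assuming text and bytestring; encodeUtf16LE is only there to show that an encoding is something you ask for explicitly, not something the type leaks):

    import qualified Data.Text as T
    import qualified Data.Text.Encoding as TE
    import qualified Data.ByteString as BS

    main :: IO ()
    main = do
      let t = T.pack "naïve"
      -- Length is measured in characters, whatever Text uses internally.
      print (T.length t)                      -- 5
      -- Bytes only exist once you pick an encoding explicitly.
      print (BS.length (TE.encodeUtf8 t))     -- 6 ('ï' is two bytes in UTF-8)
      print (BS.length (TE.encodeUtf16LE t))  -- 10 (two bytes per code unit)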
> You might find this a bizarre criticism, but in Java, Ruby, Elixir, and Rust, strings are all Unicode, with escape hatches for manually handling byte arrays (which could be C strings).
Java's strings have noticeable misbehaviour with astral characters (String.length() and charAt() count UTF-16 code units, so a character outside the BMP counts as two), and since the language is built around a single hardcoded string type, almost all Java programs inherit this. Ruby had to go through a major migration to fix its string support. Rust is much, much younger than Haskell (so it hasn't had to go through a migration yet) and is famous for having at least 6 different string types.
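For contrast, a small Haskell sketch (assuming the text package) of how the same kind of astral character behaves there:

    import Data.Char (ord)
    import qualified Data.Text as T

    main :: IO ()
    main = do
      let emoji = T.pack "\x1F600"  -- U+1F600, an astral-plane character
      print (T.length emoji)        -- 1: one character
      print (ord (T.head emoji))    -- 128512: one Char, one code point
      -- For comparison, Java's String.length() counts UTF-16 code units,
      -- so the same character reports a length of 2 there.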