I find this criticism very bizarre. C++ and C have several different string types, as does Python. It comes down to the fact that what programmers think of as strings often aren't really strings, and vice versa.
To be clear, a 'string' is a sequence of characters or glyphs meant for human consumption, or produced by a human as input. A string is not a sequence of bytes. C and C++ get this really wrong: we pretend char* or std::string is actually a string, but they're not -- they're sequences of bytes. The best string type in C is wchar_t*, and the best in C++ is std::wstring, or even better a string from the ICU library. std::string completely glosses over the various encodings of real strings.
Python handles the string situation much better: str is for strings, and bytes is for bytes. A str can hold actual human characters, and a bytes object can't (unless you properly encode a str into bytes).
From this perspective, Haskell really has only one string type: Text. ByteString is actually just a sequence of bytes, and it's an accident of history that Roman characters fit into bytes.
String is essentially an iterator over human-readable characters (a lazy list of Char), akin to str in Python or a std::wstring iterator in C++.
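To make that concrete, here's a minimal sketch of the three types (assuming the text and bytestring packages; the literal and the counts are only illustrative):

    import qualified Data.Text as T
    import qualified Data.Text.Encoding as TE
    import qualified Data.ByteString as BS

    main :: IO ()
    main = do
      let t = T.pack "café"     -- Text: a sequence of Unicode characters
          s = "café" :: String  -- String: a plain list of Char
          b = TE.encodeUtf8 t   -- ByteString: bytes, obtained by choosing an encoding
      print (T.length t)        -- 4 characters
      print (length s)          -- 4 Chars
      print (BS.length b)       -- 5 bytes: 'é' takes two bytes in UTF-8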
You might find this a bizarre criticism, but in Java, Ruby, Elixir, and Rust, strings are all Unicode, with escape hatches for manually handling byte arrays (which could be C strings).
In that regard I still find it extremely puzzling that some languages / runtimes struggle to give you just two choices: byte arrays and UTF-8 strings. I really don't know what's so hard about it. As mentioned in another comment in this [now pretty big] sub-thread, this is one of the things I found off-putting in OCaml, which is otherwise an excellent language.
As for Haskell, it has been about 2 years since I last looked at it. I might be spoiled by the languages I enumerated, but I just found that the docs [at the time] lacked the nuance needed to let you make a choice immediately. Dunno. It's not only the strings, though; Haskell's community focuses so much on certain aspects that basic ergonomics get neglected.
I am willing to retry getting into it one day. Just shared my general impressions which are very likely outdated.
> You might find this a bizarre criticism, but in Java, Ruby, Elixir, and Rust, strings are all Unicode, with escape hatches for manually handling byte arrays (which could be C strings).
In Java, there is a 'String' class that is used for strings most of the time, but there is also char[], which would confuse many C programmers. It's only familiarity and ubiquity that make this choice seem 'obvious' to most devs.
> In that regard I still find it extremely puzzling that some languages / runtimes still struggle with only giving you two choices: byte arrays and UTF-8 strings.
To be clear, Haskell's String and Text are not UTF-8. They are containers that can hold any Unicode character, but the particular way they're stored internally is not really guaranteed. I believe Text is UTF-16 internally, and String, being a list of Char, effectively stores a full code point (UTF-32) per character.
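Here's a small sketch of what that means in practice (again assuming text and bytestring; encodeUtf16LE is only there to show that an encoding is something you ask for explicitly, not something the type leaks):

    import qualified Data.Text as T
    import qualified Data.Text.Encoding as TE
    import qualified Data.ByteString as BS

    main :: IO ()
    main = do
      let t = T.pack "naïve"
      -- Length is measured in characters, whatever Text uses internally.
      print (T.length t)                      -- 5
      -- Bytes only exist once you pick an encoding explicitly.
      print (BS.length (TE.encodeUtf8 t))     -- 6 ('ï' is two bytes in UTF-8)
      print (BS.length (TE.encodeUtf16LE t))  -- 10 (two bytes per code unit)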
> You might find this a bizarre criticism, but in Java, Ruby, Elixir, and Rust, strings are all Unicode, with escape hatches for manually handling byte arrays (which could be C strings).
Java's strings have noticeable misbehaviour with astral characters (String.length() and charAt() count UTF-16 code units, so a character outside the BMP counts as two), and since the language is built around a single hardcoded string type, almost all Java programs inherit this. Ruby had to go through a major migration to fix its string support. Rust is much, much younger than Haskell (so it hasn't had to go through a migration yet) and is famous for having at least 6 different string types.
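For contrast, a small Haskell sketch (assuming the text package) of how the same kind of astral character behaves there:

    import Data.Char (ord)
    import qualified Data.Text as T

    main :: IO ()
    main = do
      let emoji = T.pack "\x1F600"  -- U+1F600, an astral-plane character
      print (T.length emoji)        -- 1: one character
      print (ord (T.head emoji))    -- 128512: one Char, one code point
      -- For comparison, Java's String.length() counts UTF-16 code units,
      -- so the same character reports a length of 2 there.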