I maintain two of the crates you called out so here is a bit more detail on the ...

Animats · on Dec 30, 2016

"serde_json" uses an unsafe assumption that a slice of bytes is valid UTF-8 in two places. This is either for performance or for maintainability, depending on how you look at it. Performance is the more obvious reason but in fact we could get all the same speed just by duplicating most of the code in the crate.

Could that be done safely with a generic, instantiated for both types?

dtolnay · on Dec 30, 2016

Yes, that is what we already do. The two unsafe UTF-8 casts are the two critical spots at opposite edges of the generic abstraction where the instantiation corresponding to UTF-8 string needs to take advantage of the knowledge that something is guaranteed to be valid UTF-8.

What we have is as close as possible to what you suggested.

As I mentioned, we could get rid of the unsafe code in other ways without sacrificing performance. Ultimately it is up to me as a maintainer of serde_json to judge the likelihood and severity of certain types of bugs and make tradeoffs appropriately. There are security-critical bugs we could implement using only safe code, for example if you give us JSON that says {"user": "Animats"} and we deserialize it as {"user": "admin"}. My judgment is that using 100% safe code would increase the likelihood of other types of bugs (not related to UTF-8ness) and the current tradeoff is what makes the most sense for the library.

From another point of view, performance and safety are synonyms in this case, not opposites. If we use 0.1% unsafe code and perform faster than the fastest 100% unsafe C/C++ library (which is what the benchmarks show for many use cases) then people will be inclined to use our 0.1% unsafe library. If we give up unsafe but sacrifice performance, people will be inclined to use the 100% unsafe C/C++ alternatives.