
Richard Hipp (creator of SQLite) had this to say about Rust and SQLite in the comments:

> Rewriting SQLite in Rust, or some other trendy “safe” language, would not help. In fact it might hurt.

> Prof. Regehr did not find problems with SQLite. He found constructs in the SQLite source code which under a strict reading of the C standards have “undefined behaviour”, which means that the compiler can generate whatever machine code it wants without it being called a compiler bug. That’s an important finding. But as it happens, no modern compilers that we know of actually interpret any of the SQLite source code in an unexpected or harmful way. We know this, because we have tested the SQLite machine code – every single instruction – using many different compilers, on many different CPU architectures and operating systems and with many different compile-time options. So there is nothing wrong with the sqlite3.so or sqlite3.dylib or winsqlite3.dll library that is happily running on your computer. Those files contain no source code, and hence no UB.

> The point of Prof. Regehr’s post (as I understand it) is that the C programming language has evolved to contain such byzantine rules that even experts find it difficult to write complex programs that do not contain UB.

> The rules of rust are less byzantine (so far – give it time :-)) and so in theory it should be easier to write programs in rust that do not contain UB. That’s all well and good. But it does not relieve the programmer of the responsibility of testing the machine code to make sure it really does work as intended. The rust compiler contains bugs. (I don’t know what they are but I feel sure there must be some.) Some well-formed rust programs will generate machine code that behaves differently from what the programmer expected. In the case of rust we get to call these “compiler bugs” whereas in the C-language world such occurrences are more often labeled “undefined behavior”. But whatever you call it, the outcome is the same: the program does not work. And the only way to find these problems is to thoroughly test the actual machine code.

> And that is where rust falls down. Because it is a newer language, it does not have (afaik) tools like gcov that are so helpful for doing machine-code testing. Nor are there multiple independently-developed rust compilers for diversity testing. Perhaps that situation will change as rust becomes more popular, but that is the situation for now.
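
For readers unfamiliar with the workflow being described, a minimal sketch of gcov-style coverage testing on a C translation unit (the file and function names here are hypothetical):

    /* sketch.c -- toy function to illustrate branch-coverage testing with gcov.
     * Build with coverage instrumentation, run the tests, then ask gcov which
     * lines and branches were never exercised:
     *
     *     gcc --coverage -O0 -c sketch.c
     *     gcc --coverage sketch.o test_sketch.o -o test_sketch
     *     ./test_sketch
     *     gcov sketch.c        (writes sketch.c.gcov with per-line hit counts)
     */
    int clamp(int v, int lo, int hi) {
        if (v < lo) return lo;   /* gcov will show if no test ever takes this branch */
        if (v > hi) return hi;
        return v;
    }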




One problem with this argument is that SQLite is primarily used as a source-level embeddable library. That is, most users of SQLite don't use the official binaries and instead build the source code themselves. So, in practice, the source code, not the official blessed binary, is what matters. When upgrading compilers, developers of apps that embed SQLite don't typically check to ensure that upstream SQLite has tested the new version of their compiler. They just upgrade their compiler and assume that the new version will continue to compile their SQLite source properly. If the new version of the compiler happens to compile programs with undefined behavior differently, then problems can arise.


Right, and the much vaunted test suite is proprietary, so the typical end user can't reproduce this test of "every single instruction."

Not that I'm taking sides here. I'm really interested in both the extensive testing that the SQLite team does, and the analysis that John blogs about.


He's not wrong: SQLite is a very well-tested piece of software and probably the best that can be done safety-wise in C. Still, as pcwalton pointed out below, and as came up the last time this topic was discussed, there are some use cases where tests done on the entirety of the machine code do not guarantee that UB will not occur.

For one, it's quite likely that an embedded platform's toolchain will not be part of the SQLite test configurations. Secondly, SQLite can be and is compiled directly into application binaries, and at that point all bets are off, especially if LTO is enabled. Thirdly, there are products that build on SQLite, such as its own commercial encryption extension and other extensions from third parties. The former probably enjoys the same level of testing, but it's not clear how the latter are tested.

The conclusion is that it's humanly impossible to write memory-safe C, even with 100% test coverage, static and dynamic analysis. Something like Frama-C is required, which is virtually unheard of for the majority of open source and commercial software.


> In the case of rust we get to call these “compiler bugs” whereas in the C-language world such occurrences are more often labeled “undefined behavior”.

C has both compiler bugs and undefined behaviour. Undefined behaviour is an inherent property of the C standard, while a compiler bug is a property of the implementation (a place where it doesn't match the standard).
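
To make the distinction concrete, a minimal sketch of UB (as opposed to a compiler bug): any conforming compiler accepts this program, but the standard attaches no meaning to the overflowing addition, so an optimizer may legitimately transform it in ways that surprise the author.

    #include <limits.h>
    #include <stdio.h>

    /* Signed overflow is UB per the standard, so the compiler may assume
     * x + 1 > x is always true and fold the whole check to "return 1".
     * That is not a compiler bug; it is the standard granting that latitude. */
    int will_not_overflow(int x) {
        return x + 1 > x;
    }

    int main(void) {
        /* May print 1 even though INT_MAX + 1 "obviously" wraps on the hardware. */
        printf("%d\n", will_not_overflow(INT_MAX));
        return 0;
    }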

A valid argument along the same lines might be that the Rust compiler has existed for less time and is used less than C compilers, and therefore is more likely to contain bugs.

> Because it is a newer language, it does not have (afaik) tools like gcov that are so helpful for doing machine-code testing.

Coverage tools such as kcov work on Rust. I'm not sure of the state of gcov itself, though.

> Nor are there multiple independently-developed rust compilers for diversity testing.

Isn't diversity testing only necessary/good because there are many C compilers? Using your phrasing, if the code compiles and runs correctly (i.e. every single machine instruction is checked) with the one Rust compiler that exists, then it works.

There are definitely many reasons why a language having multiple compilers is good, but I think "diversity testing" is circular logic.


Undefined behaviour is not a compiler bug - it is deliberate.

And having undefined behaviour in your C code is definitely not a good thing, even if it is basically unavoidable.

The real problem is that the C and C++ standards cop out to UB in too many places, e.g. with things like type aliasing, so people reasonably think "weeeell, it may be UB but it works now and I need it, so screw it", and then you have a mess of programs relying on de facto non-standard behaviour, which is shit.
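
A typical instance of that "it works now" reasoning, sketched below: the cast-based punning idiom is UB under the aliasing rules, yet it does what people expect on today's mainstream compilers, so codebases quietly come to depend on it.

    #include <stdint.h>

    /* UB under strict aliasing: a float object is read through a uint32_t lvalue.
     * It "works" on current mainstream compiler builds most of the time, which is
     * exactly how programs end up relying on de facto non-standard behaviour. */
    uint32_t float_bits(float f) {
        return *(uint32_t *)&f;
    }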

The C people just need to officially define some of the de facto behaviours.

Rust doesn't have this problem because it doesn't leave so many basic things undefined.


> And having undefined behaviour in your C code is definitely not a good thing, even if it is basically unavoidable.

If it's literally unavoidable, then the language specification is BROKEN.

Now, most C UB is avoidable, but it's very difficult to notice some UB, and most compilers aren't that good at telling you about the UB they exploit. In this sense UB is unavoidable in that human programmers may often write code with UB without noticing.

If it's only "practically unavoidable", not literally, then the language specification and/or the compilers (by failing to warn about it) are BROKEN.

You cannot blame C programmers, not anymore. The committee has been much too aggressive in its zeal to speed up C by adding more UB cases. We've reached the point where compiler outputs run very fast because all the important bits have been elided by the optimizer, breaking the program in the process. We, the users of the language, have been pushed to the breaking point by the committee and the compiler groups. Please stop. And don't just stop, revisit some of the worst UB decisions.
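
The canonical shape of that complaint, as a hedged sketch: an overflow guard written the obvious way is itself UB, so the optimizer is entitled to remove it.

    #include <stdlib.h>

    /* Intended as an overflow guard for non-negative n, but if len + n overflows,
     * that is signed overflow, i.e. UB, so the compiler may assume it cannot
     * happen and delete the check entirely: the "important bits elided by the
     * optimizer" failure mode. */
    int checked_add(int len, int n) {
        if (len + n < len)
            abort();
        return len + n;
    }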

Yes, even C89 had lots of footguns, but UB was much more manageable.

The only reasons I myself have not yet abandoned C are: a) I haven't learned Rust yet, b) many codebases I work with are C codebases and won't get rewritten in Rust anytime soon, c) it takes time to get enough critical mass. (c) is happening though, and (a) is, for me, just a matter of time; (b) I can solve by moving on to new things, but the world is full of legacy code that we can't just abandon/rewrite, so moving on isn't exactly likely.


> The C people just need to officially define some of the de facto behaviours

Sure, as soon as all the different ISA people officially define some of the de facto behaviors. UB isn't in the standard "just because"; it's in the standard because there is no apparent underlying standard to defer to.


Aliasing rules, for example, have nothing to do with ISAs. Neither do pointer comparison rules, and many others besides.

The rule about memcmp() with invalid pointers and zero length does have to do with actual systems, but it can still be standardized, and the vendors with now-non-compliant memcmp() implementations would just have to fix it. This has happened before (e.g., snprintf()), so the ISA thing is a total cop-out.
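
Two concrete instances of those rules, as a minimal sketch (the function and variable names are just for illustration):

    #include <stddef.h>
    #include <string.h>

    int x, y;

    void examples(void) {
        /* UB: relational comparison of pointers to two distinct objects.
         * Nothing about the ISA forces this; it is purely a rule in the standard. */
        int before = &x < &y;

        /* UB: passing a null pointer to memcmp, even though the length is zero and
         * essentially every real implementation treats it as a harmless no-op. */
        int same = memcmp(NULL, NULL, 0);

        (void)before;
        (void)same;
    }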


Weird ISAs are exactly why you cannot compare pointers. Segmented memory for one. Or imagine an OS and compiler that implemented automatic overlay switching. With that and PAE on 32-bit x86 systems you could have special "far overlay" pointers returned from malloc calls which would map in different 1 GB overlay sections when accessed.

Aliasing rules are important in some ISAs too. Like weird DSPs. Imagine a system where 32-bit objects can't even share the same memory space as 8-bit objects. Casting a pointer to a different sized type is completely meaningless there. Of course programming such a weird thing is usually done in assembly, but there are C compilers.


I'm not familiar enough with C on segmented architectures, so I can't quite speak to that, but I was referring to [0], which clearly has nothing to do with segmented architectures.

As to aliasing, ISAs likewise had nothing to do with the reason for the aliasing rules; the motivation was optimizations for functions like memcpy() (as opposed to memmove()).

[0] https://news.ycombinator.com/item?id=17439467


There are other aliasing rules that have big performance impacts on certain architectures. The Xbox 360 and PS3 Power cores for example had a severe load-hit-store performance penalty that tended to be triggered by code that moved data between floating point and integer registers via memory. Strict aliasing rules that allow the compiler to assume float and int pointers don't alias could make a huge performance difference but those rules are also the source of much troublesome undefined behavior for code that does intentional type punning.

The ISA in this case requires going via memory to move data between fp and integer registers and certain implementations of that ISA had major performance impacts associated with that. In this case UB rules really did allow for valuable optimizations but really did cause trouble elsewhere.
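
A sketch of both halves of that trade-off, assuming a mainstream compiler: the aliasing rule is what lets the compiler treat the two pointers below as independent, and memcpy is the standard-blessed way to do the punning that a plain cast would make UB.

    #include <stdint.h>
    #include <string.h>

    /* Because the standard says an int* and a float* cannot alias, the compiler
     * is free to schedule the load of *f without worrying about the store to *i.
     * That is the optimization the aliasing rules exist to allow. */
    float store_then_load(int *i, float *f) {
        *i = 1;
        return *f;
    }

    /* The well-defined way to move bits between float and int: memcpy.
     * Compilers typically recognize this pattern, so it usually costs no more
     * than the cast version while staying defined. */
    uint32_t float_bits_defined(float f) {
        uint32_t u;
        memcpy(&u, &f, sizeof u);
        return u;
    }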


The ISA you describe doesn't require aliasing rules. It merely gives you an incentive to have them.

C and other languages need much better control over aliasing than the 'restrict' keyword and compiler command-line switches.
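
For reference, the control we do have today looks roughly like this (a minimal sketch; whether it is fine-grained enough is exactly the parent's complaint):

    /* With restrict, the programmer promises dst and src never overlap, so the
     * compiler may vectorize and reorder freely. The missing piece: there is no
     * equally precise way to say "these MAY alias, be conservative" for a single
     * call site or a single pair of accesses. */
    void scale(float *restrict dst, const float *restrict src, float k, int n) {
        for (int i = 0; i < n; i++)
            dst[i] = src[i] * k;
    }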


I don't find that comment very impressive, but further down he has commented some more, and I'm fully in agreement there:

"The disagreement is not over whether or not UB is a problem, but rather how serious of a problem. Is it like “Emergency patch – update immediately!” or more like “We fixed a compiler warning” or is it something in between."


Compiler bugs are supposed to get fixed.

In C, where UB is concerned, anything goes every single time the compiler gets upgraded.


In many cases, what is considered UB in the C standard is a widely accepted and documented extension in the vast majority of compilers. For example, non-strict aliasing is UB in ISO C, but is a documented language extension in MSVC, GCC, Clang, and many other compilers.
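
For instance, GCC and Clang spell the extension out explicitly (a sketch, assuming a GCC-compatible compiler; the attribute below is a GNU extension, not ISO C):

    /* GCC/Clang extension: an object accessed through a may_alias-qualified type
     * is allowed to alias anything, turning the usual punning idiom into
     * documented behaviour instead of UB. (MSVC does not perform type-based
     * alias optimizations, so the idiom is safe there in practice.) */
    typedef unsigned int __attribute__((may_alias)) aliasing_u32;

    unsigned int bits(float f) {
        return *(aliasing_u32 *)&f;
    }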


Now try to keep a large C codebase stable and safe across such a variety of compilers.




