DieHard: An error-resistant memory allocator for Windows, Linux, and Mac OS X (github.com/emeryberger)
64 points by lgeek on Nov 16, 2014 | 17 comments



What a hack! They have an "automatic patch generator" which "fixes" programs with buffer overflows and dangling pointers. (Ref: https://github.com/emeryberger/DieHard/blob/master/docs/pldi...) If a buffer overflow is detected, the code is patched to make the buffer bigger. If a dangling pointer (use after free) is detected, a delay is inserted so that the buffer is actually released some time after the "free" call.
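
Roughly, the two "patches" look like this -- a toy sketch, not the actual Exterminator code, with made-up padding and quarantine sizes:

    // Sketch: over-allocate so small overflows land in padding, and
    // quarantine frees so stale pointers briefly still point at live memory.
    #include <cstddef>
    #include <cstdlib>
    #include <deque>

    static const size_t kPad = 64;           // hypothetical overflow padding
    static const size_t kQuarantine = 1024;  // hypothetical free delay
    static std::deque<void*> g_quarantine;

    void* patched_malloc(size_t n) {
        return std::malloc(n + kPad);        // bigger buffer masks small overruns
    }

    void patched_free(void* p) {
        g_quarantine.push_back(p);           // don't release immediately
        if (g_quarantine.size() > kQuarantine) {
            std::free(g_quarantine.front()); // release "some time later"
            g_quarantine.pop_front();
        }
    }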

That this is useful is a good argument to get out of C/C++ and into Go, or Rust, or just about anything with subscript checking.


What you are referring to is not DieHard but rather Exterminator. DieHard probabilistically tolerates memory errors, while Exterminator repairs them.

Both rely on a carefully designed semantics under which one can provide a reasonable interpretation for erroneous computations. They employ sophisticated randomized algorithms and/or statistical inference procedures, and their effectiveness is backed by proofs. That's not exactly a hack.
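
To give a flavor of the approach, here is a toy sketch of the DieHard idea -- greatly simplified, and not the actual allocator: objects of one size class live in a heap with more slots than objects, placed at random, so a small overflow or a premature free most likely touches an empty slot:

    #include <cstddef>
    #include <random>
    #include <vector>

    class ToyRandomizedHeap {              // one size class, 64-byte slots
        static const size_t kSlot = 64;
        std::vector<char> heap_;
        std::vector<bool> used_;
        std::mt19937 rng_{std::random_device{}()};
    public:
        // Over-provision: 'factor' times more slots than live objects,
        // so the heap stays mostly empty.
        explicit ToyRandomizedHeap(size_t objects, size_t factor = 2)
            : heap_(kSlot * objects * factor),
              used_(objects * factor, false) {}

        void* alloc() {
            std::uniform_int_distribution<size_t> pick(0, used_.size() - 1);
            for (;;) {                     // rejection-sample a free slot;
                size_t i = pick(rng_);     // terminates because the heap is
                if (!used_[i]) {           // kept mostly empty
                    used_[i] = true;
                    return &heap_[i * kSlot];
                }
            }
        }

        void free(void* p) {
            size_t i = (static_cast<char*>(p) - heap_.data()) / kSlot;
            used_[i] = false;              // a dangling pointer now aims at a
        }                                  // slot that is probably still empty
    };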

The fact is that managed languages that provide memory safety guarantees -- through strong type systems, mandatory bounds checks and automatic memory management -- are of course preferable to C/C++/Objective-C from a software engineering perspective.

But the fact also remains that most of the code we use today is written in unsafe languages, including the runtime systems that undergird managed languages. Since these programs almost certainly contain errors, it makes sense to accept that fact and work to provide solutions.

I'd encourage you to read "Software Needs Seatbelts and Airbags", an article that appeared in Communications of the ACM and contains an in-depth discussion of these issues.

http://cacm.acm.org/magazines/2012/9/154577-software-needs-s...

That content, which initially appeared in ACM Queue, is freely available here on my blog:

http://emeryblogger.com/2012/05/31/software-needs-seatbelts-...


I think there is a misconception around GC: you can still be retaining memory you didn't intend to. It won't lead to a dangling pointer, but you still need some way to assert "I know this object should be collected at this point in time", and complex tools to analyze the retention path. Even if the object is still valid from a memory point of view, it can be in an inconsistent state if you thought it was no longer retained but it is, and now it flows into a pipeline of code that can't handle its state. Imagine a closed file handle that you forgot to remove from a collection, or something like that.
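
The same logical-lifetime problem is easy to reproduce with reference counting; a contrived C++ illustration (the GC case is analogous -- the object stays reachable, so it is never collected, yet it is logically dead):

    #include <cstdio>
    #include <memory>
    #include <vector>

    struct FileHandle {
        std::FILE* f = nullptr;
        void close() { if (f) { std::fclose(f); f = nullptr; } }
        ~FileHandle() { close(); }
    };

    int main() {
        std::vector<std::shared_ptr<FileHandle>> registry;
        auto h = std::make_shared<FileHandle>();
        h->f = std::tmpfile();
        registry.push_back(h);  // ...and we forget to remove it later

        h->close();
        h.reset();              // we believe the handle is gone...

        // ...but the registry still retains it, so downstream code now
        // sees a handle in a closed, unusable state.
        for (const auto& fh : registry)
            std::printf("handle open? %s\n", fh->f ? "yes" : "no");
    }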

Memory errors in C++ are not exactly memory errors; like other errors, they are a misconception about the state of the object graph. And that's hard to find and fix.


To accommodate SimCity (and, I strongly suspect, other programs), Windows would detect it and run a special version of the memory allocator that tolerated its use-after-free (see e.g. http://www.joelonsoftware.com/articles/APIWar.html).

I've been using C since ~ 1980 (sic), loathe C++, and, yeah, I'm going to investigate Rust in due course.


DieHard directly inspired the Windows Fault-Tolerant Heap, and follow-on work (DieHarder) inspired security hardening features in Windows.

Follow-on work from DieHard, which probabilistically tolerates memory errors, includes the following: Archipelago, which trades virtual address space for reliability; Exterminator, which automatically corrects memory errors with high probability; and DieHarder, which secures the heap against attack. The GitHub repo contains the code for all of these.

More info here -

http://emeryberger.com/research/diehard/

http://emeryberger.com/research/archipelago/

http://emeryberger.com/research/exterminator/

http://emeryberger.com/research/dieharder/

DieHarder talk at Woot 2011: https://www.usenix.org/conference/woot11/dieharder-securing-...


Yes, I understand you are a master kernel hacker uber programmer. I understand you absolutely, positively need bare-metal performance.

However the vast majority of all C code ever written does not; most code is written by mere mortals and contains various bugs, including memory bugs and undefined behavior.

Why can't the C standards committee introduce a memory safe mode for C? Make it opt-in. Prohibit aliasing without special overriding casts. Bounds check array accesses.

I'm willing to trade some performance to prevent the next Heartbleed; I can throw VMs at my problems very cheaply. Having to tell everyone on the Internet to change their passwords (and every server op to get a new SSL key) is a multi-billion-dollar waste of human capital for no reason. And that's just one C memory error. New zero-day vulnerabilities, many due to C memory errors, are continuously discovered. How many more Heartbleeds does it take?
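
To illustrate what I mean by an opt-in safe mode -- purely hypothetical, nothing like this exists in standard C -- imagine the compiler routing every subscript through a check like this (here hand-written, and only workable for true arrays; checking pointers is the hard part, as discussed below):

    #include <cstddef>
    #include <cstdio>
    #include <cstdlib>

    // The kind of check a "safe mode" compiler would emit for every a[i].
    static inline size_t checked_index(size_t i, size_t len,
                                       const char* file, int line) {
        if (i >= len) {
            std::fprintf(stderr, "%s:%d: index %zu out of bounds (len %zu)\n",
                         file, line, i, len);
            std::abort();  // fail loudly instead of corrupting memory
        }
        return i;
    }

    #define AT(a, i) ((a)[checked_index((i), sizeof(a) / sizeof((a)[0]), \
                                        __FILE__, __LINE__)])

    int main() {
        int buf[4] = {1, 2, 3, 4};
        std::printf("%d\n", AT(buf, 2));  // fine
        std::printf("%d\n", AT(buf, 7));  // aborts with a diagnostic
    }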


I think there are compiler flags available that will do this (at least the bounds checking).

When talking about C it's important to understand why the standard is the way it is. C was designed to be a portable language. Not in the sense that code is easily portable, but that the language itself is easily portable by writing a new compiler for it on whatever OS you're using. As such, the C standard is written to make C an easy language to write a compiler for, which is why it has so much undefined behavior.


I don't think bounds checking can be done at compile time.

Otherwise Java, Rust, Go et al. wouldn't need to incur the runtime overhead of carrying an extra length field with every array.


-fbounds-check -- I'm sure there is a runtime cost, but it's provided by the compiler.


Not applicable to C -- "Currently only supported by the Java and Fortran front ends."

https://gcc.gnu.org/onlinedocs/gcc-4.9.2/gcc/Code-Gen-Option...

There have been bounds-checked implementations of C, but the runtime cost is very large. The problem is that to do it, you need to check not only explicit array references, but pointer dereferences as well -- and in order to make that work, each pointer needs to carry around with it its associated bounds. That, in turn, means that your pointer representation has to be something more (or other) than just a plain machine address.
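
Schematically, a bounds-checked pointer looks something like this -- just the idea; real implementations (CHERI's capabilities, SoftBound's disjoint metadata, etc.) differ in detail:

    #include <cassert>
    #include <cstddef>

    struct FatPtr {        // three words instead of one
        int*   base;       // start of the underlying object
        size_t len;        // number of elements
        size_t off;        // this pointer's offset within the object
    };

    int fat_load(FatPtr p) {
        assert(p.off < p.len);  // the per-dereference check
        return p.base[p.off];
    }

    FatPtr fat_add(FatPtr p, size_t k) {
        p.off += k;             // arithmetic carries the bounds along;
        return p;               // the check happens at dereference time
    }

    int main() {
        int data[8] = {1, 2, 3, 4, 5, 6, 7, 8};
        fat_load(fat_add({data, 8, 0}, 3));    // ok
        // fat_load(fat_add({data, 8, 0}, 9)); // would fail the assert
    }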


I'm not an expert, but you can do this in a language with dependent types such as Idris. But given the historical progress of the industry, it will probably be a long, long time before such an idea percolates down into an industrial language.


> Why can't the C standards committee

Wrong people. It would be the compiler writers who would do that, since they know the target machine.


Seems like it is quite old; the docs talk mostly about getting Firefox 1.5 running on XP SP3.


[2012]


Actually, the codebase continues to be actively updated.


Good to know! I wasn't really commenting on the utility of the software, but rather the news-ness of it. I guess I could have just as easily said 2005? Or, the mysterious phenomenon of Hacker News could have decided it was time to talk about Hoard.


Sure, it's not officially news, but it's apparently news to Hacker News.



