There are so many things in here that tempt me to comment about, so here goes: 1...

ars · on Nov 24, 2014

I fail to see how an exception vs an error code makes any difference whatsoever.

The whole problem is that the library doesn't notice the error. If you don't notice an error you are not going to use an exception either.

Not to mention that an exception by default terminates the program, which is exactly what you don't want to do.

pilif · on Nov 24, 2014

The library noticed the error and was (effectively) calling exit().

If the library was at liberty to throw, they would not have to change any internal callers and would get the same feature for them (not having to deal with errors anywhere) but they would still give the external caller a chance not to terminate.

Yes. By default, a caller would terminate, but it would get a chance not to in a way that doesn't involve changing the libraries public interface.

adamtj · on Nov 24, 2014

Exceptions are not a silver bullet. One can't simply turn an assert into an exception and expect things to work. That's a great way to add new and tricky bugs unrelated to the original issue.

If an exception crosses a stack frame that isn't expecting one, things could be left in an inconsistent state. Invariants may vary, constraints may be unconstrained.

To retrofit exceptions into the existing code at this point would be very difficult, as you would have to audit every caller of the function in question, a nd the callersof those, and on and on...

Imagine problems like Apple's "goto fail" SSL bug, but more obscure and harder to find.

To have used exceptions from the beginning would require an amount of work roughly equivalent to returning error codes. They aren't fundamentally different from error codes in that way. Well, unless you're a cowboy who ignores errors.

Exceptions wouldn't have solved anything here.

cbsmith · on Nov 25, 2014

> Exceptions are not a silver bullet.

There are a lot of things that aren't silver bullets. Oddly though, they are still helpful.

> One can't simply turn an assert into an exception and expect things to work.

Actually, if you did that, you'd expect them to fail... but with better semantics.

> That's a great way to add new and tricky bugs unrelated to the original issue.

Only if your runtime doesn't allow for the clean expression of stack unwinding semantics.

> To retrofit exceptions into the existing code at this point would be very difficult.

Yes, I don't think that was at all related to the original point though. It'd be hard to retrofit almost any new language feature in to the existing code.

> To have used exceptions from the beginning would require an amount of work roughly equivalent to returning error codes. They aren't fundamentally different from error codes in that way. Well, unless you're a cowboy who ignores errors.

Not at all. Error codes require error handling in every caller up the call chain. Exceptions at least allow you to put your error handling logic just where you have handlers.

gohrt · on Nov 25, 2014

> Only if your runtime doesn't allow for the clean expression of stack unwinding semantics.

The point here is that if a library has already chosen to not cleanly unwind anything, your application catching the exception isn't going to help the garbled state created by the library.

adamtj · on Nov 25, 2014

I think cbsmith's point is that a a good language makes it so easy to correctly unwind the stack that it's almost harder to not do it right. If we could go back in time and do it over using exceptions and a better language, this likely would have been an exception from the start, which would have made this bug easier to fix. For example, Python's "assert" raises instead of exiting the process. The caller could just catch the exception from the failed assertion and deal with it, without need to update any other code. The problem is, we can't easily change languages now, nor can we safely retrofit exceptions. Hindsight is 20/20.

It's worth noting that that argument depends on the exception-raising assertion having been present from the beginning. The correctness of stack unwinding code can sometimes depend of which exceptions a function raises. A particular piece of code that is currently correct may break if one of the functions it calls is changed to raise a new type of exception.

cbsmith · on Nov 26, 2014

That's essentially it. The whole, "yeah but the library may not handle exceptions properly" is essentially a C++ problem, not an exceptions problem.

willvarfar · on Nov 24, 2014

An assert is what a programmer uses when they think something won't/can't ever happen.

So the authors did not realise inotify could legitimately return an error.

If they'd thought it could return an error, they'd return the error code themselves or recover gracefully or whatever.

vidarh · on Nov 24, 2014

The library can do the equivalent by offering a new interface to register an error callback.

Even without that, callers do have a way of preventing termination: Fork.

tedunangst · on Nov 24, 2014

Use atexit to register a handler that setjmps back to main.

cnvogel · on Nov 24, 2014

Hey, I like that. Always nice to use the more obscure things messing with the stack (set/longjmp) ;-)... Haven't encountered them for a while.

http://pastebin.com/2fdwHvi8

    [optiplex /home/chris/tmp/atexit_main]
    $ ./testme
    This is main. I will sleep a while, then try to exit.
    PANIC! main() has been restarted due to a unscheduled exit!
    This is main. I will sleep a while, then try to exit.
    PANIC! main() has been restarted due to a unscheduled exit!
    This is main. I will sleep a while, then try to exit.
    PANIC! main() has been restarted due to a unscheduled exit!
    This is main. I will sleep a while, then try to exit.

travem · on Nov 24, 2014

> Just removing a convenient feature because of some risk that the feature could possibly be abused by an admin IMHO isn't the right thing to do.

From my reading of the article the author saw this issue in the wild (the assert) and then developed the 1000 touch approach to demonstrate the issue.

TheDong · on Nov 25, 2014

1) Exceptions are not a solution and are in fact no different than error codes in this case. If your init system throws an exception, what does that mean? It means your computer goes down.

The actual solution here is Maybe / Optionals (depending on your language of choice) as Haskell and Rust and Scala have. These would require the caller to unwrap the result, and thus make this bug actually impossible. The "returns -1" really is the equivalent of a null pointer exception which is an issue that is solved by Options not by Exceptions. Exceptions are definitely not a solution in any way.

2) I didn't see the "too complicated" between the lines bit. However, you're wrong about having to be root to trigger this. The watch, if you missed it, was on all of /etc. There are many folders in /etc which are not root owned. Furthermore, just because you have to be root to cause unexpected behavior means little in this case. Sure, this won't likely be exploited maliciously to take down a box and if that were the fear it wouldn't matter. That wasn't the concern though. The concern is this happening by accident. Even if a program is running as root, it shouldn't be able to bring down the box purely by accident.

3) A patch that prevents an init system from crashing at the expense of missing a file change (which was under /etc and quite possibly not even relevant to the init system) isn't that bad. Take it and add an issue to improve it (e.g. by directory walking as suggested). Seriously, your init system crashing because a few programs decide to write a few hundred default config files to /etc, is dumb.

For an easy example: using ansible-galaxy to fetch a role/playbook, by default (at least in the past) installed in /etc/ansible/roles/$ROLENAME. These roles can be on the order of several thousand files. The init system need not care about this at all.

The most important point here, though, is that Exceptions are absolutely not the solution here and, in fact, we've known that exceptions are fundamentally wrong for the last 40 years and yet people still learn them as part of Java and think they're good design.

michaelmior · on Nov 24, 2014

> Being able to quickly change something in /etc/init and then have the system react to that is actually very convenient (and a must if you are pid 1 and don't want to force restarts on users).

I'm not sure I see how it's a must. Why not just have init respond to SIGHUP like so many daemons and reload its configuration then?

vidarh · on Nov 24, 2014

And that points to a fairly simple defensive practice to avoid this type of problem:

Run it in a different process. Have said process signal if a change is found. If they want to handle the case where /etc/init.d is potentially huge and you don't want to rescan everything, have it write changes via a pipe.

(In fact there's a program that will do that for you: inotifywait)

userbinator · on Nov 24, 2014

Agreed, it's pretty standard behaviour to signal daemons to cause them to reload config. SIGHUP'ing it is how you get sysvinit to reload its config too.

I see why and I also understand that fixing it isn't that easy. Software is complicated.

The flip side of this is that perhaps we should aim to reduce the complexity of software. In this case, it's not like libnih is providing any significant increase in functionality over just using inotify anyway.

ja30278 · on Nov 25, 2014

1. Exceptions are basically 'goto' with a pretty wrapper. Over time, I've come to believe that the best place to handle an error is close to where it occurs, rather than throwing some hail mary message up the stack and hope someone else can deal with it.

2. Well, it certainly is true that more code and more functionality is likely to contain more bugs. Whether your pid 1 needs to be doing all the things that upstart does is obviously a contentious issue.

3. This is a toss-up. I usually like 'crash loudly' over 'fail silently', but I think rejecting the patch outright was a poor decision.

gohrt · on Nov 25, 2014

> Exceptions are basically 'goto' with a pretty wrapper.

Famously, "for loops" are basically 'goto' with a pretty wrapper.

cbd1984 · on Nov 25, 2014

> Famously, "for loops" are basically 'goto' with a pretty wrapper.

Trivial response: for loops don't potentially involve jumping into code in a completely different file, which could have been written by someone else.

More substantive response: for loops don't imply a specific philosophy about error handling.

That said, I do like exceptions, as I think they encourage a healthy ignorance in most code, as long as the code which can handle the exception does handle it. But that's just a "don't write bad code" admonition, and you can abuse any philosophy to write bad code.

3pt14159 · on Nov 25, 2014

1. It is a different mindset. Handling each error code until your program is correct top to bottom makes for a much more stable existence, but it's harder when you rely on something that hasn't handled an error code. With exceptions it is easier to work with shitty middleware. Also, in environments where exceptions are idiomatic, there are ways of cleaning up aborted code and safely entering exception handlers. Far more straightforward to reason about than `GOTO 34550`.

I agree with your other two points though.

dilap · on Nov 24, 2014

Yeah, asserts are pretty evil, too.

I like Go's approach: no asserts, and only use exceptions for really exceptional stuff. (Go calls exceptions panic/recover, and tweaks the recover syntax in such a way that it's much less tempting to use it for normal error handling.)

From an outsiders perspective, with zero-knowledge, it seems like libnih should be propagating back the buffer-full error.

I agree that silently failing to reload config files in only certain circumstances could be just as bad or worse behavior than just crashing, but both behaviors seem really really bad in core system software, n'est pas?

ridiculous_fish · on Nov 24, 2014

Out of curiosity, why do you like no asserts? I use assertions extensively, both as a way to check my logic and to "document" invariants.

The Go FAQ defends this decision by talking about error handling, but using assertions for error handling has always been wrong.

When events are dropped, init should manually scan the config files for changes until the event queue catches up again.

dilap · on Nov 25, 2014

So assert is an interesting combination of several things:

- A check that can be disabled at compile time - A check that blows up the program if it fails - A check that has a really convenient syntax

The "disabled at compile time" bit is pretty much insane (at least in a language with side-effects) -- you'll inevitably end up accidentally having some sort of side-effect in your asserts, and if you turn them off for your production build, you're basically inserting a whole brand-new untested configuration, running in a build that is particularly hard to debug. But probably most people just leave asserts "on" all the time. (Even if you don't have side-effects in your asserts, you're still enabling untested run-paths, which is going to give weird, hard to debug error reports.)

The second thing is blowing up the program. Sometimes, this really is all you can do, and every language has some way for this to happen -- take the head of an empty list, access beyond an array bounds, whatever.

The problem is combining it with the really convenient syntax -- it's just very tempting to assert conditions which you should actually be handling more robustly, because assert is the easiest thing to write.

I'm doing a mix of Go server-side dev and Obj-C client side dev at the moment, and I do use asserts in Obj-C land.

I try to only use them only for truly "programmer screwed up" conditions, which should all be caught during development. But we still get crashes now and then from real users from asserts that shouldn't have been asserts; we fall into temptation.

So I suspect that actually we might be better of in Obj-C land, too, avoiding asserts.

(I also like the invariant-style thinking and documentation quality of asserts, though.)

TheLoneWolfling · on Nov 26, 2014

There are languages where one cannot have side-effects in assertions.

And there are languages where assertions throw a standard (i.e. catchable) exception.

jandrese · on Nov 24, 2014

Seems to me a better approach might be to catch the error, and then retry the operation until it succeeds, maybe crashing out if no process is made after enough time (the reader died).

This will cause the library to stall, which might be unacceptable to some people, but if simply noting the error and dropping it is unacceptable then you really don't have a choice.