This is a great example of how bad many open source projects are at accepting co...

masklinn · on Nov 24, 2014

> The patch is just rejected

Technically the patch isn't rejected (and I'm kinda peeved TFA claims it is). It's in limbo, waiting for further action from the submitter or another contributor: it's marked as needing improvements because it sweeps the issue under the rug instead of reporting it to the caller/user.

This is a rejected patch: https://code.launchpad.net/~jamesodhunt/libnih/bug-776532/+m..., it has a "Status: Rejected" set (and a disapproving review).

That said, the patch could of course have been merged and improved later, so that libnih wouldn't blow up the whole system in case of an inotify overflow.

marcosdumay · on Nov 24, 2014

The OS disobeying your configuration files is really better than it restarting?

Looks to me that Upstart kept the lesser of two evils. I'd reject that patch too.

masklinn · on Nov 24, 2014

> The OS disobeying your configuration files is really better than it restarting?

It does not restart; it locks up with a mostly useless error message, then maybe, at some point, possibly restarts. The restart is not a policy decision; it's a side effect of the box being dead. Here's the better option: bubble the issue up to the caller and let it decide what to do.
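A minimal Python sketch of the "bubble it up" option, simulating inotify events as `(wd, mask, name)` tuples instead of reading a real inotify descriptor. Only `IN_Q_OVERFLOW` is a real value (from `<linux/inotify.h>`); `WatchOverflow`, `dispatch_events` and `handle_batch` are hypothetical names, not libnih's API.

```python
IN_Q_OVERFLOW = 0x00004000  # flag value from <linux/inotify.h>


class WatchOverflow(Exception):
    """Hypothetical error: the kernel reported that events were dropped."""


def dispatch_events(events, on_event):
    """Pass each (wd, mask, name) event to on_event.

    On an overflow marker, raise instead of aborting the process or
    silently skipping it, so the caller chooses the policy. Events after
    the marker are left to the caller's recovery step.
    """
    for wd, mask, name in events:
        if mask & IN_Q_OVERFLOW:
            raise WatchOverflow("inotify queue overflowed; events were lost")
        on_event(wd, mask, name)


def handle_batch(events, on_event, on_overflow):
    """Hypothetical caller policy: recover (e.g. rescan) instead of dying."""
    try:
        dispatch_events(events, on_event)
        return True
    except WatchOverflow:
        on_overflow()  # e.g. re-read the whole config directory
        return False
```

The library only reports the condition; whether that means a rescan, a warning, or a deliberate abort is decided by the caller.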

> Looks to me that Upstart kept the lesser of two evils.

Upstart didn't do anything.

> I'd reject that patch too.

The patch isn't rejected.

mynameisvlad · on Nov 25, 2014

> The patch isn't rejected.

Just because the reject status wasn't used does not mean it wasn't rejected. The patch, as it was written, was rejected. It won't be used and the recommendation was a complete rewrite in a completely different direction.

nhaehnle · on Nov 24, 2014

People legitimately object to accepting a patch that fixes one issue but can lead to other problems down the line. IMHO, even declining the patch while suggesting further improvements would have been better behaviour.

As it stands, the communication was:

1. There's a problem, here's a patch.

2. NAK, because (valid reasons).

3. Radio silence.

Perhaps better behaviour would have been:

2. NAK, because (valid reasons). However, I acknowledge the problem; perhaps you can fix it (some other way).

This way, the discussion is more likely to keep going and end up with a proper fix to the issue.

IshKebab · on Nov 24, 2014

The communication to me looked more like:

1. There's a problem, here's a patch.

2. We won't apply it because it doesn't fix it perfectly. Sure it's better than what we did, and you offered the patch for free out of the kindness of your heart, but we aren't going to apply it until you work on it some more.

3. But... it's better in every way!

If you write a patch for a project that is an improvement in every way but not yet perfect, and they don't apply it... well, you're probably not going to spend much more time helping that project, are you?

phkahler · on Nov 24, 2014

>> But... it's better in every way!

And this is where we disagree. I believe it's worse. The original crashes the system, which is really bad. The cases where this happens are few, and people are going to be aware of what they did to cause it (unzipping a bunch of files in there was suggested as a trigger). With the proposed patch, nothing will happen. The user will not be aware of the issue and the system will not update with the changes. The user may not even notice their action had no effect and if they do, it'll be a harder mystery than the crash to figure out.

Some would rather the system crash than silently ignore input. Error codes and messages are better than either of those.

ChuckMcM · on Nov 24, 2014

I suspect the 'goodness' or the 'badness' of the patch depends on what init does when it misses notifications. Clearly the kernel has dropped some on the floor because it ran out of space, and it tells you this. What libnih does is then scream and shout and abort (which is a fine first configuration, since you don't know how common this will be), but once you discover it does happen, you consider ignoring it if the error is idempotent. Meaning, of course: if you ignore it, do you later get a notification when the kernel has more notify buffers to use? To understand that, you need to read the notify code in the kernel to see how it generates notifications.

If the kernel drops and never returns a notification, then init has to know that it missed some in order to operate under the correct set of init files. That requires a combination patch to init and to libnih.

If the kernel gets around to notifying anyway, just more slowly, then you can safely ignore it, init will eventually get the message the 'regular' way and you're done.
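For inotify specifically, inotify(7) says events in excess of the queue limit are dropped and an `IN_Q_OVERFLOW` event is generated instead, which matches the first case: the lost events are not redelivered later. A common recovery is to treat the overflow as "state unknown" and resynchronise from a full scan of the watched directory. The Python sketch below simulates that; `apply_events`, the simplified `(mask, name)` event tuples and the `scan` callable (standing in for re-reading the config directory) are hypothetical, and only the flag values come from `<linux/inotify.h>`.

```python
IN_MODIFY = 0x00000002
IN_CREATE = 0x00000100
IN_DELETE = 0x00000200
IN_Q_OVERFLOW = 0x00004000


def apply_events(known, events, scan):
    """Update `known` (config name -> contents) and return names that changed.

    events: iterable of (mask, name) pairs (hypothetical simplified form).
    scan:   callable returning the current {name: contents} on disk.
    """
    changed = set()
    for mask, name in events:
        if mask & IN_Q_OVERFLOW:
            # Some events were dropped and will not come back, so the
            # incremental view can't be trusted: diff against a full scan.
            current = scan()
            for n in known.keys() | current.keys():
                if known.get(n) != current.get(n):
                    changed.add(n)
            known.clear()
            known.update(current)
        elif mask & IN_DELETE:
            if name in known:
                del known[name]
                changed.add(name)
        elif mask & (IN_CREATE | IN_MODIFY):
            contents = scan().get(name)
            if contents is not None and known.get(name) != contents:
                known[name] = contents
                changed.add(name)
    return changed
```

A real version would read the directory from the filesystem; passing `scan` in as a callable keeps the sketch testable without touching disk.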

Given the bug, the next step might be to see if systemd suffers a similar challenge in the presence of a lot of config changes.

x0x0 · on Nov 24, 2014

I came here to say, well, something.

Obviously, the patch sucks -- it just ignores the error, instead of doing the right thing.

OTOH, init not shitting the bed and taking your whole OS down is probably a good thing, even if getting there is imperfect.

geofft · on Nov 24, 2014

I'm pretty sure James Hunt is a core Upstart developer, at least these days. Scott James Remnant, the original author, stopped being the lead maintainer when he left Canonical, but I don't remember if that had happened by 2011.