Hacker News new | past | comments | ask | show | jobs | submit login

> It at least doesn't lock anything up that has a file open when the network goes down. NFS is a nightmare with that.

Yeah, we've been bitten by this too, around once a year, even with our fairly reliable and redundant network. It's a PITA, your process just hang and there's no way to even kill it except restarting the server.




> It's a PITA, your process just hang and there's no way to even kill it except restarting the server.

If you can bring the missing server back online, the NFS mount should recover.


This sounds like a Linux client bug (failure to properly implement the “intr” mount option), not the fault of NFS itself.


It’s a failure to use the intr mount option. I’ve never had a problem using soft mounts either, which make the described problem non existent


intr/nointr are no-ops in Linux. From the nfs(5) manpage (https://www.man7.org/linux/man-pages/man5/nfs.5.html ):

> intr / nointr This option is provided for backward compatibility. It is ignored after kernel 2.6.25.

(IIRC when that change went in there was also some related changes to more reliably make processes blocked on a hung mount SIGKILL'able)


This is too bad. The sweet spot was "hard,intr" at least when I was last using NFS on a daily basis (mid 1990s). Hard mounts make sense for programs, which will happily wait indefinitely while blocked in I/O. This worked well for things like doing a build over NFS, which would hang if the server crashed and then pick right up right where it left off when the server rebooted.

Of course this is irritating if you're blocked waiting for something incidental, like your shell doing a search of PATH. In those cases you could just control-C and continue doing what you wanted to do (as long as it didn't actually need that NFS server).

However I can see that it would be difficult to implement interruptibility in various layers of the kernel.


I think the current implementation comes reasonably close to the old "intr" behavior.

AFAICT the problem with "intr" wasn't that the kernel parts were impossible to implement in the kernel, but rather an application correctness issue, as few applications are prepared to handle EINTR in any I/O syscall. However, with "nointr" the process would be blocked in uninterruptible sleep and would be impossible to kill.

However, if the process is about to be killed by the signal, then not handling EINTR is irrelevant. Thus in 2.6.25 a new process state TASK_KILLABLE was introduced (https://lwn.net/Articles/288056/ ), which is a bit like TASK_UNINTERRUPTIBLE except the task can be interrupted by a fatal signal, and the NFS client code was converted to use it in https://lkml.org/lkml/2007/12/6/329 . So the end result is that the process can be killed with Ctrl-C (as long as it hasn't installed a non-default SIGTERM handler), but doesn't need to handle EINTR for all I/O syscalls.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: