Every BSD system I'm aware of has something called lockf, which may or may not be layered on top of flock (or the other way around).
Note that all of these locking schemes fall down hard on NFS, depending on client and server, regardless of what any man pages say about support. Homework question: How does a stateless protocol like NFS remember when one client locks a file and a different client tries locking it?
The failure mode on NFS can vary from "lock always succeeds, regardless of actual status" to "lock always fails, hanging forever", or my favorite, "locks work, unless your process crashes, and then you can never lock that file again until you reboot the NFS server".
I discovered this the hard way. I replaced it with Apache ZooKeeper, which is much more stable now than it was at the time. It's meant to be a port of Google's distributed lock server, Chubby.
Good article. I don't quite understand this part, though:
Still not convinced [that you shouldn't use mandatory locking]? Man, you really must like punishment. Look, imagine someone is holding a mandatory lock on a file, so you try to read() from it and get blocked. Then he releases his lock, and your read() finishes, but some other guy reacquires the lock. You fiddle with your block, modify it, and try to write() it back, but you get held up for a bit, because the guy holding the lock isn't done yet. He does his own write() to that section of the file, and releases his lock, so your write() promptly resumes and overwrites what he just did.
If you're going to read some data and then potentially write modified data back over what you just read, shouldn't the first step before you even read it be to get a lock on the file or that range? Or, if your calculation might take a long time, at least get a lock right before you write it back, verifying first that you're writing over data that hasn't changed while you were off calculating?
That was my first intuition too. It sounds like a classic race condition, in which case you would simply acquire the highest level of locking (exclusive) before reading the data, to make sure all your accesses to the shared data are synchronized.
But I'm pretty sure I'm misunderstanding what the author was trying to say.
He is talking about mandatory locks, where if I lock the file properly, you cannot do operations on it, even if your code never checks for locks. In other words, it's a "convenience": I can structure my program so as to "guarantee" that I hold the exclusive lock on the file, and even your "buggy" code won't be able to overwrite it, despite never acquiring any locks itself. Seems like the perfect solution, until the madness described in the OP ensues.
Also, mandatory locks are used in Windows. Ever try to delete that virus.exe while it's still running? Yeah.
I think I understand that, but it's still not clear to me how the type of the lock is the cause of the problem: the problem as far as I can see it is that the hypothetical programmer isn't acquiring a lock (regardless of type) before starting their operation and then holding it until completion. Mandatory locking certainly has its issues/annoyances, but in terms of the situation I quoted, if used properly [1], it works fine and there is no risk of unexpected results due to races.
[1] acquire lock for maximum level of operations you might need and hold it all the way through to completion, or to avoid holding a lock on the object for too long, get and release multiple locks verifying as needed that the data hasn't changed each time
Mandatory locks lead people to believe that they are safe. Advisory locks come with no implicit promises, so people know that they only protect against well behaved apps (generally other instances of the same program) and adjust program behavior accordingly.
So it turns out only one sentence near the end really matters:
"I guess lockfiles are the answer after all."
Yup. Don't use the locking APIs. Just use lockfiles and be done with it.
I'd actually go a step further and suggest only using lock directories. Using lockfiles assumes O_CREAT|O_EXCL works properly everywhere (which is probably a safe assumption, but...). mkdir() will return EEXIST if you fail to acquire the 'lock'.
This is apparently because some versions of Windows don't understand shared locks.
Which versions? It seems even Windows 95 had FILE_SHARE_READ and FILE_SHARE_WRITE, which are effectively shared locks (it just didn't have FILE_SHARE_DELETE).
The author is referring to the LockFile API call, which only supports exclusive byte-range locks, vs. the LockFileEx API call, which does support shared byte-range locks, but is only available on the Windows NT variants (NT 4 and above), not the Windows 9x variants (95, 98, and ME).