Exposing kernel bits via the filesystem makes less sense the more you think about it:
* You have to build a state machine inside the kernel to handle cases like applications reading a file one byte at a time.
* If the API is complex, you get to build parsers (aka, the easiest way to introduce buffer overflows in C) in kernel-mode.
* Programs now have to deal with malicious applications capable of managing mountpoints giving fake results via the filesystem. I could link /dev/random to /dev/zero; how many programs are going to check for that?
* You can't let the program go into a chroot jail if it needs to read the kernel's magic filesystem.
* You have to mingle filesystem access bits with kernel security checks for process capabilities and the like.
It's definitely not a simple interface for the kernel to implement, and, quite frankly, it's much more complex for a security-minded application to poke at the kernel through a filesystem than it is through a syscall.
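To make the first bullet concrete, here's the kind of reader the kernel has to cope with: any /proc-style file must return coherent data even across many one-byte read() calls on the same descriptor. A minimal sketch (/proc/self/status is just one example of such a pseudo-file):

```c
/* Read a procfs pseudo-file one byte at a time.
 * The kernel must keep the content coherent across
 * many tiny read() calls on the same open descriptor. */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void) {
    int fd = open("/proc/self/status", O_RDONLY);
    if (fd < 0) {
        perror("open");
        return 1;
    }
    char c;
    while (read(fd, &c, 1) == 1)  /* one byte per syscall */
        putchar(c);
    close(fd);
    return 0;
}
```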
None of these should actually be a problem in practice.
> * You have to build a state machine inside the kernel to handle cases like applications reading a file one byte at a time.
You need to have this anyway, because regular files exist. Also, the logic is very simple.
> * Programs now have to deal with malicious applications capable of managing mountpoints giving fake results via the filesystem. I could link /dev/random to /dev/zero; how many programs are going to check for that?
Only root could do that. If an attacker has root, there are many more realistic attacks.
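Still, for a program that wants the check anyway, it's only a few lines. A hedged sketch using the conventional Linux device numbers for /dev/random (major 1, minor 8), which may differ on other systems:

```c
/* Verify that an opened "/dev/random" is really the kernel's
 * character device, not a symlink to something like /dev/zero.
 * Major 1, minor 8 is the conventional Linux assignment. */
#include <sys/stat.h>
#include <sys/sysmacros.h>
#include <fcntl.h>
#include <unistd.h>

int open_random_checked(void) {
    int fd = open("/dev/random", O_RDONLY | O_NOFOLLOW);
    if (fd < 0)
        return -1;
    struct stat st;
    if (fstat(fd, &st) < 0 || !S_ISCHR(st.st_mode) ||
        major(st.st_rdev) != 1 || minor(st.st_rdev) != 8) {
        close(fd);
        return -1;  /* not the device we expected */
    }
    return fd;
}
```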
> * You can't let the program go into a chroot jail if it needs to read the kernel's magic filesystem.
You can add the magic filesystem to a chroot jail.
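On Linux that's a few privileged calls at jail setup time. A sketch, assuming a hypothetical jail root at /jail and the usual device numbers (major 1, minor 8 for /dev/random):

```c
/* Populate a chroot jail with /dev/random and a procfs mount
 * before entering it. Must run as root; Linux-specific, and the
 * paths and device numbers here are the conventional ones. */
#include <errno.h>
#include <sys/mount.h>
#include <sys/stat.h>
#include <sys/sysmacros.h>
#include <unistd.h>

int enter_jail(void) {
    if (mkdir("/jail/dev", 0755) < 0 && errno != EEXIST)
        return -1;
    if (mknod("/jail/dev/random", S_IFCHR | 0444, makedev(1, 8)) < 0 &&
        errno != EEXIST)
        return -1;
    if (mkdir("/jail/proc", 0555) < 0 && errno != EEXIST)
        return -1;
    if (mount("proc", "/jail/proc", "proc", 0, NULL) < 0)
        return -1;
    if (chroot("/jail") < 0)
        return -1;
    return chdir("/");  /* don't leave the cwd outside the jail */
}
```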
> * You have to mingle filesystem access bits with kernel security checks for process capabilities and the like.
> > * You have to build a state machine inside the kernel to handle cases like applications reading a file one byte at a time.
> You need to have this anyway, because regular files exist. Also, the logic is very simple.
"State machine" is a bit of a weird term here; it's more to do with "transactional state". This doesn't exist for regular files. If you read a regular file one byte at a time and something else is changing it, there's no transactional guarantee. You might get corrupted data. For that matter, you might get corrupted/half-changed data if you're reading >1 byte at a time. This is fundamental to the Unix file model (as distinct from, say, Windows), and is also the reason that file locking facilities exist.
I think GP was emphasizing the difficulty of having to couple the code that generates the contents of files in /proc with the state of any code reading those files. The reader needs a guarantee that it'll get a "snapshot" of the /proc pseudo-file as of some time (the start of the read? The end of the read? This is arguable). Without that, there's no race-free way to ensure that readers don't get corrupted data that's changed during the read.
This is required even if you can assume (which you can't) that all readers will use a buffer/chunk size larger than the contents of the file, and that the kernel's updates to the data backing the file are atomic.
A state machine is a common way to implement such snapshot/tx guarantees, but the fundamental desired property is a transaction. While I disagree with GP on some other points, that is absolutely a needed, and likely hard-to-get-right, feature in this area.
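For what it's worth, Linux's seq_file helpers in procfs broadly follow this pattern: render the contents into a kernel buffer, then satisfy successive read()s of that open file from the frozen buffer. A userspace sketch of the idea (hypothetical names, not actual kernel code):

```c
/* Snapshot-on-open pattern: generate the full contents once,
 * then serve arbitrary-sized reads from the frozen buffer.
 * Hypothetical handle type; not actual kernel code. */
#include <stdlib.h>
#include <string.h>

struct snap {
    char  *buf;   /* contents frozen at open time */
    size_t len;
    size_t pos;   /* per-open read offset */
};

/* generate() renders the live kernel state into text once. */
struct snap *snap_open(size_t (*generate)(char *, size_t)) {
    struct snap *s = calloc(1, sizeof *s);
    if (!s) return NULL;
    s->buf = malloc(4096);
    if (!s->buf) { free(s); return NULL; }
    s->len = generate(s->buf, 4096);
    return s;
}

/* Every read, even one byte at a time, sees the same snapshot. */
size_t snap_read(struct snap *s, char *out, size_t n) {
    size_t left = s->len - s->pos;
    if (n > left) n = left;
    memcpy(out, s->buf + s->pos, n);
    s->pos += n;
    return n;
}
```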
Another entry for your list: vulnerability to file descriptor exhaustion attacks. This (and the chroot issue) is why OpenBSD promotes their arc4random() interface for generating random numbers rather than /dev/random.
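The failure mode is easy to hit: once a process is at its descriptor limit, open("/dev/urandom") fails with EMFILE, while arc4random() has no descriptor to exhaust. A sketch (arc4random_buf() is native on OpenBSD and available in glibc 2.36+):

```c
/* Getting randomness without a file descriptor: open() can fail
 * under fd exhaustion (EMFILE), arc4random_buf() cannot. */
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

void get_key(unsigned char *key, size_t len) {
    int fd = open("/dev/urandom", O_RDONLY);
    if (fd < 0) {
        if (errno == EMFILE || errno == ENFILE)
            fprintf(stderr, "fd exhaustion: falling back\n");
        /* No descriptor needed; cannot fail this way. */
        arc4random_buf(key, len);
        return;
    }
    if (read(fd, key, len) != (ssize_t)len)
        abort();  /* short read on urandom is unexpected */
    close(fd);
}
```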
That's a useful list, thanks. It's also saddening: almost all of those items are in the "it's hard to implement" category. I get that free software development is hard and self-directed, but it still bums me out that people are willing to toss the baby out with the bathwater.
A couple of those points seem plain wrong/misleading, though:
> Programs now have to deal with malicious applications capable of managing mountpoints giving fake results via the filesystem.
If malicious code has privileges to change the pseudo-device paths for /dev/zero and /dev/random on your system you can easily be compromised regardless. Such code could sniff your network traffic, put junk data into any file (or socket descriptor, since it could probably also install a tap) your program used, and likely also debug/inspect/halt/alter the runtime behavior of the victim process(es).
> You can't let the program go into a chroot jail if it needs to read the kernel's magic filesystem.
This is technically true, but I think it would be possible to develop a convention or standard for chrooted programs that resulted in certain "capabilities" (e.g. "can it use procfs? which parts? how about /dev/zero?") being present in standard locations inside the jail. It is more implementation/standardization work, though, so this isn't a criticism--more of a regret.