
I don't know that I see the practical difference here: a seekable stream and a random access byte array are one very thin abstraction away from each other, with the important caveat that a stream can represent a byte array which is being appended to while you're using it.



As long as only one thread of execution is using it, yes, there’s essentially no difference.

When it’s shared between several, though, seekable streams become annoying (and difficult to impossible to use correctly). When the sharing spans process boundaries, we get the well-known bane of microkernel Unices that TFA describes—you need every file position for every Unix process to be (at least potentially) tracked by a single systemwide “Unix server”. That’s a lot of pain for not a lot of win.

My point was that I can’t think of any non-niche thing that’s naturally a seekable stream—they’re usually either nonseekable or outright random-access. Tape was the natural example when the API was invented, but it is niche now.


But if the stream represents a file that can receive writes, then one way or another you need to keep track of the order in which writes and reads happen - i.e. you need locks - which means you need global state.

If access to a file were truly read-only, then trivially every reader could just have its own seekable FD tracking its position locally (or permission to access the underlying object and open new ones).


From the point of view of the application, yes, you need to coordinate. But that’s a concern for the application. From the point of view of the kernel / device server / however your microkernel OS works, requests for disk I/O arrive in some order, probably update some cache pages or whatnot, then go onto disk’s queue. That’s arguably global, but it feels like a logical extension of the fact that you only have a single physical disk. However you’re going to multiplex it, something like that is still going to happen, and it has little to do with Unix.

What is not a natural extension of the whole thing is that, even on an otherwise completely quiescent system,

  int fd1 = open("foo", O_RDWR), fd2 = dup(fd1);

behaves very differently from

  int fd1 = open("foo", O_RDWR), fd2 = open("foo", O_RDWR);

(imagine these fds are then passed out to other processes or whatnot).

Even with O_RDONLY, these are still not at all the same. Witness the epitome of CLI design that is OpenSSL /s :

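  # assumes $t names a scratch file set earlier in the script (not shown)
  {
          # first invocation consumes the first PEM block from the shared stdin
          openssl x509 -out "/etc/swanctl/x509/$1.pem"
          # each further invocation consumes the next block; the loop ends
          # when openssl fails because nothing is left to read
          while openssl x509 -out "$t" 2>/dev/null; do
                  fp=$(openssl x509 -in "$t" -noout -md5 -fingerprint |
                       sed 's/.*=//; s/://g')
                  mv -f -- "$t" "/etc/swanctl/x509ca/$fp.pem"
          done
  } < "/etc/ssl/uacme/$1/cert.pem"
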
This is how you pick apart a PEM cert bundle using OpenSSL: the shell spawns openssl, which reads a single PEM block from stdin, does its dirty deeds with it, and leaves stdin pointing to the next one, ready to be consumed by the next instance of itself that’s yet to be spawned by the shell. You can’t do that with a per-process file position, trivially or otherwise.

Returning to the application side, imagine you’re making a concurrent B-tree. If one thread wants to write out page X and the other page Y, they have presumably already used some locking to make sure that’ll leave the data structure consistent, so they’re free to just issue their pwrite()s and let them happen in whatever order. On the other hand, if all they have is write() and lseek(), they have to hit a global lock even if X and Y are completely unrelated, because the seek and the write are two separate calls acting on one shared file position.


That's true in serial computations, but not true in concurrent ones - the abstraction now includes mutual exclusion in the stream case, and does not require it in the array case.



