How to write a filesystem in 50 lines of code (ksplice.com)
129 points by ebroder on July 8, 2010 | 28 comments



While this is a very interesting article, I think it would be great if people stopped using "How to do x in y lines of code" titles - it's just as bad as "x things about y" titles. It's a lot better when the title actually describes what the article is really about.

It would be awesome if this was added to the FAQ or something.


Although I think they're overdone, I think "x in y lines" is very relevant to the hacker community, while "x things about y" isn't always. Doing "x in y lines" generally implies an unusually elegant solution that could be of great use.


It's such an easy metric to game, though ("The first line is 'import library_that_does_everything'") - It's a proxy for succinctness, at best.

That said, "[something non-trivial] in only 250-500 LOC" is often more interesting than "...in only 5 LOC", because generally anything that small is just gluing together libraries. There are exceptions, of course - I've seen impressive stuff in just a few lines of J/K/APL.


In this case, the tool that he built (RouteFS) is actually a really clever abstraction above two other popular libraries (Routes and FUSE).

I agree it would be neat if he talked more about RouteFS itself, but it is a neat idea, and only about 200 lines.


"How to do x in y lines of code", where x is perceived to be something very difficult and y is a small number, encourages people to try things that they would otherwise be too intimidated to try.


I'd usually agree with you. But, since this is about building filesystems, something I thought was reserved for kernel hackers and a truckload of C, I think the LOC title is appropriate. I would have still clicked, but would have expected something way over my head.


Youch! Beware, there are hidden pitfalls here.

I tried the fuse-python xmp example and accidentally mounted the filesystem on top of an existing folder with the result being that the folder vanished. Now if I try to cd into that folder I get an input/output error. It doesn't show up on directory listings. And umount says it's not mounted. I have a backup so this is not an unmitigated disaster, but the folder is huge (it's my development folder) so I'd much prefer not to have to do a restore. This is MacFuse 2.0.3 running on Snow Leopard. Any suggestions on how to proceed would be greatly appreciated.


Your data is fine -- it's hidden while something else is mounted on top of the folder.

Rebooting will fix it. But you should be able to just unmount the filesystem, which will also fix it. Does 'mount' list something mounted there? Does 'df /path/to/folder/' reflect the FUSE filesystem rather than the underlying filesystem? The input/output error suggests the FUSE filesystem is indeed mounted, and just broken.
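
A rough way to run the same kind of check from Python is sketched below. This is only an illustration, not something from the article: the function name `probe_mount` is made up, and the test it applies (a directory whose device ID differs from its parent's) is a heuristic that will miss some cases, such as a bind mount on the same device. It also reports the error when the directory can't be examined at all, which is what a wedged FUSE mount tends to look like.

```python
import errno
import os


def probe_mount(path):
    """Report whether `path` looks like a mount point.

    Returns "mount point", "ordinary directory", or "error: <ERRNO NAME>".
    Heuristic only: a directory whose device ID differs from its parent's
    (or which is its own parent, like "/") is treated as a mount point.
    """
    path = os.path.abspath(path)
    parent = os.path.dirname(path)
    try:
        here = os.lstat(path)
        above = os.lstat(parent)
    except OSError as exc:
        # A broken FUSE mount typically fails right here, e.g. with EIO.
        return "error: " + errno.errorcode.get(exc.errno, str(exc.errno))
    if here.st_dev != above.st_dev or here.st_ino == above.st_ino:
        return "mount point"
    return "ordinary directory"


if __name__ == "__main__":
    print(probe_mount("/"))
```

If the call comes back with an error rather than either of the other two answers, the directory itself is probably fine and the thing mounted on top of it is what's failing.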


> Your data is fine -- it's hidden while something else is mounted on top of the folder.

I thought that was the case. But the truth is I don't really understand how mount points work under the hood. Where is the mount point information actually stored? I'm guessing it's in kernel memory somewhere, but I don't really know, and Googling hasn't been much help. Is there an article somewhere that explains this?

> Rebooting will fix it.

Good to know.

> But you should be able to just unmount the filesystem

Yeah, you'd think. But everything I try to do to that directory results in "Input/output error".

> Does 'mount' list something mounted there?

Yes:

Python@fuse0 on /Users/ron/devel (fusefs, nodev, nosuid, synchronous, mounted by ron)

> Does 'df /path/to/folder/' reflect the FUSE filesystem rather than the underlying filesystem?

  [ron@mickey:~]$ df devel
  df: devel: Input/output error
  [ron@mickey:~]$ umount devel
  umount: devel: not currently mounted
  [ron@mickey:~]$ ls devel
  ls: devel: Input/output error
You can hopefully see why I'm puzzled. I would really like to understand why this is happening.


Figured it out. umount wanted an absolute path.
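
For anyone scripting this, a minimal Python sketch of the same fix follows, assuming (as reported above) that umount was happy once it got an absolute path. The helper name `umount_command` is hypothetical, and the sketch only builds the command line instead of running it.

```python
import os


def umount_command(mountpoint):
    """Build an umount command line using an absolute mount point path."""
    # abspath only manipulates the path string (plus the current working
    # directory); it never stats the mount point itself, so it still works
    # when every access to that directory fails with an I/O error.
    return ["umount", os.path.abspath(mountpoint)]


if __name__ == "__main__":
    # Print the command rather than running it; hand the list to
    # subprocess.run(...) once it looks right.
    print(" ".join(umount_command("devel")))
```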


Have you tried `fusermount -u` on the mountpoint? That's the fuse-specific command to force an unmount, which might clean up the FUSE mount even if the OS's VFS layer doesn't think there's a mountpoint there.


I'm running MacFuse. The fusermount command is specific to Linux.


Time to get a real OS, it sounds like.


Have you tried rebooting?


Not yet. I was hoping to find a less drastic (and more educational) fix. I'm also a little worried that rebooting might make the situation worse since I don't really understand how FUSE works under the hood.


I know of a few professional console games that were successfully debugged by embedded telnet servers: it is incredibly useful to be able to cd and ls around in your scene graph. I believe the ones I saw implemented their own shells and toolset, but I bet you could get pretty clever with debug builds that expose a decent set of unix tools across a remote shell.
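
To give a feel for what "cd and ls around in your scene graph" could look like, here is a toy Python sketch. It is not taken from any of those games: the `SceneShell` class is invented for illustration, the scene graph is just a nested dict, and the telnet or remote-shell transport is left out entirely.

```python
class SceneShell:
    """Tiny cd/ls shell over a nested dict standing in for a scene graph."""

    def __init__(self, root):
        self._root = root
        self._cwd = []           # path components from the root

    def _node(self, parts):
        node = self._root
        for part in parts:
            node = node[part]    # KeyError if the path does not exist
        return node

    def cd(self, name):
        if name == "..":
            target = self._cwd[:-1]
        else:
            target = self._cwd + [name]
        if not isinstance(self._node(target), dict):
            raise NotADirectoryError(name)
        self._cwd = target

    def ls(self):
        return sorted(self._node(self._cwd))

    def pwd(self):
        return "/" + "/".join(self._cwd)


if __name__ == "__main__":
    scene = {"world": {"player": {"x": 1.0, "y": 2.0}}, "hud": {}}
    shell = SceneShell(scene)
    shell.cd("world")
    print(shell.pwd(), shell.ls())
```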


Disclosure: author of Dropfuse fs http://github.com/arekzb/dropfuse . I wrote it to solve a personal need for fetching shared files by command line.

Yes, FUSE is indeed fun and offers a low barrier to entry for writing a user-space filesystem. Many projects that are based on it are listed on their wiki page as well and a lot of them have source code available.


It's an interesting project. I grabbed it and wrote a quick Flickr FS. It seems that it's a little limited for really laggy webservices in that you specify the type of an entry (file or directory) by the type of data you return, and it calls the same method for readdir, getattr, and read, with no way to differentiate in your code.

In order to specify the type of an entry, you have to return the data for that entry. So a sub-folder has to return an array. There's no way to differentiate between, say, an ls on the parent folder or on the sub folder (e.g. ls / and ls /foo both return the method mapped to /foo), so you have to query all of the subfolders' contents at once AND cache it so it doesn't have to be re-queried when the user wants to look at the sub-folder.

Hopefully I'm overlooking something, but the source is pretty straightforward. The good part is it'd be easy to modify to ask for types separately. Actually just passing in another argument indicating what mode it's in would help.
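
The caching half of this can be sketched in a few lines of Python. This is an application-level sketch, not part of RouteFS: `ListingCache` and its parameters are invented names, and the 30-second lifetime is an arbitrary assumption. The idea is simply that the slow web-service call for a given path runs at most once per time window, however many times the filesystem layer asks for it.

```python
import time


class ListingCache:
    """Cache directory listings for `ttl` seconds, keyed by path."""

    def __init__(self, fetch, ttl=30.0, clock=time.monotonic):
        self._fetch = fetch      # slow callable: path -> list of names
        self._ttl = ttl
        self._clock = clock      # injectable so the cache can be tested
        self._entries = {}       # path -> (expiry time, listing)

    def get(self, path):
        now = self._clock()
        hit = self._entries.get(path)
        if hit is not None and hit[0] > now:
            return hit[1]        # still fresh: skip the slow call
        listing = self._fetch(path)
        self._entries[path] = (now + self._ttl, listing)
        return listing


if __name__ == "__main__":
    calls = []

    def slow_fetch(path):
        calls.append(path)       # stands in for a web-service request
        return ["photo1.jpg", "photo2.jpg"]

    cache = ListingCache(slow_fetch)
    cache.get("/sets/holiday")
    cache.get("/sets/holiday")
    print("remote calls made:", len(calls))
```

A handler for a sub-folder would then return `cache.get(path)` instead of hitting the service directly, so both the parent's listing and the later look inside the sub-folder are served from the same fetch.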


Yeah, that definitely can be a weakness of RouteFS's style.

My target application was things like automounters, or the low-latency database querying sort of thing I mention in the actual blog post. Since I wanted to be able to have the filesystem structure change as it was accessed, I decided to make any sort of caching entirely an application-layer problem, not a RouteFS-layer problem.

I think it would be possible to extend RouteFS to handle this sort of case more gracefully. One option in particular might be to take advantage of python-fuse's stateful I/O feature (which lets you associate a Python object with open file descriptors in your filesystem [1]) so that reads from the same file don't result in the same lookup over and over again, although this certainly doesn't help for directories.

But in any case, I'd certainly love to see ideas for extending RouteFS so that it's easier to build performant filesystems with it. Submissions in the form of patches are always excellent, but even suggestions for API changes would be welcome - feel free to open an issue on GitHub either way (http://github.com/ebroder/python-routefs/issues).

[1] See "Filehandles can be objects if you want" in http://fuse.cvs.sourceforge.net/viewvc/fuse/python/README.ne... for more information


Intriguing. I hadn't thought of this take on filesystems before.

I'm a big fan of text-based config files but am stuck on a Windows machine at work. I wonder, would it be possible to map the registry to a virtual filesystem I could access from Explorer?


One of the many reasons to prefer PowerShell over CMD: http://powershell.com/cs/blogs/ebook/archive/2009/03/30/chap...


Nice. I'm not too into DOS, but this definitely is an easier solution than creating a VFS for the job. Thanks for the pointer.


DOS != CMD != PowerShell

I used to be a hard-core Microsoft/Windows developer; in fact, I used to work for Microsoft! I was stuck in the Visual Studio sandbox and addicted to graphical tools. It has been a very slow transition, but I'm now addicted to my shell. If you are stuck on Windows, you should force yourself to learn and use PowerShell. And at home, you should install a Unix and force yourself to learn Bash. You'll thank me later.


I mean, I'm a Unix guy so... I know bash. I don't really want to touch PowerShell or C# or .Net or anything of the kind (at work, the first thing I do is start up a Linux VM), but it's interesting to know that there is a way to treat the Windows registry as a filesystem.


Clever. But I find the idea of RouteFS more clever than the toy FS he built...


Applications of the same idea have been around for some time, e.g. Plan 9's file system (where GUI elements are part of the FS), bash's /dev/tcp/<host>/<port>, etc., and indeed the /proc file system seen in Linux and, in a limited way, in Solaris.

Seems like no Linux app framework can be complete without reinventing its own virtual file system, with various syntaxes for paths to e.g. network shares but that are inaccessible when used on the command line, etc.


The point is that a FUSE filesystem is available from the command line and anywhere else, because it is an actual filesystem. RouteFS is a way of taking any virtual filesystem-like tree that you might find useful and making it available to the entire system as a normal filesystem, just as easily as you could describe the tree in any other form.
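
To make the "describe the tree, get a filesystem" idea concrete, here is a self-contained Python sketch of route-style dispatch. It deliberately does not use RouteFS, Routes, or FUSE: the `ROUTES` table, the handlers, and `resolve` are all hypothetical stand-ins. It does mirror the convention described elsewhere in this thread, where the kind of entry is inferred from the type of data the handler returns.

```python
import re


def list_users():
    return ["alice", "bob"]          # a list means "directory"


def user_info(name):
    return "home of %s\n" % name     # a string means "file"


# Hypothetical route table: path pattern -> handler. Not RouteFS's real API.
ROUTES = [
    (re.compile(r"^/$"), list_users),
    (re.compile(r"^/(?P<name>[^/]+)$"), user_info),
]


def resolve(path):
    """Return ("dir", entries), ("file", contents), or None if no route matches."""
    for pattern, handler in ROUTES:
        match = pattern.match(path)
        if match:
            result = handler(**match.groupdict())
            kind = "dir" if isinstance(result, list) else "file"
            return kind, result
    return None


if __name__ == "__main__":
    print(resolve("/"))
    print(resolve("/alice"))
```

In a real FUSE filesystem, something like `resolve` would sit behind the getattr, readdir, and read callbacks, which is why the tree becomes visible to every program on the system rather than to one application.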


I know all that. I was bitching about how it seems like every layer of the software cake on Linux likes to design its own virtual file system.



