
That's exactly right. Files don't exist. The "filesystem" is just a big fat KV store.

Somewhere along the way someone decided it'd be a useful abstraction to imagine a hierarchical organization system (directories) on top, so that was glommed on, but it's not real either.

It was never a perfect system, but it worked well enough when most users were fairly technical and had a humanly comprehensible set of labelled data streams.

I appreciate the effort to insulate users from the complexity that has grown up underneath them. The simple fact is that most people fail at large scale taxonomy and organization. It's hard. And it's a lot of work to maintain even if you're good at it. See: library science. So I don't think there is another model that will succeed as well as "files" have.

iOS hides the filesystem, but it's still there obviously. So far all we've seen is insulation for those who need it, as a byproduct of huge control loss for everyone. The other (valuable) byproduct is security.

We haven't found the compromise yet. There might not be one.




It's a little more than a key-value store. The keys are impure.

One of the first issues is that the disk likes to work in terms of blocks, which tend to lend themselves to arrays, preferably of fixed size so that related data is contiguous. This leads to a limit on the number of files in a directory and so nesting directories is one of the easiest ways to contain more files.

But it's also a little more than that. Directories can group semantically related files together. This means their meta-data is in the same directory inode and so can be read in as a group. This creates more efficiency. Chances are that you'll often access many related files at once, even if it is just to list a directory, so it helps the file system to have some structure so that the meta-data of related files is all read in at once. It's an optimization that takes advantage of our own semantic information to structure the data.

This arises from disks being a really crappy way to access data. They are slow and work best with sequential reads of large amounts of data. It really isn't that useful for a persistent key-value store whose meta-data alone may be much larger than the amount of available memory.

But I tend to think of it as a KV store most of the time anyway and often wonder why we have the silly idea of directories.


> I ... often wonder why we have the silly idea of directories.

We want to associate our keys with namespaces.


Yes, but often not just one. Now we can use links (mainly symbolic), but this can be a mess.

Tags may be a better way for an end user to sort out the data. You don't have to lose the hierarchical structures that directories give us. But it may be simpler to think of a file as some atomic stuff that just lies at the root of some disk, and belongs to a number of (sub)categories.


Sounds good, but what uniquely identifies a file? Right now it's path + name.

If I have two files with the same name, one tagged A and the other tagged A and B, are they the same file or not? What if I add a tag of B to the first one?

A directory hierarchy makes this unambiguous.


I think we should use several mechanisms at once to identify files.

Tags. The default mechanism for sorting and searching files. The assumption is, most files are passive data. When sharing a file, its tags should be sent along with it, so the receiving system can propose them by default to its user. Note that one may want to categorize tags themselves (meta-tags?). I'm not sure, but it may be necessary if a given system uses many tags.

Descriptive names. This is the user-facing name of the file. No need for it to be unique. Like tags, a file's descriptive name should be sent along with it.

Locations. It may be of import to know where a given file is physically located. It is cool to transparently access more files when you plug your thumb drive in. It is less cool to forget to actually copy those you need.

Unique keys. Devised by the system and not directly accessible by the user. When a search yields several files with the same descriptive name, or when two files share tags and name and location, the system can be explicit about the ambiguity.

Unique names. Devised by the user. The system checks uniqueness (or, more likely, uniqueness by location). They follow a directory structure convention. Discouraged by default. Their primary usefulness would probably be for system and configuration files, which need to be accessed automatically and unambiguously by programs. May be implemented on top of descriptive names (the system could treat descriptive names that begin with "/" as unique, and enforce that uniqueness).

There. End users would primarily use tags, descriptive names, and locations. With the right defaults, users may actually be able to avoid making a mess of their data. To prevent unwanted access to sensitive system files, the system can by default exclude those files from search results. Typically those both tagged "system", and located on the main drive. Unique names would be for programs, power users, and those who want to fiddle with their system settings (either directly or through a friendly interface). Unique keys belong to the kernel.

So, how does that sound?


Hard to say. It may be brilliant, and may be the Future Of Files, for all I know.

My first reaction, though, is that it sounds a bit confusing to me, and very confusing for novice users.

Right now, Mom understands that "C:\My Documents\bird.jpg" is not the same as "C:\My Documents\My Pictures\bird.jpg". The rule is simple: unique names per folder.

What's the new rule?


This is kind of a paradigmatic change. Right now, the default when dealing with files is to point to them. What I envisioned in the grandparent was to make search the default. Tags, descriptive names, and locations are all search criteria.

In a way, it is more complicated: instead of 0 or 1 file, you now get a whole list. On the other hand, everyone understands search. My hope is, the initial extra complexity would be dwarfed by the ease of sorting and finding your files. Because right now, one or the other is difficult: it's hard (or at least bothersome) to properly sort one's data in a directory tree, but it's even harder to find it if your disk is messy.

Now there are two snags we might hit: first, I'd like to do away with unique names, because they get us back to the old, difficult to manage, directory tree. Second, to have good tags, you have to internationalize them. For music stuff for instance, French speaking folks would like to use "musique", while English speaking ones will use "music". It has to work transparently when they exchange files, or else it would defeat the purpose of default tags. I can think of solutions such as aliases, normalization at download time, or standard tag names that can be translated by the system, but I'm not sure that's really feasible or usable.


I think that all these different types of identifiers might make security a challenge.


Access rights should of course not be tied to identifiers, but to files themselves.


There is a hierarchy of organization structures that are typically used for "content." For a small number of items, a list works best. For example, the home screen is a list. With a medium number of files, hierarchical structures like traditional file systems work best. However, when you reach many, many files, tagging and searching is typically needed.

Files do exist. They are the values in the filesystem KV store, but they are also a schema for the value so that interoperability works. If we followed your logic, apps would not exist either but we all agree they do.


Filenames are the keys, file data are the values, and there's metadata too, encapsulated in the directory data, inode data, etc. But the concept of a file is just a useful metaphor. A file is exactly identical to a directory or an executable, from the KV store's low level perspective. All the handling and interpretation happens up higher.

I disagree that (most) files contain their own schema though. Sometimes the schema is incorporated by reference in the metadata, sometimes in the key name (filename).

Some structured data formats contain a sort of self-descriptive sub-schema, but they always(?) require something higher up the chain to make sense of it. I can't think of any examples where that isn't the case, but I'll leave the question mark on the always because I'd like to be proven wrong!


> when you reach many, many files, tagging and searching is typically needed.

When you reach many, many files, just try to get people to tag them. Just try.


Indeed, you either tag from the get-go expecting to have many, many files in the future, or you give up on ever contextually managing those files outside of large containers.

How do you tag a file when you can't find it to tag it, and when do you tag a file that you've forgotten about?

Ultimately, someone comes up with yet another abstraction that makes it just a little bit easier... Now, if we made every application that wrote a file also tag it meaningfully, and then had meaningful translations, and... oh geez. I normally just delete everything and start over when I realize I have no idea what 90% of the files I just scanned were for. If they were truly important, I would've known what they were. I guess the people with ten bazillion files on a PC are just data hoarders. "But, but, but I'm going to need that report one day!" (Bet you would tag it, now wouldn't you?)


Well, there is such a thing as digital hoarding: http://online.wsj.com/article/SB1000142405270230340470457730...

First link for "digital hoarding", but there is more to read about it.


There are auto-taggers. For example, I use Picard for music files.


The Filesystem != Files.

The File is one of the most important core abstractions of our computer systems. Stdin? Stdout? Network I/O? Etc. All files.


Actually no, they are file descriptors. They provide a unique identifier which, used in conjunction with an API call, operates on the data in, or the metadata of, a data set.

And while it's true that UNIX (and, taken to its logical extreme, Plan 9) used a common API to handle all of the transfers between non-memory-resident data sets and memory, there are many examples of operating systems that use other schemes. One of my favorites, which I got to help build when I was in the kernel group at Sun, was a thing called 'mmap', which assigned addresses in your memory space to locations on disk. The 'magic' happened in the VM HAT layer. That scheme is having something of a comeback on 64-bit processors: since few machines actually have 22PB of RAM, there is plenty of spare address space, and it is possible to do something like:

    struct Stuff *my_stuff;
    my_stuff = mmap(NULL, file_len, PROT_READ, MAP_SHARED, file, 0);
And then be able to make a statement

    if ((my_stuff+100)->is_ready) { ... }
 
And have the operating system translate between my notion of a memory address and the data set's offset to instance 100 of the structures contained therein.

I expect to see more of that in experimental OS code. Something like char *data_file = ChuckOSOpen("some.dat"); which lets me then address data_file like an array of char, so

    for (char *t = data_file; IsValid(t); t++) { ... }
would then iterate over the contents of data_file as 8-bit values until there wasn't any more data and thus no more validly mapped pages. Read and write are simply assignment, seek is simply math, and close is simply unmap.

All without the notion of 'files' but with a notion of 'named data sets' which clearly can be implemented in a number of ways.


> Actually no, they are file descriptors

That is a distinction without a difference.


Without distinction? Only if you look at it simplistically, from the view of a single process. Otherwise:

- Shut down a machine, and all file descriptors are gone. Its files, one would expect, would still be there.

- you can have multiple file descriptors for a single file.

- you can have file descriptors that aren't 'attached' to a file proper (stdin, stdout)


Hmm. Homescreens seem to work well enough for organisation of apps, better than start menu folders anyway. Perhaps file "grids" instead of the traditional hierarchy of folders might work better.


Instantly reminds me of coworkers' desktops filled with icons and documents.


I used to be of the mindset to keep a clean desktop, then I came to my senses and used it as a working space. It's easy to get to from anywhere in the OS as it generally has its own shortcut'd location. Temporary working files just get stored there. Permanent files might have a shortcut to them stored there instead, leaving the real file in a more suitable location. The 'ugliness' associated with such a desktop is meaningless once I realised most of the time the desktop is covered.

The most infuriating thing about Gnome 3 is that it decides for you that the desktop is an unholy place to keep anything, because you're too stupid to figure out how to do things efficiently.


It seems crazy, but it may be more intuitive for some people. People often use muscle and visual memory to remember where things are, not necessarily name or location.

Edit: Plus, you can always add naming and search with it. And Microsoft has an interesting grouping concept in Windows 8 with grouped and optionally named sections, not folders.


Doesn't seem crazy to me at all. It works well for that use case, but it doesn't scale well to hundreds of entries.

Once you get into groupings you're creating the same problems that people have with filesystems (implicit or explicit organization challenges, loss of discoverability, etc).

Search (or something like what we call "search" today) might be the best step forward from here, but you can layer that on top of any other (or no specified) KV metaphor you like.


>It seems crazy, but it may be more intuitive for some people.

It seems crazy, but it may not be more intuitive for some other people. :)


Of course, me included. I like hierarchical organisation.

I suppose you could mix the Windows 8 approach (all groups) with iOS-esque folders, and allow subfolders, and then you have the best of both worlds.


They may be a tolerable layout for your set of apps, which tends to be pretty small, but for organizing all your files it quickly becomes a giant mess.



