Organizing Files (2005)

gabesullice · on Aug 4, 2015

I love these kinds of old blog posts.

I admire the approach and dedication, but it feels a bit redundant. Aside from the future dated directories, some creative use of the find command could achieve a lot of this.

The problem really reminds me of inheritance vs. composition. Like inheritance, file hierarchy encourages narrow classification and makes shared traits difficult to manage. Tagging for file organization seems like a much more elegant solution, letting an article's or file's categorization be the sum of its basic properties. Services like Pocket have adopted that approach for a reason, I think.

I haven't looked, but I'd love to see a tag-based file system tool. One could easily imagine having a nested tree of tags where one would be able to navigate down to a file via multiple directory paths.

Edit: Evidently, this is called a semantic file system: https://en.wikipedia.org/wiki/Semantic_file_system and Tagsistant looks a lot like what I was imagining: http://www.tagsistant.net/

ics · on Aug 4, 2015

TMSU (http://tmsu.org/) and git-annex (https://git-annex.branchable.com/) also allow for different tag-based filing methods. OS X has had native file tagging for at least a version or two now as well, though it's less helpful with many tags.

yummyfajitas · on Aug 4, 2015

Something I'm finding handy, at least for one specific purpose, is emacs org mode. I have a big file, papers.org. It looks like this:

    * Computer science       :computer_science:
    ** [[~/org/papers/Category_Theory_Applied_to_Functional_Programming__cain-screen.pdf][Category Theory Applied to Functional Programming]]
       :PROPERTIES:
       :original-source: http://www1.eafit.edu.co/asr/pubs/cain-screen.pdf
       :END:
    ** [[file:papers/bitcoin.pdf][Bitcoin]]    :bitcoin:
    Original paper.

All the papers go into ~/org/papers/. I find stuff with text search and tags.

This, combined with org mode task management, is actually making me pretty good at queueing and reading new papers.

leni536 · on Aug 4, 2015

For managing papers I use Zotero. It has some nifty features, like recognizing the paper that you import and filling out all the metadata.

bensummers · on Aug 4, 2015

My company has built a nice business around solving this precise problem, although as a web app for internal use in an organisation rather than for organising a filesystem.

The key principle is to organise things by their description, not their position in some sort of filesystem.

If every field in your description allows multi-values, then you can put an object in multiple places. And if you can search and browse on any field, you can find it by subject, author, kind of information, and so on.

There are a few other clever ideas in there, but that's the key: Help users enter really good metadata, then use it to find things within a flat namespace.
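To make the idea concrete, here is a minimal Python sketch of a flat namespace where every field is multi-valued and any field can be searched. It is only an illustration of the principle, not Haplo's data model or API; the `FlatStore` class, its methods, and the sample objects are all hypothetical.

```python
from collections import defaultdict


class FlatStore:
    """Hypothetical flat namespace: objects are found by description, not location."""

    def __init__(self):
        self.objects = {}              # object id -> {field: set of values}
        self.index = defaultdict(set)  # (field, value) -> set of object ids

    def add(self, obj_id, **fields):
        """Store an object; every field may hold one value or several."""
        description = {}
        for field, values in fields.items():
            if isinstance(values, str):
                values = [values]
            description[field] = set(values)
            for value in description[field]:
                self.index[(field, value)].add(obj_id)
        self.objects[obj_id] = description

    def find(self, **criteria):
        """Return sorted ids of objects matching every field=value criterion."""
        result = set(self.objects)
        for field, value in criteria.items():
            result &= self.index.get((field, value), set())
        return sorted(result)


store = FlatStore()
store.add("report-1", subject=["budget", "research"], author="alice", kind="report")
store.add("memo-7", subject="budget", author=["alice", "bob"], kind="memo")

print(store.find(subject="budget"))                # ['memo-7', 'report-1']
print(store.find(subject="budget", author="bob"))  # ['memo-7']
```

Because `subject` and `author` hold sets, the same object shows up under several subjects and several authors without being copied or linked anywhere.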

It's open source: http://haplo.org

We're hiring developers to work in our London office: http://www.haplo-services.com/jobs

fit2rule · on Aug 4, 2015

This ties into what I've often thought about our computerized world, that it is mostly a matter of semantic differences, similarities, and identities that prove the mettle of any system considered 'productive'. If my semantic set and your set are generally coherent, we can get something done; if they are not, then we spend a lot of time trying to attain semantic equilibrium, before anything actually gets done according to the business purpose.

The tools that allow us to synchronize our own conceptual copy of the semantic universe in a way that produces a 'flow' between individuals are the ones that mostly succeed.

Haplo seems to do a good job of giving a small group the ability to construct a semantic set with a productive goal. I've looked at it for 5 minutes, and the only thing I can think to contribute is that I feel it would be of objective benefit to give more examples of how Haplo has granted a group enough self-awareness to actually get some work done. There is kind of a meta- sense to the product, which can either work for you, or against you. Showing how you can use Haplo to build a small business production/flow-line might make it a little more clear as to what particular problem you are solving.

Another thing is that the web is really quite functional, on the one hand, but boring on the other. I wonder how you might gamify Haplo ... Well, I have some ideas, anyway ..

bensummers · on Aug 4, 2015

Humans are very good at describing the world, and communicating through common vocabulary. The problem comes when you don't model the real world, but instead model your current business processes within a traditional database.

Haplo's data model encourages you to model the real world, using the normal shared vocabulary, then hang business processes on top. This eliminates the semantic problems.

Generally problems come when you try and squeeze your information into what's easy to create with SQL databases and the current crop of NoSQL document stores. We spent the time to build an object store which was capable of handling "information", rather than "data", and it makes an enormous difference.

Regarding examples, we're working on more overview documentation and some example applications.

As a company without investors, funded entirely by revenue from our customers, those customers have to take priority over building example applications. Our aim is to open source the majority of the work we're doing, and hopefully those will be good examples.

But here's a product we've built on top of Haplo: http://www.phd-manager.co.uk

fit2rule · on Aug 4, 2015

>model the real world .. shared vocabulary .. SQL databases .. object store .. good examples.

This is the crux of the challenge, I think. Anybody who doesn't know what a scone is, can't really sell it.

The modeling occurs at a word level, like .. as a dictionary .. and everything else is just baggage. Get everyone on the same page .. of the dictionary .. and you get a working group. Isn't software secondary to human interaction?

perfunctory · on Aug 4, 2015

Is there a hosted version?

bensummers · on Aug 4, 2015

Yes! But it's priced for enterprises, rather than individuals, I'm afraid.

TheLoneWolfling · on Aug 4, 2015

I want a DB-as-FS.

These all have something in common: namely, that files must be stored in one place. But, as he discovered, files aren't organized by only one thing.

I want a FS where I can tag things. Effectively, a D(A?)G, not just a tree.

Links help some, but nowhere near enough. Especially with the quirks of some things trying to handle links.

networked · on Aug 4, 2015

>I want a DB-as-FS.

So do I. From Project Xanadu to BeOS to WinFS, it has been a recurring idea in computing, and for a good reason, I think. However, as far as I am aware, no popular implementation of it exists for Linux, the BSDs, Windows or OS X.

In particular, I have long wanted to implement a tag-based file system. Tagging should be easier to implement than a full-on DB-as-FS and, importantly, it would be easier to interop with existing file systems and tools that talk to them.

My design ideas for it so far are as follows:

You can map each tag to a directory at the tagging file system's mount point. Each of these N directories would then contain N-1 subdirectories, one for each of the remaining tags, to allow you to select files that have two or more tags. For example, the files with the tags "a" and "b" would be accessible through /tagfs/a, /tagfs/b, /tagfs/a/b and /tagfs/b/a.
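As a rough sketch of that mapping (in Python, with a hypothetical `tag_paths` helper and the /tagfs mount point from the example), every ordering of every non-empty subset of a file's tags becomes a directory path under which the file appears:

```python
from itertools import permutations


def tag_paths(tags, mount="/tagfs"):
    """List every directory path under which a file carrying `tags` would appear."""
    unique = sorted(set(tags))
    paths = []
    for depth in range(1, len(unique) + 1):
        # Each ordered selection of `depth` tags is one nested directory path.
        for combo in permutations(unique, depth):
            paths.append(mount + "/" + "/".join(combo))
    return paths


print(tag_paths(["a", "b"]))
# ['/tagfs/a', '/tagfs/b', '/tagfs/a/b', '/tagfs/b/a']
print(len(tag_paths(["a", "b", "c"])))  # 15
```

The number of paths grows quickly with the number of tags on a file, so a real implementation would presumably compute these directories on lookup rather than materialize them.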

In contrast to a DBFS accessed through, say, "/dbfs/SELECT * FROM .../" it would be possible to use "ls" to get the list of all the tags, to apply the POSIX permission model in a way that made sense and suchlike. E.g., the permissions of /tagfs/a/b/c could be an intersection of those for "a", "b" and "c".
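One possible reading of "intersection" here is a bitwise AND of the POSIX mode bits, so that a permission survives only if every tag in the path grants it. The sketch below assumes that reading; the `intersect_modes` helper and the per-tag modes are made up for illustration, and ownership is ignored entirely.

```python
import stat
from functools import reduce


def intersect_modes(modes):
    """Bitwise AND of permission bits: a bit is kept only if every mode has it."""
    return reduce(lambda a, b: a & b, modes, 0o777)


# Hypothetical permissions assigned to each tag directory.
tag_modes = {"a": 0o755, "b": 0o750, "c": 0o775}

mode = intersect_modes(tag_modes[t] for t in ("a", "b", "c"))
print(oct(mode))                            # 0o750
print(stat.filemode(stat.S_IFDIR | mode))   # drwxr-x---
```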

One problem with this approach is in how ordinary (non-tag) directories would interact with the directories that represent your tags. Not distinguishing them for the user would create a potential for confusing misfiling errors and data loss on deletion. Distinguishing them by giving the tag directories special names (e.g., ones that begin with a sigil) or permissions would limit the system's power. Extended attributes are not easy to visualize in most GUIs, etc.

bigbugbag · on Aug 4, 2015

My experience with organizing files taught me not to consider the user home directory as owned by the user. The home directory is littered with lots of files that most software stores there, so my user data goes into its own subfolder.

Then I sort my files in directories with an unusual scheme. The first directory level is the importance of the files to me:

/buffer: a space for file copies as I work on them, and for temporary files

/collect: for new files

/datalibrary, /databank, /datastore, /datakeep: these separate data according to its importance to me. For example, the keep is smaller, encrypted, and automatically backed up every day.

The second level of directories is the action related to the data. For example, /listen will receive audio files, /watch video files, and /look pictures. Other examples include /archive, /customize, /play, /install.

Then, depending on the content, there is a sorting scheme where I sort by genre, by theme, by name, and so on. For example, if a picture is worth keeping, it could go in /by_genre/hispeed or /by_genre/tilt-shift, or in /by_theme/futurama or /by_theme/space, or /by_name/choi xoo ang.

Now that I learned of TMSU, tagsistant and the like, I'm gonna try to make use of those.

mfisher87 · on Aug 5, 2015

>The home directory is littered with lots of files that most software stores there, so my user data goes into its own subfolder.

This is a really good idea, going to try it!

Can you go into more depth about your first tier and sorting process? I'm most interested in how you use datalibrary, databank, datastore, datakeep.

leni536 · on Aug 4, 2015

Hard or sym links exist for a reason. One doesn't need to sort everything into only one suitable category. It's not like I have too much to brag about though, my home directory is a mess too. However, sorting everything by date in the filename? Files already have date metadata, I use "ls --sort=time | head" a lot.
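For anyone who wants the same thing from a script, here is a rough Python equivalent of that pipeline. The `newest_files` helper is hypothetical; like plain ls it skips dotfiles and does not follow symlinks, and like head it defaults to ten entries.

```python
import os


def newest_files(directory=".", limit=10):
    """Names of non-hidden entries in `directory`, most recently modified first."""
    with os.scandir(directory) as it:
        entries = [e for e in it if not e.name.startswith(".")]
    # Use the entry's own mtime (no symlink following), newest first.
    entries.sort(key=lambda e: e.stat(follow_symlinks=False).st_mtime, reverse=True)
    return [e.name for e in entries[:limit]]


for name in newest_files("."):
    print(name)
```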

fsiefken · on Aug 4, 2015

I used to use DevonThink on OS X for organizing my info; it was excellent for archiving and searching through documents. Unfortunately it doesn't work on Linux, so I am now using an org-mode wiki, projectile, and the platinum searcher: just search and good naming conventions, no tags. Second rate, but I can use it anywhere (including my phone).