I wrote something called LogStore based on the general notion of log-structured data (and only later learned about the work of Ousterhout et al. from the 90s). For what it's worth, I avoided the need to lock the metadata and the database during compaction, so the log can keep growing while compaction runs.
Instead of working through the locked metadata (offset per entry) and rewriting the entries to a new file, the compactor works from the end of the log file back toward the start. Each record descriptor is written after its record data, which lets the compactor read the descriptor first and easily skip over older versions of records it has already processed during the compaction pass.
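A minimal sketch of that layout (field names and widths are my assumptions, not LogStore's actual format): the payload is written first, then a fixed-width trailer, so a scan from the end of the file can read the trailer and jump over the payload to the previous record. Clojure here only because that's what the surrounding discussion uses:

    (import '(java.io DataOutputStream RandomAccessFile))

    ;; Append one record: payload first, then a fixed-width trailer that a
    ;; backward scan can read at a known offset from the record's end.
    (defn append-record [^DataOutputStream out ^long key-hash ^bytes payload]
      (.write out payload)                ; record data
      (.writeLong out key-hash)           ; trailer: which key this record is for
      (.writeInt out (alength payload)))  ; trailer: how far back the data starts

    ;; Read the trailer of the record that ends at byte offset `end`, and
    ;; return where the previous record ends so the scan can continue backward.
    (defn read-trailer [^RandomAccessFile log ^long end]
      (.seek log (- end 12))              ; trailer = 8-byte key hash + 4-byte length
      (let [key-hash (.readLong log)
            len      (.readInt log)]
        {:key-hash key-hash
         :payload-start (- end 12 len)
         :prev-end      (- end 12 len)}))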
Once the compactor reaches the start of the file, it checks whether the file grew (more appends, new data) since the compaction began, and if so starts over from the new EOF, stopping at the previous EOF. This repeats until the compactor fully catches up.
Then the database is temporarily locked, the new metadata is swapped in, and the existing log file is overwritten with the compacted log file.
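The catch-up loop might look something like the sketch below, where compact-region! is a hypothetical helper that walks a byte range backward via the trailers and copies the newest version of each key into the compacted file (reconciling keys that show up in more than one pass is elided):

    (defn compact! [^java.io.RandomAccessFile log compact-region!]
      (loop [low 0
             high (.length log)]
        (compact-region! log low high)      ; backward scan of [low, high)
        (let [new-high (.length log)]
          (if (> new-high high)
            (recur high new-high)           ; the log grew during this pass: do only the new tail
            :caught-up))))                  ; no new appends since the last pass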
The database and metadata are not locked during compaction, at least not to the extent you seem to think they are.
When a compaction is requested, the compact query enters the write queue. When the query reaches the head of the queue, an asynchronous compaction is started on a snapshot of the database at that moment, written to a file in /tmp. A buffer is also added to the database metadata, in which subsequent write queries will be recorded. After this essentially instant operation, the database proceeds to process write queries as normal; while the compaction is ongoing, write queries are appended to that buffer. When the compaction thread finishes writing its snapshot, it inserts into the write queue a request to finalize compaction. When that request reaches the head of the queue, it appends all the buffered queries to the compacted database file and then moves the compacted file from /tmp to the regular database path. So writes are blocked once for an instant and once for however long it takes to append the buffered queries, which shouldn't be long either. Note that reads are never blocked by compaction; in fact they are never blocked at all.
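For anyone who wants the shape of that in code, here is a rough sketch of the sequence, not FleetDB's actual implementation: the write queue is reduced to an agent, and write-snapshot-to!, append-queries!, promote-file!, and apply-query are hypothetical helpers.

    (def write-queue (agent {:records {} :path "db.log"}))

    (defn request-compaction []
      (send-off write-queue
        (fn [db]
          (let [snap   db                           ; persistent value = free snapshot
                tmp    "/tmp/db-compacting.log"
                buffer (atom [])]
            (future                                 ; long-running work happens off the queue
              (write-snapshot-to! snap tmp)
              (send-off write-queue                 ; re-enter the queue to finalize
                (fn [db*]
                  (append-queries! tmp @buffer)     ; brief: flush the buffered writes
                  (promote-file! tmp (:path db*))   ; move the /tmp file over the db file
                  (dissoc db* :compaction-buffer))))
            (assoc db :compaction-buffer buffer)))))  ; essentially instant

    (defn write! [query]
      (send-off write-queue
        (fn [db]
          (when-let [buf (:compaction-buffer db)]
            (swap! buf conj query))                 ; also record it for the compacted file
          (apply-query db query))))                 ; normal write path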
Thank you for explaining how compaction works in greater detail. As I understand it, the "buffer" used for write queries (and more) while compaction is active is in-memory, correct? An ArrayList? http://github.com/mmcgrana/fleetdb/blob/master/src/clj/fleet...
Why not just open a new file at compaction start instead of using an in-memory buffer? When compaction ends, append the newly opened file to the compacted file, then swap in the compacted file as the current log file.
I suppose deciding whether to buffer in memory or on disk would depend on several factors:
1) how much compaction is required and thus how long compaction might take to complete
2) historical write-rate average
3) buffer size threshold
4) compaction time threshold
By thresholding I mean: start buffering in memory and then switch to a file on disk if compaction starts taking "too long" to complete or the buffer in memory becomes "too large".
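As a concrete illustration of that switch (the thresholds, file path, and buffer shape below are all made up for the example, not anything FleetDB or LogStore actually does):

    (import '(java.io BufferedWriter FileWriter))

    (def max-buffered-queries 100000)       ; "too large"
    (def max-compaction-millis 60000)       ; "too long"

    ;; Buffer starts as {:mode :memory, :queries [], :started-at <millis>} and
    ;; switches itself to a temp file once either threshold is crossed.
    (defn buffer-query [{:keys [mode queries started-at spill-writer] :as buf} query]
      (let [too-big?  (>= (count queries) max-buffered-queries)
            too-long? (> (- (System/currentTimeMillis) started-at) max-compaction-millis)]
        (cond
          (= mode :disk)                             ; already spilled: keep writing to disk
          (do (.write ^BufferedWriter spill-writer (pr-str query))
              (.newLine ^BufferedWriter spill-writer)
              buf)

          (or too-big? too-long?)                    ; threshold crossed: spill everything
          (let [w (BufferedWriter. (FileWriter. "/tmp/compaction-spill.log"))]
            (doseq [q (conj queries query)]
              (.write w (pr-str q))
              (.newLine w))
            (assoc buf :mode :disk :queries [] :spill-writer w))

          :else                                      ; still cheap: keep it in memory
          (update buf :queries conj query))))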
Right, there are actually two potential problems here: 1) the initial compaction takes a long time and there is a correspondingly large number of buffered writes to append, and 2) the compaction never finishes, because it hung or died, and then we keep adding to the buffer indefinitely.
I don't think 1) will be much of a problem: the number of writes the db can process in the time it takes to compact even a large db should be small enough that appending them to the compacted file is quick.
2) is more problematic; I'll need to add a timeout-like guard to prevent a runaway write buffer.
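One possible shape for that guard, sketched under the assumption of a run-compaction! helper: race the compaction against a deadline and, if the deadline wins, cancel it so the caller can discard the write buffer:

    (defn compact-with-deadline [db run-compaction! timeout-ms]
      (let [worker (future (run-compaction! db))]
        (if (= ::timeout (deref worker timeout-ms ::timeout))
          (do (future-cancel worker)     ; compaction hung or is taking too long
              {:status :aborted})        ; caller drops the buffer and retries later
          {:status :finished :result @worker})))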
This is more or less exactly like the Redis BGREWRITEAOF command. I wonder whether we both happened to arrive at the same solution independently, or whether you knew about Redis in some way and got the idea from it. I would love to know that a Redis algorithm can be of general use for other DB implementations.
Thanks for your work, and welcome to the in-memory database developers crew ;)
Isn't a major purpose of a DB to handle datasets that are too big for RAM? If the items fit in RAM, isn't speed ~1/1000th the issue it was (i.e., the ratio of RAM to HD access time)?
If everything fits into RAM, you can use my proprietary ArrayDB storage system which has bindings for every non-toy programming language known to man. In fact, if you're a programmer, you already know the API for ArrayDB. No downloads, no installation, minimal investment and training required!
That really depends on the query and the data structures used in the database. You can query large amounts of data on the hard drive very quickly if you store your data and indexes in data structures tuned for the problem. Besides, modern operating systems and databases will use your available RAM to cache the disk anyway, so keeping the whole database in RAM (if it fits) won't help that much.
Clojure is very compelling for implementing a main-memory database; if you use the persistent data structures and concurrency primitives in the right way, you get atomicity, consistency, and isolation essentially for free.
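A small illustration of what "for free" means here (a toy example, not FleetDB code): with persistent maps inside refs, a multi-key write is atomic and isolated via the STM, and readers always see a consistent snapshot without locking.

    (def accounts (ref {:a 100 :b 0}))

    (defn transfer [from to amount]
      (dosync                                    ; both alters commit together or not at all
        (alter accounts update from - amount)
        (alter accounts update to + amount)))

    ;; Readers never block: @accounts is an immutable snapshot, so a concurrent
    ;; transfer can't expose a half-applied state.
    (transfer :a :b 25)
    @accounts  ;=> {:a 75, :b 25}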
Do the index maps contain pointers back to the base records? The examples I saw on the implementation page showed a nested copy of the data; I assume this is just a projection?