SQLite As The MP3 Of Data

jeffesp · on April 15, 2009

> Having access to that code allows me to tightly integrate it with Excel, so much so, that I can use Excel functions (built-in functions, VBA user-defined functions and 3rd party add-in functions) directly from SQLite’s SQL; and vice-versa, access SQL functionality via Excel "formula" calls.

This sounds really cool and is available here: http://blog.gobansaor.com/projects/xlite/

rarestnews · on April 15, 2009

I don't really work with SQLite too much (I mainly work with mySQL and in-memory-structures, like hashes), but here are a few observations that kept me from using it:

1. It uses a lot of IO unnecessarily. (Actually, necessarily, due to their motto, but 99% odf developers aren't aware that this is not necessary) SQLite-based applications often make my hard-drive sqeak like it's on a last spin - SQLite does A LOT of hdd activity (Chrome, Firefox, Adwords editor are notorious for that). It's SQLites motto - 100% consistency first! journal everything!.. but... that's my HDD and I don't want it to be dead because SQLite thinks it's great idea to write every byte it receives to insure me against 0.01% chance of data loss.

2. Head-to-head runs of mySQL vs SQLite in real-world applications - well, SQLite loses badly.. and I mean badly. I use a lot of hash-style databases (i.e. id-value storage, 2 columns only) in some of my tests SQLite lost with 10 times worse performance against mySQL (again, if they didn't try to record everything, like mySQL does with MyISAM - they would've got the chance).

That being said - in-memory SQLite tables are very performant! Very! But... almost nobody uses that. As a result - Chrome/Firefox downloads phising update and my HDD lights like a christmas tree (download Filemon from SysInternals and watch how much Chrome or other SQLite-based engine uses your HDD for simple data that MUST be in-memory... or at least used by in-memory cache, like in MySQL... so your users' HDDs will eventually die because of this). I know it's none of developers troubles to think about the fact that their software unnecessarily uses much IO and leads to early hard drive replacement, but still, that's a big reason why I avoid using SQLite.

I mean, there are a lot of in-memory structures that can (and should) be used, but once the developer sees how convinient SQLite is, he jumps right on it. I mean.. 99% of times SQLite should be used like this - load data from disk to :memory: table (i.e. replicate it to memory), work with :memory: and eventually dump it to drive (like once in a couple of minutes) and dump on "Save" command. But instead every popular program uses disk-based tables. Why? Because it's a lazy way. Who wants to add in-memory pre-caching when you can just open a table from disk in less commands?

It's sad to see the current state of programming. Coming from an area where I had to write programs that would fit into 64KILObytes of memory and 3.5MHZ (that's 1000 times slower than modern single-core processor) - (ZX Spectrum)... and we've had whole huge games in that, working in real-time, with color graphics and almost unlimited galaxies (Elite)... it's just sad to see how the available resources are used these days. People had to use a lot of tricks to make things work with acceptable speeds... And now? I mean it's 3GHZ and gigabytes of data, yet I have to wait 10 seconds for Word (or Firefox) to start up?

My main points are: 1) SQLite is convinient; 2) Disk-based SQLite (which is used by 99% or more pieces of software) is nowhere near the possible performance of in-memory; 3) disk-based SQLite is putting a heck of a load on HDD (fire up FileMon during your Chrome session and be amazed); 4) SQLite is nowhere near one-fits-almost-all solution, like mp3 is; 5) even so it will gain more popularity because it's convenient for developers.

I have MySQL table with 72billion denormalized items (72million rows) and it's in realtime application... and it performs! I can't even imagine what would SQLite be like with that kind of data.

Still, please note that I'm not saying SQLite is bad, I'm saying the default route almost all devleopers take with it has some serious issues and yet nobody seems to care.

pcc · on April 15, 2009

Because people find it useful, it seems its easy to forget it was called SQLite for a good reason.

The original purpose was to have lite embeddable sql for apps where it simply did not make sense to have to set up an additional piece of infrastructure (eg mysql). E.g. as a store for app config, as an "app file format", or even for lite qtys of relational data where setting up a mysql would be too much of a hassle.

Thinking SQLite should compete with eg MySQL on large data sets, complex queries, high concurrency etc (ie all the cases where one would normally expect to set up a separate database server), to me just seems way outside of SQLite's intended purpose.

As to HDD crunching; I venture if SQLite is used for the domains it is intended for (eg as app file format), this should not be an issue if developers think about how they're using it.

E.g. if one is in the habit of doing a tmp-write, fsync, atomic rename for every atomic change an app would usually make to its files, then one would expect significant HDD crunching anyway if one is doing this with extreme granularity.

Ie with SQLite used as app backing store, the question should be what the extent of a transaction is, before one calls commit()

For these intended domains, I'm not convinced that in-memory pre-caching for the sake of batching/deferring writes is the way to do it at all, but rather to think in terms of the commit() corresponding to the point at which the app otherwise would want to atomically update its file.

rarestnews · on April 15, 2009

Many good points! I agree on most part.

However, from what I've seen - even while you do a transaction, SQLite seems to write a "journal" file (which is still I/O and still slow).

I'm also not saying mySQL should be used for app file format (that would be pretty dumb), but rather challenging the original article's premise that SQLite (as mp3) is one-fits-all solution. There are many solutions for many situations and SQLite is far from being universal solution.

mattmcknight · on April 15, 2009

I don't think anyone is saying that it is one size fits all. To extend the MP3 analogy, SQLite data is very portable, and serverless. MySQL is a server (in my experience), so you need to have that process always running, more analogous to a streaming audio server. I think that makes SQLite a very different kind of beast, more like a file format than a database server.

hobb0001 · on April 15, 2009

BTW, not many people realize this, but SQLite has "pragma" commands that allow you to dial back its HDD synchronization and transactional consistency. I have used "pragma synchronous = 0;" many times to greatly improve the performance of my apps.

edgeztv · on April 15, 2009

I second the IO issue - it's alarming to have the HDD always churning when I have Chrome open. Assuming SQLite is the culprit, that's a good reason to think twice about embedding it into an application.

rarestnews · on April 15, 2009

BTW, if you turn off phishing protection - that lowers HDD usage significantly.

(But, reasonably enough, by "turning off phishing protection" you have be more attentive to what you're doing online)

phenylene · on April 15, 2009

What version of Chrome are you running?

jcromartie · on April 16, 2009

SQLite hits the disk on each transaction. It's up to the app developer to group their operations efficiently. Most don't. Did you make good use of transactions in your tests?

In this respect I feel like SQLite is also the OpenGL of data... not just the MP3. It's up to you to understand the correct methods of using it that result in acceptable performance.

henning · on April 15, 2009

If you have a database that's just hash style, why not stick it in a DBM-style database and get a major performance increase?

rarestnews · on April 15, 2009

The most popular is BerkeleyDB and it's license isn't very friendly for commercial apps. Rest of what I tested are really slow.

Also, most of DBM-style databases suffer from same flaws as disk-based SQLite - they don't use memory cache. I prefer to have slight chance of losing my data, but having 10-100 times faster performance on most queries (RAM vs HDD is often 1:100), rather than 100% consistency and slowness.

Last, but not least, hash-style mySQL is easily distributed and cached.

wanderingmarker · on April 15, 2009

Have you had a chance to test Tokyo Cabinet?

rarestnews · on April 15, 2009

No, actually this is the first time I hear about it. Thanks for a pointer - I'll look into that.

Although the LGPL license seems like a downer - if I understand it correctly it means that either I should distribute my software as open source or compile that particular library into some kind of external .dll, right?

ibsulon · on April 15, 2009

It's set up as an external system anyway. You'd have to do extra work to hack it in. Otherwise, that is true.

rarestnews · on April 15, 2009

The lib seems impressive. Anything which Python's bindings can do 65K inserts in 110ms deserves a lot of attention!!

Now I just have to attempt to make it's Python bindings compile under Windows :( It's hacking time!

Thanks for the tip!

aaronblohowiak · on April 15, 2009

Download VirtualBox and have the best of both worlds!

andrewf · on April 16, 2009

sqlite has a page cache but its default settings are a bit weak.

I suspect that what you want to do is open the sqlite database in exclusive mode (so that it grabs and holds a lock over the db, and should be able to assume that cached data has not changed) and increase the size of the page cache from its default of 2000k. See http://sqlite.org/pragma.html (the cache_size and locking_mode pragmas).

jf · on April 15, 2009

Did I miss something obvious?

How does one configure Excel to use SQLite?

rodrigo · on April 16, 2009

http://blog.gobansaor.com/projects/xlite/

I've used sqlite from MS Access, using sqlite ODBC driver; just for fun though, havent done anything useful with it yet.

MrRage · on April 16, 2009

I don't know how, but it's easy to imagine Excel pulling data from DBs using some sort of interface (ADO maybe?). All you need is some code the implements that interface for SQLite.

henning · on April 15, 2009

Can someone refresh my memory of how SQLite handles concurrency? Doesn't it block on writes or something?

For a long time I've thought a spiritual successor to Access, Filemaker, etc. based on SQLite would be a great idea.

fendale · on April 15, 2009

I think it blocks on write for 'the smallest possible time' (its does as much of the data change preparation as possible before locking the database).

If you are just using it as a single user for generating excel style reports then this doesn't really matter.

I have been thinking for a while that if I had to export a significant amount of data from an app for a tech savvy audience, I would probably send it out in an Sqlite DB instead of XML.

eli · on April 15, 2009

Yes, but it's a little more involved: http://www.sqlite.org/lockingv3.html

senko · on April 15, 2009

By default it does deferred transactions (exclusive lock is only acquired when you need to commit), but if you use insert/update/delete outside a transaction, each statement is it's own transaction.

Locking is database-wide, although IIRC there's some branch that can do table-level locking.

jmtulloss · on April 15, 2009

As far as I know, SQLite's version of "table-level locking" is to put each table in a different database file and ATTACH them all to one database connection.

stcredzero · on April 15, 2009

To be analogous to MP3, it would need:

    - Enabled distribution by overcoming bandwidth issues
    - Has been superseded by superior technology,
        but the majority think it's good enough already
    - Is seen as "open" even though it's proprietary
    - Because of the above, has become a defacto standard

How many of the above does SQLite fit?

jacoblyles · on April 15, 2009

This actually might be good for a problem I ran into today. I'm distributing some source code with a large data file, one that we would like to store in a SQL database for ease of use. However, we don't know that the user machines will have MySQL installed. Would it make sense to use SQLLite in this instance? It sounds like you don't have to install it.

jf · on April 15, 2009

You do need to "install" SQLite, but not in the traditional sense. Most likely all you need to do is include a library that knows how to read and write to SQLite files. Worst case, you'll need to ship your code with a small static binary and call out to that.

vicaya · on April 15, 2009

Yeah, SQLite is as lossy as MP3 :)

FWIW, I've lost some of my mails at least twice in Apple Mail due to corrupt SQLite files, but that maybe due to OS X's crazy fsync implementation.

budwin · on April 15, 2009

different tools for different tasks. as far as setup time goes, sqlite wins by a long shot.