Work on SQLite4 has concluded (sqlite.org)
378 points by joewalnes on Nov 7, 2017 | 157 comments



The relevant change:

> This directory contains source code to an experimental "version 4" of SQLite that was being developed between 2012 and 2014.

> All development work on SQLite4 has ended. The experiment has concluded.

> Lessons learned from SQLite4 have been folded into SQLite3 which continues to be actively maintained and developed. This repository exists as an historical record. There are no plans at this time to resume development of SQLite4.

https://sqlite.org/src4/artifact/56683d66cbd41c2e


For context, SQLite4 explored reimplementing SQLite using a key-value store on log-structured merge trees, like RocksDB and Cassandra.

I'd be interested to hear why they stopped. Presumably reimplementing SQL on a KV store was seen as not worth it, when applications that are satisfied with an embedded KV store backend (which is much faster and simpler to write!) already have many options.


I had the chance to hear Richard Hipp talk about SQLite yesterday! He mentioned that the LSM tree storage engine is available as an extension to sqlite3. More specifically, he mentioned that he didn't really get the performance improvements he had hoped for in insertion-heavy use cases.

I think part of this is due to a fundamental limitation of SQLite: it's an embedded database that has to persist data to disk at all times. The design of LSM trees works well for databases with a resident in-memory component, because an LSM is an approximation of just dumping every new thing you see at the end of an unordered in-memory array. This is as opposed to a data structure like a b-tree, where you have to /find/ exactly where to put the data first, and then put it there. That finding step means you're doing a lot of random access in memory, which thrashes all of your caches (CPU / disk etc). LSM trees avoid this thrashing by just dumping stuff at the end of an array. However, this means you have to scan that array to do lookups (as opposed to something easier like binary search). Then, as your array gets big, you merge and flush it down to a lower "layer" of the LSM tree, which is slightly bigger and sorted. And when that one fills, you flush further. These merge-flushes are nice big sequential writes, so that's nice too.

Anyway, with SQLite, the highest layer of your LSM tree would probably (this is conjecture) have to be on disk, because there is no server component, whereas in an in-memory system it'd probably be in your L2/L3 cache or at least your main memory. So this could be one reason why that model didn't work out as well for them.
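
To make the write/read asymmetry concrete, here's a toy two-level sketch in Python (nothing like the real lsm extension internally, just the shape of the idea: appends are cheap, while lookups and merges pay the cost):

    import bisect

    class ToyLSM:
        """Toy two-level LSM: unsorted in-memory buffer, sorted lower run."""
        def __init__(self, buffer_limit=4):
            self.buffer = []        # recent writes, append-only (fast path)
            self.level0 = []        # sorted (key, value) pairs, "on disk"
            self.buffer_limit = buffer_limit

        def put(self, key, value):
            self.buffer.append((key, value))   # no search needed
            if len(self.buffer) >= self.buffer_limit:
                self._flush()

        def _flush(self):
            # merge-flush: rewrite the lower level as one sorted run,
            # analogous to the big sequential write on disk
            merged = dict(self.level0)
            merged.update(dict(self.buffer))
            self.level0 = sorted(merged.items())
            self.buffer = []

        def get(self, key):
            # reads are the expensive part: scan the unsorted buffer first
            # (newest entry wins)...
            for k, v in reversed(self.buffer):
                if k == key:
                    return v
            # ...then binary-search the sorted level
            i = bisect.bisect_left(self.level0, (key,))
            if i < len(self.level0) and self.level0[i][0] == key:
                return self.level0[i][1]
            return None

    db = ToyLSM()
    for i in range(10):
        db.put(f"k{i}", i)
    print(db.get("k7"))  # 7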


I'm jealous you got to hear Dr. Hipp, that sounds cool. Would love to hear more about the circumstances :)

Regarding the LSM engine, you can find all the relevant implementation details here: https://sqlite.org/src4/doc/trunk/www/lsm.wiki#summary

> The in-memory tree is an append-only red-black tree structure used to stage user data that has not yet flushed into the database file by the system. Under normal circumstances, the in-memory tree is not allowed to grow very large.


I stumbled on this video presentation by Dr Hipp recently (may have been from another HN comment). I really enjoyed it. Probably because he seems so enthusiastic and passionate.

SQLite: The Database at the Edge of the Network with Dr. Richard Hipp

https://m.youtube.com/watch?v=Jib2AmRb_rk


I got lucky! It was a class event, and he's known to speak at databases / data systems classes. He's a very fun (and opinionated!) speaker.

> The in-memory tree is an append-only red-black tree structure used to stage user data that has not yet flushed into the database file by the system.

Hmm, ok, so this contradicts my assumption. Actually, now that I think about it, other LSMs like rocksdb / leveldb work like this too (library-like model with some in-memory component when you "open" the database).

Anyway, without diving into the details of the code, there are other technical decisions that would affect this stuff.

One key thing is how big your in-memory structure is (relative to available memory and the insertion workload) and how often you flush to disk. Another is what your LSM tree looks like: aside from the data structure itself, how many tiers/levels you have is a big thing. I assume some of these are configurable parameters; rocksdb, for example, has an enormous set of parameters that handle this stuff. It's also annoying to tune.
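
For a flavor of that tuning surface, here's a sketch using the python-rocksdb bindings (the option names mirror RocksDB's C++ Options; the values are placeholders, not recommendations):

    import rocksdb

    opts = rocksdb.Options()
    opts.create_if_missing = True
    # size of the in-memory memtable before it gets flushed to level 0
    opts.write_buffer_size = 64 * 1024 * 1024
    # how many memtables may pile up before writes stall
    opts.max_write_buffer_number = 4
    # target size of files in the lower levels (affects merge cadence)
    opts.target_file_size_base = 64 * 1024 * 1024
    db = rocksdb.DB("example.db", opts)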

I found this benchmark here that is illustrative: https://sqlite.org/src4/doc/trunk/www/lsmperf.wiki

The first graph is underwhelming, but once you adjust the buffers (look at the last graph), a constant ~250k writes/second regardless of database size (this is why you want an LSM tree) is darned good! And this is on a spinning drive, not an SSD. Their "large" buffer sizes aren't even that large, IMHO.

So maybe his mention that the LSM storage was underwhelming was overblown :-) I don't know.

Another difference: other LSM-based systems that aren't just key-value stores usually come up in the context of column stores, where you keep a separate LSM for each column family (which could be 1-n columns). But I can't think, off the top of my head, how this would cause a difference. Perhaps in how reads happen - the query engines work quite differently.

Anyway, talk is cheap and I'm just guessing here; actually doing the analysis is the hard work :-) Also, I'm something of an amateur currently, so take my words with a grain of salt. Anyone else have any ideas re: this?



What would be performance for SQLite3 in comparable scenarios? I don't see anything comparative on that benchmarks page.


I've gotten it to do around 40-50k inserts/sec, but that's a different scenario - NFS drive, different tables and indexes, different queries, different configuration, etc. Also, I don't know whether he meant that the inserts were disappointing or the overall results were (e.g. a suite of tests including reads/writes of all sorts).


I just want to hijack this thread to say Hipp's other creation, Fossil SCM, is a great SCM. Better than git IMO. Everyone should check it out.


We've been using it for the last 3-4 years. It's great and really user friendly, with integrated help and a web interface - everything in a single binary. The trouble with it is that it gets slower once you have a lot of history and many files. I don't know if you can use it for huge things like the Linux kernel or the FreeBSD ports tree; I once tried to import the ports tree into fossil and gave up after 2 GB and an hour. It will import anything that can do git fast-export, and now it imports svn dumps as well. Fossil is a very good replacement for SVN. You can set up a central repo where everyone syncs on commit and update.


To be fair, Fossil's intended use case is the exact opposite of the Linux kernel. See 3.3 and 3.4 of the fossil vs git page[0].

[0] https://www.fossil-scm.org/xfer/doc/trunk/www/fossil-v-git.w...


I find most of the differences listed there contrived.

One big difference is that fossil includes wiki and ticketing.

Philosophy differs: Fossil intentionally limits "distributedness". For example, fossil push/pull always transfers all branches with their name. Private branches are considered an anti-feature.

Minor differences are the licence (GPL vs BSD) and the data store (files vs sqlite). Under some circumstances these details matter, but not for the majority of developers.

The rest is not significant, imho. For example, "Lots of little tools vs stand-alone executable". Who cares? In both cases you type "$VCS $SUBCOMMAND $PARAMETERS" commands.


You're right, philosophy differs. I generally dislike private branches. It goes back to the origin of git -- intended for the linux kernel, one of the most widely used open source projects with tens of thousands of contributors. Linus doesn't want or need to see a million private branches. None of the projects I work on are of that scale. When your team is under a dozen people, being able to see what your coworkers are playing with in real time (autosync) is actually incredibly useful.

Stand-alone executable is pretty significant. Git is available on most servers -- fossil is not. If it's packaged in your OS, it's often outdated. Stand-alone kinda makes up for this as you can easily get the latest version with a wget & chmod on any computer, on all 3 platforms.

As for sqlite, it is an astoundingly solid rdbms that is well battle-tested. I consider that a big difference.


Why does everybody persist in calling the great man Dr.?

He refers to himself as D. Richard Hipp.


D. is an initial, but Dr. D. Richard Hipp has a PhD from Duke, granted in 1992 (his thesis can be found here; it's worth a read: http://bit.ly/2ygiDWx ), hence the Dr.


Thank you. Always thought it was a weird comprehension error from the slightly unusual character-combo.

The weird error may have been mine.


Ooops!


I did something like that - an LSM backend for an in-house OO-relational DBMS.

I also hoped for a big win on insertion-heavy loads, and I also didn't succeed at that. The problem is that every insert statement must read something back from the DB to verify the DB state against the schema for correctness. As reads in an LSM are slower, the net win is either absent or negligible. I say "must" because you can sometimes get away without reading back, but not always. In the end, the worst-case scenario is always "read and write", never just "write".

But!

I devised a scheme to lay out each layer's data so that it is as contiguous as it can be, or a very good approximation of that (basically O(1) "pages" per level). Contiguous reads thus got very high performance, and the new scheme beat the old storage on reads, despite the need for level merging, etc.


Perhaps I'm misunderstanding you, but sqlite supports ramdisks/"In-memory databases"

https://www.sqlite.org/inmemorydb.html

Perhaps the key (ha, pun!) is that you're talking about using RAM _and_ disk, with the RAM being for caching/fast access that eventually hits the disk. Whereas, I think, in this case sqlite is either on the disk or in RAM. There are no multiple tiers.

Correct me if I'm off here. Thanks.


> Perhaps the key (ha!, pun) is that you're talking about using RAM _and_ disk with the RAM being for caching/fast access that eventually hits the disk.

This is closer to what I mean, but not quite. Specifically, in-memory (or main-memory) is a technical term for a specific type of database that does its operations /mostly/ in memory, with some spillover to disk as necessary, but only as an edge case. You usually then handle persistence with a sequential transaction log, perhaps in NVDIMM storage if you have some. This is in contrast to other systems where memory is a buffer, but the "actual stuff" happens on disk. There are of course many hybrid schemes - it's really more of a spectrum. Some examples: SAP HANA, ArangoDB, MapD, MemSQL, MS SQL Server Hekaton.

There are also a lot of techniques that come with this style of thinking - often these are column stores, and since these databases don't really hit disk, the bottleneck is moving data in and out of memory to the CPU caches, and sometimes just CPU speed. They often use lightweight compression and SIMD instructions to tackle those problems.

So SQLite's "in-memory" database scheme doesn't quite count as something like this, it's more of a disposable database. But that's ok - it's not bad, it's just a different thing.

> Whereas, I think, in this case sqlite is either on the disk, or in RAM. There is no multiple tiers.

See the thread above - I kind of assumed this was the case, but like you mentioned, it looks like with the LSM implementation it does kind of sort of have a memory component - it lasts until you close the database "connection", and gets flushed often.


Actually in WAL mode with `PRAGMA synchronous = NORMAL`, it only flushes to disk periodically, not after every transaction.
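
For reference, a minimal sketch of that setup from Python's sqlite3 (the pragmas themselves are standard SQLite):

    import sqlite3

    conn = sqlite3.connect("app.db")
    # WAL: readers no longer block the (single) writer and vice versa
    conn.execute("PRAGMA journal_mode=WAL")
    # NORMAL: fsync the WAL only at checkpoints rather than on every
    # commit; a crash can lose the last few transactions but cannot
    # corrupt the database
    conn.execute("PRAGMA synchronous=NORMAL")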


According to a presentation Richard Hipp did at Ohio State, they just didn't get the same level of performance from a log-structured merge tree as from the b-tree in SQLite 3. If I remember correctly, while log-structured merges were able to accumulate writes in memory to consolidate them, there were still a number of reads required to support joins etc. So they determined, based on their testing, that the b-tree based code was just faster. I don't recall the exact benchmarks; they're probably on the web somewhere.


Everything I hear about SQLite3 always suggests that, essentially, it is considered "done". It does what it is supposed to do with great performance. There is nothing major left to do. If it doesn't meet your needs, pick a different SQL database.

Which, while a totally alien concept in the modern software world, is actually a pretty cool thought.

(I'm sure under the hood bugs are getting fixed and all)


Done? Almost every .0 version I checked in https://www.sqlite.org/chronology.html saw some new features.


Well, obviously they should rewrite it in Rust.


Actually, having an alternative implementation would be terrific.

Look how OpenOffice development changed after the introduction of Go-oo, or MySQL after MariaDB, or, more generally, the entire smartphone market after the introduction of the iPhone. Competition is good.

That said, Dr. Hipp is known to be passionate about SQLite. In my previous examples the software was pretty much neglected before the competition came, which is certainly not the case with SQLite.


From the canonical source (as in, [Richard Hipp](https://en.wikipedia.org/wiki/D._Richard_Hipp)):

Rewriting SQLite in Rust, or some other trendy “safe” language, would not help. In fact it might hurt.

Prof. Regehr did not find problems with SQLite. He found constructs in the SQLite source code which under a strict reading of the C standards have “undefined behaviour”, which means that the compiler can generate whatever machine code it wants without it being called a compiler bug. That’s an important finding. But as it happens, no modern compilers that we know of actually interpret any of the SQLite source code in an unexpected or harmful way. We know this, because we have tested the SQLite machine code – every single instruction – using many different compilers, on many different CPU architectures and operating systems and with many different compile-time options. So there is nothing wrong with the sqlite3.so or sqlite3.dylib or winsqlite3.dll library that is happily running on your computer. Those files contain no source code, and hence no UB.

The point of Prof. Regehr's post (as I understand it) is that the C programming language has evolved to contain such byzantine rules that even experts find it difficult to write complex programs that do not contain UB.

The rules of rust are less byzantine (so far – give it time :-)) and so in theory it should be easier to write programs in rust that do not contain UB. That’s all well and good. But it does not relieve the programmer of the responsibility of testing the machine code to make sure it really does work as intended. The rust compiler contains bugs. (I don’t know what they are but I feel sure there must be some.) Some well-formed rust programs will generate machine code that behaves differently from what the programmer expected. In the case of rust we get to call these “compiler bugs” whereas in the C-language world such occurrences are more often labeled “undefined behavior”. But whatever you call it, the outcome is the same: the program does not work. And the only way to find these problems is to thoroughly test the actual machine code.

And that is where rust falls down. Because it is a newer language, it does not have (afaik) tools like gcov that are so helpful for doing machine-code testing. Nor are there multiple independently-developed rust compilers for diversity testing. Perhaps that situation will change as rust becomes more popular, but that is the situation for now.

https://blog.regehr.org/archives/1292#comment-18452


Even if Rust is at parity with C in terms of tooling and ecosystem (which it probably will be in a few short years), SQLite is probably not high on the list of software to rewrite in Rust, given its fairly high quality. Let's rewrite the vulnerability-ridden ones first.


> it does not have (afaik) tools like gcov

You can just use C tooling with Rust, generally; I know kcov has worked for a long time, but it looks like easy gcov support does exist: https://github.com/kennytm/cov


Rewriting in Rust would not make any sense, sure. But rewriting in a faster and safer language, like Pony or ATS, might, esp. for lots of CPUs.


Although I get the joke, it would be a fun and interesting experiment, wouldn't it? Especially since the sqlite test suite is so exhaustive.


There is a version rewritten in C# back in the day - not sure how up to date it is:

https://www.infoq.com/news/2009/08/Porting-SQLite-to-.NET-in...


Because we’ve missed the Rust evangelism strike force so very much.

I honestly hope against hope that you’re being sarcastic.


Check out the changelog...they've added a ton of new stuff. Native json support, completely new full-text search extension, lsm key/value extension, performance improvements. I think they're looking at some changes to locking in the near future as well. Lots of stuff to find if you look.


SQLite is one of my favorite pieces of software for this exact reason, that it is actually a more or less "finished" program instead of a mire of shifting design requirements and constant security updates, a comfortably static and unchanging object against the chaotic backdrop of modern software development.


There is also a project called UnQLite for UnQL (unstructured query language) designed to be an embedded document store, announced in 2011 [1]. Looks like it's now maintained by another company. [2]

1. https://www.infoq.com/news/2011/08/UnQL

2. https://unqlite.org/


Would have been a neat way to experiment around with putting a SQL front end on various KV interfaces. Redis or etcd, for example.


CockroachDB has a good blog post[0] that describes how they implemented SQL. (CockroachDB is a key-value store.)

[0] https://www.cockroachlabs.com/blog/sql-in-cockroachdb-mappin...


>(CockroachDB is a key-value store.)

Correction: CockroachDB is based on a KV store, it's a full SQL RDBMS on top of this KV store.


Yes, that note was misleading. Thanks for clarifying :)


Presto has connectors for various types of non-SQL databases including Redis: https://prestodb.io/docs/current/connector/redis.html

Presto is a distributed SQL query engine for big data, so basically the complete opposite of SQLite, though it often gets used in federation scenarios, as does SQLite.

An interesting anecdote is that the team working on what would become osquery (https://osquery.io/) asked if they could reuse the SQL parser from Presto. We get that question a lot, and after explaining that the parser is the easy part (semantic analysis and execution is the real work), I determined that what they really wanted were SQLite virtual tables: https://sqlite.org/vtab.html (and those worked out great for them)
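
You don't even need to write C to get a feel for virtual tables: the built-in FTS5 module is itself implemented as one. A sketch from Python, assuming your SQLite build includes FTS5:

    import sqlite3

    conn = sqlite3.connect(":memory:")
    # FTS5 is a virtual table module: the "table" is really a full-text
    # index that answers SQL queries
    conn.execute("CREATE VIRTUAL TABLE docs USING fts5(body)")
    conn.execute("INSERT INTO docs (body) VALUES "
                 "('virtual tables expose arbitrary data as SQL')")
    print(conn.execute(
        "SELECT body FROM docs WHERE docs MATCH 'virtual'").fetchall())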


Fairly straightforward to make a mysql storage engine for it, e.g. https://github.com/AALEKH/ReEngine


Scaevolus, I recognise that name. #0x10c-dev on freenode? rmmh? skybot?

I created a logsql.py plugin for skybot, it just logs to the DB instead of to text files. Is that something you'd be interested in merging back in?


Developer of some optimization mods for Minecraft, one of which became OptiFine.


I have to say I learned more about databases from just studying the SQLite code than from any book on the subject. I've bought a bunch of books on DBs, some very expensive ones, but I wish someone had pointed me to the SQLite source early on.

To internalize it better I invented a "project" for myself - http://thredis.org/ which was (and is, but I'm not maintaining it) a Redis/SQLite hybrid. It was fun to hack on.

Another invaluable source of DB internals information is PostgreSQL. Both projects have amazingly well written and detailed comments.


SQLite is one of those awesome things that's the exact opposite of magic. It's beautiful, jaw dropping, engineering that exercises so many technical muscles.

The number of oddball, often critical, places where I've found SQLite being used would defy belief. As far as I can tell, the "expected" place for SQLite to work seems to be almost anything that's not your normal DB driving some web-based CRUD app... all kinds of embedded systems, easy-to-manipulate in-memory scratch pads for bioinformatics, lots of data analysis tools in mobile communications.

It's so good, and so obvious, that I think sometimes it makes other tools that might be simpler fits for many use-cases less likely to be used, like leveldb.


> your normal DB driving some web-based CRUD app

That can totally be handled with SQLite.


I have several CRUD apps running on SQLite. With a moderate write load and good error-handling code for concurrent writes, it works very well. Good when the product's size doesn't justify a full-blown postgres setup.


I second this. I've been using SQLite for some low-to-moderate load CRUD apps, and it has always worked like a charm. SQLite also makes backing up, testing and moving apps so much easier.


How do you work around the fact that SQLite only supports a single writer? For example, your app could be blocked when you run a long operation like creating an index.


For moderate load: just catch the exception and try again. It will rarely happen, and when it does, you can easily recover from it since your site is not hammered.

For bigger load, have a worker that does the writes with a queue.
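
A sketch of both approaches in Python (the retry loop for moderate load, the queue-fed single writer for heavier load; the table and function names are made up for illustration):

    import queue
    import sqlite3
    import threading
    import time

    def execute_with_retry(conn, sql, params=(), retries=5, backoff=0.1):
        """Moderate load: catch "database is locked" and retry with backoff."""
        for attempt in range(retries):
            try:
                with conn:  # short transaction, commits on exit
                    return conn.execute(sql, params)
            except sqlite3.OperationalError as exc:
                if "locked" not in str(exc) or attempt == retries - 1:
                    raise
                time.sleep(backoff * (2 ** attempt))

    def writer_worker(db_path, write_queue):
        """Bigger load: one thread owns all writes, fed by a queue."""
        conn = sqlite3.connect(db_path)
        while True:
            sql, params = write_queue.get()
            with conn:
                conn.execute(sql, params)
            write_queue.task_done()

    writes = queue.Queue()
    threading.Thread(target=writer_worker, args=("app.db", writes),
                     daemon=True).start()
    # request handlers enqueue instead of writing directly:
    writes.put(("INSERT INTO log (msg) VALUES (?)", ("hello",)))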


Hello Sam and/or Max,

I'm French too and I read your blog sometimes ;-)

Is the following what you suggest:

You create an index, and the index creation takes 30 seconds. Then instead of writing directly to SQLite (which won't work since the index creation blocks other writers) you suggest to store the write in a queue (for example another SQLite database for durability), and have a worker apply the write to the main database when the index creation is done?


Unless you want to scale. I'm all for the rule of least power, but as soon as the app is exposed to multiple users I would ask and be sure about the expected number of simultaneous users before going with SQLite instead of a client/server RDBMS. Still, the bound is pretty high if you keep your transactions short.


If you have the correct architecture, multiple users shouldn't matter at all-- I have used SQLite to store and query activity data with something like 10k concurrent "users" without difficulty on a single machine.

I am beginning to suspect that MySQL, PostgreSQL, DB2, Oracle, BigTable, and others allow one to get so far with the wrong architecture, that maybe some even very experienced programmers believe that to go faster they have no choice but more threads.


Can you describe the right architecture?


How do you work around the fact that SQLite only supports a single writer? For example, your app could be blocked when you run a long operation like creating an index.


SQLite has supported concurrent reads and writes since 2010, with the introduction of the "write-ahead log":

https://sqlite.org/wal.html


Yes, but with a single writer at a time.

From the link you shared: "However, since there is only one WAL file, there can only be one writer at a time".


Not if you need to use multiple app servers.


Richard Hipp has said that they have signed contracts to support SQLite3 for 35 years. SQLite4 is never going to happen.


I love this section from their support page. Nothing like stability through 2050.

https://www.hwaci.com/sw/sqlite/prosupport.html

> Paid support options and products are provided by Hipp, Wyrick & Company, Inc., (Hwaci), a Georgia corporation with headquarters in Charlotte, North Carolina and has been in business since 1992. Hwaci has an international team of employees and associates representing the best available talent. We are a 100% engineering company. There is no sales staff. Our goal is to provide outstanding service and honest advice without spin or sales-talk.

> Hwaci is a small company but it is also closely held and debt-free and has low fixed costs, which means that it is largely immune to buy-outs, take-overs, and market down-turns. Hwaci intends to continue operating in its current form, and at roughly its current size until at least the year 2050. We expect to be here when you need us, even if that need is many years in the future.


Dr. Hipp will be 89 years old in 2050 so he simply made a lifetime commitment.


Look at it this way: SQLite3 is (more or less) slowly becoming SQLite4, except for the parts that did not work out.

It is not as shiny, but in the long run, you still get all the goodness. Nevermind the name / version number.


It's basically the Perl 5 of the DB world.


Perl 6 you mean?


I read that as: it gives the impression it's been here forever, it's still being actively maintained, and will be for the foreseeable future, benefiting from the "future" (sqlite4/perl6) at a slow but steady pace. So, really like Perl 5.


perl5 has been stealing a bunch of stuff from perl6 and is still actively maintained and doing a major release with new features annually - also continues to Just Work with an extreme commitment to backcompat.

So I'm pretty sure he did mean perl5, and as a happy user of both perl5 and sqlite the comparison seems apt.


Not at all. perl5 is horrible tech, with no development and being actively destroyed by its maintainers.

Whilst SQLite3 is at the very top of its class, with lots of new features, and very well maintained.


For context, rurban has had a long-ongoing feud with the perl5 maintainers that eventually resulted in his removal from the mailing list for being unwilling or unable to disagree sufficiently civilly to hold an effective discussion when people felt he was technically incorrect.

He's also, OTOH, a technically brilliant developer whose positive contributions will be missed.

(also if you ever run into him at a conference, I'd recommend grabbing a beer with him, I've always enjoyed doing so in spite of our spirited disagreements over various things)


What kind of companies sign 35 year support contracts?


The kind that give away their product for free, give the company to their wife, develop an in-house distributed version control (fossil (which is excellent)), give that away for free and encourage other version control projects to please steal ideas from fossil. drh (Richard Hipp) is an interesting duck.


Presumably Hipp doesn't sign a 35 year support contract with his own company?


Ones that have effectively 100% test coverage on their codebases, massive existing deployments, famous levels of documentation, and world expert level comprehension of their problem space.


SQLite is used for small and embedded systems, which (if successful) can have very long lifetimes. If you were building something like an ATM for instance, you would be very sensible to sign a 35-year support contract for a crucial part of your system.


Have you seen an ATM? They're running Windows & SQL Server now. Microsoft isn't offering 35yr support contracts for Windows.

This is only partially sarcastic.


Imagine a company like Boeing then, which does have a 35 year shelf-life on their products.


Pretty soon we'll see aircraft running Kafka, microservices, elasticsearch, etc.

All running in kubernetes.


At least that way if the plane crashes you can just restart it.


Restart just the microservice that crashed!

  $ sudo service port-flaps restart


"That's not the sort of 'above the clouds' we had in mind"


My stack


Microsoft has nowhere near the quality, security and bug-fix track record, documentation coverage, nor testsuite coverage that SQLite offers. There's simply no way they can possibly provide such support.


There are products that Microsoft offers long support contracts for. To give an example, EOL for XP Embedded is in April 2019, which means XP in all its forms would've been supported for about 18 years.


I read somewhere the other day that sqlite is used somewhere in Airbus A380 passenger jets, and Airbus have a support contract for it.



SQLite was originally developed for missiles. Aircraft sound like a natural extension of that use case.


People who sell or support things with long lives where retrofitting is expensive or impossible.

Example: industrial equipment, military stuff, bridges, aircraft, etc.


I can't even imagine working on the same project for 35 years... let alone how different computing might be after that time.


They rolled with it: they made their own source control system, as evidenced by the page that was linked.


At the same time, given the quality of the code base and the test coverage, I can imagine supporting a code base that I believe in and am extremely proud of, and that is extremely popular and widely used, for 35 years.


We have production IBM iSeries code (AS/400) that has (c) 1977 in a bunch of places. The platform lives...


Every time I see something about sqlite, I become sad. It reminds me of the failure of the W3C standards committee to accept it as a web standard. They rejected sqlite because no competing implementation existed. Furthermore, the "public domain" license of the software was also a hurdle, IIRC.


We literally lost several years of web app advancement because of that. Reading the decision making, it seemed like the work of overly legalistic engineers, but I'm open to conspiracy theories that this decision enhanced mobile app store adoption.


No, it was Mozilla who killed it[1] over Apple and Google's strong objections.

[1] For pretty much complete nonsense NIH and standards-lawyering reasons.


> For pretty much complete nonsense NIH and standards-lawyering reasons.

You consider the fact that there is no spec other than "how the current implementation works" - so there is no way to know if code that works on version n will continue to work on version n+1, and no way to produce an alternate implementation - to be a "complete nonsense reason"?

I also believe that it wasn't just Mozilla, but also Microsoft that scuttled this. Mozilla was the louder voice, but Microsoft never supported it either.


> You consider the fact that there is no spec other than "how the current implementation works" - so there is no way to know if code that works on version n will continue to work on version n+1, and no way to produce an alternate implementation - to be a "complete nonsense reason"?

Yes. Sqlite is among the most stable and mature software in existence. I challenge you to name a single browser technology or API that has been anywhere remotely as stable as Sqlite, especially as implemented. As mentioned elsewhere ITT, Hwaci has contractual commitments to support backwards-compatible Sqlite3 for at least the next 35 years.

At that point, if I had to guess, "web browsers" won't be around any more, the web itself will be a dying or dead legacy technology, and companies yet to be founded will lead the tech sector, while Apple, Google, Microsoft and most likely of all Mozilla will be IBM-style dead letters or Digital[1]-style dead.

Sqlite was primarily written by one person and is maintained by three people.[2] If for some reason Mozilla had to fork or take over maintenance of Sqlite, it wouldn't even be worth noticing in their budget. It would, in fact, likely cost considerably less developer time than they spent designing, implementing and lobbying for IndexedDB.

More likely, if Sqlite3 were no longer being suitably maintained, all the browser vendors, plus the hundreds of other major tech companies that use it, would jointly endow a Sqlite Foundation to take over maintenance, which might cost Mozilla 5 grand or so a year, ye gods.

Even if Mozilla felt compelled for some reason (avoiding dependencies?) to rewrite Sqlite from scratch, Sqlite is 125k lines[3] of ANSI C compared to Firefox's 18 million lines[4] of C++, so Sqlite represents roughly one month of Firefox development time.

Firefox, Chrome, IE/Edge and Safari all make very extensive use of Sqlite internally, including to implement IndexedDB, so, obviously, Mozilla does not actually consider dependency on Sqlite and its API to present a meaningful risk.

Further, as things stand (and as they stood in 2010), Sqlite's massive footprint in the Firefox codebase means Mozilla already must rely on Sqlite remaining maintained, just the same as if WebSQL had been standardized.

> I also believe that it wasn't just Mozilla, but also Microsoft that scuttled this. Mozilla was the louder voice, but Microsoft never supported it either.

MS, especially at the time, ordinarily tried to slow or stall the standardization process for web technologies in general, so IE wouldn't fall too far behind. I expect better of Mozilla.

Mozilla was also by far the loudest (really the only) voice against WebSQL; MS barely participated in the conversation and was mostly noncommittal when it did. It's unlikely MS would've objected at all if Mozilla hadn't given them the opening.

[1] https://en.wikipedia.org/wiki/Digital_Equipment_Corporation

[2] https://sqlite.org/crew.html

[3] https://sqlite.org/testing.html

[4] https://www.openhub.net/p/firefox/analyses/latest/languages_...


I too am so disappointed SQLite isn't available in modern browsers (esp. since it was, for a time). But can't it be resurrected? Couldn't we set up a petition somewhere to bring it back?


It is a bit of a "would you really do that in production?" type hack, but there is a pure JS compile of SQLite3 that you could use: https://github.com/kripken/sql.js/

Possible problems:

* Nearly half an MB of library to add to your project, which might be a concern on mobile (~2.1MB uncompressed).

* It handles the whole DB in memory rather than trying to use any sort of local storage as a block store, which pumps up memory use (again, mobile users may particularly find this an issue), and to persist data you have to pickle the whole DB as a single array (which could be a significant performance issue if the data changes regularly and is not very small) and reload it on the next visit.

* Concurrency between multiple tabs/windows is going to be an issue for the same reason.


Actually, it's still available in Chrome, Safari and Opera. I guess Firefox and IE still don't have it.


And the horrible excuse was that SQL was too complex to be exposed to web developers, so now we have SQL emulation libraries on top of IndexedDB, which is itself implemented with SQLite. Oh yeah!


And they think IndexedDB is better...


IndexedDB is a hellhole, but it fitted the NoSQL trend at the time - that's why it was picked and SQLite dumped as "fossil".

I'd love it if someone takes the time and creates a SQL engine based on IndexedDB so one can do serious SQL work instead of working around the many limitations in IndexedDB.


That is strange, for some reason I was under the impression that all browsers natively support sqlite. I even installed a Chrome extension when I wanted a gui to browse sqlite db's. I am disappointed to hear this.


The biggest thorn I found working with sqlite was the inability to modify columns with ALTER TABLE, which was a real pain.

Doesn't look like this is fixed in sqlite4 though...


There's an easy workaround: rename the table, create new table, insert into/select from.
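
A sketch of that workaround, with a hypothetical users table whose age column we want to retype:

    import sqlite3

    conn = sqlite3.connect("app.db")
    conn.executescript("""
        BEGIN;
        ALTER TABLE users RENAME TO users_old;
        -- recreate with the modified column definition
        CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, age INTEGER);
        INSERT INTO users (id, name, age)
            SELECT id, name, CAST(age AS INTEGER) FROM users_old;
        DROP TABLE users_old;
        COMMIT;
    """)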


The migration tools I work with have this already automated and (having been involved in the design of said automation) it's really not that difficult to add.

Other DBMSes often have to do complete table rewrites for certain types of ALTER anyway - so while, yes, it's faintly annoying to have to do that in userspace rather than having the db engine do it for you, it's not nearly as big a disadvantage as it first appears.


Same here. Had to switch to dockerized mariadb for local tests, because migrations wouldn't work.


Wouldn't you want your tests to be run against the same DB family (and version) as production anyway?


Depending on the situation, it can be well worth it to have your test suite run against SQLite while doing active development to be able to iterate faster and then run it again against the target database before pushing the branch.

Where possible I much prefer to spin up a version of my target database in a tempdir but "faster test cycles" is sometimes worth accepting the trade-offs.


I suppose it was a long time coming. SQLite was very convenient for me while the product was in a very early development stage, with everything changing rapidly.


Why not work around the expectation and simply migrate offline? (e.g. dump the DB, hack the dumpfile/stream, load the new CSV?) While you may lose instantaneous constraint validation, it would almost certainly be faster and allow you to work with known and well-tested tools. Conforms to the Unix design philosophy: "Store data in flat text files." / "Write programs to handle text streams, because that is a universal interface." http://github.com/globalcitizen/taoup
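
Python's sqlite3 even has a built-in helper for the dump half of that (iterdump); a sketch of the offline round-trip:

    import sqlite3

    src = sqlite3.connect("app.db")
    # stream the whole database out as SQL text
    with open("dump.sql", "w") as f:
        for line in src.iterdump():
            f.write(line + "\n")

    # ... hack dump.sql with sed/awk/your editor of choice ...

    dst = sqlite3.connect("app-migrated.db")
    with open("dump.sql") as f:
        dst.executescript(f.read())
    dst.close()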

Since you were nominally optimizing for migration, a zoomed-out perspective may be to note that upgrading SQLite3 versions is trivial/fast compared with upgrading major RDBMS versions, and relatively rarely required. Cohabitation of multiple versions is also a lot easier, and any kind of CI/CD process is going to be orders of magnitude faster and use much less CPU/memory/disk space, which means smaller build artifacts and thus faster transfer/download.


You can add columns at least.


The lack of window functions kills its usefulness for me.



So in essence it's a lot of work for a small functionality gain?

There are some things that are inconvenient to do without window functions, but my primary use for them is simple optimization. In most of the cases where I've used window functions in postgres, the query would have been trivial to rewrite using a subquery instead; the window functions just led to a much faster query that scaled much better over large result sets.
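
For example, against a hypothetical events(user_id, ts, ...) table (the window version needs postgres; the subquery rewrite runs fine on the sqlite3 of the time, modulo tie-breaking on equal timestamps):

    import sqlite3

    conn = sqlite3.connect("app.db")
    # postgres window version for "latest event per user":
    #   SELECT * FROM (SELECT *, row_number() OVER
    #       (PARTITION BY user_id ORDER BY ts DESC) rn FROM events) t
    #   WHERE rn = 1;
    # subquery rewrite that sqlite3 handles:
    rows = conn.execute("""
        SELECT * FROM events e
        WHERE e.ts = (SELECT MAX(e2.ts) FROM events e2
                      WHERE e2.user_id = e.user_id)
    """).fetchall()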


Exactly why I miss them so much.


Reading through the responses just screams: not worth the effort. I know it's an open-source volunteer product, but even with a commercial arm, the lack of window functions (aren't they part of the SQL standard?) is laughable.


There's an excellent ~80 minute podcast interview with the sqlite author here: https://changelog.com/podcast/201


Would love to hear some of the lessons learned...


Their main idea was that B-Trees are slow and LSMs are fast.

This was a partially right assumption, but only for writes.

If you write something in a DB you check some constraints and those checks are reads.

So most DB writes come with a bunch of reads.

The reads were slower with the LSMs, so the B-Trees performed better in "real world" writes (which come with reads) and LSMs only performed better in "artificial" writes (without reads).
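
A concrete example of a write that forces a read (Python's sqlite3; the UNIQUE constraint can only be enforced by first looking up the key in the index):

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT UNIQUE)")
    conn.execute("INSERT INTO users (email) VALUES ('a@example.com')")
    # this second insert cannot be a blind write: the engine must first
    # READ the unique index on email to detect the collision, and on an
    # LSM that point read is the expensive part
    try:
        conn.execute("INSERT INTO users (email) VALUES ('a@example.com')")
    except sqlite3.IntegrityError as exc:
        print(exc)  # UNIQUE constraint failed: users.email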



SQLite is great. For an unusual application see actordb.com - a server side database that uses a large number of independent SQLite databases.


Woah, that's awesome! Any performance information anywhere?


Interesting - I saw the title and thought to myself, hmmm... maybe SQLite4 is just around the corner. This is a good case study to show people that sometimes the classic approach works better, and that NoSQL terms and techniques might only work in limited scenarios. Still makes me wonder if LSM would have been faster for mobile devices, though. I know it might not work well for embedded devices, but with modern mobile devices (1+ GB of RAM) it might have some speed benefits.

Shameless plug: https://github.com/maxpert/lsm-windows (I ported the LSM storage engine to Windows).


Anyone know how sqlite makes money?


Possibly through their testing package (which is proprietary)

https://www.sqlite.org/testing.html

"The TH3 test harness is a set of proprietary tests, written in C that provide 100% branch test coverage (and 100% MC/DC test coverage) to the core SQLite library. The TH3 tests are designed to run on embedded and specialized platforms that would not easily support TCL or other workstation services. TH3 tests use only the published SQLite interfaces. TH3 consists of about 57.3 MB or 782.3 KSLOC of C code implementing 42213 distinct test cases. TH3 tests are heavily parameterized, though, so a full-coverage test runs about 1.7 million different test instances. The cases that provide 100% branch test coverage constitute a subset of the total TH3 test suite. A soak test prior to release does hundreds of millions of tests. Additional information on TH3 is available separately."

https://www.sqlite.org/th3.html

"TH3 License

SQLite itself is in the public domain and can be used for any purpose. But TH3 is proprietary and requires a license.

Even though open-source users do not have direct access to TH3, all users of SQLite benefit from TH3 indirectly since each version of SQLite is validated running TH3 on multiple platforms (Linux, Windows, WinRT, Mac, OpenBSD) prior to release. So anyone using an official release of SQLite can deploy their application with the confidence of knowing that it has been tested using TH3. They simply cannot rerun those tests themselves without purchasing a TH3 license."



>SQLite License. Warranty of title and perpetual right-to-use for the SQLite source code.

<from more info>

Obtaining A License To Use SQLite

Even though SQLite is in the public domain and does not require a license, some users want to obtain a license anyway. Some reasons for obtaining a license include:

    Your company desires warranty of title and/or indemnity against claims of copyright infringement.
    You are using SQLite in a jurisdiction that does not recognize the public domain.
    You are using SQLite in a jurisdiction that does not recognize the right of an author to dedicate their work to the public domain.
    You want to hold a tangible legal document as evidence that you have the legal right to use and distribute SQLite.
    Your legal department tells you that you have to purchase a license.
If you feel like you really need to purchase a license for SQLite, Hwaci, the company that employs all the developers of SQLite, will sell you one. All proceeds from the sale of SQLite licenses are used to fund continuing improvement and support of SQLite.

</from more info>

How is it possible that they can sell licenses to the code that was put into the public domain by other contributors?

A contributor must attach the following declaration[1] to contribute, so their contributions are in the public domain. Now, in a place where the law doesn't recognize the public domain, doesn't the code belong to the original authors? How can an unaffiliated company license it as if they wrote the code?

[1]: "The author or authors of this code dedicate any and all copyright interest in this code to the public domain. We make this dedication for the benefit of the public at large and to the detriment of our heirs and successors. We intend this dedication to be an overt act of relinquishment in perpetuity of all present and future rights to this code under copyright law."


>How is it possible that they can sell licenses to the code that was put into the public domain by other contributors?

"Public domain" does not mean: this has to be free, nobody can ask for money.

"Public domain" means: noone has the right to stop anyone else from doing what they want.

The same is true for things like the Mona Lisa. You can snap a photo of it (that photo may or may not be copyrightable, depending on the country, meaning you may also be able to "steal" someone else's photo if applicable), print posters and sell them. This is possible because the intellectual property is in the public domain. It doesn't mean you have to give your posters away for free. You can even charge people to look at your photo -- disregarding the fact that no one would probably pay.


My question was about the assertion that they can grant licenses in a place which doesn't recognize the public domain. If the place doesn't recognize the public domain, then in that place the copyright of the code contributed by a user is still the user's. So where does the company get the authority to license it?

Couldn't the person who put it into the public domain go to that country and sue, saying the company is licensing their code? How can the company defend itself? We all know how it should go, but that might not be what actually happens.


>So where does the company get the authority to license it?

The company doesn't need any authority, because anyone can "license" public domain software, just like I can sell you a star in the sky. It's a worthless piece of paper; it's a gimmick and a way of making a donation to a cause you appreciate.

>Couldn't the person who put in into public domain go to the country and sue them saying the company is licensing their code?

Again, you're misunderstanding what public domain means. "The person" you mention has no right to sue anyone because the thing being discussed has been released into the public domain. Nobody owns it.

To illustrate: when you say (paraphrased) "the author could sue you for misusing what he/she put in the public domain", the logic doesn't hold, because if the author could do that, it wouldn't have been released into the public domain.


It's in the public domain. Of course they can sell it. So can you.


>How is it possible that they can sell licenses to the code that was put into the public domain by other contributors?

Because you're required to sign a piece of paper that says "Yes, I agree to place this code in the public domain" if you want to contribute a patch to SQLite. What you're paying Hipp's company for, when you buy a license, is keeping records that show all the code is in fact in the public domain.


Hwaci is a U.S. company, and the U.S. recognizes public domain. Therefore, no contributor can sue Hwaci in the U.S. for selling SQLite licenses.


I guess the question is what prevents those contributors from suing the licensee (or even Hwaci) in the country that doesn't recognize public domain?

I assume we'll never find out because contributors obviously don't intend to sue anyone if they release their code to the public domain, but it's strange that a company would be comfortable buying a possibly illegitimate license when they're uncomfortable with a public domain declaration.


I guess most companies that buy from Hwaci are more concerned about long-term support than about this public domain thingy, and "yes, we obtained a license" is just another box they need to check. $6K to check a box is pocket change to these companies anyway.

Besides, most countries in the developed world have some notion of giving away one's copyright (not moral rights, which can't be given away) so it might not be as much of a problem.


In terms of open source projects, they have more than their fair share of big companies taking an interest:

SQLite is developed and maintained by the SQLite Consortium, whose members include Mozilla, Adobe and Oracle. The embedded SQL database is used in a number of well-known applications, such as Adobe's Lightroom, Apple's Mac OS X operating system and Mozilla's Firefox and Thunderbird.

http://www.h-online.com/open/news/item/SQLite-3-7-0-adds-Wri...

And since the link above is fairly old, I suspect since then the list of big companies supporting the project has probably grown.


Support contracts according to another comment here. That's usually how open source projects make money as well.


Most established providers resist putting any code into production without a support contract for it.


And sqlite's 100% test coverage makes it really attractive for the kind of customers who have no problem paying for long-term support.


The title of this post, a true statement removed from its context, attracts readers like me because its implied meaning has considerable shock value. Of course, Dr. Hipp didn't help matters by naming his experimental fork "SQLite4".


I'm surprised there wasn't more investigation of SQLite and LMDB:

https://github.com/LMDB/sqlightning

The benchmarks there show anywhere from little-to-no performance difference up to substantial speed increases.


The source tree for sqlite3 now contains an extension named lsm1, which provides both the standalone LSM key/value database and a virtual table extension that allows you to use it directly from sqlite3. Some info on Python integration can be found here:

http://charlesleifer.com/blog/using-sqlite4-s-lsm-storage-en...

In peewee 3.0a I've also added built-in support for the lsm1 virtual table, if you're interested.
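
The Python package from that post (lsm-db on PyPI, if I remember right) exposes the engine with a dict-like API; a quick sketch:

    from lsm import LSM  # pip install lsm-db

    db = LSM("/tmp/test.ldb")
    db["key1"] = "value1"        # lands in the in-memory tree first
    print(db["key1"])
    # keys are stored in sorted order, so range scans are cheap:
    for key, value in db["key0":"key9"]:
        print(key, value)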


Instead of pretending to release a new version, it might be better to just call this fork sqlite-failed.


Doubt still exists... Does this mean 'concluded' as in "We've finished polishing the pre production code and are close to releasing it" or 'concluded' as in "We have thrown our hands up in the air and won't be working on this thing any more to bring it to production" ??!!??

EDIT: Seeing as I am getting slammed by downvotes: my comment here was simply pointing out that the headline I saw on HN could be read in multiple ways. As a long-time user of SQLite3, I was initially excited when I read the title, as I thought it meant something good was coming from the SQLite team. Turns out not to be. That, to me, still entails doubt.


" Lessons learned from SQLite4 have been folded into SQLite3 which continues to be actively maintained and developed. This repository exists as an historical record. There are no plans at this time to resume development of SQLite4."

Right there on the web page. Highlighted in green even.


Come on. Shaming helps nobody. I know I've missed seemingly obvious things before and am sure everyone else here has as well.


Below the fold even, after about 1000 other words of text. Would it have been better if the title had used 'cancelled' or 'terminated' instead of 'concluded', to explain the situation at the outset?


It's a code revision, not an article. If you're a programmer the change is super obvious.


From the commit:

> This repository exists as an historical record. There are no plans at this time to resume development of SQLite4.


The answer you seek is in the link, in highlighted green text.


Yes, it is, but I also initially assumed that this was a successor to SQLite3. No, it didn't take long to correct that misunderstanding, but it would be nice if the title were clearer.


HN, stop being so monumentally stupid. This is a genuine question, and one that I had too.

(To clarify, this is directed at all the downvoters, not the commentator I'm replying to.)


I didn't downvote, but I don't think it's too much to ask for commenters to read the linked web site carefully before they jump in with a comment. The answer is literally in the link.


It's not too much to ask. A person overlooked a thing once. Heck, I overlooked it too, because on my phone the green text was so incredibly small I couldn't read it.

I don't think it's too much to ask for commenters to have a little charity.


They are just random internet points and this thread adds nothing to the discussion.



