Ask HN: Have you used SQLite as a primary database?
518 points by barryhennessy on April 25, 2022 | 325 comments
I periodically hear about projects that use/have used sqlite as their sole datastore. The theory seems to be that you can test out an idea with fewer dependencies (and cost) and that it scales surprisingly far.

There are even distributed versions being built for reliability in the cloud: dqlite by canonical (of Ubuntu fame) and rqlite

Given the complexity, it seems like there are use cases or needs here that I'm not seeing, and I'd be very interested to know more from those who've tried.

Have you tried this? Did it go well? Or blow up? Were there big surprises along the way?

- https://sqlite.org - https://dqlite.io - https://github.com/rqlite/rqlite




Here's an all-time great post about why you might consider SQLite in production with data about performance: https://blog.wesleyac.com/posts/consider-sqlite

I use SQLite in production for my SaaS[1]. It's really great — saves me money, required basically no setup/configuration/management, and has had no scaling issues whatsoever with a few million hits a month. SQLite is really blazing fast for typical SaaS workloads. And will be easy to scale by vertically scaling the vm it's hosted on.

Litestream was the final missing piece of the puzzle that helped me use it in production — continuous backups for SQLite like other database servers have: https://litestream.io/ With Litestream, I pay literally $0 to back up customer data and have confidence nothing will be lost. And it took like 5 minutes to set up.

I'm so on-board the SQLite train you guys.

[1] https://extensionpay.com — Lets developers take payments in their browser extensions.


This was so inspiring to read. It's a very balanced take about the pros and cons of using SQLite vs. Postgres at scale.

I say "inspiring" because using SQLite reminds me of the simplicity and productivity of coding for the "early web" that we lost 10-15 years ago. The days when you could spin up a website without worrying about a bunch of ancillary services and focus on the app itself.

For me, SQLite's lack of online schema changes seems like perhaps the biggest blocker to actual production. I've never had a production project where the schema didn't change a lot.


I have this beef too. Tooling for dumping and restoring into a new schema is easy/simple/fast, so these schema migrations can happen w/o issue. There are some tricks with the PRAGMA directive in SQLite so you can roll out changes (eg: code supports old/new schema while migrating).


Interesting, that's super cool to read.

   Tooling for dumping and restoring into a new schema is easy/simple/fast.
Any resources you can point to that expand on this? Is this standard SQLite tooling? I'm curious how it would perform with large-ish databases - a few hundred GB or perhaps several TB.

(This is one of those things where I can "just Google it" but I was wondering if perhaps there was a particularly useful article that points out potential gotchas, etc)


I'm just using shell scripts and the SQLite CLI. I dump/restore files to the same FS so I can mv into place when ready.

Check out PRAGMA user_version; I just modify it on schema change, so the app knows where stuff is.

https://www.sqlite.org/pragma.html
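
In case it helps, here is a rough sketch of the same user_version idea using Python's built-in sqlite3 module (not my actual shell scripts; file and table names are made up):

    import sqlite3

    # Each entry upgrades the schema by one version; index 0 takes you from 0 to 1.
    MIGRATIONS = [
        "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)",
        "ALTER TABLE customers ADD COLUMN email TEXT",
    ]

    def migrate(conn):
        # user_version is a free 32-bit integer stored in the database header
        version = conn.execute("PRAGMA user_version").fetchone()[0]
        for new_version, ddl in enumerate(MIGRATIONS[version:], start=version + 1):
            with conn:  # one transaction per migration step
                conn.execute(ddl)
                conn.execute(f"PRAGMA user_version = {new_version}")

    conn = sqlite3.connect("app.db")
    migrate(conn)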


We also used SQLite and leveraged PRAGMA user_version to do the trick. But we decided to move to PostgreSQL, mainly due to the schema migration constraints.

Our experience is SQLite is OK for small to medium projects, but when the business logic/data model becomes more complex, SQLite is not sufficient.

More discussions are captured here https://news.ycombinator.com/item?id=31038614


> With Litestream, I pay literally $0 to back up customer data and have confidence nothing will be lost.

This is not a guarantee Litestream makes (nor can it, since replication is async).

You'll lose things to catastrophic failures, but chances are you'd be able to restore to a last known good checkpoint.


What is the point of using SQLite under a web service?

I thought people complained about how MySQL sucks and PostgreSQL rocks for being right, and that SQLite was nowhere near being right or performant. (Things seem to be getting better with strict column types these days.)

I've recently migrated a smallish service from MySQL to PostgreSQL and found it's quite a bit of work if you're not careful to write to the SQL standard, which means that if the service had gotten bigger, your chance of moving away from SQLite kind of walks away.

So, why not use a safer choice to begin with? Nothing is complicated about running MySQL/PostgreSQL unless you've sold yourself to AWS and have to care for the cost, and don't know how to run a DB instance yourself.


SQLite is far faster than Postgres or MySQL; however, the price you pay for this is having a single writer thread, and it's a library incorporated into your process, not a shared DB. It's faster because those other features of a server have a cost as well, particularly the cost of write arbitration.

SQLite is fine when all your load can be served by a single backend process on a single machine. The moment you need multiple backends to handle more load, or the moment you need high availability, you can't do it with SQLite. SQLite has very limited DDL operations, so you also can't evolve your schema over time without downtime. Now, for streaming backups - how do you come back from node failure? You're going to incur downtime downloading your DB.

I run many SQL backed production services, and my sweet spot has become a big honking Postgres instance in RDS in AWS, with WAL streaming based replicas for instant failover in case of outage, and read replicas, also via WAL. This system has been up for four years at this point, with no downtime.

I love SQLite, and use it regularly, just not for production web services.


> my sweet spot has become a big honking Postgres instance in RDS

Why do you prefer PostgreSQL RDS over Aurora RDS? Aurora seems better in every way but price[1]. (I know it also had some growing pains at launch.)

[1]: Amazon RDS for PostgreSQL is ideal when you have a small-to-medium intense workload. It works best when you have limited concurrent connections to your database. If you’re moving from commercial database engines such as Oracle or Microsoft SQL Server, Aurora PostgreSQL is a better choice because it provides matching performance with a lower price.

https://aws.amazon.com/blogs/database/is-amazon-rds-for-post...


Price is the main reason. Write performance is a second one.


The article I linked implies that Aurora offers higher write performance:

> If your database workload reaches the Amazon RDS max limit of 80,000 IOPS, and it requires additional IOPS, Aurora PostgreSQL is the preferred database solution. If database workload requires less than 80,000 IOPS, you can choose either Amazon RDS for PostgreSQL or Aurora PostgreSQL based on supported features.


SQLite has write ahead logging as well

https://sqlite.org/wal.html
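
Turning it on is a one-line pragma per database file; roughly, with Python's stdlib sqlite3 (file name is arbitrary):

    import sqlite3

    conn = sqlite3.connect("app.db")
    # journal_mode is persistent: the file stays in WAL mode for later connections too
    print(conn.execute("PRAGMA journal_mode=WAL").fetchone()[0])  # prints 'wal' on success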


It doesn't let you run an active replica from WAL.


> So, why not use a safer choice to begin with?

I know several people who build projects like that. It takes them months to get a working product, just to discover it doesn't interest people or doesn't work like they expected. If for every piece of tooling you go for the "safe" and most performant one, you gain bloat and complexity real quick.

People underestimate the performance of "simple" tech. In 99% of projects, by the time your bottleneck is your DB system, I can assure you that it'll be the least of your concerns.


Which part of running MySQL instead of SQLite is over-engineering?


- Setting up a MySQL server on both your dev machine and server, and making sure they're the same version (extra fun if they're on different OS versions)

- Setting up an out-of-repo config file on the server with your MySQL credentials

- Setting up a backup script for your server data

It's only about an hour of work total, but it's an hour of work that I hate doing.


Hum... MySQL is one of those labor-generating technologies that a Luddite would fall in love with. Those are best avoided. But for Postgres...

- You make sure the production version is equal to or newer than the development one, or you make sure not to use new features before they reach production, which is quite easy. There is no problem with different OSes (except for Windows itself not being very reliable, but I imagine you are not using Windows in production, as it's another one of those labor-generating techs).

- Trusting a local user is the same level of security you get with SQLite, no credentials required.

- And setting a backup script... Wait, you don't do that for SQLite? There's something missing here.

Yes, there are a lot of small tasks that add up when setting some new software. It's a pain. But it's a pain you suffer once, and it's over. It's worth optimizing, but not at any ongoing cost.


docker-compose is the way to go for keeping the dev versions synced with the production version. And for the backup script, a scheduled mysqldump and copy to storage should see you through quite far, so not really any more effort than copying an SQLite database.


This person is talking about reducing complexity, so I don't think adding more moving pieces to the machine is the way to go.


and now you need to learn docker and 10 other things.


So you trade some risk for an hour.


Please do explain what risk you're thinking of, as anyone smart enough to write their own SaaS would not put resources in the web server's file system tree. You stick your db file in a normal secure location outside the server's root, chmodded appropriately so that it suffers the exact same risks as any other file on the OS. It's no more or less risky than /etc/shadow, while being considerably easier to work with (and less failure-prone for light db work) than an independently running database service.


The risk is, as I had written previously, that it takes some effort to move from one db to another when the need arises, and I see no benefit in choosing SQLite in the beginning.

I'm not a professional db engineer, but one point is that there doesn't seem to be a way to create functions in SQLite, which would mean creating triggers on various tables can cause an excessive amount of duplicate code.

If I rely on PostgreSQL, I feel covered for my use case for web apps, but once you hit some little gotchas in SQLite, you may regret saving 10 minutes (install db and set up a password) for nothing.


If a project of mine ever gets enough traffic to force me away from SQLite, I expect I'll be rolling in dough and willing to put in the effort.


That's not a risk, that's just an inefficiency further down the line (migrating data from sqlite to a "real" database can indeed be quite a chore, but far less so if you formalized your schema and constraints beforehand, so that a migration mostly involves exporting to SQL, rewriting where needed, and then importing into whatever dbms you end up using later on in the lifetime of your project).

When we're talking about risks, think security exploits: how is sqlite3 more likely to get your data leaked, or flat out copied in its entirety, compared to using a mysql/postgres/etc.


Inefficiency or not, if you start down the path of SQLite, you need to invest a good amount of time refactoring into another DB if you feel like migrating away.

When I realized SQLite would store any type of data in any kind of column type, it was obvious SQLite is different from the others. They only added strict types about a year ago, and that scared me enough not to use it.

And how is SQLite any less secure? You can flat out copy its entirety using pg_dumpall. I'm not talking about security.


While that's an unfortunate discovery, I do need to point out that that's a problem of your own making, not SQLite's fault. SQLite is very clear on what you can expect, with dynamic typing being one of the main differences between it and "regular SQL" databases.

Yes, plenty of folks just go for SQLite blindly, but the consequences of that aren't SQLite's doing: if you take the time to read through https://www.sqlite.org/features.html because you want to know what it can actually do, you'll almost certainly click through to https://www.sqlite.org/omitted.html because you'll want to know what it doesn't do, and then you'll see the "See also the Quirks, Caveats, and Gotchas of SQLite." link and you're going to follow it and read through https://www.sqlite.org/quirks.html because those sound pretty important to know about before you commit to using something that is going to be your application/service data store for the foreseeable future.


your app has full access to the SQLite file. MySQL/PostgreSQL have users and permissions. Security is about layers, and SQLite is removing one layer of security. You can, for example, put DELETEs or access to certain tables on a separate user that your web app has no access to. With SQLite, if your app gets hacked then they can do anything with the whole DB they want to. In addition, with a separate DB process you get audit logs. If someone hacks your SQLite app they may have access for months before you realize it, if you ever do. Especially if they are doing something subtle like UPDATEs on specific tables/fields that may go unnoticed but provide the hacker some benefit. This is why you can't simply rely on the idea of using a backup. That's only going to help if the hacker totally trashes your DB.

With a separate DB you may have a hope of detecting when someone hacked your app. But without that firewall, the question becomes: how much of the data in my SQLite can now be trusted? If you don't know what backup is safe to restore, then you can't trust any of it.

Again, this is about layers. Not saying MySQL/Postgres will save you. But they can increase the odds.


If your server or API can be exploited, it doesn't matter whether there's an auth layer in between. Your SQL server runs as a service to connect to, your sqlite3 file is a file that you need access to. They're the same kind of layer: you need to break through the server's security to ever get to them directly, and if your app gets hacked such that the hackers gain file system access, then:

1. You're fucked. The end. It doesn't matter whether you were using mysql, postgres, or sqlite3, or S3, or Redis, or any other server your app was connecting to: they can just look at your environment vars.

That's not going to happen "because you're using Sqlite3", that's going to happen because you used some obscure server software, or worse, rolled your own.

People really do seem to put too much faith into "it has a username and password, it's more secure". It's not: if someone has access to your actual server, they have access to everything your server has access to. Sqlite3 is no more or less secure than a dbms daemon (or remote) in that sense.


With sqlite your server, api or application can be hacked. The most common and likely hack would be somewhere in your application. It really doesn't make sense to use sqlite here.

Set up a separate database server and use it for all of your projects. That one hour pays off for each and every project.


Pieter Levels has been using SQLite for nomadlist and I think it's been going well for him.


Some risk of what?


Running MySQL/Postgres over SQLite:

- needs to be provisioned and configured

- needs additional tooling and operational overhead

- comes with a _large_ performance overhead that is only won back if you have quite a significant load - especially writes, which means the vast majority of web projects are slower and require more resources than they should.

- it makes the whole system more complex by definition

It is a cost-benefit thing that tilts towards RDBMS as soon as you need to sustain very high transactional loads and want a managed, individually accessible server that you can query and interact with while it's running in production.

But if it is just "a website that needs durability" then you haven't yet shown how that tradeoff is worth it.


Just wanted to add another scenario where postgresql has been useful to me: functions. There are cases where you have expensive operation(s) that reference a lot of persistent data. Even without massive traffic these operations can be prohibitively expensive in the middleware. Leveraging database functions can be a massive performance improvement (100x+ for me), especially if your middleware is slow (e.g. rails).

I’ve used SQLite in production once and it worked great. But that was a very simple app. For more complex (but not always higher traffic) I’m leaning more and more on postgresql and less on my middleware, like moving business logic to the database when it makes sense.


If you need the expressiveness and power of Postgres then sure; it also has way better JSON support, for example, there is generally better tooling for it as well, and so on. But in this case, your database becomes its own _thing_ and has much more value outside of being just durability for an application. Like what Supabase is doing, for example. That's a very fundamental design decision IMO. I explored this and it is very attractive and robust, but serves different use-cases.


> What is the point of using SQLite under a web service? … people complained how MySQL sucks and PostgreSQL rocks for being right and SQLite was nowhere near being right or performant.

My understanding is that for read performance SQLite is pretty damn good, outperforming MySQL and Postgres in both single and concurrent tests. The key performance issue is the single global write lock. If your data access pattern is massively read biased then SQLite is a good choice performance wise, if you see a lot of write activity then it really isn't.

With regard to being correct, it offers proper ACID transactions and so on. Typing is a big issue for some but far from all. It is significantly more correct than mysql used to be back before InnoDB became the default table type in ~2010, and at least as correct as it is now (aside from the data types matter, which depends on which side of that debate you sit on).


> What is the point of using SQLite under a web service?

Take a look at the Consider SQLite post I linked. They address your performance questions too.

For me, SQLite was a nice way to simplify launching and running my SaaS business and has had no downsides.


Any tips you can share on how to deploy your saas app without downtime when using sqlite?


For my personal project I'm planning to use a VM with an SSD. Manually, I'll use caddy to switch over to a new running backend service with a readiness check.

As for scaling if I need it I can increase disk space for the app server or scale out horizontally/vertically. Don't need that yet so I'm waiting for more details in the future to decide how to handle that.


Which backend “owns” the sqlite database?

That’s what I haven’t seen mentioned anywhere: if the database is part of the application, how do you switch from one version to another without downtime?


The VM owns the database. Multiple backend service versions run on that same VM. Usually just one but 2 while migrating traffic to a new version.

I haven't figured out details regarding having 2 processes both writing and reading to the SQLite DB at once. It might just be fine. With a 1 minute request timeout I can just shut down the previous version after 2 minutes (should receive no new requests after 1 minute) and caddy will have sent requests to the new version for a while.

Not sure which types of errors I'll be seeing but the client may need to retry requests in some cases.

This is all just what I've planned but hopefully kinda makes sense.


I would expect the performance of SQLite for queries against an index to outperform MySQL and PostgreSQL in all cases - because SQLite eliminates the need for network overhead by essentially executing those queries directly as a C function call.

So no matter how optimized MySQL and PostgreSQL are, SQLite will run rings around them for basic SELECT queries.


Running against a unix socket doesn't seem to make that a selling point for SQLite.


There's still serialization and deserialization overhead there. I would expect SQLite to win in benchmarks against MySQL or PostgreSQL on a Unix socket for basic "select * from table where id = 5" calls, but I've not done the work to prove it myself.
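For what it's worth, here is roughly how you could time the SQLite side of that by itself with Python's stdlib (numbers depend entirely on hardware; this only measures the in-process call, with no socket involved):

    import sqlite3, time

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, val TEXT)")
    conn.executemany("INSERT INTO t VALUES (?, ?)",
                     ((i, f"row {i}") for i in range(100_000)))

    n = 100_000
    start = time.perf_counter()
    for i in range(n):
        conn.execute("SELECT * FROM t WHERE id = ?", (i,)).fetchone()
    elapsed = time.perf_counter() - start
    print(f"{elapsed / n * 1e6:.1f} microseconds per point lookup")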


> I thought people complained how MySQL sucks and PostgreSQL rocks for being right and SQLite was nowhere near being right or performant

While "people" complaining is basically meaningless, I don't know why they'd be doing that about SQLite. It's used in most phones, all copies of Windows 10+, and countless other places.


Chrome browser has loads of them in your profile directory.


The Consider SQLite post mentions that one of SQLite’s improvements in the past decade is “WAL mode (enabling concurrent reads and writes)”. Does this mean that the official advice to avoid SQLite for concurrent writes [1] is no longer a big concern?

[1]: https://www.sqlite.org/whentouse.html


I think the way it's worded in that SQLite documentation page is still accurate:

> SQLite supports an unlimited number of simultaneous readers, but it will only allow one writer at any instant in time. For many situations, this is not a problem. Writers queue up. Each application does its database work quickly and moves on, and no lock lasts for more than a few dozen milliseconds. But there are some applications that require more concurrency, and those applications may need to seek a different solution.

If your application needs to support hundreds of concurrent writes a second you shouldn't use SQLite. But that's a pretty high bar!
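
In practice "writers queue up" mostly comes down to setting a busy timeout so a writer waits for the lock instead of erroring immediately; a small sketch with Python's stdlib (file and table names are made up):

    import sqlite3

    conn = sqlite3.connect("app.db", timeout=5.0)  # wait up to 5s for the write lock
    conn.execute("PRAGMA journal_mode=WAL")        # readers no longer block the writer
    conn.execute("CREATE TABLE IF NOT EXISTS events (id INTEGER PRIMARY KEY, payload TEXT)")
    with conn:  # concurrent writers queue behind this transaction instead of failing
        conn.execute("INSERT INTO events (payload) VALUES (?)", ("click",))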


Also it should be noted that SQLite is a library, so it charges against the quotas for the main process.

This may become an issue with large run-time datasets even in read-only shared access scenarios.


Take this with a huge grain of salt because I am by no means an expert, but I currently am working on some scripts that import a few million rows into SQLite. I am using bash and the sqlite command line. I was getting a lot of concurrent write errors from sqlite (bear in mind I am only doing inserts of separate rows, so in theory there is never an actual conflict), so I tried using WAL mode. It actually resulted in more contention. I ended up just going back to non-WAL mode and implementing an exponential backoff in bash to retry writes.
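
(For anyone curious, the retry loop amounts to something like this — shown in Python rather than my bash, with the table name made up; "database is locked" surfaces as OperationalError:)

    import random, sqlite3, time

    def insert_with_backoff(conn, row, max_attempts=8):
        delay = 0.05
        for attempt in range(max_attempts):
            try:
                with conn:
                    conn.execute("INSERT INTO rows (data) VALUES (?)", (row,))
                return
            except sqlite3.OperationalError as exc:
                if "locked" not in str(exc) or attempt == max_attempts - 1:
                    raise
                time.sleep(delay + random.uniform(0, delay))  # jittered exponential backoff
                delay *= 2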


Are you ingesting the data under one transaction? This is a common SQLite issue, as writes aren't slow but transactions are. By default 1 write = 1 txn, but you can put millions of writes into 1 txn and get many orders of magnitude speedup.
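
Roughly the difference, sketched in Python terms (table name is invented); the point is one commit — and one fsync — for the whole batch instead of one per row:

    import sqlite3

    conn = sqlite3.connect("import.db")
    conn.execute("CREATE TABLE IF NOT EXISTS rows (data TEXT)")
    rows = [(f"row {i}",) for i in range(1_000_000)]

    # Slow: one implicit transaction (and one fsync) per INSERT.
    # for row in rows:
    #     conn.execute("INSERT INTO rows (data) VALUES (?)", row)
    #     conn.commit()

    # Fast: every insert inside a single transaction.
    with conn:
        conn.executemany("INSERT INTO rows (data) VALUES (?)", rows)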


Yes, each row is written with one call to the sqlite command line. I was going to do something where I wrote inserts into a file and flushed them into sqlite in batches, but sqlite is not the bottleneck, so once retries were taken care of, it was good enough.


Thanks for this comment! I had tried some bulk imports into SQLite and mostly gave up after hitting similar limits, thinking I was doing something wrong with the configuration.


Litestream is indeed a missing piece of the puzzle. But it also defeats some of the purpose of using an embedded database library in the first place. Now you're back to juggling separate processes once again.


https://github.com/backtrace-labs/verneuil is in-process (precisely to minimise deployment pain).


I had never heard of verneuil. Thanks for sharing. For anyone curious about the differences between the two:

"This effort is incomparable with litestream: Verneuil is meant for asynchronous read replication, with streaming backups as a nice side effect. The replication approach is thus completely different. In particular, while litestream only works with SQLite databases in WAL mode, Verneuil only supports rollback journaling"


Good point! Although practically speaking I don't mind at all. "Juggling" is too strong a word — it's literally just starting the Litestream process and never thinking about it again. It's nice that it just slides into my existing app without any code changes.


If the application is in go, you can likely embed litestream.


> hits a month

Is not a very useful performance metric. What is your peak hits per second?


I love watching people use "x per month" as some sort of architecture selection argument, especially when these arguments conclude in a proud justification of cloud sprawl.

There are single node/CPU solutions that can process 10-100 million business events per second. I am almost certain that no one logged into HN right now has a realistic business case on their plate that would ever come close to exceeding that capacity.

E.g.: https://lmax-exchange.github.io/disruptor/disruptor.html

This stuff isn't that hard to do either. It's just different.


ESAIDBUSINESSTOOMANYTIMES


Then replace business with the problem you're trying to solve, the good you're trying to do for your users, etc. Keeping the big picture in view doesn't have to mean you're only interested in making money.


I'm not criticising making money. Just s/business//g and it reads better.


I was looking at SQLite for a product I’m working on. It looks awesome and has improved significantly since I last looked.

The reason I decided against it is that it doesn’t have proper stored procedures. I use them a lot in PGSQL. They result in far fewer lines of code.

They also have the benefit of significantly reducing round trip calls to the database, which is one of the key advantages of SQLite.

But having used stored procedures for years, I can no longer bear the thought of writing SQL code in a host language, so I’m going to stick with PG for the time being.

Would be great to see something similar in SQLite; there are other advantages such as the single file database, that would work well in a microservice environment.


So you want to write functions in SQL? With SQLite you can define your own functions which can be called in your queries, but they do need to be written in your application language.
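
For example, Python's stdlib exposes this as create_function; the body lives in the application, but queries can call it like a built-in (names here are just for illustration):

    import re, sqlite3

    def regexp_match(pattern, value):
        return 1 if re.search(pattern, value or "") else 0

    conn = sqlite3.connect(":memory:")
    conn.create_function("regexp_match", 2, regexp_match)  # name, arg count, callable
    conn.execute("CREATE TABLE users (email TEXT)")
    conn.execute("INSERT INTO users VALUES ('a@example.com'), ('bad-address')")
    print(conn.execute(
        "SELECT email FROM users WHERE regexp_match('@', email)").fetchall())
    # [('a@example.com',)]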


Yeah I know. But I find stored SQL to be a lot more concise than the application language, and less prone to bugs. I also find that it creates a much better separation of concerns between stuff that touches the database versus non-database stuff.

Just one example, error behaviour is well defined in PLPGSQL so I don’t have to constantly check for errors. That’s not true in my host language.

I am super impressed with SQLite, it’s just not a good fit for how I use databases yet.


What about zero downtime deploy of new version of your application? You have to take it down to restart, right?


"(5) ... SQLite allows multiple processes to have the database file open at once, and for multiple processes to read the database at once. When any process wants to write, it must lock the entire database file for the duration of its update. But that normally only takes a few milliseconds. ..." - https://www.sqlite.org/faq.html

You can start your new version of your application in a new process that opens the same database file, switch your load balancer to the new app, allow the old to drain all requests and then terminate the old app.


how do you handle things like encryption and access permissions?

The only thing I have against using SQLite in production (for my needs) is the lack of at rest encryption and row level permissions by user.


At rest encryption is a complicated subject (in general, I bet most people get negative net security from it). For SQLite, you can either encrypt your disk or get one of the versions with added encryption (I only know of proprietary ones).

You don't do row level permissions on your database. You keep it all on the application layer.


> You don't do row level permissions on your database. You keep it all on the application layer.

You say this as some sort of objective truth. Keeping security about data at the data layer can often be a really good idea and just as appropriate as GRANT/REVOKE at the table and column levels.


Well, if you want to use SQLite, you keep it at the application layer. That's an objective truth.

What is better, depends on what you are doing, and is not a simple choice in any way.


That's a tautology, isn't it? If the persistence engine you're using doesn't support a feature—no matter what that feature is—you'll have to fill in the shortfall somewhere else, such as the app layer.

I'm just saying if you have the option, if the engine does support the feature, and the solution is a good fit for your problem, you shouldn't shy away from it just because some other engine doesn't support it.


Hum, no. The word "tautology" doesn't mean that.

Anyway, yes, SQLite lacks a series of features you get on other DBMSs, often for good reasons. In exchange it brings an entirely different set of features. If you want Postgres, use Postgres.


One of my previous employers was using SQLite as a large distributed database - they had their own custom sharding strategy, but essentially the idea was to shard A accounts * B tables * C num_of_days with a .db file for every shard.

When I first came and saw it, it...did not sound right. But I didn't want to be the guy who comes in and says "you are doing it wrong" month 1. So I went along with it.

Of course, eventually problems started to pop up. I distinctly remember that the ingestion (happening via a lot of Kafka consumers) throughput was high enough that SQLite started to crumble and even saw WAL overruns, data loss etc. Fortunately, it wasn't "real" production yet.

I suggested we move to Postgres and was eventually able to convince everyone from engineers to leadership. We moved to a custom sharded Postgres (9.6 at the time). This was in 2016. I spoke to people at the place last month, and it's still humming along nicely.

This isn't to illustrate anything bad about SQLite, to be clear! I like it for what it does. Just to show at least 1 use case where it was a bad fit.

SQLite was a tempting first answer, but what solved it was Postgres, and we eventually offloaded a lot of aggregation tables to Clickhouse and turned the whole thing into a warehouse where the events got logged.


That's a good counter-case to keep in mind, thank you.

I guess the takeaway here is that this underscores that sqlite isn't for the 'large number of writers' scenario.

p.s.

> I didn't want to be the guy who comes in and says "you are doing it wrong" month 1

Very wise


I also experienced that one bad use case: heavy writes. I've always used SQLite first for any project, and when I did an in-house analytics tool for tracking user-initiated events (visits, button clicks, hovers, etc) I thought SQLite could handle it well. Unfortunately, even with heavy tuning, we saw WAL overruns and missing data.

Sadly, we had to move to Postgres and eat all that scaling complexity. :(


That's for sure a good point. SQLite has really well defined limitations, and most if not all of those are by design. You should consider them BEFORE starting a new project with it, but if it is a fit, it's a great experience.


Data loss is a pretty serious problem.

Do you have any more information about the situation?

Could it have been in the hand spun partitioning logic instead of SQLite?

What was the ingestion throughput, roughly?


Unfortunately I don't have numbers on hand. We approximated our Postgres would ingest around 1 TB over the course of a year, later I think? I could be wildly wrong.

It's been more than 5 years but from what I remember, it definitely was _not_ the partitioning logic (the sharding just meant we had a huge amount of files that were hard to organise). But a single consumer doing heavy writes on a single SQLite file would see enough traffic that pretty soon you would start to see errors and your writes would start to break.


The sqlite docs page has a nice article [1] on when to use an embedded database such as sqlite and when to go with a client/server model (postgres, mysql or others)

When not to use sqlite:

- Is the data separated from the application by a network?

- Many concurrent writers?

- Data size > 280 TB

For device-local storage with low writer concurrency and less than a terabyte of content, SQLite is almost always better.

[1] https://www.sqlite.org/whentouse.html


What is the recommendation for offline capability with sync?


We considered PouchDB[1] (client) and CouchDB[2] (server) for a PWA back then (2017). Nowadays I would probably favor WatermelonDB[3].

[1]: https://pouchdb.com/

[2]: https://couchdb.apache.org/

[3]: https://nozbe.github.io/WatermelonDB/


> For device-local storage with low writer concurrency and less than a terabyte of content, SQLite is almost always better.

Isn't MySQL MyISAM faster, and would it thus constitute a better choice for a scientific number-crunching application? I mean a near-4GB DB, very simple schema, heavy reading load, little/no inserts and no updates.


With 4 GB you might as well just load the data into RAM.


But then you have to implement all the SELECT and DML logic yourself. SQL makes this a breeze with JOIN, ON UPDATE CASCADE, etc. And being SQL, it is very easy to maintain, even by the PFY that replaces you.


SQLite (also H2 and some other embedded SQL databases) can be used entirely in-memory, one can also drop an SQLite file on a RAM-hosted filesystem (tmpfs/ramdrive). You really can put everything into RAM (and still enjoy SQL) if you have enough, don't mind long cold-load and potential data loss.
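
A tiny illustration with Python's stdlib: ":memory:" keeps the whole database in RAM, and the backup API can snapshot it to disk if the data-loss risk bothers you (file name is arbitrary):

    import sqlite3

    mem = sqlite3.connect(":memory:")
    mem.execute("CREATE TABLE results (k TEXT PRIMARY KEY, v REAL)")
    mem.execute("INSERT INTO results VALUES ('pi', 3.14159)")

    # Optional: persist a point-in-time copy to disk.
    disk = sqlite3.connect("snapshot.db")
    mem.backup(disk)
    disk.close()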


It looked to me that the GP was suggesting to keep the data in the application instead of in a DB. But yeah, I suppose he might have meant a HEAP table instead of MyISAM.


If your RAM exceeds the size of your tables and indexes, that data will be served from RAM in any modern relational database system. No special config usually necessary for the speed but you don't lose everything when the power goes out, unlike tmpfs/ramdrive option.


That would depend on the DB server settings. Such a config might be found on a dedicated database server, but I doubt such settings would make sense on a machine running e.g. an application server together with the database.


I'd argue adding SQL into the mix makes it difficult to maintain, mixed-language codebases are almost by definition complex, and you get significant chafing when mixing a declarative language like SQL and OOP.

Since this is a no-update and no live-insert scenario we're talking about, it's fairly easy to produce code that is an order of magnitude faster than a DBMS, since they're not only primarily optimized for efficiently reading off disk (an in-memory hash table beats a B-tree every day of the week), they've got really unfortunate CPU cache characteristics, and additionally need to acquire read locks.


Maybe this is a failure of imagination on my part, but won't most people be using ORMs? Again, talking about the use case of the average application that's light enough to get away with SQLite, it doesn't seem like you would need to be hand writing queries.


In my experience ORMs add a layer of complexity, instead of removing one. It's nice to e.g. have a "Pythonic" interface in Python, but when working close to the data I far prefer to write a concise, clear query instead of trying to remember some ORM syntax or what they're calling VARCHARS in this particular ORM, or how they're representing JOINS, or if the condition will be on the ON clause or the WHERE clause, or how they're representing GROUP BY, etc etc.


Wrote code for many years sans ORMs.

Two features I enjoy in ActiveRecord and other ORMs, and why I would consider them a good standard practice for most things that aren't "toy" projects.

1. Easy chainability. In ActiveRecord you can have scopes like `User#older_than_65` and `User.lives_in_utah` and easily chain them: `User.older_than_65.lives_in_utah` which is occasionally very useful and way more sane than dynamically building up SQL queries "by hand."

2. Standardization. Maintenance and ongoing development (usually the biggest part of the software lifecycle) tend to get absolutely insane when you have N different coders doing things N different ways. I don't love everything ActiveRecord does, but it's generally quite sane and you can drop a new coder into a standard Rails project and they can understand it quickly. On a large team/codebase that can equate to many thousands or even millions of dollars worth of productivity.

    I far prefer to write a concise, clear query instead 
    of trying to remember some ORM syntax
100% agree.

ActiveRecord strikes a good balance here IMO. An explicit goal of ActiveRecord is to make it painless to use "raw" SQL when desired.

On non-toy projects, I think a policy of "use the ORM by default, and use raw SQL for the other N% of the time when it makes sense" is very very sane.


ORMs integrate poorly in many languages, and perform strictly worse than hand-written SQL.

If you're just using the database for object persistence, which is common, it doesn't matter all too much. But that's not really the scenario we're discussing here, since the data is, by the original problem statement, immutable.


With a database you generally are loading it into RAM, thanks to caching at the database and/or filesystem level, and you get all of the fun database features more or less for free.

There's a performance hit relative to skipping the database altogether and simply allocating 4GB of RAM and accessing it directly, of course.


DuckDB is the OLAP equivalent of SQLite, as far as I know.


> Isn't MySQL MyISAM faster

I think the performance MySQL has over sqlite comes from its multithreading more than the storage engine.

In my experience sqlite is just as fast as MyISAM for single threaded work.


MyISAM is not crash-safe. Anytime your server crashes, the MyISAM database may get corrupted.

Faster, but at what price?


Why?


I dunno, just felt like conventional (since long ago) knowledge that MyISAM is the fastest of all SQL DBs in simplistic non-RAM scenarios. I'm not sure this is true so I ask.


The engine might be faster (I'm not sure) but SQLite has the advantage that it doesn't have to connect over a socket. Instead you load the SQLite library into your code and your application directly manipulates the database files. That's potentially a lot faster.


Before sqlite, it definitely was said to be the fastest. I suspect the two are similar enough that it makes little or no difference these days, and sqlite (unless the dynamic typing thing is an issue for you, and even that is going away as recent versions support at least some stricter type enforcement) is safer and more “correct” than MyISAM in many ways.


It blew up big time. I would have saved myself lots of trouble if I had just gone with postgres from the getgo.

The workload was simple (single node work tracking) and I didn't expect it to become a bottleneck. Unfortunately, there were some default settings in the storage backend (tiny page size or WAL or something) that caused severe thrashing, and a dearth of tooling to track down the issue. After making a custom build with custom instrumentation and figuring out the problem, I found an email thread where the sqlite community was arguing about this exact issue and the default settings in question. A couple of people had foreseen the exact problem I had run into and suggested a fix. Their concerns were dismissed on the grounds that the problem could be configured away, and their concerns about discoverability of configuration were ignored completely. I wasn't thrilled with the crummy defaults, but seeing that the consequences had been foreseen, considered, and dismissed despite what seemed like widespread consensus on the fix being simple... it really damaged my trust. How many more landmines did SQLite have?

Lack of perf tooling + bad defaults = recipe for pain.


Not to diminish your issue but choosing good defaults for software with a wide range of use cases is hard. One person's blindingly obvious use case is another's niche.


Exactly. That's why neglecting tools in favor of "simplicity" is so dangerous. If it Just Works, great! But what if it Just Doesn't?


More details please if you have them. What was the throughput? number of transactions? Data size?

If you have that thread would be great to see it as well.


It was chugging at ~100 inserts per second on about ~300k rows of <100 bytes each. Just inserts, no index (it was the first thing to go), no foreign keys, one table only. SQLite was spending all the disk bandwidth copying data in and out of a tiny designated working area and flushing. This was in 2013 -- I'm afraid I've forgotten the details. It's an old problem, long fixed by now.

The problem itself didn't concern me nearly as much as "no perf tools + no perf foolproofing." That's a rough combo. If a problem this simple required this much debugging, extrapolations to problems of any complexity are terrifying. I knew that simplicity implied limitations, but this lesson taught me that simplicity could also imply danger.


This is a common problem. Bulk inserts should be done within a single txn if possible. The limit isn't insert throughput, but transaction throughput.

I've commented elsewhere here for docs referencing this problem. It's FAQ#19 on the SQLite website.

Were your inserts based on HTTP requests, or was it more of a batch process? Were they grouped in txns? Obviously user http requests would be harder to group up, but kudos if your service is handling 100 QPS of writes, as that's a pretty rare level of scale, approaching the real "need a beefy concurrent db server" type of problem statement.


The inserts were batched. That was the second thing I tried after getting rid of the index, and that brought it from 20/s to 100/s.

This wasn't a website, it was HPC job tracking. SQLite chugged so hard that the tail wagged the dog, though.


I'm not contradicting you, it is curious what you saw, but this SQLite FAQ does claim that 50k/s inserts on a normal computer HDD is possible. Maybe there was some other resource constraint or configuration somewhere. It is very curious.

> Actually, SQLite will easily do 50,000 or more INSERT statements per second on an average desktop computer. But it will only do a few dozen transactions per second. Transaction speed is limited by the rotational speed of your disk drive. A transaction normally requires two complete rotations of the disk platter, which on a 7200RPM disk drive limits you to about 60 transactions per second.


Could you tell us more about that configure option?


This happened in 2013. A few years later the fix went through. The bug itself is water under the bridge, what you're hearing is that I'm still leery of "no perf tools + no perf foolproofing." These judgements may be obsolete -- once burned, twice shy.


Simon Willison has written about using SQLite for a "Baked in data" architecture which is a super interesting method for some situations: https://simonwillison.net/2021/Jul/28/baked-data/

As he notes https://www.mozilla.org/ uses this pattern:

> They started using SQLite back in 2018 in a system they call Bedrock ... Their site content lives in a ~22MB SQLite database file, which is built and uploaded to S3 and then downloaded on a regular basis to each of their application servers.

I'm particularly interested in the "Sessions" extension (https://www.sqlite.org/sessionintro.html) and would love to hear if anyone has successfully used it for an eventually consistent architecture built on top of SQLite?


I designed something that used a local SQLite database on the client and a remote postgresql instance as the master. It used read and write queues at each end for sync and was eventually consistent.

Unfortunately it was far too advanced for the org and no one else understood it so it was canned in favour of a connected solution under the guise of ubiquitous internet access being available. This is proving to be a poor technical decision so my solution may have some legs yet.


Hey, I actually do the same in a mobile app. I dump everything into a local database; when they are connected to the internet it syncs, which lets it work offline.


I'm working on something similar. Mine is an RFID scanner kiosk that uses a local db when offline. How do you manage sync conflicts?


In my workflow, I PULL when they initially get online and MERGE non-conflicting changes and ACCEPT the changes with the latest timestamp if there is a conflict.

It's not perfect and people still complain, to do it cleanly I would need to prompt the user which change to take but I haven't figured out a clean way to do that yet.
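(The core of that last-write-wins rule is tiny — a toy Python sketch, not my actual code, with an invented updated_at field:)

    def merge(local_row, remote_row):
        # Rows are dicts carrying an 'updated_at' timestamp; keep the newer one.
        if local_row is None:
            return remote_row
        if remote_row is None:
            return local_row
        return remote_row if remote_row["updated_at"] >= local_row["updated_at"] else local_row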


What industry was this - if you don't mind me asking? Always interested in situations where internet may not be assured


If your database is just 22MB, probably even MS Access 2000 will perform adequately.


It is exceptionally great if you don't need parallel writes or have many terabytes of data - ie: for most services out there.

When embedding natively, like in a Rust app, the performance is better than any other RDBMS because there is no network/serialization overhead and you can use pointers in-process if needed.

The DevOps story also is a dream: typically it is just a single file (optionally + some more for journaling) and setup is automated away (most language libs bundle it already), plus it is widely known since smartphone SDKs and all webbrowsers include/expose it.

A subtle advantage: the supported SQL subset is so small that "if it works in sqlite, it will also work with $RDBMS" in most cases, but not the other way around. I always use it when getting started when in need of relational data, and only had to swap it out for postgres once, but not due to technical/scaling reasons (IT policy change & stuff).

Having said that, it is mind-boggling what kind of load you can handle with a small VPS that runs a Rust microservice that embeds it's own SQLite natively... that would be an expensive cluster of your typical rails/django servers and still have worse performance.


A slightly unusual use-case but for my work we have our own file format which is a thinly-veiled sqlite database. Originally we used a json file but we moved to sqlite for performance reasons once the files started getting to multi-gigabyte sizes.

It works great - there are ergonomic APIs in most languages, it’s fast and reliable, and great to be able to drop into an SQL shell occasionally to work out what’s going on. A custom binary format might be slightly more optimal in some ways but using sqlite saves so much work and means a solid base we can trust.


Considering Adobe Photoshop use(d) SQLite for its application file format, this could be very far from unusual.


When I maintained uptime.openacs.org (https://gitlab.com/siddfinch/uptime) and MyTurl (both running AOLserver) I wrote internal versions for a place I was working at.

I switched from Postgres to SQLite for a couple of versions, but mainly because Postgres wasn't "supported"; I called SQLite an "internal database thing".

Worked flawlessly for about 7-8 years before both services were gobbled up into micro API services.

At the last count, we have about 14,000 services checked by uptime (about 1,000 every 5 minutes, 2,000 every 10 minutes, the rest every 15). Probably had about 60,000 tinyurls in MyTurl. We also ran the MyTurl urls through uptime every night to look for bad links. The system got hammered, often.

It took minor tweaking to get the best performance out of the database, and AOLserver has some nice caching features, which helped to take the load off the database a bit. But overall, it worked as well as the Postgres counterpart.

And now, I have to figure out why I never released the SQLite version of both.


I'm running a bunch of different read-only sites and APIs on top of SQLite using Cloud Run and Vercel - abusing the fact that if the database is read-only you can package up a binary DB file as part of a Docker container or application bundle and run it on serverless hosting.

This means it won't cost any money if it's not receiving any traffic, and it can scale easily by launching additional instances.

I wrote about my pattern for doing this, which I call Baked Data, here: https://simonwillison.net/2021/Jul/28/baked-data/

A few examples are listed here: https://datasette.io/examples
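
One small detail that makes the pattern comfortable: the bundled file can be opened read-only through a URI, so nothing can accidentally write to the copy baked into the image. Roughly, with Python's stdlib (file and table names are placeholders):

    import sqlite3

    # mode=ro makes any write attempt fail instead of modifying the bundled file
    conn = sqlite3.connect("file:content.db?mode=ro", uri=True)
    rows = conn.execute("SELECT * FROM pages LIMIT 5").fetchall()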


Yes for all my sites: Nomad List, Remote OK, Hoodmaps, Rebase etc. No real issues at all.


I read in one of your Tweets that you use one database file per (unrelated) table to avoid corruption. Why did you move to this model? Are multiple tables per file really easier to corrupt?


Yes! It kinda happened because I had no idea how SQLite worked so I thought this was normal.

Then I thought this is great and makes it easier to move the db file if it just has 1 table. So I can download it easily for local dev for ex.

And yes in case there's corruption which never rly happens, only one file thus table would be affected.

PS: one thing that really helped reduce issues was setting PRAGMA journal_mode to WAL


I'd be interested to know what kind of corruption you were facing.


Here's a Twitter thread with some numbers from pieterhg's use of SQLite in production: https://twitter.com/levelsio/status/1308406118314635266


Awesome to hear!

How do you handle this? Do you store the SQLite file somewhere like s3 or just in memory?

How does this work for such high traffic sites?


No, in the filesystem on a VPS. All my sites just run on a VPS. Nothing fancy!


glad to see pieter here, I am keeping an eye on rebase ;)


I’ve worked on several projects with sqlite, both read and write heavy, all with high concurrency, with databases in the few hundred MB with 400k server clients, and 100 bare-metal servers running at capacity. The sqlite part of our system is never the problem. In our case sqlite has been an alternative to custom files on disk or replacing a spaghetti of hashmaps in memory. we also replaced a single postgresql instance with all customers into many sqlites per customer. Performance and reliability is why I always reach for it first. At this point I’m a zealot and would argue your first ‘data structure’ of choice should be an sqlite database :)


Did you ever run into the problem of the system running out of file descriptors? If each client gets its own database file, each connection will require a file descriptor.

For example, Linux limits the number of file descriptors to 1024 per process by default, so each process can open up to 1024 files at any point in time.

I ran into this problem in the past in another project (the system ran out of file descriptors for another reason).


What pain points have you experienced with "many sqlites per customer". I'm considering transitioning to something similar but would love to know what pitfalls I might not be considering.


None really, but we have a fairly simple design where all the shared databases we attach are read-only (think big static lookup tables that a separate process takes care of updating.) I would probably avoid having databases attach contextually -- seems complicated and error-prone.


I'm also in the SQLite is a data structure camp. You get fast access and persistence for free. You can also fix a lot of issues just exploring the database offline.


Don't be afraid of a database process. They are not scary, and are certainly less scary to scale up than whatever you might need to do with SQLite. There's more help available and better tooling.

SQLite may shine in edge cases where you know you can outperform a regular database server and you know why, and you could build everything either way. SQLite could be a way to e.g. decentralize state, using local instances to do local storage and compute before shipping off or coordinating elsewhere.

Otherwise, SQLite can simply be a recipe for lots of lock errors on concurrent operations. I've also never been very impressed with its performance as a general purpose replacement for postgres or MySQL.


> tooling

Yes, I learned this the hard way. I understood that simplicity meant limitations, but I did not understand that simplicity meant danger until SQLite burned me.

If your perf tanks, you don't want to have to spend days putting timers all around someone else's codebase. Caveat: SQLite may be better these days -- my incident happened in 2013 -- but I spent more time tracking that one SQLite issue than I have spent spinning up postgres instances since then.


do you have any details about the situation you were facing?


My solution path for databases has been like this for a good decade:

  1) Sqlite
  2) Self-hosted Postgres
  3) Big Boy Database, with an $$$ cost. (AWS Aurora, Oracle, etc).
Most projects never leave the Sqlite level. Only one has left the Postgres level so far.


I'm using SQLite for several personal projects as well; if you were to migrate to Postgres, how would you go about it? Any tools/services you recommend?


I think a lot of us fall into the trap of expecting that our apps will grow to a huge size, and that we need to be ready to scale just in case.

This is where the "MongoDB is webscale" meme came from.

The truth is SQLite and a single webserver or Docker container will be fine for 95% of web applications.

People really underestimate the advantage of simplicity vs perceived power.

Use SQLite.


One thing that really excites me is concurrent writes -- I was poking around the project, and I've seen drh has been working on this for a bit now. [1] [2]

I believe the high-level approach he's taking is essentially:

1. Concurrently execute the multiple write transactions in parallel.

2. Sequentially write the changed pages to the WAL. *[3]

If a previous transaction causes the next to compute differently (conflict), then rerun that next transaction & then write.

The way to detect conflicts is essentially:

1. Keep track of all the b-tree pages accessed before running the transaction.

2. Check the WAL to see if any previous transaction modified one of those b-trees. If so, this means we have to rerun our transaction.

I've seen it done in software transactional memory (STM) systems as well. It's really beautifully simple, but I think there are a lot of devils in the details.

[1] https://github.com/sqlite/sqlite/blob/9077e4652fd0691f45463e...

[2] https://github.com/sqlite/sqlite/compare/begin-concurrent

[3] * Write to the WAL, so that parallel transactions see a static snapshot of the world.
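
A toy illustration of that read-set check — nothing to do with SQLite's actual code, just the STM-style idea in miniature:

    def try_commit(read_pages, write_pages, committed_since_snapshot):
        """committed_since_snapshot: list of page sets written by transactions
        that committed after this transaction took its snapshot."""
        for pages in committed_since_snapshot:
            if read_pages & pages:
                return False  # conflict: caller must re-run the transaction
        committed_since_snapshot.append(set(write_pages))
        return True

    log = []
    print(try_commit({1, 2}, {2}, log))  # True: nothing committed since our snapshot
    print(try_commit({2, 5}, {5}, log))  # False: page 2 changed under us, re-run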


I wish more "self-hosted" open source projects would support sqlite out of the box. It's honestly a little ridiculous to, for example, stand up postgres for a 1-person blog on a personal domain. Or even a 10-person Mastodon or Pleroma instance or whatever.

That said, sqlite used 'badly' can be quite frustrating. Home Assistant, for example, is usually set up on an sd card in a raspi and then runs an sqlite database on it that it dumps massive amounts of very redundant data into as json blobs. Pretty common to have it just randomly lock up because the sd card has trouble with that frequency of writes.


I use it together with Rails and the horizontal sharding feature. Each customer has its own sqlite database running in WAL mode. Since the app is internally used, traffic/writes are pretty predictable.

I also do backups periodically with ActiveJob using `.backup` on the sqlite3 client. It's simple and nice because I just have to worry about running the app, and nothing else.
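
(If anyone wants the same thing outside of Rails, the online backup API is also exposed directly — e.g. in Python's stdlib it is roughly this; paths here are made up:)

    import sqlite3

    src = sqlite3.connect("tenant_42.db")
    dst = sqlite3.connect("tenant_42.backup.db")
    with dst:
        src.backup(dst, pages=1024)  # copies in chunks; safe while the app keeps writing
    src.close()
    dst.close()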


A potential downside here could be running out of file descriptors if you use one database file per client. Linux has a default limit of 1024 open files per process.

Did you ever run into this problem?


Would love to know more about how you set this up.


Are you doing multi-tenancy in your application by creating a different SQLite database for each customer? I'm curious to know more about your approach.


One other reason sqlite is great is the reduced latency. When you can satisfy queries in microseconds vs milliseconds, there is a fundamental shift in certain things you might try or not try.

We've been using this stuff in production for over half a decade now. Multi-user, heavily-concurrent systems too. The biggest cost savings so far has been the lack of having to screw with a separate database server per customer install (we do B2B software).


An interesting answer to your question from Tailscale a few weeks ago [1]: sqlite as the main database, many readers read a copy, the copies synced using litestream. [1] https://news.ycombinator.com/item?id=30883015&p=2


Yes, our clustered, real-time configuration file system uses sqlite as sole backing store.

https://pve.proxmox.com/pve-docs/chapter-pmxcfs.html

https://git.proxmox.com/?p=pve-cluster.git;a=tree;f=data/src...

When it was written by our CTO over 10 years ago he tried every DB solution available (that is, those that somewhat fit the picture); only sqlite survived every test thrown at it. If set up as documented it handles pulling the power plug in any situation, at least in our experience.

It may need to be noted that the DBs are only used locally, we synchronize commits ourselves via a distributed FSM, that's mostly transforming the Extended Virtual Synchrony corosync provides to simple Virtual Synchrony.


I've used in "production" and as the "primary datastore", but not in the ways those terms are normally used.

1. PHP web development for the client of a client. They needed persistent data and MySQL was not available. Moving to a different webhost was straight up rejected. Used sqlite with Idiorm and it worked just fine.

2. As the local datastore for a cross platform mobile application. The sqlite DB was unique on each device. Libraries were available and worked well.

3. This is a large one. Several 10's of thousands of installs that query the filesystem, but filesystem access is throttled by the vendor. We're using sqlite to store the state of the filesystem as it doesn't really change that much. If the db is damaged or whatever, it can be wiped as it isn't the final source of truth.


I no longer work there, but an enterprise facial recognition system used by NGOs and three-letter government agencies has SQLite as the sole datastore. I wrote a portion of the SQLite runtime logic, a simple key/value store used all over the software.

SQLite proved to be phenomenal. We spec'ed hardware with enough RAM to hold the FR DB in memory, and damn SQLite is fast enough to keep up with the optimized FR system performing 24M face compares per second. With a 700M face training set, SQLite also proved instrumental in reducing the training time significantly. These days, if given the opportunity to choose a DB, I always choose SQLite. I use SQLite for my personal projects, and I go out of my way to not use MySQL because SQLite is so much faster.


Impressive numbers, thanks for sharing.

Out of interest, were you running on bare metal/cloud? And what kind of CPU was behind those 24M face compares per second?


Running on bare metal, and those numbers come from a 3.4 GHz i9. The system is a fully integrated single executable, with embedded SQLite. Since I left the firm a year ago, new optimizations have the facial compares down to 40 nanoseconds per face.


I'm using it for a suite of commercial desktop products and it's working out really well. You'll need to figure out your multiple-reader/single-writer connection pools, and graceful shutdowns in your custom server, to avoid data file corruption. This is stuff you wouldn't normally do with a db server, but the discovery has made for some great learning and provided food for thought on all kinds of load balancing and distributed db designs. Also started using fossil, which I really love for my small team.

Sqlite is one of the greatest open source projects in history, with awesome docs, and really is a tribute to the art of programming. I'm happy and honored to use it for the appropriate use cases (which are a lot more than one would think).


BTW, the only corruption I've run into with sqlite is plug-pullers with transaction frames still open, killing a debug session, the usual shenanigans that are improper with basic NTFS.


At my current company we deploy sqlite as the primary and only database for our server. Our use case is a little less impressive than your usual webscale startups though.

Our product is a self-hosted IoT & hub unit solution, so we have no requirements to work with thousands of users doing who knows what. For our use case, sqlite is perfect. We don’t need to work with millions of rows, don’t need to stress the relatively low-power server units with another long lived network process, have no requirements of authentication since the user owns all the data, and can easily get insights into the database both during development and during troubleshooting at customer locations.

I’d sooner leave the project than move to anything else.


We've used it in cloud migrations of light SQL Server workflows which were previously run on shared servers.

We replaced SSMS + SQL Server with Python + SQLite run in AWS Lambda. The jobs fetch the database from S3, update with the latest deltas and write out the database and some CSV files to S3. The CSV files drive some Tableau dashboards through Athena.

The SQL usually needs a bit of a rework to make this work, but for the volumes of data we were looking at (we're talking less than a million rows, jobs run once per day) we've seen good performance at low cost. We used DuckDB for a couple of workloads which needed more complicated queries, it's stupid quick.


I use Sqlite for all my projects, but most of my projects are Windows applications. At prior employers I did work on web applications that used traditional server-based databases.

In my opinion the biggest thing separating Sqlite from a "full blown" database is actually Sqlite's lack of stored procedures. At all of the places where I worked with traditional databases, we used stored procedures to create an ersatz data access abstraction so that the database design could vary independently of the API presented to the application. With Sqlite I find myself (ab)using views as a poor man's stored procedure, but of course that only covers the read-only or "functional" (in the functional programming sense) portion of stored procedure code.
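As a rough illustration of that view-as-abstraction idea (table and column names invented, not from the parent's codebase): the application queries the view, and the underlying tables can later be reshaped as long as the view keeps exposing the same columns.

    import sqlite3

    con = sqlite3.connect("app.db")
    con.execute("CREATE TABLE IF NOT EXISTS orders "
                "(id INTEGER PRIMARY KEY, customer_id INTEGER, total_cents INTEGER)")
    # The view plays the role of a read-only "stored procedure" / API surface.
    con.execute("""
        CREATE VIEW IF NOT EXISTS customer_totals AS
        SELECT customer_id, count(*) AS order_count, sum(total_cents) AS total_cents
        FROM orders
        GROUP BY customer_id
    """)
    rows = con.execute("SELECT * FROM customer_totals").fetchall()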

Everything other commenters have said about data size or centralization also applies, but for me (again, just personal opinion) I'd actually draw the line at the point where you can or cannot get by without stored procedures. From an operational standpoint that would be: at what point is it imperative to be able to vary the details of the database design while maintaining an abstraction layer (stored procedures) that allows application code to be blissfully unaware anything changed underneath it?

Examples of when that would be needed would be if new users + applications start having competing needs, or if you need to revamp your table structure to improve performance or get around a limitation. If you're in a startup or small company, it would be the point at when you find yourselves hiring a real Database Administrator (DBA) rather than giving DBA duties to developers. Prior to that organizational scale you may be better off with the simplicity of Sqlite; after reaching that level of organizational complexity you might need a "real" (server-based) database.


Question for people using SQLite in prod: how do you cope if your app is running on a platform like Heroku or Cloud Run, rather than a proper server or VM? Have you found a solution for the fact that those environments, and their disks, are ephemeral?


Yeah this is an annoying problem. I just run an actual server rather than an "app" platform. I haven't found it that complicated and I pay less monthly which is nice.


Postgres

About 2 years ago I wrote my own blog engine so I could get up-to-speed with NodeJS: https://andrewrondeau.herokuapp.com/

I would have loved to do SQLite with some kind of "magic" backup, as SQLite is more than enough to handle a personal blog. (It certainly would make development easier!) However, at the time Heroku only offered Postgres.


I'd always opt to use their hosted database in those constraints. IMO SQLite only works well if you own the storage, so bare metal (or VMs) only.


You could use Render or a similar service that offers persistent storage. I imagine Heroku has a similar feature.

[0] https://render-web.onrender.com/docs/disks



I use it for https://chaitinschool.org/ but it's a fairly small web app and minimal traffic. It's nice to move the data around easily (which is mostly workshop/event data) but if we have more people sign up I might switch to Postgres.


Yes. For http://ht3.org which is a search engine I wrote for tech related articles. It works really well. It uses the fts5 extension, that allows full text searching. There are over a million indexed pages and it’s no trouble.


I'm also using fts5 for some small projects but I haven't looked too deeply into it, so I'm wondering if you have any interesting insights. Like what kind of index/options do you use? Maybe the trigram index? And your "across boundaries" mode is just word* in fts syntax?


Yes. Acute observation. It’s actually two separate indexes. A trigram index for sub word searching. Word* is exactly right and the other is a Unicode61 index with porter stemming (also remove diacritics). Trigram tends to work well with abbreviations and such. Whereas porter works well for general searches.
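For anyone wanting to try a similar two-index setup, here is a hedged sketch (table names invented; the trigram tokenizer needs SQLite 3.34+):

    import sqlite3

    con = sqlite3.connect("search.db")
    # Sub-word / abbreviation matching via trigrams.
    con.execute("CREATE VIRTUAL TABLE IF NOT EXISTS pages_tri USING fts5("
                "title, body, tokenize = 'trigram')")
    # General searches: unicode61 with Porter stemming, diacritics removed.
    con.execute("CREATE VIRTUAL TABLE IF NOT EXISTS pages_en USING fts5("
                "title, body, tokenize = 'porter unicode61 remove_diacritics 2')")
    # Prefix ("word*") queries work against either index:
    hits = con.execute("SELECT title FROM pages_en WHERE pages_en MATCH ?",
                       ("sqli*",)).fetchall()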


Interesting. I might play around using two indexes and try to combine both search results. This could be nice for my less technical users.


> The irritant-free web

This is a very respectable goal. I wish you great success!

Good full text search without pulling in another dependency would be quite a win. I'll add fts5 to my reading list. :)

Out of interest, what kind of compute and storage resources do you have underneath that?


Thank you for the appreciation :)

It’s a linode. Shared cpu Plan.

1 CPU Core 50 GB Storage 2 GB RAM

It’s just $10 per month.

With linode even shared cpus are powerful and I’m yet to hit any overload. I’m sure it will at some point, I might upgrade then.


Slightly off-topic, because we use PostgreSQL on the backend, but because StoryArk is offline-first, we heavily rely on SQLite in our mobile app. The backend database is mostly just there to backup your local data and to sync it across all of your devices. So there aren't that many queries being run on the backend database.

I can't really count how many times I've been pleasantly surprised by how extensive the feature set of SQLite is. I mean, it even has window functions (https://www.sqlite.org/windowfunctions.html). And being able to quickly open up the app's SQLite file in a database browser is also quite helpful during development.
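For example, a window-function query runs fine against the same file you ship in the app. A quick sketch (schema invented for illustration; window functions need SQLite 3.25+):

    import sqlite3

    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE plays (user_id INTEGER, story_id INTEGER, seconds INTEGER)")
    con.executemany("INSERT INTO plays VALUES (?, ?, ?)",
                    [(1, 10, 300), (1, 11, 120), (2, 10, 45)])
    # Rank each user's stories by listening time, without a GROUP BY dance.
    rows = con.execute("""
        SELECT user_id, story_id,
               rank() OVER (PARTITION BY user_id ORDER BY seconds DESC) AS r
        FROM plays
    """).fetchall()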


Yes, multiple times. It went/is going great!

Pros:

- A single API server, no separate database to worry about, configure, and update.

- Backups are as simple as backing up one file every so often. SQLite even has an API to do this from a live connection.

- Handles way more concurrent users than we’ve ever needed.

- Dev and test environments are trivial and fast.

- Plenty of tools for inspecting and analysing the data.

Cons:

- There are certainly use cases it won’t scale to, or at least not without a bunch of work, but in my experience those are less than 1% of projects. YMMV.

- The type system (even with the newish stricter option) has nothing on Postgres. I realise this is basically a non-goal but I’d seriously love to somehow combine the two and get PG’s typing in a fast, single file embedded DB library.

- Postgres JSON support is also better/nicer IMO.


I think SQLite vs. PostgreSQL is similar to Flask+SQLAlchemy vs. Django, or similar debates.

Yeah, you probably can do everything with the "simpler" stack. It might even be nominally faster in many cases. But if there's any chance you're going to end up rolling your own type-validation or ORM or admin interface or GIS... Just use the battle-tested kitchen sink from the get-go.


I have used SQLite for Django applications with a few thousand users. It has had no problems. However, I just use the ORM and never configure the SQL directly. The vast majority of LAMP stack style web applications would be an ideal use case.

However, I would consider how important these RDBMS features, which are not available in SQLite, are to you:

- less sophisticated type and constraint system.

- a severely limited ALTER TABLE.

- No stored procedures.

- limited selection of math and statistical functions.

- no permission and user model, not to mention row-level security.

To be clear, I don't think it's bad that SQLite doesn't try to be a full client/server RDBMS, but I would weigh this perspective when making a decision rather than performance, which is great and difficult to max out.


SQLite has a pretty powerful permission model via the set_authorizer hook - you can register a callback function which will be called each time SQLite attempts to read data. It's not widely used though from what I've seen.

It's also really easy to add new custom SQL functions to a SQLite connection, which means the missing math functions shouldn't be a limitation. Here's an extension for example: https://github.com/nalgeon/sqlite-stats
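Both hooks are exposed in common bindings. A minimal sketch in Python (the table/column names are invented):

    import math
    import sqlite3

    con = sqlite3.connect("app.db")

    # Authorizer: make one sensitive column unreadable (reads of it return NULL).
    def authorizer(action, arg1, arg2, db_name, trigger_name):
        if action == sqlite3.SQLITE_READ and arg1 == "users" and arg2 == "password_hash":
            return sqlite3.SQLITE_IGNORE
        return sqlite3.SQLITE_OK

    con.set_authorizer(authorizer)

    # Custom SQL function: plug a "missing" math function straight into SQL.
    con.create_function("log2", 1, lambda x: None if x is None else math.log2(x))
    con.execute("SELECT log2(1024)").fetchone()   # -> (10.0,)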


I would add "limited json support" to this list.

Under the hood, SQLite treats JSON as strings; you have to do some strange stuff with json_extract and computed-field indexing to index into it, which can be a bit fragile.
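Concretely, that workaround is usually an expression index over json_extract. A sketch with invented names, assuming the JSON1 functions are compiled in (as they are in most builds):

    import sqlite3

    con = sqlite3.connect("app.db")
    con.execute("CREATE TABLE IF NOT EXISTS events (id INTEGER PRIMARY KEY, data TEXT)")
    # Index straight into the JSON text. The fragile part: queries must use the
    # exact same expression, or the index is silently ignored.
    con.execute("CREATE INDEX IF NOT EXISTS idx_events_user "
                "ON events (json_extract(data, '$.user_id'))")
    rows = con.execute("SELECT id FROM events "
                       "WHERE json_extract(data, '$.user_id') = ?", (42,)).fetchall()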

My use case these days is hybrid rdbms / nosql, where most of my tables have defined columns for frequently queried data, and a jsonb "data" field for everything else.

Postgres has impressive jsonb capabilities, and with 14 the index operators making querying it a dream.

I love SQLite, but postgres' jsonb handing makes the additional operational overhead worth it to me.


https://internetdb.shodan.io is powered by a SQLite database and gets millions of requests a month. It does require a different workflow and custom synchronization scripts but otherwise it's performed well.


What kind of workflow and synchronization scripts?

These sound like the kind of hidden costs that could turn sqlite's simplicity quite complicated if you don't see them coming.


+1, would love to know more about these!


I maintain an 'older' codebase (2012) and am rebuilding it to a new version, but both use SQLite. It's a configuration management web application installed onto either bare metal or virtual machines. Generally only a handful of simultaneous users; I want to say performance isn't much of an issue or concern, but I've had to fix a bug that was caused by too many writes to the database where the system ran into IOPS limits (traditional hard drives or constrained VMs at 100 IOPS).

There is a hacky solution for redundancy; at certain events, a copy of the .db file is made and rsynced to a secondary node. This will probably fall apart if the file ever goes above a few MB in size.

Pros / reasons to use it: Self-contained, just a single file to transfer, no drivers needed, no servers running other than my own application.

Cons: No good support for ALTER TABLE queries, so things like changing the name, datatype, or default value of a column aren't happening. The workaround is to create a new table and transfer the rows over, then drop the old table and rename the new one (sketched below). Also the aforementioned issue if you want redundancy.

So basically, if redundancy isn't a requirement for you, sqlite is fine. It's probably ideal for single user applications, like your browser or apps (iirc sqlite is used a lot for those purposes).
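For reference, the rebuild dance described above looks roughly like this (a sketch with made-up table and column names):

    import sqlite3

    con = sqlite3.connect("app.db")
    con.execute("PRAGMA foreign_keys = OFF")   # must be toggled outside a transaction
    with con:
        # 1. create a table with the desired new shape
        con.execute("CREATE TABLE hosts_new (id INTEGER PRIMARY KEY, "
                    "hostname TEXT NOT NULL, enabled INTEGER DEFAULT 1)")
        # 2. copy the rows over, mapping old columns/defaults to new ones
        con.execute("INSERT INTO hosts_new (id, hostname, enabled) "
                    "SELECT id, name, 1 FROM hosts")
        # 3. drop the old table and 4. rename the new one into place
        con.execute("DROP TABLE hosts")
        con.execute("ALTER TABLE hosts_new RENAME TO hosts")
    con.execute("PRAGMA foreign_keys = ON")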


I wrote a command line tool that automates that ALTER table flow (create new table, copy data across, rename and drop old table) - you might find it useful: https://simonwillison.net/2020/Sep/23/sqlite-advanced-alter-...


My preferred production DB is PostgreSQL. However, for small experiments, SQLite is more versatile due to fewer dependencies, single binary, zero install overhead etc., so I use it often, in particular for research experiments and systems prototyping. The only thing that ever bothered me was the lack of type enforcement, which has since been improved.

Production uses: 0 (1 if my Ph.D. thesis code is included, which had some C++ code that linked against version 2 of the SQLite library).


I'm coming from a similar direction. Postgres is my go-to, and I love its reliability when you get your schema right.

Glad to hear its type enforcement situation is improving.


For those who've missed the announcement, here are some past links on the topic:

    - https://www.sqlite.org/stricttables.html
    - https://news.ycombinator.com/item?id=28259104
    - https://news.ycombinator.com/item?id=29363054


a. I'm surprised no one has mentioned WAL2 + BEGIN TRANSACTION, both of which are in separate branches with the plan to be merged into main.

Even though SQLite can handle 99% of peoples use cases, WAL2 + BEGIN TRANSACTION will greatly close that last 1% gap.

b. Expensify has created a client/server database based on SQLite called https://bedrockdb.com and years ago it was scaling to 4M+ qps https://blog.expensify.com/2018/01/08/scaling-sqlite-to-4m-q...


Do you mean 'BEGIN CONCURRENT'?

https://sqlite.org/cgi/src/doc/begin-concurrent-pnu-wal2/doc...

Where did you see that the plan is to bring those into the mainline distribution?


Whoops, yes - meant BEGIN CONCURRENT (I can't update my original post).


I wasn't aware of a plan to merge wal2 and BEGIN CONCURRENT into main. That would be awesome. What is your source about this?


Many of the replies here attest to the simplicity and fast performance of SQLite particularly for serving pages or data. But how well does SQLite fare in concurrent write/insert situations?

Although SQLite is not designed for this type of scenario, this discussion highlights that there's a strong demand for a concurrent client/server RDBMS that is simple, performant, and easy to deploy. PostgreSQL is powerful and feature-rich, but not simple or easy to deploy. Hence the appeal of SQLite.

For example, could SQLite power a discussion forum of moderate (or more) activity i.e. users posting comments? The Nim language forum is powered by SQLite, but activity in the forum is fairly low. [1]

Between the simplicity of SQLite and the complex heavyweight that is PostgreSQL, there is a wide gap. It's a shame there is no concurrent RDBMS to fill it.

(Note: Another poster mentions the concurrent Firebird RDBMS as a possible alternative, but I haven't used it. [2])

[1] https://forum.nim-lang.org/

[2] https://firebirdsql.org/en/features/


Most SQLite writes take less than a ms, so the concurrent writes scenario isn't actually a big problem - writes can be applied via a queue.

Does your forum accept more than 1,000 writes per second? If not, SQLite should be fine.
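A write queue can be as small as a single writer thread that owns the connection. A hedged sketch (schema invented):

    import queue, sqlite3, threading

    writes = queue.Queue()

    def writer():
        con = sqlite3.connect("forum.db")
        con.execute("PRAGMA journal_mode = WAL")
        while True:
            sql, params = writes.get()     # block until a write arrives
            with con:                      # one short transaction per write
                con.execute(sql, params)
            writes.task_done()

    threading.Thread(target=writer, daemon=True).start()

    # Request handlers enqueue writes instead of touching the DB directly:
    writes.put(("INSERT INTO comments (post_id, body) VALUES (?, ?)", (1, "hi")))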


> But how well does SQLite fare in concurrent write/insert situations?

It's pretty well known that concurrent writes are SQLite's weak point, and that if your application requires a high volume of writes, it's not the proper solution.

The SQLite devs even acknowledge this:

> SQLite will normally work fine as the database backend to a website. But if the website is write-intensive or is so busy that it requires multiple servers, then consider using an enterprise-class client/server database engine instead of SQLite.

> [...] client/server database systems, because they have a long-running server process at hand to coordinate access, can usually handle far more write concurrency than SQLite ever will.

(https://www.sqlite.org/whentouse.html)


Not simple or easy to deploy? On a server, it's "apt-get install postgresql" or "yum install postgresql". In the cloud, you choose RDS on AWS, Google Cloud SQL, Heroku Postgres, DigitalOcean managed DB, etc. Need to step it up? AWS Aurora or Aurora Serverless.

Just because Postgres has all the features and knobs doesn't mean you have to use them or turn them. You can always use it like SQLite but just one or two things more that you need.


Things worked well at the outset, especially in local development against my NVMe drive for my small CRUD application.

Then, with a little traffic, things continued to go well in production. But as traffic scaled up (to 1-5 QPS, roughly 25% writes), they fell apart. Hard. Because my production environment was spinning rust, IO contention was a real issue and totally absent from development. This manifested as frequent database timeouts, both from reads and writes.

Echoing another commenter's sentiment: things would have gone much more smoothly from the beginning had I started with PostgreSQL, but after having written many thousands of lines of direct SQL taking intimate advantage of SQLite's surprisingly rich featureset, migrating was less than totally appealing.

The mitigation strategy, which ultimately worked out, was to implement backpressure for writes to SQLite: queuing and serializing all writes to each database in the application, failing loudly and conspicuously in the case of errors (thus forcing the client to retry), and gracefully handling the rare deadlock by crashing the process completely with a watchdog timer.


So you were using an HDD, not an SSD?

Would an SSD in production have solved the timeouts by increasing your write throughput?


According to my testing, SSDs would greatly reduce the contention but were not a complete fix.


This article could help a little: https://www.unixsheikh.com/articles/sqlite-the-only-database...

But you probably won't see it since at the time of writing my response there are already 172 comments.


We are using Sqlite as the primary data store for "file blobs" and implement software-based replication using our queue to multiple servers for redundancy.

The use case is storing trace/profiling data, where we use one sqlite file for each customer per day. This way it's easy to implement retention-based cleanup and also there is little contention in write locking. We store about 1 terabyte of data over the course of 2 weeks this way.

Metadata is stored in Elasticsearch for querying the search results, and then displaying a trace hits the Sqlite database. As looking at traces is a somewhat rare occurrence, we iterate over all fileservers and query them for trace data given an ID until we find the result.

Reference https://www.sqlite.org/fasterthanfs.html


For one project we used sqlite in a GUI application to store some tabular data (usually way below 1GB), and to do some light processing over it. Mind that the performance requirements were minimal, as this was just an auxiliary function; the program did do a lot of heavy lifting in other parts, but the DB wasn't involved in these (it got fed some messages/stats at best, and most of the data came from other aux functions). The sqlite lib was easy to integrate on both the technical and the legal level. We could have done all that with native code, too (I think we even removed some code?), but it would have consumed much more time (dev work, unit tests, maintenance) without any benefits. And it worked like a charm, except for one issue: The GUI person did create a connection for every operation but thought they were reusing it, which then caused some weirdness. And this was easily fixed.

An important realization is that not everything needs to scale, and that it depends on how you access the DB and what your product looks like. For a load with many concurrent writes I'd be careful with sqlite, or when I know that I'll want my DB to mostly live in memory (e.g. operations will often process the whole, huge dataset and no index can help with that). But even if I thought "Uh, I'll probably need a full DB", I'd still benchmark my application with both sqlite and e.g. postgres. And if the API to access the DB uses some nice abstractions, swapping the flavor of SQL isn't a huge issue anyway.

//edit: Plus, I've done stupid stuff like "my SPA hammers the PHP API with 20 to 40 requests, each resulting in a simple SQLite query, just to render a checklist" and got away with it: a) because we had at most 20 concurrent users [realistically: 1 to 5] b) doing the checklist took half a workday (ticking off an item was done via a JS callback in the background, so the actual rendering happened only once) and c) SQLite performs great for read-heavy loads. The site performed so well (page loads felt about as fast as HN, even when connected via VPN) that I even scrapped the plan to locally cache the checklist in HTML5 localStorage (bonus: no cache = no cache incoherence to care about).


Yes! Sqlite works great in production for single-threaded access. It can handle nearly the same sort of performance as a MySQL install: inserts are fast, full-text search is fast, it all just works great.

If you have multiple threads accessing the same database it will kill the speed of sqlite completely. It will work for development, but as soon as you put it into production and put any sort of threaded load on the database it will quickly become the bottleneck and bring the whole thing down. If you run into this threaded issue you can just switch to mysql at that point and it will fix the issue.


This isn't a typical use case. FWIW, a decade ago, we used sqlite as the persistence mechanism for an in-memory KV store called membase. (See https://github.com/membase/ep-engine). This was powering 50M+ DAU traffic in production for very intense write-heavy traffic. It did its job well. Around that time we also considered leveldb (and rocksdb a bit later) as alternative to sqlite.


I'm using SQLite for a small personal project that's live in production and so far I love it (both for its simplicity in development and for its performance).

But I've run into an issue on prod that didn't exist in dev on my MacBook M1, and I'm curious if anyone has any suggestions:

My app is basically quiet and serves requests in the dozens (super easy to run on a tiny instance), but for a few hours a day it needs to run several million database transactions, N+1 queries, etc. Because of the high number of IOPS needed, a small instance falls down and runs suuuuuper sluggishly, so I've found myself needing to shut everything down, resize the instance to something with more CPUs, memory, and IOPS ($$$), doing the big batch, then scaling down again. That whole dance is a pain.

Were I using a more traditional postgres setup, I'd probably architect this differently -- the day-to-day stuff I'd run on Cloud Run and I'd spin up a separate beefy instance just for the daily batch job rather than resizing one instance up and down over and over again. The constraint here is that I have a 50GB+ sqlite db file that basically lives on local SSD.

Any thoughts?


Interesting problem. Did you try grouping up transactions? E.g. instead of a few million txns, do a few thousand 1,000-op chunk txns. SQLite is much, much faster within a txn.

Edit: did the math on several million txns over a few hours. How many per second? ~500.

According to here (question 19), for old HDDs you could expect a 3-orders-of-magnitude improvement by using bigger txns. Not sure about SSDs, but worth a shot.

https://www.sqlite.org/faq.html
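What that batching might look like in practice (a sketch with an invented table, not the parent's code): commit once per chunk of rows instead of once per row.

    import sqlite3

    def insert_in_chunks(con, rows, chunk=1000):
        """One transaction per `chunk` rows rather than one per row."""
        for i in range(0, len(rows), chunk):
            with con:   # BEGIN ... COMMIT around the whole batch
                con.executemany("INSERT INTO prices (symbol, ts, px) VALUES (?, ?, ?)",
                                rows[i:i + chunk])

    con = sqlite3.connect("batch.db")
    con.execute("PRAGMA journal_mode = WAL")
    con.execute("CREATE TABLE IF NOT EXISTS prices (symbol TEXT, ts INTEGER, px REAL)")
    insert_in_chunks(con, [("AAPL", 1650000000 + i, 150.0 + i) for i in range(100_000)])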


I have not tried that yet (I was being lazy since it all ran so fast on my M1), but that's a good idea for something to investigate. That way I suppose I can run the whole thing on a less-beefy instance and avoid the scale-up/scale-down cycle.


Yeah I think if you are doing 1 txn per second roughly instead of 100s, it may just work fine. I'm curious how it goes so shoot me an email if you want with updates


Maybe use a disk that's "independent" on the VM? That way you can spin up a temporary beefy VM, attach the disk and get instant access to your DB.

Never tried, but I believe that's how these GCP disk work: https://cloud.google.com/compute/docs/disks#pdspecs


How do you deal with the downtime? I also find myself constantly resizing instances (although for data analysis, where I sometimes need 80+ cores only for a few hours), and the constant restarting and reloading drives me crazy.

Maybe for sqlite because it's stored in ssd, it's only 1-2 minutes to shutdown+change parameter+restart if it's scripted?


So far, I just deal with the annoyance. It is indeed only a <5 minute operation, but it's just a pain (shut services down, shut instance down, resize, start instance, start services).

The problem is it's the exact kind of pain that I avoid, so many times I just keep the larger (8-core) instance up and just eat the cost. I should probably script this to solve this pain point, but I'd SO rather have a setup that doesn't require this sort of jiggering.


Maybe `BEGIN CONCURRENT` [1] could help in your case? :thinking:

[1] https://www.sqlite.org/cgi/src/doc/begin-concurrent/doc/begi...


Interesting - I'll explore! Thanks.


VATcomply.com has been using SQLite as a primary database for 2 years and is serving ~70 requests per second without an issue.

Site: https://vatcomply.com/

GitHub: https://github.com/madisvain/vatcomply


I used SQLite and some custom Python for a client solution, a web application for teaching and thought processes. It was a large application (many steps and user screens, some admin functions), probably a good fit for Django but I don't like Django. I used Python sqlite interfaces, running on a single Debian base server. The server ran like a tank, no problems whatsoever, but the application had slow performance at times. The client did a complete re-write later with different consultants and they started over without SQLite. In summary, the clients had no clue what SQLite is or why you would want it, and my efforts to explain the benefits in detail did not sink in, while the slow performance was very important and caused them to dislike the entire product. All things considered, I would re-write it the same way, since I enjoyed the tech stack personally, but debugging those rough spots instead of adding a dozen more GUI features would have been better for the project.


I had a great use case for SQLite last year.

Client (incident response dept at megacorp) had a problem: their quarterly exercises of switching network storage devices from live servers to disaster recovery (DR) servers was a manual operation of reconciling about 8 Excel spreadsheets and setting up ACLs before (luckily) an automated process would switch the storage mounts from live to DR.

We modeled and matched up all the hosts, servers, and ACLs and did a daily write to a single SQLite database. (We redundantly sent all the data to Splunk.) Now the DR employees are automating a daily diff of servers, hosts, ACLs etc to further automate the switch.

To echo a bunch of comments here, we decided on SQLite for a few reasons:

- only one user would write to the DB

- only a few users need to access the data

- besides standard retention policies, the data could be considered ephemeral and easily recompiled

- the script we wrote to compile the data runs in 5 minutes, so if we lose the db, we can easily recompile it.

SQLite (and SQLAlchemy) is useful for inexpensive data.


I'm a bit late to the party, but just in case it's useful to anyone: I'm using one DB per user for my CV tool on withoutdistractions.com

Since every user has a separate DB file, writes to those files don't block reads from the global DB file which contains everyone's public data. As long as you keep your user DB schema the same as the global DB schema, it's pretty easy to sync records using a simple cron job.

More info on my tech stack here: https://withoutdistractions.com/cv/faq#technology

In case that link ever goes down, here's an archive link: https://web.archive.org/web/20220503102946/https://withoutdi...


I tried doing this a bunch of times and most of them ended up requiring a migration, most often to postgresql. I can only remember one such case that still uses sqlite for a web service.

It always goes like this:

1. I start a new "lean" web service and decide to use sqlite.

2. Some months down the road I figure I need some slightly more advanced db feature. The ones I can remember are postgresql numeric arrays (for performance where I can test for membership in a where clause) and jsonb (again with its special syntax for querying and its performance implications).

3. For some time I postpone the inevitable and do various hacks until I fully hate myself.

4. Suddenly realize that migration to postgresql will reduce the complexity, even with regards of infrastructure, as I usually have redis et al. in the game (which I wouldn't have to use had I started with postgresql initially).

5. I waste several days migrating and wondering whether it (my initial stupidity) was worth it...

My advice is: if it's going to be accessed via the network (and you'll have to operate a server either way), make it two servers and go with postgresql. If you are not 100% sure about the opposite (no chance of it becoming a web service), go with postgresql. Is it a desktop app? Postgresql (just slightly joking here). Mobile app? OK, I guess you have no real choice here, go with sqlite.

And no, you can't "just use an ORM", because when the day comes, you will need to migrate because of features sqlite does not support and you will have made mistakes. If you used an ORM, now you'll have to migrate off both sqlite and the ORM.

PS: Ah, yeah, and now I remember one other instance where I had to migrate off sqlite solely because I needed to provide an admin interface (think PGAdmin) to the production system.


I can't comment on my own use of SQLite as a primary database for anything (although the existence of SpatiaLite [1] may lead to me trying this out), but whoever needs an embedded database system should probably consider evaluating Firebird for that role as well -- it has an embedded mode with basically no feature compromises relative to the server mode. (They even put Interbase -- Firebird's ancestor -- in (not only) M1 Abrams' tactical data system apparently [2], for reasons of reliability.)

[1] https://www.gaia-gis.it/fossil/libspatialite/index

[2] http://web.archive.org/web/20190224100905/https://core.ac.uk...


Yes, works great for my sites (they are mostly read-heavy). I used to default to Postgresql, now I default to sqlite.

This [0] is a good article with some benchmarks, misconceptions about speed, and limitations.

[0]: https://blog.wesleyac.com/posts/consider-sqlite


> SQLite has essentially no support for live migrations, so you need to instead make a new table, copy the data from the old table into the new one, and switch over.

That seems like a pretty big flaw as your data grows. Zero downtime migrations are really nice. Anyone here got a war story / experience with this one?


I've used SQLite with HashBackup for 13 years and it's been awesome. A lot of the stories I read about "data corruption" with SQLite are, IMO, really about faulty applications that do database commits with the data in an inconsistent state from the application point of view. I've done it myself - it's an easy sin.

I've migrated database versions 35 times over the last 13 years, ie, every individual HB database has been migrated 35 times. You don't always need to make a new table, do a copy, and switch over. In the latest migration, I added new columns, initialized them, dropped columns, etc. without doing a db copy.

For this migration I wanted to switch to strict tables, where typing is strict. I could have done this by just altering the schema (it's just a bunch of text in the SQLite db) and then using SQL to make sure existing columns had the right data (using CAST). But instead, I created a general framework to allow me to migrate data from one schema to another, mainly so I could reorder columns if I wanted. That can't be done with ALTER statements, so I did end up doing a complete copy, but I've done many migrations without a copy.

I found this paper interesting on "zero downtime migrations".

https://postgres.ai/blog/20210923-zero-downtime-postgres-sch...

After reading it, the bottom line is that changes happen within transactions (true for SQLite too), and the key to zero downtime migrations is to use short transactions, use timeouts, and use retries on all database operations, including the migration commands. You can do all these with SQLite.


I am using it as the primary application file format for my desktop application.

With a lot of help from https://www.sqlite.org/appfileformat.html

It has had some pains, but it has been great compared to flat-file storage.


Yes, I've used it for a side project of mine. It processed like 5 financial transactions in total, so I'm glad I never invested the time to build anything more robust :)

It's also powering another one and I really like the fact that I can just commit the whole DB to the Git repo.


Very interesting, what's your workflow? And how big is the DB?

I've never (deliberately) considered committing a DB to git. Although there was that one time when I was straight out of college...

Pro tip: surprising your colleagues in the morning with a 40 minute wait to pull master (because you committed a ???GB db) is a good way to feel like a right eegit.


It's very small, just ~2000 rows; it contains YouTube video IDs and some metadata.

For backup, I have a small bash script that creates a git commit and pushes it to GitHub.


I'm using SQLite in Notion Backups (most of the workload happens in background jobs; the web app itself doesn't get that many visits)

Except for some rare exceptions, it's been doing pretty great. I don't have any plans to migrate from SQLite any time soon.


Any chance I could convince you to share some of those 'rare exceptions'? I love a good exception ;)

Also they're where the real insights are.


Lately, Rails has been complaining about connection timeouts, stating that all connections in the pool were in use (this usually happens when Sidekiq, a background jobs framework, processes multiple long-running jobs).

By default, the connection pool in Rails contains 5 connections, and they time out within 5 seconds.


I am using sqlite where a simple persistence layer is needed, both as the sole in the project or along with a full-fledged database. There are many such projects, once you realize that a database is just an abstraction; for example, for caching in a larger project, or to store results for a subsection of the project. But of course also for smaller, standalone projects.

Also, take a look at ws4sqlite (https://germ.gitbook.io/ws4sqlite/) for a middle ground between SQLite (embedded) and rqlite/dqlite: it's "normal" sqlite addressable via web services. May be useful in some scenarios.


I run a wiki hosting service at https://editthis.info and each wiki that is hosted uses SQLite. This is nice because it is easy to back up the dbs, everyone has their own separate db, and with over 250k wikis, most of them defunct, they don't really have any overhead other than disk space. They aren't part of one large database that you need to filter all queries through to get the data you want. Also, it makes it so if I want to shard the system, I can just take half of the dbs and move them to another server, and it is trivial to do from a sysadmin point of view.


It's something I've been meaning to try for a long time. I had https://litestream.io/ in mind as a means to achieve a "Fully-replicated database".


Has anyone ever used multiple SQLite databases per tenant/account?

For sake of argument, let's say I have a fixed schema/format that will never change and I never need to aggregate queries across multiple customer accounts. Also, let's say writes to a single database are never going to be more than a hundred concurrent users. Why shouldn't I store each tenant's data in its own SQLite database? It makes it very easy for local client apps to download their data all at once. Backups are incredibly easy. Peer-to-peer synchronization behaves like a git repository merge. Why shouldn't I do this?


> a fixed schema/format that will never change

When would this happen for any non-trivial multi-tenant service? The difficulty of performing migrations sounds like it would pretty quickly negate any simplicity gained.


Definitely use case specific, as it wouldn't work generally, especially for domain driven data models. The schema I'm thinking of specifically represents a data model that is flexible enough where migrations are extremely rare[1].

Even if there are migrations, it's treated similar to a file conversion (like upgrading an older excel file format to a new format on demand when the file is accessed).

[1] Maybe something similar to https://www.notion.so/blog/data-model-behind-notion or an EAV model imbedded in a JSON1 column.


If the users are never aware of other users and they only care about their own data, sure. Maybe you have a service where a user can sign up to do a specific task that never requires interaction with another user.

However, if you want interaction across users (messaging, user-roles etc.) then you might want to have them in one database.


Yes for sniprss.com. Backend is written in Go and using litestream for replication.

No issues at all.


How do you deploy your application? Is it possible to do without downtime?


I used it for a Magic: the Gathering Commander tool I made recently[0]. It was pretty useful, since the point of the tool was on-device card searching and deck organization. I was even able to sync the data between multiple computers just by putting it in a NextCloud folder.

Aside from some surprises regarding packaging it together with the rust crate and inability to rename columns, I'm really happy with it. Easier than deploying postgresql, more useful than documents.

[0] https://github.com/Endominus/Lieutenant


The development of SQLite as it matures into more and more production ready DB reminds me a lot of the story behind the GE J85 engine: https://en.wikipedia.org/wiki/General_Electric_J85

It started out life as a disposable engine for a decoy missile and so the engineers took a very lightweight approach to it and kept the costs down. It would later be adapted for more permanent aircraft and ended up being one of GE's most successful and longest-serving engines.


I know a lot of people hate ORMs but I tend to find them useful, and if you do go down that route, it is pretty easy to write an application that will work with either a Postgres or SQlite backend.

I use SQLAlchemy and write applications where by just swapping out the database URI I can use either SQLite or Postgres. SQLite is nice for local development and easy testing, (you can even run tests using :memory: to accelerate CI/CD) and then I use hosted Postgres in prod. That said, based on what I have seen I would not be at all afraid to use SQLite in prod for internal tools etc.
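Roughly what that swap looks like (a sketch; DATABASE_URL is an assumed environment variable pointing at Postgres in production):

    import os
    from sqlalchemy import create_engine, text

    # SQLite (even :memory:) for dev/CI, Postgres in production; application code
    # only ever sees the engine, so nothing else changes.
    url = os.environ.get("DATABASE_URL", "sqlite:///:memory:")
    engine = create_engine(url)

    with engine.begin() as conn:
        conn.execute(text("SELECT 1"))   # same code path regardless of backend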


Tenable.SC ( https://www.tenable.com/products/tenable-sc ) uses SQLite as its only primary db.


Yes. For an interesting Desktop project in Java which needed to be fast. Using SQLite with the FTS extension, custom Java extension functions and also some BLOB tables with protobuf worked wonders.

I also built a Go library wrapping the SQLite amalgamation file and then cross-compiled it for Android and iOS, but with some more SQLite extensions (GIS) which the stock Android/iOS SQLite did not have. This was some time in 2017 I guess.

I am a big fan of SQLite. You can integrate it in all kinds of stuff and adapt it to your needs. Compiling it is also straightforward.


Yes, I use it for builtwithdjango.com . I have not a single bad thing to say about it, not one!

It is so convenient to have it as a file, especially when you are just learning to do software development.

And the performance has not been an issue once.

One thing to note is that my site is not Facebook size. It only gets ~40 page views a day. And most of them are just for viewing, so no database operations.

So, I'm not going to be the most credible voice here. FWIW, I know that Pieter Levels, who runs multiple projects like nomadlist, remoteok, rebase, uses both SQLite and plain JSON files for storage.


When I was Syncplicity's desktop client lead, we used SQLite. (Syncplicity is a desktop file synchronization application.)

When I looked around, Dropbox used it too; and so did Bittorrent Sync (Now Resilio)


Used it in several Django projects, up to around 2M pageviews a day but most of that cached read.

I usually drop it because I need something that Postgres has or does better, or it's a write heavy site.


SQLite is a horrible database in comparison to Postgres, MSSQL, Mysql and so on. There is no reason to use it as a primary db. There are many other areas were SQLite really shines though.


I am using it in production but my product has (almost) no users. Hopefully it will grow. I plan to replace it with Postgres as needed, but it offers some interesting new ways to approach certain problems, and creates interesting new problems like how to scale across multiple app servers? It's so far just a file. I'm sure there is some networking solution. Developing locally is interesting since I can just copy the production database file right to my machine.


FWIW, I've never once started with sqlite and then later "upgraded" to another db. I once actually used sqlite to implement locking for a mysql db because mysql's locking requires (or required, back then) that the calling code specify, in advance, every table which would need locking, and that wasn't possible in that code base. So an sqlite db connection was opened just to act as a mutex for the mysql db in some code paths.


I use sqlite and litestream for all of my projects, deployment very simple, performance is great, litestream backups to backblaze b2 cost pennies per month


We adopted SQLite for a commercial Windows application suite. The architecture allowed users to export, back up and share databases (useful in the context). I've also used it for a large number of developer desktop utility apps. I'd consider it for web apps which have limited data throughput; I'm not convinced it's a good candidate for high data volumes (but open to changing my mind).


With C# (Entity Framework) I've tried and failed using sqlite in a production scenario because of concurrency issues...

Recently I stumbled over a potential fix[1], which I will try in my next project.

[1] https://ja.nsommer.dk/articles/thread-safe-async-sqlite3-ent...


Ouch, that sounds like a nasty bug. But if I understand correctly it's more the driver/ORM's problem than sqlite's?

One to look out/test for early if I go in this direction though. Thanks for the heads up!


> But if I understand correctly it's more the driver/ORM's problem than sqlite's?

I think so. sqlite is pretty reliable in my opinion and I never had these issues anywhere else, but this bug made me switch my DBMS for a small side project from sqlite to postgres.

> One to look out/test for early if I go in this direction though.

You're welcome!


I love sqlite3. In python: https://docs.python.org/3/library/persistence.html

I have of course pickle -> open()

But eventually my projects grow large enough that sqlite3 becomes the database. I have never needed to go beyond sqlite3 in my projects. It does everything I ever need it to do.


I wrote a stock trading strategy backtester for a client in Go and SQLite and it read 100k rows/sec on a cheap consumer grade SATA SSD.


I'm working on a self-hosted analytics tool like Plausible, but one that allows using SQLite as the primary database. Of course it should be used for small websites and side projects, but it will have all the Plausible features. Am I crazy?

This is the project: https://github.com/a-chris/faenz


I love SQLite and use it for a lot of my harebrained ideas to see if they'll take off. They never do, so I never out grow SQLite. :P


Currently using SQLite3 as the backend db of www.rtljobs.com.

I love it - very robust, lots of documentation, StackOverflow answers, example queries, etc.

On a typical day, I get fewer than 50 users globally, so I don't really have to worry much about concurrency or other issues that SQLite struggles with. I'd wager that many web applications are perfectly well served by it.


On first learning of it, about 15 years ago, I thought it was an amazing idea and converted my personal website to use it. The performance at the time was lackluster, and I shortly rolled back to MySQL.

My manager currently runs a number of personal sites with a SQLite backend and they all seem very performant so I have been honestly considering giving it a second look.


I use SQLite in production and it works great.

You should understand whichever RDBMS you use, and how to get the best performance out of it. Previously I used Postgres extensively, and it worked fine, and before that I managed MySQL servers. They are all fine, but SQLite is as simple as it gets, and more than adequate for most workloads.


Expensify are doing this to a certain extent and have written about it before: https://news.ycombinator.com/item?id=23291779 There have been a lot of discussions about this on HN over the years too.


Side question: what’s something as simple as SQLite, but more of an unstructured key-value store?

I’ve been using (locally) a Redis container for a very early prototype because it seems to be simple enough to use.

I know you can query JSON strings in sqlite, but that's not quite the same thing. For one, Redis offers some geo features.


Have you looked at RocksDB? http://rocksdb.org/


I just make key-value store tables and write a small interface to simplify access in H2 or SQLite.


But why not just use Redis (which is almost surely faster even vs. SQLite in-memory) instead of creating key-value tables?


Because then I need redis. Dependencies are icky.


LMDB. Previously, BDB.


I have a few low volume sites backed by SQLite, but perhaps not in the manner you're getting at. Mine are all sites that have a process to insert/update or read the data, then generate static html from it, so I don't need to deal with multiple simultaneous connections to it.


I use it as the data store for my company's Grafana instances. I back up the SQLite store to an S3 bucket with litestream.io, allowing me to treat the servers as 'cattle'. It has worked perfectly without any issues, and saved the cost of a full RDS instance on AWS.


I wrote a travel blog for a trip I'm currently on using Laravel and an SQLite database. The only people writing to it are my partner and I so that's not an issue. The CPU on my $5 VM would probably bottleneck trying to serve traffic before the database would.


I built a tiny desktop app around it, purely for myself. It was fine.

What I learned from that though is that I would never use it for actual business software, no matter the scale. The fact that it doesn't have proper timestamp support is enough by itself to be crippling.


so long as you are not trying to access the DB with more than one process, sqlite scales as far as most DBs on a single instance.

The only issue is that you'll need to take special care when backing up the DB file (but this is probably the same for most DBs even today.)


Note that this is specifically for write/modify operations. You can perform multiple SELECTs in parallel, so depending on the workload this may be acceptable. As the devs say: "Experience suggests that most applications need much less concurrency than their designers imagine."

https://sqlite.org/faq.html#q5


It's my understanding, especially with WAL mode, that it's fine to read and write from multiple processes (I hope so, because I do this); it's just that any write locks the entire database.
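The usual incantation for that setup, for anyone copying it, looks something like this (a sketch; the pragmas are standard SQLite, the file name is made up):

    import sqlite3

    con = sqlite3.connect("shared.db", timeout=5.0)   # wait up to 5 s for locks
    con.execute("PRAGMA journal_mode = WAL")     # readers stop blocking the writer
    con.execute("PRAGMA busy_timeout = 5000")    # writers retry instead of erroring out
    con.execute("PRAGMA synchronous = NORMAL")   # common pairing with WAL mode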


I use sqlite for a flashcard app to help kids with multiplication. It's running on an ec2 micro instance, because I'm trying to see how long I can stay in the AWS free tier.

Everything is good so far, though most of my traffic is bots probing for wordpress flaws.


I do not quite understand the premises:

> Given the complexity

Which complexity? It is the simplest possible widespread, reliable and effective solution. Which makes it a primary choice.

> it seems like there are use cases or needs here that I'm not seeing

On the contrary, the use cases for the traditional Relational DB engines are defined: when you need a concurrency manager better than filesystem access. (Or maybe some unimplemented SQL function; or special features.) Otherwise, SQLite would be the natural primary candidate, given the above.

Edit:

I concur about https://blog.wesleyac.com/posts/consider-sqlite being a close to essential read if one has the poster's doubt.

To its "So, what's the catch?" section, I would add: SQLite does not implement the whole of SQL (things that come to mind on the spot are variables; the possibility of recursion was implemented only recently, etc).


> ... the possibility of recursion was implemented only recently, etc).

Noting that "recently" was August 2014 (version 3.8.6), according to https://sqlite.org/oldnews.html.

The "missing" right/full join types just hit trunk this past week and are still being debugged: https://sqlite.org/src/info/f766dff012af0ea3


> Noting that "recently" was August 2014

Right. As an issue, this has little to do with deployment the way the submitter intended; it has to do with software that was implemented against specific static versions of SQLite and that some people still use because no better alternative has emerged in the meantime. The "delay" is more evident (more "daily present") to those users, and has little to do with new products.

Nonetheless, given that, I would first check which subset of SQL you may need - I am not sure how much overlap there is with other mainstream products.


I worked at an old fintech and SQLite was the main db for all client data. One db per user.


I use it for a small site (high reads/writes ratio) where I keep different things in different databases (one for users, one for posts, etc).

Works perfectly well. Mind you, I would use Postgresql if the site were important, just to be on the safe side.


The old(ish) Cyanide implementation of Blood Bowl used SQLite as the data store for all of the game data. It worked, but the game startup time was absolutely dominated by loading data from the database and took too long even on a fast SSD.


I used to work for a mobile app company that made educational apps. We used SQLite as our primary data store on the client and server side. Our data was very small - think the question-and-answer content for flash cards. It worked great.


I'm using SQLite for my blogging engine (https://goblog.app). Easy backups, easy testing, light resource usage and probably faster than a separate database.


I'll plug https://github.com/mathaou/termdbms for people who need a terminal based SQLite management solution


Yes, via Bedrock (bedrockdb.com) for Streamie (streamieapp.com). No complaints.


I'm using SQLite as the primary database for PlotPanel [1]! It's going great with no unpleasant surprises along the way.

[1] https://plotpanel.com


I’d be curious to hear how you deploy new versions of your application with SQLite. Can it be done without downtime?


rqlite author here, happy to answer any questions. One thing I've noticed is a trend towards folks doing bitcoin mining (and related applications) wanting to use rqlite. I think they like that it is very easy to run, and gives them complete control over their data.

https://docs.google.com/presentation/d/1Q8lQgCaODlecHa2hS-Oe...


I only very briefly looked into rqlite. It's very interesting, but if I understand it correctly it's also not geared toward a write heavy workflow. (all writes are routed to the same node)

I.e. it's leaning more toward the moderate, but reliable writes, and heavy read use cases?

Please let me know ~if I'm missing anything~ what use cases I'm missing.


That's correct. rqlite replicates SQLite for fault-tolerance and high-availability, not for performance. In fact, performance takes a hit, for the reasons you state. But it's no worse (nor better) than something like, say, etcd or Consul.

https://github.com/rqlite/rqlite/blob/master/DOC/FAQ.md#what...


Yes! I use it for https://webtoapp.design But its not really impressive as my DB is just 3 megabytes large haha.


Don't knock it - if it works and only costs 3MB that's a win.

The longer you can scale the product without having to scale the application, the better!


My workplace does. It works surprisingly well and has completely changed what I think about SQLite and databases in general. SQLite is more than enough for almost all production needs.


Yes, absolutely, and for many projects.

..but like all things, it depends on your needs. Some have already pointed out the pages on SQLite's on site regarding # of writers (the main issue), etc.


Yes. And probably the best characteristic of SQLite in production is its (albeit accidental or implied) story around Dev/Ops. A whole category of problems just goes away.


Back in 2010-ish I ran a bootstrap/startup that was a community-based writing platform for indie authors. Our writing app was entirely web-based, offline with CRDTs between browser storage and backend, where everyone's book was its own sqlite DB. The forums ran on sqlite, as did the auth system... It worked really well for us (although we had to build a bit of logic around lazily updating schemas). I think it's well suited to user-partitioned data.


I use it for production backups. There's a sqlite db in each compressed archive to store metadata about the backup.

It's probably elsewhere but I don't realize it.


Yes, for a project used internally in my company, 300k rows per week. So far it’s going great. If things get too slow I’ll migrate to Postgres.


It depends on the application you are developing. If you are doing microservices, separating the database from the application is a must.


Some microservices will need persistent local data that is only used by that microservice, and SQLite can be a good fit.


Are there any blog posts that detail, soup-to-nuts, how to deploy a webapp with SQLite on something like AWS or GCP or Render?


You could provision an EC2 instance and mount some persistent storage. I'm not sure if it would make a lot of sense to take advantage of EKS or GKE.
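
As a rough sketch (device names, paths, and the bucket are placeholders, and the Litestream part is optional): attach a persistent volume, keep the SQLite file on it, and stream backups off the box.

  # format and mount an attached EBS volume (device name varies by instance type)
  sudo mkfs -t ext4 /dev/xvdf
  sudo mkdir -p /data && sudo mount /dev/xvdf /data

  # point the app at /data/app.db, then continuously replicate it to S3
  litestream replicate /data/app.db s3://my-backup-bucket/app.db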


For anything meant to be self-hosted it's perfect (except for things that should have been text; text can be version controlled).


I think that iOS Core Data is just SQLite under the hood, so I imagine a lot of iOS apps use it as the main data store.


I am running www.bgtrain.com on sqlite.


I'm considering this right now!


Not exactly, but I did do an iOS app that used Core Data, which has SQLite as its primary datastore. Tricky, because running ad-hoc SQL queries on the database in a simulator or on a device was not straightforward...
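
(If it helps anyone: on the simulator you can usually reach the store from the host with the sqlite3 CLI, something like the sketch below. The bundle id and store name are placeholders.)

  # print the app's data container on the booted simulator, then open the store
  DIR=$(xcrun simctl get_app_container booted com.example.myapp data)
  sqlite3 "$DIR/Library/Application Support/MyModel.sqlite" ".tables"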


Yes, it's great when you want to get started quickly on a single machine.


I used to use SQLite for my website, but it was no good for concurrent requests.


I've used SQLite at my previous job with no problems whatsoever for 5 years.


Yes, used SQLite and failed miserably at multi-process access to the same db file. No DBMS on top of it, so that was to be expected. Moved to Postgres.


> ...I periodically hear about projects that use/have used sqlite as their sole datastore.

SQLite==exclusive access, no sharing, unless read-only.

Basically, it provides a SQL convenience for local usage.


If you need to safely have multiple processes read and write to the same data, it does that great. The writes are serialized, but that's typically how an in-memory shared resource would be implemented as well.
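
For concreteness, a minimal sketch of that setup with the sqlite3 CLI (the database path, table, and timeout are placeholders): WAL mode keeps readers and the writer from blocking each other, and busy_timeout makes contending writers wait for the lock instead of erroring out immediately.

  # one-time: switch the database to WAL so readers and the writer don't block each other
  sqlite3 app.db "PRAGMA journal_mode=WAL;"

  # per connection: wait up to 5s for the write lock instead of
  # failing straight away with SQLITE_BUSY
  sqlite3 app.db "PRAGMA busy_timeout=5000; INSERT INTO events(msg) VALUES ('hello');"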

Do you mean shared across networks?


I have seen SQLite disconnect random sessions of multiple writers, rather than blocking them. Setting WAL mode made no difference.

This database can only be used safely with a single writer at any given time.

To implement the serialization that SQLite does not, do this in the POSIX shell:

  until mkdir ~/.dbapplock 2> /dev/null
  do sleep 3
  done

  trap "rmdir ~/.dbapplock" EXIT
  sqlite3_app ...
The mkdir() system call is defined as atomic by POSIX, so the shell can safely serialize for you, assuming there are never more than a handful of contending processes.


My understanding is that SQLite supports only system locks. So multiple writers will need to either block at the system level or implement some other form of locking to ensure integrity.

A great deal of the complexity of a DBMS is in the granularity of locks, their escalation/de-escalation, and shared-use performance.

I wonder if one day SQLite will support a synchronization/replication protocol. In a way, Fossil SCM is an attempt at SQLite replication, albeit a specialized one.


> A great deal of the complexity of a DBMS is in the granularity of locks, their escalation/de-escalation, and shared-use performance.

Agreed.

> My understanding is that SQLite supports only system locks

Is this any different from a process blocking on a mutex while another writes to a shared resource? I understand a DBMS does it better, but I don't see why it's viewed as a non-starter for SQLite.


Anyone using dqlite in production? (not IoT)


Grafana uses SQLite as its default database


How do you manage permissions with SQLite?


I imagine, since SQLite does not connect over the network but is just part of the program, the permission model is "if the process is allowed to read the file, it can do things".

That would probably boil down to "code execution on the machine means access to the DB". And that sounds pretty reasonable to me.
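
In practice that means ordinary Unix file permissions; a rough sketch (the user, group, and paths are placeholders):

  # the whole "grant" model is just file permissions on the db directory and file
  sudo chown -R app:app /srv/myapp
  sudo chmod 750 /srv/myapp          # dir must stay writable: SQLite creates -wal/-shm files here
  sudo chmod 640 /srv/myapp/data.db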


Isn't Bloomberg's internal database system built on top of SQLite?


Never really used it, but Django seems to use SQLite by default. If you use Python it's worth checking out.



