More

quinthar · 2025-01-10T01:52:49 1736473969

I had some free time this holiday season and wrote up a blog post about how we use NUMA balancing on our 384-CPU, 6TB monster servers to keep the site humming reliably (and some of the challenges we've had to overcome). It's pretty low level, but I found there wasn't a lot of good info on how to optimize for high core-count servers, so I thought I'd share some experience. Enjoy!

quinthar · on Aug 15, 2018

we have been using this page locking technology for quite a while now and it works incredibly well. Having both read and write concurrency is super powerful and scales fantastically.

quinthar · on Aug 15, 2018

It's scales so perfectly we are deploying it across 3 datacenters, each with four 384 core machines with 3 terabytes of RAM. I can't speak highly enough about sqlite and the team behind it.

da_chicken · on Aug 15, 2018

I don't understand why you'd use SQLite for a deployment this heavy over PostgreSQL or MySQL. Yes, SQLite is fantastic, but why would you choose to do it this way?

andrewguenther · on Aug 15, 2018

Because SQLite is often the right tool for the job, more so than Postgres and MySQL. It scales insanely well and requires minimal configuration and near zero administration. I try to use SQLite for as long as is feasible in my projects purely because it "just works".

Don't let the "Lite" fool you. Depending on your needs, you can scale to 10s of thousand of users using just SQLite. Or not! It is all about knowing your system and properly evaluating your options. I choose to use SQLite because it often fits the use case of small to medium projects the best.

chubot · on Aug 15, 2018

Honest question: where do Postgres and MySQL fall down for such deployments? What is the administration required? Backups? Users and permissions? Indices?

FWIW I use sqlite for a personal wiki, i.e. a website with exactly one user :)

codefined · on Aug 16, 2018

I think the idea is that if your service does more, it necessarily needs more work to maintain. From a simple point of view, there are just more moving parts to go wrong.

eismcc · on Aug 16, 2018

The lite doesn’t refer to light, it refers to a “mineral”.

https://changelog.com/podcast/201

oblio · on Aug 16, 2018

I was confused by this, so I read the transcript. In case anyone else reads this, the name is supposed to be read as:

SQL (S Q L or sequel, your call) - ite

So it does sound like the name of a mineral: bauxite, boehmite, hematite, etc. Heck, kryptonite :D

TeMPOraL · on Aug 16, 2018

> Jerod Santo:

> Hm, like a mineral. Were you playing on the word "light", or were you just playing on mineral...?

> Richard Hipp:

> I was, I was.

It's pronounced as a mineral. It's still a pun on "light". And an obvious one, given what SQLite is compared to other RDBMSes.

api · on Aug 16, 2018

If someone put a good fast raft replication engine in process in front of sqlite you'd have a pretty killer db for a lot of things.

captn3m0 · on Aug 16, 2018

It exists: https://github.com/CanonicalLtd/dqlite

oconnor663 · on Aug 16, 2018

Today is your lucky day: https://github.com/rqlite/rqlite

gcbw2 · on Aug 15, 2018

I think it is a tradeoff. If you want more data resilience on disk, SQLite will require all sorts of hacks to (and more difficult, when) to dump to disk.

On the other hand, if you need every ounce of performance and do not care so much about data resiliency on disk, then the other sql applications will be too much hassle to configure in a way you do not miss any of the many safeguards to save data they ship with by default, which will kill your performance when you least expect.

emn13 · on Aug 15, 2018

It's faster? Simpler?

Conversely, why would you choose one of those alternatives?

(I don't actually think there aren't any reasons to do so, mind you, but I can well imagine that if you really KISS, well - even at impressive sizes sqlite makes sense).

vesak · on Aug 16, 2018

SQLite's typing is rather weak, is it not? That might be a tad scary for someone used who relies on that part of PostgreSQL.

kjax · on Aug 16, 2018

It sure is. After recently porting a large application from SQLite to Postgres, much of my time was spent casting types and ensuring code was updated to input the correct types into Postgres. All in all, the changes were for the better, as many bugs were discovered and resolved in the process.

da_chicken · on Aug 16, 2018

Well, the historic one is, "because you need multiple concurrent data writers."

But, that's what this branch is supposed to be for. So now the reason is: "because you need multiple concurrent data writers from multiple processes." And it might also be "you need multiple concurrent data writers and care about your data enough to not risk loss in the event of a system crash or power failure" since it uses PRAGMA synchronous=OFF.

le-mark · on Aug 15, 2018

You are using this particular branch in production, deployed as you mention, or sqlite in general? Some more info about this use case would be very interesting if you can share!

gregmac · on Aug 15, 2018

What sort of volume is this handling? Is the dataset in memory, or if not, what size and type of I/O is backing this? It seems like a ton of CPU, whereas in my experience typically I/O is the primary bottleneck for database loads.

basilgohar · on Aug 15, 2018

Note the second comment in the following thread:

https://news.ycombinator.com/item?id=12739771

quinthar posted in quite a lot of detail there as well, so it may provide some context.

charleslmunger · on Aug 15, 2018

The link says that it requires turning synchroous off, which means that you won't be waiting for real I/O on transactions, since no fsync calls are emitted. Add that to a huge amount of RAM for cache, and it's very reasonable to be CPU bound... So long as you don't mind data loss or corruption on power loss or kernel panic...

latch · on Aug 16, 2018

Doesn't a BBU Raid controllers largely address the power loss concern? And, I always thought BBU Raid controllers were just absolute common sense for any serious database (until everyone went cloud and suddenly basics like dual network card, dual PSU and raid controllers didn't fit Amazon's desire to sell complexity).

nh2 · on Aug 16, 2018

Well, data will be lost if anything below the OS buffer cache crashes (source http://sqlite.1065341.n5.nabble.com/How-dangerous-is-PRAGMA-...).

The BBU protects from power loss of the HDs, but not power loss or general failure of the mainboard or any other important component.

PAlso BBUs can run out of battery so a flash backed BBU is generally recommended.

singron · on Aug 16, 2018

Without fsync, the dirty page can sit in RAM and not even be sent to the disk, so a battery backup for the disk wouldn't solve data loss.

latch · on Aug 16, 2018

Right..but you leave fsync on in this case, no? This might be incompatible with the locking required by this server-process edition though (I guess, no clue). But more generally speaking, fsync=on with a proper raid controller gives massive performance boost (like orders of magnitude) while being relatively resilient to power loss.

nutjob2 · on Aug 15, 2018

"in my experience typically I/O is the primary bottleneck for database loads"

Not when you have potentially neatly 3TB of memory cache.

Of course it all depends on the dataset.

viburnum · on Aug 16, 2018

So is there one SQLite process running on each of the 384 cores?

est · on Aug 16, 2018

omg is this Expensify?

https://news.ycombinator.com/item?id=16118776

quinthar · on Aug 15, 2018

Expensify is powered entirely using sqlite (inside bedrockdb.com) and it is freaking amazing.

sciurus · on Aug 15, 2018

This was discussed a couple years ago at https://news.ycombinator.com/item?id=12739771

ris · on Aug 15, 2018

Good thing you don't need 64bit integers.

Seriously, as impressive as SQLite is for what it is, people shouldn't be given the impression that it doesn't come with some massive and often surprising caveats when viewed as a general purpose database.

ris · on Aug 16, 2018

Looks like I misremembered this (re 64bit integers) and it was one of the other slightly odd limitations SQLite has around rowids and foreign keys that caused me to abandon the last system I was designing with it...

As for what most concerned me most when using SQLite was the ease of which it would happily allow me to make broken queries and not raise a fuss, e.g. referencing non-GROUP BYed terms in an aggregation.

ak39 · on Aug 16, 2018

Hi ris,

Can you provide example(s) of the broken query with non-GROUP BY columns in an aggregation? (Or a reference link if this is well known - I've searched briefly but can't find anything.)

lmz · on Aug 16, 2018

Not the parent poster but it's queries like these:

Select user.id, user.name, sum(tx.amount) from user inner join txn on (user.id = txn.uid) group by user.id

User.name is not in the group by list so it is picked arbitrarily from the rows in the group. It's harmless here but not in all cases.

Groxx · on Aug 19, 2018

TBH I like the ability to do this. Sometimes you truly don't care which value you get, you just want one. Sometimes you have application-level constraints which mean the one it picks is guaranteed to be correct, but you can avoid the cost of enforcing that (at insertion time and at lookup time). It's a handy tool when it's appropriate.

But hell yes, tools should warn you about this loudly, as it's not generally correct.

yawaramin · on Aug 16, 2018

Did you file a bug report?

HumanDrivenDev · on Aug 15, 2018

SQLite has 64 bit integers. In fact row ids are 64 bit by default since version 3.

dagenix · on Aug 15, 2018

It has some quirks, but, I'd be interested to know what you thing are its "massive and surprising" issues.

Also, it has 64-bit integers - https://www.sqlite.org/datatype3.html

piyh · on Aug 15, 2018

Think you have a few misplaced double negatives.

quinthar · on Aug 4, 2018

At Expensify we have been astonished by how easy it was to set up -- as simple as ssh -- and its incredible performance relative to OpenVPN. It's night and day due to it's multi threaded design, meaning it doesn't have the same single cpu bottleneck as OpenVPN. It's clearly the future.

quinthar · on July 28, 2018

I think a big challenge (and perhaps the root problem) is weighing the relative importance of different relative changes. Car deaths are down 97%, but suicides are up 30%. On balance, are things getting better or worse? (I'd say that car deaths are by far more common so the reduction there is more significant, but suicides are probably more directly correlated to happiness that car accidents...)

I'm more persuaded by the optimistic perspective, personally, but these are great contrasting pieces to highlight that it's not a slam dunk.

dredmorbius · on July 28, 2018

The biggest reduction in car deaths per passenger mile occurred from 1915-1925, about a halving in a decade.

Since then, from 1925 - 2015, the rate has consistently halved every 20 years.

quinthar · on July 28, 2018

This is a much, much better rebuttal than the other one. Thanks!

quinthar · on July 28, 2018

Hmm how is this a good rebuttal? I haven't read the book, but the original article is based primarily on clear assertions of fact and statistics. A "good rebuttal" world presumably show that those stats are wrong, or at best misleading. But John's rebuttal is just a bunch of ponderous, abstract philosophy, with the most compelling part (and it's not very compelling) being:

"Much of its more than 500 pages consists of figures aiming to show the progress that has been made under the aegis of Enlightenment ideals. Of course, these figures settle nothing. [Uh, of course? Care to elaborate?] Like Pinker’s celebrated assertion that the world is becoming ever more peaceful – the statistical basis of which has been demolished by Nassim Nicholas Taleb [Oh snap! It was demolished by this dude I've never heard of, using reasoning that isn't provided? Dammnnnn, sick burn] everything depends on what is included in them and how they are interpreted. [Alright, I'm ready for you truth bomb]

Are the millions incarcerated in the vast American prison system and the millions more who live under parole included in the calculus that says human freedom is increasing? [Um, are they not? Or are you asking me to do the homework to see if your point has merit?] If we are to congratulate ourselves on being less cruel to animals, how much weight should be given to the uncounted numbers that suffer in factory farming and hideous medical experiments – neither of which were practised on any comparable scale in the past? [Good question... that you decline to answer]"

Basically: "X is misleading, because what about Y?" I don't know man, can you just tell me rather than leaving me guessing? It would be more compelling if he did his own homework rather than leaving his sucker punch as an exercise to the reader.

I think the original article is far more compelling than this so called rebuttal.

severine · on July 29, 2018

Background on Pinker/Taleb: https://www.quora.com/Is-Nassim-Talebs-criticism-of-Steven-P...

Latteland · on July 28, 2018

Yes, you concisely explain why that rebuttal was little more than "our side knows that can't be right".

quinthar · on May 11, 2018

Fyi, Expensify uses sqlite as it's primary database in a HA clustered way, see bedrockdb.com for more details.

quinthar · on May 7, 2018

I think salary notifications in general are a sign of the company compensating in a fundamentally unfair way. Unless you are being hired to negotiate, paying you based on your negotiation skills seems unfair. It's harder -- a lot harder -- but I'd encourage the CEOs reading this to create a compensation scheme that is "fair by design" and pays correctly without the need to negotiate. Here's what we do, though I imagine there are likely even better ways available: https://blog.expensify.com/2016/06/17/expensifys-comp-review...