Hacker Newsnew | past | comments | ask | show | jobs | submit | quinthar's commentslogin

I had some free time this holiday season and wrote up a blog post about how we use NUMA balancing on our 384-CPU, 6TB monster servers to keep the site humming reliably (and some of the challenges we've had to overcome). It's pretty low level, but I found there wasn't a lot of good info on how to optimize for high core-count servers, so I thought I'd share some experience. Enjoy!


we have been using this page locking technology for quite a while now and it works incredibly well. Having both read and write concurrency is super powerful and scales fantastically.


It's scales so perfectly we are deploying it across 3 datacenters, each with four 384 core machines with 3 terabytes of RAM. I can't speak highly enough about sqlite and the team behind it.


I don't understand why you'd use SQLite for a deployment this heavy over PostgreSQL or MySQL. Yes, SQLite is fantastic, but why would you choose to do it this way?


Because SQLite is often the right tool for the job, more so than Postgres and MySQL. It scales insanely well and requires minimal configuration and near zero administration. I try to use SQLite for as long as is feasible in my projects purely because it "just works".

Don't let the "Lite" fool you. Depending on your needs, you can scale to 10s of thousand of users using just SQLite. Or not! It is all about knowing your system and properly evaluating your options. I choose to use SQLite because it often fits the use case of small to medium projects the best.


Honest question: where do Postgres and MySQL fall down for such deployments? What is the administration required? Backups? Users and permissions? Indices?

FWIW I use sqlite for a personal wiki, i.e. a website with exactly one user :)


I think the idea is that if your service does more, it necessarily needs more work to maintain. From a simple point of view, there are just more moving parts to go wrong.


The lite doesn’t refer to light, it refers to a “mineral”.

https://changelog.com/podcast/201


I was confused by this, so I read the transcript. In case anyone else reads this, the name is supposed to be read as:

SQL (S Q L or sequel, your call) - ite

So it does sound like the name of a mineral: bauxite, boehmite, hematite, etc. Heck, kryptonite :D


> Jerod Santo:

> Hm, like a mineral. Were you playing on the word "light", or were you just playing on mineral...?

> Richard Hipp:

> I was, I was.

It's pronounced as a mineral. It's still a pun on "light". And an obvious one, given what SQLite is compared to other RDBMSes.


If someone put a good fast raft replication engine in process in front of sqlite you'd have a pretty killer db for a lot of things.



Today is your lucky day: https://github.com/rqlite/rqlite


I think it is a tradeoff. If you want more data resilience on disk, SQLite will require all sorts of hacks to (and more difficult, when) to dump to disk.

On the other hand, if you need every ounce of performance and do not care so much about data resiliency on disk, then the other sql applications will be too much hassle to configure in a way you do not miss any of the many safeguards to save data they ship with by default, which will kill your performance when you least expect.


It's faster? Simpler?

Conversely, why would you choose one of those alternatives?

(I don't actually think there aren't any reasons to do so, mind you, but I can well imagine that if you really KISS, well - even at impressive sizes sqlite makes sense).


SQLite's typing is rather weak, is it not? That might be a tad scary for someone used who relies on that part of PostgreSQL.


It sure is. After recently porting a large application from SQLite to Postgres, much of my time was spent casting types and ensuring code was updated to input the correct types into Postgres. All in all, the changes were for the better, as many bugs were discovered and resolved in the process.


Well, the historic one is, "because you need multiple concurrent data writers."

But, that's what this branch is supposed to be for. So now the reason is: "because you need multiple concurrent data writers from multiple processes." And it might also be "you need multiple concurrent data writers and care about your data enough to not risk loss in the event of a system crash or power failure" since it uses PRAGMA synchronous=OFF.


You are using this particular branch in production, deployed as you mention, or sqlite in general? Some more info about this use case would be very interesting if you can share!


What sort of volume is this handling? Is the dataset in memory, or if not, what size and type of I/O is backing this? It seems like a ton of CPU, whereas in my experience typically I/O is the primary bottleneck for database loads.


Note the second comment in the following thread:

https://news.ycombinator.com/item?id=12739771

quinthar posted in quite a lot of detail there as well, so it may provide some context.


The link says that it requires turning synchroous off, which means that you won't be waiting for real I/O on transactions, since no fsync calls are emitted. Add that to a huge amount of RAM for cache, and it's very reasonable to be CPU bound... So long as you don't mind data loss or corruption on power loss or kernel panic...


Doesn't a BBU Raid controllers largely address the power loss concern? And, I always thought BBU Raid controllers were just absolute common sense for any serious database (until everyone went cloud and suddenly basics like dual network card, dual PSU and raid controllers didn't fit Amazon's desire to sell complexity).


Well, data will be lost if anything below the OS buffer cache crashes (source http://sqlite.1065341.n5.nabble.com/How-dangerous-is-PRAGMA-...).

The BBU protects from power loss of the HDs, but not power loss or general failure of the mainboard or any other important component.

PAlso BBUs can run out of battery so a flash backed BBU is generally recommended.


Without fsync, the dirty page can sit in RAM and not even be sent to the disk, so a battery backup for the disk wouldn't solve data loss.


Right..but you leave fsync on in this case, no? This might be incompatible with the locking required by this server-process edition though (I guess, no clue). But more generally speaking, fsync=on with a proper raid controller gives massive performance boost (like orders of magnitude) while being relatively resilient to power loss.


"in my experience typically I/O is the primary bottleneck for database loads"

Not when you have potentially neatly 3TB of memory cache.

Of course it all depends on the dataset.


So is there one SQLite process running on each of the 384 cores?



Expensify is powered entirely using sqlite (inside bedrockdb.com) and it is freaking amazing.


This was discussed a couple years ago at https://news.ycombinator.com/item?id=12739771


Good thing you don't need 64bit integers.

Seriously, as impressive as SQLite is for what it is, people shouldn't be given the impression that it doesn't come with some massive and often surprising caveats when viewed as a general purpose database.


Looks like I misremembered this (re 64bit integers) and it was one of the other slightly odd limitations SQLite has around rowids and foreign keys that caused me to abandon the last system I was designing with it...

As for what most concerned me most when using SQLite was the ease of which it would happily allow me to make broken queries and not raise a fuss, e.g. referencing non-GROUP BYed terms in an aggregation.


Hi ris,

Can you provide example(s) of the broken query with non-GROUP BY columns in an aggregation? (Or a reference link if this is well known - I've searched briefly but can't find anything.)


Not the parent poster but it's queries like these:

Select user.id, user.name, sum(tx.amount) from user inner join txn on (user.id = txn.uid) group by user.id

User.name is not in the group by list so it is picked arbitrarily from the rows in the group. It's harmless here but not in all cases.


TBH I like the ability to do this. Sometimes you truly don't care which value you get, you just want one. Sometimes you have application-level constraints which mean the one it picks is guaranteed to be correct, but you can avoid the cost of enforcing that (at insertion time and at lookup time). It's a handy tool when it's appropriate.

But hell yes, tools should warn you about this loudly, as it's not generally correct.


Did you file a bug report?


SQLite has 64 bit integers. In fact row ids are 64 bit by default since version 3.


It has some quirks, but, I'd be interested to know what you thing are its "massive and surprising" issues.

Also, it has 64-bit integers - https://www.sqlite.org/datatype3.html


Think you have a few misplaced double negatives.


At Expensify we have been astonished by how easy it was to set up -- as simple as ssh -- and its incredible performance relative to OpenVPN. It's night and day due to it's multi threaded design, meaning it doesn't have the same single cpu bottleneck as OpenVPN. It's clearly the future.


I think a big challenge (and perhaps the root problem) is weighing the relative importance of different relative changes. Car deaths are down 97%, but suicides are up 30%. On balance, are things getting better or worse? (I'd say that car deaths are by far more common so the reduction there is more significant, but suicides are probably more directly correlated to happiness that car accidents...)

I'm more persuaded by the optimistic perspective, personally, but these are great contrasting pieces to highlight that it's not a slam dunk.


The biggest reduction in car deaths per passenger mile occurred from 1915-1925, about a halving in a decade.

Since then, from 1925 - 2015, the rate has consistently halved every 20 years.


This is a much, much better rebuttal than the other one. Thanks!


Hmm how is this a good rebuttal? I haven't read the book, but the original article is based primarily on clear assertions of fact and statistics. A "good rebuttal" world presumably show that those stats are wrong, or at best misleading. But John's rebuttal is just a bunch of ponderous, abstract philosophy, with the most compelling part (and it's not very compelling) being:

"Much of its more than 500 pages consists of figures aiming to show the progress that has been made under the aegis of Enlightenment ideals. Of course, these figures settle nothing. [Uh, of course? Care to elaborate?] Like Pinker’s celebrated assertion that the world is becoming ever more peaceful – the statistical basis of which has been demolished by Nassim Nicholas Taleb [Oh snap! It was demolished by this dude I've never heard of, using reasoning that isn't provided? Dammnnnn, sick burn] everything depends on what is included in them and how they are interpreted. [Alright, I'm ready for you truth bomb]

Are the millions incarcerated in the vast American prison system and the millions more who live under parole included in the calculus that says human freedom is increasing? [Um, are they not? Or are you asking me to do the homework to see if your point has merit?] If we are to congratulate ourselves on being less cruel to animals, how much weight should be given to the uncounted numbers that suffer in factory farming and hideous medical experiments – neither of which were practised on any comparable scale in the past? [Good question... that you decline to answer]"

Basically: "X is misleading, because what about Y?" I don't know man, can you just tell me rather than leaving me guessing? It would be more compelling if he did his own homework rather than leaving his sucker punch as an exercise to the reader.

I think the original article is far more compelling than this so called rebuttal.



Yes, you concisely explain why that rebuttal was little more than "our side knows that can't be right".


Fyi, Expensify uses sqlite as it's primary database in a HA clustered way, see bedrockdb.com for more details.


I think salary notifications in general are a sign of the company compensating in a fundamentally unfair way. Unless you are being hired to negotiate, paying you based on your negotiation skills seems unfair. It's harder -- a lot harder -- but I'd encourage the CEOs reading this to create a compensation scheme that is "fair by design" and pays correctly without the need to negotiate. Here's what we do, though I imagine there are likely even better ways available: https://blog.expensify.com/2016/06/17/expensifys-comp-review...


Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: