"Nowadays, many developers don’t care about time complexity … and they’re right!"
Massive facepalm.
A developer who doesn't care about complexity (both the structural complexity of the code and the execution-time complexity of the algorithms they use) is simply incompetent. No ifs and no buts about it.
> But when you deal with a large amount of data (I’m not talking about thousands) or if you’re fighting for milliseconds, it becomes critical to understand this concept. And guess what, databases have to deal with both situations!
I think they're just anticipating an audience that only ever deals with small fragments of data in frontend or "backend" services. The statement holds when you're never really dealing with more than tens of elements.
I fixed a performance bug today that was caused by too many queries across tens of rows of data. Hey, why keep it in memory when the database can get it for you in only 5ms? 80 times per call.
Admittedly, this was a bit of an outlier case, but get enough of those and you're spending real money on extra hardware to keep up.
The original quote is not about things generally taking a long time, but about time complexity i.e. differences between things like O(n), O(n log n) and O(n^2). For example, if the way you fixed that performance bug reduced the time by a fixed multiple then you didn't affect the time complexity, even if it's now much faster.
Maybe you already understood that but your bug description makes it sound like complexity wasn't necessarily the issue.
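To put the distinction in toy form: a constant-factor speedup changes how long each step takes, not how the total grows. A minimal sketch (a pure cost model, not real measurements):

```python
# Both cost models below are O(n^2): doubling n quadruples the cost
# for each of them. A 10x speedup only shifts the constant.
def cost_original(n):
    return n * n            # e.g., one round trip per pair

def cost_after_fix(n):
    return n * n / 10       # same algorithm, each step 10x cheaper

for n in (100, 200, 400):
    slow, fast = cost_original(n), cost_after_fix(n)
    print(f"n={n}: before={slow:.0f}, after={fast:.0f}, ratio={slow / fast:.1f}")
```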
My case was about time complexity from calling the DB 10x more often than necessary. Even though these things are all fast now, as an engineer you still need to know the techniques to avoid the slow bits.
Although my example was only O(n^2), the constant cost of each of those n steps being a round trip over the wire to the DB blew the runtime up to a significant number. So even an n-squared algorithm with small data sets can kill your app. We changed it to be O(n) over the wire to make it reasonable.
I'm actually dealing with the same thing. A request on my local machine against a tiny test dataset is taking 9 seconds to return ~30 rows (which is the entire table) in sorted order. The reason it's so slow is queries that overfetch, plus ORM code scattered throughout the backend, so stuff like "get a list of things from the DB, then loop over them and run more queries for each one" happens a lot, instead of a single query joining the data it needs. By cutting the columns down to only what was needed, moving some of those extra queries into joins or subqueries, and removing the unnecessary ones, I got the 9-second request down to about 400ms. And that's without even looking into indexing or other optimisations.
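To make that loop-of-queries pattern concrete, here's a minimal sketch in Python with sqlite3 (the authors/posts tables are hypothetical, not from my codebase):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE posts (id INTEGER PRIMARY KEY, author_id INTEGER, title TEXT);
    INSERT INTO authors VALUES (1, 'alice'), (2, 'bob');
    INSERT INTO posts VALUES (1, 1, 'first'), (2, 1, 'second'), (3, 2, 'third');
""")

# N+1 pattern: one query for the list, then one more query per row.
# Each iteration is another round trip; over a network this dominates.
for author_id, name in conn.execute("SELECT id, name FROM authors"):
    titles = conn.execute(
        "SELECT title FROM posts WHERE author_id = ?", (author_id,)
    ).fetchall()
    print(name, [t for (t,) in titles])

# Single round trip: the JOIN pushes the per-row work into the engine.
rows = conn.execute("""
    SELECT a.name, p.title
    FROM authors a JOIN posts p ON p.author_id = a.id
    ORDER BY a.name, p.id
""").fetchall()
print(rows)
```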
It's a common CS-graduate failing, actually: thinking that only asymptotic time complexity matters.
A competent programmer understands that with small chunks of data there are often much faster ways of doing things than the asymptotically good algorithms, and that these cases can be performance-critical if executed often.
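As a sketch of that point, a micro-benchmark comparing an O(n) list scan with an O(1) average hash lookup on a tiny collection; the crossover depends on the data and the runtime, so measure rather than assume:

```python
import timeit

small = list(range(8))
as_list = small
as_set = set(small)

# For tiny n the constant factors (hashing overhead vs. a short linear
# scan) dominate, so the asymptotically "worse" structure can compete.
for name, coll in [("list, O(n) scan", as_list), ("set, O(1) hash", as_set)]:
    t = timeit.timeit(lambda: 7 in coll, number=1_000_000)
    print(f"{name}: {t:.3f}s for 1M membership tests")
```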
The point, I believe, is that it's okay to think "I know that this code has quadratic complexity, and that is acceptable right now".
You might agree with that while still considering it not okay to think "I don't know what the complexity of this code is" or "I don't know what complexity is".
It might even be okay to think "I don't know what the complexity of this code is, and that is acceptable right now". You should be able to imagine some cases where that would be true.
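For instance, a deliberately quadratic but "known and acceptable" choice might look like this (the helper is just illustrative):

```python
# Order-preserving dedup via a linear scan inside a loop: O(n^2).
# For inputs known to stay tiny (a handful of CLI flags, say), this is
# the simplest correct code, and knowing it's quadratic is what makes
# the trade-off a decision rather than an accident.
def dedup_small(items):
    out = []
    for item in items:
        if item not in out:
            out.append(item)
    return out

print(dedup_small(["a", "b", "a", "c", "b"]))  # ['a', 'b', 'c']
```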
Recently, a book was published about implementing Git in Ruby as a way of learning Git internals. There was a long thread here about that, and about others porting the code to their language of choice.
It occurred to me that a book about building a database system in an easy-to-read language would be a great way to get hands-on with the various algorithms used by modern database technologies, and it might be more broadly useful too.
...thanks for reminding me of my advanced relational db exam I just had today!
But in all seriousness, the fundamentals of how relational DBs work and physically execute queries should be at least in the back of the mind of any engineer who writes SQL for a living. Great article!
Very good indeed.
Quite the explanation. The storage part would be an interesting thing to see: for instance in SQLite, what is the binary file created in the filesystem, and how is its data organized?
The SQLite documentation covers this quite well; see "Database File Format" [0] or have a look at the pager.c source code [1]. The SQLite database format is actually recommended by the Library of Congress for archiving [2].
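For a quick first look at that binary file, a small sketch that reads the 100-byte header described in the "Database File Format" doc; every database file starts with the 16-byte magic string "SQLite format 3\0" ("example.db" is a placeholder path):

```python
import struct

# Read the file header documented in SQLite's "Database File Format".
# "example.db" is a placeholder; point it at any SQLite database.
with open("example.db", "rb") as f:
    header = f.read(100)

magic = header[:16]                                 # b'SQLite format 3\x00'
(page_size,) = struct.unpack(">H", header[16:18])   # big-endian uint16
if page_size == 1:                                  # the value 1 encodes 65536
    page_size = 65536

print("magic:", magic)
print("page size:", page_size)
```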