"Nowadays, many developers don’t care about time complexity … and they’re right!"
Massive facepalm.
A developer who doesn't care about complexity (both the structural complexity of the code and the execution-time complexity of the algorithms they use) is simply incompetent. No ifs and no buts about it.
> But when you deal with a large amount of data (I’m not talking about thousands) or if you’re fighting for milliseconds, it becomes critical to understand this concept. And guess what, databases have to deal with both situations!
I think they're just anticipating an audience that only ever deals with small fragments of data in frontend or "backend" services. The statement holds when you're never really dealing with more than tens of elements.
I fixed a performance bug today that was caused by too many queries across tens of rows of data. Hey, why keep it in memory when the database can get it for you in only 5ms? 80 times per call.
Admittedly, this was a bit of an outlier case, but get enough of those and you're spending real money on extra hardware to keep up.
The original quote is not about things generally taking a long time, but about time complexity i.e. differences between things like O(n), O(n log n) and O(n^2). For example, if the way you fixed that performance bug reduced the time by a fixed multiple then you didn't affect the time complexity, even if it's now much faster.
Maybe you already understood that but your bug description makes it sound like complexity wasn't necessarily the issue.
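To put the distinction in toy form: a constant-factor speedup changes how long each step takes, not how the total grows. A minimal sketch (a pure cost model, not real measurements):

```python
# Both cost models below are O(n^2): doubling n quadruples the cost
# for each of them. A 10x speedup only shifts the constant.
def cost_original(n):
    return n * n            # e.g., one round trip per pair

def cost_after_fix(n):
    return n * n / 10       # same algorithm, each step 10x cheaper

for n in (100, 200, 400):
    slow, fast = cost_original(n), cost_after_fix(n)
    print(f"n={n}: before={slow:.0f}, after={fast:.0f}, ratio={slow / fast:.1f}")
```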
My case was about time complexity from calling the DB 10x more often than necessary. Even though these things are all fast now, as an engineer you still need to know the techniques to avoid the slow bits.
Although my example was only O(n^2), the constant cost of each of those n steps being a round trip over the wire to the DB blew the runtime up to a significant number. So even an n-squared algorithm with small data sets can kill your app. We changed it to be O(n) over the wire to make it reasonable.
I'm actually dealing with the same thing. A request on my local machine against a tiny test dataset is taking 9 seconds to return ~30 rows (which is the entire table) in sorted order. The reason it's so slow is queries that overfetch, plus ORM code scattered throughout the backend, so stuff like "get a list of things from the DB, then loop over them and run more queries for each one" happens a lot, instead of a single query joining the data it needs. By cutting the columns down to only what was needed, moving some of those extra queries into joins or subqueries, and removing the unnecessary ones, I got the 9-second request down to about 400ms. And that's without even looking into indexing or other optimisations.
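To make that loop-of-queries pattern concrete, here's a minimal sketch in Python with sqlite3 (the authors/posts tables are hypothetical, not from my codebase):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE posts (id INTEGER PRIMARY KEY, author_id INTEGER, title TEXT);
    INSERT INTO authors VALUES (1, 'alice'), (2, 'bob');
    INSERT INTO posts VALUES (1, 1, 'first'), (2, 1, 'second'), (3, 2, 'third');
""")

# N+1 pattern: one query for the list, then one more query per row.
# Each iteration is another round trip; over a network this dominates.
for author_id, name in conn.execute("SELECT id, name FROM authors"):
    titles = conn.execute(
        "SELECT title FROM posts WHERE author_id = ?", (author_id,)
    ).fetchall()
    print(name, [t for (t,) in titles])

# Single round trip: the JOIN pushes the per-row work into the engine.
rows = conn.execute("""
    SELECT a.name, p.title
    FROM authors a JOIN posts p ON p.author_id = a.id
    ORDER BY a.name, p.id
""").fetchall()
print(rows)
```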
It's a common CS-graduate failing, actually: thinking that only asymptotic time complexity matters.
A competent programmer understands that with small chunks of data there are often much faster ways of doing things than the asymptotically good algorithms, and that these cases can be performance-critical if executed often.
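As a sketch of that point, a micro-benchmark comparing an O(n) list scan with an O(1) average hash lookup on a tiny collection; the crossover depends on the data and the runtime, so measure rather than assume:

```python
import timeit

small = list(range(8))
as_list = small
as_set = set(small)

# For tiny n the constant factors (hashing overhead vs. a short linear
# scan) dominate, so the asymptotically "worse" structure can compete.
for name, coll in [("list, O(n) scan", as_list), ("set, O(1) hash", as_set)]:
    t = timeit.timeit(lambda: 7 in coll, number=1_000_000)
    print(f"{name}: {t:.3f}s for 1M membership tests")
```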
The point, I believe, is that it's okay to think "I know that this code has quadratic complexity, and that is acceptable right now".
You might agree with that while still considering it not okay to think "I don't know what the complexity of this code is" or "I don't know what complexity is".
It might even be okay to think "I don't know what the complexity of this code is, and that is acceptable right now". You should be able to imagine some cases where that would be true.
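For instance, a deliberately quadratic but "known and acceptable" choice might look like this (the helper is just illustrative):

```python
# Order-preserving dedup via a linear scan inside a loop: O(n^2).
# For inputs known to stay tiny (a handful of CLI flags, say), this is
# the simplest correct code, and knowing it's quadratic is what makes
# the trade-off a decision rather than an accident.
def dedup_small(items):
    out = []
    for item in items:
        if item not in out:
            out.append(item)
    return out

print(dedup_small(["a", "b", "a", "c", "b"]))  # ['a', 'b', 'c']
```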
Recently, a book was published about implementing Git in Ruby as a way of learning Git internals. There was a long thread here about that, and about others porting the code to their language of choice.
It occurred to me that a book about building a database system in an easy-to-read language would be a great way to get hands-on with the various algorithms used by modern database technologies, and it might be more broadly useful too.
...thanks for reminding me of my advanced relational db exam I just had today!
But in all seriousness, the fundamentals of how relational DBs work and physically execute queries should be at least in the back of the mind of any engineer who writes SQL for a living. Great article!
Very good indeed.
Quite the explanation. The storage part would be an interesting thing to see: for instance in SQLite, what is the binary file created in the filesystem, and how is its data organized?
The SQLite documentation covers this quite well; see "Database File Format" [0] or have a look at the pager.c source code [1]. The SQLite database format is actually recommended by the Library of Congress for archiving [2].
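For a quick first look at that binary file, a small sketch that reads the 100-byte header described in the "Database File Format" doc; every database file starts with the 16-byte magic string "SQLite format 3\0" ("example.db" is a placeholder path):

```python
import struct

# Read the file header documented in SQLite's "Database File Format".
# "example.db" is a placeholder; point it at any SQLite database.
with open("example.db", "rb") as f:
    header = f.read(100)

magic = header[:16]                                 # b'SQLite format 3\x00'
(page_size,) = struct.unpack(">H", header[16:18])   # big-endian uint16
if page_size == 1:                                  # the value 1 encodes 65536
    page_size = 65536

print("magic:", magic)
print("page size:", page_size)
```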