> But when you deal with a large amount of data (I’m not talking about thousands) or if you’re fighting for milliseconds, it becomes critical to understand this concept. And guess what, databases have to deal with both situations!
I think they're just anticipating an audience who only deal with fragments of data in frontend or "backend" services. That statement holds when you're never really dealing with more than tens of elements.
I fixed a performance bug today that was caused by too many queries across tens of rows of data. Hey, why keep it in memory when the database can get it for you in only 5ms... 80 times per call!
Admittedly, this was a bit of an outlier case, but get enough of those and you're spending real money on extra hardware to keep up.
The original quote is not about things generally taking a long time, but about time complexity i.e. differences between things like O(n), O(n log n) and O(n^2). For example, if the way you fixed that performance bug reduced the time by a fixed multiple then you didn't affect the time complexity, even if it's now much faster.
Maybe you already understood that but your bug description makes it sound like complexity wasn't necessarily the issue.
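As a quick illustration (hypothetical functions, not from the thread), this is what a change in complexity class looks like, as opposed to a fixed-multiple speedup: doubling the input roughly quadruples the quadratic version but only doubles the linear one.

```python
import timeit

def find_duplicates_quadratic(items):
    # O(n^2): scans the rest of the list for every element.
    dupes = []
    for i, a in enumerate(items):
        if a in items[i + 1:]:
            dupes.append(a)
    return dupes

def find_duplicates_linear(items):
    # O(n): one pass with a set -- a change in complexity class,
    # not just a constant-factor speedup.
    seen, dupes = set(), []
    for a in items:
        if a in seen:
            dupes.append(a)
        else:
            seen.add(a)
    return dupes

for n in (1_000, 2_000):
    data = list(range(n)) + [0]  # one duplicate
    quad = timeit.timeit(lambda: find_duplicates_quadratic(data), number=3)
    lin = timeit.timeit(lambda: find_duplicates_linear(data), number=3)
    print(f"n={n}: quadratic {quad:.3f}s, linear {lin:.4f}s")
```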
My case was about time complexity from calling the DB 10x more often than necessary. Even though these things are all fast now, as an engineer you still need to know the techniques to avoid the slow bits.
Although my example was only O(n^2), the constant factor of each of those n calls going over the wire to the DB blew the runtime up to a significant number. So even an n-squared algorithm on a small data set can kill your app. We changed it to be O(n) over the wire to make it reasonable.
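A minimal sketch of that kind of fix (made-up `prices` table, in-memory SQLite standing in for a real database): the per-item loop pays the round-trip latency n times, while a single batched IN query pays it once.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE prices (item_id INTEGER PRIMARY KEY, price REAL)")
conn.executemany("INSERT INTO prices VALUES (?, ?)",
                 [(i, i * 1.5) for i in range(100)])

item_ids = list(range(100))

# Before: one query per item -- n round trips, each paying the fixed latency cost.
prices_slow = {}
for item_id in item_ids:
    row = conn.execute("SELECT price FROM prices WHERE item_id = ?",
                       (item_id,)).fetchone()
    prices_slow[item_id] = row[0]

# After: a single query with an IN list -- one round trip for the whole batch.
placeholders = ",".join("?" * len(item_ids))
rows = conn.execute(
    f"SELECT item_id, price FROM prices WHERE item_id IN ({placeholders})",
    item_ids,
).fetchall()
prices_fast = dict(rows)

assert prices_slow == prices_fast
```

Against a local in-memory table both versions are instant; the difference only shows up when each round trip costs a few milliseconds of network latency, which is exactly the constant factor being described.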
I'm actually dealing with the same thing. A request on my local machine against a tiny test dataset is taking 9 seconds to return ~30 rows (which is the entire table) in sorted order. The reason it's so slow is queries that overfetch and ORM code scattered throughout the backend, so patterns like "get a list of things from the DB, then loop over them and run more queries for each one" happen a lot, instead of a single query joining the data it needs.

By cutting the columns down to only what was needed, moving some of these extra queries into joins or subqueries, and removing the ones that were unnecessary, I got the 9 second request to complete in about 400ms. And that's without even looking into indexing or other optimisations.
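A minimal sketch of that refactor (made-up `orders`/`customers` schema, in-memory SQLite): the first version is the "loop over things and query for each one" pattern, the second is a single JOIN that selects only the columns the report needs and sorts in SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, bio TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL);
""")
conn.executemany("INSERT INTO customers VALUES (?, ?, ?)",
                 [(i, f"customer {i}", "a large column the report never uses")
                  for i in range(30)])
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                 [(i, i % 30, i * 10.0) for i in range(90)])

# N+1 pattern: fetch the orders, then run one extra query per order for its customer.
report_slow = []
for order_id, customer_id, total in conn.execute(
        "SELECT id, customer_id, total FROM orders"):
    name = conn.execute("SELECT name FROM customers WHERE id = ?",
                        (customer_id,)).fetchone()[0]
    report_slow.append((order_id, name, total))

# Single query: one JOIN, only the needed columns, sorted in SQL.
report_fast = conn.execute("""
    SELECT o.id, c.name, o.total
    FROM orders o JOIN customers c ON c.id = o.customer_id
    ORDER BY o.id
""").fetchall()

assert sorted(report_slow) == report_fast
```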
It's a common CS-graduate failing, actually: thinking that only asymptotic time complexity matters.
A competent programmer understands that for small chunks of data there are often much faster approaches than asymptotically-good algorithms, and that those cases can be performance-critical if they're executed often.
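A small sketch of that idea (the cutoff value is illustrative, not tuned): production sorts such as CPython's Timsort and C++'s introsort do exactly this, handing small slices to insertion sort, whose O(n^2) worst case is outweighed by its tiny constant factor on short inputs.

```python
SMALL_CUTOFF = 16  # illustrative threshold; real libraries tune this empirically

def insertion_sort(xs):
    # O(n^2) worst case, but very little overhead per element: wins on tiny inputs.
    for i in range(1, len(xs)):
        key, j = xs[i], i - 1
        while j >= 0 and xs[j] > key:
            xs[j + 1] = xs[j]
            j -= 1
        xs[j + 1] = key
    return xs

def hybrid_merge_sort(xs):
    # O(n log n) overall, but hands small slices to the "worse" algorithm.
    if len(xs) <= SMALL_CUTOFF:
        return insertion_sort(list(xs))
    mid = len(xs) // 2
    left, right = hybrid_merge_sort(xs[:mid]), hybrid_merge_sort(xs[mid:])
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    return merged + left[i:] + right[j:]

print(hybrid_merge_sort([5, 2, 9, 1, 5, 6, 3, 8, 7, 0, 4]))
```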