>I wonder how much of this kind of stuff exists out of necessity and how much of it exists because very smart people are just bored and/or unsatisfied.
That's a ton of it. Like it or not, publishing a digital newspaper is not a hard or unsolved problem; it's one of the web's core competencies. If you hire people who want to build cool stuff to supervise a CMS, well, you get this kind of outcome.
The raw cost is understated because the people building these experimental setups misunderstand the semantics of the new architectures/formats they're adopting. It doesn't truly rear its ugly head until there is a major data loss or corruption event. It's not that these never happen with an RDBMS; it's just that an RDBMS contemplates this possibility and tries to make it pretty hard to do, whereas message queues just automatically delete stuff (by design, so they can serve as functional message queues!).
RDBMSs have spoiled us, and we take their feature set, 40+ years in the making, for granted. We need to be careful not to assume that `GROUP BY` is the only thing we leave on the table when we "adopt" (more accurately, abuse) one of these new-wave solutions as a system of record.
Since no one is going to admit to their boss "this wouldn't have happened if we used Postgres", and since most bosses are not going to know what that means, most of these spectacular failures will never be accurately attributed to their true cause: developers putting their interest in trying new things above their duty to ensure their employer's systems are reliable, stable, and resilient.
There are non-negligible problems in the news space like:
1. Supporting full-text search for a fair number of concurrent users
2. Availability of the system with minimal downtime
3. Scalability within the day and across the year; traffic around, e.g., a breaking news event will far surpass 2 AM traffic
4. Notifications
I could go on and on, but honestly, it's just a tone-deaf response.
Parting pot-shot: "No one is going to admit to their boss that the reason a worldwide news organization can't publish any stories is because their one postgres master node went down, or is waiting on a state transfer to a fallback master"
"We can't publish right now because the database had to enter an unplanned maintenance period" is a lot different from "our authoritative archive is gone and we have to try to rebuild it from all these separate 'materialized views', woops."
You may not have to worry about somebody setting a "delete all data older and/or bigger than Y or Z" retention policy, but you do have to worry about someone running `DELETE FROM table` without a WHERE clause. Which is easier to prevent: the one that can be done through the same mechanism as non-destructive queries, or the one that can only be modified through a file-system configuration, completely separate from its API?
Regardless, it's a different paradigm with different "don't do that" behaviors that you need to know about.
In Kafka, if you want the persistent, append-only, write-ahead log to not delete stuff, then configure the retention period to keep things forever.
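Concretely, retention is a per-topic setting (with broker-wide defaults under `log.retention.*`); a minimal sketch of topic-level config, assuming you really do want to keep everything:

```properties
# Topic-level overrides; -1 disables time- and size-based deletion,
# so log segments are retained indefinitely
retention.ms=-1
retention.bytes=-1
```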
If someone runs `DELETE FROM table` without a WHERE clause, I expect:
a) the query to be tested and scripted on non-production environments first, making this a non-issue;
b) the user not to have DELETE permissions on that table and/or on the rows not intended to be deleted;
c) referential integrity to kick in and prevent the deletion of interdependent records (which is most records in a database);
d) CHECK constraints, triggers, and other validation routines to prevent this clearly-excessive operation;
e) the person executing these queries to inspect the results within an ad-hoc transaction and roll it back before committing;
f) in the event this does occur and commit (which itself means there's a big problem with your procedure), streaming binlog archives to facilitate a point-in-time restore, audit tables to be available to rebuild the data, etc.; these aren't typically included by default (streaming point-in-time backups are on AWS Aurora), but they're conventional for many professionally-run RDBMS installations.
and I'm sure there are failsafes that I'm forgetting, and since I'm not a DBA, some I'm probably not even aware of.
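To make (b) and (e) concrete, a minimal sketch in PostgreSQL syntax; the `articles` table and `app_rw` role here are hypothetical:

```sql
-- b) the application role never gets blanket DELETE in the first place
REVOKE DELETE ON articles FROM app_rw;

-- e) ad-hoc deletes run inside an explicit transaction and get inspected first
BEGIN;
DELETE FROM articles WHERE published_at < '2010-01-01';
SELECT count(*) FROM articles;  -- sanity-check what's left
ROLLBACK;                       -- or COMMIT once you're satisfied
```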
How many of these can I expect to help me out when a packaging bug (or simply a mistaken "y" on the prompt asking if I want to override the package config) clobbers the Kafka config file?
I'm not quite sure what you're insinuating about YC here but I suppose a standard "no, we don't do anything like that" is in order.
We sometimes rate-limit HN accounts when they have a habit of posting low-quality comments too quickly or (especially) getting involved in flamewars. Since we've discussed this more than once before, I assume you remember it, but other people might not know. That's all that's happening here and of course it has nothing to do with your opinions, about Kafka or YC or anything else.
"Flame wars" in the sense of actually responding to the people who are trying to have a discussion on the technical points? Suggesting that my posts involve flaming virtually ever is a flagrant mischaracterization, and we both know it.
This thread contains about the "worst" you'll find from me: "No offense, but this reveals your ignorance...". That's a flame only insofar as any disagreement at all counts as a "flame".
When the account was rate-limited, the complaint was a) that HN had received out-of-band complaints from people who were getting mad that I discouraged others from deploying database clusters on Kubernetes; and b) that my posts on the subject were "trite". Maybe I don't waste enough time on HN, but I'd never seen that before.
I concede there was one post in that thread that could be interpreted as borderline incendiary: suggesting that the only reason to run a database on top of k8s is to win "GCool Points" (a position I continue to maintain). But it was directed at no one in particular, was intended to provide some levity for non-techno-hipsters, and was meant to head off the routine noise we see every time I make that type of post from people whose only counter-point is "Google does it!". Hardly a habit of engaging in flame wars. And it was used by HN/YC as an excuse to "detach" the entire thread, in which a real technical debate was occurring, as is occurring here, instead of just the single borderline post.
I understand that you feel the need to go on the record with a denial. I hope you can appreciate that I feel that need too, whether the accusation is implied through the rate limit that prevents me from replying and may mislead others into believing that my position is indefensible, or explicit, as it is now.
Log compaction is generally what you want; it preserves the most recent message for every key in a topic. Boundlessly expanding event streams are something very few will ever need or want, so you'll toss messages into an "event" topic of some kind, apply each event to the most recent entry in the "model" topic, and store a new version (which will be kept after log compaction; the original messages can be pruned when you need to free up storage).
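Compaction is likewise a per-topic setting; a minimal sketch of topic-level config (values illustrative):

```properties
# Keep only the latest record per key instead of deleting purely by age
cleanup.policy=compact

# Or compact the "model" topic while still expiring old "event" history:
# cleanup.policy=compact,delete
# retention.ms=604800000   # with the delete policy, records older than a week become eligible for deletion
```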