It's honestly not that hard to build a WAL-style solution exactly the way you want it for your own application. You just have to get away from the "you shouldn't write your own xyz" boogeyman long enough to figure it out.
Do you know how to model your events using a type system?
Can you be bothered to implement a Serialize and Deserialize method for each event type, or simply use a JSON serializer?
Do you know how to make your favorite language compress and uncompress things to disk on a streaming basis?
Do you know how to seek file streams and/or manage a collection of them all at once?
Can you manage small caches of things in memory using Dictionary<TK,TV> and friends?
Are you interested in the exotic performance possibilities that open up to those who leverage the ring buffer and serialized busy wait consumer?
If you responded yes to most or all of the above, you are now officially granted permission to implement your own WAL/event source/streaming magic unicorn solutions.
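To make the first few questions concrete, here is a minimal sketch in Python (picked only for brevity; nothing above is tied to a language). It assumes events are plain dataclasses, that one JSON document per line is an acceptable wire format, and that appending a new gzip member per batch is fine. The event classes, `EVENT_TYPES`, `append_events` and `read_events` are all made-up names for illustration.

```python
import gzip
import json
from dataclasses import asdict, dataclass


@dataclass
class Deposited:
    account: str
    amount: int


@dataclass
class Withdrawn:
    account: str
    amount: int


# Hypothetical registry mapping a type tag to its event class.
EVENT_TYPES = {cls.__name__: cls for cls in (Deposited, Withdrawn)}


def append_events(path, events):
    """Append a batch of events, one JSON document per line."""
    # Opening in "ab" mode adds a new gzip member to the end of the file;
    # readers see all members concatenated as a single stream.
    with gzip.open(path, "ab") as log:
        for event in events:
            record = {"type": type(event).__name__, "data": asdict(event)}
            log.write(json.dumps(record).encode("utf-8") + b"\n")


def read_events(path):
    """Stream events back in the order they were written."""
    with gzip.open(path, "rb") as log:
        for line in log:
            record = json.loads(line)
            yield EVENT_TYPES[record["type"]](**record["data"])
```

This does not fsync, so it says nothing about durability, and it does not handle a batch that was cut off by a crash. It is only meant to show how little code the "model, serialize, compress" part takes.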
Seriously, this stuff is really easy to play with. If you follow the rules, you would actually have a really hard time fucking it up. Even if you do, no one gets hurt. The hard part happens after you get the events to disk. Recovery, snapshots, garbage collection - that's where the pain kicks in. But, none of these areas is impossible. Recovery/Snapshots can again be handily defeated by the mighty JSON serializer if one is lazy enough to succumb to its power. Garbage collection can be a game of segmenting log files and slowly rewriting old events to the front of the log on a background thread. The nuance is in tuning all of these things in a way that makes the business + computer happy at the same time.
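For the lazy JSON flavor of recovery and snapshots, a rough sketch might look like the following. Assumptions: the log is an uncompressed file with one JSON event per line, the state is a dict of account balances, and a snapshot records how many log events it already reflects. `apply`, `append`, `write_snapshot` and `recover` are hypothetical names, not anyone's real API.

```python
import json
import os


def apply(state, event):
    """Fold one event into the state (hypothetical balance-tracking rule)."""
    state[event["account"]] = state.get(event["account"], 0) + event["delta"]
    return state


def append(log_path, event):
    """Append one event and ask the OS to push it to disk."""
    with open(log_path, "a", encoding="utf-8") as log:
        log.write(json.dumps(event) + "\n")
        log.flush()
        os.fsync(log.fileno())


def write_snapshot(snapshot_path, state, events_applied):
    """Persist the state plus how many log events it already reflects."""
    tmp_path = snapshot_path + ".tmp"
    with open(tmp_path, "w", encoding="utf-8") as snap:
        json.dump({"events_applied": events_applied, "state": state}, snap)
        snap.flush()
        os.fsync(snap.fileno())
    # Swap the finished file in with one rename, so a crash while writing
    # leaves the previous snapshot untouched.
    os.replace(tmp_path, snapshot_path)


def recover(log_path, snapshot_path):
    """Load the latest snapshot, then replay only the events after it."""
    state, events_applied = {}, 0
    if os.path.exists(snapshot_path):
        with open(snapshot_path, encoding="utf-8") as snap:
            saved = json.load(snap)
        state, events_applied = saved["state"], saved["events_applied"]
    if os.path.exists(log_path):
        with open(log_path, encoding="utf-8") as log:
            for index, line in enumerate(log):
                if index >= events_applied:
                    state = apply(state, json.loads(line))
                    events_applied += 1
    return state, events_applied
```

Recovery here still scans the whole log and skips the lines the snapshot covers; storing a byte offset and seeking would avoid that. A torn final line after a crash is also not handled. Both are the kind of thing you find out on attempt 3 of 15.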
Plan to do it wrong like 15 times. Don't invest a whole lot into each attempt and you can pick this up really fast. Try to just write it all yourself. Aside from the base language libraries (file IO, threading, et al.), a JSON serializer and GZIP, you really should just do it by hand because anything more complex is almost certainly wrong.
100% agree. I only used wal2json because my risk tolerance for the project was between "can't be bothered to pay the onboarding/maintenance cost of Kafka for Debezium" and "can't be bothered to implement a reader for the (well documented IIRC) stable binary WAL format" which is a weird spot to be in. This was a PoC meant to demonstrate how we could implement the features we needed from ES using no more than standard Postgres tooling and a tiny service that was good enough to roll right into production with minor changes. It took under a week, though ideally I would have taken the time to decode the binary WAL format directly after saving the raw stream for safety's sake. Rust wasn't even an option at the time; nowadays it'd be mostly a bunch of macros and annotations with a sprinkling of hand-written FromBytes implementations and a tiny bit of IO+serde glue code.
IIRC it took another data scientist and engineer under a month to turn the raw WAL logs into an audit log interface with pretty SSO avatars and weekly reporting. Someone from the devops team with DBA experience implemented time traveling staging DBs with continuous archiving and point in time recovery from production in the same time. Someone else later improved it so PITR used full backups created from a filtered WAL log so devs could select which parts of the production DB they copied over instead of each babying their own staging cluster that took days to rebuild. The whole project ended up giving us all of the benefits of event sourcing using standard, well tested tooling for a fraction of the cost.
We've thrown around the phrase "not invented here syndrome" so much that we've overcorrected - as humans are wont to do - and now the younger generation thinks architectures like event sourcing or infrastructure like Kafka are a better solution than just consuming replication logs over a TCP connection to one of the most popular open source databases on the planet (not directed at the GP but at my former coworkers :)). I'm starting to wonder if I've reached the age where I sound like the adults in the Peanuts cartoons, except the sound vaguely resembles "Get your resume driven development off my lawn!"
> "can't be bothered to pay the onboarding/maintenance cost of Kafka for Debezium"
Debezium can also be used without Kafka. One option is Debezium Engine [1], where you embed it as a library into your JVM-based application and it invokes a callback method you registered for every change event it receives. That way, you can react to change events in any way you want within your application itself, no messaging infrastructure required. The other option is Debezium Server [2], which uses the embedded engine to connect Debezium to all sorts of messaging/streaming systems, such as Apache Pulsar, Google Cloud Pub/Sub, Amazon Kinesis, Redis Streams, etc.