Mistakes we made adopting event sourcing and how we recovered (natpryce.com)
245 points by moks on July 1, 2019 | 40 comments



Well, this seems like refreshing honesty. It makes a change from the usual "why paradigm/technology/methodology sucks!!" posts tech people love to write, which upon closer inspection are almost always a reflection of their own shortcomings.

As well as using event-sourcing, the application has a Ports-and-Adapters (aka “hexagonal”) architecture. Loading the current state of an entity was hidden from the application logic behind a Port interface that was implemented by an Adapter class. My colleague, Ivan Sanchez, was able to switch the app over to calculating an entity’s current state from its event history and treating persistent entity state as a read through cache (as described above) in about one hour.

Ports and Adapters is great, though I found it hard to sink my teeth into as it has a thousand different names. But regardless, I am never doing n-layer architecture again if I can help it.


You certainly can drive activities from events in event-sourcing applications; I've built many like this. You don't have to, but you can.

Just make sure the services that perform actions based on events and the services that rebuild current state from events are separate, so you can replay the state-building without the actions being performed.
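
Roughly the shape I mean, as a minimal Python sketch (the event type, mailer, and handler names are made up for illustration):

    from dataclasses import dataclass

    @dataclass
    class OrderPlaced:
        order_id: str
        customer_email: str

    def apply_event(state: dict, event) -> dict:
        """Pure state-building fold: safe to run during a replay."""
        if isinstance(event, OrderPlaced):
            return {**state, event.order_id: "placed"}
        return state

    def react_to_event(event, mailer) -> None:
        """Side effects live in a separate service and are NOT run on replay."""
        if isinstance(event, OrderPlaced):
            mailer.send(event.customer_email, "Thanks for your order!")

    def replay(events) -> dict:
        state = {}
        for event in events:
            state = apply_event(state, event)  # no emails sent here
        return state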


Also, it's good to be very, very conservative about having actions generate other events.

Since event systems are frequently meant to scale up, and tend to process events in order of receipt, they will tolerate ('hide' might be more accurate) a percentage of all events occurring in an infinite loop before anyone even thinks to notice. The discovery phase for this sort of regression can be embarrassing, to say the least.

Due to delegation and composition, it's easy for this to sneak up on you. And always, always attach some sort of origin ID (e.g., a correlation ID) that follows all generated events through the system, because trying to come up with repro cases will be hard enough without having to divine where the problem entered the system.
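
Something like this, as a sketch (the field names are my own, not from any particular framework):

    import uuid
    from dataclasses import dataclass, field

    @dataclass
    class Event:
        name: str
        payload: dict
        event_id: str = field(default_factory=lambda: str(uuid.uuid4()))
        correlation_id: str = field(default_factory=lambda: str(uuid.uuid4()))
        causation_id: str | None = None  # id of the event that directly caused this one

    def derive_event(source: Event, name: str, payload: dict) -> Event:
        """Events generated while handling `source` inherit its correlation_id,
        so an accidental infinite loop of derived events can be traced back to
        the original external trigger."""
        return Event(
            name=name,
            payload=payload,
            correlation_id=source.correlation_id,  # never re-generated downstream
            causation_id=source.event_id,
        )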


I'm curious, do you have valid examples for generating events? I'm sceptical of allowing this as well.


Most actions on events don't generate other events.

They tend to be things like emailing the customer.


> Not separating persisting the event history and persisting a view of the current state

This seems like the classic halfway house people end up in when adopting ES without buying into CQRS.

In my experience you need a minimal amount of historical knowledge about an aggregate to validate business rules, and that is strictly orthogonal to ways you may want to build projections over the data.

If I understand OP correctly, they are saying that rather than deriving state by consuming their events, they were maintaining a kind of snapshot that served as both the read- and write-model representation of a given aggregate, in addition to storing the events.

It can be very, very tempting to try to reuse "models" in both parts of an application, but some basic scrutiny and some hard-won experience lead to examples like this: in a view model for a User there's almost certainly no need to ever consume PasswordChanged-type events, but in the write model (business logic) you may need to consume not only the newest state (to compare for authentication purposes) but perhaps also some analysis over time of how frequently passwords have been changed recently. This asymmetry is a bit contrived, but I'm sure OP has similar examples from their own codebase.

In reality I have found that at most a handful of fields on otherwise quite rich models ever end up being "rehydrated" in the write model, and rarely as dumb attribute setters; in the read models, by comparison, the vast majority of events recorded against any given aggregate end up being used by one projection or another.
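
To make the PasswordChanged example concrete, a contrived Python sketch (class and event names are invented):

    from dataclasses import dataclass, field
    from datetime import datetime

    @dataclass
    class PasswordChanged:
        user_id: str
        password_hash: str
        changed_at: datetime

    @dataclass
    class UserWriteModel:
        """Rehydrated from the event history before handling commands."""
        password_hash: str = ""
        recent_changes: list = field(default_factory=list)

        def apply(self, event) -> None:
            if isinstance(event, PasswordChanged):
                self.password_hash = event.password_hash      # for authentication checks
                self.recent_changes.append(event.changed_at)  # for "changed too often?" rules

    @dataclass
    class UserViewModel:
        """Projection used for display; it never needs password events."""
        display_name: str = ""

        def apply(self, event) -> None:
            if isinstance(event, PasswordChanged):
                return  # deliberately ignored in the read model
            # ...handle display-related events here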


In my experience CQRS is useless these days, when I can use something like MongoDB that has great read speeds. The idea of splitting reads and writes is fine conceptually, but it doesn't make sense to me when databases are so fast.

I worked on multiple Event Sourcing CQRS based systems and I see no advantages vs traditional databases.

The event stream is exactly the same as a commit log in a regular DB. Building "projections" is the same as making views or calculated columns. In my experience, storing the whole event history is not useful and consumes huge amounts of space. Everyone compacts their commit logs; with ES you don't have to, but if you don't, isn't ES just the same thing with extra steps? It stores all the changes to tables and views, which are exactly the same as projections. With ES + CQRS combined you're basically replicating a database, badly.

Sorry to be so negative about this topic but I've worked on several of these projects at a decent scale and it's been the biggest disaster of my professional life. The idea may be viable but tooling is so bad that you're almost surely making a huge mistake implementing these patterns in production code.


> The event stream is exactly the same as a commit log in a regular DB

This assumes the contents of the commit log are client events, not CRUD actions on table data (events are higher-order).

> the whole event history is not useful and consumes huge amounts of space

When a greenfield project is starting up, data is gold. 8TB drives are $242. Create a data lake from the start.


I disagree. The events tend to relate directly to what would be regular database tables in my experience.

And "data is the new oil" is definitely a Businessweek cargo cult. None of the event data was of any use on the projects I worked on. We were under a mandate to find something to do with it and still couldn't. The closest useful thing was allowing undo and replaying past events, but we already had that in Postgres with Hibernate auditing on the interesting tables (money-related stuff).


> Sorry to be so negative about this topic but I've worked on several of these projects at a decent scale and it's been the biggest disaster of my professional life. The idea may be viable but tooling is so bad that you're almost surely making a huge mistake implementing these patterns in production code.

Quite, which is why I'm both writing a book and developing tooling to help with this problem. ES/CQRS is a nice pattern that you can explain to someone on the back of an envelope in 5 minutes, but the devil really is in the details.


I agree it's a nice pattern. What you really need, though, is a "database in code": a library that handles all the projections, the event stream, and especially replays and upgrading event versions automatically.

I'm hopeful that one day it will be a common and useful pattern, but it desperately needs language integration to help with the details. That's why, right now, I can just say stay the hell away. I'm looking forward to advances in the space that make it viable in production.


It seems like you have problems with the language here. Using pg and a language with ADTs, we have no trouble doing everything ourselves.


Any preview?


One thing you do get pretty much "for free" is an audit log, though. Each event can easily have an ID associated with it in case you need to be able to trace such a thing. Of course you can model this outside of a CQRS system, but then you've basically created a CQRS system.


And if you store events with both an event_timestamp and effective_timestamp, you get bi-temporal state for free too.

Invaluable when handling a time series of financial events subject to adjustments and corrections. For instance: backdating interest adjustments due to misbooked payments, recalculating a derivatives trade if reported market data was initially incorrect, or calculating adjustments to your business's end-of-month P&L after correcting errors from two months ago.
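
A small sketch of that query pattern (dict-shaped events, with the field names suggested above):

    from datetime import datetime

    def events_as_of(events, effective_at: datetime, known_at: datetime):
        """Keep events that had taken effect by `effective_at` AND were already
        recorded by `known_at` (e.g. recompute last month's P&L both before and
        after a late correction was booked)."""
        relevant = [
            e for e in events
            if e["effective_timestamp"] <= effective_at
            and e["event_timestamp"] <= known_at
        ]
        relevant.sort(key=lambda e: e["effective_timestamp"])
        return relevant  # fold these with your usual apply() to get the state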


You get audit logs for free on a regular database with an ORM like Hibernate, or even support at the database-snapshot level with Postgres or MSSQL. You get the logging capabilities of CQRS without any of the complexity.

After working on several such systems, I strongly believe you should keep data storage concerns in the database. Moving it to code implementation is a massive amount of overhead for no benefit.


ES is a wonderful concept, but it comes with many pitfalls and design constraints that are very different from the classic CRUD workflow, which is what most devs are familiar with.

The biggest problem is the lack of available tooling.

The only custom-built ES database is Eventstore [1], and it has many issues.

Many are abusing solutions that were not built for the use case (like Kafka), and end up hand-building a lot of related functionality that often ends up brittle.

A new purpose-built DB could do wonders for the space if it had these features:

* Horizontally scalable

* Schema validation for events

* Schema migrations and event upcasting (see the sketch below)

* Built-in projections + state rebuilding

* Index projections

* Something to help out with cross-aggregate sagas and coordination, and maybe even cross-aggregate transactions

* Spinning up mirror replay databases up to a certain time/event

[1] https://eventstore.org/
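
For the schema-migrations/upcasting bullet, this is the kind of thing people currently hand-roll; a sketch with invented event names, not any particular product's API:

    def upcast_v1_to_v2(event: dict) -> dict:
        """Hypothetical migration: v2 split `name` into first/last name."""
        first, _, last = event["data"]["name"].partition(" ")
        return {
            **event,
            "version": 2,
            "data": {"first_name": first, "last_name": last},
        }

    UPCASTERS = {("CustomerRegistered", 1): upcast_v1_to_v2}

    def upcast(event: dict) -> dict:
        """Applied at read time until the event reaches its latest version."""
        while (event["type"], event["version"]) in UPCASTERS:
            event = UPCASTERS[(event["type"], event["version"])](event)
        return event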


I have a pseudo-event-sourcing setup because I imagined it could help with syncing mobile devices better (yes, it works).

But instead of going eventually consistent and getting rid of the RDBMS, I invert the operation:

1. Post data to the table (as normal) and record the event in Postgres as JSONB: all integrity checks and validations work great, without complex logic.

2. 90% of all queries need only the most recent data, so there's no silly "reload the history" for every simple operation.

3. I still get full history & auditing.

This is not much different from "audit tables", actually.

Most events are table + CRUD operation. If a table is well modeled and normalized, you don't need to detail much in the event.

What I missed in the first iteration was diffing the changes, which led to a lot of extra info being stored. Diffing the changes auto-magically makes the events more semantic!

----

It's still an imperfect setup, and it relies too much on dynamic data for my taste. I wish modern RDBMSs supported algebraic types and better ways to handle this scenario, but with this I avoid a lot of extra work and complications...
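
Roughly what the write path looks like, as a sketch (the table and column names here are just examples, and `conn` is assumed to be a psycopg2 connection):

    import json
    # `conn` is assumed to be a psycopg2 connection

    def save_with_event(conn, customer_id, new_values: dict, old_values: dict):
        # Store only the diff of changed columns, so the event stays semantic.
        diff = {k: v for k, v in new_values.items() if old_values.get(k) != v}
        with conn, conn.cursor() as cur:  # one transaction for row + event
            cur.execute(
                "UPDATE customer SET name = %s, email = %s WHERE id = %s",
                (new_values["name"], new_values["email"], customer_id),
            )
            cur.execute(
                "INSERT INTO events (table_name, op, row_id, diff) "
                "VALUES (%s, %s, %s, %s::jsonb)",
                ("customer", "UPDATE", customer_id, json.dumps(diff)),
            )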


Sounds cool. The thing I'd say is, in this type of setup you can use something called a "view." It's like a query that makes a virtual table.

So you can then have this "virtual table" of the most recent of each audit record, and query against that. It helps keep stuff clean, very underused in my experience.


In most relational database management systems (RDBMS) views are not persisted/materialized or indexed separately.

Some RDBMSs have materialized views that update concurrently and meet atomicity, consistency, isolation, and durability (ACID) requirements, but others do not, and a separate command is needed to refresh the view. In those cases you lose many of the benefits, but the view is still fine for, say, analytical queries where a little lag is acceptable.

I am personally a fan of temporal tables, a technique in which rows are versioned and you can query for the state of the database "at" a point in time. Support here is even less uniform than with materialized views, but if you have control over your database access layer or object-relational mapping, you can brew your own. This gives you history when you need it (rarely, but then it's incredibly helpful) and fine-grained control over indexing.

I recommend the latter approach for anyone looking to implement event sourcing and auditing. Including in each table a column for the source or cause of a change, for example, gives you an audit log almost for free.
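
The "brew your own" version in the data access layer might look something like this (a sketch; the table and column names are invented, and `cur` is a DB-API cursor against Postgres):

    def update_account(cur, account_id: int, new_balance, change_source: str):
        # Close the current version, then insert the new one, carrying the
        # cause of the change for auditing.
        cur.execute(
            "UPDATE account_versions SET valid_to = now() "
            "WHERE account_id = %s AND valid_to IS NULL",
            (account_id,),
        )
        cur.execute(
            "INSERT INTO account_versions "
            "(account_id, balance, change_source, valid_from, valid_to) "
            "VALUES (%s, %s, %s, now(), NULL)",
            (account_id, new_balance, change_source),
        )

    def balance_at(cur, account_id: int, at):
        # State of the account "at" a point in time.
        cur.execute(
            "SELECT balance FROM account_versions "
            "WHERE account_id = %s AND valid_from <= %s "
            "AND (valid_to IS NULL OR valid_to > %s)",
            (account_id, at, at),
        )
        return cur.fetchone()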


>> In most relational database management systems (RDBMS) views are not persisted/materialized or indexed separately.

Right, which is why they're great. You can materialize a view, but then you have to manually refresh. And they are actually incredibly performant.

How they work is that they are basically treated as a query clause. For example, you query "Select * from my_view where key = 1" and it intelligently executes "Select * from base_table where key = 1 and record = latest". (That is to say, it doesn't stupidly requery the whole base table each time, nor is there any risk of inconsistency.)

The query optimizer is incredibly smart, and I'm honestly pretty sure this can scale beyond the needs of any startup.


I know how views work in several RDBMS, and I don't think that description lines up with what I understand.

> That is to say, it doesn't stupidly requery the whole base table each time, nor is there any risk of inconsistency

I mentioned materialized views, but then you mention the base_table. I think this statement is inaccurate whether it's a materialized view or not:

1. With materialized views as implemented by PostgreSQL and MS SQL Server, it doesn't query the base_table, it queries an artificially generated table. In the case of PostgreSQL, that table will be out of date (inconsistent) the moment there is a change to base_table. A manual refresh is necessary and expensive every time the view becomes inconsistent.

2. With non-materialized views, you are incorrect for at least PostgreSQL and MS SQL Server: the view does not "cache" its filters or joins or clauses. The view just functions as a query modification, and it does in fact query the "base table each time".

I think what you need to satisfy your requirements is actually a filtered or partial index, of the form:

    CREATE INDEX latest_idx
        ON base_table (some_base_table_keys, ...)
        WHERE record = latest;
That index will be consistent and will keep the query optimizer from, as you say, stupidly requerying the whole base table, as long as the criteria holds and the keys you're searching for are a prefix of those keys.


Views are popular for projecting denormalized information. I pity the fool who inserts into views.


Stay tuned on that last front.


It seems like sometimes event sourcing is what happens when someone hears a requirement for audit logs and instead of just setting up the database audit logging system they decide to completely re-architect their entire system. So in the end it is more fun but often not necessary.


Event sourcing is most powerful when the data necessary to recompute the state is small but the amount of state you are potentially interested in subsequently accessing is large, most of which is computed.

It is also incredibly useful for financial applications where risk models (credit risk, market risk, fraud, etc.) need backtesting. If you can't read the time ordering of state events directly, a Risk/Data Science team spends an inordinate amount of time reconstructing time histories of events inferred from a mix of static DB tables, audit logs, and periodic data warehouse snapshots.


I've had several clients lately who have either built or needed to rescue an architecture that worked from events in a log. I'm amazed at how many teams seem to get it wrong. Like the author here, there are apparently a lot of different ways to interpret the design, and therefore a lot of subtle mistakes to make. I tried to summarize what I've learned in "Why are software developers confused by Kafka and immutable logs?"

http://www.smashcompany.com/technology/why-are-software-deve...


I agree with your post, but... you almost certainly shouldn't be doing event sourcing with Kafka. Event streaming, sure; Kafka is pretty ideal for that (as your source of truth).

EventSourced aggregates need to reload their entire history (unless you are snapshotting too) before applying a command, so you shouldn't have all your aggregates in a single Kafka topic, as you would have to read all events since forever, for all aggregates, every time you loaded one.

If you put each aggregate into a separate topic, everything is fine, until you want to add projections - how do your projections know to subscribe to all the new topics being created for each aggregate?

You can probably engineer your way around these limitations, but at that point, why not use a tool that is more suitable for event sourcing and keep Kafka for the "Business Process Events"?
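
For context, the per-aggregate write path being described is roughly this (a sketch; `event_store` is an assumed interface with per-aggregate streams, and the event/command shapes are invented):

    def apply_event(state: dict, event: dict) -> dict:
        """Pure fold over hypothetical account events."""
        if event["type"] == "Deposited":
            return {**state, "balance": state.get("balance", 0) + event["amount"]}
        if event["type"] == "Withdrawn":
            return {**state, "balance": state.get("balance", 0) - event["amount"]}
        return state

    def decide(state: dict, command: dict) -> list:
        """Business rules run against the rehydrated state."""
        if command["type"] == "Withdraw":
            if state.get("balance", 0) < command["amount"]:
                raise ValueError("insufficient funds")
            return [{"type": "Withdrawn", "amount": command["amount"]}]
        return []

    def handle_command(event_store, aggregate_id: str, command: dict) -> list:
        history = event_store.read_stream(aggregate_id)    # per-aggregate stream
        state = {}
        for event in history:                              # rehydrate from scratch
            state = apply_event(state, event)
        new_events = decide(state, command)
        event_store.append(aggregate_id, new_events,
                           expected_version=len(history))  # optimistic concurrency
        return new_events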


About this:

how do your projections know to subscribe to all the new topics being created for each aggregate?

Isn’t that precisely why people use ZooKeeper?


It is because of the usual examples taken to show off ES.

Wow! I can have a todo-list with easy undo-redo-branch! With free audit logs. We need audit logs for our new project.

Forget how your domain application must handle permissions, forget about calling third-party APIs or sending emails. Forget about having to remove some of the data you hold about a user when they ask for it. Anyway, that's the maintenance team's problem; you got your bonus and are now on a new project to try those "microservice things" to pad your resume.


> Seduced by eventual consistency

Indeed. I work at a very, very large firm suffering from problems caused by this. The system crawls and is quite expensive to run. But it's so entrenched there are no plans to replace it, only more layers put on top to mitigate it.

RDBMS people took a bit more time before implementing temporal features (SQL:2011).


Sorry to be that guy - the title tag has a typo: Mistaeks instead of Mistakes.

I promise that I will contribute more substantial feedback once I've read the piece :D


I believe that's the intentional name of his blog.


Whoops, now I feel dumb.


You just made a mistaek. :)


And bad mistaeks, I've made a fwe...


Very hard to read on my mobile phone


I saved it to Pocket and then read it in a mobile-friendly way. It's helpful when you can't modify the style the way you can on a PC.


Try switching to reader mode if your phone supports it.


A CQRS-based service I deal with in prod can probably best be described as "write anything other than the last two JSON records > /dev/null". It's a stupid and brain-dead way of dealing with data, and it isn't worth anything beyond creating a deadweight of unqueryable and unmanageable JSON blob history.

It sounds like the author is unfamiliar with tools for properly and automatically handling schema evolution. CQRS is not a way out of that.



