I like this idea, as I'm in favor of keeping history and prefer not to make irreversible changes like an in-place update.

What are the space concerns for keeping data in this way? For something like a retail bank (audit requirements aside, which probably make this necessary anyway), does space suddenly become a big factor?




This is basically the Event Sourcing pattern. You start with an initial state and keep the events that change it, so the current state can always be found by replaying those events. In the example above you don't even need to replay them in order, although maintaining order is definitely useful in more sophisticated uses of this pattern.
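
For illustration, a minimal Python sketch of the replay idea (the event types and amounts are made up, and a real system would persist the log rather than hold it in a list):

    # Minimal event-sourcing sketch: state is never updated in place;
    # the current balance is derived by replaying the event log.
    from dataclasses import dataclass

    @dataclass(frozen=True)
    class Deposit:
        amount: int  # cents

    @dataclass(frozen=True)
    class Withdrawal:
        amount: int  # cents

    def replay(opening_balance, events):
        balance = opening_balance
        for event in events:
            if isinstance(event, Deposit):
                balance += event.amount
            elif isinstance(event, Withdrawal):
                balance -= event.amount
        return balance

    log = [Deposit(10_000), Withdrawal(2_500), Deposit(500)]
    print(replay(0, log))        # 8000
    print(replay(0, log[::-1]))  # 8000: plain sums commute, so order doesn't matter here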

Double-entry book-keeping is done this way. The data certainly grows over time, but generally at a predictable and manageable rate.
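
As a toy version of that (hypothetical account names, amounts in cents): each transaction appends a balanced pair of entries, nothing is ever edited in place, and the whole ledger always sums to zero.

    # Double-entry as an append-only log: each transfer appends two
    # balancing entries; the invariant is that the ledger sums to zero.
    ledger = []  # (account, delta_cents) pairs, append-only

    def transfer(from_acct, to_acct, amount):
        ledger.append((from_acct, -amount))
        ledger.append((to_acct, +amount))

    def balance(account):
        return sum(delta for acct, delta in ledger if acct == account)

    transfer("checking", "savings", 5_000)
    transfer("savings", "rent", 1_000)
    assert sum(delta for _, delta in ledger) == 0  # the books always balance
    print(balance("savings"))  # 4000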

Most people using long-lived event-sourced data structures have some kind of snapshot/checkpoint/archive mechanism, which, going back to the accounting example, might simply be an opening balance.
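
A sketch of that, treating events as plain signed deltas: the snapshot is just the fold of everything before a cutoff, and replay continues from it instead of from zero.

    # Snapshot/checkpoint sketch: consolidate the old prefix of the log
    # into a single opening balance, then replay only what follows.
    events = [10_000, -2_500, 500, -1_000]  # signed deltas in cents

    def snapshot(events, cutoff):
        opening_balance = sum(events[:cutoff])   # the "opening balance" line
        return opening_balance, events[cutoff:]  # archive the prefix elsewhere

    opening, live = snapshot(events, 3)
    assert opening + sum(live) == sum(events)  # same answer either way
    print(opening + sum(live))  # 7000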


Accounting systems usually have a "close of day" or "end of year" event at which transactions can be consolidated.
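
One way to model that is to make the close itself an event in the log, so a reader only has to replay forward from the most recent close marker (a sketch with made-up tuples, not any particular system's format):

    # Close-of-day as an in-log event: replay starts from the latest
    # "close" marker, which carries the consolidated balance so far.
    log = [("txn", 10_000), ("txn", -2_500),
           ("close", 7_500),  # end-of-day consolidation
           ("txn", 500)]

    def balance(log):
        start, opening = 0, 0
        for i, (kind, amount) in enumerate(log):
            if kind == "close":
                start, opening = i + 1, amount
        return opening + sum(a for k, a in log[start:] if k == "txn")

    print(balance(log))  # 8000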


With PostgreSQL and MySQL this can be very limiting performance-wise if you need to materialize the total after every transaction on a ledger that receives a high rate of transactions. Continuous queries ought to help with this, and supposedly PipelineDB is being refactored into a PostgreSQL extension; should be interesting.
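
This isn't PipelineDB's actual API, but the underlying idea can be sketched with a trigger that maintains the total incrementally, so reads never re-scan the ledger (SQLite here purely so the example is self-contained):

    # Incrementally materialized total: every insert into the ledger
    # also bumps a per-account running balance, inside the database.
    import sqlite3

    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE entries (account TEXT, amount INTEGER);
        CREATE TABLE balances (account TEXT PRIMARY KEY, total INTEGER);
        CREATE TRIGGER materialize AFTER INSERT ON entries BEGIN
            INSERT OR IGNORE INTO balances VALUES (NEW.account, 0);
            UPDATE balances SET total = total + NEW.amount
            WHERE account = NEW.account;
        END;
    """)
    db.executemany("INSERT INTO entries VALUES (?, ?)",
                   [("alice", 10_000), ("alice", -2_500), ("bob", 300)])
    print(db.execute("SELECT total FROM balances "
                     "WHERE account = 'alice'").fetchone())  # (7500,)

At a high write rate the hot row in balances becomes the point of contention, which is exactly the limitation described above.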


This is also the pattern used by most version control systems, which use several strategies to deal with long histories efficiently. We use them all the time to move backwards in time, and between alternative realities until we eventually merge to a consistent state all users agree on.


The space concerns depend on the usage, but in general I feel people worry too much and too early about this. People also worry about calculating the balance not being efficient. You need really large amounts of data before any of this becomes a problem. When it does become a problem it can be solved by caching.

I think the more realistic concern with this approach is that some queries might get complicated when they involve more complex logic based on the balance.


> When it does become a problem it can be solved by caching.

Solved or made worse by caching?


> made worse by caching

Caching may relieve some stress (on, say, a dashboard), but if an application is doing calculations around account balances and uses event-sourcing-like structures in the database, then caching isn't going to help, as those calculations will still need to be done for each new balance-changing event.

So caching is never worse (unless you accidentally used the cached value where you needed the live value), but isn't necessarily going to relieve a lot of DB stress.


Since it's append-only, can't you "cache" or snapshot the calculation in pieces? e.g., "Today @ 00:00 all events on this account summed up to 100." Then you just replay today's events. You don't have to recalculate anything unless you're changing your transformations. Make your checkpoints whatever interval is necessary to make on-demand event replay acceptably performant.
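
A sketch of that checkpointing scheme, with hypothetical timestamped events: sum everything before midnight once, store it, and replay only today's events on demand.

    # Checkpoint at a time boundary: the daily snapshot is computed once,
    # and on-demand reads replay only the events after the checkpoint.
    from datetime import datetime

    events = [(datetime(2019, 3, 1, 9, 30), 10_000),
              (datetime(2019, 3, 1, 17, 0), -2_500),
              (datetime(2019, 3, 2, 8, 15), 500)]  # (timestamp, delta)

    midnight = datetime(2019, 3, 2)
    checkpoint = sum(d for t, d in events if t < midnight)  # "Today @ 00:00 ..."

    def current_balance():
        return checkpoint + sum(d for t, d in events if t >= midnight)

    print(checkpoint)        # 7500
    print(current_balance()) # 8000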


A retail bank would already keep tabs on ALL changes to an account anyway, if not in this format, then in separate logging tables. They don't keep only a final tally.


Typically you do both: keep a log and keep a current table (you wouldn't recalculate every time). As for space, you archive on a regular basis (say, every year). That's why you cannot check your bank statement from 2 or 3 years ago.
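
For illustration, here's that dual-write shape sketched with SQLite (table and account names are made up): the log row and the current-balance row are written in one transaction, so they can't drift apart.

    # Keep both: an append-only ledger for history and a current table
    # for cheap reads, updated together so they stay consistent.
    import sqlite3

    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE ledger (account TEXT, amount INTEGER, at TEXT);
        CREATE TABLE current (account TEXT PRIMARY KEY, balance INTEGER);
    """)

    def post(account, amount):
        with db:  # both writes commit or roll back together
            db.execute("INSERT INTO ledger VALUES (?, ?, datetime('now'))",
                       (account, amount))
            db.execute("""INSERT INTO current VALUES (?, ?)
                          ON CONFLICT (account) DO UPDATE
                          SET balance = balance + excluded.balance""",
                       (account, amount))

    post("alice", 10_000)
    post("alice", -2_500)
    print(db.execute("SELECT balance FROM current "
                     "WHERE account = 'alice'").fetchone())  # (7500,)

Archiving then just means moving old ledger rows out; the current table is unaffected.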



