How we test at Nubank [video] (youtu.be)
73 points by simonpure on March 11, 2020 | 28 comments



Why do banking institutions and related fintechs use functional languages? Do they offer stronger guarantees than OOP-flavored languages?


I personally would want to use an immutable functional language wherever I can, and deviate only when I have a good reason to. Immutability makes reasoning about programs significantly easier, especially when they rely on concurrency.

And for finance in particular it's a very natural fit, because there's just a lot of transformation of data and business logic.
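
For example, in Clojure (a minimal sketch), every "update" produces a new value, so there is no shared mutable state to reason about:

    (def account {:id 1 :balance 100M})

    ;; "updating" returns a new map; the original is untouched
    (def debited (update account :balance - 25M))

    account ;=> {:id 1, :balance 100M}
    debited ;=> {:id 1, :balance 75M}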


So how does immutability matter when we deal with the database, which is what holds the state and is obviously mutable?


Datomic is immutable in the sense that what you had for lunch today doesn't change what you had for lunch yesterday, where "lunch" is any arbitrary fact stored in the database.

I.e., you can ask to look at the entire database as it was yesterday, and run arbitrary queries against it.

You can also do speculative updates to it, in the sense of "show me the entire database as it would be if I were to have pizza for lunch".

It models this as a strictly linear succession of assertions and retractions of facts. Yesterday `A` was true; today `A` is no longer true. While this new fact is recorded, it doesn't change the fact that yesterday, `A` was true.
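
A minimal sketch with the Datomic peer API (the URI and the :lunch/food attribute are made up for illustration):

    (require '[datomic.api :as d])

    (def conn (d/connect "datomic:dev://localhost:4334/lunches"))

    ;; an immutable database value as of yesterday, queryable like any other
    (def yesterday (d/as-of (d/db conn) #inst "2020-03-10"))
    (d/q '[:find ?food :where [_ :lunch/food ?food]] yesterday)

    ;; speculative update: the database as it *would be* with pizza for
    ;; lunch, without actually transacting anything
    (def speculative
      (:db-after (d/with (d/db conn) [{:lunch/food "pizza"}])))
    (d/q '[:find ?food :where [_ :lunch/food ?food]] speculative)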


Sounds great in theory.

What we see in reality is that an append-only database is unusable without additional "projections" (or whatever you call them): derived databases that are ready to be queried and updated, with specific denormalizations, indexes, and so on.

And oh, by the way, those derived databases are not "immutable".


It’s structured so that these operations can be done pretty much instantaneously. Schema is sort of asserted at read time, not write time.

I highly recommend the talk "Domain Modeling with Datalog" [1]. It explains how all this works, including indexing.

1: https://youtu.be/oo-7mN9WXTw
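
Roughly: attributes are defined up front, but entities have no fixed shape, so the query (not the write) decides what shape you read back. A sketch, assuming an existing connection conn and a made-up attribute:

    ;; any entity may carry any installed attribute; the query simply
    ;; selects every entity that has this one
    (d/q '[:find ?e ?email
           :where [?e :user/email ?email]]
         (d/db conn))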


Datomic is an immutable log (kind of like git). The only operation is append. There is a head pointer to "latest"; that is the point of mutation you're looking for, and it's the only one.
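
A sketch of what reading that log looks like with the peer API (assuming an existing connection conn):

    ;; tx-range with nil bounds walks the whole append-only history
    (def log (d/log conn))
    (d/tx-range log nil nil)
    ;=> seq of {:t <basis-t>, :data [...datoms asserted/retracted...]}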


Lots and lots of ETLs. Functional languages are a great match for that class of problem.
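
For instance, the transform step of an ETL is naturally a pure pipeline over immutable data (field names here are made up):

    (defn transform [txns]
      (->> txns
           (filter #(= :settled (:status %)))
           (map #(select-keys % [:account-id :amount-cents :settled-at]))
           (group-by :account-id)))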


There are a lot of different kinds of financial institutions with a lot of different kinds of needs. In general, however, functional languages are a good fit for highly regulated domains, because they encourage splitting the (stateless) business and legal rules of the domain from the stateful data management.

Complecting the two into, say, an Account object that holds both metadata about an account and rules about which transactions can be part of it quickly turns into an expensive maintenance nightmare.
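
A hypothetical sketch of the split (fetch-account and persist-withdrawal! stand in for a storage layer):

    (declare fetch-account persist-withdrawal!)

    ;; the business rule is a pure function of plain data...
    (defn withdrawal-allowed? [account amount]
      (and (= :active (:status account))
           (<= amount (:balance account))))

    ;; ...and only the thin stateful layer fetches and persists
    (defn withdraw! [store account-id amount]
      (let [account (fetch-account store account-id)]
        (when (withdrawal-allowed? account amount)
          (persist-withdrawal! store account-id amount))))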


Looks like picking Clojure and Datomic has created a great deal of technical debt for them. They started adding Spec everywhere to specify their data because they were having big problems scaling their wild-west code base. But now Spec is dead and there's a new spec2 alpha version.
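
For reference, "adding Spec everywhere" means data specs of roughly this kind (a hypothetical example):

    (require '[clojure.spec.alpha :as s])

    (s/def :account/id uuid?)
    (s/def :account/balance decimal?)
    (s/def ::account (s/keys :req [:account/id :account/balance]))

    (s/valid? ::account {:account/id (java.util.UUID/randomUUID)
                         :account/balance 100M})
    ;=> true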

I don't even know how they have managed to scale Datomic to that level. The support contract we had for Datomic was really only used to report bugs [0], but they have more than 2,000 Datomic transactors? Ouch.

[0] Yes, too many bugs, and slow. But databases are hard, so I guess this was expected for a closed-source niche DB with few users.


I was at the talk. Having engaged with a lot of organizations working at scale, I'd say it was pretty clear they were near the top in terms of not being hampered by technical debt. They have the capability to move quickly and evolve the system in pragmatic ways, dialing the knobs toward correctness, speed, functionality, or whatever is needed. A tremendous achievement, and it speaks volumes for the tooling.

There is no "Datomic scale" problem. Datomic transactors are just another singleton service to deploy with your microservice pod. The underlying storage can obviously be consolidated and scaled independently. I don't recall what they're using; I'd guess it's Postgres.


Multiple datomic transactors can't talk to the same database, even if they share the same storage.


Sure they can. Just give them different database names.


A different name is literally a different database.


A single postgres daemon (a unit of scaling) can support many datomic transactors, each talking to a named "database" hosted by that one daemon.

You don't have to scale your postgres daemons with your microservices (each of which has its own transactor), which would be painful and out of the ordinary for an ops team.

Scaling datomic on top of postgres is no different from scaling any other microservice.
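
With SQL storage the database name lives in the connection URI, so one postgres instance hosts storage for many logical databases, each with its own transactor (URIs illustrative; d is the datomic.api alias):

    (def users-conn
      (d/connect "datomic:sql://users?jdbc:postgresql://pg:5432/datomic?user=datomic&password=..."))
    (def payments-conn
      (d/connect "datomic:sql://payments?jdbc:postgresql://pg:5432/datomic?user=datomic&password=..."))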

That all said, the architectural point that did sound painful in the presentation (a common pain point in microservice architectures, not unique to what Nubank is doing with Datomic) was having to maintain an ETL for analytics purposes: pulling all of the distinct, microservice-specific, Datomic-hosted data sets together into a single, uniformly SQL-queryable data set. The details of that implementation, and whether it used the new SQL interface supported by Datomic, were not discussed. But it sounded brittle and fragile.


> A single postgres daemon (a unit of scaling) can support many datomic transactors

Yes.

> each talking to a named "database" hosted by that one daemon.

No. That would imply distributed writes and break Datomic's single-writer serializability property. Think about it: transactors don't sync with each other (there is a second one for HA, but that's orthogonal).

To put it simply, multiple transactors can share the same storage, but only one at a time can write to a given database.


Datomic peers can interleave N databases at query time, which is what you want, right?
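
A Datalog query can take several database values as inputs and join across them; a sketch with hypothetical connections and attributes:

    (d/q '[:find ?name ?order-id
           :in $users $orders
           :where [$users  ?u :user/id       ?uid]
                  [$users  ?u :user/name     ?name]
                  [$orders ?o :order/user-id ?uid]
                  [$orders ?o :order/id      ?order-id]]
         (d/db users-conn)
         (d/db orders-conn))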


To a single transactor only.


I don't think that's right.


Are you implying that a single peer can be configured to use multiple transactors at the same time?


Yes, I believe they simply compete for the object cache.


OK, I never tried it, but I don't see any reason why it wouldn't work; I also don't know how the peer library handles multiple connection objects. Anyway, this is a different issue from having multiple transactors configured to write to the same logical database.


It is interesting to consider, because it enables a constellation architecture where writes and reads are sharded independently. You could imagine a distributed computation, like a social network, where each person owns their own transactor but you can still query across the parts of the social graph you care about.


Isn't Nubank adding 50k users per day? Founded in 2013, and seven years later it has a $10B valuation. They don't seem to have a technical-debt problem.

I was at the talk; they don't use spec, they use Prismatic Schema, IIRC, while waiting for spec2 to stabilize.
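
For comparison, Plumatic (formerly Prismatic) Schema looks roughly like this (a hypothetical example):

    (require '[schema.core :as s])

    (def Customer
      {:id      s/Uuid
       :name    s/Str
       :balance s/Num})

    ;; throws if the value doesn't match the declared shape
    (s/validate Customer {:id      (java.util.UUID/randomUUID)
                          :name    "Ana"
                          :balance 100})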


They officially launched in 2014, I believe. They have some 20 million customers, last I checked, which gives ~10k customers per day on average from 2014. It makes sense that it would have been fewer per day in the beginning, and then accelerated.


Datomic scales horizontally for reads but not for writes. Then again, MySQL natively scales for neither reads nor writes; most people looking to scale MySQL use a clustered approach with many readers and one writer, via a third-party or enterprise solution.

I don't know of a reliable multi-write system for MySQL that doesn't make significant trade-offs.

At work we use multiple MySQL servers to handle "scaling", so it's not a surprise to me that Nubank is using multiple Datomic servers.

Unless you start getting into eventual-consistency territory, distributed writes are not a trivial problem, so singling out Datomic for this is a bit odd.


Also, on the spec stuff: I don't think it's a surprise to Nubank that spec is being upgraded; it literally says alpha in the namespace. They have some great Clojure engineers at Nubank, particularly the developer of Pathom, and they're really pushing the envelope in terms of distributed graphs and front end.


Not only that, but they have stated intentions of using `Clojure for everything`. From configuration management (currently using Ruby, if I recall correctly) to deployment, they want to use Clojure on all fronts.



