Tolerating full cloud outages with Monzo Stand-in (monzo.com)
61 points by abritishguy 1 day ago | 41 comments





What I wonder is “have they isolated third party dependencies?” If AWS is hard down, those may well be impacted—in some cases, by their own third party dependencies. You can test turning off your AWS environment, but you can’t really test turning off S3 for everyone…

It's a very good question. The stand-in system itself has been built to have basically no external dependencies.

So, the question you are really asking is "to what extent are the other parties involved in the processing of payments resilient to AWS failure" – e.g. Stripe probably isn't and that's probably a decent chunk of e-commerce.

I definitely don't think this would be anything close to smooth sailing if AWS was to fully go down, but we do have the benefit that underlying payment infra is still dominated by on-prem with leased lines etc. My best guess of the actual behaviour would be that bank transfers would keep working, the card networks themselves would keep working but the average e-commerce website would not.

Naturally, we can only control for what we can control for – and for us the primary benefit of stand-in is what it gives us in the much more likely scenario of an incident in our platform.


From what I understand of payment systems this is so that payments through card machines, contactless payments for public transport, cash withdrawals from ATMs, etc. all continue to work. A lot of those systems are surprisingly insulated from AWS simply by virtue of being extremely archaic

I wouldn’t assume that is the case. The failure modes are different, that is all.

I saw a whole corp POS platform a couple of decades ago that was hanging off a TFTP server on a machine that no one dared turn off in case the world ended. One day the DC UPS failed, it didn’t come back up and they had no retail operations for several hours while they sent a bunch of cash to a guy who had left to help them fix it.

There’s stuff like that everywhere lurking in the archaic.

I know of a modem in a DC which is used to talk to a branch office running AS400 hardware that is so old they have to buy spares off eBay.


To add to this, I remember a story my father told me. This is off the top of my head and a few years ago so it might not be fully accurate.

My father worked as a banker for most of his life and when he was in his late twenties he got a position to oversee a smaller investment bank. This was sometime in the late 90s. When he started, he took a general look around, checked with everyone how things were going and happened to meet one of the few IT people working in the building. When the IT guy realized that he was speaking to a new person who might be able to change things around there, he was elated and told him that there was an issue the previous boss never took too urgently, even though it was quite critical. Apparently the servers that were running pretty much all of the transactions of that investment bank were located in the basement of that building and had literally never been migrated, upgraded or anything else. The servers that were left over from that time were literally one running machine and another machine that had died a few years prior that was now only used for spares in case anything on the single still-working machine broke. Since the hardware was so old, there apparently weren’t many replacement parts left and the ones that were left were incredibly expensive due to many banks depending on those specific servers.

Anyway, my father heard that story and immediately got the guy the funding he needed to migrate to a newer and better system. Sometimes I think about this kind of stuff, we think banks are really resilient (and they try to be), but I wouldn’t be surprised if setups like these still exist somewhere because people are too scared to touch them.


Unrelated tangent: I was reading the article and suddenly realised that I could not identify the font. After a quick search:

> Our functional typeface is Monzo Sans, a custom cut of Universal Sans, meaning it’s unique to Monzo. We chose it for maximum readability, with generous dots and curled ends.

Interesting choice, but I dig it :)


This seems especially relevant given the massive outage that Barclays, another major UK bank, just suffered. Barclays was down for around two days with customers unable to spend money at all.

I suppose had they implemented a similar system, they would have degraded into a minimum viable banking system rather than the total outage that impacted so many Brits.


On the last day that tax payments were due

These blog posts are why I continue to support Monzo. Their openness is really appreciated.

A decent setup which allows you to prove you are not dependent on 1 cloud provider will probably pay for itself when it's time to negotiate discounts.

I doubt the sales folks you'll be talking to will care about your multi cloud deployment, as they don't have the skills to verify something like that.

Well you can turn them off for a day and they have the skills to see that.

My only conclusion is that Monzo would rather embrace the apocalypse than rely on Microsoft Azure to provide a tertiary fallback.

Who can blame them. Me too.

Really interesting. Would love to understand how they came to the decision to build this, and whether there's any precedent for it.

Part of being a regulated bank in the UK is proving infrastructure resiliency.

Monzo were the first bank here to run entirely on the cloud, so I imagine the regulators were extra strict with them.

I'm not saying this level of resilience is due to that alone, but perhaps it started them on the path?


Payment card networks have delegated authorization plans, where if a major processor goes down, they will still route transactions and use a simplified secondary network for making approval decisions.

It's called "stand-in processing", and I assume it's the inspiration here.
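To make the idea concrete, here is a minimal sketch of that kind of fallback. Everything in it is an assumption for illustration: the names, the limit, and the rules are hypothetical, and real stand-in parameters are agreed between issuers and the networks and are not described in this thread.

```python
from dataclasses import dataclass
from typing import Callable, Set, Tuple

# Hypothetical limit, for illustration only.
STAND_IN_LIMIT_PENCE = 5_000


@dataclass
class AuthRequest:
    card_id: str
    amount_pence: int


class ProcessorUnavailable(Exception):
    """Raised when the primary processor cannot be reached."""


def stand_in_decision(req: AuthRequest, blocked_cards: Set[str]) -> bool:
    # Simplified rules: decline known-blocked cards, approve small amounts only.
    if req.card_id in blocked_cards:
        return False
    return req.amount_pence <= STAND_IN_LIMIT_PENCE


def authorize(
    req: AuthRequest,
    primary: Callable[[AuthRequest], bool],
    blocked_cards: Set[str],
) -> Tuple[bool, str]:
    # Try the full decision path first; fall back only if it is unreachable.
    try:
        return primary(req), "primary"
    except ProcessorUnavailable:
        return stand_in_decision(req, blocked_cards), "stand-in"
```

The point of the sketch is the shape: the fallback path makes a coarser decision from far less information, and the result records which path produced it so it can be reconciled later.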


The Monzo example feels different though, as they're explicitly not looking to replicate all functionality, just something minimal to get by whilst they fix the primary cloud services.

Completely unrelated to this blog post but I really dislike Fintech saying "Get paid early" in their promos.

It's clearly marketing at someone too stupid to be able to see right through how utterly useless that is. If you are celebrating getting your paycheck 1 day earlier (every time) then your financial literacy and financial health are probably in the toilet. They _must_ know they are preying on people with statements like that.

Then again, 90% of Fintech seems to be just a heavy layer of lipstick over an archaic system. Often with very little care about whether any of the tools actually help people and more of a focus on how flashy they are or how much people think they are being helped.


Though, in some cases (like when it's your bank saying it), it's usually just them frontrunning reliable (coming from a payroll provider) and predictable (getting paid the same time each month) ACH transactions with a near-zero likelihood of not settling, then crediting you the money before the ACH is totally settled, so not ALL cases are fintech gimmicks.

But most are, and unfortunately, as the proliferation of payday loans shows us, there is no shortage of desperate people and organizations willing to take advantage of that.


Right, some banks will not post a deposit to your account until after a holding period. I deal with a lot of ACH payments, and despite a very strict schedule in the network, the retail customer-facing side is surprisingly unpredictable.

So the "post credit early" promise is not a gimmick, but the whole idea of being paid early is a gimmick. The next pay period is still a full period away, so any benefit to being credited early is literally a one-time, and probably just one-day thing.
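A quick sketch of the arithmetic, using made-up monthly paydays (the dates are assumptions for illustration, not anyone's real schedule): shifting every payday one day earlier leaves the gap between paychecks unchanged.

```python
from datetime import date, timedelta

# Hypothetical monthly paydays, for illustration only.
scheduled = [date(2025, 1, 31), date(2025, 2, 28), date(2025, 3, 31)]

# "Get paid early": every credit lands one day sooner.
early = [d - timedelta(days=1) for d in scheduled]


def gaps(paydays):
    # Days between consecutive paydays.
    return [(b - a).days for a, b in zip(paydays, paydays[1:])]


# The wait between paychecks is the same in both schedules;
# only the very first credit arrives a day sooner.
print(gaps(scheduled) == gaps(early))
```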


Remember that Monzo is a UK institution -- ACH isn't relevant, and they can see the payment in flight if it's using BACS.

https://monzo.com/blog/2019/08/20/monzo-now-lets-you-get-pai...


as a banker, when I first heard about that I did wonder if they've modeled that risk correctly

it's the sort of thing that could probably wipe out their capital completely in a black swan event


There's an ocean of historical data to predict reversal or settlement failure of ACH transactions.

I would guess that payroll credits are the second most-reliable category in the ocean of ACH transactions, right after US Treasury payments.

How black would this swan need to be to blow up this stability?


not sure what american payment transfers have to do with UK BACS payments, but ok

> I would guess that payroll credits are the second most-reliable category in the ocean of ACH transactions, right after US Treasury payments.

maybe some sort of lunatic getting control of the US treasury payment systems?

I suppose that can't ever happen


This thread is "completely unrelated to this blog post" and your previous comment was responding to comments about the US ACH network.

Right, but it's talking about "Get paid early" - and in the context of this particular post that's a specific Monzo feature that has absolutely nothing to do with ACH.

Do US fintechs offer something similar? Perhaps, but I bet it works pretty differently to BACS where the bank already knows about the money transfer.


Yes, "get paid early" is a feature that US fintechs and banks offer, and have for years. So in the context of this subthread, it's all about ACH.

The feature is based on the predictable pattern of payroll direct deposits, and/or a pre-settlement view into the ACH transfers for the day. The latter sounds like what you are describing for BACS, but I don't know the UK details.


It's a risk that was very much understood and it's fully covered.

I've heard that one before

I mean, it's been in production for years, has a cap of £20k and I'm pretty sure the design of BACS means it's very difficult to recall a transaction after the 4pm time when the feature becomes available. Simon is probably in a very good position to know how often that sort of thing happened, if ever.

I'm pretty happy with them offering this based on my understanding of BACS, as a shareholder.


20k... per account? adds up

I suppose most people won't have 20k (net) payslips, or draw it all out on pay day!

I'm not saying he's wrong, but covering a risk of this nature? I'm sure you'd be able to find someone to take your premiums

whether you'd be able to collect on it when needed, in the situation where the financial system is under serious stress is something else (see: 2008)


> not ALL cases are fintech gimmicks.

Fair and that's all well and good. I'm just saying if 1-3 days delay of getting your paycheck is going to have a big impact on one's life then I encourage one to reexamine their decisions, something else is the problem.


[flagged]


I’ve been using it for years and have experienced none of what you describe.

I’ve been using it since they were in beta and I’ve never experienced this.

I had one issue with closing an investment account, and they reached out to me to let me know there was an issue and proactively rebated me.


I work in the sector. There are lucky people and unlucky people. Welcome to lucky land. So far.

Or maybe you work in tech/customer support and are on unlucky island yourself.

Or maybe I just have higher standards and better risk appraisal after seeing their customer support response for two separate people after they had problems was “fuck off and talk to the ombudsman”.

Problems being a stuck payment during migration to another bank (after payment issues) and a frozen account after a false positive on fraud.

No, those aren’t “fuck off” things, they are “fix them” things. Missing or stuck cash is not a go away thing unless you are utterly incompetent as an organisation.


TWO people you know had bad support interactions!

Well then.


I’m a statistician. What point are you trying to make? Write it off as an anecdote? It’s not 1999.

It’s two observations. Observations lead to questions which lead to collecting more data. Which leads to a conclusion that it is not pretty.

When managing risk you judge a company not on a good day but how they handle things on a bad day. And Monzo are terrible. It’s a pattern.

Go look and find.


I agree! 1,999 more data points would be better than two, but that seems like a weird number to choose.

You might be a statistician, but I'm also assuming you're human, with messy emotions (I am too). Two data points, even if they're really close and personal for you, is still only that, two data points.

I'm sure I'm going to find complaints. People aren't incensed and go online and post "hey everything's fine and went smoothly and I had no problems" when that's the case, they only do that when they're angry about something.

I don't have a Monzo account but I've been on the Internet long enough to recognize that bias, in myself, and others.



