
The title of this post is incorrect. It's not just Confluence that is down, but JIRA and all Atlassian Cloud services for impacted customers.

Some customers have also been told to expect up to two more weeks of downtime, which would make the outage roughly 3 weeks in total [1]. In all, 400 companies are impacted by this outage, and at least one of them is a YC company, Bitrise. They are still down, Atlassian is not telling them anything specific, and data loss is a possibility for them [2].

It is strange that the outage has lasted this long, because it contradicts Atlassian's own statements about how they plan for resilience and how quickly they can restore customer data. They specifically say that they use multiple data centers, have zero-data-loss scenarios, and have a recovery time objective (RTO) of 6 hours for their Tier 1 products (like JIRA). They are missing this objective by a wide margin, which raises questions about what is happening behind the scenes [3].

The outage is especially ironic given that Atlassian has discontinued its self-hosted Server product and is forcing all customers to move to Atlassian Cloud, the very cloud that is now down for many customers. They stopped selling Server licenses in Feb 2021, disallowed changing tiers in Server products this February, and will stop supporting Server products in Feb 2024.

An outage this long for software that companies depend on is mind-boggling. The only comparably long outage I can recall is the 2011 PlayStation Network outage, which lasted 23 days.

Good luck with the restoration, which sounds like manual work. Hopefully we get a public postmortem from Atlassian, though they are not known for publishing these.

[1] https://twitter.com/kjartanmuller/status/1513462616030683138...

[2] https://twitter.com/gabornadai/status/1513481270411636738?s=...

[3] https://www.atlassian.com/trust/security/data-management



You should include a link to their recovery time objective for their SaaS offerings: 6 hours.

https://www.atlassian.com/trust/security/data-management


They don’t even back up that often. Why bother having a six-hour target for your daily backups? I mean, really… I’d rather you lose less of my data, thanks.


There's no tier with an RPO of 6 hours on that page, unless it has changed. It's 1h, 1h, 8h, 24h for RPOs.


The RTO for products is listed at 6 hours. RPO (recovery point objective) is the maximum data loss, measured as time before the incident. RTO (recovery time objective) is how long the recovery process will take. This is saying they should be able to recover the products in less than 6 hours, to a point in time less than one hour before the incident.
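To make the RTO/RPO distinction concrete, here is a minimal sketch that checks a single incident against both targets. It assumes the figures quoted in this thread (6-hour RTO, 1-hour RPO); the helper `meets_objectives` and the timestamps are hypothetical and only for illustration, not anything Atlassian publishes.

```python
from datetime import datetime, timedelta

# Targets as quoted in this thread (assumed for illustration).
RTO = timedelta(hours=6)  # max time from incident start to service restored
RPO = timedelta(hours=1)  # max age of the restore point relative to the incident


def meets_objectives(incident_start, service_restored, restore_point):
    """Hypothetical helper: return (rto_ok, rpo_ok) for one incident."""
    downtime = service_restored - incident_start       # what RTO limits
    data_loss_window = incident_start - restore_point  # what RPO limits
    return downtime <= RTO, data_loss_window <= RPO


if __name__ == "__main__":
    start = datetime(2022, 4, 5)  # made-up incident start
    # Roughly 3 weeks of downtime, restored to a point 30 minutes before the incident.
    print(meets_objectives(start, start + timedelta(days=21), start - timedelta(minutes=30)))
    # → (False, True)
```

In other words, a restore can satisfy the RPO (little data lost) while badly missing the RTO (service down far longer than 6 hours), which is the situation the parent comment describes.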


You are correct, that was my mistake.


Thanks, updated the post. It's a good callout. They do have an RTO and they clearly struggle to meet it right now.


I know it sounds pedantic, but this bit me in the ass once. You are talking about a retrospective, not a post-mortem. I asked for a post-mortem because of a mistake with a product I was responsible for. The CEO set up a meeting where he asked pointed questions about whether I still had faith in the product, and finally asked straight out if I thought it would fail. The reason was that he came from pharma, where a post-mortem is an autopsy. It's not performed unless you have a corpse to dissect. I explained that in software engineering we hold post-mortems about why the coffee ran low. He was not amused.


No, he is talking about a post-mortem. The same word can mean different things in different contexts.



