GitHub was down (githubstatus.com)
291 points by arparthasarathi on Feb 27, 2020 | 165 comments



Sincere apologies to all GitHub users for the downtime this morning, and the brief outages last week as well. We take reliability very seriously, and will publish a full RCA in the near future.


> […] downtime this morning, and the brief outages last week as well.

For context, there were four (public) incidents this week:

• Incident on Feb 27, 14:31 until 18:54 UTC - https://www.githubstatus.com/incidents/q07bfjh7jf1t

• Incident on Feb 25, 16:36 until 18:48 UTC - https://www.githubstatus.com/incidents/xp2qc958g4wt

• Incident on Feb 20, 21:31 until 22:16 UTC - https://www.githubstatus.com/incidents/bd29l6zgr43g

• Incident on Feb 19, 15:17 until 16:09 UTC - https://www.githubstatus.com/incidents/fxbbtd7mhz1c


Well Bitbucket lost all of my repositories this morning so you've got a long way to fall


Bitbucket also has frequent downtime: https://bitbucket.status.atlassian.com/


What do you mean they "lost your repositories"?


When I tried to log in I was prompted to "upgrade my account" to a "Bitbucket Cloud" account. After doing so, all of my repositories were gone. It seems that my repositories remained on my "Bitbucket ""Regular""" account but that my email address was no longer associated with it, giving me no way of logging in to it. I emailed support 6 hours ago and have yet to get a response.


Wow Atlassian... That is quite horrifying! Glad I stuck with GitHub through the MS acquisition. Bitbucket was probably my main alternative.


For the record, Azure DevOps did the same thing to me when we switched over to Azure AD. My account and repositories ended up in an entirely corrupt state. Support was eventually able to resolve most of it, but I’m still discovering problems.


I shudder at the thought of GitHub one day trying to integrate with people's Microsoft accounts.


Yep. Luckily I was able to recover the most important one - a website I'd built a couple years ago for a paying client - from Heroku, of all places.


At least your main competitor's uptime metrics are also pretty bad, so fingers crossed.


What did you guys deploy/what scale tipping point did you guys hit that caused the past 3 days of problems?

At my job, if something goes wrong... management just tells us to roll it back. That always fixes the problem, right? :P


Yep, it's the Dunning-Kruger effect.

If rolling back works for a simple system, why wouldn't it work for a complex one?

Because one cannot step into the same river twice. Heraclitus would probably make a good engineering manager.


This is 100% not the Dunning-Kruger effect. How on Earth would it be?

edit - from wikipedia: In the field of psychology, the Dunning–Kruger effect is a cognitive bias in which people assess their cognitive ability as greater than it is.


Maybe not directly but it could still apply.

I interpreted GP as saying the aforementioned management has just enough cursory knowledge to want to apply the same hammer that worked on a simple system to that on a complex system, but not enough knowledge to realize the unknown-unknowns that they aren't even aware of.


I wouldn't say that's a psychological phenomenon though, just ignorance, arrogance or over-confidence.


Thanks Nat! Keep up the good work and thanks for contributing here. Github is still my favorite. =)


Do you guys ever miss meritocracy?


This seems to be the third or so day in the past week I've had issues with GitHub around this time in the morning. They've typically been really good. I'm a bit surprised there hasn't been more talk about it on HN.


They seem to be doing heavy work on it. On mobile you can no longer see repos in "Desktop Mode", which is unfortunate; I have to tell my browser to pretend to be a desktop one. Plus the regex post from the other day, where somebody from GH replied in the thread, seems to imply they are working on new things. I don't mind improvements, but don't break production, guys...


They also changed the "Group Membership" dialog to be paginated when you add a new person to an organization. We have over 200 groups, so now I have to page through them for every new hire we add. There's not even a search option.

I'm sure the pagination might be better for performance, but it's terrible UI.


They may have missed that one, because at the same time they introduced both pagination and search to the repository membership page, and boy did that help us on one of our repos with a few hundred direct collaborators; toward the end we could only manage access through the API because the page didn't even load most of the time.


I see why web pages need pagination so the server or browser doesn't OOM, but there really ought to be 10000 entries per page, not the 25 that most sites seem to like.

Ctrl+F on a list of 10000 entries is far easier than clicking through 400 ajaxy pages and trying to figure out some custom and buggy filtering system that probably doesn't allow regex.

Past 10000 records most sites probably ought to just let you export in something bigquery compatible anyway - Regular Joe isn't going to have more than 10000 of anything, and anyone who does can learn how to use proper data tools.


> there really ought to be 10000 entries per page

Did you miss the part where I noted Github’s lists fail to load (let alone render) long before that point?


They really need beta.github.com to let people test changes that are not yet de facto. A UserVoice type of thing, with the ability for people to join. I love to beta test and give feedback. Microsoft has used UserVoice in the past, as have Sulake and other companies I've beta tested for (as a customer).

Edit:

Realized *.github.com takes you to your .github.io sites.


> Now on Mobile you can't see repos in "Desktop Mode" which is unfortunate.

Wait, what? On iOS Safari I can only see repos in desktop mode now (except the issue tracker which is responsive anyway). Which is a good thing. Not sure why you have the exact opposite experience?

(I do vaguely recall being asked if I would prefer desktop mode on my phone a while back, and I said yes.)


I think it sets a cookie or similar to save this.


[flagged]


Intentions are meaningless. If you're providing a service (especially if you're charging money for said service), you can't break it just because "you're hard at work".


>you can't break it because "you're hard at work"

Apparently you can...because it is broken.

A few days of degraded service is frustrating, but their up-time has bought them a lot of credit in my book - especially considering what they do.

This is the real world. No service is magically infallible.


> Intentions are meaningless

Are you a robot? Have some humanity...


GitHub is not a human being, it's a company.


A lot of people have patience on day one, but this is the third or so day now this has happened. It’s understandable there are ruffled feathers.


Yeah... once every few weeks is one thing. Once a day is getting really annoying.


Yea, I'm surprised I didn't see anything from the outage yesterday.


Maybe they can’t post it because it doesn’t work /s


Still miles better than BitBucket.


When was the last time BitBucket had an outage? Personally I don't see a lot of difference between the two platforms; or GitLab (my primary now). Github probably has the best UI, but Gitlab's has gotten a lot better; and there are always self hosted solutions like Gogs.


> When was the last time BitBucket had an outage?

I've been doing ongoing client work for someone using Bitbucket, and for weeks it has felt like every other day has an outage related to their Pipelines (CI) feature (the thing I happen to be working on).

There were constant banners about service disruption. There are a lot of UI outage-related issues too, like the pipelines page showing a new build but never updating its progress until you reload the page -- which sounds like some type of API outage somewhere. I'm not sure if that gets reported as an outage, but it makes using the platform not fun.


I'm pleased I don't have to deal with Bitbucket any more, but a year or two back it felt like it had an outage that impacted work at least once every six months. Sure, that might not sound like much, but it was always a pain.

Plus of course the service was so damn slow that using it was a daily pain.


There's a yellow banner (that you can't even close) shown every few weeks, and it's usually related to Pipelines being down, again. That often results in degraded functionality in other parts of the product too. And it's still slow as molasses. I hope I never have to use Bitbucket again.


How much data does BitBucket have to process on a daily basis compared to GitHub, or GitLab?

I imagine that there are stability issues that any provider will have to deal with as they scale to account for the masses.


GitHub's diffs are pretty much instantaneous; Bitbucket just gives up: "now that's a lot of code!". No, it was a one-line change, actually.


Github will bail on large diffs, too.


I've also much preferred GitLab's GUI. It seemed to be much cleaner and smoother than GitHub's (not to mention the much better-named Merge Request).


Sure, but not better than Gitlab.


Do you use Gitlab? Everybody on HN loves to love Gitlab because they're the underdog, and the product isn't bad, but it's not that great either.


Yes, I use it for personal projects. I also use a company-hosted version at work. The built-in CI is great. I can't think of a reason, other than price for companies, to use GitHub over GitLab. Both are great, but GitLab's built-in CI is, I think, easier to use and better integrated.


Yeah, you just lose 6 hours of prod data on gitlab. :)


Don't forget to check your SLAs

Enterprise = 99.95% (quarterly)

https://help.github.com/en/github/site-policy/github-enterpr...

They're having a bad February but January was good. We will see what March has in store


> How do we calculate Uptime?

> Our Uptime calculation is based on the percentage of successful requests we serve through our web, API, and Git client interfaces.

Just curious, how do they measure this? What is the actual calculation?


> What is the actual calculation?

Not answering this directly, but the paper Meaningful Availability [0] released recently really changed my opinion on how to calculate and visualize availability. There's a discussion on HN as well [1].

[0]: https://www.usenix.org/system/files/nsdi20spring_hauer_prepu... [1]: https://news.ycombinator.com/item?id=22424173


That was insightful up to a point. But, of course, the relevant metric is "expected availability" -- before I decide to use the service; therefore, not the same as "customers served". If I have to think about downtime then my experience is degraded; more so if I have to delay and batch planned interactions (all of which will later count as successful!)

[Edit: To the point: a high rate of randomly-timed failures is a kind of degraded experience, but not as critical as blocky patches of downtime. A 1% rate of randomly-timed failures is much, much preferable to having the service go out three straight days every February.]

Also: uptime is not the same as "customer delight". It's all about time.


> a high rate of randomly-timed failures is a kind of degraded experience, but not as critical as blocky patches of downtime

Do you think that's an accurate generalization for all software and business contexts? I think a novel insight of the paper is that windowed user uptime is able to visualize these differences. (See Figure 20 from the paper.)


It certainly is not - a trader might not care about 99% of the time, but at the exact moment they want to make a trade the system must work.

Whereas if some GitHub request fails and I retry, it's a minor annoyance; in most cases I won't even know whether the fault was GitHub's, my local system's, or some networking in between.


> Our Uptime calculation is based on the percentage of successful requests we serve through our web, API, and Git client interfaces.

So when a customer finds a broken service it is in their financial best interest to repeatedly hammer the broken service and drive down the uptime calculation to trigger their rebate.

Just an observation, not a suggestion. I’d fire any customer I found doing this.


What do you mean? All modern enterprise analytics/monitoring solutions are going to be able to give you some kind of top-level "request success rate" metric. I assume they mostly just lean into whatever monitoring tooling they have set up. What kind of "calculation" are you imagining here? Like a very specific SRE formula for availability windows or something?
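
For illustration, that definition boils down to a plain success-rate computation over request logs. A minimal sketch (assuming a combined-format access log where field 9 is the HTTP status, and counting anything below 500 as "successful" -- both assumptions on my part, not GitHub's actual method):

  # percentage of non-5xx requests in an access log
  awk '{ total++; if ($9 < 500) ok++ } END { if (total) printf "%.4f%%\n", 100 * ok / total }' access.log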


Depending on what part of the system is down, how do you know you even got a request to mark as failed?


You want to start measuring closest to your users. In most cases that would be some sort of load balancer. I don't think there's much more you can do without going to the client side.


So their SLA is based on non-5xx responses and connection errors?


I mean that seems like a question to ask an account rep? I'm sure it's also probably not a hard and fast rule for every single customer, hence the ambiguity in the general language.


Same here.

They obviously don't have beacons on the client side. I wonder if it's based on statistics: at this time of day on a Tuesday we should be getting x requests but are getting only x/n.


> They obviously don't have beacons on the client side

Says who?


Client-side beacons would have to be implemented in the git client, making parallel requests. I've seen no evidence of that myself, and I'd think people would freak out if they found it inside git's source.


Nothing quite like the “maybe” credit-backed SLA.


What are the consequences of busting 99.95%? What happens if it is 99.94% vs 19.94%?


Mitigate: Local repos.


With their new GitHub Actions, this downtime could stall your entire company's workflows if you fully depend on it.


No matter how many talented engineers you have on staff, your entire service can still go down. Let's pause and reflect on that. ;)


"You can't legislate against failure, but you can focus on fast detection and response"

-- Chris Pinkham


"I will not be harassed in my own private domicile"

-- Jesse Pinkman


"I AM THE ONE WHO KNOCKS"

-- Walter White


"Prevention is ideal, but detection is a must"


It’s amazing how this is accepted in the software world. Move fast and break things, such a different philosophy to other areas.


It’s evolutionary pressure; software is malleable and potential functionality is limitless. Software companies that didn’t subscribe to this philosophy were repeatedly killed by ones that did, until it became the status quo.


I mean, that's rather disputable: Apollo 1 caught fire, medical mistakes take a toll of 250,000 deaths per year in the US alone, and there are many other serious mistakes in vastly different areas. I think unreliability is unfortunately a constant of the human race.


I think the quote from Pinkham is more about dealing with genuinely unavoidable failures, not those due to moving without appropriate due diligence.


The interesting thing is that Git is entirely decentralized, so in theory they could simply redirect to servers onto which the data has been mirrored.


Git is, but the APIs and all the services they provide around it aren't.

That said, I think it's a bit weird that they don't store the data of the services around the code itself in git, like they do with e.g. sites. That way you'd have an `issues` branch that you could still access if github is down.

But that would probably pave the way for easy migrations away from Github.
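
As a rough illustration of the `issues` branch idea (the branch name, default branch, and file layout below are all made up, not anything GitHub actually does):

  # hypothetical: keep issue data in an orphan branch alongside the code
  git checkout --orphan issues
  git rm -rf .                      # start the branch with an empty tree
  echo '{"id": 1, "title": "example", "state": "open"}' > 0001.json
  git add 0001.json && git commit -m "Add issue #1"
  git checkout master               # back to the code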


>"But that would probably pave the way for easy migrations away from Github."

bingo


At least they could have used the concepts in Git's design. But it seems they didn't learn much from the tool they based their service on.


I don't think learned is the right word to use here. Github's centralized design and vendor lock-in is quite intentional.


Every project I have been on where we built for availability and resilience has inevitably had at least one single point of failure. Usually it is something deemed non-critical that can somehow still bring the infrastructure down. (A single DNS server at one of our production sites did this: we had two more accessible via a VPN tunnel, and it was deemed that if the production DNS went down the other two would still be reachable; too bad that the day it happened the tunnel was down too.)

Also you have to deal with sysadmin error. I know we sysadmins are practically perfect in every way, but occasionally we make mistakes... big mistakes. ;)

So redirecting might not always be possible...


Yes, git is.

The issues, comments, PRs, wikis etc... that we all came to depend on aren't.


The wiki is stored as a git repository, although to my knowledge the others are not.


could learn a few things from Netflix and chaos engineering...


Let’s reflect on Amazon’s 99.999999999% (literally their number) durability on S3.


Durability isn't uptime.


Correct. Availability is 99.99. https://aws.amazon.com/s3/storage-classes/


Regardless, the SLA that AWS (& other cloud providers) meet is quite impressive.


But what do they offer if they don't meet it? I mean a discount on your monthly bill somehow doesn't sound like it would cover a potential loss.


When Microsoft buys companies, they tend to progressively decay as the original architects leave, the morale of remaining employees grinds down from the stress, and they bring in cheaper contractors to duct-tape the bits together and plug the holes in the levee with their fingers. I've BTDTBTTS. cough LinkExchange, WebTV, Hotmail, Skype, Softricity, Nokia, LinkedIn, Danger/Sidekick cough GH maybe next. ¯\_(ツ)_/¯


That has nothing to do with Microsoft. That is ANY large merger.

  1. Nothing is going to change.  We bought this company because we love it
  2. We need to show a higher profit for this quarter, cut all expenses for every subsidiary by 15% by Friday
  3. Cut back on training, R&D, and support teams.  They are a huge cost center
  4. Bunch of employees leave after retention bonuses, replaced with MUCH cheaper labor
  5. Need to show better on our next quarterly filing, slightly increase prices
  6. Through attrition, replace more good people with cheap drones, until nobody knows WHY things are the way they are.
  7. More increased prices, and way increased support contracts
  8. Wonder why we have lost all this marketshare.  Look at Company X, they are doing great, let's buy them.


Can you remove the leading spaces causing your comment to appear in a code block? It’s quite hard to read on mobile. Thanks!


Skype was bad before Microsoft, when it was still part of eBay, and stayed terrible after Microsoft. LinkedIn was legendary for dark-pattern usage way before Microsoft.

Nokia... I don’t know about that one. The shift to smartphones hit every “old” phone brand... Ericsson did not survive, Siemens did not survive, Alcatel is just a brand now, even Sony has a hard time... Nokia would probably have died no matter what. All the Maemo/Linux-based OSes (that kept changing names all the time) were nice, but so was Palm’s WebOS...


IME GitHub has not had increased downtime after Microsoft's acquisition.


Well, some other people in the comments disagree. And it hasn't happened yet, but the pattern is that they don't manage / integrate acquisitions very well unless they're wowie complementary products like Visio. Danger dropped off a cliff, and Softricity was absolutely amazing but shelved, so friends of mine basically repeated the theme for VMware View and were acqui-hired by VMware. Time will tell where GH goes.


There are so many logical flaws here.

People's comments are meaningless; you can look at historical GitHub uptime and see that it hasn't changed meaningfully.

"And it hasn't happened yet"

Ah yes, now you have to backtrack from: it happened! to... no wait I promise it will happen! Based on... what? The fact that some acquisitions don't go well?

This is all pure speculation with no substantiation.

I recommend learning about confirmation bias.


I agree, but I just now took a look here: https://www.githubstatus.com/uptime?page=7

I went back to the time of Microsoft's acquisition, and the status seems heavily underreported. When I checked just now, it was all green, green. That does not reflect my experience.


To be fair, they didn't say that was the cause of the current outage. They made a general observation that Microsoft-acquired companies degrade, which seems like a fairly reasonable observation: The goal of being the best Git service/repo falls by the wayside as other corporate goals push in.


>Softricity was absolutely amazing but shelved

Why do you say that? I thought it was just renamed App-V and it's still going strong to this day. https://docs.microsoft.com/en-us/windows/application-managem...


Early Danger adopter here. The hiptop was great for its time, but with the advent of the iPhone it was obsolete. That was before the MS acquisition. Rubin (and presumably much of his team) had left for Android long before.

Danger was already in freefall by the time of the acquisition. You can't blame Microsoft for that.


Microsoft GitHub is currently part of core Microsoft software development infrastructure. The Windows source code even resides on GitHub.


Do a search on HN for "Github down". It happened a lot before the Microsoft acquisition in mid-2018. Perhaps what you're saying is true, but your comment isn't really relevant to this outage.


Have they written any postmortems regarding their last couple of degradations? I tried searching their blog but the only ones that popped up were over a year old.


I would also be really interested in those, but haven't found anything yet. Maybe their new notification system has something to do with it?


By the time I notice GitHub acting weird and check HN to see if it's only me, there's already a 'GitHub downtime' post. How is everybody so fast? :-)


If Github is down, thousands of programmers suddenly have nothing better to do.


Conversely, if Hacker News was down GitHub would suddenly see a spike in traffic :)


I rolled my eyes twice, then merged a few PRs manually and moved on with my day. (i.e. the git server itself and all the APIs required to interact with the CI automation _appear_ to work just fine)


The actual server was broken for me:

  $ git push
  Enumerating objects: 26, done.
  Counting objects: 100% (26/26), done.
  Delta compression using up to 8 threads
  Compressing objects: 100% (15/15), done.
  Writing objects: 100% (15/15), 1.49 KiB | 1.49 MiB/s, done.
  Total 15 (delta 12), reused 0 (delta 0)
  remote: Resolving deltas: 100% (12/12), completed with 10 local objects.
  remote: Internal Server Error
  To git+ssh://github.com/<redacted>/<redacted>
   ! [remote failure]    wip -> wip (remote failed to report status)
  error: failed to push some refs to 'git+ssh://git@github.com/<redacted>/<redacted>'


Fortunately, the first I saw of the outage a few days ago was right around lunchtime.


The tragedy of the cloud.


git is distributed...


git is distributed. The GitHub issue and PR tracker isn't.


Yeah, but people are lazy and GitHub has a user-friendly UI with big green "Merge" buttons.


And CI integration, which runs automated tests, which must pass to release a package/Docker image/whatever, which is required for deploying a new version of your system, etc. The whole thing is as distributed as the business of a guy selling hot dogs on the street.


HN is the next place after Github that a programmer frequently visits ;)


If Github is down, the software developing world comes to a standstill. So people either get a coffee, take a shit or go to YCNews.


Or a combination of all three!


Almost nothing I work on has a dependency on GitHub. Whether work or personal. We have all dependencies vendored at work and my personal stuff is the same.

There really is no good reason to be so dependent on GitHub.


Are they migrating to azure?


I had the same guess... migrating from AWS to Azure and hitting some bumps. I have to assume they won't be very forthcoming about it if that is the reason.


> migrating from aws to azure & hitting some bumps

Could it be the IO? I remember colleagues working on getting stuff running on Azure, and they experienced horrible IO latency, as well as very low throughput for lots of small IO (i.e. Unix-style software).

That was a few years back so it might have improved since, but if those things are still non-optimal and GitHub is built with a Unix-style vision of tons of small IO accesses…

This specific outage did teach me that GitHub apparently stores git repos on disk, which I was not expecting (the API complained it could not delete repos until they'd been fully backed up to disk, or something).


I don't think they mainly run on AWS, but on "bare metal": https://github.com/holman/ama/issues/553


Preparing the infrastructure for Windows open-source event?


Fourth time this month.


It’s quite unfortunate. I’m hoping they post some kind of RCA/analysis.

I know I would if we had a month with only 2 9s of uptime.


they're doing the usual corporate status page uptime lying:

https://i.imgur.com/vy7onDT.png https://www.githubstatus.com/uptime


I don't get it. Their own incidents tab on the same site shows four incidents this month: https://www.githubstatus.com/history

I'm not sure how that results in 99.98% uptime on the other tab.


Turns out it's bad/misleading UI: there's a dropdown for which type of downtime, and it defaults to 'Git Operations'.


Ah, got it. It's still incorrect though. With the incident a couple days ago, I couldn't `git push` for a couple hours.


Yeah, this is just the poor UI of their provider, Atlassian Statuspage.


% uptime is a terrible metric. Being down for an hour in the middle of the day is only 0.14% downtime for the month, but is typically regarded as a big deal. If it happened every single workday you'd still be looking at 97% uptime! Sounds wonderful, right? Just work around it for an hour per day!

Number of outages and total duration are much better metrics.
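
For what it's worth, the arithmetic above checks out. A quick sketch, assuming a 30-day (720-hour) month and ~22 workdays:

  # one 1-hour outage per month vs. one per workday
  awk 'BEGIN { printf "one outage: %.2f%% downtime; daily outages: %.0f%% uptime\n", 100*1/720, 100 - 100*22/720 }'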


There should be enough pixels to show each minute of a day.


I hope they catch up on this. If you guarantee an SLA then you need to be honest about it, or guarantee less.


Looking again, the reporting under 'API Requests' seems accurate, but none of the earlier outages report downtime under 'Git Operations', which was also broken.

I didn't notice there was a selector on that page for which type of downtime, and that it doesn't default to all.


Check the dropdown. Select for example "Issues, PR, Projects". It shows 4 days of issues in Feb.


Time to get on Microsoft's paid plan ...


I've been noticing the issue they're describing for a few days now, with errors in GitHub Actions requiring rebuilds and webhooks not seeming to fire, which caused Jira to go out of sync.


Mssql only once guaranteed delivery in action?


Centralization is great, isn't it?


I think in a few years time we will look back and think

"Waa, that was so archaic the way we used to do things!"


It would be nice if someone wrote a decentralized VCS...


With issues built in like fossil, oh wait...


It gets philosophical as well - if you use a laptop as a lamp in your tent, is that good or bad?


I appreciate the way they keep everyone informed during the downtime. It says a lot about the company.



Well, given the timing of GitHub reliability issues over the last few days, I think we can all agree it has everything to do with dates and timezones?

/s

Appreciate the work.


That explains why my test suite suddenly takes >1h and after I canceled it, the status was green O.o


Looks like they are fine now. Yay!


> We continue to investigate the issues with GitHub services and will shift to a slower update cadence to provide more meaningful updates going forward. Posted 18 minutes ago. Feb 27, 2020 - 16:12 UTC


Statuspage still isn't green https://www.githubstatus.com/


I'm still unable to leave any code reviews, and occasionally I'm getting error pages.


What do you guys recommend as a good way to continue work undisrupted when GitHub goes down? A second remote mirror?


A second mirror doesn't really help - when GitHub goes down, the code is still available locally on your computer. The things that become truly unavailable when GitHub dies are all the non-git features: issues, PRs, etc...

There are several ways to work around this, but none are really satisfying.


Use self-hosted repos like gitlab or fossil, and then mirror the public parts to github.
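
If you do go the mirror route, a minimal sketch of keeping a second remote in sync, in either direction (the remote name and URL are placeholders):

  # add a second remote and push every branch and tag to it
  git remote add mirror git@<other-host>:<user>/<repo>.git
  git push --mirror mirror

Note that --mirror also propagates ref deletions; pushing branches and tags separately (git push mirror --all, then git push mirror --tags) is the gentler option.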


Git or fossil mesh.


>We continue to investigate the issues with GitHub services and will shift to a slower update cadence to provide more meaningful updates going forward.

Translation: our shitty software update practices are now affecting Github, not just Windows!

If anyone from Microsoft is reading this, why is your company so incompetent at software updates in the past few years?


I read it as "we'll stop posting silly updates to this incident until we actually know something", not "we'll stop rolling out updates to our services".

I'd wager that Microsoft practices had nothing to do with this, but I'll wait for the RCA.


You honestly believe that a company the size of GitHub has had their software update practices appreciably change since the acquisition? Relax, GitHub had update issues before and they will have them again.


All of this has happened before and all of this will happen again. So say we all.


Github is part of MS, but as I understand it, is run separately from the rest of Microsoft.


I think that's talking about status update cadence, not software updates.


Is it not ostensibly the same company, just with a new owner...?


Initially, but that is subject to change or more likely evolve over time.



