Even though the status page shows all green checkmarks, we are still experiencing issues. The web interface is showing stale data and we can't perform merges in git.
Looks like some of their systems got out of sync and they aren't done resynchronising yet.
Update: After nothing happened, I rewrote the last commit and force pushed to trigger an update. That seemed to do the trick.
I have no inside knowledge, but from the outside it looks like whatever outage they had broke the propagation of commit data from the git repo to their MySQL database. Maybe they use webhooks for their internal systems as well? That would explain why they first saw issues with webhooks, and later issues with pull requests that might depend on them.
It also looks like after they fixed the issue, they didn't replay the failed notifications. That's why I saw stale data even after they apparently fixed the issue. Then my push to the repo seemed to trigger an update, and now it's back in sync.
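To make the "replay the failed notifications" idea concrete, here is a minimal sketch of the pattern the comment above speculates was missing: failed deliveries are kept rather than dropped, and replayed once the downstream system recovers. This is not GitHub's actual design; `Notifier`, `update_db` and the event shape are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Notifier:
    """Hypothetical event propagator: delivers commit events to a downstream
    consumer (e.g. a database updater) and keeps failed ones for replay."""
    deliver: Callable[[dict], None]
    failed: List[dict] = field(default_factory=list)

    def publish(self, event: dict) -> bool:
        try:
            self.deliver(event)
            return True
        except Exception:
            # Keep the event instead of dropping it, so it can be replayed later.
            self.failed.append(event)
            return False

    def replay_failed(self) -> int:
        """Retry every stored event once, in original order; return successes."""
        pending, self.failed = self.failed, []
        return sum(1 for event in pending if self.publish(event))


# Usage: simulate an outage in the downstream store, then recovery.
db = {}
outage = True


def update_db(event: dict) -> None:
    # Hypothetical downstream consumer standing in for the database writer.
    if outage:
        raise ConnectionError("downstream unavailable")
    db[event["repo"]] = event["sha"]


notifier = Notifier(update_db)
first = notifier.publish({"repo": "app", "sha": "abc123"})  # fails, kept in notifier.failed
outage = False
replayed = notifier.replay_failed()  # delivers the stored event
```

Without `replay_failed`, the database would stay stale until some new event (like a fresh push) happened to overwrite it, which matches the behaviour described above.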
I'm curious to hear what really happened, but I doubt this incident is important enough to anyone to warrant a detailed blog post.
It took a community drumbeat and persistence from an enterprise customer to get the status message to even show a problem last month. [1]
It does suck, and I do think there must be some political infighting going on, given that the service is having so many disruptions.
There’s no excuse for something this important to not only have so much unplanned downtime, but no resources to connect with the community by offering post mortems or other reasonable interactions.
That said, I’m still all in on GA. It’s amazing and the coupling with repos is great. It continues to be subtly refined. So I just hope whoever is holding this product back gets out of the way.
All I know is that it doesn't seem like a wise choice to be locked into GitHub features or even use their tools with these frequent downtime episodes.
If a large open-source organisation were to rely on, say, GitHub Actions, you'll probably see more and more "GitHub down" posts: they'll be unable to push that critical patch or run their cloud CI on GitHub, and some may be considering solutions like this [0].
Every time this happens, you'll be completely locked in and end up contacting or complaining to the CEO of GitHub on Twitter for support.
Good deal for us, actually. I used to get upset about this, but the alternative really does suck more.
You can either deal with the occasional non-productivity from a SaaS offering (which for GH has never lasted more than ~half a work day), or you can spin up your entire stack on-prem and generate 10 additional full-time problems in the quest to solve this one periodic issue.
The trick is to never put yourself in a position where your tools absolutely must work immediately or you lose a customer. Why make a promise on delivering a piece of software until it's already in hand? Also, if GitHub goes down and I really wanted to get an issue comment in, I can just open a text editor and keep a note around in my local repo until everything is back up. I can even do some crazy things, like share my local branches with other developers over side channels until things get back to normal in the centralized system.
Wasn't this kinda the entire point of talking developers into moving to the git model? Would be fun to rewind the clock and use these takes as an argument for sticking with TFS, et al.
I agree with you. To me it just sounds like people being entitled, expecting a service to never go down, ever. Shit happens and things go down. Design your processes around that fact and have procedures lined up for when it does happen.
However, saying "just get GitLab and deploy it to your own server" glosses over the huge time sink it is, especially for small companies that are already short-staffed, to maintain something like that. I sure as heck do not want to be responsible for keeping my GitLab server up.
As you say, if you're writing an issue, put it in an issues.md file in your editor or something. If you're working on code, even better, just commit locally.
Hosting Gitlab hasn’t been a huge time sink for me and I’m a one-person show who is conscious about my time. I set up automatic updates and haven’t SSH’d into that machine in about a year.
I agree that in many scenarios you will find that your approach is perfectly valid.
For us, we have ~8 people that need to use the system all at the same time. We utilize issues very heavily (we are entering 5 figures), with lots of data-heavy QA content throughout (screenshots/videos/binaries/etc). Additionally, our customer environments are actually configured to talk directly to our GitHub repository for purposes of rebuilding themselves from source at update time.
Because of the number of participants who are involved with our particular usage of GitHub, we find a hosted solution with horizontal scalability and resilience to be an excellent fit. We have made the decision to make it Microsoft's problem to figure out how to eventually deal with 10k+ issues and 200+ employees/clients trying to hit the same host all at the same time.
If we had decided to host our own GitHub/GitLab server in our cloud environment, we would have to constantly review the capacity of the IT systems. As we add employees and customers, the load we put on our source control solution will increase linearly. Additionally, because of the deploy-time approach, having a solution that is backed by someone else's network means that we don't have to worry about our private network being slammed by outside requests. Our total checkout is nearing a gigabyte, so you can see how this might scale poorly if we operated out of our own infrastructure.
I almost feel like we are abusing Microsoft's generosity, considering the sheer amount of content we have throughout our organization's account. Every day I wonder when I am going to get some email demanding that we switch to a more expensive enterprise plan because of how we use the service. Maybe that day will never come. Even if it does, I will gladly shell out for the bigger contract.
I suppose it is different for you because you are all in private repositories but... have you seen github repos for very popular open source projects? Angular (the framework) alone has nearly 20k issues, the cli repo another 10k. React another 10k, golang 41k (35k closed, 6k open!)
I have to imagine that you're not exactly a small fish, but also not making them sweat too much either.
There's not much special about GitHub Actions. All the data needed to reproduce the actions without GitHub is there. I'd be surprised if someone hasn't already made a "run your GitHub Actions outside GitHub" system somewhere.
If you're working on a serious project, hosting it mainly on GitHub via Git and don't already have a backup solution in place, I'm afraid you're late. But better late than never! Make sure you can always deploy when less reliable services are down, and GitHub has always been one of those. Git makes it incredibly easy as well, as long as you have your CI/CD externalized already.
I think if revenue or product quality is tied to a VCS, having an active-active or active-passive setup is the way to go.
Fortunately, I'm on an on-prem product so that investment hasn't seemed worth it yet.
This doesn't mean we don't escrow our code, but rather than try to rebuild from source, I just take a short coffee break and wait for the impacted service to come back up :)
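One way to read "make sure you can always deploy when less reliable services are down" together with the active-passive suggestion is a deploy step that tries git remotes in priority order and falls back to a mirror. The sketch below assumes you already push to more than one remote; the remote names are hypothetical, and treating a successful `git ls-remote --heads` as "healthy" is just one possible check (it needs git installed).

```python
import subprocess
from typing import Callable, Optional, Sequence


def git_remote_reachable(remote: str, timeout: float = 10.0) -> bool:
    """Treat a remote as healthy if `git ls-remote --heads` succeeds in time."""
    try:
        result = subprocess.run(
            ["git", "ls-remote", "--heads", remote],
            capture_output=True,
            timeout=timeout,
        )
    except (OSError, subprocess.TimeoutExpired):
        # git missing, or the remote hung: count it as unavailable.
        return False
    return result.returncode == 0


def pick_remote(
    remotes: Sequence[str],
    is_healthy: Callable[[str], bool] = git_remote_reachable,
) -> Optional[str]:
    """Active-passive choice: first healthy remote in priority order, else None."""
    for remote in remotes:
        if is_healthy(remote):
            return remote
    return None
```

With hypothetical remotes `origin` (GitHub) and `mirror` (a self-hosted copy), `pick_remote(["origin", "mirror"])` returns `"mirror"` while GitHub is unreachable, and the deploy step can fetch from whichever remote it gets back.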
I'm consulting for a company that uses Azure DevOps and I cannot believe how much harder it is to use than Github Enterprise for getting things done. The documentation is also strictly worse, and localization is just not there at all.
-edit-
I assume someone who works on Azure DevOps might be looking, so a few small specific things so you don't think I'm just a hater.
- It is hard to use markdown consistently, and it is particularly painful when doing any project management work on DevOps (which I assume is one of its strong points)
- The lack of a Japanese menu really sucks for something that is supposedly aimed at enterprises. Having to explain both native and English language vocabulary terms is double plus ungood (again, in an enterprise product put out by a company that has usually done excellent business in Japan)
- It's baffling that I can't make changes to the template when you give me a template option. I assume it is a permissions issue, but really?
I personally know multiple senior ex-githubbers who quit over the ICE contracts; it seems plausible to me that they’ve lost key expertise and can’t safely integrate with their older systems.
Based on these reports, most of their recent issues have been them hitting scale limits for their MySQL configurations and not having sufficient monitoring.
I recently left a project using AWS for one using Azure. I thought the AWS APIs were inconsistent and janky, but they look great compared to Azure. Azure is also extremely slow to perform actions in my experience, and the documentation is very heavily tilted towards being sales funnels. I do like the Key Vault service and the idea of resource groups. The whole tenant / subscription / roles / user mess of permissions not so much, but I expected that from Microsoft.
The AWS API is just fragmented. Too many teams that probably didn't communicate very closely. That being said, you can sort of follow the logic - or the multiple logics.
But that's really the only negative.
I'm trying to find a single term to describe all of Azure, and I'm having difficulty with it. Sophomoric? Like a place where the leaders are a bunch of B-class players, who lead all sorts of C-, D- and all the way down to Z-class characters.
And hopefully you'll never need support with Azure, especially urgent support, because it's atrocious.
What specific issues are you having? What do you mean by "B-class players, who lead all sorts of C-, D- and all the way down to Z-class characters"? I assume in this ranking you are an A+ player yourself, which is very impressive.
This same pattern happened last week - Webhooks reporting issues, then Pull Requests, etc. This one seems to be affecting the rest of their systems a bit more (or maybe they're just being a bit more detailed in exactly what is down this time).
I believe this incident has been happening for more than 30 minutes. I've had problems with Gradle pipelines failing without reason for at least the last 6 hours.
A few years ago GH actually had a fairly transparent status page that displayed error rates across components over narrow timeslices. I guess they canned it not too long after their $100M a16z raise.
As others have noted, GitHub Search is not particularly reliable. It might make you think you're finding all uses of some token, but that can be misleading as not all instances are necessarily shown:
I'm sorry code search isn't working. I'm more than happy to help. If you reach out to me via email, I can see what's up: neovintage [at] github [dot] com.
GitHub search is not very smart, but I prefer that when searching code. When searching code, I'm usually trying to find exact tokens i.e. a variable name or an error message string vs. searching documents.
That requires you to clone. It's a minor hassle if you search over a single repo, but when searching across an organization or the whole site, using grep is not an option.
> That requires you to clone. It's a minor hassle if you search over a single repo, but when searching across an organization or the whole site, using grep is not an option.
Are you kidding? The finer granularity you need when searching over an organization or the whole site makes grep the far better choice, especially since its output can be fed as input to further filtering steps.
I can't understand your point. GitHub can search over finer granularities e.g. single repo. And did you understand my use case at all, about grep not being an option when searching at wider scales?
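For the clone-first workflow being argued over here, a grep-like search across many local clones takes only a few lines. This is a minimal sketch, assuming the repositories are already cloned under one common root directory and that the token is a literal substring (not a regex); `search_clones` is a hypothetical helper, not part of any tool. It yields plain tuples so the results can be fed into further filtering steps.

```python
import os
from typing import Iterator, Tuple


def search_clones(root: str, token: str) -> Iterator[Tuple[str, int, str]]:
    """Yield (path, line_number, line) for every line containing `token`
    in text files under `root`, skipping .git directories."""
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune .git in place so os.walk never descends into it; sort for stable output.
        dirnames[:] = sorted(d for d in dirnames if d != ".git")
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            try:
                with open(path, encoding="utf-8") as fh:
                    for lineno, line in enumerate(fh, start=1):
                        if token in line:
                            yield path, lineno, line.rstrip("\n")
            except (UnicodeDecodeError, OSError):
                continue  # skip binary or unreadable files
```

The trade-off raised above still applies: this only sees what you have cloned, which is fine for a handful of repos but not for a site-wide search.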
Git was designed for decentralized use. It's not as slick and polished as GitHub, but it can get the job done and scales to infinity (the Linux kernel has thousands of contributors and runs entirely decentralized with git).
For write access it's more difficult without running your own server (which is super easy with Gitea, Gogs, etc. and just a couple of clicks to set up on popular hosts like DigitalOcean). You could take an entirely decentralized approach and run things like the Linux kernel: all patches (aka pull requests) get sent to an email list where they're reviewed, discussed, and integrated by the maintainer of the read-only repo.
Play stupid games with centralized entities, win stupid centralized prizes. I can understand using github/lab/etc if you're forced to by work in order to earn money in order to live. But willingly choosing them for personal projects is just stupid.
So was, "Facebook is a bad idea." in 2008. But the course of a proprietary social network run by a single corporation only has one outcome. It's just a matter of time.
Receiving crumbs of info as to WHY a service is down is interesting to some. These are also great for tracking postmortems (unless they take a while, then it's a separate post).
Speculation is entertaining to many as well. Or perhaps this sparks an idea for someone (omg, GH is down ALL THE TIME, time to build a novel competitor!).
And given the crap state status pages are in these days (stop showing green when your site is down!), these are great for knowing when a service is operating again.
I thought this was obvious, but apparently it isn't - not 100% of things posted on this forum (or any forum, or the vast majority of human interactions) are driven purely by some abstract "intellectual curiosity" concept.
Probably something about centralization and relying on a company for keeping your business running. But, of course GitHub Enterprise Server deployments aren't down, so anyone that really needs uptime can pay for the privilege (or can pay for GH One which has a "30-minute SLA").
On a related note, I have been unable to transfer a repo for two weeks now. The sender is able to initiate it but no notification ever appears on my end. If anyone knows how to get the attention of someone at GitHub please let me know. I wish GitHub had a paid support option that was reasonably priced for a single inquiry.
The main web backend is still Ruby, it seems: https://github.blog/2020-08-25-upgrading-github-to-ruby-2-7/ I can't remember where it was exactly, but I remember recently stumbling on a random GitHub-related OSS project that had chatter in the issues from GitHubbers talking about .NET and C#. It would not surprise me to see significant pressure on the engineering org, now two years post-acquisition, to get in line with the rest of MS's online services, which are all on the .NET stack.
Microsoft does not have a consistent internal stack, nor is anyone "encouraged" to use .NET. I don't know where this idea that Microsoft's online services all use .NET comes from; for example, my team, which delivers a large service as part of a top-level Azure offering, is mostly all Go.
> I don't know where this idea that Microsoft's online services all use .NET
My only knowledge of this happening is when Microsoft took over Hotmail and replaced FreeBSD with Windows. Like the other comments, I assumed it would be political suicide to continue to support a non-Microsoft stack.
Ehh, I worked there on online services before Azure even existed and it would have been career suicide and an enormous legal fight to use anything outside MS's stack--IIS, MSSQL, COM & C++, then .NET. Hotmail got away with it for some years because they were an acquisition.
Why pick up a new gun you’re not used to, when you have a perfectly working footgun you know inside and out—even if it only allows you to shoot yourself in the foot?
In other news, folks who are self-hosting GitLab are in great shape. And once the GitHub issues are sorted we can make sure to push to those public end-points of convenience.
Self-hosting critical services (email, chat, git, etc.) is not a terrible idea. Of course, weigh the cost-benefit and risk factors for your team.
So when your self hosted instance goes down and you are working on it, do your coworkers post message saying “in other news, GitHub.com is in great shape”?