Can you imagine if Twitter and Google went down at the same time?
People would be reactivating their Facebook accounts and having to sift through conspiracy theory posts about Hillary Clinton still just to figure out what was going on.
Edit: The points on this post keep going up and down every time I check these comments. Yes, it was sarcasm, I was joking, but I was trying to point out that most people rely on a small set of services. "Cloud" has centralized things a lot.
Whenever I hear that some service is down, I immediately go to that service to confirm. Then I repeatedly hit reload if it doesn't work to see if it can come up. I guess many people do the same, and that may contribute to the problem...
Years ago I worked at a large online casual gaming company whose name ended in -ynga. Our web tier was split in two: one half served the static content required to load the HTML, Flash app, assets, etc. The other handled actual communication about actions taken in game.
Whenever we had any sort of issue we could generally get a good idea of what was happening by looking at changes in traffic in those two web tiers.
If people couldn't play for most reasons, game action traffic would drop to near zero, but the static asset tier traffic would usually at least triple.
So yeah, there are a lot of F5 buttons being hit out there when pages don't load.
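The two-tier signal described above can be turned into a crude automated check. A minimal sketch, with made-up function names, thresholds, and baselines purely for illustration:

```python
def diagnose(static_rps: float, game_rps: float,
             static_baseline: float, game_baseline: float) -> str:
    """Crude outage heuristic from the two-tier pattern described above:
    game traffic collapsing while static traffic spikes means players
    are gone but their F5 keys are not."""
    game_ratio = game_rps / game_baseline
    static_ratio = static_rps / static_baseline
    if game_ratio < 0.1 and static_ratio > 2.0:
        return "likely outage: reload storm on static tier"
    if game_ratio < 0.1 and static_ratio < 0.5:
        return "likely network problem: nobody reaching us at all"
    return "normal"
```

The two failure shapes are distinguishable: an app-level outage leaves the static tier reachable (so refreshes pile up there), while a network/CDN problem drops both tiers at once.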
I don't believe Gmail was ever fully down. For me, I was just having problems with attachments. I also noticed app icons in the play store failing to load.
I experienced issues with Drive last night but it was never fully down. I was trying to work on a 500GB file and the API would drop the link intermittently. I could parse the directory no problem, just couldn't reliably access the files.
Back in my tech support days I received an email from a customer "I am unable to send or receive emails" I replied "I am very sorry for your inconvenience, I have resolved the issue".
Customers in 1999 really couldn't believe no one had replied to their emails within a day or two.
When it all goes down at the same time you should be worried. Not because of the lack of Twitter or FB or Gmail, but for what it means if it's all down.
It really goes to show how different all of our feeds are based on who we are friends with (and, of those, who we interact with the most). Anecdotally, I have 3 people who will share any "conservative" attack meme they can get their hands on. At this point I don't even know if it counts as conservative so much as just outright attacking Democrats; sometimes it reads more like an attack for the sake of attacking than a statement of belief in something different. Kind of weird. Part of me wonders if some of these accounts do conservative attacks and others do liberal attacks just to get shares, with no political interest whatsoever: effectively acting as an arms dealer of the meme variety. Of my friends of a more liberal view, it is mostly policy things they share (pro-choice, anti-rape, etc.). There's one dude who is pretty anti-Trump, but he mostly stands out as an exception. Most of the posts I see referencing Trump directly (outside of periods where he did something Democrats felt was highly suspect) are more in support of him than anything.
Which is why I have to refrain from taking an "over the top much?" slant when people post the pro-Trump/victim of the left type memes as I don't see a ton of attacks on him directly, but then again, their feed could be totally different from mine so who knows?
I've got a few people in my feed who just dump on the right no matter what. I've fact checked them a little bit. Again it's about a 2-1 for me fact checking my right leaning and left leaning friends (typically older). But I keep all of them on there just to have an ear to the ground.
Yeah, I appreciate the various viewpoints, even though items shared in my feed from both conservative and liberal viewpoints have been wrong (almost consistently so).
I think the real bummer is when you present the actual video of what the person says, and the response is essentially “yeah? Well, they still suck” or something in that ballpark. I have zero issue with someone not liking another person’s views, but a lot of it is just outright libel.
I was out shoveling, and came back in to my phone blowing up. Our systems at IronMountain (formerly Fortrust) in Denver all rebooted at once. These are all on redundant power: each system's redundant power supplies connect to different circuits entering the cabinet, and those two circuits are fed from 3 PDUs (two separate, one shared). Each of those is supposed to be fed by a separate UPS and generator. The last status update I had says they are running off generators, but they've been shockingly tight-lipped about it.
Don't get me wrong, it was hi-LAR-ious to call into their NOC and have them pretend that I was the only one having problems. "Can you tell me if there is a major data center outage going on?" "We are trying to gather information, we are making a bunch of client phone calls, we will know after we make those calls." "... Why are you making a bunch of client calls if you aren't having an outage?"
They do run quarterly 'storms' where a datacenter is shut down to test failover and resiliency. I have no idea if today is one of those days, since I left last year.
Theoretically, a real shutdown might go differently than previous tests or simulations. For instance, in a test you might cut the connection completely, while in the real case only some power circuits go down, or whatever.
For instance GitHub's relatively recent shutdown was due to a fail-over heartbeat not going as expected.
Test failures are all well and good, but don't always match reality. In this case, the design of the power infrastructure was solid, and their plans include running monthly generator testing and quarterly "disconnect from the grid" testing. But apparently something about this failure of both of the incoming power lines caused failures in multiple UPSes. Still waiting on the after-action review.
it's only so quick because stuff isn't actually turned off with disks wiped. The machines are still running, with applications loaded, just with no traffic directed towards them.
Last time I dealt with a cloud provider outage the status page was unresponsive during the outage because the status page had some kind of dependency on the resources that were down...
Sure, I understand the "so let us do our job". I've been on the other side of that.
On the other hand, I need information to be able to do my job: Is this only our cabinet having problems and I need to start rolling to the datacenter (in the middle of a giant blizzard)? Is this possibly some sort of problem with our own power infrastructure? Is something on fire (an EPO triggered by fire could cause this)? Did the roof cave in under the weight of the snow we are getting? Is the power stabilized or is there some indication that power might be up and down?
In short, I need answers to: Do I need to gracefully take down my site to prevent lost transactions and database corruption? Do I need to switch to our backup site?
For context: All of our servers powering off at once and then back on shouldn't be possible. It should require the failure of at least 3 independent pieces of equipment (except at the breaker panel or in our cabinet where it could be only two failures). It is extremely unusual for this to happen, first time it's happened for me and I've been in that facility since 2004.
So, yes, I respect that you need to do your job. But I also need to do my job.
Plus, I'm pretty sure the guy answering the trouble line, his job WAS talking with the customers. The people working the problem likely didn't include him. This is a huge data center run by a ginormous company. I don't think I was taking him away from twisting a wrench. :-)
I wouldn't be surprised if they think a status page would open up liability for not putting it up soon enough, or for too long, or for some text that turned out to be wrong or unnecessary.
"The storm"? It's sunny in the Bay Area for the first time in I don't know how long. I imagine it's nice in other parts of the world as well, other than where this localized "storm" is.
Yes. Believe it or not, it's not really major news for those not impacted. We have the local evening news on most nights in the background, haven't heard a thing about it. I also regularly read NYT and WaPo and follow the Internets.
It's sad that such outlets don't bother to print about stuff that impacts middle America. They could regain some credibility with a lot of said middle Americans without ever changing their political alignment just by giving them a little more coverage.
Probably a combination of that and to curtail the "I just spoke to Brad in Customer Service who confirmed _the whole datacenter is offline_" type posts.
But that's my presumption, I don't actually know anything and don't want to imply I do.
It's easy to be cynical but it's optimistic expectation management.
It might already be resolved; it has to get worse before you escalate it further. They might not know the full facts. It might look worse than it really is. How do you know? You can't judge that just because your personal rendering of Facebook failed. You have load balancers and CDNs and A/B testers all getting in the way of delivering data to your machine.
It's too easy to draw a conclusion from the client-side armchair and the provider is absolutely not going to make false promises, for the worse or for the better.
You want to hope that Facebook, in this case, acts on more complete information.
That's the trust issue with current agreements we are solving.
If an API is down, the binding agreement is enforced instantly on our platform: no lies, no call, no pain.
We are actually onboarding companies to try it out!
https://stacktical.com
TLDR: Because Smart Contracts on the blockchain are the right tool for Secure Digital Agreements.
Paperweight contracts are irrelevant in a world of data
* A Smart Contract is cheaper to publish than the stack of paper handled by lawyers.
* Code is cheap to iterate on, whereas traditional SLAs are expensive/slow to renegotiate.
Over time, SLAs drive behaviors that are focused on delivering a minimum level of service at minimum cost to the provider.
* A Smart Contract is code you can trust, understand and expect to behave instantly, compared to traditional SLM.
so, are you saying we should replace social media platforms w/ decentralized sharing & aggregation driven by smart contracts? sounds intriguing but daunting
unscheduled outages are always painful and people will always call, I agree.
But instant compensation does a better job at damage control than a status page.
Keeping customers satisfied even in bad situations is key in a world of high availability expectations.
And with distributed, non-partisan metric sourcing about the availability of an API, it's no longer possible for a Service Provider to lie.
I don’t understand something: what kind of company is so down to the wire with cash flow that an outage requires income within seconds/minutes instead of weeks? Anyone with a financial runway so short that it can be described as “instantaneous” doesn’t sound like a customer you would want to be in business with.
The kind that will make a lot of noise as publicly as possible and create ample work for your support/admin people if you don't keep them happy...
> doesn’t sound like a customer you would want to be in business with
I could say that about most of the companies I have had the dubious pleasure of doing business with! Very few are pleasant when something goes awry even for a moment.
A company as large and sophisticated as FB has data centers and cloud services in multiple countries, and in the US, probably colocated data centers. Certainly nothing localized to where you are.
If the outage is at all infrastructure-related, the root cause was something that at some point was local and cascaded. Unless someone git pushed to a repo used by both companies and it's taken all day to get it git revert'd, their redundancy obviously didn't work, did it? There's effectively a category-2 hurricane moving from the Rockies through the mid-west right now.
Reminds me when I was contacting Deutsche Telekom last year regarding an outage in Monschau area. "We have no problems". In fact, the whole exchange was down, press got a sniff of it when people could not contact emergency services anymore: https://www.aachener-nachrichten.de/lokales/eifel/netzstoeru...
Are you saying that a cold-war-era system like the internet/arpanet meant to survive a nuclear war might be vulnerable to an attack if we take all the code and data and store it in the same place?
Once upon a time Multics' developers predicted that someday computing power would be treated like a public utility that homes and businesses would buy like electricity or water or natural gas.
Given the proliferation of minute/hourly billing among service providers, it looks like the Multics folks guessed right. It just happened on top of Unix(-like systems) instead of Multics.
I wonder how long it'll be before we start seeing municipal datacenters?
C++ is the most popular language at Facebook, I know that one for sure. They used to run PHP on HipHop VM which was written in C++, but now they transpile PHP to C++.
The transpiler is HipHop. They discontinued that in favor of HHVM, which does JIT compilation instead. More info: https://hhvm.com/
EDIT: apparently, though, HHVM stopped supporting PHP itself last month; now it only supports Hack. I'm not familiar enough with Hack to know how much it actually deviates from / improves upon PHP.
Facebook has to hire thousands of engineers per year. They may incorporate more Erlang into Facebook, but they have to have a core tech stack that can easily onboard engineers from a variety of backgrounds. I don't have the foggiest idea of whether Erlang can be part of that or not, but people talk about it as if it's a special-purpose tool.
When I interviewed for SRE at Google, they'd had a non-trivial cross product outage days before. Good conversation starter, but I couldn't get many details out of them.
I’ve seen many systems go down over the last few days worldwide. Aside from the possibility of a mega-DDoS attack (which Facebook denies), all of these organizations have fairly diverse tech stacks to my knowledge. Google’s issue (supposedly) had to do with their Blobstore API, we don’t know what happened with Facebook, and many other, smaller services have had issues as well, including three intranet services at my workplace.
This leaves me wondering what software all these places have in common. The application layers are all different, the databases are all different, the containerization and provisioning systems are different, but I imagine that all these systems rely on two things: the global Internet backbone, and maybe the Linux kernel.
Have there been major security vulnerabilities patched lately in the Linux kernel that could have had unintended consequences?
Both companies are massive and have tons of developers. It becomes almost impossible to look at the system as a whole with the amount of changes coming through. And you get scenarios where small failures cascade through the stack wreaking havoc. Oftentimes it's just one config change.
It's telling that one of the hottest areas of distributed systems research these days is the boring topic of configuration management. Google, Microsoft, etc. are paying researchers top dollar to figure out how to prevent massive outages through novel techniques. It is one of the harder problems to solve and requires massive investment in tooling, refactoring, etc.
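To make the problem concrete: even a toy config-rollout pipeline needs validation, a staged push, health checks, and automatic rollback. A minimal sketch under assumed interfaces (the `Fleet` object, the config keys, and the thresholds are all hypothetical):

```python
def validate(config: dict) -> list:
    """Reject obviously-bad values before they reach any machine."""
    errors = []
    if not (0 < config.get("timeout_ms", 0) <= 60_000):
        errors.append("timeout_ms out of range")
    if config.get("replicas", 0) < 2:
        errors.append("need at least 2 replicas")
    return errors

def staged_rollout(config, fleets, health_check):
    """Push to one fleet at a time; stop and roll back at the first
    sign of trouble instead of letting a bad config reach everything."""
    errs = validate(config)
    if errs:
        raise ValueError("rejected: %s" % errs)
    applied = []
    for fleet in fleets:
        fleet.apply(config)
        applied.append(fleet)
        if not health_check(fleet):
            for f in applied:  # roll back everything we touched
                f.rollback()
            raise RuntimeError("rollout halted at %r" % fleet)
    return applied
```

The hard research problems are exactly the parts this sketch hand-waves: knowing what "healthy" means fast enough, and rolling back when the bad config has broken the channel you would roll back through.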
You’re undeniably right about not looking at Facebook or Google as one whole system, but there have also been what seems like an unprecedented number of strange little outages (see the ones mentioned by https://news.ycombinator.com/item?id=19382418) that aren’t huge companies. My workplace had some of their own today that I haven’t heard an incident report about (it’s a pretty large company and I’m not in IT).
The best explanation is coincidence, I think. I have direct knowledge of two of the incidents in the past few weeks, and they have completely unrelated causes.
That’s certainly possible. We’re probably still too early to tell, but the innate conspiracy theorist slash pattern-matching part of my brain wants to find a probable connection.
FANGs all use white box hardware with "merchant silicon", meaning they buy the chipsets directly from Broadcom, Mellanox, etc. and build their own devices. However, they do all have Broadcom and Mellanox in common, and Cisco, Juniper, and Arista do too.
I run a messenger bot platform - the webhooks stopped being delivered _hours_ ago... nothing on their status page until it had been down for hours.
Their current issue...
"We are currently experiencing issues that may cause some API requests to take longer or fail unexpectedly. We are investigating the issue and working on a resolution."
I'm pretty sure businesses use status pages to divert attention from support resources, they never seem to give useful information about outages and half the time don't even mention the outage.
But for someone who runs a business that relies on Instagram for marketing, and pays for advertising on that platform, it's a bit scary when the whole thing is down. Obviously this was only temporary this morning, and sure, no charges while down, but doesn't do me much good...
It looks like something much larger is going on. If you look at the front page of https://downdetector.com/ you'll see most major sites/backbones are having issues (Verizon/ATT/Sprint/CenturyLink/TMobile/Comcast/Level3/etc).
I strongly suspect users are reporting "my Internet is having troubles" because their FB, Messenger, etc. isn't working right.
For example, in the comments of the T-Mobile outage page, there's stuff like "Haven't been able to upload anything to social media all day" and "Cannot send pictures through whatsapp and fb messenger".
That doesn't make any sense, given that the "traffic" tab's scale says "7% above normal".
The red are the areas with the most attacks, and as you'd expect, they correspond to large population centers. (It's also not very granular, and appears to largely correspond to "where does Akamai have a datacenter".)
So yesterday Google had a major (and out of character) outage across its apps, and today Facebook has a major (and also out of character) outage across its apps.
I can't wait to see the RCA for both of these and if they're related.
Private post-mortem:
The NSA middleware we are required to run (that took time to deploy to each of our social partners) is breaking something so let’s revert.
That's why it's called PRISM. It's exactly what you describe. Splitting an optical signal into 2 using, basically, a prism. One signal goes out to the net as normal, the other goes to their own datacenters, which they keep continually building and expanding. The newer ones are being built on military bases, for added security. Check em out. Look at the size and cost of them. Some are over a million sq. ft. That's a lot of data. They measure it in terms of yottabytes and zettabytes (in 2013, a lifetime ago in terms of storage space):
Nah, PRISM referred to the front door for lawful access to customer records under warrant. That's the sort of portal that China once hacked Gmail by gaining access to.. the companies explicitly built those access relays.
The beam splitter stuff (e.g. Room 641A) went by different codenames, TRAFFICTHIEF and TURMOIL iirc. That's the back door.
> The newer ones are being built on military bases, for added security.
IIRC, the NSA is organizationally part of the military, and it's currently headed by a military officer who gives congressional testimony in his uniform (https://www.youtube.com/watch?v=nMi241XLeQ8). It makes sense they'd build on military bases, it'd be kinda weird if they didn't.
Also, the majority of tech and data companies have closed this loophole by encrypting traffic between data centers. Nobody thought it was necessary to do it before over dark fiber before because, hey, who was listening? (answer: the NSA was)
One of my coworkers came from a large telecom. He mentioned they had to get technology from an Israeli firm that specializes in quantum cryptography on the fiber optic line to fend off the NSA and GCHQ, who are apparently worse than the NSA. (IIRC) the tech encrypts data streams on one side and checks whether the hash is the same on the other side; if something's off (evidence of tampering) it instantly changes the cipher.
And they have been using the USS Jimmy Carter sub with the front huge cable splice bay for decades to compromise all undersea cables.
>>>The New York Times reported in 2005 that the USS Jimmy Carter, a highly advanced submarine that was the only one of its class built, had a capability to tap undersea cables. An Institute of Electrical and Electronics Engineers report speculated that a 45-foot extension added to the Jimmy Carter provided this capability by allowing engineers to bring the cable up into a floodable chamber to install a tap. But it is unlikely that the USS Jimmy Carter routinely taps cables since U.S. intelligence agencies can much more easily (and lawfully) obtain cable data through taps at above-ground cable landing stations.
Optical tap for unauthorized access, but surely port mirrors for the national-security-letter stuff. Wouldn't make sense to go through the hassle of a tap install and the ongoing risk of it failing, versus using a capability available on almost all serious switching hardware to give you a guaranteed 1:1.
interesting that facebook's cavalrylogger is still being successfully injected despite there being nothing but a blank page
also interesting that cavalrylogger has a function that lets you bind key-presses to callbacks
even more interesting is that cavalrylogger seems to come prepackaged with any facebook like button! cheers for the keylogger, facebook
I don't think it's the NSA this time; for once they don't have to do deep packet inspection or install any MITM device, since they get the whole info in bulk. Maybe it's just a 400-pound hacker.
The heavy traffic is due to sports events - champions league last-16 matchday live streaming: Bayern Munich vs. Liverpool, FC Barcelona vs Olympique Lyon. The heatmap matches the clubs' home countries UK, Germany, France & Spain quite well.
Multicast was designed exactly for this - same data streamed to many endpoints at the same time. Too bad it's not being more widely used, the bandwidth savings would likely be huge.
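For the unfamiliar, the mechanics are simple: receivers join a group address and the network fans out a single datagram to all of them. A minimal sketch in Python (the group address and port here are arbitrary choices for a local experiment, not anything standard):

```python
import socket
import struct

GROUP = "224.1.1.1"  # any address in 224.0.0.0/4 works for local tests
PORT = 5007

def make_receiver(group, port):
    """Join a multicast group: the network delivers one copy of each
    datagram sent to the group, no matter how many receivers join."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", port))
    # ip_mreq struct: group address + local interface (INADDR_ANY = default)
    mreq = struct.pack("4s4s", socket.inet_aton(group),
                       socket.inet_aton("0.0.0.0"))
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, mreq)
    return sock

def make_sender():
    """The sender transmits ONE datagram to the group address; the
    network duplicates it, so upstream bandwidth stays flat regardless
    of audience size."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    # TTL 1 keeps the traffic on the local segment for this sketch
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, 1)
    return sock
```

A sender would call `sock.sendto(b"frame", (GROUP, PORT))` and every joined receiver gets a copy. The catch, and likely why it never took off for internet-scale streaming, is that inter-domain multicast is rarely enabled by ISPs, so CDNs fall back to one unicast stream per viewer.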
That chart doesn’t support your assertion. Akamai’s traffic and attack charts usually look like that, and the attack chart even says it’s currently low.
"This usually means we're making an improvement to the database your account is stored on. While this process won't affect your account, you temporarily won't be able to access the site." https://www.facebook.com/help/134401680031995
I guess that this is all that I will get. Facebook is never down, it is just making improvements (like restarting the services to make them work again).
They could be consolidating all of the DB infrastructure for their platforms.
A zero-downtime cutover would not be possible, as they would need to nearly double their DB infrastructure.
Short planned temporary outages of various features probably become long unplanned cross-platform outages.
They probably decided to not rollback the migration after the first outage.
What manner of failure would cause such globally deployed and distributed systems to go down like this? I'm very interested to read up on this when they release details of the failure.
I work for a smaller but comparably large platform. "If everything is down check the DB" is at the top of one of our internal monitoring websites in red.
Screw ups related to data loss are rare (I've been here years and haven't seen one with the DBs that the stuff I work with uses) but failures at this scale tend to cascade a little ways and it takes time to dig out of the hole. They probably have the problem solved but they have to spend a bunch of time synchronizing things and verifying the fix before they press the big red "go live" button.
We have a different dedicated page that gives an overview of what's going on with the DB. The page in question is supposed to be a single stop that lets you visually get an overview of the state of the application servers and whether things are "normal", and if not, allow you to quickly identify what is not normal.
I have no inside knowledge of this one, but broadly speaking, these sorts of failures can be caused by a change thought innocent at the time to the core software that is then widely deployed using automated systems. If the core's tests didn't catch a real issue in production (and for whatever reason, the rollout happens faster than the regular small-release verification process can catch the error), things can go sour in a way that's expensive to un-sour.
Amazon once pushed a seemingly-innocuous change to their internal DNS that caused all the routers between and within datacenters to drop their IP tables on the floor. They had to re-establish the entire network by hand---datacenter heads calling each other up and reading IP address ranges over the phone to be hand-entered into lookup tables. Cost a fortune in lost sales for the time the whole site was inaccessible.
As someone who works at a large company in the networking space, I can tell you that minor changes to configuration can cause catastrophic failures that are really challenging to come back from.
Network failures are usually really bad when your system is globally deployed and distributed -- often times you can't even communicate with your machines to deliver fixes :p
Increased Error Rates — Created by Gary Fitzpatrick · Facebook Team — Today at 10:32 AM
Current State: Investigating
Description: We are currently experiencing issues that may cause some API requests to take longer or fail unexpectedly. We are investigating the issue and working on a resolution.
Start Time:
2 hours ago
Last Update:
about an hour ago
Updates:
There are currently no updates for this issue.
Serious question: was any value lost? (this may appear sarcastic)
Facebook obviously loses some ad revenue and Facebook customers may lose sales. But do Facebook/Instagram users suffer? How does losing social media for several hours affect the quality of life of users?
I am not a big fan of social media either, but you would be surprised... For example, here in Sudan (East Africa) the country has been under continuous protests for over 2 months now (53 dead, 4k+ detained, 500+ injured), with strong censorship from the regime and silence from the international community. So Facebook, WhatsApp & Twitter are the only media left for the people to fight for freedom. Every Thursday holds the main protests of the week, and this Wednesday night's outage might affect that, as thousands around Sudan won't know about the meeting points for tomorrow!!!
Actually the government did block all social media for over a month, but that was fixable with a VPN. (Follow the hashtag #SudanUprising on Twitter to learn more)
I doubt Sudanese protestors are watching football and eating big macs on their off-time. Most likely limiting their protests to once a week 1. makes for a single, effective push, 2. keeps the protestors and their families from starving.
Lmao "no disrespect intended" "scheduling it between watching the football and eating big macs." pick one and keep in mind you're talking about Sudan and not the US...
As I said, the main protests are on Thursday, so no, Sudanese people don't protest just once a week. And btw, people get shot here just for standing out and peacefully protesting, so it's far away from the picture you have in mind. I've put a hashtag where you can see photos/videos & learn more, and of course share productivity tips.
What I asked was: what is the effect of sporadic interruptions of a few hours? I mean, if Facebook had 30% availability, would I lose anything valuable from the experience? Is it that we are just used to it and want it to be there always?
The value of 99.5% availability for __users__ is not clear to me. Instant messaging is an exception to this.
I know parents who keep in touch with their children via Messenger, in part because it works in more places: Messenger works wherever there's internet over wifi, not just where there's cell service. People rely on Facebook for non-trivial reasons, whether or not I (or someone else) think it's a good idea.
It might seem pithy, but my wife has a small internet based business and uses facebook as a login for one of the sites she sells on. So, today instead of being able to autofill labels directly for shipping, she had to hand type addresses in for all shipping labels for products sold on that site.
I reached out to an old acquaintance that could be a great help to my company. I reached out over Facebook. Now that contact can not respond and may have not even seen my message. I have no other way of contacting this person. This affects my business.
I hate Facebook, but to deny its value is pretty naive.
FB has effectively replaced all other text messaging for several of my social circles. It's nice when you have groups that kinda change over time, otherwise group-texts always end up with numbers you don't necessarily have in your phone, etc.
I consider my relationship with Messenger separate from FB. Most of my conversations happen there. I've deleted FB from my phone, but I don't think I could ever go without Messenger.
I used to be like that. One day I just sent the same message to all people I still contacted on messenger saying that I was getting rid of it in one week and listing 3 alternate services people could use to contact me. Didn't lose a single contact and never looked back.
For hours, not especially - it's annoying but no worse than a power cut. There could even be benefits.
On the other hand, if someone were to sabotage the platform and prove/convincingly argue that they induced the failure, at minimum it would do significant damage to the tech sector and at maximum cause public panic.
This is a hypothetical, not speculation on the cause of this outage.
Obviously this could be argued differently from a shareholder perspective, but I would say otherwise no. Interestingly, this might be one of the only times where a large outage could be claimed to be adding value. Again, not for FB, but for users, sure.
After a few hours of not being able to use an app people might start realizing how addicted they are to it. "I was bored initially but then realized talking to people in real life still works."
I've also seen issues uploading images to Whatsapp in the past half hour. I wonder if there's anything to do with the Google Cloud Storage outage that took down Gmail yesterday?
The only things I can think of that would cause this scale of outage are either a Tier 1 datacenter outage or (conspiracy hat on) a major hack with everyone rush patching.
Would be interesting to read the post mortem if there is any regardless
If rush patching were going on, we’d likely see some hints in commit messages of open source projects, like the Linux kernel commits that were tipoffs to Meltdown and Spectre.
Edit: Has anyone seen anything of this sort in any of the projects they follow?
No, but everyone is 99% sure it was related to killing Google+, which was announced not long before. Everyone who used YouTube way back when knows they had to make a Google+ identity at one point to link them. HMMMMM....
^^^ I noticed this too. GCP is under all 50 shades of outages since past few days. Feel I might need to rush back to my house and start digging a bunker
The Bay Area Peninsula has been having strong winds and heavy rains for the past few months. The last 3 days, there have been major power outages across the area. Redwood City had power outages 2 days in a row, and Pacifica lost power to a good chunk of the city for like 7 hours last night. It wouldn't surprise me if all these major tech outages that have been happening this week are related to poor Bay Area infrastructure.
Instagram seems to load the feed here fine (EU), but doesn't allow you to log in from any device or post anything new. FB is totally fine for reading if you are logged in, but you also can't log in if logged out.
Barely - BGP is that complex, I'm afraid, but it's the wild wild west when it comes to potential nation-state attacks on internet traffic. I wonder if these patterns are typical or worrisome.
Coincidentally, just watched The Social Network, the plot of which includes that quote by Mark:
> Let me tell you the difference between Facebook and everybody else. We don't crash ever! If the servers are down for even a day, our entire reputation is irreversibly destroyed. <…>
> Even a few people leaving would reverberate through the entire user base. The users are interconnected. That is the whole point. College kids are online because their friends are online, and if one domino goes, the other dominos go.
Can’t argue with that, however the fact that Facebook used to go down before would not preclude that each time was (and maybe still is) seen as a major incident within the original corporate culture.
Doesn't their world-class team make such a long outage quite unlikely? How hard would it be to devote ample resources to a cover story for the "incident report"? Is the timing relative to the plethora of indictments relevant at all? Is it reasonable that this may be related to shredding of data and/or code, or even cooperation to turn over data to the government in a secret deal?
Why? When Facebook is down, you cannot use Facebook to communicate about the issue. That is also the reason why FB still uses* IRC instead of Messenger for coordinating the resolution of such issues.
* Or at least it did a year ago, when I was working there.
Hmm, I wonder if we could leverage that to make IRC more popular again. "FB uses X" is often all the justification small startups need for picking a tech.
Especially if they can't sign in via OAuth. To an average user who signs into Spotify with their Facebook account, "I can't sign into Spotify" means Spotify is down, not Facebook.
> The team at Jefferies remains reasonably positive, and in the firm's top growth stock calls for the week we found four tech stocks that are offering more aggressive accounts good entry points.
I got my GitHub two-factor auth SMS two hours late. Fortunately it was just my old laptop. I wonder if it was related. Good reminder to set up an authenticator app on my new phone so I don't have to rely on SMS!
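Worth noting there's no magic in those authenticator apps: TOTP (RFC 6238) is just HMAC-SHA1 over a 30-second time counter, small enough to sketch with the Python stdlib. This is an illustration of why the codes work offline (unlike SMS), not something to use instead of a real app:

```python
import base64
import hmac
import struct
import time

def totp(secret_b32, for_time=None, step=30, digits=6):
    """RFC 6238 TOTP: HMAC-SHA1 over the number of `step`-second intervals
    since the Unix epoch, dynamically truncated to `digits` decimal digits."""
    key = base64.b32decode(secret_b32.upper())
    counter = int((time.time() if for_time is None else for_time) // step)
    digest = hmac.new(key, struct.pack(">Q", counter), "sha1").digest()
    offset = digest[-1] & 0x0F  # dynamic truncation offset (RFC 4226)
    code = (struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF) % (10 ** digits)
    return str(code).zfill(digits)
```

Since the only shared state is the secret and the clock, the code generation needs no network at all, which is exactly the property a delayed SMS lacks.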
SMS is still an option if you have a working 2FA Authenticator on GitHub. And even if I went through the trouble to disable it, I disagree. There are conceivable ways people could get to my email to initiate a password reset without getting to my phone, such as snatching my laptop from me while I'm working at a coffee shop.
Whatever happens right now at Facebook is less important than the fact that they will never say what affected them. Of course nobody would say "hey, we're down right now due to a 0-day / a mistake", but...
We're up to around 7 hours of partial outage now. I have spoken to FB employees in the past; every hour they are even partially down, they are losing millions and millions in revenue.
My first thought when I heard the news was BGP hijacking (setting aside whether accidental or deliberate). Don't the symptoms fit other known cases, like the Telegram incident in Iran last year, just at a larger scale?
Admittedly networking is not my strength, so perfectly happy for someone to shoot down this hypothesis.
I haven't been able to post anything on Facebook, neither a new post to my wall nor a comment on a friend's post, since mid-morning US/Eastern, and this is still the case. In addition, I can't log in to the site; I can only access it where I'm already logged in.
You're already logged in, so it's just going to show you old content. I'd be surprised if you're able to post anything, or if any "like" you give during the outage is saved. The same thing is happening with both the Instagram and FB apps on Android for me.
This is the first time I've experienced this. Also note that current sessions on messenger.com still work; we can still send/receive messages, but can't upload any image or send stickers. Looking forward to a post-mortem analysis of this.
My hunch is that it's the end of Q1 and people are trying to release code changes so they can pad their Q1 performance reviews: "designed and delivered feature X on time in Q1".
I've seen all quarter ends being targets for releases. And things that have been delayed since the start of the year are usually pushed to the end of Q1.
Perhaps relevant: npm has been having issues too, though they only recently caught and fixed them. Scoped private npm packages were getting Cloudflare 503 errors.
Since it went down for PC and not mobile, I wondered whether it was some kind of audience test, part of a move to an app-only platform.
Meanwhile in Russia they are talking about disconnecting their network from the rest of the world.
Some test gone south?... Maybe...
Someone has a traceroute handy?
My fiancée's uncle sent something today saying that because of a school shooting in Brazil, they were blocking all images and video shared to social networks like "WhatsApp, Instagram, Facebook and other social networks". I haven't been able to verify this myself or from any other sources, but I wonder if people are misinterpreting the FB outage, or if Brazil blocking content is having weird ripple effects.