I think this is another good example of how we as an industry are still unable to assess risk properly.
I'm fairly certain that the higher-ups in Twitter weren't told "We have pretty good failover protection, but there is a small risk of catastrophic failure where everything will go completely down." Whoever was in charge of disaster recovery obviously didn't really understand the risk.
Just like the recent outages of Heroku and EC2, and just like the financial crisis of 2008, which was laughably called a "16-sigma event", it seems clear that the actual assessment of risk is pretty poor. The way that Heroku failed, where invalid data in a stream caused failure, and the way that EC2 failed, where a single misconfigured device caused widespread failure, just show that the entire area of risk management is still in its infancy. My employer went down globally for an entire day because of an electrical grid problem, and the diesel generators didn't fail over properly, because of a misconfiguration.
You would think that after decades there would be better analysis and higher-quality "best practices", but the field still appears to be rather immature at this stage. Is this because the assessment of risk at a company is left to people who don't understand risk? And is there an opportunity for "consultants" who do understand it, kind of like security consultants?
> Whoever was in charge of disaster recovery obviously didn't really understand the risk.
That's not necessarily true. People don't die when twitter is down, and whatever twitter's business model actually is, I am not even sure there is a monetary penalty to them being down (unlike, say, Amazon being down which results in lost orders). They may have made the calculation that it was not cost effective engineering-wise to chase that extra 0.001% of reliability.
[Edit: Pedantry shield: Ok, ok, should have said people don't die because twitter is down. Obviously people are dying all the time, and some will indeed expire while twitter is down].
> [Edit: Pedantry shield: Ok, ok, should have said people don't die because twitter is down. Obviously people are dying all the time, and some will indeed expire while twitter is down].
And here I was hoping we could just take down Twitter and live forever...
This is going to lead to a loss of confidence amongst many - investors, advertisers, partners - who thought that Twitter was finally past its initial scaling/teething issues. It's not like there is anyone who hasn't heard of Twitter at this point. So yes, yes there can be bad publicity once something reaches the scale of Twitter.
An e-commerce site being down does not lose exactly as many orders as it would normally take in during that time. Most people will try again later, with the possible exception of first-time buyers and the likely exception of impatient commodity buyers with accounts at alternatives.
I would argue that people can die when Twitter goes down. In Mexico citizens are using Twitter to alert each other when Drug War violence erupts: readwriteweb.com/archives/shouting_fire_in_a_crowded_hashtag_narcocensorship.php
Maybe they need to tackle the underlying drug problem, rather than have Twitter fix its downtime problem? The world revolved before Twitter, and it will continue to revolve long afterwards.
I don't know the internal workings of Twitter, but if it isn't significantly easier to improve its downtime than to fix Mexico's drug violence, I'd argue they've made some non-trivial engineering mistakes somewhere.
"I think this is another good example of how we as an industry are still unable to adequately assess risk properly."
It is likely that what you mean by "properly" is impossible. At large enough scales, what you end up with is a Gaussian distribution of errors in accordance with the Central Limit Theorem... except that there's a Black Swan spike in the low-probability, high-consequence events, and you basically can't spend enough money to ever get rid of them. Ever. Even if you try, you just end up piling equipment and people and procedures which will, themselves, create the black swan when they fail.
I think you're trying to imply that if only they'd understood better, this could absolutely have been prevented. No. Some specific action would probably have been able to avert this but you simply don't have a 100% chance of calling those actions in advance, no matter how good you are.
The state space of these systems is incomprehensibly enormous and there is no feasible way in which you can get all the failures out of it, neither in theory nor in practice.
Living in terror of the absolute certainty of eventual failure is left as an exercise for the reader.
I didn't imply at all that all failures can be prevented. I'm saying that most people's assessments of risk are usually wrong. And the occurrence of a single point of failure that can take down an entire system deemed low-risk seems to happen an awful lot.
It occurs not only in the technology industry, but also in things like financial risk analysis. For example, people could mitigate the risk of a bond defaulting by buying a credit default swap. However, most people failed to assess the risk of their counterparty going belly up, like AIG or Lehman. This failure in risk assessment is in large part why the financial crisis was so widespread.
Another striking example is the failure of Fukushima I and II. Basically what happened was that they had a power failure (of the external line)! They thought that was somehow too unlikely to account for, which I really have trouble understanding. Isn't it obvious that multiple systems can fail because of some unaccounted-for external event? One that affects both on-site and external power? And in the Japanese case, the trigger was not even something you'd need much imagination for: an earthquake. Japan sits directly on one of the biggest geological faults on earth, and they didn't account for a simultaneous power outage!
So, yes, many if not most people are very poor risk assessors.
Then again, this might rather be a capacity (or the lack thereof) of the organization within which the risk is assessed. This is what the software engineering quip "Most problems are people problems" means, I think. In some environments it is hard to bring up the unlikely, catastrophic scenarios without being seen as overly pessimistic and somehow not subscribed enough to the success of the undertaking as a whole. So you assess risk overly optimistically in order to further your career, not accurately.
Pedantic footnote: the Fukushima Daiichi plant failure wasn't just a power failure: they had multiple backup diesel generators and batteries, and the diesels kicked in after the earthquake hit, the reactors tripped, and they lost the grid connection. The problem was that all the diesel generators and fuel were at ground level, and the sea wall wasn't high enough to keep the tsunami from flooding them a few minutes later.
If they'd had a couple of gennies on the rooftops, they ... well, they wouldn't have been fine but they'd have had a fighting chance to keep the scrammed reactors from melting down. Or if they'd had a higher sea wall (like Onagawa) they'd have been fine.
So: not one, not two, but four power systems failed in order to result in the meltdowns -- and one of them would have worked if they had been located slightly differently.
(Otherwise, your point about people being poor risk assessors is spot-on. And worse: even if some people are acutely conscious of risk, once decision-making responsibility devolves to a committee, the risk-aware folks may be overruled by those who Just Don't See The Problem.)
Yes, I know the full scenario was a bit more involved.
My main beef with their failure handling is actually this: you need to be able to face a situation where _all_ your smart emergency systems fail. In the case of an NPP this can mean an almost global environmental crisis, the need to relocate millions of people, hundreds of square kilometers made uninhabitable, etc. In that case you don't really want to rely on five generators on some roof, which may or may not work on that day.
And this is not something I am making up here and now. I remember discussing nuclear safety in high school, and the bottom line was: NPPs are OK, since they become subcritical when _everything_ fails, because the control rods slide down into the reactor vessel.
But after Fukushima I read that the situation there, with that specific reactor model, is unfortunately different. Tough luck. That model still needs some cooling, because even a fully shut-down reactor still produces about 1% of its total power as decay heat, and that is enough to bring the reactor into an 'undefined' state, IIRC. And it is easy to imagine what that means for a station that has just been struck by an earthquake anyway.
My whole point is: your comment makes it appear as if the safety layers were actually plentiful, and I would (respectfully, of course) disagree with that. I think they were poor.
That point is important: if NPPs aren't built safely, then what is? My guess is: nothing.
So what to do? Design for failure. (Politically, technically, economically, can be applied everywhere.)
> So what to do? Design for failure. (Politically, technically, economically, can be applied everywhere.)
Yup.
On a similar note, the French response to Fukushima Daiichi is rather interesting (France relies on nuclear generation for over 80% of its electricity):
"The ASN has also come up with an elegant technical solution to get around the (universal) dilemma of how to protect a plant from external threats, such as natural disasters. The report recommends that all reactors, irrespective of their perceived vulnerability, should add a 'hard core' layer of safety systems, with control rooms, generators and pumps housed in bunkers able to withstand physical threats far beyond those that the plants themselves are designed to resist."
(And a mobile emergency force that can move in and stabilize a reactor after an unforeseen catastrophic disaster that kills everyone on-site and destroys most of the safety systems.)
In other words, they now expect unpredictable Bad Things to happen and are trying to build a flexible framework for dealing with them, rather than simply relying on procedures for addressing the known problems.
": you need to be able to face a situation where _all_ you smart emergency systems fail. "
Totally agree with you here - until recently, no nuclear power plant was designed such that it could survive the failure of all of its emergency systems. Hopefully, given the negative repercussions of Fukushima on the industry, engineers are rethinking their approach to nuclear power.
> I'm fairly certain that the higher-ups in Twitter weren't told "We have pretty good failover protection, but there is a small risk of catastrophic failure where everything will go completely down." Whoever was in charge of disaster recovery obviously didn't really understand the risk.
Is this really a valid conclusion to come to at this point? I expect downtime in any service I operate. It's just how the world works. Does that mean I don't understand the risks and am misleading the board?
Any assessment of risk entails certain larger assumptions about the world, many of which often turn out to be mere guesses.
Consider all the prices that are set to their current levels b/c nobody expects the collapse of the US political system to occur. Yet there is a nonzero probability that it will occur.
On one hand this seems like an absurd example, yet it exemplifies the kind of blind spot we are prone to when assessing risk. We generally address all the risks we can directly control, then classify the rest as "systemic" which essentially means that we are not able to compute them so we're going to ignore them.
Yet many systems which we assume to be stable or predictable (governments, companies, markets, weather patterns, social trends, etc.) have unexpected aberrations now and then which can have very significant consequences. Since these tend to impact most companies equally, the market will converge on an equilibrium where no firms do anything to hedge against these things.
Do you want to pay extra bank fees so that your bank can hedge against the collapse of the US currency for your checking account? Probably not. Do you want to triple your hosting costs to hedge against a massive US power grid failure? Probably not. The same applies to asteroid risk and sudden ice age risk.
On the other hand, if you have lots of money saved, you may wish to hedge against the collapse of one currency or another, and if your business would end if you suffered a few hours of downtime, you might want to invest in massive amounts of redundancy.
Every morning when we all commute to work we risk death. Some exposure to systemic risk is considered acceptable, and part of the character of any person or business is the kind of risk exposure we tolerate day to day. A doctor working in an AIDS clinic risks needle sticks and HIV, a startup doubling its users each month risks downtime but also risks a cash flow crisis.
Your examples describe two entirely different systems. The failover of a software product is drastically different from the failover of a power system. Trying to map everything back to a common best practice under the category of "risk" seems like it would miss out on important intricacies.
Risk management is about determining how to identify risks; as such, it is applicable everywhere. However, much like security is applicable everywhere, securing Fort Knox is a very different endeavor than securing a web site.
That's not really fair: we as humans are bad at risk analysis. Bruce Schneier has written extensively on the role of cognitive biases and the like in measuring perceived risk.
I've been looking for years for a good reference that says "humans are bad at probability estimation", but this one doesn't seem to work: he's simply saying that we're very good at estimating regular risks (if I understand correctly).
You've never done DR, have you? It's a business process with a cost, and there are RTOs (recovery time objectives) and RPOs (recovery point objectives). Systems can and will go down. So long as the recovery meets the defined objectives, then DR has been performed correctly. There is a limited amount of money and resources that businesses can spend on DR, COOPs, etc. You should understand that.
A one-hour RTO and RPO will cost way more than 24-hour recovery. Edit: and the business managers decide how much they wish to spend on DR. It's a trade-off, and anyone who has ever done it understands that.
I'd guess the higher-ups at Twitter have run the cost-benefit in their heads (and probably many spreadsheets) plenty of times, and in most cases spending your limited resources on disaster recovery preparation just isn't worth it. Their site being down does not qualify as a "disaster" - they'll be back up soon, and then we'll all be tweeting away again within minutes.
"Disaster recovery" is a term. I'm not saying it's a disaster in the sense that some horrible thing has happened. Like most people said, it's maybe annoying to some of Twitter's most avid users but fairly innocuous.
The point is that at least in the case of Heroku and EC2 (I'm not sure what caused Twitter's outage yet), the causes of failures weren't something like a "16-sigma" event like a plane hitting an electrical tower (a tragic event that happened in Palo Alto a couple of years ago). They were things like insufficiently tested software and processes, and misconfigured devices. These things do not add cost, except maybe incrementally more man-hours in terms of testing and auditing of configurations. They are not million-dollar diesel units that require permits, etc.
My point is that if a device misconfiguration can take down EC2, or a single piece of bad data in a stream can cause massive failure, it means that the entire system is much more fragile than it has been sold as. If they didn't realize this was the case, it means that the risk of failure was a lot higher than they had assessed.
This is a great point, and this isn't just about Twitter but also about many other sites and services that seem to depend on it. It looks like a lot of people have created a distributed system version of dependency hell for themselves, where they rely on a multitude of third parties not to change behaviour or go down. Additionally, in many cases and perhaps perfectly legitimately from a cost-benefit perspective, the envisaged way to recover from this kind of problem is to assume that people can quickly and frantically hack their way out of it at short notice.
(expanding my reply) No risk assessment in the world will stop a cage monkey from tripping over a pile of 1Us and falling onto the big red button. Figure out what your pain threshold is and live with it.
I know of a chap, called Terry, who became known as Total Network Terry. TNT. Because he accidentally plugged the wrong cable into the wrong jack in the wrong cable cupboard on the wrong floor.
Just finding it took days to sort out, apparently.
That's like when I unplugged what I thought was our T1 line, but it turned out it was a neighbor's, who happened to be an ISP, which was a new business direction we were just moving into. And they didn't believe it was an accident...
What I was trying to point out is that catastrophe is inevitable and risk assessment isn't a panacea. I'd rather invest money in proactive countermeasures to unknown risks than try to think of all the things that could go wrong (which you'll never have the money in your life to fix anyway).
But also, to the thread parent's comment about Twitter execs not realizing there was a minor risk of catastrophic failure: Executives only care that their money-making baby keeps running. In the past I've seen execs demand that an engineer call them at 3 AM if the production site goes down for more than 5 minutes... even though that call is pointless. I think they just assume there's no point in getting involved with the plan because the plan will never be perfect, but at least they can be aware of a problem so they can cover their asses and tell a higher-up that it's being worked on. At the end of the day, even the guys at the top don't really give a shit about the product, they just care about their paycheck.
Not necessarily. Even if the expected value of taking a risk is positive, taking the risk can be undesirable, e.g. due to large variance.
Statistical distributions that include a risk-aversion parameter exist precisely to model this kind of problem (unfortunately their names have slipped my mind, otherwise I'd provide links).
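As a rough sketch of the idea, though, a standard mean-variance utility captures the trade-off:

    U(X) = E[X] - lambda * Var(X),    lambda > 0 (risk-aversion parameter)

A gamble with positive expected value is still rejected under U whenever Var(X) > E[X] / lambda.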
It is simply a matter of perceived value and cost-benefit. Why would a CIO spend millions on DR when the probability of disaster is so minute that the risk manager cannot even calculate it? OK, there is a risk that a plane will hit the PDC: .00000002%. And ultimately, will our business grind to a halt? Or can we use a manual workaround until backups recover to the SDC and we capture the data lost since the last backup? I mean, I have a hard time taking this sort of risk seriously unless I'm running dialysis machines and someone's life is at risk.
> ... and that there is an opportunity for "consultants" who understand this, kind of like security consultants?
Risk management consultants already exist! Many companies just choose to assess it internally, or the consultants themselves lack practical experience.
I think that a lot of you guys are confusing "Disaster Recovery" with "Business Continuity".
Disaster Recovery is a reactive approach. It's what you do to get things back up AFTER a system or site has failed.
Business Continuity is a proactive approach. It's what you do to ensure that your critical services will remain viable whenever disaster occurs.
In the cases of Heroku, Amazon, Twitter, and many more, their Disaster Recovery strategies have been successful. The fact that they came back online without major data loss is proof of that. Their business continuity strategies, however, have been found wanting.
We don't know exactly whether the service troubles in the downtimes you cite were caused by a disaster, so maybe even the disaster recovery side is lacking.
I hope they write up a post-mortem on the fallout (hopefully it won't be a post-mortem of Twitter). Those things are always extremely interesting with big infrastructure like this.
Always fun when you're developing against an API, and then have to perform a frantic investigation to work out if your latest code change broke everything... or it's just the API endpoint itself.
In Ruby, at least, there are some good tools to automate this for you. Using, say, vcr, you can automatically "record" your API calls (into "cassettes") as your tests run; these data are then used when you run your unit tests subsequently. When you plan to integrate/push, simply delete the "cassettes" and run your tests again. That way, any API changes are picked up prior to integration.
I have a suite here that takes about 3 minutes to run from scratch, but just over 1 second as unit tests.
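A minimal sketch of what that looks like (the cassette directory, cassette name and endpoint below are just placeholders):

    require 'vcr'
    require 'net/http'

    VCR.configure do |c|
      c.cassette_library_dir = 'spec/cassettes'  # where recorded responses are stored
      c.hook_into :webmock
    end

    # First run: the real HTTP call is made and recorded to spec/cassettes/twitter_search.yml.
    # Subsequent runs: the recorded response is replayed, so the suite never touches the network.
    VCR.use_cassette('twitter_search') do
      body = Net::HTTP.get(URI('http://search.twitter.com/search.json?q=hacker+news'))
    end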
Unit test (as already mentioned) and build up static JSON/XML/whatever files to run your code against, so that you can integration-test your stuff without integration-testing the 3rd-party stuff.
At some point you end up outside of the scope of testing, but you can build bootstraps to make sure that your mocks match whatever they are mocking.
Obviously, this does you no good when something like an API service goes down, but that issue is irrelevant to the question that was posed. Having some static file to test against, whether it is accurate or not, will instantly tell you whether it is your code or the third party that is breaking.
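For example, something like this with webmock (the endpoint pattern and fixture path are made up for illustration):

    require 'webmock'
    include WebMock::API
    WebMock.enable!

    # Any GET to the Twitter API now returns our canned JSON fixture,
    # so the test exercises our own parsing code, not the third party.
    stub_request(:get, %r{api\.twitter\.com/1/statuses/user_timeline\.json})
      .to_return(status: 200,
                 body: File.read('test/fixtures/user_timeline.json'),
                 headers: { 'Content-Type' => 'application/json' })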
Sure, being upset/getting angry just because of a little bit of Twitter downtime is stupid, but that doesn't take away from the fact that one of the biggest and most important discussion and communication channels the web has is completely down.
I don't believe it is newsworthy. In fact, it's sad that it is being written about because it shows the utter shallowness of what passes for 'Silicon Valley News' at the moment. And it's sad to see this as a 'top story' on Hacker News taking up space.
I think it's only newsworthy in this space - where, as some of us are the ones responsible for these systems - we are trying to learn why this happened in order to prevent it.
I agree that learning from why Twitter was down will be interesting and when that story comes out I hope it will be high on Hacker News. But this sort of news story about something that's happening right now is symptomatic of the useless '24 hour news' cycle of noise.
> But this sort of news story about something that's happening right now is symptomatic of the useless '24 hour news' cycle of noise.
The single most important part of the internet is the immediate availability of news (to me, anyway). I've never heard anyone complain about that before; why do you think it's not worth knowing and talking about events as they happen? 'Twitter is down' isn't noise. Years and years ago there were stories about fire departments (SF I think?) that started using Twitter to send out fire notices. Here in Montreal, the police tweet very quickly and accurately about our daily student protests.
Twitter is extremely important to a huge number of people, and when it goes down, a site like HN definitely should be talking about it. It's big news and it's almost exclusively relevant while it's happening.
So, here in the UK Twitter is back for me. It looks like I was without it for about 45 minutes. Call it an hour for a nice round figure.
Think about these two scenarios:
1. During that one hour you spend your time focussed on talking about this event as it's happening, speculating, and having an emotional response (because you can't access something you want to, so you find a group of people experiencing the same thing and all get together to experience the frustration).
2. Tomorrow you read a story that says "Twitter was down for one hour yesterday" with some detail about what happened.
I believe that the latter is preferable. It's more efficient, less emotional and more useful. The former is the same as watching some 'Breaking News' event while it is happening.
Now imagine that the one hour of downtime happened when you were asleep. You've missed nothing.
There are two scenarios where this news is important: if your business depends on Twitter, and if you are trying to assess the reliability of Twitter. The latter can be achieved by #2 above; only the former needs real-time updates, and that doesn't mean general news reporting, just your own monitoring.
It seems like your perspective is that HN exists for the links it points to. I read HN for the discussion, and this thread has many good discussions in it.
This is more like a large manufacturer having a power outage. A fire would destroy machinery or stock. Nothing's getting lost here, something's just unavailable.
But a large manufacturer having a power outage for an hour, like we are seeing here, would cause a great deal of trouble with anything downstream of that manufacturer. And I'd be willing to bet that industry sources, mailing lists, forums (whatever the manufacturing equivalent of HN is) would be discussing the outage, how bad it is, how this is going to completely screw up profits, how irritated they are, etc, etc.
You don't think GM or Ford have ever lost power for an hour? Did you ever read about it? Did the car dealerships suddenly run out of cars for one hour three weeks later? I think you are vastly overestimating the impact transient events like this have.
And I think you are vastly underestimating the impact perceived by the people involved, which was kind of my point.
Both Twitter going down for an hour and a plant that assembles cars going down for an hour aren't that big of a deal in the huge scheme of things, but for people who are intimately connected (either work there, know someone who does, are emotionally connected to the product in some fashion, etc), it feels a lot bigger than it is.
it's called hacker news. but for starters, "hacker" implies not being spineless or giving a fuck about peer groups. hackers also don't like censorship; so you see how hacker news is anything but. it's an involuntary joke, nothing more.
the word has been appropriated by those who are needy like that: you don't call yourself hacker, just like you don't call yourself saint. only mediocre people to whom it never applied and never will would do that. the end.
That's a good blog post. But "x is down" is newsworthy for a sufficiently large number of users of x and sufficiently long downtime. That's because consumer experience with this or that online service influences the online service's reputation. A service with few users, or users who have lower expectations, can endure more downtime without loss of reputation than a service with many users who expect the service simply always to be available, at least as available as broadcast television or plain old telephone service.
also, "us in the media", what kind of whore talk is that even? twitter is correctly referrerd to in w3c docs as medium preventing intelligent discussion. so it was down? GOOD. people are inconvenienced? even better! it cannot possibly have hit anyone or anything that was worth fuck all.
I'm not sure why you suggest RSS is somehow synonymous with Twitter, but I will say that in addition to RSS buttons, almost every major media company on the planet has a Twitter button on its article pages (I work on one, which is why I say "us in media"). Because many sites don't do proper async JS when it comes to social buttons, an outage on socnets can be crippling.
This deserves to be the top comment. Your one liner nailed it. Twitter was down long enough for far more people not to notice than did notice. Shit goes down. It always will. Whining about how whoever needs backups or failover protection or distributed networks of servers across the planet or should use a VPS instead or a dedicated server instead or Heroku instead or EC2 instead or a combination of all that crap doesn't make you right. It makes you a speculator. No amount of fallbacks will give you 100% uptime ever. And calling this a massive failure is also ludicrous. It's just some downtime. It went right back up so chill.
These posts are so incredibly annoying. We can see if service x is down for ourselves. That isn't news. I could maybe accept these stories if the link on the front page was to a blog post stating that not only is service x down but why it went down for sure plus an added lesson we can learn from it. Short of that it's become an easy way for people to build up a trillion karma points. And if you want to tell me you don't care about karma then you're either lying or you have none. Enough with this crap. We'll find out ourselves but most of us won't actually because we have lives and by the time we go online to check our favorite wank-off site it'll probably be back up again like the past fifty times I've seen a story about Heroku/AWS/Twitter being down.
Conversely, this has always been my complaint against the general hoopla and bandwagoning around Twitter -- they didn't do anything technically innovative. When they first started generating buzz, I instantly thought, "Oh, did they create a new messaging protocol that is universally accessible, redundant and easy to use? That's killer!" But that was not the case.
Yes, they had a good idea and executed very well, but as I see it, Twitter is nothing more than your run-of-the-mill 4chan board. I still don't understand the draw, but then again, there are a lot of facets of modern society that I simply have no explanation for (reality TV?!?) and have been better off not worrying about them further.
99.9% of people don't care at all about how technically innovative something is; they only care about how useful it is. And Twitter is incredibly useful.
"Incredibly useful" would imply that if you told me about Twitter, I wouldn't believe you. As much as I wish it were otherwise, it's not that hard to believe that someone invented IRC over HTTP and gave it a silly name.
Many people, myself included, were skeptical of the value of Twitter until they used it. In that sense, we literally did not believe how useful Twitter was. Also, Twitter is almost nothing like IRC in any respect other than being an electronic communication medium.
> Oh, did they create a new messaging protocol that is universally accessible, redundant and easy to use?
Something the vast majority of people couldn't care less about. Twitter has, in effect, created a new messaging protocol, in the broadest sense of it. It's accessible on your computer, your phone, even your TV if you try hard enough. It's integrated with hundreds of apps and sites. Technically speaking it isn't doing anything particularly amazing (although the sheer scale they deal with is), but that's not really the point.
A massively decentralised protocol would not exactly be the model of reliability, either. I'm not even sure how that could possibly work in a service like Twitter.
It would probably work like an IRC network. A network might involve hundreds of servers, but you only notice which server another person is using when there's a netsplit. Split users can reconnect using another server and rejoin the channel with only a slight interruption.
An IRC-styled Twitter would need some sort of synchronization service to handle twitsplits.
What is great about Twitter is that it allows both the sender and the receiver to choose whether they want to interact over the web, through an app, via SMS or even email (receiving DMs). Take the lowest common denominator across all platforms (no subject, max 160 chars in SMS) and sit in the middle as an abstraction.
"Twitter is down" means something very different from "Some people are having trouble accessing Twitter." My own traceroute results during the outage pointed to network problems, and I saw people saying they could access it.
Sounds like you just invented a new product: XIsDown.com, where not only do you get notified that X is down, you get to whine about it with other people.
I'm glad this made it to the front page. Is the topic itself newsworthy? Not on its own. Is all the discussion that's flooding into this thread worth having?
Yep. Even the subthread from the person complaining that this isn't newsworthy.
If you're debugging webservices that suddenly slow down (timeouts of 10s), this may be your cause if they depend on s.twitter.com, search.twitter.com or api.twitter.com.
As a workaround for those systems, add entries for s.twitter.com, search.twitter.com and api.twitter.com to your /etc/hosts file that map them back to 127.0.0.1.
This obviously breaks Twitter integration, but it also makes sure page loads don't explode when waiting for remote resources.
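Concretely, the lines to append to /etc/hosts look like this:

    127.0.0.1 s.twitter.com
    127.0.0.1 search.twitter.com
    127.0.0.1 api.twitter.com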
Have just had to add timeouts to a few requests to the Twitter API. It was completely hanging a whole site.
Learnt something though: never trust an API to respond in a reasonable amount of time!
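Roughly what that looks like, for anyone else bitten by this (the endpoint and timeout values are just illustrative, and the rescue classes assume Ruby 2.0+):

    require 'net/http'

    uri = URI('http://api.twitter.com/1/statuses/user_timeline.json?screen_name=example')

    http = Net::HTTP.new(uri.host, uri.port)
    http.open_timeout = 2  # seconds allowed to establish the TCP connection
    http.read_timeout = 5  # seconds allowed to wait for the response

    begin
      response = http.request(Net::HTTP::Get.new(uri.request_uri))
      tweets   = response.is_a?(Net::HTTPSuccess) ? response.body : nil
    rescue Net::OpenTimeout, Net::ReadTimeout
      tweets = nil  # degrade gracefully: render the page without tweets
    end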
Why is this on the HN front page? This is an entirely worthless post. It adds no value. Nobody is going to reread this at any point in the future. Utterly worthless.