Things should be back to normal now. Trevor just moved www.ycombinator.com to Slicehost, and I just told HN to refer to static stuff there instead of serving it locally.
Interesting how easy it is to move your whole web site. Good customer service is important when users can switch so easily.
Any particular reason for Slicehost as opposed to any of the other similar options (Linode, prgmr)? I'm not affiliated with any, though I run my site on VPSes from Linode. I was convinced by a Dec 2009 post by an HNer, uggedal: http://journal.uggedal.com/vps-performance-comparison
Me too. That post was detailed and fair. On Slicehost I had a 256MB slice, which I upgraded to a 512MB slice to host 2 small websites. After reading the post I got a 360MB Linode VPS instead and am very happy with it.
Edit: Slicehost's help articles and resources, which are well organised and exhaustive, are a plus though.
Hopefully that's the "normal" of a few months ago, when loading comments/threads etc. was quick?
Still taking 10-20 seconds to load most pages. One thing I haven't tried yet is just creating a fresh account - maybe mine just has too much associated with it (maybe I comment too much, etc.).
edit: ah I see it only affected static content. Shame :/
I'd like to highlight this lesson in how to lose (or not gain) customers by randomly shutting down technology sites that serve the decision makers you wish to influence.
150K hits in 30 minutes is 83 hits/second. That's a lot to ask for from a shared-hosting account.
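(Back-of-the-envelope check, for anyone who wants the arithmetic:)

```python
# 150,000 hits spread over 30 minutes, expressed as hits per second
print(150_000 / (30 * 60))   # -> ~83.3
```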
When the traffic starts to impair neighboring sites, something has to be done. Just about any ISP will do the same thing: block the site with the surge (which can always make other arrangements) rather than inconvenience other customers whose traffic is at its expected, usual level.
The detail missing so far is why Pair noticed today, if it was the same level of traffic as before, or a slow build. Was a new threshold crossed? (Did someone's HN-focused tool go haywire?)
The Pair message suggests end-of-day logs will be the way to tell for sure.
So let's see... you're hosting a website that has gotten large; that is, it's grown to the point that it will need higher-cost services in order to meet demand. You have a chance to add a valuable customer to your client base. How best to handle this?
Yeah, Pair screwed themselves pretty badly by pulling this trick. They would have been far better off doing something like:
Dear [account_contact_user]
Your website traffic has risen beyond the maximum threshold of [threshold_amt] for the [name_of_level] level of service.
Since we appreciate your business of the last [length_of_service], we have given you a 24-hour courtesy upgrade to our next level of service -- [name_of_next_level]. If, by [end_time], you decide to keep this level of service, you must contact our sales center to arrange payment. Otherwise we will have to start throttling traffic to your server so that it remains below the threshold of [threshold_amt] and does not impact our other customers.
If you have any questions about this courtesy upgrade, or wish to keep this new level of service, please contact [account_manager] at [account_manager_details].
Thank you for using Pair Networks for your hosting needs.
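A notice like that would also be trivial to automate. Here's a minimal sketch of the threshold-check-and-notify flow being proposed; all plan names, thresholds, and addresses below are made up, not anything Pair actually uses:

```python
# Hypothetical sketch of the "courtesy upgrade" flow proposed above.
# None of the thresholds, plan names, or addresses are Pair's; they are
# placeholders to illustrate the idea.
from dataclasses import dataclass
from typing import Optional

@dataclass
class Account:
    contact: str
    plan: str
    threshold_hits_per_s: float
    account_manager: str

NOTICE = """Dear {contact},

Your website traffic has risen beyond the maximum threshold of
{threshold} hits/s for the {plan} level of service. We have applied a
24-hour courtesy upgrade to {next_plan}. To keep this level of service,
please contact {manager}; otherwise traffic will be throttled back
below {threshold} hits/s so that other customers are not impacted.
"""

def check_and_notify(account: Account, observed_hits_per_s: float,
                     next_plan: str) -> Optional[str]:
    """Return the notice to send if the account is over its threshold, else None."""
    if observed_hits_per_s <= account.threshold_hits_per_s:
        return None
    return NOTICE.format(contact=account.contact,
                         threshold=account.threshold_hits_per_s,
                         plan=account.plan,
                         next_plan=next_plan,
                         manager=account.account_manager)

if __name__ == "__main__":
    acct = Account("webmaster@example.com", "Shared Basic", 20.0, "sales@example.com")
    notice = check_and_notify(acct, observed_hits_per_s=83.0, next_plan="Dedicated")
    if notice:
        print(notice)   # in reality this would be emailed, not printed
```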
Of course, the flip side is that leaving it running adopts an attitude of "screw all our other customers, they can eat crappy service while we kiss up to the popular guys who are chewing up everybody else's server resources". Which isn't what I'd look for in a hosting provider...
False dichotomy. The correct way to handle this would have been to temporarily move the shared server to hardware where it won't impact other customers and notify YC that they need to move their server to a bigger server or the site will have to be shut down. Presumably with an ultimatum of a week or whatever.
I was being tongue-in-cheek, but there is a real point here.
As you say I doubt today was unusual for HN, it's the way Pair went from zero to shut-down.
One of the things I like about the way Joyent operates as a cloud host is that they allow you to burst on shared boxes for those times you need it. At the same time they'll let you know you need to think about buying more resources, without just slamming on the brakes.
Pair should do a much better job of noticing a soft limit being crossed earlier, so that if a heavy traffic day had hit HN, PG et al. would already have been aware they were overusing their shared hosting and would have been planning a route out.
Exactly - they missed an obvious opportunity to upsell. If Trevor had gotten a mail stating that he needed to upgrade he would most probably just have done so and been happy. It's not like he can't afford it :-)
That technology is not out there... A static site might be fine at that level and not cause problems, whereas a WordPress site with Super Cache is fine, but a Drupal site can only handle a quarter of that.
It currently is not really possible to track that per user in a shared environment and see, for each individual account, what it is using in terms of MySQL / Apache / CPU / memory.
Absolutely wrong. That technology exists and is being used millions of times a day at sites all over the world. DreamHost, for example, employs just such a technology for their shared hosting: http://wiki.dreamhost.com/index.php/CPU_minutes
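To make the point concrete, per-user accounting on one box doesn't require anything exotic. A rough sketch (assuming a Linux host with the procps version of ps; this is not DreamHost's actual code) that aggregates CPU time by user:

```python
# Rough sketch of per-user CPU accounting on a shared Linux host.
# This is NOT DreamHost's implementation; it only shows that summing
# resource usage per user on one box is straightforward.
# Assumes the procps version of ps (supports cputimes and --no-headers).
import subprocess
from collections import defaultdict

def cpu_seconds_by_user() -> dict:
    out = subprocess.run(
        ["ps", "-eo", "user:32,cputimes", "--no-headers"],
        capture_output=True, text=True, check=True,
    ).stdout
    totals = defaultdict(float)
    for line in out.splitlines():
        user, secs = line.split()
        totals[user] += float(secs)
    return totals

if __name__ == "__main__":
    for user, secs in sorted(cpu_seconds_by_user().items(), key=lambda kv: -kv[1]):
        print(f"{user:<32} {secs / 60:8.1f} CPU minutes")
```

Run that from cron and log the results, and you have most of what a "CPU minutes" report needs.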
The "ask" is not that a shared hosting account support 83 hits/second. It's that the hosting company demonstrate the bare minimum of respect and professionalism in contacting the hostee ASAP when their site is being shut down due to excess traffic. This isn't bitbucket or geocities, dollars are changing hands. If you don't like running a business that makes money then by all means, treat your paying customers like they are a burden to you, they will get the hint and go elsewhere.
I doubt it is an HN-focused tool, as this affected the static content from www.ycombinator.com, and images and CSS are not usually the focus of bots.
Assuming it was not a DoS attack, a smart host should have noticed the traffic load increasing over time and offered an upgrade to a less-loaded server, or recommended a dedicated one.
By disabling the site, they have lost a customer, and lost out on an up-sell to a dedicated server.
You're assuming the load today wasn't extraordinary. That's not yet clear. PG doesn't know of any surge, probably based on figures he has access to on other hosts and analytics tools. But what if HN/YC static resources at Pair were newly being deep-linked, by a much more popular site? Only seeing the end-of-day logs from Pair, compared to previous days, can definitively answer this question.
I'm sure that's what the general News.YC/YC stats show. But as the Pair.com tech mentioned, today's logs only become available at midnight ET. If there was a surge this morning -- to the static resources only, as if by offsite deep-linking -- Pair's logs are the only place it would be evident.
Today's logs are similar to other days. I guess the traffic is high for a shared account, but I've never noticed it being slow so I didn't think about it.
I would have been happy to respond to an email saying I should upgrade & pay more, but even after a few emails with tech support that option didn't come up.
The only limit they advertise for that class of account is data transfer, and because it's mostly just serving uparrow.gif -type files we're well below that limit.
They won't shut you down for over-using resources. As a matter of fact, you can't over-use resources; you can only use the resources given to you. I know it's a different type of hosting - I am referring to Slicehost/Linode-type VPSes, which have hard limits and act very much like a dedicated single box.
More important to me is that I am really really really surprised that Ycombinator was running on a shared account. This just blew my mind.
I have no hard feelings towards Pair; I would have shut the site down too, possibly long ago.
I typically don't tell other people how to run their businesses, but if a similar issue brought my website down and I were to post about the causes, I might focus more on my failures in capacity planning, vendor selection, and monitoring rather than on my vendor's lackluster customer service. User-visible failures are, ultimately, process failures on my part, regardless of the surface cause. A nice side effect of this philosophy is that improvements to my processes help with all sorts of surface causes whereas if I were to address surface causes individually it would be like playing whack-a-mole. Bad vendor whack, hard drive failed whack, traffic spike whack, poor customer service whack, out of memory exception whack whack whack -- why is everyone conspiring to keep me from getting any work done.
It didn't bring HN down. HN deliberately didn't rely on that server for anything except hosting static content that was also duplicated on this server. I planned in advance for the possibility that the other server wouldn't be usable, by writing the code so that I could switch to serving the same content off news by changing one variable, which I did. As a result service was barely affected.
In short, Pair flaked, but we had in fact planned the system in a way that protected us against it.
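Not the actual Arc code, obviously, but the "one variable" switch really is that simple. A sketch of the idea (the names here are made up for illustration):

```python
# Sketch of the "change one variable" failover described above -- not the
# actual Arc code from HN, just the shape of the idea.
STATIC_BASE = "http://www.ycombinator.com"   # normal: static assets served from the other box
# STATIC_BASE = ""                           # failover: serve the same files locally from news

def static_url(path: str) -> str:
    """Build the URL for a static asset, e.g. static_url('uparrow.gif')."""
    return f"{STATIC_BASE}/{path.lstrip('/')}"
```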
Shared hosting is generally sucky for anything remotely successful. I'm amazed you got away with storing your static assets there so long! When I got 100k hits in a day on my first blog, Dreamhost promptly shut it down without warning (in the middle of a slashdotting!)
No, they're careful to say unlimited bandwidth (and storage and such), but not unlimited traffic. When I was looking around Dreamhost docs/FAQs/something either right before or right after signing up, it did say somewhere that they might ask you to move up to their VPS if you're using too much CPU, which could happen just from a whole bunch of static hits.
It's nice for hosting all of your little sites and getting them up in basically zero time, even if you need WordPress or something. The panel's pretty nice. I'd at least sign up when they have one of their crazy deals (I got a year of hosting with a free domain for $9.something, which basically means you pay for the domain and get the hosting for free). It's a good way to give them a try, the main cost being that they'll suck you in and get you paying $9/month after your cheap price expires.
I was told in a friendly conversation with one of their service reps that shared accounts get roughly 200-250MB of total memory usage (Ruby on Rails bumped me over, for example) and a "certain amount" of 95th-percentile bandwidth averaged over some time. I have a feeling they sort their customer list by these two values every once in a while and send the top X accounts a message.
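(For anyone unfamiliar with 95th-percentile measurement: usage is sampled at fixed intervals, the top 5% of samples are discarded, and the highest remaining sample is the number they look at. A quick sketch with made-up numbers:)

```python
# 95th-percentile measurement, as commonly used for bandwidth accounting.
# Sample values are made up; providers typically sample every 5 minutes.
def percentile_95(samples_mbps):
    ordered = sorted(samples_mbps)
    cutoff = int(len(ordered) * 0.95) - 1   # index after discarding the top 5%
    return ordered[max(cutoff, 0)]

month = [0.2] * 8000 + [5.0] * 600 + [40.0] * 40   # mostly idle, one brief big spike
print(percentile_95(month))   # -> 5.0; the short 40 Mbps spike is ignored
```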
The terms and conditions pages for shared hosting sites usually state that bandwidth, disk space, number of sites, and number of databases are unlimited, but CPU and memory are not. There's your bottleneck. You can put thousands of sites on shared hosting, but get one or two with decent traffic or heavy scripts and you get the upsell-or-leave email. That's why I host at Rackspace Cloud now.
I actually don't mind Dreamhost for very targeted uses.
I jumped on a deal at the beginning of this year for ~$9 for 1 year of shared hosting. I now use that account purely for running things like scripts through SSH for doing basic data grabs with wget for further analysis elsewhere.
I've more than got my money's worth and I've yet to see any kind of email message asking me to upgrade.
Dreamhost also repeatedly shut down my sites, such as mapwow.com, due to too much traffic. One time they even denied they shut it down -- even though my web directory had been chowned to root, which requires root privilege.
I moved to slicehost and appengine[disclaimer,etc] and have been relatively happy with them.
Kinda funny and sad that so many people here only see this as some sort of stupid or unfair action against HN, seemingly without even acknowledging that every single other customer on that shared server had as much right individually, and more right collectively, to not have their performance negatively impacted by HN.
Yeah, it sucks that one of our favorite tech news sites was impacted by this, but how impacted were all those other customers?
It is easy to make a smart-aleck comment about how Pair was trying to upsell by doing this, which is preposterous. Pair is a well-respected provider with many more years of providing good service at a fair price than HN has existed, and I'd be willing to bet they will be around after HN has peaked and moved back to the kind of traffic load that might make sense on a shared system.
But the fault here ultimately lies with the folks running HN who thought it was wise or appropriate to host any of its content on a shared server that likely cost them less money per month than most of us spend on soft drinks in a week.
Strawman. Nobody's saying HN/YC should be allowed to overuse resources (then again, I don't know the terms they had agreed to). But not giving a warning is crappy customer service, no matter how many other folks do likewise, how many years Pair has been doing other stuff well, or however else you want to spin it. In this case, it also happens to be a big sales screwup.
Well, you didn't identify what I said that you think is a strawman, but I'll point out yours.
Fine, nobody's saying HN/YC should be allowed to overuse resources. I didn't say anybody was saying that.
Not giving a warning probably doesn't count as great customer service, but then again, once the problem had been identified by Pair, and once they knew of the negative impact HN was having on every other paying customer on that server, what kind of customer service to those other customers would it have been for Pair to fire off an email to HN then wait an hour, or thirty minutes, or ten minutes, before shutting it down?
How long should Pair have allowed HN to impact other customers to satisfy folks here? And what makes HN more important than any other paying customer on that server?
Oh right, it's because you read and like HN, which, ironically, so do I.
As for it being a sales screwup, maybe. I kinda doubt there is a great deal of overlap between HN readership and the average potential Pair customer. We could also suggest that Pair taking action to protect all those other customers on the server is an example of how they would provide good service to the many when they're being hammered by one overpowering fellow customer.
"We just figured out what caused the
problem. Apparently Pair Networks' procedure for requesting
that users upgrade to a dedicated server is to shut down their site without warning...."
I think you should upgrade to a more "dedicated provider" rather than a "dedicated server"...
Guys, shared hosting is designed for the 99% of non-busy sites, say under 1,500 unique visitors a day. With the level of traffic HN is doing, I'm surprised Pair didn't warn you earlier. A site this busy needs a dedicated or virtual server.
That is why shared hosting is cheap: you start with it, and once you are successful or start getting slashdotted you buy something bigger that can scale.
Also, just going to point out here that Pair.com has had a great reputation in the hosting industry for the last 10 years.
I've been in the industry for 10 years and worked for quite a number of hosting companies (not Pair, though). When you have 150 shared clients on a machine and one client is causing the problems, you do your best to deliver a warning before it gets out of hand, but it is very hard to do.
When I emailed Pair.com to tell them that my site was about to get a writeup in the Wall Street Journal, they moved my site to a dedicated server within an hour, left it there until the spike of traffic was over, then moved my site back to a shared server--all at no charge. If you use Pair.com properly, they are more than excellent.
I don't think anyone will disagree. What they will assert, however, is that shutting down a site with zero notice is a jerky thing to do. As the other commenters have said, the nicer/smarter thing to do would've been to send an e-mail that says "Hey, you're using too much bandwidth/whatever." The smartest thing of all probably would've been, "You're using too much. Want to buy more?"
I'm not sure what is more ridiculous: that they disabled the site without warning, or that they were too lazy to look at their logs or do a simple Google search to find out that Hacker News is a real site with lots of regular traffic.
What I really loved is that they just demonstrated a fair degree of incompetence to a site used pretty much exclusively by people who are very good potential customers! This site is a wet-dream for their marketing department.
What they should have done is upgrade the website to a dedicated server for free and let that news hit the front page.
There was no sudden spike in traffic. If they'd bothered to check the logs they'd have found that the load, whatever it was, was no higher than it had been.
This makes me think of a little side project I've been chipping away at called InstaCDN.
It makes it easy to minify, combine, gzip and push your css, js and image assets into the Amazon CloudFront CDN with far-future expiration headers. It also automagically detects background images referenced in your css and puts them in the CDN, rewriting the css to use the new CDN image URLs.
It's all done through a trivial REST API.
Would love some feedback, and to find out if/how it's breaking any of your complex css/js.
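The css rewriting part is easier to show than to describe. It's roughly this kind of transformation (a simplified sketch, not InstaCDN's actual code; the CloudFront hostname is made up):

```python
# Simplified illustration of rewriting CSS url() references to point at a
# CDN -- not InstaCDN's actual code; the CDN hostname below is made up.
import re

CDN_BASE = "https://d1234abcd.cloudfront.net"   # hypothetical CloudFront distribution

URL_RE = re.compile(r"""url\(\s*['"]?(?!https?:|data:)([^'")]+)['"]?\s*\)""")

def rewrite_css(css: str) -> str:
    """Point relative url() references (background images etc.) at the CDN."""
    return URL_RE.sub(lambda m: f"url({CDN_BASE}/{m.group(1).lstrip('/')})", css)

print(rewrite_css("body { background: url('/img/bg.png') no-repeat; }"))
# -> body { background: url(https://d1234abcd.cloudfront.net/img/bg.png) no-repeat; }
```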
That looks really cool. I'll be in a position to give it a go in a few months. One thing that would make me hesitate, though, is not knowing what the pricing will be once it moves from free to non-free.
Ya, but if your dad wants to put up his pictures, do you tell him to buy a nice scalable hosting product instead of a cheap shared hosting account for $5 to $10 a month?
No. I'd tell him to get a Facebook/Picasa/Flickr/Twitpic account.
Dad used to get a shared hosting account for things like that, but now we have services that host content for free. They are much easier for Dad to use as well.
In this age of $20/mo VPSs and free content hosting, I'm honestly not sure how shared hosting survives. It's not as flexible as a VPS for hosting sites (and with more than two sites, isn't even as cheap), and it's not as easy to use (or as free) as Facebook.
The real WTF here is why HN considered it appropriate to run a production site on a shared host. Shared hosting is a bad idea all around if you are even slightly concerned about reliability. You don't control the server, so any 60-year-old woman running a photoblog with a vulnerable WordPress version can bring your site down just by getting hacked.
The email sounds pretty generic and policyish, which leads me to suspect it was auto-generated rather than typed out by some hapless sysadmin who's never heard of HN.
I actually find it quite embarrassing that the guy had no clue what HN or YC were. He had absolutely no idea... and he works for a hosting company. It's quite a shame. I really got a kick out of the fact he questioned the legitimacy of traffic as well.
I don't really see why every hosting company employee out there is supposed to know about a certain small American investment firm started in 2005 and its associated news aggregator... we're probably an order of magnitude or so less popular than Reddit, which is itself rather niche. It's a small dot in the web world.
The thing I can't believe is that nobody at Pair Networks who was involved with this actually knew about Ycombinator or Hacker News... otherwise they wouldn't have shut down such a popular site. Somebody needs to hire real hackers.
HN has been pretty sluggish for a little while now. If you rewrote it in Rails you could host it on Heroku and get back to not worrying about scaling!