Trello has moved to AWS (fogcreek.com)
124 points by gecko on Oct 31, 2012 | hide | past | favorite | 67 comments



There's a little bit more information in the last status post, plus a picture of Fog Creek cofounder Michael Pryor hauling 5-gallon jugs of diesel up 17 flights of stairs. (We have other employees down there, too, and I know Stack Exchange's NYC employees were excited to help out, so they're probably there by now, too.) http://status.fogcreek.com/2012/10/diesel-bucket-brigade-mai...


When FogBugz on Demand launched, Joel touted 2 data centers[1], in NY and LA, "for the once-in-a-lifetime case of an entire data center blowing up". What happened to the second data center?

[1] http://webcache.googleusercontent.com/search?q=cache:lHEK939...


No need to post a link to the Google cache. The original source [1] contains the same text. I don't think there's much risk that they might ninja-edit the five-year-old announcement to change the claim about redundancy.

My guess is that the second data centre was there, but failover didn't work for some reason. It would be good to know the reason, though.

[1] http://www.joelonsoftware.com/items/2007/07/09.html

Edit: Politeness


Apologies, the link was copied from an unrelated post elsewhere and I didn't really look at the URL; no sleight of hand was suspected on their part.

I think your guess is probably wrong though, in a status post 2 days ago they listed[1] the precautions they were taking so that they could be "confident that all of our services will remain available to our customers throughout the weather", and they did not refer to another data center, only "Our data center". That would be the ideal place to mention any fall-back centre.

They are also resolutely not replying to public posts on various boards questioning these issues. I am cutting them some slack in that they have a lot to attend to right now, although the fact they are managing to find time to reply to some threads (like the one in which this is posted) is a tad sniffy.

Once they're back up and running I'm sure they'll update us all on why their redundancy failed or magically wasn't there.

[1] http://status.fogcreek.com/2012/10/feelin-fine-no-expected-d...


Just so we're clear, I have time to reply to these because I'm stranded in LA. That's also why I have to go back and edit my posts fairly regularly when I get better information out of NY. Pretty much everyone in NYC, such as the ops team, who could answer your questions, is very much not here right now. That's why you're getting odd silences: I'm happy to answer things I know or am involved with, but not to speculate.


I appreciate that you don't have all of the info on what's happening, and that those who do are a tad busy right now.

However, as I understand from your profile you are a coder on Kiln[1]. Are you really telling us you don't know whether there was a backup datacenter or not?

If there was, I understand you might not be able to tell us why the failover didn't work, or, if there wasn't, why not. However, basic info like whether a backup existed in the first place really seems like it should be within your realm of knowledge.

[1] http://bitquabit.com/page/about/


No need for apologies. My comment was unnecessarily accusatory. I've edited it accordingly.

Re their status posts: you may well be right. Time will tell I guess.


I'm sure you've thought of this, but now that you're up and running, isn't it possible to buy/hire a pump, run it off the electricity from the already-running generators, and pump the fuel up from street level instead of walking up the stairs with buckets?


They probably (only) need a couple of gallons per hour.

If they can't tie into their existing fuel infrastructure, hauling it up the ~150 to 200 feet of stairs bit by bit is probably easier than trying to source and install something temporary.


It's very difficult to pump liquids up long distances. You can't suck liquids up more than 25 feet high, and while you can push it higher, you need a big pump and strong hoses/pipes, etc. Practically what you end up with is a series of smaller pumps. The water supply to tall buildings is a difficult bit of engineering on its own.

Anyway, the short answer is no, it's not at all easy to pump anything up 17 stories. You're talking about 8 or more pumps and appropriate hoses curling up the staircases, making them impassable... but people need those stairs.
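The physics above can be sanity-checked with back-of-the-envelope numbers. This is a rough sketch, not engineering advice; the density and height figures are my own assumptions, not from the thread:

```python
# Static head pressure needed to push diesel up ~17 stories, plus the
# theoretical suction-lift limit. Assumed figures: diesel density
# ~850 kg/m^3, 17 stories ~52 m (~170 ft).
RHO_DIESEL = 850.0   # kg/m^3, approximate
G = 9.81             # m/s^2
HEIGHT_M = 52.0      # ~17 stories

pressure_pa = RHO_DIESEL * G * HEIGHT_M   # static head, in pascals
pressure_psi = pressure_pa / 6894.757     # pascals -> psi

# Suction lift is limited by atmospheric pressure: atm / (rho * g)
ATM_PA = 101_325.0
max_suction_m = ATM_PA / (RHO_DIESEL * G)

print(f"Head pressure to push diesel up {HEIGHT_M:.0f} m: {pressure_psi:.0f} psi")
print(f"Theoretical suction lift for diesel: {max_suction_m:.1f} m")
```

Even on these optimistic numbers, a single pump would have to hold tens of psi of static head at the bottom of the run, which is why in practice you end up staging smaller pumps.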


They've thought of that. Kind of fun to find new sets of industrial pumps post-hurricane though.


The pump is underwater.


OP is suggesting using a new pump to pump from the trucks, in lieu of buckets - not the pumps that are submerged.


Are there any safety issues associated with hauling a bucket of diesel up a staircase? I know it's not as volatile as gasoline but it's still not exactly the most fire-safe substance. Are they literally 5-gallon pails, or are they sealed containers?


    Are there any safety issues associated with hauling a bucket of diesel
    up a staircase?
It's flammable and slippery. Wear boots, and I suggest this is a good day to quit smoking.

    I know it's not as volatile as gasoline but it's still not exactly
    the most fire-safe substance. Are they literally 5-gallon pails,
    or are they sealed containers?
They are open buckets we store the cleaning supplies for the fish tank in. And also now diesel.


A million times this.

Diesel is extremely dangerous stuff. I nearly blew myself up a few years ago by not properly cleaning out (with detergent) a diesel tank that I was welding. Diesel residue vaporised and went boom. Fortunately there wasn't enough in it to create a large explosion, but it required new underpants.

To add to that, my only car accident was due to having diesel on my shoes. My foot slipped off the brake and I rear ended some poor guy.


After Hurricane Andrew, my father and I needed to siphon gasoline from one of our cars to use in our generator. Lacking a proper container, we siphoned the gas into a 5 gallon plastic water cooler bottle. The gasoline promptly ate through the bottle's seam and spilled out, dissolving a good size chunk of our asphalt driveway. Lesson learned: use only approved storage containers for gasoline.


> They are open buckets

So one guy slips and there's five gallons of diesel covering what I assume is the escape route of a NYC high rise?


Indeed, can't think of it as anything other than irresponsible, bordering on downright dangerous.

Hacking software is one thing, but when that goes wrong it just leaves crashed servers, unresponsive web apps and 404/500 errors.

And, before someone mentions it, hauling diesel up stairs to keep a hospital generator going (as mentioned in another story) is a whole different situation. Patients are a bit more important than a bunch of servers. But you'd still evacuate all non-essential staff and patients if you were doing that, given the significant danger if something were to go wrong, especially with the emergency services already dealing with a huge workload.


It's diesel. You'd have to heat it before you would be able to ignite it.

But yes :)


Exactly. You can't light diesel on fire with just a match. You need more heat or pressure. However, the burn from gasoline or diesel on your body is intense. I was once filling an old junker with gas when it spurted out from the tank and all over me. At first it was funny and felt nice and cold on the body. Then the burning sensation started, and it took a solid 25 minutes in a cold shower to calm the sting.


Ignition isn't the only risk. Diesel causes severe skin irritation. I'm not sure about the effects of inhaling fumes from spilt uncombusted diesel, but it can't be good for a person's lungs.

Potential hazmat incident.


Diesel doesn't vaporize to any significant degree at temperatures experienced on Earth.


Good on you for getting up and working to keep things running.


I would highly recommend no longer using those buckets for fish tank cleaning. At least not for putting water back in the tank. :-)


Perhaps the Fog Creek staff is just really fit, but it seems like it would have made sense to hire some burly mover dudes to haul gasoline up stairs.


Play some Zumba music and charge for it.


Wow, that's going beyond the extra mile. I'm pretty far from NYC but if I were closer I'd give you guys a hand. In lieu of that, any possibility of a "Trello Hurricane fund" which we can donate to? Would give appreciative users like me a chance to help pitch in for a team and product which has already helped so much. Maybe it'll amount to just enough for lunch or something, but I for one would feel much better knowing that I contributed a little...


I think there are many, many better places to donate than a profitable company that had to do a little hard labor because they didn't have the second datacenter everyone suggests they should have.


The point is, as a developer myself I feel their pain; sometimes shit just hits the fan. As a user, not only is their service amazing, it is integral to running my little startup. I'm not saying monetary compensation is the only way to say thank you, but I think it would be nice if others like me got to say thank you in a slightly more substantial way to the guys going beyond the extra mile. There are services that I pay for which were down and they didn't even bother. More like buying lunch for a friend who helps you move.

I'm all for donating to the Red Cross and other charitable organizations, but that doesn't mean you can't show anyone else appreciation just because there are others with needs in the world. Give to both; hopefully more to the charitable organizations and a little less for token gifts.


"Trello is in the process of being moved to Amazon AWS where it will not be affected by further data center issues."

I guess that's called the power of positive thinking?


Oh, the irony. Interesting they didn't move to Azure though.


Unlike their other products, Trello isn't .NET based. It runs on a Node.js/MongoDB/Redis (and I'm guessing Linux) stack. My understanding was that they're using Trello as a test platform for playing with and evaluating different new technologies.



"Let us redefine progress to mean that just because we can do a thing, it does not necessarily mean we must do that thing."


So they're moving some business away from Peer 1 after all this bucket-brigade work? Not nice.

I hope they move Trello back to "real" servers, and not this AWS, once the datacenter is stabilized.

I think AWS is overrated and overpriced, and it's a big asset of Fog Creek to still use real data centers; after all, they started this thing before AWS even existed. I don't know if Stack Exchange or FogBugz would run as well on these AWS servers with their poor IO.

It's great, though, if they now have the chance to quickly overflow Trello to AWS in case of problems or high demand, and buy a new correctly-priced, powerful server to put at Peer 1 a week later!


..or the power of a really long night with very sore muscles.


Would love to hear about what it took to transition to AWS. Any chance Joel would divulge some of the driving factors (reliability being one)? How about cost? Unlike my paltry webapps, Trello gets a lot of activity and has no current revenue stream, so I'd love to know more about the cost difference between running your own servers vs. running in the cloud. How long did it take to move over? Are the old servers being used as backups in the event Amazon has one of its very rare AWS outages?

I know that's a lot of questions, but heck it would be an article I'd love to read and something I believe would be very relevant to the community as well!


> Are the old servers being used as backups in the event Amazon has their very rare aws outage?

AWS is essentially acting as a backup of the old servers, as the datacenter is in danger of losing power.

That said, the cost, reliability, etc. had been carefully thought out, albeit as a somewhat longer-term project than it ended up being. More info will certainly wait at least until the current state of things is cleared up - FogBugz and Kiln are still in the datacenter.


Thanks for the info. I know things must be pretty crazy right now. Quite literally in the trenches. Just goes to show what an amazing team behind an amazing product can do. Kudos!


Not entirely related, but regarding the backup fuel situation: how long does 1 "bucket" of fuel keep the power going for and what is the cost of 1 bucket of fuel?


This information is from earlier today, so I don't know how accurate it is, but:

  - They burn 44 gallons per hour
  - The tank adjacent to the generator holds ~400 gallons
  - Gas is currently VERY expensive, but I don't have a real number for you at
    the moment.
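Taking those figures at face value, the tank math works out like this (a quick sketch; the 5-gallon bucket size is assumed from upthread, and all numbers are the rough ones quoted here):

```python
# Generator runtime from the figures above: 44 gal/hr burn rate,
# ~400 gallon tank, 5-gallon buckets.
BURN_GAL_PER_HR = 44.0
TANK_GAL = 400.0
BUCKET_GAL = 5.0

runtime_hours = TANK_GAL / BURN_GAL_PER_HR               # ~9.1 hours on a full tank
minutes_per_bucket = BUCKET_GAL / BURN_GAL_PER_HR * 60   # ~6.8 minutes per bucket

print(f"Full tank lasts ~{runtime_hours:.1f} hours")
print(f"Each 5-gallon bucket buys ~{minutes_per_bucket:.1f} minutes of runtime")
```

So a full tank is a little over nine hours of power, and every trip up the stairs buys roughly seven minutes.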


Diesel, on-highway in New England, is averaging $4.205/gal[1] (probably more in the city, especially more in a disaster situation). Applying your information to the above question and assuming they can get just shy of five gallons per bucket, it's probably a fair estimate that each bucket carries more than $20 of diesel fuel and accounts for seven minutes of generator time. So, about $3 per minute or 5.1 cents every second.

Really puts things in perspective.

[1]: http://www.eia.gov/petroleum/gasdiesel/
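That estimate checks out when you run the numbers; a quick sketch, using the $4.205/gal price and 44 gal/hr burn rate quoted upthread:

```python
# Cost of generator runtime at the quoted figures: $4.205/gal diesel,
# 44 gal/hr burn rate, 5-gallon buckets.
PRICE_PER_GAL = 4.205
BURN_GAL_PER_HR = 44.0
BUCKET_GAL = 5.0

cost_per_bucket = BUCKET_GAL * PRICE_PER_GAL              # ~$21 per bucket
cost_per_minute = BURN_GAL_PER_HR * PRICE_PER_GAL / 60    # ~$3.08/minute
cost_per_second = cost_per_minute / 60                    # ~5.1 cents/second

print(f"${cost_per_bucket:.2f} of fuel per bucket")
print(f"${cost_per_minute:.2f} per minute, {cost_per_second * 100:.1f} cents per second")
```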


I got a similar feeling at my last^2 job when we had a client run a superbowl ad, for which they were paying a million dollars. That works out as something like $30/millisecond.


FYI: #2 heating oil and diesel are pretty much the same thing; the only differences are dyes in the heating oil (since it's often taxed differently) and additives in diesel (for example, in the winter, to stop it waxing up in the cold). Not sure about sulfur content differences when comparing #2 to the low-sulfur diesels you see, but I doubt a few hours of burning higher-sulfur fuel will do anything.

Also, #2 heating oil is cheaper than diesel in New England, it's about $3.70/gal.


Nit-pick: these generators burn diesel, not gas.

And to the poster below: a 500 kW generator is not unreasonable for a fully populated datacenter. That is power and cooling, remember.


They really need around 1/2 megawatt?

Wow.


It'd be interesting to see how this was migrated: the method, what had to be changed, lessons learned, etc.

Would also be great to see a follow-on report after a month or so of lessons learned from hosting on AWS vs own infrastructure.

I will also be keeping an eye out to see if it remains as responsive.


What would be really interesting is if they took a hybrid-cloud approach. I recall Joel talking about adding a new server every week due to Trello's ever-growing popularity during his presentation at Startup School. Being able to spool up on demand should really help with performance, or for times when the datacenter is flooded. But wouldn't it be more cost-effective to run it on your own hardware during off-peak times and spool up some new servers on AWS when performance starts to lag?

Disclaimer: I'm a Trello fanboy, it's like having container classes in STL, only more visual!


Technically, it is running hybrid now, albeit for somewhat silly reasons: the mail server is still at Peer1.


Yeah it's hard to migrate email given the anti-spam measures ISPs use like IP reputation.

Good luck. Maybe a quick migration to Sendgrid for that system? (Edit: or Amazon SES, of course)


Unrelated to the post, but I hope you guys are all doing okay out there.

I have a huge amount of respect and appreciation for the level of commitment you have for your customers, despite these horrific circumstances. Stay safe!


I'd be interested to know whether there has been a significant surge in demand for AWS since Sandy hit.

Having the cloud as your backup makes a lot of sense, but if everyone did it, could it cope with the additional demand? Sure, the AWS infrastructure is massive, but I'm guessing the load that would be dumped on it if a significant chunk of New York's web hosting suddenly moved across would be huge.


Haven't we seen more AWS outage issues than "east coast shutdown"-level storms over the past, oh, say, 5 years?

EDIT: Thinking now this may be a temporary move, not a permanent one. As such, it probably makes sense to have a tested process to be able to move between cloud/dedicated/whatever as quickly and painlessly as possible.


We've seen many people who failed to read AWS's documentation and put all of their services in one data center. If you follow Amazon's guidelines and use multiple AZs (i.e., if you use RDS, check that box), you've had very little (~20 minutes) downtime; if you went multi-region, it's even less.

It's not like this is a well-hidden secret - head over to AWS's whitepaper section:

http://media.amazonwebservices.com/AWS_Cloud_Best_Practices....

“Be a pessimist when designing architectures in the cloud”

http://media.amazonwebservices.com/AWS_Web_Hosting_Best_Prac...

“As the AWS web hosting architecture diagram in this paper shows, we recommend that you deploy EC2 hosts across multiple Availability Zones to make your web application more fault-tolerant.”


Do you know something that Netflix doesn't? They avoid EBS dependencies for a reason:

http://techblog.netflix.com/2011/04/lessons-netflix-learned-...

http://techblog.netflix.com/2012/10/post-mortem-of-october-2...


Avoiding EBS - or at least planning seriously for how you'll handle failures, as with any storage system you use - is a good idea but it's unrelated to my point.


But... didn't we also have some comments before about how Amazon's own control panel systems are all in one data center? If Amazon themselves don't 'get this right', perhaps it's because it's too hard, and they need to take steps to make this easier (or indeed automatic, and charge a premium)?


Those comments were wrong; they were posted because parts of the control panel stopped working during the last outage. That was actually due to throttling of the API, as mentioned in their write-up.

AWS is resilient enough so long as you don't park all of your environment in the same AZ/region.


To be fair, some of those AWS outages were "east coast shutdown"-level storms.


I wonder: how, technically, can you move something like Trello to AWS in such a short period of time? Do you think they had a plan prepared for such a contingency?


The real question is "is it permanent?"


Did they move to AWS to ensure that their services go down more efficiently next time there's a hint of a natural disaster?


You do know that AWS did not go down during Sandy, right?

/me knocks on wood


Well, it's not like AWS didn't have its share of downtime in the last couple of weeks. Back then, the recommendation of the armchair admins was to move to dedicated hardware you own. Oh, the irony.


Not necessarily. If they only have their setup in a single AWS datacenter and it gets hit by a hurricane, then they still have issues.

The real issue is geographic redundancy, regardless of using AWS or dedicated hardware.


With the last round of failures, some of the people affected reported that they couldn't start up instances in other datacenters since those were overloaded. Still, I agree - the underlying problem is cross-datacenter redundancy, but that's a very difficult problem to tackle and might just not be economically feasible.



