So I'm assuming they have determined that S3 is cheaper than their own metal for actual file storage?
With S3, they don't have to physically own enough hard drives to support every user maxing out their allotment. If I started Dropbox tomorrow and a million people signed up for free 2 GB accounts, S3 would cost me $0 at first. As users fill up their accounts, it makes more and more economic sense to migrate in-house (à la Backblaze's storage pods).
Surely it's the same with bare metal. Companies that store client data do what's called capacity planning: they estimate how much space (or compute power, or whatever) they will need over the next N months and buy it on a rolling basis. Nobody actually buys 2 GB of physical disk every time someone signs up for a 2 GB account.
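A minimal sketch of that rolling capacity planning, assuming used storage grows at a steady monthly rate and new disks take a fixed number of months to arrive. The growth rate, lead time, headroom, and disk size are hypothetical inputs, not anyone's real numbers. The point is that you provision against projected *used* bytes, not the sum of every account's quota (a million 2 GB accounts would be 2 PB of quota).

```python
import math


def capacity_to_provision(used_tb: float,
                          monthly_growth: float,
                          lead_time_months: int,
                          headroom: float = 0.25) -> float:
    """Raw capacity (TB) to have installed by the time an order placed today arrives.

    Assumes compound monthly growth of *used* storage (not allotted quota)
    plus a safety margin (headroom) on top of the projection.
    """
    projected = used_tb * (1 + monthly_growth) ** lead_time_months
    return projected * (1 + headroom)


def disks_to_order(target_tb: float, installed_tb: float, disk_tb: float) -> int:
    """Whole disks needed to close the gap between target and installed capacity."""
    gap = max(0.0, target_tb - installed_tb)
    return math.ceil(gap / disk_tb)
```

For example, with 100 TB used, 10% monthly growth, a 3-month lead time and 25% headroom, the target is about 166 TB; with 150 TB already installed and 4 TB disks, that means ordering 5 more disks this cycle rather than buying for every account's full allotment.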
Whether it makes sense to bring storage in-house depends on some other factors, such as whether Dropbox is willing to cut some corners relative to what S3 provides (e.g. maybe there are S3 features they don't need), and how committed their employees really are to trimming the budget (some people just don't have the stomach for cutting costs way down, e.g. if they see it as a compromise of some kind).
That's definitely true: you wouldn't buy all your capacity upfront. But you would need a minimum amount of capacity on hand from the start, in multiple datacenters, plus the software to manage it, and all of that costs money. That explains why they started on S3, but it doesn't explain why they're still there now. Looking at the economics of it, I wonder if Amazon hasn't cut Dropbox a huge discount to keep them around. Even so, with $250 million in the bank, I bet the discussion about moving in-house is happening right now. Past a certain point (X users, Y revenue, Z $ of VC money), I would look at hosting in-house. http://blog.backblaze.com/2013/02/20/180tb-of-good-vibration... looks like an interesting solution for storage.
> I wonder if Amazon hasn't cut Dropbox a huge discount to keep them around.
I can't imagine why they would. Dropbox dedupes all its data before it hits S3, but Amazon can still leverage bulk hardware discounts that Dropbox likely isn't big enough to get. Amazon would have the upper hand in any such deal, and I don't see why it would be a friendly one.
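For anyone unfamiliar with the dedup point: a common approach is content-addressed storage, where each file is split into blocks, each block is named by a hash of its contents, and a block is uploaded only if that hash hasn't been seen before. The sketch below assumes fixed-size 4 MB blocks and SHA-256, and uses an in-memory dict as a hypothetical stand-in for the object store; it illustrates the general technique, not Dropbox's actual implementation.

```python
import hashlib

BLOCK_SIZE = 4 * 1024 * 1024  # assumed block size, for illustration only


class DedupStore:
    """Toy content-addressed block store; `blocks` stands in for S3 (hypothetical)."""

    def __init__(self) -> None:
        self.blocks: dict[str, bytes] = {}  # content hash -> block bytes
        self.uploaded_bytes = 0             # bytes actually sent to the backend

    def put_file(self, data: bytes) -> list[str]:
        """Split data into blocks, upload only unseen ones, return the block-hash list."""
        hashes = []
        for offset in range(0, len(data), BLOCK_SIZE):
            block = data[offset:offset + BLOCK_SIZE]
            digest = hashlib.sha256(block).hexdigest()
            if digest not in self.blocks:  # new content: pay for storage once
                self.blocks[digest] = block
                self.uploaded_bytes += len(block)
            hashes.append(digest)
        return hashes

    def get_file(self, hashes: list[str]) -> bytes:
        """Reassemble a file from its ordered block hashes."""
        return b"".join(self.blocks[h] for h in hashes)
```

If two users upload the same file, the second `put_file` returns the same hash list and adds nothing to `uploaded_bytes`, so only the per-user metadata (the hash list) grows. That cuts what Dropbox pays for, but it does nothing about the per-GB price Amazon charges for what remains.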
AFAIK those boxes are, and usually have been, tuned for lukewarm or even colder storage to cut costs. Dozens of 5400 rpm drives on bare-minimum backplanes have a genuinely awful rating for reliable seek times. That's great for backup, where you only expect a small fraction of users to ever query their data, and those who do are just thankful to be pulling down a recovery. For the direction Dropbox wants to take, it's absolutely out of the question.
Dropbox hosts the metadata on much faster hardware. Once you've located your file, seek time on the file system matters less than transfer speed, and over the Internet the network is your bottleneck, not your spinning media.
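A back-of-the-envelope check of that claim: compare one disk seek against the time to push the file over a user's connection. The inputs are assumptions for illustration (roughly 15 ms for a random seek on a slow drive, a 10 Mbit/s downlink), not measurements, and the model assumes a single seek per file.

```python
def seek_share(seek_ms: float, file_bytes: int, link_mbps: float) -> float:
    """Fraction of total fetch time spent on the disk seek.

    Assumes one seek per file and that the network link, not the disk,
    limits transfer throughput.
    """
    transfer_ms = file_bytes * 8 / (link_mbps * 1_000_000) * 1000
    return seek_ms / (seek_ms + transfer_ms)
```

Under these assumptions a 4 MB file takes about 3.4 s to transfer, so the seek is well under 1% of the total. For a 4 KB read the seek dominates, which fits keeping small, random metadata lookups on faster hardware while bulk file content sits on slower, cheaper storage.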
"User metadata is stored in the company’s data centers, while the actual files reside on Amazon’s S3 storage service."