Dropbox saved $75M over two years by building its own infrastructure (2018) (geekwire.com)
268 points by simonebrunozzi on Nov 18, 2020 | 212 comments



I think this quote really hits the nail on the head and confirms what a lot of people may have intuitively known about the value of cloud providers:

  But once certain startups turn into big companies with hundreds of millions of users, with computing needs that they’ve come to intimately understand, it can be far more efficient to set up computing infrastructure designed exactly with those needs in mind.
I think the main advantage of cloud providers is to offset the risk of purchasing equipment that eventually is no longer needed, which is ideal for younger companies that are still trying to reach their market capacity or unsure about whether they'll still be around in a year. Of all the things on a startup's todo list, I can't imagine setting up their own infrastructure is the best way to improve profits or revenue.

But once the constraints of a userbase are more established, it should be easier to migrate off these platforms, since their pricing is optimized for users of all business sizes and use cases, whereas your specific hardware can be optimized for your specific users.

My biggest question is whether cloud providers could achieve a scale where they are able to offer optimal infrastructure costs for specific businesses. Maybe this is the case for smaller or mid-size companies, but I'd be interested to see where the inflection point lies.


You’ve described the historical value of cloud computing perfectly. That said, I think the days where all but the largest or most stubborn companies run their own datacenters are coming to a close.

The problem will be finding skilled labor. Short-haul networking, power configurations, thermal load, hardware maintenance; these and many more are specific skills that can’t be learned overnight. Data center work used to be a viable middle-class career, but the pay scale for it has gone down and down. Companies that do run their own DCs like Google and Facebook have a few centralized experts, a thin professional staff on-prem, and an army of minimum wage disk swappers who are told what to do by a ticket system, just like an Amazon warehouse worker. The knowledge of how to build and run these things is all at the top now.

I’m not saying the jobs or talent pool are gone. Just that they’re shrinking, and will continue to shrink. Like the manufacturing industry, the fewer people there are who are comfortable working with real hardware, the harder it will be to start anew.


I'm not in the web or cloud business, but I've filled a rack with my stuff before. My impression is that hardware has become a lot more capable even relative to its tasks. With high-IOPS memory, many cores and obscene amounts of RAM, I would expect that companies of a much larger scale (in $, FTEs, or most other metrics) can be served by one 4U machine, or by one rack, or by one room. Thus I would expect the knowledge of how to handle 5000 hard drives to become more obscure, naturally, but the skill to run a decently sized web application to remain almost constant.

Does this math work out, or have the tasks become more demanding at the same speed that hardware has improved?


IMO your assertion is validated by the excellent overview of Stack Overflow's infrastructure given here:

https://nickcraver.com/blog/2016/02/17/stack-overflow-the-ar...

Very few web apps will ever serve as much traffic as SO.


SO doesn't have a very operationally complex app.

A bank running 50 different services, on different platforms, with serious audit requirements, physical and logical access control, strict change and configuration management, etc., has two orders of magnitude more complexity. And that shit is very expensive in manpower.


"Very few web apps will ever serve as much traffic as SO."

Their traffic is like 80-90% reads and they actually hire good devs and let them work on perf.

Neither of those things is true in typical companies.


There are now businesses that explicitly depend on the elasticity of the cloud and can never really be moved on-premise without massive up-front investment in hardware that may only be used a few times a year for their biggest customers. Trying to hybridize these workloads hasn't been very successful as of yet. It is possible that K8S could relieve this problem but I haven't seen it in practice, at scale.


Instant Elasticity in Cloud is a myth. If you think you are going to get 1k hosts just like that from AWS you will have an unpleasant experience.

I work at a decent-size tech company and we are split between cloud and on-prem. In our experience you have to inform AWS/GCP in advance (sometimes way early) if you are looking to meaningfully increase capacity in a zone/region.

Sure, autoscaling a few hundred hosts may be possible, but people who run a service that needs a few hundred hosts directly on AWS will run it on some kind of scheduler + resource manager, which will have some kind of operational buffer anyway (as in, you would already have those hosts, so cloud elasticity is not a factor here).


How early is "way early"? Because as long as it's shorter than the two to three weeks it'd take to order boxes, rack them, provision them (which would be automated but might still take an afternoon), deal with any QA hiccups... I'd much rather call my AWS rep and say "can we add 30% by Thursday" and have them figure it out (and at such a large scale you might be able to spread it out across a couple regions anyway unless you only serve a specific part of the world).


From what I have seen it is actually of the same order, or sometimes more. In one region/zone we add a few hundred hosts every week, but that is after telling them we plan to scale up in this region to some big number X.


"Instant elasticity in cloud is a myth"

This times a million. I think SQS standard queues are probably the only thing that IME actually fulfills that promise.


This is the same with disaster recovery too. The idea that "oh, our main DC went down, we'll just spin it up in another region" is great until you realize that means you need reserved instances in another zone, that just like another physical DC, you won't be using.


Why not go fully on-prem then? You can run kubernetes locally.

Are managed data stores that attractive? You can pay for on-prem management.

What workloads are in the cloud versus on-prem?


Right now there is no specific distinction between what we want to run in cloud vs on-prem. The important thing to note here is that we use cloud as IaaS only. We have our own stack which prepares the hosts before they are ingested into clusters as usable capacity.

We actually recommend not using cloud providers' custom databases or any other value-added services.

Why not completely either way (on prem vs cloud) is something that happened way before I joined the group but I think the main reason is to have a tactical edge in the long run such that we avoid lock in. I guess in some ways it helps us negotiate pricing better.

Imagine moving a certain workload from GCP region to an AWS region as part of a failover drill.


As these are generally scheduled events, end of year, end of quarter, etc. they can be planned. Beats owning the machines.


Elasticity? Fine. So their, say, single rack will sometimes have limited load and be under-utilized.

About the up-front investment - most hi-tech companies are a massive initial up-front (or nearly-up-front) investment.


I was talking about at scale, not a rack. If you can get by with a rack, you will pay more for the people to support it than the incremental cost of the cloud.


> If you can get by with a rack, you will pay more for the people to support it than the incremental cost of the cloud.

Probably a whole lot less.

At larger scale - I would guess it's the same thing. If an organization needs more than a rack during peak use, it can probably benefit from setting up its own infrastructure. Only in the uncommon case of short extreme peak use and almost no use most of the time does such elasticity make a cloud solution attractive. IMHO.


That is very common with the infrastructure startups that I work with, like Snowflake and others.


Don't the various clients even out the usage?


> My impression is that hardware has become a lot more capable even relative to its tasks.

Indeed. The margins are bonkers high. As an example, the amount of ram that you can stuff into a physical machine has at least doubled in the last five years, but the price of the average virtual machine has not.


You still want ha, failover, and disaster recovery. Then you need to set up stuff like bgp, dns, security rules, etc, etc etc. Complexity mounts pretty quickly.


Indeed. It seems that most of the people saying that cloud hosting is expensive have never run into the issues of making their own SAN, managing the provisioning of 20 different teams, etc.

The organizational complexity and specialist knowledge is mind-boggling and there is zero chance that your in-house knowledge is better than what Amazon can provide.


This is true, but unrelated.

We're talking about Dropbox scale.

At that scale you can (nay, should) hire all the specialists you need.


Installing rack servers and setting up services to run a site used to be a sort of rite-of-passage 15-20 years ago, but that time period of the web was different. Still, I would consider basic familiarity with the infrastructure necessary also today.

Increasing hardware performance relative to task load created the rationale for virtualization. Virtualization also turned out to be rational with respect to consistency, convenience, maintenance, and so on. At that point, outsourcing to a cloud can be rational.

But fewer people get hands-on experience with the infrastructure, and it sounds like many consider it almost mythical. For example, realizing the amount of work that can be done in 4U today. What does Amazon charge for 96 cores and 256GB?


It's not just managing the complexities of managing the bare metal, although that's certainly a huge component of it.

There are some other huge arguments against running your own datacenters.

One is being able to properly provision resources. Being able to write just a function and have it consume just that tiny amount of resources rather than a whole VM is huge. Being able to spin instances up and down as you need them is huge.

I think that's been obvious for a long time, but what I've seen less obvious to the business analysts is the impact of more advanced cloud services. The direction of cloud computing is managed services where they run your databases, container platforms, etc for you. Trying to run a huge Cassandra cluster or Kubernetes cluster takes up a ton of expensive labor's time and there's a good chance the cloud providers are a lot better at it than you.

Sure, cloud services tend to be really expensive, and besides cost, there's also concern about things like vendor lock-in, IP protection/data privacy, and ability to tweak the small details of your platform. But cloud platforms in 2020 have a lot more features than in 2010 and in 2030 will have even more. The direction is pretty obvious. Running your own datacenters will be about as common as running your own power plant.


> One is being able to properly provision resources. Being able to write just a function and have it consume just that tiny amount of resources rather than a whole VM is huge. Being able to spin instances up and down as you need them is huge.

A 75 million dollar price tag is also huge.

Bothering about the operational impact of a VM or a request sent to a function-as-a-service might be a significant operational issue if your whole team can be moved around with a small sports utility vehicle.

Once you've grown past the point where your monthly cloud price tag eclipses your company's payroll budget, operating your own hardware is a no-brainer.

> But cloud platforms in 2020 have a lot more features (...)

That really doesn't matter at all, does it?

I mean, cloud providers are already repackaging FLOSS services as their Serverless offering.

And besides pursuing the latest fad, how many of those features are killed off and vanish from the face of the earth?

It's always great if we can get others to do the work for us, but if we consider the absurd premium charged by cloud providers for their services... Well, those "others" doing the work can be employed by your company and you still save money.


> 75 million dollar price tag is also huge.

Is it? Over the two years in which they saved it, this is about the salary for 100 engineers. Can you replace and maintain all the cloud aspects that AWS provides you with (I mean the ones you actually use) with 100 engineers? Maybe, if they are good engineers (which is kinda implied by the 300k salary tag in the calculation). Whether it's worth it remains to be seen. Definitely nothing for any medium-sized company.
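
For what it's worth, the comparison is easy to sanity-check. A rough sketch, assuming only the numbers already in this thread (the $75M headline figure, the two-year window, 100 engineers, and the $300k salary tag); the variable names are made up for illustration and none of this is audited data:

```python
# Back-of-the-envelope check of the "$75M is about 100 engineers" comparison.
# All inputs come from the thread; none are audited figures.
SAVINGS_USD = 75_000_000   # headline savings from the article title
YEARS = 2                  # period over which the savings accrued
ENGINEERS = 100            # headcount used in the comment
ASSUMED_SALARY = 300_000   # yearly cost per engineer mentioned in the comment

# Yearly budget per engineer if the savings funded exactly 100 engineers.
budget_per_engineer = SAVINGS_USD / YEARS / ENGINEERS

# Headcount the savings would fund at the assumed $300k per year.
engineers_at_assumed_salary = SAVINGS_USD / YEARS / ASSUMED_SALARY

print(f"${budget_per_engineer:,.0f} per engineer per year")
print(f"{engineers_at_assumed_salary:.0f} engineers at $300k per year")
```

So under these assumptions the savings fund somewhere between 100 and 125 engineers, depending on what a fully loaded engineer actually costs.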

Dropbox is huge and has a relatively simple, highly optimized use-case, for which cloud perhaps doesn't offer too much. This is NOT the norm. For most companies, no matter the size, building their own cloud is a no go.

> those features are killed off and vanish from the face of the earth?

Don't use Google Cloud then ;).

> but if we consider the absurd premium charged by cloud providers for their services...

Do you have any data backing this up? This "absurd premium" includes the salaries of engineers to develop it, maintain it, do DevOps, keep the hardware/data centers, do marketing, etc. etc. There is of course a margin, these companies aren't doing it as a social service... That margin is highly variable from service to service and also between cloud providers. Some may not have a margin at all, others may run at a loss. There is no easy "uh everything is overpriced". Most companies will have a VERY hard time providing the offering at the price of large cloud providers. And the simple "back of the envelope" calculations often miss all the work & cost that needs to be done, but you don't know about...

> Well, those "others" doing the work can be employed by your company and you still save money.

Yeah, if your company is really big, then yes. If your revenue is below 100 million, there isn't even room for any discussion on this: Don't run your own cloud, it's not gonna work. Most of the "cons" I see are about misunderstandings of the offerings and failure to navigate the pricing models and picking the cheapest offerings that do the job. If you fail to do even that, how on earth are you going to run your own cloud?


>Definitely nothing for any medium sized company.

I work at a medium-sized company. Depends on who you count, but let's say around 30 devs.

Recently we basically did just this, and it's been a great success. We haven't fully migrated and still use AWS for prod, but have seen substantial savings already.

We spent $2k on servers, Dell r720s. We bought a UPS and mount, and racked them in our office. I installed OpenShift 4 on it, which is Red Hat's Kubernetes offering with a nice web GUI, and set up a few terabytes of NFS to automatically provision storage.

To be fair, installing OpenShift for the first time took a while, around 3 weeks. Since then it's been smooth. We still use AWS, but our usage has gone down dramatically. We are still only migrating dev and test environments, leaving prod in AWS (we don't want to be responsible for uptime SLAs, and clients pay prod hosting costs). Some of these projects are CPU heavy, machine learning and computer vision projects too. They're not just simple web-apps. I'm not privy to our entire AWS budget, but I know that one project which we migrated saved over $500/mo.

After installation, maintenance has taken barely any time. Around 10-20% of my time is dedicated to OpenShift cluster maintenance. The rest I do normal project work. I often go weeks without having to touch anything, and the most common task I do is onboard new users. We've had 2 outages in over 6mo, one was an expiring cert and one was an airflow issue on the rack. I've learnt a lot and am certainly not an expert. These were the first rack servers I'd ever worked with personally, although I had been researching used models for home use for a while (shoutout to /r/Homelab).

In fact, I had such success doing this that I personally bought a Dell r720 and have used it to selfhost a bunch of stuff at home. A co-worker of mine hosts his self hosted lab on AWS. Things like Plex, private photo storage, a few other toys, etc. He says he pays $300/mo, which seems insane to me, but I guess people streaming 4K plex adds up. The used r720 server I bought was $1,500CAD and has way more horsepower than he's paying for. (There are also electricity costs I haven't factored in here, as I'm trying to control for other changes in my power bill. Might be $100/mo at most.)
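
To put rough numbers on the payback time, here is a minimal sketch using only the figures quoted in this comment. The helper name `payback_months` is hypothetical, and the calculation deliberately leaves out the UPS, the three weeks of install time, and the ongoing 10-20% maintenance time, so treat it as an optimistic estimate:

```python
def payback_months(upfront_cost, monthly_cloud_bill, monthly_running_cost=0.0):
    """Months until avoided cloud spend covers the hardware purchase."""
    net_monthly_saving = monthly_cloud_bill - monthly_running_cost
    if net_monthly_saving <= 0:
        raise ValueError("no net saving, the hardware never pays for itself")
    return upfront_cost / net_monthly_saving

# Office case: $2k of servers vs. the one migrated project saving $500/mo.
office = payback_months(2_000, 500)

# Home case: $1,500 server vs. a $300/mo AWS bill, minus up to $100/mo power.
# Assumption: both amounts are treated as the same currency.
home = payback_months(1_500, 300, 100)

print(f"office: {office:.1f} months, home: {home:.1f} months")
```

With those inputs the office servers pay for themselves in about 4 months from that single project, and the home server in about 7.5 months at the worst-case electricity estimate.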


This post proves parent's point though.

You're not doing anything even remotely close to the features offered by cloud providers or even managed hosting providers.

Disaster recovery? Geographically separate redundant servers with failovers? Automated (and proven to work) backups? One-stop access control for infra maintenance? Audit controls for your database and storage objects? Tape backups?

Even today, to support all those things you need a small army of specialists. Granted, a heck of a lot of things can get away with not having any of this. But the use cases are out there, and hosting and maintaining all of that on-prem is on another level entirely.

I understand your use case, but yours is very, very far from the sheer and absolute complexity and features that enterprise data centers have.


> You're not doing anything even remotely close to the features offered by cloud providers or even managed hosting providers.

So what?

Who in their right mind believes that, say, you need to operate and maintain half a dozen types of RDBMS in three flavors along with two or four or eight different message brokers and your own convoluted infrastructure-as-code multiplied by three along with a repackaged FLOSS offering... And a ground station?

Let's not be mad, here. There are proper, full-blown, popular, global-scale cloud service providers. That. Only. Offer. VMs.

Are we so drunk with corporate kool-aid to believe that we are missing out because we are missing... What do you believe you're missing, actually?

I repeat: there are popular professional cloud service providers whose business consists of providing either VMs or access to bare metal. That's where real-world companies run their real-world businesses. Why are we supposed to believe that you need more to operate your own stuff?


You are assuming that the vast majority of shops have the capacity to impose a very limited number of technologies, and secure them through common best practices.

This is about as far from the truth as I have experienced in life.

Fortune 500 companies have an innumerable number of platforms for software, use hundreds of products from dozens of vendors, many dead long ago. Same thing with governments, at every level of scale. Telecoms? Utility providers? Medium-sized businesses who are not in tech? Specialist software that runs in a basement rack and that eventually gets moved to a datacenter and compliance requirements begin demanding all the bells and whistles I just mentioned.

Without a doubt there's a lot of gross compute power that lives on the VMs you just mentioned. But all their financial processing is probably a fraction of what some AS/400 or mainframe does in a nightly batch job, with software dating from decades ago and licensing costs running into 7 figures a year.

What you're asking for just doesn't exist. You can do what you're mentioning across, maybe, a single product line and a half-dozen teams. But even that company needs to use CRMs, ERPs, and custom stuff for which you cannot possibly define platform requirements on your own, limited, terms.

A customer that I used to admin their Unix servers on had software on IBM mainframes, IBM AS/400s, Solaris, AIX, two SCO Unix machines running some proprietary hardware control plane, a few thousand Windows machines, etc. You want a "real" ERP product? It's gonna run on Oracle or DB2, forget about Postgres. That app you made 15 years ago running on MySQL with the ISAM storage engine? Forget about ever upgrading that. Need to interact with banks? Holy smokes have I got bad news for you. You need software to interact with medical records that requires special legal compliance across multiple jurisdictions? Well, no one cares what that runs on as long as it keeps the millions rolling in.


>Disaster recovery? Geographically separate redundant servers with failovers? Automated (and proven to work) backups? One-stop access control for infra maintenance? Audit controls for your database and storage objects? Tape backups?

These are our dev+test setups, and we're looking far more carefully at prod for the reasons you touch on. Those aren't necessary for every project either, e.g. hosting computer vision demos.

For our government projects, the government hosts it on their own OpenShift cluster that they maintain (including their own data centre), due to requirements for all data to be hosted within our borders. The OpenShift cluster I set up is nowhere near as well maintained as the government's; they have multiple FTEs and it runs most of the open source gov't code. They have tape backups, rolling on-call staff, public developer chat for support, the whole deal.

What I set up is far simpler. We have daily/weekly/monthly rolling backups of Postgres pods. We store some backups of those on DigitalOcean, but that's just a cheapo little Linux server.

But now a team of 30 developers can easily spin up their own projects using a web-based GUI from basically just providing a Dockerfile or a link to a git repo. One of the oft-touted organizational benefits of "cloud" is that you don't have to wait a week for Ops to provision a VM. We get all that.

>I understand your use case, but your is very, very far from the sheer and absolute complexity and features that enterprise data centers have.

My point is that many things people host in AWS do not need enterprise quality. If you're a startup, then almost by definition you do not need enterprise quality (though, as always, it depends). We made a tonne of savings. I'm sure many others would by self-hosting and learning a moderate amount of Linux / Kubernetes.


> The problem will be finding skilled labor.

I think that overstates the problem. It does not require a whole pile of skill to purchase a few rack mount servers from Dell or Supermicro with onsite 24 hour warranty, and plonk them in a co-lo. In the rare event the hardware does break, ring Dell and ask them to fix it for you. When the onsite warranty ends in 7 years it's time to replace the servers.

The expertise required is literally minimal - not much beyond the ability to use a screwdriver to install it into the rack and know how to connect the Ethernet cables. Then you have to plug in a USB and install whatever OS you want, of course, but you don't have to be onsite to do that. They all come with iDRACs or the equivalent.

They will cost about $1000/yr with maybe 10TB of RAID disk, plus co-lo costs of around another $1000/yr for unlimited bandwidth. Renting the same dedicated metal is about $500/mo from OVH where I live, so over twice the cost.

Obviously, this is all impossible if you aren't big enough to have dedicated IT staff. And obviously, if you are likely to go through rapid change (well, something more extreme than adding a new server every now and then), it isn't the best plan. But for a stable mature business that employs several hundred people, all you are really doing is cutting out the middle man.
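
A quick sketch of the yearly comparison, using only the rough numbers from this comment. The variable names are illustrative, and staff time, spare parts, and setup effort are assumed to be zero, which flatters the owned option:

```python
MONTHS_PER_YEAR = 12

# Figures quoted in the comment above (rough, per server).
owned_server_per_year = 1_000   # hardware cost spread over its lifetime
colo_per_year = 1_000           # co-location fee with unlimited bandwidth
rented_per_month = 500          # comparable rented dedicated server

owned_total = owned_server_per_year + colo_per_year
rented_total = rented_per_month * MONTHS_PER_YEAR
ratio = rented_total / owned_total

print(f"owned: ${owned_total}/yr, rented: ${rented_total}/yr, ratio: {ratio:.1f}x")
```

On these numbers renting comes out at about three times the owned cost, which is consistent with the "over twice the cost" estimate once some allowance is made for the costs left out here.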


Basically it makes sense to set up your own infrastructure when your business IS your infrastructure.


Same as lawyering or accounting, right?

I don't do contracts regularly unless my business is contracts.

Same with taxes.

Just another form of specialization. MSPs and data center companies have been doing this since the 1990s at least, this is just the next evolution.


But it's insane that running a for-higher data center is considered a high-margin business. (And testament that the customers are VC-gorged price-unconcious baby gremlims.) In a sane economy, data centers for higher would be a fully-commoditized barely-profitable common carrier with little natural monopoly.


Reliability concerns make datacenters resilient to commoditization. A datacenter that’s available 90% of the time is worth vastly, vastly less than 90% as much as one that’s available 99.95% of the time. Commodity businesses are largely built on presumptions of linearity. Produce 90% as much corn/iron/wood/widgets as you expected and you’ll probably make something like 90% of the money you expected. Produce a 90% available datacenter and you’ll have a hard time finding anyone willing to pay you anything. And that’s just availability, not to mention data durability, which is even more critically nonlinear.
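
To make the gap concrete, availability percentages can be converted into hours of downtime per year. A minimal sketch, assuming a 365-day year and downtime measured against wall-clock time; the function name is made up for illustration:

```python
HOURS_PER_YEAR = 365 * 24  # 8760, ignoring leap years

def downtime_hours_per_year(availability_percent):
    """Hours per year a service is down at the given availability."""
    return HOURS_PER_YEAR * (100 - availability_percent) / 100

for pct in (90, 99.95):
    hours = downtime_hours_per_year(pct)
    print(f"{pct}% available -> {hours:.2f} hours down per year")
```

That works out to roughly 876 hours (over 36 days) of downtime a year at 90% versus a little over 4 hours at 99.95%, a difference of about 200x in downtime for a 10-point difference in the headline number.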


> A datacenter that’s available 90% of the time is worth vastly, vastly less than 90% as much as one that’s available 99.95% of the time.

That depends on your workload and the nature of the loss of availability. Is the completed work still there but just unreachable for an hour every few months? That might be okay for some folks.

Is your data center always available mon-fri but constantly has scheduled downtime on weekends? Might still be okay.


Then you should be able to sue your provider for breaches in SLA.

0.5% downtime should void the bill.

I think presently the providers are getting all the upside. High margin, perpetual lock-in, and no consequences.


You can sue your provider for breaches in SLA. The rebate you automatically get likely exceeds what you'd otherwise recover. (Which is why they grant one). SLA payouts are usually very generous - I've gotten credit for the entire month for a one-hour outage of a service.

I don't know where this idea that cloud providers and other DCs don't pay out on SLAs comes from, but they absolutely do.


>But it's insane that running a for-higher[sic] data center is considered a high-margin business. (And testament that the customers are VC-gorged price-unconcious baby gremlims.)

Your analysis is incomplete which is why it looks like insane high-profit margins.

Amazon AWS (and MS Azure, Google Cloud) also sell high-value services on top of raw datacenters. It's not just commodity rack servers. Amazon keeps iterating on new value-added services (e.g. see new announcements at annual AWS re:invent conference[1]). E.g. AWS DynamoDB service was announced in 2012 and Netflix is one of the customers that use it.

In contrast, other datacenter companies that don't have the same higher value-added portfolio like Rackspace and DigitalOcean are losing money[2] or not even profitable yet[3]. Yes, the lower-tier datacenters are also adding value-added services but the breadth of their product portfolio is not in the same league as AWS/Azure/GCP.

Rackspace was losing so much business to AWS that they're trying to sell the idea of customers paying their RS employees to manage AWS.[4]

>(And testament that the customers are VC-gorged price-unconcious baby gremlims.)

Most of the revenue comes from non-VC businesses. A lot of old Fortune 1000 companies where IT is a cost center shrank their self-run datacenters and moved the workload to the cloud vendors. Another example is AWS winning the big $600 million contract from the CIA.

[1] https://www.youtube.com/results?search_query=amazon+re%3Ainv...

[2] https://www.google.com/search?q=rackspace+%22net+loss%22

[3] https://www.sdxcentral.com/articles/news/digitalocean-inhale...

[4] https://www.rackspace.com/managed-aws


I only recently began using cloud services. I started with Firebase, thought it was super cool, then moved to GCP, which was super inflexible (and super shit support). So I decided to try Azure, since we used that at a previous startup, but it was too complicated to even get started. So I decided to give Jeff my money (or rather avail of his free tier), and started using AWS. While it's super complicated to use compared to Firebase, it was relatively easy to learn from scratch due to the huge amount of support online and what they have on offer. But what honestly stumped me was the HUUUUUUUGE number of services they have, and the pricing on them. DynamoDB was half the price of Cloud Firestore, the free tier on most services was half that of GCP, and the offerings were insane. Need to host a website? CloudFront. Launch an app? Lightsail. Satellite connection? Covered. Streaming Data? Kinesis. Queried Database? Elasticsearch. And that's just scratching the surface, I know.

Honestly, in a few years, I think we'll see Google forced to exit this space, Azure and AWS competing on price for big name corporate contracts, while others such as Digital Ocean being crumpled, simply because the AWS offering is so vast and widely supported online. And AWS' lock-in is pretty damn good.

The only solution to break out of such a duopoly would be for tech oriented companies to stop being lazy and start building out their own cloud infra.


> The only solution to break out of such a duopoly would be for tech oriented companies to stop being lazy and start building out their own cloud infra.

What competitive advantage do you get by making your own in-house, inferior version of an infrastructure service that won't benefit from AWS's economies of scale?


Not being overly dependent on an outside provider? AFAIK, most traditional corporates using clouds often use multiple cloud services of the same type from different providers.


Not getting bled dry by Jeff?


I'm pretty sure the main "value" those services "add" is resume padding.... :/


Not nitpicking but I had trouble understanding one thing: for higher should be spelled for hire I guess


Sorry! Wish I could edit but it's too late.


Some of it is becoming more democratized too. Facebook, as an example, open sources its datacenter design, its servers, switches etc. I know they are the big bad for privacy these days, but the OCP project has contributed state of the art datacenter tech that legacy providers should be jumping at.


There is a lot of spectrum in the middle between an AWS micro and your own datacenter(s). One doesn't have to jump from the cloud to a datacenter - just rent/buy a server or two, or a rack or two of servers.


Sure, but the management overhead from 1 server to 20 racks scales, at best, linearly. I've seen a lot of places just past the point where a single sys-ad person with a thumb drive is viable, yet they choose to not invest here and instead spend 2-5x on going to the cloud.


> all but the largest or most stubborn companies run their own datacenters are coming to a close

You're right about the stubborn part, but not necessarily largest. Many IT leaders at companies of all sizes have their political capital tied up in the data center. What you're talking about doing (with cloud adoption) is outsourcing 90% of what they control. For them it's existential, uptime and agility be damned.

Serverless is even worse for them, as far as IT fiefdoms are concerned.


You say "stubborn" but when the savings can be in the millions of dollars, it becomes logical to own a datacenter. I can't believe datacenter people suddenly all disappeared or are retired. If the pay scale is down, they will surely appreciate working for a big company with datacenter needs.

Obviously, this only applies to huge companies like Dropbox. Everyone else is better served by AWS.


You can rent racks of servers from a dedicated hosting provider and spend 10X less than AWS. You don't need to do your own wiring and HVAC and shit.


This is explicitly what I’m calling out at the end. They didn’t all disappear or retire; it’s an ongoing process. Think COBOL programmers.


Ha!

I was also thinking about COBOL programmers when writing my comment.


They aren’t shrinking at all, we have the same talent pool and are trying to thinly spread it over all of the problems we can solve with networked computers. The number of problems you can solve with networked computers has scaled way faster than our industry's capability to find talent and train employees.


Who is “we”? Always curious what remains of my field—I’ve worked in the storage and infrastructure industry for over 20 years. If you mean the industry as a whole, I simply disagree. The talent pipeline is not there anymore. You used to be able to get started without an Engineering degree by “getting in at the bottom.” Now somebody on this path will never get past power supply swaps because they’re just doing drudge work fed to them on a tablet (“replace disk 17 on row 23, rack 9, slot 4”—“oh, yup, there’s the yellow light”). If you do have the degree, there are a billion more profitable paths, and most take them.


We is engineers who work in datacenters. Colocation datacenters have existed for several decades and remote hands working there would probably not see much difference between getting a call/fax/email/app telling them to do a break fix order. Having worked in dozens of datacenters over the past two decades, my experience is that there will always be newer semi-technical workers there doing break fix. A lot of those people would consider those jobs dead ends too and so this cloud datacenter thing is nothing new, it's just on a much larger scale. The datacenter operators don't have to keep working there to level up their careers but working in a datacenter gives familiarity and the confidence to try more technical roles later.


If you think you can manage cloud deployments without skilled labour with cloud expertise, you are in for a surprise.


Dropbox is in a relatively specific situation wherein their cloud costs would be extremely high (storage, bandwidth) and their in-house technical skills are probably quite good.


Second point is spot on. Not many companies have the engineering capabilities. Dropbox has so much potential, but their business vision wasn't great to say the least.


Dropbox is itself a cloud provider. A consumer/SMB cloud provider. It makes total sense they build up their own infrastructure, not only from a purely profit/scale point of view.

It's about their core values and identity, at least because in this way they are seen in the market as a big player and not just as another AWS reseller with some added benefits.


There's a lot of companies that have the scale to benefit from running their own tech, fewer who have the skills and number of people, but it's not a small niche even so.


CrowdStrike is building its own servers to move off from AWS as well.


Netflix is a counterpoint? A good fraction (though not majority probably) of their cost is probably bandwidth and computation yet they continue to offload that to a cloud provider, who's a direct competitor no less.


Netflix plays both sides rather strongly, e.g. they run some of their hardware in your isp's buildings: https://openconnect.netflix.com/en/

By some measures, that's much more extreme than running your own datacenter.


netflix does not serve their bits over amazon. they have their own major infra, and operate a large and distributed cdn, including edge caches colocated within eyeball networks.

it would be insane for them to deliver their video content over aws.


Netflix does run their own CDN for video content, presumably to avoid that exact problem. I would imagine that's by far the most expensive part of their business.


IT infrastructure costs scale down very badly.

It's not to offset risks, if you have very little load (a site with 10k visits a day, for example), you can share the costs on the cloud and save nearly the entire bill.

Companies with a high revenue/load ratio tend to stay in the cloud even after they get big. That is because even though the cloud is very expensive for their needs, it adds speed to their internal processes by saving the time to decide on and buy equipment. But when that ratio is small, they just can't afford it.


There are not a lot of companies that are that big.

You also might not want to manage that many IT experts for your infrastructure, or you are not able to get them.

Also, if your company's product is very technical, I would argue that you are much better equipped to do it yourself than others are.

Nonetheless, it also doesn't need to be all or nothing. You can easily combine a MultiCloud approach.

Build yourself only the stuff which is easy to build and costs a lot on cloud. I would say build systems or compute instances are good candidates.

For example, I could imagine putting Netflix's authentication system on a cloud provider while doing the compute stuff in my own data center and building the CDN myself.


> Nonetheless, it also doesn't need to be all or nothing. You can easily combine a MultiCloud approach.

There may be reasons to go multicloud but ease isn’t one of them. You double your infra support overhead (or more likely, halve its quality) and have a “least common denominator” experience.

The natural tendency of large organizations is a diffusion of investment, but the cheapest costs frequently come from a concentration of investment.


The bigger you are, the bigger the differences between teams and products and projects.

You can leverage the high quality network infrastructure from Google while using your own DC for Compute Heavy Load.

Use Azure for your Windows specific workloads.

Go with AliCloud in China.

You need to be big enough so that running it yourself is doable with a certain amount of quality. Which does imply many teams and workloads.


My employer does have a luxury of focus in its product offering, though we do have a moderately heterogeneous approach in development, certainly compared to many of the peers that operate at similar scale.

Heterogeneity in compute location has a multiplicative effect on accounting, security, capacity management, network management and is dilutive in terms of expertise -- instead of being able to justify the world's leading experts in one system, you now need more staffing to cover a wider surface area (and they all need to have collaboration overhead to ensure they aren't working at cross-purposes in strategy or tactic.)

I think this belief in marginal benefit from "right tool for the job" is a local-optimization where the costs of coordination and overhead are not borne locally and so are generally undervalued/discounted.

My employer runs on a single cloud provider, but -- due to its scale and closeness to core competency of our business -- we do operate our own CDN infrastructure, and this is a decision I happen to agree with. As a result of this division, I am acutely aware of the impact it can have on an engineering organization and only in certain specialized use-cases would advise considering DIY or multi-cloud.


You also need to be on MultiCloud if you do not operate stuff yourself, so that you are in a better negotiation position.

Or so that you are not dependent on only one.


I hear this sentiment repeated frequently, but I’ve never heard multicloud as leverage actually getting a better deal than an exclusivity deal. If you have a different experience I’d love to connect and learn more - email in my profile.


There are lots of entities that are big enough. Given that the cloud stacks change from time to time, you don't reduce your need for engineers and other SMEs -- in some cases you need more.

I would say, as someone who supports lots and lots of apps, that cloud services are usually financial winners from a SaaS perspective and in a rapid growth scenario. Nobody can deliver Exchange cheaper than Microsoft. My team stood up apps for covid related activity for 20-40% of the cost and, more importantly, time than services under our organization's control.

That said, for what I would call "base load" scenarios, in many scenarios it's exactly the opposite.


> I think the main advantage of cloud providers is to offset the risk of purchasing equipment that eventually is no longer needed,

Every corporate use case I have seen is labor based. They don't want the overhead of salary and healthcare for the IT department. Even if long term they end up paying more, they always view it as pay for it now or pay for it later. And they always choose later because they don't know better.

(none of these have been the scale of dropbox, that is different)


> And they always choose later because they don't know better.

This is a dangerous assumption to make. Delaying payments and going with the crowd are both safe decisions. Safe decisions are smart decisions under normal circumstances.


True, there is not one answer for everyone. I just know I've been a part of companies where we go IaaS, get rid of the people that have the knowledge to manage infrastructure, and when it comes time to need the knowledge again we either need to get consultants or hire at a much more expensive rate, because the infrastructure management talent pool is getting smaller every day. To be fair, I deal with the headaches more than the "everything is fine and dandy" cases, so my views are skewed as a result.


Especially when IT is not your strategic differentiator.


Why do people buy coffee from Starbucks?

Is it because Starbucks has the most efficient cost outlay for long term investment in a user's needs?

Or is it because people just want some damn coffee and there's one on every corner?

Or is it because, in a world full of places to buy coffee, one place gives you everything you could ever dream of in a coffee place?

That is what the big cloud providers are. They are Starbucks. They are not the cheapest. They aren't even the best. But they are everything you want.

If your company gets Starbucks-huge, you don't need to buy your coffee from Starbucks. You have your own deals with roasters and your own supply chains and baristas and coffee logistics experts.

That's why some companies build their own. Not because it's a better idea (it's not), or because Starbucks costs too much (compared to the investment in re-creating Starbucks?). It's because they are a business that effectively makes their own coffee already, so it makes no sense to pay Starbucks for it. Of course, they won't have a Starbucks once they make their own coffee, but they will have served their needs well enough.


This is not a great analogy. Sure, people go to Starbucks for one-off coffees and sure, some do that exclusively. But almost every company, from startup to enterprise, has an espresso machine in the kitchen.

So in most cases, if I want a coffee I don't go to Starbucks. I go to the kitchen; it's faster, cheaper, and more convenient than going to Starbucks.

However, if I'm out and about it doesn't make sense for me to invest in temporary coffee infrastructure. In those cases it's easier to go to a cafe like Starbucks.

This matches the idea of the article. If you have consistent demand it makes sense to buy infrastructure to meet that demand. But if the project is temporary or extremely bursty it may make sense to have someone else do it for you.


I wonder if Dropbox is an outlier.

Their product is extremely close to basically just reselling storage space. Of course it makes sense for them to build their own infrastructure.

For a company whose product is a saas application (business logic in code) with users spending hundreds or thousands of dollars per month, those cost savings may never materialize relative to the amount of infrastructure each customer is using per dollar.

Dropbox is essentially buying a barrel of gasoline and selling it in gallons. Their product can never be profusely more valuable than the underlying infrastructure.

As I recall there are giants like Netflix that still run on AWS...which brings up another point! If you’re large enough to consider your own data center, you’re large enough to negotiate contracts with cloud providers at below-retail rates.


> My biggest question is whether cloud providers could achieve a scale where they are able to offer the most optimal infrastructure costs for specific businesses. Maybe this is the case for smaller or mid-size companies, but I'd be interested to see where the inflection point lies.

Disclosure: I work at Microsoft on Azure, but I’m on the product/dev tool side not on infra.

I think this is already happening to a certain extent and will happen more in more verticals as time goes on. There are massive government use cases for the cloud and it isn’t as if governments and agencies haven’t been maintaining their own datacenters and servers before. Clouds optimized for healthcare are also a thing and are only becoming bigger — again, industries that have long maintained their own infra.

You also have the private cloud model, which OpenStack pioneered but Azure Stack and AWS Outpost have put their own spin on, which essentially lets you host specific cloud services and tools on your own infrastructure.

There are always going to be some businesses that reach a size and scale where it doesn’t make sense to offload to the cloud, where paying for people to do maintenance and support, build out monitoring, handle everything soup to nuts makes sense. I think Dropbox, which is a storage provider, is a key example of that.

I talked with the then CTO of Dropbox right after it finished moving from AWS to its own datacenters and the process was extraordinary and really impressive. For what Dropbox is doing, it makes sense that it owns and operates its own infrastructure and storage and tooling.

Of course, you can also have the inverse. Zynga famously moved off AWS as its demand peaked and it saw the cost savings, and then had to move back to it, after demand died down and the numbers of owning and maintaining its own infrastructure no longer made sense.

Netflix has moved much of its stuff in-house, but still relies on AWS and likely will for quite some time.

But on the whole, yes, I absolutely see cloud providers moving to offer specific business and business vertical centric solutions with pricing that is lower than what those businesses could achieve on their own, even if you take some of the “services” stuff out of it and are just looking at raw infrastructure costs.


> it should be easier to migrate off these platforms, since their pricing is optimized for users of all business sizes and use cases

Aren't the financial implications of capex vs opex a huge consideration? I've heard that opex is a lot simpler to account for. Technically once you get big enough cloud becomes more expensive. But hiring people to manage both your own datacenters and cloud services does complicate things. I find it understandable that companies are willing to pay more for cloud providers if their core business doesn't require expertise in cloud computing.


all clouds are not equal either..

colo'd datacenter or 'off brand' cloud provider can easily be much cheaper than the big 3, still get you out of 'dealing with hardware' and either way you are still 'setting up infrastructure' in terms of developing software management tools for your system


> My biggest question is whether cloud providers could achieve a scale where they are able to offer the most optimal infrastructure costs for specific businesses.

IMO, this has happened. AWS GovCloud. Since the USGov has near unlimited spending power, it's better for integrators to just pass the costs along. Compared to certifying your own infrastructure, this will probably be much cheaper for most everybody.


Just as a reminder: You are not Dropbox. You personally need to run the numbers with both scenarios. Depending on your use case, it may be cheaper to be in AWS, and it may be cheaper to be co-located. If you are as big as Dropbox, it may be cheaper to build your own data center.

It reminds me of the arguments for and against K8s. Most of the discussion is based on use case, and not considering the solution space for all solutions. E.g. if I'm running an Erlang solution in K8s, or have a small number of servers, I may have better options.

It isn't only about scale. It takes understanding your system and your needs in depth, and any short cuts lead to cost inefficiencies.


>You are not Dropbox

That may actually work the opposite way. The big guys can negotiate pricing with the cloud providers to the point where they may be running at close to cost. It's the small and mid tiers that get hosed with cloud pricing.

>You personally need to run the numbers with both scenarios. Depending on your use case, it may be cheaper to be in AWS

I'm not sure about that. Certainly cloud infrastructure provides a level of flexibility which may be advantageous if you're going through high rate of growth. Outside of that, I'm not sure you'd see any cost savings.


In fact it is easier to get bigger discounts if you are a mid-size customer than if you are a big customer. If you are a very big customer, then any discount given to you will directly affect the cloud business's bottom line.

A public cloud inherently/structurally has more costs because it is built to be truly multi-tenant and has lots of elastic capacity.

A single tenant private cloud can cut out a lot of bells and whistles and can be tailored to the needs of the company's private use-case – this brings down the cost significantly.

If your compute/storage bill is greater than $10 million per year, then it is highly likely that for the same money you will get much higher compute/storage capacity in a private cloud. And that gives you more headroom for growth/elasticity etc.

If your bill is greater than $30 million per year, then you will likely save significantly as well as get other strategic advantages – especially if your infra scale continues to grow over next 2-5 years.


Here's another way that you're not Dropbox (unless you are): you are not a public company that gets points from investors for removing [$35M] dollars from COGS (cost of goods sold), even if you spend [$40M] on R&D + capitalized expenditures to replace it.

Stories like this can be really misleading because large companies almost can't help but trick themselves into financial shenanigans. It's very possible that Dropbox just spent $100M upfront to start saving $35M / year relative to AWS current prices... but discovers in year 3 that they are no longer saving money because AWS has reduced prices via their own R&D, which is spread over a massively larger customer base.

Now, because Dropbox is a storage company, it may well make sense for them to continue R&D and try to keep pace with AWS, at least for their own needs. But 99% of other companies - of any size - would likely fall on the wrong side of the scenario above.
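
A rough sketch of that scenario, for anyone who wants to play with the numbers. The $100M upfront and $35M/year figures are the hypothetical ones from the comment above; the geometric `annual_decay` (savings shrinking by a fixed fraction each year as the cloud provider cuts prices) is purely an assumed model, not anything Dropbox or AWS has published.

```python
from typing import Optional

def payback_year(upfront: float, first_year_savings: float,
                 annual_decay: float = 0.0, horizon: int = 30) -> Optional[int]:
    """First year in which cumulative savings cover the upfront cost.

    annual_decay is the assumed fraction by which yearly savings shrink
    (e.g. because the cloud provider keeps lowering prices).
    Returns None if the cost is not recovered within `horizon` years.
    """
    cumulative = 0.0
    savings = first_year_savings
    for year in range(1, horizon + 1):
        cumulative += savings
        if cumulative >= upfront:
            return year
        savings *= (1 - annual_decay)
    return None

# Figures in $M, taken from the hypothetical above.
print(payback_year(100, 35))                    # savings never erode
print(payback_year(100, 35, annual_decay=0.3))  # savings erode 30% per year
print(payback_year(100, 35, annual_decay=0.4))  # savings erode 40% per year
```

With constant savings the investment pays back in year 3; with a 30% yearly erosion it takes until year 6; at 40% it never pays back under this model, since total savings converge below the upfront cost.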


Why must Amazon pass off its savings to your org, in this example, rather than taking most of those realized R&D savings to their own investors?

Sure, AWS may be constantly innovating on doing things more cheaply and scalably. Sometimes, those savings get passed off to the customer; but not most of it, and not necessarily all that frequently.


Check out these curves: https://www.stayclassyinternet.com/articles/investigating-AW...

Amazon competes with other cloud providers and with build-your-own continually, and uses price segmentation to stay on the right side of the equation for as many people as possible, while still skimming as much premium for themselves from each user group. Historically, that’s meant big price drops in most categories that would make most one-off investments obsolete - while still keeping a ton for themselves.

It also helps that S3 is an anchor product that helps them sell higher margin stuff.


I was at an F500 that spent a lot of money with AWS. I heard rumors that services were becoming more expensive. If you are spending a ton of money, moving off of AWS is a major project that will require a lot of time, money, and personnel. You are kind of locked into AWS unless you convince senior management you should move off.


Yes but no. The figure you reference shows the lowest price of the lowest tier available. I am pretty sure that if you cannot switch from S3 with full availability to S3 Glacier, your savings will not be SO impressive as that fall-off-the-cliff figure.


The figure shows full-availability S3 at multiple price tiers depending on volume. The article discusses that price reductions were biggest for high volume -- in other words, AWS reduced price for exactly the customers who are most equipped to build-not-buy. Smaller use-cases won't see prices go down as much -- but they also won't get paid back as quickly for rolling their own.


> ... and not necessarily all that frequently.

Did you miss where AWS reduces prices multiple times a year? I don't know the latest figure, but as of 2018, they reduced prices 67 times since launch in 2006.


There have been 6 price reductions this year https://aws.amazon.com/blogs/aws/category/price-reduction/ you are probably not using any of the services in question.

The only S3 price reduction I can find was in 2012 https://aws.amazon.com/blogs/aws/amazon-s3-price-reduction/


(i work at aws)

fwiw, a quick google search for `s3 price reduction` yields one from 2016: https://aws.amazon.com/blogs/aws/aws-storage-update-s3-glaci...

there's also one-zone s3 storage options which are cheaper as well


Because Google and Microsoft are competing against them.


I want to see the kind of R&D Amazon has to do to bring down their insane egress prices ;)


They probably don't want the high egress business, anyway. Besides the whole "keep data in". They have a CDN for those businesses that need it and everything else that needs a lot of egress, they probably don't want as business.


Agreed. I work at a company that does 150 Billion transactions a day and it is cheaper being in the cloud than on prem for us. Maybe the variable cost is more but the fixed cost is a lot less for the business.


Cloud vs on prem is a ridiculous comparison. The sweet spot is almost always in the middle.


> ...and it may be cheaper to be co-located. If you are as big as Dropbox, it may be cheaper to build your own data center.

Nothing I have seen indicates dropbox "built its own datacenter." It looks like they are in Equinix, CoreSite, and DRT colocation.


You're right - I misspoke, and that likely makes more sense at their scale. At some point, I am sure they will again run the numbers.


Cloud is taking a taxi everywhere, bare metal is owning your own car.

In one case, you pay a premium so that somebody else worries about all details and maintenance, and in the other, you take on that burden but don't pay the premium. This has always been true for everything where a service is provided to you.

Great if your needs are small or not well known because there is no initial investment, but bad in the long run.


Not a bad analogy, but the taxi also comes with a driver, which makes it expensive but allows you to get work done in the back while you're en route. It'd be like a cloud provider that came with a devops person dedicated to you.


Except when the driver decides to take a couple of days off and you can’t get anywhere, or their car breaks down for a day and they say it’s still within SLO because the radio still worked (looking at you GCP)


Except in this analogy, a taxi breaking can be quickly replaced with another taxi, the issue is when the whole taxi network goes down which is more rare. If you had your own car, if the machine went down, it would be a lot more work to fix it yourself.


True but i can buy more cars/parts and still save money. Also it’s only rare if you judge by status page


Sure, but if you are using the taxi often enough it may be cheaper to hire a full time driver.


Assuming you are qualified to hire drivers. And know where to find a good one. This is what people forget with the cloud. That you're paying for their expertise.


Or you could hire off them?


A company could hire their own janitor, buy their own janitorial supplies and save money.

Or they can go through a 3rd party consultant to take care of it so they can focus their cycles on more important issues.

Not everyone is an infrastructure provider. Time is a limited commodity, and there are clear benefits of spending as much of your time on the core business.


I work for a company that does about 100M revenue per year. We run everything on prem. We did the numbers: if we moved to AWS, it would be almost 10x our current spend. Again, it depends on your business; ours is complex and compute intensive. But at scale, on-prem always wins.


Different numbers, but almost the exact same 10x spend difference moving all to AWS. We are data/compute heavy with low bandwidth use. We still use Lambda for bursty traffic if it works for the use case.


I work for a 100M rev/year company as well and we used to run our ecommerce site on-prem as well. We knew of the increase to cost for moving our infrastructure to AWS but we're happy to bite that bullet for the vast array of services we now leverage in AWS as well as faster site response times and increased reliability.

Just saying spend isn't the only factor here.


The lie of cloud services is ephemerality and elasticity. We convince ourselves that this hacked together solution is not actually going to run for years, or that we need instant scale 50x our current capacity. For most, the reality is that resources turn on and never turn off. This is why so much of the cost reduction conversation revolves around RI purchases.

The real secret is that compute isn't the driving factor in cost - it's bandwidth. I've had too many conversations where people are congratulating themselves on their $0.027/GB PPA only to explain that they are paying effectively $8.00/Mbps for a network link. The real threshold is network consumption, and when you cross it, figuring out a hybrid strategy is the best path.
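
For anyone wanting to check the per-GB to per-Mbps conversion, here is a minimal sketch. It assumes a 30-day month, decimal units (1 GB = 10^9 bytes) and a link that is fully utilized the whole time unless `utilization` says otherwise; the function names are made up for illustration.

```python
SECONDS_PER_MONTH = 30 * 24 * 3600  # assumption: 30-day month

def gb_per_mbps_month(utilization: float = 1.0) -> float:
    """Gigabytes moved in one month by a 1 Mbps link at the given average utilization."""
    bits = 1_000_000 * SECONDS_PER_MONTH * utilization
    return bits / 8 / 1e9  # bits -> bytes -> decimal gigabytes

def per_gb_to_per_mbps(price_per_gb: float, utilization: float = 1.0) -> float:
    """Monthly cost per sustained Mbps implied by a per-GB transfer price."""
    return price_per_gb * gb_per_mbps_month(utilization)

print(gb_per_mbps_month())                  # GB per Mbps-month at 100% utilization
print(round(per_gb_to_per_mbps(0.027), 2))  # implied $/Mbps per month at $0.027/GB
```

A saturated 1 Mbps link moves 324 GB in a 30-day month, so $0.027/GB works out to roughly $8.75 per Mbps per month, in the same ballpark as the $8.00/Mbps quoted above.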


So they saved next to nothing? Their yearly operating expenses are in the order of $1.3B, which doesn't even include cost of revenue. $75M savings versus $2600M operating expenses over those 2 years comes out to about 2.8% savings.
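
The arithmetic behind that percentage, as a quick sketch using only the figures quoted in this comment ($75M saved, roughly $1.3B of operating expenses per year, two years); the function name is just for illustration.

```python
def savings_share(savings: float, annual_opex: float, years: int) -> float:
    """Savings as a fraction of total operating expenses over the period."""
    return savings / (annual_opex * years)

share = savings_share(75e6, 1.3e9, 2)
print(f"{share:.1%}")
```

This prints 2.9% when rounded to one decimal place, which the comment truncates to about 2.8%.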


> So they saved next to nothing? Their yearly operating expenses are in the order of $1.3B, which doesn't even include cost of revenue.

Wow thank god some of y'all aren't running these businesses.

It still surprises me every time I see failures to think marginally.


~3% boost to margins is very welcome for a cloud company (would be amazing for a retailer). But what about the risks? What if you need to scale down the operation? What if your hardware team ends up costing more? In particular, the R&D department (cloud providers depreciate hardware over 2-3 years, which matches the progress of technology and the speed at which things become obsolete). $35M per year only pays about 50-100 salaries. What if your setup turns out not as good as expected? You can't easily jump ship once you sink in the capex. And I don't think that already big (in terms of revenue) startups should be adding much to their risk profile.

It could also be that there's just no other way to cut anything from their enormous opex (for example fire people or buy less ads). I find this hard to believe but may be true.

One very good strategic reason to do this, I think, would be to gain independence from cloud providers, who unfortunately happen to be your major competitors.


> One very good strategic reason to do this, I think, would be to gain independence from cloud providers, who unfortunately happen to be your major competitors.

All of the questions you asked before (risks, capex, etc.) would have been answered during an assessment period. It's almost as if you assumed that none of that was done and that their sys admin group just threw together a proposal to move to a colo and their executive management made a decision to do so. That's not how this works.

The number one priority of this type of initiative was cost and reliability. Going exclusively cloud is not a silver bullet for every company - that trope needs to die.


That's assuming the numbers shown fully take into account all the tangential cost related to the transition, and not just "cost before vs cost after". If you spent 500M in developer hours doing the migration, that just wiped like 10 years worth of savings. Also it's not clear if in 2 years you really get to see the cost of maintenance and breakages. One big breakage could wipe off that 75M very quickly. Of course Cloud also has its fair share of down time, so you'd have to compare longer term to get a better sense.


> that just wiped like 10 years worth of savings.

And then the next 10 years you get pure profit? That's still an ROI and significant one.

> One big breakage could wipe off that 75M very quickly.

What specifically would cause a $75M loss? As you said, cloud is not bullet proof.


> $75M savings versus $2600M operating expenses over those 2 years comes out to about 2.8% savings

2.8% of a big number is still a bigger number than 2.8% of a small number. Not sure what point you're trying to make here...


Opportunity cost plus the risks of this migration.


That's objectively still a lot of money that can pay for a lot of things.


Given 2323 employees in 2019[1], that's an average of a $32k bonus if paid out to the employees.

[1] https://en.wikipedia.org/wiki/Dropbox_(service)
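
Spelled out as a quick sketch, using the $75M two-year total from the article and the 2323 headcount cited above (so this is a one-off amount for the whole period, not a yearly figure):

```python
def per_employee(total_savings: float, employees: int) -> float:
    """Even split of the total savings across all employees."""
    return total_savings / employees

print(round(per_employee(75_000_000, 2323)))
```

That comes to about $32,286 per person.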


Even if you want to argue they saved next to nothing (which is silly to say, $75m is $75m).. they also have CONTROL over what they are doing and are not dependent on other services. That is a huge advantage.

So even if it were a wash financially, it's still a huge plus.


What's their margin? If they have any competition at all, 2.8% savings could be fat.


If you see it from the perspective of being able to retain staff or hire new employees, it definitely seems worth it.


Except now you have to hire a staff to maintain and manage hardware, batteries, generators, buildings, security?


Yup, exactly. What's your new R&D spend going to be going forward, now that you have to maintain an edge in so many new areas? It really is a mediocre saving for a company this big.


Or you can just use a regular colo company and not even think about batteries, generators, buildings, security. Hell they'll even rack your gear for you.


No. That is factored into the new costs, which with all of those things, plus initial / ongoing hardware buys, saved that $75M.


In the long run "small" improvements add up. The margin will never beat time.


We are a startup. We decided to host our full stack on self-managed VPSes running Debian. We started at one provider, but because we have everything (except the hardware) under our own supervision, we were able to move to another provider (Hetzner) in just a few days. Moving away to another provider wouldn't be an issue. I believe the key is to not use any of the proprietary services offered by all these big players. We currently pay around 1000 dollars a month for about 15 servers. The same thing at AWS would cost at least 1 week to figure out the pricing model and probably six phone calls with sales people trying to lock you in. Then in the end I am sure you would pay much much more.


The thing is that if you don't use any of the proprietary services offered by the bigger players they look a lot less appealing. AWS with just EC2 is one expensive-ass server provider with murderous rates for egress traffic.


Really depends on what you are trying to do.

I work in a company that is currently trying to move over to AWS from on prem servers.

Why? Not for AWS proprietary software. That's completely an afterthought (In fact, the first round is likely going to be those expensive ass EC2 instances).

The real reason we are looking at moving to AWS is that they have data centers across the globe and we don't. So even though EC2 is expensive, it's far less expensive than trying to navigate the waters of setting up a datacenter in France, China, and India.


What size are the servers?


Yeah 1000 bucks pm for 15 servers... should be quite decently specced


On the other hand it's very easy to not include certain costs in this project to make it seem like a success. Did they really negotiate with AWS, Google Cloud and evaluate the project against those rates? Did they include all the costs associated with this? How about the opportunity cost of not focusing on their product. Whenever I read posts like this I always doubt how accurate they are.


Dropbox is a unique case where the majority of the cost is storage/bandwidth for one product/team.

If you have a large organization where cost is divided among 100s of teams and products, cloud (vendor/managed) will be beneficial in removing a lot of bureaucracy.


Not sure how you reached that conclusion. At an org where every sub-org has their own junk in AWS a new and uniquely hideous type of bureaucracy springs into existence in order to manage/dictate which VPC is allowed to talk to which other VPC and how.


Is it really because of a unique business model or the obscene markups on bandwidth and storage?


A good compromise halfway is to have your own equipment in colocation, at a facility where your IP transit upstreams are <1ms away from the edge of the AWS and Azure ASNs. You might still have an appreciable amount of stuff hosted on AWS, Azure, GCP or similar.

Depending on your scale, you might even be able to peer with them directly.

Or, if you are big enough to justify it you can order direct 10Gbps cross connects within a number of facilities to AWS.


> <1ms away from the edge of the AWS and Azure ASNs

So, essentially, in the same building or across the street? May be very tricky to pull off.


Not really difficult, colocation is available in the same buildings as the largest IX points... Places like One Wilshire, 350 E. Cermak, NAP of the Americas, 60 Hudson, etc. It definitely has a premium price.

If you can extend your horizon of view out to buildings that are in the same metro area but less than 10, 15 or 20 km of (OTDR measured) fiber from the IX point, the performance is very nearly the same, and there's a lot more options. One example would be the datacenters in Tukwila, WA and their relationship to IX points in downtown Seattle.
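A rough way to sanity-check the fiber-distance claim above is to estimate propagation delay directly. This is only a sketch: it assumes a refractive index of about 1.47 for single-mode fiber (a typical figure, not something stated in the comment) and ignores switching, queuing, and transceiver delays.

```python
# Rough propagation delay over metro fiber.
# Assumption: refractive index of ~1.47 for standard single-mode fiber.
SPEED_OF_LIGHT_KM_S = 299_792.458
FIBER_REFRACTIVE_INDEX = 1.47  # assumed typical value, not from the thread


def one_way_delay_us(fiber_km: float) -> float:
    """One-way propagation delay in microseconds for a given fiber length."""
    speed_in_fiber_km_s = SPEED_OF_LIGHT_KM_S / FIBER_REFRACTIVE_INDEX
    return fiber_km / speed_in_fiber_km_s * 1_000_000


for km in (10, 15, 20):
    round_trip_us = 2 * one_way_delay_us(km)
    print(f"{km} km of fiber: ~{round_trip_us:.0f} us round trip")
```

Under these assumptions even 20 km of fiber adds only a fraction of a millisecond of round-trip propagation delay, which is consistent with the "very nearly the same" performance described above.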


There's a tipping point in growing software companies where the primary cost goes from Software Developers to infrastructure, and when that tipping point gets hit it swings towards infrastructure costs almost exponentially. The reason is simple: computers beget more computers at a much higher rate than Software Developers beget more Software Developers. Doubly or triply so when you're in the business of storage like Dropbox is.

At some point it might be interesting to calculate the size of your company in terms of how many racks you have and how many employees you have. These days a fully loaded rack costs a lot more than your typical Software Developer salary.


I often wondered why we aren't seeing more hybrid solutions, base systems running on minimal owned hardware with cloud options for peak-load and DR.

I guess part of the problem with that is the cost of maintaining different platforms, but there are ways around that.


I can see that more for mid- or large-sized enterprises...If i was such an org, and already had some investment in infrastructure, then hybrid is what i would look at first. Not sure that i hear that so often though. Mostly, what i hear about is, "let's go to the cloud!". Oh well.


I remember GitLab blogged that they were going to use bare metal instead of Azure, with some numbers showing the benefit of doing so. AFAIK they quickly reversed the decision and didn't even try to implement that plan, and moved to GCP instead.


I'm not privy to any particulars specific to GitLab's situation, but it's possible that a little sabre-rattling caused a better deal to appear which changed the calculus on migration.


My limited visibility take is that their board (or someone higher up, my memory is fuzzy) told them to use the cloud, and that was that.


I could see it. As discussed elsewhere in this thread, investing in your own metal/ops is definitely a long term play, and some combination of "focus on your actual differentiators" and "why bother when one of those cloud providers will acquire us eventually and we'll just have to migrate to them at that point anyway" could be a compelling case.


The thing is, Dropbox's talent level is probably quite high. Thus, they can build a lot of internal APIs to manage their own hardware (think: Software Defined Networking, Kubernetes, lambda-like platform, programmatic scheduler, etc).

If you don't have people who can provide cloud-like APIs to developers to self-serve, then you are trading off your developer time with money. And I'd argue that is a terrible trade off.


This was viewed as a mistake by some investors as it distracted them from building features/products that would differentiate them from Google/MSFT.


Dropbox is competing in a narrow-margin market wherein those margins are a function of hosting costs. At the unit cost level, it's material to their business.

Pfizer, where IT is a minor backoffice operation, doesn't care that 'everything is 2x the cost' because that cost is materially small, and the nimble nature of the cloud is a much bigger advantage. And they don't have the talent to make their own cloud anyhow. So to them it's cheaper.

Also note that most cloud users are not startups or tech companies, they are 'basic corps' doing IT.


Let's look at it storage wise. A 480TB Backblaze 6.0 pod costs $22,000 to build. Dropbox has 90% of their data stored on their custom systems, and they have 'multi-exabytes' of data.

Let's just say they have 5 exabytes. A petabyte costs roughly $50,000 using this scenario (480TB for $22,000 works out to about $46,000 per petabyte), and 5,000 petabytes would then cost $250 million alone.

If they had 20 exabytes...they would have a billion dollars in storage hardware alone?

This doesn't include power, networking, labor, none of that.
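The arithmetic above can be reproduced in a few lines. The pod price and capacity come from the comment; the helper name is made up for illustration, and the sketch assumes decimal units (1 PB = 1,000 TB) and raw capacity with no replication or erasure-coding overhead.

```python
# Figures from the comment: a 480 TB Backblaze 6.0 pod at $22,000.
POD_COST_USD = 22_000
POD_CAPACITY_TB = 480
TB_PER_PB = 1_000  # assumption: decimal units
PB_PER_EB = 1_000


def hardware_cost_usd(exabytes: float, cost_per_pb_usd: float) -> float:
    """Raw storage hardware cost; ignores redundancy, power, network, labor."""
    return exabytes * PB_PER_EB * cost_per_pb_usd


exact_cost_per_pb = POD_COST_USD / POD_CAPACITY_TB * TB_PER_PB
rounded_cost_per_pb = 50_000  # the rounded figure used in the comment

print(f"Cost per PB from pod figures: ${exact_cost_per_pb:,.0f}")
for eb in (5, 20):
    total = hardware_cost_usd(eb, rounded_cost_per_pb)
    print(f"{eb} EB at $50,000/PB: ${total:,.0f}")
```

Any real deployment stores more than one copy of the data, so the actual hardware figure would be some multiple of this raw number.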


One thing to note though: Dropbox is a cloud provider.


Some assumptions to analyze this switch:

- Dropbox saves $50M/year in AWS costs

- Dropbox spends $200K/year (salary, benefits, equipment, SaaS, etc.) for their average infrastructure engineer

Following those assumptions, Dropbox must hire <250 additional engineers for this to break even.

Of course, these assumptions may be wrong (please correct them if so!) and this entirely ignores the computing needs of Dropbox's business, which may be unique vs. anything available off the shelf.
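The break-even estimate above is a single division, sketched here so the inputs are easy to swap. The $50M/year savings and $200K/year cost are the commenter's assumptions; the higher per-engineer figures are purely illustrative.

```python
def break_even_headcount(annual_savings_usd: float,
                         cost_per_engineer_usd: float) -> float:
    """Number of extra engineers whose cost equals the annual savings."""
    return annual_savings_usd / cost_per_engineer_usd


ANNUAL_SAVINGS_USD = 50_000_000  # assumption from the comment

# $200K is the comment's figure; the others are illustrative what-ifs.
for cost in (200_000, 300_000, 400_000):
    headcount = break_even_headcount(ANNUAL_SAVINGS_USD, cost)
    print(f"${cost:,} per engineer: break-even at {headcount:.0f} engineers")
```

The result is very sensitive to the fully loaded cost per engineer, so doubling that cost halves the break-even headcount.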


> - Dropbox spends $200K/year (salary, benefits, equipment, SaaS, etc.) for their average infrastructure engineer

The usual rule of thumb is that you assume 2:1 for salary and support costs — that starts to drift at the higher end of the pay scale but I'd doubt that a company in San Francisco isn't paying a substantial amount for office space, health insurance, etc.

Your general point is still correct: call it a hundred engineers to break-even and it's likely still a substantial win, and there are some interesting angles for additional cost-optimization given Dropbox's mix of high network traffic and long-tail storage.


I'm guessing the $200k is a considerable underestimate.


Agreed; probably more in the ballpark of $275k / $300k.


I think they could hire very skilled people from poorer countries for like 60K USD / yr


This can be generalised to “having a deep understanding about your tech stack is good and profitable”. Spotify did the opposite and went from a complex peer to peer design to a “throw everything in an s3 bucket” design because the underlying business assumptions changed (among other things, p2p didn’t work on mobile). So yeah. I think the lesson here is stop making excuses and go compile a Linux kernel or something.


I would have loved to see them buying Wasabi instead of building their own infrastructure, and becoming the biggest client of Wasabi at the same time.


Do you know more about this? Curious if they ever entertained buying Wasabi, or if it's just an idea of yours.

I think it's a good idea. But when deciding whether to acquire a company or not, there are several factors involved, only one of which is the technical fit.


no serious company wants to be 'the biggest client' of a company. You'll find all the scaling issues.


It depends on the company you're the client of. I worked at a top 3 client of one company, and we were also the top client of another company where I was the only engineer dealing with them. It worked out well, because they were very responsive to our issues, were able to scale, and we used the least hands on parts of their service offerings.


It's a classic rent vs buy conversation - if you're sure about what you want, and confident that your needs will not change drastically in the short-medium term, it makes sense to buy your own infrastructure. Else, rent someone else's.


The value of cloud is in many unique, quite well integrated add-ons like Cognito or Firebase. Once a company has enough talent to build its own solutions, cloud turns into just a set of VMs and storage, which is cheaper to build yourself.


"Cloud" handles the problem of people who buy based on who takes them out to lunch, buy based on wanting to have bragging rights, buy based on the idea that "nobody ever got fired for buying Dell / HP / whatever".

Sure, you can buy and colocate a dual processor Xeon Platinum bought from Dell, but when you can get significantly more than twice the performance per dollar buying an Epyc system, it's stupid to do it.

Stop comparing "cloud" to bad business decisions, and start comparing it to business decisions which would be made by people who want to get things done without ego.


What makes you think Amazon and Microsoft don't take customers out to lunch (pre-covid)?


Lol exactly. Amazon and Microsoft always take potential customers out for lunch (or dinner). My fiancee and I have attended a number of "Azure" dinners, where Microsoft funded our expenses. The only qualm was that we had to sit through lectures where salespersons would go on about how Xamarin was the next hot shit, which bored her, a medical doctor, to death, and managed a few chuckles from me.



Not very surprising. Cloud infrastructure is flexible, but very expensive - certainly much more expensive than hosting your own servers.


Depends on optimization - if you just want a server, yes, it's much cheaper to just get one at home.

But why run servers when you can just put your workloads in lambdas. That's a huge money saver, unless you're hosting video games.


Well, unfortunately I'm seeing this in my microscopic way. I moved all of my domains and hosting off GoDaddy (shrugging in disgust) to Google Domains + GCE. Yikes, with virtually no traffic outside of me ssh'ing, my 2 VM instances are costing me $45/mo just basically sitting idle. I gotta be doing something wrong!


You can spin up a VPS on digital ocean for $5/month


[off-topic] Hey randomdude - a few months back you replied to one of my comments with some well wishes that I could get back into motorbike riding soon. Just wanted to pass on that Melbourne is now free from lockdown again, and so I am able, and have been. It's great! I hope you're doing well too!


Thanks, will look into that. I don't understand why the cost of the smallest GCE instance is so high with virtually no traffic.


Does anyone know what the break-even point is for building/managing your own infrastructure, vs. using cloud services?

I'm also curious - what's the rough time/cost of moving to your own infrastructure? I'm sure it depends on lots of things, but looking for a ballpark. 1 year + 10 engineers? 5 years + 100 engineers?


I mean, the more important question for me is, is running your own infrastructure something that should be a core competency. For Dropbox, the answer seems like an obvious yes. I mean, virtually their entire product is around data storage. I would be shocked if they COULDN'T save money to build a storage system tailored to their needs.


It’s also a lot of cold storage, low iops, moderate throughput. Not high performance compute with random high iops.

Of course it’s cheaper. Compare the cost of an enterprise NAS vs external hdd from bestbuy. It’s a similar comparison here


There's a bunch of steps along the way between AWS[1] and managing your own infrastructure.

Virtual private servers, bare metal hosting, cabinet, rack, or cage in a colo, and then operating your own datacenters.

The more you buy into the AWS services and not just EC2 instances, the harder it is to move out. In other parts of the thread, people are saying you need to have a good team to run your own infra, but you also need to have a good team to run on other people's infra, and debug their bugs without visibility, so that you can guide their techs to fixing your issues, so I don't think that using other people's services absolves you of having a good team.

Really the issue isn't age of company or number of engineers. It's the number of servers you need, and how stable that is. If you can't predict your server count 3 months out, you need to host with someone who has stock on hand to buffer your growth. If your server count is small, you get better geographic redundancy picking up an instance here and there from around the world from a single vendor; colo space is available everywhere, of course, but you would likely be dealing with different vendors in each locality.

If you can take advantage of growing and shrinking your deployment throughout the day in response to load, and there's a dramatic difference between peak and trough, it makes a lot of sense to be somewhere that you pay by the hour, instead of by the month or have to buy for the peak and let it idle.
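The peak-versus-trough argument can be made concrete with a toy comparison. Every number here is hypothetical (the hourly rates and both load profiles are invented for illustration); only the shape of the comparison reflects the point above.

```python
# All rates and load profiles below are hypothetical illustrations.
ON_DEMAND_RATE = 0.30  # $ per server-hour when paying by the hour
OWNED_RATE = 0.10      # $ per server-hour amortized, paid even when idle


def pay_by_hour_cost(hourly_demand: list[int], rate: float) -> float:
    """Pay only for the servers actually needed in each hour."""
    return sum(hourly_demand) * rate


def provision_for_peak_cost(hourly_demand: list[int], rate: float) -> float:
    """Buy enough servers for the peak and pay for them every hour."""
    return max(hourly_demand) * len(hourly_demand) * rate


spiky_day = [5] * 20 + [100] * 4  # short, dramatic peak
flat_day = [40] * 24              # steady load all day

for name, demand in (("spiky", spiky_day), ("flat", flat_day)):
    hourly = pay_by_hour_cost(demand, ON_DEMAND_RATE)
    peak = provision_for_peak_cost(demand, OWNED_RATE)
    print(f"{name}: by the hour ${hourly:.2f}, sized for peak ${peak:.2f}")
```

With these made-up rates, paying by the hour wins only when average utilization of the peak-sized fleet falls below the ratio of the two rates (here one third), which is why a flat load favors fixed capacity.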

[1] Or Google Cloud, or Azure, or Oracle Cloud or whoever.


My take is that there's only a very specific kind of company where on-prem vs cloud is primarily a cost-driven decision.

"Cloud" providers are so much more than just the hardware. There are probably a lot of AWS customers who could absolutely save lots of money if they took all their cloud servers and magically made them physical on-prem servers. But then they might hit a situation like:

"We need to create a whole new prod-sized cluster for load testing but only for a day". Do you rack all those servers, do the test, then take them off and bin them? Or what?

I'm tempted to go on and on with similar examples but imo the biggest deal with cloud providers is that they can operate at such an insane infra scale that it lets you treat extremely large quantities of servers like an abstraction instead of a physical metal box that needs to be bottle-fed and rocked to sleep at night lest it get cranky.

I think at fairly small scales it's easy to build in-house systems that let you treat servers like abstract units as long as you're not trying to 2+x your infra dynamically. But cloud solutions let you do that because your scale <<< the cloud providers' scale even at pretty large values of scale for any one company.


Depends on what your use case is. For GPU compute, in my experience, it's pretty much 2 months of continuous usage.


Part of the problem is at smaller companies, managing your own infrastructure is just a part time job. So you have to outsource it to someone you trust, find someone in your company that wants to moonlight as a sysadmin for 10-20 minutes/day, or pay a full time employee to not do that much.

How much time you have to spend is generally related to how many servers you have. In my experience, if you have automation setup correctly (1-2 weeks) you can average around 5 hours per month to manage about 30 servers.
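To put the figure above in per-server terms, here is a small sketch. The 5 hours and 30 servers come from the comment; the 160-hour working month is an assumption used only to express the load as a fraction of a full-time role.

```python
OPS_HOURS_PER_MONTH = 5          # from the comment
SERVER_COUNT = 30                # from the comment
FULL_TIME_HOURS_PER_MONTH = 160  # assumption: 40 hours/week for 4 weeks

minutes_per_server = OPS_HOURS_PER_MONTH * 60 / SERVER_COUNT
fte_fraction = OPS_HOURS_PER_MONTH / FULL_TIME_HOURS_PER_MONTH

print(f"{minutes_per_server:.0f} minutes per server per month")
print(f"{fte_fraction:.1%} of a full-time role")
```

That is about 10 minutes per server per month and roughly 3% of one full-time role, which illustrates why this is hard to staff as a dedicated job at small scale.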


It depends hugely on your cost of staffing.

Also, there is not only aws vs on prem.

There are solutions in the middle where you don't have all of the AWS services but price is much lower.

I've seen very average businesses spending 1mln per year in AWS for, really not much. IMHO most mid businesses would be better off renting servers and having some staff maintaining them.


There are no shortcuts here. You really have to understand your use cases.


Riddle me this: computer and storage prices keep going down, but cloud spend keeps going up. How does that work out?

And more importantly, I won't go all in on the cloud until the big cloud vendors start running their workloads on machines they don't physically control.


One thing that has always stuck with me after an AWS training session I attended at work is what the trainer said: "The cloud isn't necessarily cheaper, but it's usually better value for money". It's something to always remember.


The cloud can be cheaper if you write your software for the cloud and need more redundancy.

The issue is that people use the cloud as just another random VM to run their software.


Cloud providers are not just server farms. They are a whole stack of software and services and an ecosystem, and more is moving into the cloud every day.

Can you save a few bucks on server costs? Sure. But most of the world has moved beyond caring about servers.


Anything more recent than 2018? Any examples of other companies doing this? I see in the article that it was a public filing from Dropbox. I’ll look into more recent ones when I have some time unless someone beats me to it.


Did anyone think AWS was about saving money?


we only have less than 60 racks. given every 2u can fit 4 x rtx and 4u can fit nearly 1pb raw. in general its cheaper than what people can rent to us for.

also. since we already exist. we run mail, ldap n a few other bits as well. i still think its all cheaper than 3rd party hosted.


pfft, when I interviewed at Dropbox they were on-prem. Didn't realize they had gone to the cloud and back.

Protip: If your business is storage, that is your core competency. Don't pay someone else for it.


your phone has as much computing power as an average k8 cluster


Shows you how you shouldn't worry about this stuff till late. They saved $75 million while losing $3 billion of market value -- capturing market share is vastly more important than operational excellence.


So Dropbox could be the next big cloud provider?


Wondering why a company like Netflix did the opposite?


Doesn't Netflix have cache boxes hooked directly into the ISP networks?

I'm sure Netflix uses a lot of AWS, but as a percentage of their total traffic (which is absolutely enormous), it's probably not that much.


Yes. And although much of what Netflix does is in-house, it’s done the math and realized that for some of its workloads, AWS is just better. Especially when looking at all of the areas where Netflix is available and needs datacenters and in some of the specific compute options Netflix needs. It’s not serving traffic through AWS. It’s using AWS for storage and compute.


It also certainly doesn't pay by the public price book.


$75M over two years is the compensation of 75 engineers in SV (not even taking into account opportunity cost - these 75 folks could innovate and bring in more revenue)

So either it's not $75m or it was a bad idea.
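It helps to check what the "75 engineers" equivalence implies per engineer. The $75M, two years, and 75 engineers are from the comment; the $300K alternative is an illustrative figure in the range quoted elsewhere in the thread.

```python
TOTAL_SAVINGS_USD = 75_000_000  # from the comment
YEARS = 2
ENGINEERS = 75

annual_savings = TOTAL_SAVINGS_USD / YEARS
implied_cost_per_engineer = annual_savings / ENGINEERS
print(f"Implied cost per engineer-year: ${implied_cost_per_engineer:,.0f}")

# Illustrative alternative: a lower fully loaded cost funds more engineers.
ALTERNATIVE_COST_USD = 300_000
print(f"At ${ALTERNATIVE_COST_USD:,}: {annual_savings / ALTERNATIVE_COST_USD:.0f} engineers")
```

The equivalence holds only if each engineer costs about $500K a year fully loaded; at a lower cost the same savings cover a larger team.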


Some businesses are not about innovation, some businesses are about margins. Personal cloud storage is that business - people shop around and buy on cost.

I use OneDrive because it's free with Ms Office, which I need anyway, and their 'family plan' pricing is great. I decide cloud storage for like 5 accounts, backups of family photos, etc.

That is in spite of the fact that Dropbox is miles better and has extra features - they don't move the needle.


Or maybe the workload of using their own infrastructure was lower and scales well, so that the savings could go towards having 75 extra people who could innovate, instead of not having them.



