> So I really don't see any problem with 200 boxes from OVH.
It's not cost effective compared to hiring ops and colo'ing it yourself. Once you're large enough, you hit tipping points:
* When to move from cloud to dedicated equipment
* When to move from dedicated equipment to someone else's colo (usually Equinix, but lots of providers in this space with varying levels of "warm fuzzies", which would cover on site techs, power and network redundancy, diesel commitments, and so on)
* When to move from someone else's colo to your own datacenter
(or in the other direction, depending on business requirements)
It also helps that the US tax code (Sec. 179) provides generous depreciation schedules for physical compute/network/etc. gear, which means the profit spread between cloud providers and running your own gear goes back into your business or into your pocket.
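To make the Sec. 179 point concrete, here's a back-of-the-envelope sketch (the hardware cost and marginal tax rate are made-up illustrative assumptions, not tax advice):

```python
# Rough sketch of the Section 179 effect: expensing hardware in year one
# reduces taxable profit, so part of the purchase price comes back as tax
# savings. All numbers are illustrative assumptions, not tax advice.

hardware_cost = 100_000     # assumed one-time purchase of compute/network gear
marginal_tax_rate = 0.25    # assumed combined marginal rate

# Sec. 179 allows deducting the full cost in the year of purchase
# (subject to annual limits) instead of depreciating over several years.
tax_savings = hardware_cost * marginal_tax_rate
effective_cost = hardware_cost - tax_savings

print(f"Sticker price:  ${hardware_cost:,}")
print(f"Tax savings:    ${tax_savings:,.0f}")
print(f"Effective cost: ${effective_cost:,.0f}")  # 75,000 in this example
```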
Disclaimer: 15 years of ops experience, including selling hosting to Fortune 500 companies and helping companies move from cloud to on-prem and back. I've had to run cost/benefit analyses for this most of my career.
The OVH costs are often not a huge premium over self-leased equipment, especially when you factor in power, co-location fees, and other overhead.
There's a sweet spot between 1 server and some number, say into the hundreds, where OVH is still cost effective. Beyond that you'll absolutely want to roll with your own gear because you can negotiate for a whole cage instead of partial racks.
I think what he's saying is that 200 boxes from OVH is dedicated and that 200 boxes is well past the point where you should switch from dedicated to colo.
The cloud part was just an example of a tipping point and not the main point of the comment.
Depends on where you're based. An ops hire in the Bay Area runs above $100k/year, which needs to be added to the price of the servers. A single person is also a bus factor of one, so you essentially need two people (in case one is sick or on vacation). Now you're at $200k+ annually without having installed a single server.
200 servers (at least the ones we were interested in) would cost us around $22k/month on OVH. That means if I subtract the personnel I would otherwise have to hire, the effective cost of the servers drops to roughly $5.3k/month ($22k minus $200k/12 ≈ $16.7k). For that money you can't really find a better option.
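Spelling out that arithmetic with the thread's own numbers:

```python
# The comment's math: OVH rent vs. the staffing cost it avoids.
ovh_monthly = 22_000       # 200 servers rented from OVH, per month
staffing_annual = 200_000  # two Bay Area ops hires, per the comment above

staffing_monthly = staffing_annual / 12          # ~16,667/month
effective_cost = ovh_monthly - staffing_monthly  # ~5,333/month

print(f"Staffing avoided:      ${staffing_monthly:,.0f}/month")
print(f"Effective server cost: ${effective_cost:,.0f}/month")
```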
In our case, where we're running thousands of servers at any given time, flexibility is much more important than price. So we built our service around preemptible instances on GCE (the equivalent of spot on AWS). You can't beat a dedicated server on performance, but it's close enough, and they make up for it with great infrastructure.
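For the curious, here's roughly how a service stays up on preemptible instances: GCE gives about 30 seconds' notice before reclaiming a VM, and an instance can poll its own preemption status through the metadata server. A minimal sketch (drain_work is a hypothetical placeholder for your own shutdown logic):

```python
import time
import urllib.request

# GCE metadata endpoint that returns "TRUE" once this instance is preempted.
PREEMPTED_URL = "http://metadata.google.internal/computeMetadata/v1/instance/preempted"

def is_preempted() -> bool:
    req = urllib.request.Request(PREEMPTED_URL, headers={"Metadata-Flavor": "Google"})
    with urllib.request.urlopen(req, timeout=2) as resp:
        return resp.read().decode().strip() == "TRUE"

def drain_work():
    # Hypothetical placeholder: checkpoint state, stop accepting new jobs,
    # hand in-flight work back to the queue.
    pass

while True:
    if is_preempted():
        drain_work()  # you have roughly 30 seconds before the VM is reclaimed
        break
    time.sleep(5)
```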
As an operations employee, I'm shocked you've run thousands of servers in some kind of service while talking yourself out of any operations employees. An operations hire is not a prerequisite to moving past dedicated; even with nothing but 200 dedicated servers you are way past the point of needing at least minimal operations. Contract this out if you have to.
We are not a direct cost center that can be discussed in those terms. Our insight will reduce capital and operational expenditure beyond our salary, because that operations hire would have told you how insane an idea it is to pay $22,000/mo for four cabinets of gear, and why a capital tradeoff with depreciation is a fiduciary responsibility to your investors and shareholders. You can buy at least a dozen U of gear for that each month and then pay for nothing but where it lives, with a dash of break-fix to taste.
I can put four cabinets of gear in a colocation facility for a quarter of that or less, if you'd swing a little capital. You are wasting money on poor operations architecture and design, and you have nobody around to really tell you.
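To put the "quarter or less" claim in numbers, a break-even sketch (every figure here is an illustrative assumption, not a quote):

```python
# Break-even sketch: renting from OVH vs. buying gear and colocating it.
# All numbers are illustrative assumptions, not real quotes.

ovh_monthly = 22_000    # the thread's OVH figure for ~4 cabinets of capacity
gear_capital = 250_000  # assumed one-time hardware purchase
colo_monthly = 5_500    # assumed colo fees ("a quarter or less" of $22k)

# Find the month where cumulative colo spend drops below cumulative rent.
month = 1
while gear_capital + colo_monthly * month >= ovh_monthly * month:
    month += 1
print(f"Colo is cheaper from month {month} onward")
# 250,000 / (22,000 - 5,500) ≈ 15.2, so roughly month 16
```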
Even beyond that, operations is a skill, much like marketing. I know a lot of people think they can fake it for a while (and they usually can), but after a point it's time to act like a grown-up company and bring someone on board who does nothing but think about this shit. Security, performance, remediation, all the system-level grunt work you shouldn't be concerning yourself with as engineers. Or you can keep throwing multiple operations salaries at your four-cabinet OVH deal and keep getting ripped off.
We have no dedicated servers; we're running everything on GCE. Our use case is highly specific in that we have huge ingress bandwidth requirements, which we're getting free of charge.
> "You can buy at least a dozen U for that each month and then pay for nothing but where it lives."
We don't need a dozen a month. We need hundreds now. We may not need them in 6 months, though, and then what? Will I rent them out?
That's the sort of capacity planning and management an operations chief would do for you several quarters out, based on experience with shifting business needs that they have acquired over a career of dealing with highly specific use cases.
Think of it as putting an intelligent layer between your demand metrics and your server fleet. I live for utilization, just like you live for your product. Hire an operations nerd who does too and your company will be much better off for it; based on your description it sounds like you or the other engineers are already doing operations anyway, so you probably won't need two. Hire one and let them tell you.
Would you denigrate a software engineer by saying you had to rely on them as a magic wizard for their knowledge, the way you're referring to someone with ops/capacity-planning experience?
Experience and knowledge isn't wizardry or magic, but as a business owner you're free to burn up your cash on pride if that's what you'd like to do. Not every problem's solution is a google search or API call away.
I am an SRE, I've done software engineering for years, and I do ops and lots of capacity planning at the moment. I have no intention of denigrating anyone.
The word you should have focused on was 'magic'. In the same way that magic does not exist, it is not possible to plan future capacity with accuracy. It can even get worse if the system is of limited size, because the time spent on analysis and planning can quickly exceed the savings.
The only way to get good predictions is to already have the systems running in production [for months], with a [mostly] static user base, running applications that don't evolve, and with no additional services ever added. Given those circumstances, we could have good metrics on current usage and make good predictions about the future... except that to reach this point, the hardware has already had to be bought.
> In the same way that magic does not exist, it is not possible to plan future capacity with accuracy.
I...what? Capacity Planning for Unpredictable Growth is literally SRE 103. It's hard and will have a margin of error, yes, but it's not bloody magic. The "magic" is in identifying and collecting the correct metrics that somewhat model the abstracted utilizations of the property, because almost everyone picks the wrong ones; this is situational so there is no blanket advice to offer except that load average is almost certainly wrong, as well as consulting only one metric. If you're working capacity from five or six key application metrics you're probably on the right path.
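As a deliberately simplified illustration of working capacity from several key application metrics: fit a trend to each one and see which exhausts its headroom first. The metric names and the linear-growth assumption are made up for the sketch; real models are messier:

```python
import numpy as np

# Weeks of history for a few key application metrics, each normalized to its
# capacity ceiling (1.0 = saturated). The data here is fabricated for the sketch.
weeks = np.arange(12)
metrics = {
    "requests_per_sec": np.linspace(0.40, 0.62, 12),
    "db_write_iops":    np.linspace(0.55, 0.71, 12),
    "egress_gbps":      np.linspace(0.20, 0.28, 12),
}

# Fit a linear trend per metric and project when it crosses 1.0.
for name, utilization in metrics.items():
    slope, intercept = np.polyfit(weeks, utilization, 1)
    if slope <= 0:
        print(f"{name}: flat or shrinking, no exhaustion projected")
        continue
    weeks_to_full = (1.0 - intercept) / slope
    print(f"{name}: projected to saturate around week {weeks_to_full:.0f}")
```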
SRE is, quite specifically, application operations engineering. If you can't model your application's growth I'd be more inclined to call you an SA. (There's absolutely nothing wrong with that, to be clear, even speaking as an SRE. And I am aware several valley companies are diluting the term.)
We have one server with them as a VIP customer and are very happy with it; they react fast on support, with a dedicated team... we asked for a quote for this server and they proposed several solutions.
EDIT: As suggested below by toomuchtodo, at this level of deployment colo would be cheaper.