Hacker News
Microsoft open sources its next-gen cloud hardware design (techcrunch.com)
164 points by andytolt on Oct 31, 2016 | 31 comments



Seems we've come full circle, eh?

Back in the original dot-com boom ~15 years ago 1U pizza boxes were the workhorses. Same for things like HPC clusters.

Then the big-name vendors started introducing various blade server architectures, where many servers shared power/cooling/BMC (out-of-band management), often with things like integrated switches in the back of the blade chassis. (Of course, many thought the real reason behind blades was that standard rack servers were rapidly becoming commoditized, so the vendors needed something with more lock-in.)

And then we had things like "scale-out" designs, with somewhat bare-bones servers, often in some simple chassis with shared power. And lately (Intel at least?) there has been talk of "rack-scale" architectures.

But now MS goes against this "trend", going back to basics so to speak. What gives?


The nice thing about using a 1U chassis is that you aren't reliant on a vendor-specific backplane interconnect for the blade-to-chassis interface. It takes advantage of much larger economies of scale in standard AC or DC power feed interconnects and Ethernet (25/50/100G). And if you buy a huge enough number of things you can get a Taiwanese motherboard OEM to design you a custom ATX (12x9.6") or full-size (12x13") dual-socket motherboard with all the extraneous crap stripped off, like VGA outputs, USB ports, 3.5mm headphone jacks, etc.

There are "standard" widths and lengths of 1U size power supplies (Whether AC input or DC input) that are used by dozens of different system integrators. Order off the shelf based on the wattage and capabilities you need. Same thing with the chassis. At a certain quantity level you basically have a folded steel box with mounting screw holes for ATX or larger-than-ATX motherboards. With M.2 SSDs on the motherboard these days, you need only a few things:

a) a long 1U steel box with high speed 40mm fans sucking air through it

b) motherboard with cpu/heatsink/ram/storage mounted on it.

c) power supplies.
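
As a rough sketch of what the per-rack math looks like (all numbers below are illustrative assumptions, not anyone's actual figures):

    # Back-of-envelope: how many bare-bones 1U dual-socket nodes fit in a rack,
    # limited by either rack units or the per-rack power budget.
    RACK_UNITS = 42               # usable U in a standard rack (assumed)
    WATTS_PER_NODE = 350          # dual-socket node under load (assumed)
    RACK_POWER_BUDGET_W = 12000   # facility power feed per rack (assumed)

    by_space = RACK_UNITS                              # one node per U
    by_power = RACK_POWER_BUDGET_W // WATTS_PER_NODE   # power-limited count
    nodes = min(by_space, by_power)

    print(f"space-limited: {by_space}, power-limited: {by_power}")
    print(f"deployable per rack: {nodes}")

Depending on the feed, power rather than rack space is often the binding constraint, which is part of why stripping a few watts of extraneous hardware off each board matters at this scale.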


One thing that MS can't say yet is that a Skylake-EP server is "rumored" to have a larger socket and 24 DIMM slots instead of 16 so it needs to be physically wider.


There's the old telecom standard for 23" racks... Other than Facebook, I wonder who is building new hot-aisle/cold-aisle separated environments for a wider-than-19" standard.


I mean the old MS servers are 9.5" wide and the new Skylake servers need to be wider than that so they're going to 19". Facebook/Google/Rackspace use a 21" rack but MS and LinkedIn are using 19".


Today you can get these tray servers with two trays in 1U (or 4 tray servers in a 2U chassis, typically), where each server is a standard 2 socket Xeon(-EP?) thing. So surely if you suddenly have twice the width you can cram in a few extra DIMM slots without having to go wider than the usual 19" racks.


Can anyone tell me why Intel, Cisco, Juniper, etc are part of Open Compute Project? It doesn't really make sense to me because this project is effectively designed to commoditize their business. It makes sense for Facebook, Google, Microsoft, Apple, etc to be in OCP because server and networking hardware is purely a cost and isn't the part of the value chain that they profit from.


All "commodity" servers use Intel processors so Intel isn't really losing anything. Juniper is just sticking their toe in and trying to look cool. Cisco is probably trying to sabotage it from the inside.


Seems like Intel's margins are a good incentive for OCP members to design their own processors. Google has made some noise about that (but I guess that could be a bluff as well) [1].

[1] https://www.wired.com/2016/05/googles-making-chips-now-time-...


Hardware is hard and expensive. I doubt there is enough industry support financially to commit to designing an open source CPU compatible with software designed for Intel CPUs. It's just the cost of doing business.


Compatibility isn't what it used to be; all Google software runs on x86, Power, and probably ARM. There are probably a dozen ARM server processors being designed right now, so either there's a huge bubble or the market is big enough.


Not that I disagree with what others have said, but I've found that the bigger the company, the more misleading the mental model of it as a single, unique, self-conscious entity that takes decisions according to its own interest.

Behind a company's decisions there are real people making those decisions, or backing them, for personal agendas that are not visible from the outside.


The networking giants at least aren't planning on selling hardware for much longer: they want to become purveyors of SDN technology that runs on commodity OCP-style hardware. I am guessing they are part of these initiatives so that they have some influence over what their code runs on.


Who's going to sell hardware if not the networking giants, though? While hardware may be a declining business in terms of revenue, that doesn't mean there's no business at all. After all, the big cloud providers will still have to source hardware from someone, even if everyone goes whole hog into the OCP ecosystem with cheap, large-scale vendors based primarily in low-cost regions. Who's better equipped to handle economies of scale than large companies? Did I miss something about manufacturing trends being disrupted in the past 15 years?

If anything, these vendors would want some say on the SDN systems so that their hardware can better integrate with them and offer more compelling value than their competitors.


Network switches are increasingly based on silicon from companies like Broadcom with their Trident and Tomahawk lines (what Facebook's Six Pack uses), Intel's Fulcrum chips, Mellanox, or Cavium's XPliant chips.

An ODM vendor like Quanta makes the equivalent of a "motherboard" for those, and sticks it in a box.

Then you can get software from a vendor like Cumulus which has all the SDN goodness.

Vendors like Cisco, Juniper, or Arista sold all of those as one big product. Whitebox switching is about breaking those up so you can buy them more like you buy your servers.
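
As a hedged illustration of how "just Linux" the software side can be: on a NOS like Cumulus Linux the front-panel ports show up as swp1..swpN and are configured with ordinary ifupdown2-style stanzas. The port count and bridge layout below are assumptions, and the snippet only prints a sample config for illustration:

    # Illustrative only: generate an ifupdown2-style interfaces config of the
    # kind a Linux-based NOS (e.g. Cumulus Linux) uses, with front-panel ports
    # named swp1..swpN. Port count and bridging scheme are assumptions.
    PORTS = 32  # e.g. a 32-port Tomahawk-class box (assumed)

    lines = [
        "auto bridge",
        "iface bridge",
        "    bridge-vlan-aware yes",
        "    bridge-ports " + " ".join(f"swp{i}" for i in range(1, PORTS + 1)),
        "",
    ]
    for i in range(1, PORTS + 1):
        lines += [f"auto swp{i}", f"iface swp{i}", ""]

    print("\n".join(lines))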


The idea is to buy the hardware from ODMs like Quanta, Accton, Celestica, Foxconn, etc. They already manufacture all name-brand IT equipment anyway.


> Who's going to sell hardware if not for the networking giants though?

AWS, GCP and Azure will sell you hardware, with a bunch of other services on top.


That makes sense. Is this a case of trading hardware dollars for SDN cents or does it bridge the gap? I have no idea about the economics of this space.


Joining a project is the best way of mucking things up. Apart from that negative viewpoint:

Intel are providing the expensive component (the CPU), so it's in their interest for the complement (the chipset/motherboard/chassis) to be a commodity.

Juniper/Cisco are supplying the network infrastructure, so they want the attached servers to be as cheap as possible.


Now, how can I actually use these?

I say this every time I see a "company X releases Y" or Open Compute Project announcement...

They are lovely and I want to use them, but any enquiry to the manufacturers shows that you need a minimum order that is too big for small-scale companies, and I don't see many of the larger companies actually using each other's designs.

So it is all well and good being open, but I seriously wonder who benefits from it.


The main beneficiaries are Facebook and other hyper-scale operators, for whom datacenter operating costs are not a competitive advantage.

Google and Microsoft are arguably in it to look "open" and/or they are trying to reduce their hardware R&D costs (like how they use Linux).

AWS are staying out because they consider their hardware proprietary and a competitive advantage. I don't think Google has made their latest and greatest stuff open (particularly network infra).

If you don't have the scale to order your own hardware then you should probably be considering one of the cloud providers (AWS, GCP, Azure).


I wonder what the hardware specs are on Google Cloud. I didn't manage to find anything similar for GCE.

E.g. what's the max speed of inter-node network connectivity within a region/zone? What's the spec of local-ssd? It would be interesting to know the underlying hardware.


Are there differences between the datacenter and consumer-device markets that make it much harder for something like OCP to gain traction in the latter, besides the lack of huge buyers (huge in terms of both quantity and how much difference the hardware makes to the buyer's operations)? Are there any potential huge buyers that could get something like OCP started for some type of consumer device?


Smaller buyers want features that get stripped out, like HP's or Dell's management tools. They always want easily replaceable hard drives in caddies you can reach from the front of the rack. Some designs use a 21-inch rack width instead of 19-inch, and 277-volt power connections instead of 120 V.


Suppose someone could put 32 or 64 ARM boards (Raspberry Pi class) in the same 1U chassis, each with its own SSD and 1, 4, or 8 GB of RAM.

Would you want to use that? Why or why not?


Looking at the failures of Calxeda, SeaMicro, Moonshot, Group Hug... no, you wouldn't want to use that.
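
Back-of-envelope on why (specs approximate and from memory; per-core performance differences are ignored, which flatters the Pi side considerably):

    # Rough comparison: 64 Raspberry Pi 3 boards vs. one dual-socket Xeon E5 v4
    # node. Aggregate GHz is a crude metric; an A53 core does much less work per
    # clock than a Broadwell core, and the Pi's 100 Mbit NIC hangs off USB 2.0.
    # The Xeon node's RAM and NIC figures are an assumed typical config.
    pi_nodes, pi_cores, pi_ghz, pi_ram_gb, pi_net_gbps = 64, 4, 1.2, 1, 0.1
    xn_sockets, xn_cores, xn_ghz, xn_ram_gb, xn_net_gbps = 2, 14, 2.4, 256, 25

    print("Pi cluster :",
          f"{pi_nodes * pi_cores} cores,",
          f"{pi_nodes * pi_cores * pi_ghz:.0f} aggregate GHz,",
          f"{pi_nodes * pi_ram_gb} GB RAM in 1 GB islands,",
          f"{pi_net_gbps} Gbit/s per node")
    print("Xeon server:",
          f"{xn_sockets * xn_cores} cores,",
          f"{xn_sockets * xn_cores * xn_ghz:.0f} aggregate GHz,",
          f"{xn_ram_gb} GB RAM in one image,",
          f"{xn_net_gbps} Gbit/s per node")

The fragmented RAM and weak per-node I/O were arguably a bigger problem for those microserver designs than raw core counts.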


The sockets look LGA 3647 (Xeon Phi) shaped, no?


As someone who just saw (from another HN post) the documentary "Before the Flood", I'm curious whether they take energy savings and a lower carbon footprint into account in the design as well.


That is one of the major design goals of every data center; electricity at that scale is very expensive.


And all of that electricity is turned into heat that needs to be dissipated, so cooling is also very expensive.
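
Rough, back-of-envelope numbers (all assumed, just to show the order of magnitude):

    # Illustrative datacenter electricity math: IT load plus the cooling/overhead
    # multiplier (PUE = total facility power / IT power). All numbers assumed.
    IT_LOAD_MW = 20          # IT equipment draw (assumed)
    PUE = 1.5                # facility overhead multiplier (assumed)
    PRICE_PER_KWH = 0.07     # industrial electricity rate, USD (assumed)
    HOURS_PER_YEAR = 24 * 365

    total_kw = IT_LOAD_MW * 1000 * PUE
    annual_cost = total_kw * HOURS_PER_YEAR * PRICE_PER_KWH
    overhead = (PUE - 1) * IT_LOAD_MW * 1000 * HOURS_PER_YEAR * PRICE_PER_KWH

    print(f"annual electricity bill: ${annual_cost:,.0f}")
    print(f"of which cooling/overhead: ${overhead:,.0f}")

Even modest improvements in that overhead multiplier translate into millions per year at this size, which is why the big operators publish and chase those numbers so aggressively.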


This is one place where you don't need to be too concerned about tech giants. They're massively invested in reducing the cost of their datacenters by making them:

1. Less wasteful of electricity. This means less cooling and a relentless drive to push manufacturers to produce more performance for a given watt-hour, which also generally means less waste heat.

2. Less reliant on active cooling: this means that their designs can use ambient air until the air outside reaches a certain temperature. It also means a lot less wasted hardware in the event of an active-cooling failure.

3. Less enclosed space: they want to reduce the physical footprint whenever possible because every square foot of enclosed space is maintenance even if you don't have to actively climate control it.

Google, Microsoft and Apple all share these goals, and as such are quite active in the space. Lots of material on YouTube for casual inspection.



