I can appreciate the thermal engineering that has gone into this. I have executed some extremely challenging thermal designs over time. They generally used heat transfer plates of the type shown in this video.
My largest design was approximately 23 x 12 inches. Maintaining thermal efficiency and uniformity across the entire surface is where the real challenge lies. We had an extremely tight uniformity specification (0.1 deg C). This could only be achieved through a complex design process that entailed writing genetic algorithms to evolve and test solutions using FEA. In the end it was a combination of sophisticated impingement cooling and other techniques that did the job. That project was seriously challenging. I like projects that absolutely kick my butt. This one definitely did.
Sadly, I can't share details or the application domain.
I can tell you that we worked very hard to see if we could accomplish the objectives using forced air (fans). That effort involved laser-welded fins with sophisticated airflow management, techniques to break up the boundary layer (which impedes optimal heat transfer), and powerful centrifugal fans. It worked well, yet it was large and sounded like a jet engine.
Ultimately, while still complex, fluid-based thermal management offered a far more compact solution that could share a room with people without them having to wear hearing protection. On top of that, with a fluid-based system you can move the hot side to a different room.
Gases have a lot more opportunities to engage in turbulent flow and little corners of whirlpools, where heat transfer efficiency drops, than comparatively more viscous liquids.
Yes, I'm aware. My mechanical engineering undergrad program had 2 semesters of thermal fluids engineering. I'm just saying "We tried air cooling, but switched to fluid cooling" is an odd statement, since air would have been a fluid in their use case.
That video and the cooling system are insane. Insanely cool, even. Thanks for posting.
I wonder, how do all the contacts in the 20,000A power distribution plate that is bolted on top of the wafer-scale die line up? The engineering involved in just making that part work must be crazy.
They don't actually have to line up precisely. It seems to be an elastomeric or "Zebra strip" connector. Basically like those little rubber strips that you see when you take the LCD out of a calculator, used to bring power and data from the PCB to the glass surface: https://en.wikipedia.org/wiki/Elastomeric_connector
Cerebras has evidently scaled them up a bit (and repatented them, according to the video).
It's not. They claim the CS-2 can do 23kW peak. The voltage is very low. If 23kW and 20kA are right, that works out to a 1.15V core voltage, which is pretty normal these days.
For comparison, one of the workstation AMD EPYC processors uses ~400W under peak load, which works out to roughly ~320A of peak current. The wafer only draws ~60x more current than a single socket... modern CPUs use an enormous amount of current these days.
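Quick back-of-the-envelope in Python, using those marketing/estimate figures (nothing here is measured):

    # Rough sanity check of the power-delivery numbers quoted above.
    # Figures are approximate marketing/estimate values, not measurements.

    def core_voltage(power_w: float, current_a: float) -> float:
        """V = P / I for a simple estimate of core voltage."""
        return power_w / current_a

    cs2_v = core_voltage(23_000, 20_000)   # Cerebras CS-2: 23 kW peak, 20 kA
    epyc_v = core_voltage(400, 320)        # EPYC estimate: ~400 W, ~320 A

    print(f"CS-2 implied core voltage:  {cs2_v:.2f} V")    # ~1.15 V
    print(f"EPYC implied core voltage:  {epyc_v:.2f} V")   # ~1.25 V
    print(f"Current ratio (CS-2/EPYC):  {20_000/320:.0f}x")  # ~60x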
1.15V isn't that high. Most CPUs may idle at ~0.9V but when running full power, they throttle up much higher. Overclocking the modern Ryzen 7000 series seems to top out at ~1.2V. If this giant wafer isn't on the absolute latest process, that voltage is going to go up. So, 1.15V seems very reasonable to me.
We have EPYC chips with ~100 cores at ~200W TDP, so each core is around 2W in the AMD chips. Core voltages are ~1V with modern CPUs, so that's 2A per core.
850k cores at 20kA works out to far less current per core than the AMD chips. They must be massively parallel, lower-performing cores. But it's quite feasible that it needs 20kA.
The CS-2 system on their website specs out at 23kW peak. So all this lines up with each other.
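Rough per-core math, treating the figures above as ballpark numbers:

    # Per-core current comparison, using the rough figures quoted above.

    epyc_cores, epyc_tdp_w, epyc_vcore = 100, 200, 1.0
    cerebras_cores, cerebras_current_a = 850_000, 20_000

    epyc_a_per_core = (epyc_tdp_w / epyc_cores) / epyc_vcore    # ~2 A/core
    cerebras_a_per_core = cerebras_current_a / cerebras_cores   # ~0.024 A/core

    print(f"EPYC:     {epyc_a_per_core:.2f} A per core")
    print(f"Cerebras: {cerebras_a_per_core:.3f} A per core")
    # The wafer's cores each draw roughly 80x less current, consistent with
    # many small, lower-power cores rather than a few big ones.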
As far as benchmarks and utility of such systems, I am not sure if they've proven it out.
What is interesting to me is that it was easier to sling 20kA round the chip than to make a higher voltage power distribution bus and step down nearer the cores.
I guess this is a process limitation? I.e., the stuff you would need to make an appropriate step-down isn't compatible with the other stuff they need?
1. Process limitations (no high voltage devices, poor analog characteristics, limited resistor choice, etc)
2. The skill set for power device/analog IC design is very different than digital design (and harder to recruit for as the talent base is relatively small).
3. On chip power converters universally suffer from poor inductor quality which trashes your efficiency (thus increasing cooling demands as well).
From a business perspective it would be quite risky and likely not cost effective.
TSMC 7nm has all the required stuff to build a DC-DC converter from 3V or 1.8V. Nobody would use on-chip inductors for high power DC-DC.
As for skillset, there are a bunch of IP companies with silicon proven designs available. I'm sure they didn't design their SERDES or PLL(s) in-house either.
Efficient DC-DC converters are area consuming due to the high voltage devices and the capacitors in general being very large. You would need SMD inductors on top of that. This means you're spending very precious silicon area on power delivery. It also creates a lot of reliability headaches, since higher voltages cannot be routed close to low-voltage stuff. If you're not crazy high in current consumption density (A/mm2), it doesn't make sense.
They also have another design advantage here. I believe they're not IO limited with their bumps. So they can use most of them for power, which is much better than any DC-DC solution.
If we presume the wafer consumes a large percentage of that (say 20kW out of the 24kW max) and that they are feeding the "wafer" with DC at 1V, then they /do/ need to feed in 20,000 amps to deliver 20kW of power at 1V.
So yes, 20kA is a lot of current, but it is within the "power budget" the device seems to express in its marketing material.
Core voltage is going to be in the ballpark of 1V, and given the stated power consumption is 15kW, that means a minimum of 15kA. So it is indeed a lot, but the math checks out.
How much of that power actually reaches the chip, though? (It's hilarious that this is one chip.) This thing is mostly a water pump - I just can't - everything about it is just wild.
Most of it. The water pump and other stuff consumes probably less than 500W total. There's some efficiency loss in the actual power converters, but they're likely designed to be >95% efficient (probably >98%), otherwise cooling them would be a nightmare.
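For a sense of scale, a tiny sketch of converter losses at a few assumed efficiencies (the 20kW load and the efficiency values are assumptions, not Cerebras specs):

    # How much heat the power converters themselves dump, assuming the wafer
    # draws ~20 kW and the converters hit a given efficiency. Values assumed.

    load_w = 20_000
    for eff in (0.90, 0.95, 0.98):
        input_w = load_w / eff
        loss_w = input_w - load_w
        print(f"{eff:.0%} efficient: {loss_w:,.0f} W dissipated in the converters")
    # 90% -> ~2.2 kW of converter heat, 98% -> ~0.4 kW, which is why high
    # efficiency matters for cooling as much as for the power bill.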
Jiminy Crickets - that Cerebras is dope - don't miss the only other tech vid from Rebbecca: https://vimeo.com/lewie221 <-- About CFD.
Weirdly - CFD has been zeitgeisting me here on HN the last couple of days - I have been talking about FluidX3D - and have been attempting to compile it this AM locally on Windows, with failures (about to see if I can plop it into a Docker container).
Never thought you'd be doing CFD calcs to keep a stable temperature across 1.6 trillion transistors so they stay evenly cooled - did ya?
--
Watching that CFD vid above led me to start thinking about whether CFD could be applied to the ways AI models/GPTs communicate or calculate.
I wonder if one could use CFD analogies for the data flows through AI models/systems such as OpenAI's.
It would be interesting to look at the OpenAI GPT Store's entanglements through the lens of CFD and determine where relations might be made for how stacking GPTs might communicate through their Tapestry.
Any CFD-heads care to dive in?
I wonder what would happen if one were to treat 'token flow' in a CFD manner, in which one could visualize how tokens are assigned attention scores.
GPT claims to not be able to visualize the attention score matrix for the tokens - but assuming it could - it would seem as though it should be easy to visualize attention matrices in a CFD visual.
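Not sure a full CFD treatment buys you anything over a heatmap, but here's a toy sketch of what visualizing an attention matrix as a "flow" might look like, using a purely synthetic random matrix rather than real model internals:

    # Toy visualization of a synthetic attention matrix as a "flow field".
    # The matrix here is random softmax weights, not real model internals.
    import numpy as np
    import matplotlib.pyplot as plt

    rng = np.random.default_rng(0)
    n_tokens = 12
    logits = rng.normal(size=(n_tokens, n_tokens))
    attn = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)  # row softmax

    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
    ax1.imshow(attn, cmap="viridis")
    ax1.set_title("Attention weights (heatmap)")
    ax1.set_xlabel("attended-to token")
    ax1.set_ylabel("query token")

    # Crude "flow" view: an arrow from each query token to its most-attended
    # token, with opacity scaled by the attention weight -- loosely analogous
    # to drawing the dominant streamlines.
    targets = attn.argmax(axis=1)
    for i, j in enumerate(targets):
        ax2.annotate("", xy=(j, 1), xytext=(i, 0),
                     arrowprops=dict(arrowstyle="->", alpha=float(attn[i, j])))
    ax2.set_xlim(-1, n_tokens)
    ax2.set_ylim(-0.5, 1.5)
    ax2.set_yticks([0, 1])
    ax2.set_yticklabels(["queries", "keys"])
    ax2.set_title("Strongest attention 'flows'")
    plt.tight_layout()
    plt.show()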
Also - in super expensive machines, why aren't sheets of aerogel used as gaskets if you want thermal separation?
Imagine taking aerogel powder and mixing it with silicone - giving you a super thin, flexible material, such as D3 - which has a melting point of 134C/273F...
So mixing aerogel with D3 as a gasket would be good, and with D3 being non-Newtonian it works well for shocks. A space gasket, as it were.
Getting closer to Jim Gray's "smoking hairy golf ball" (Jim also forecasted that chips would eventually become spherical to limit the lengths of the data paths.)
It will be interesting to see if commercial data centers retrofit with cooling water transport under the floor or above the machines (riskier). This will be a challenge for DCs without a lot of space under there; presumably they could boost the floor height after a door transition. Still, how many of them have 20-40kW of power allocated per rack?
It's one of the few times I miss being at Google, because they approached this sort of problem very creatively and with an effectively unlimited budget to try different things. I'm sure their data centers are very much different from my time there!
I had the chance to visit the new datacenter of my college and on the server exhaust side there is a radiator as tall as the rack with cold water in it. All the pipes are under the floor.
IIRC they mainly put power-hungry compute nodes for the clusters in this new datacenter, and I remember that servers full of GPUs had crazy power draw. The water then goes through a heat exchanger to help generate hot water to heat the campus and for the taps.
I am - literally - heating my condo with my ML training workstation right now. Works great! I have electric wall heaters, so I might as well get something out of burning 1000W of electricity.
Heating with electricity is a profound waste though when natural gas is a fraction of the price. And if you do have to heat with electricity, a heat pump is going to be far better than mere resistance.
Even purely from a perspective of self-interest, you're unlikely to make any money mining bitcoin even with free electricity, let alone if you're supposed to do it with just the power a heat pump uses.
Your bitcoin mining rig will need three times as much power as a heat pump to produce the same amount of heat. I doubt it would be economical for most people.
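Rough numbers, assuming a heat pump COP of around 3 (the actual COP depends on climate and hardware):

    # Electricity needed to deliver the same heat: resistive load (e.g. a miner)
    # vs. a heat pump. A COP of 3 is an assumption; real values vary by climate.

    heat_needed_kwh = 30                # e.g. one cold day's heating demand
    cop_heat_pump = 3.0

    resistive_kwh = heat_needed_kwh / 1.0            # 30 kWh of electricity
    heat_pump_kwh = heat_needed_kwh / cop_heat_pump  # 10 kWh of electricity

    print(f"Resistive/miner: {resistive_kwh:.0f} kWh")
    print(f"Heat pump:       {heat_pump_kwh:.0f} kWh")
    # The miner has to earn back the cost of ~20 kWh/day of extra electricity
    # (plus hardware) before it beats a heat pump on heating alone.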
That isn’t anywhere close to enough based on what I pay for power. I think you probably are always better off somewhere with plentiful cheap power (that likely isn’t where you want to live)
The more I think about this the more I think there could be a real business model here. Make a device that looks and acts like a space heater but is in fact a miner loaded with your private keys. Market it as a "smart heater" that requires a wifi connection to operate. Sell it as a loss leader, or maybe even give it away for free. It doesn't have to be state-of-the-art hardware, so it can be cheap to make. Hmm...
Mining hardware is really expensive still, and they are constantly replaced by better models.
I have a massive surplus of solar in summer, and on my local auction sites some old miners became available. I considered picking them up and putting them to use using up my surplus "free" electricity.
I gave up when the mining ROI calculation came back with multiple years, even when you factor in a $0 electricity cost.
This might've worked 3-5 years ago (maybe more), but I don't see that happening now to be honest.
I can’t help but think that someday we will see chips designed like Sierpinski gaskets, with “holes” used for thermal transport and the solid parts used for computation. Such chips would behave more or less like a toroidal structure.
Now that chiplets are maturing that is a little less far-fetched.
At least in Slovenia, many new-ish houses also have a 3-phase outlet or two. I don't know what the maximum wattage of those is, but I do know it's a lot --- and on a single outlet.
I'm in Canada and have a 30A 240V breaker for the Debian ROCm Team's CI [1]. It's only rated for 80% continuously, so that's roughly 5.7 kW. In theory, the four systems currently hooked up to it could draw almost that much, but their workload only uses ~1 kW in practice.
The "largest" single outlet I have is 3F32A which caps out at 22kW. Entire house is wired for 3 phase power capped at 125A per phase - just over 86kW total
If you're in USA or Canada, your house is probably over-wired and can handle some extra load. Plugging into the electric dryer outlet is good for 5760 continuous watts. The stove outlet is 9600 continuous watts. An electrician could easily add an additional one.
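Those figures are just the 80% continuous-load derating applied to common breaker sizes; a quick check:

    # Continuous power available from common US 240 V outlets, applying the
    # 80% continuous-load derating. Breaker sizes are the typical ones assumed.

    def continuous_watts(breaker_a: float, volts: float = 240, derate: float = 0.8) -> float:
        return breaker_a * volts * derate

    for name, amps in [("Dryer (30 A)", 30), ("Range (50 A)", 50)]:
        print(f"{name}: {continuous_watts(amps):,.0f} W continuous")
    # Dryer (30 A): 5,760 W continuous
    # Range (50 A): 9,600 W continuous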
Tankless electric hot water heaters can be rated for 24-36 kW (max current draw 150 A)! These are wired with 3-4 parallel runs of 8-gauge wire, each connected to a 40A breaker.
The only limit to household wiring would be the capacity of the distribution coming into your home.
A plumber I used to have do work for me told me a story about the first time he installed one: when he first turned it on, it blew a fuse at the substation a couple of miles away.
A few years later I got to know the person whose house it was installed in. And when the homeowner was talking about it he complained that the plumber didn’t install a big enough one and he had to have it redone.
That certainly shouldn't be able to happen. The top of panel breaker should pop with a large enough load, and the mains on the street can handle quite a number of homes with supply. There's no way a single in-home unit should/could pop anything at a substation. The component at the substation likely just failed at that time.
I am wondering if no other household loads on that circuit had that level of in-rush current, the component was on its way out, and the new load pushed it over the edge.
We had one (Stiebel Eltron 36KW) installed earlier this year which required a service upgrade to 320A service. AMA, though responses may be delayed a day or two.
Our all-electric home's max instantaneous draw over the last couple of months peaked around 44 kW.
Just saw this, I would recommend going the heat pump route.
In retrospect, it's insane to get an electric tankless hot water heater from a climate perspective. Gas tankless makes infinitely more sense if you need limitless hot water. If you don't need limitless, heat pump would be the most emissions efficient.
I'll try to get my usage data together and update this comment.
Edit: Cost of electric and gas usage over the last five years. The big spike last winter was due to our remodel and heating the whole house while it was neither sealed nor insulated in the middle of winter, with just the 10kW furnace backup heater, which ran almost non-stop. Units are USD. The two gaps in data are due to meter replacements, I think. https://imgur.com/AVTFwpP
It's kind of hard to compare the data since we re-insulated, went from a gas furnace to a heat pump, and added a hot tub.
Yes, they are, but most single family homes here have either a 150 or 200 amp main. Most families rarely exceed 120 amps, so adding 20-35 amps of intermittent load isn't a problem.
On the other hand, tankless electrics often cause a service upgrade unless planned for new construction. Great for us electricians, less great for the homeowner.
I think you are confusing a breaker group with your total residential connection.
Assuming you are in Europe, and you have 2.5mm² cabling (which is the standard for residential applications) then you are indeed limited to 16A per group.
However, there is nothing preventing you from using multiple groups for one appliance. This is actually typical for high-power appliances, such as induction cooking.
Ultimately it is your main fuse that limits your total power consumption; in most European countries this is typically rated at 25A (5750W), but on request you can usually have it raised to 35A, 50A or even 80A, if supply is sufficient.
The limit is space, really. Well, that and money. If you're willing to pay, you can probably convince your utility to hook you up to three phase power, which is typically reserved for industrial use.
If you want to stay with your normal residential circuit, in the US they're commonly 100 or 150A. 200A isn't uncommon, but you might have to pay for an upgrade.
That leaves you with 200A*240V=480kW minus whatever you need for normal house things.
So probably more compute than you have physical space for.
You’re off by an order of magnitude: 200 * 240 = 48000
> If you're willing to pay, you can probably convince your utility to hook you up to three phase power, which is typically reserved for industrial use.
Three-phase power isn’t only for industrial use, (in the United States) small commercial buildings will have a 208v three-phase service drop and larger commercial buildings will have a 480v service drop, or 13.8kV medium voltage drop if it’s big enough. Large enough industrial customers will have dedicated substations.
Also, in the US at least, residential power is way more expensive than commercial power, so at some point (much earlier than you'd think) the savings on rack space rent disappear.
A typical (new) residential service in the US is 200A @ 240v single-phase, or 48kW. Assuming the circuit is protected by a breaker, you can use up to 38.4kW of that 48kW. If you used fuses instead of breakers, you could use the full 48kW.
The 80% derating only applies to 'continuous loads' which are defined by NEC to be anything >3 hours continuously at maximum current.
Any circuit breaker will not trip at the rated current, though. They're designed not to. So, you can run all 48kW indefinitely without tripping a circuit breaker, assuming everything else is sized appropriately (i.e., wire, interconnects, etc).
A typical US residential breaker panel is rated for between 100 and 200 amps so probably around there across all circuits without major electrical work.
We just upgraded our lines from the street from 100A to 200A (at 240V), so that pair of wires can support around 48kW. I think my stove is 20A at 240V, so the fattest wire in the house protected by a breaker can safely handle around 4.8kW. On the 120V side the biggest wire I've got is 20A, so 2.4kW. Many house receptacles are only 15A max, hence your 1.8kW.
Also understand that you can only use 80% of a circuit's capacity in a continuous fashion, so the usable power for compute is a fair bit lower than it seems.
https://vimeo.com/731037615 for a discussion of wafer scale integration design issues.
The best way to cool is to go for immersion cooling.
Even better is to eliminate most of the heat from the wires with free-space optics.
The core voltage for 7nm is 0.75V. The max over-voltage would be 0.9V or so, but the lifetime of the device will be limited to less than a year if they don't cool the devices to 50-60C or so in that case. BCI aging is no joke.
So I assume they are using around 0.9V and 22kA, meaning ~20kW.
Assuming the upcoming Zen 5c was capped at 192 cores because of bandwidth and not thermals, we could have had 256 cores + IOD (70W); if every core were to use 3.6W, that is nearly 1000W for the CPU socket.
In a 2U 2-node system, that's a potential 1024 vCPUs in a single server.
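The arithmetic, with the per-core and IOD wattages above treated as rough assumptions:

    # Hypothetical 256-core Zen 5c socket power and vCPU count (all assumptions).

    cores, watts_per_core, iod_w = 256, 3.6, 70
    socket_w = cores * watts_per_core + iod_w
    print(f"Socket power: ~{socket_w:.0f} W")    # ~992 W

    nodes, threads_per_core = 2, 2
    vcpus = nodes * cores * threads_per_core
    print(f"2U 2-node system: {vcpus} vCPUs")    # 1024 vCPUs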
In datacenters, you're mostly limited by the power (and thus cooling). Most commercial DCs only let you use up to about 10kW per rack. For standard 40U racks it's just 250W/RU, give or take.
There are niche expensive datacenters with higher power density, but as it stands, exotic multi-kW hardware at scale makes sense if you either save a ton on per-node licensing, or you need extreme bandwidth and/or low latency.
>Most commercial DCs only let you use up to about 10kW per rack.
I think that was the case in 2020:
>By 2020, that was up to 8–10 kW per rack. Note, though, that two-thirds of U.S. data centers surveyed said that they were already experiencing peak demands in the 16–20 kW per rack range. The latest numbers from 2022 show 10% of data centers reporting rack densities of 20–29 kW per rack, 7% at 30–39 kW per rack, 3% at 40–49 kW per rack and 5% at 50 kW or greater.
We don't have 2023 numbers and we are coming up on 2024, but it is clear that demand for higher power density is growing (and hopefully at a much faster pace).
I have been told by tech writers that Google discovered that at some point electricians will refuse to route more power into a building. So even if you created a separate thermal plant, you still have issues.
Arc furnace: peak of about 250 MW to melt the steel [1]
Datacenter: seems to cap out at around 850 MW [2]
Same ballpark I guess? Probably both are limited by inexpensive power availability + other connectivity factors (road/rail, fiber).
[1]: “Therefore, a 300-tonne, 300 MVA EAF will require approximately 132 MWh of energy to melt the steel, and a "power-on time" (the time that steel is being melted with an arc) of approximately 37 minutes.” via https://en.m.wikipedia.org/wiki/Electric_arc_furnace
There’s always some limiting factor, and there’s always some (possibly crazy expensive) way to resolve it and get a bit more power until you run into the next limiting factor.
They do; Google had/has one in The Dalles in Oregon next to the dam, and there are several in Canada and Finland.
Part of the problem is consistent humidity management, but the other is that air, even really cold air, isn't dense enough to move heat as effectively as the same volume of a denser working fluid.
For a practical example in the other direction, look at combined-cycle gas turbines, which use hot combustion gas for the primary turbine and then recapture the exhaust heat as much denser steam for the secondary.
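To put numbers on "dense enough", here's the volumetric heat capacity (density x specific heat) of air vs water at roughly room conditions:

    # Volumetric heat capacity comparison: how much heat a cubic metre of each
    # fluid carries per degree of temperature rise (approx. room-temperature values).

    air_density, air_cp = 1.2, 1005          # kg/m^3, J/(kg*K)
    water_density, water_cp = 998, 4182      # kg/m^3, J/(kg*K)

    air_vhc = air_density * air_cp           # ~1.2 kJ/(m^3*K)
    water_vhc = water_density * water_cp     # ~4,170 kJ/(m^3*K)

    print(f"Air:   {air_vhc/1e3:.1f} kJ/(m^3*K)")
    print(f"Water: {water_vhc/1e3:.0f} kJ/(m^3*K)")
    print(f"Ratio: ~{water_vhc/air_vhc:.0f}x")  # water moves ~3500x more heat per unit volume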
The "north" of Canada (assuming you mean a place that has some semblance of infrastructure) doesn't get -that- cold in the summer. Yellowknife and Whitehorse both have average summer temperatures around 20C.
In any case, I think you'd still need substantial cooling infrastructure to deal with the summer. Your capital costs are going to be waaaay higher. While your thermal management costs will go down, your payroll will probably be higher.
It's not. It's not about efficiency. Compute-per-watt will certainly be better in other systems. This is about pushing a small system as fast as possible because it's easier to program for a small system. A few problems are 'embarrassingly parallel', but lots have substantial overhead as parallelism increases so running each core as fast as possible is a win for some problems.
https://www.youtube.com/watch?v=pzyZpauU3Ig
The engineering to handle that power density is insane. Technically it's less power per mm^2, but the chip is the size of a dinner plate.
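Back-of-the-envelope, assuming the wafer is the ~46,225 mm^2 Cerebras quotes and comparing against a ballpark ~700W / ~800 mm^2 GPU die (those GPU numbers are assumptions, not a specific product):

    # Rough power-density comparison. The wafer area is Cerebras' quoted ~46,225 mm^2;
    # the "big GPU" numbers are ballpark assumptions, not a specific product spec.

    wafer_w, wafer_area_mm2 = 23_000, 46_225
    gpu_w, gpu_area_mm2 = 700, 800

    print(f"Wafer-scale: {wafer_w / wafer_area_mm2:.2f} W/mm^2")   # ~0.50 W/mm^2
    print(f"Big GPU:     {gpu_w / gpu_area_mm2:.2f} W/mm^2")       # ~0.9 W/mm^2
    # Lower power density per mm^2, but ~23 kW total because the "chip" is the
    # size of a dinner plate.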
EDIT: The video was taken down, but looks like the web archive got it:
https://web.archive.org/web/20230812020202/https://www.youtu...
As well as Vimeo (thanks morcheeba): https://vimeo.com/853557623