Not sure on the question. I'm primarily a software guy so I'd prefer to not inte...

john_minsk · 2024-06-28T07:32:19 1719559939

If I understand correctly: you want to create your own water-cooling design for your GPU (let's say an RTX 4090), test it to make sure it works in the real world, and once the test is successful send your CAD to a factory that can produce and assemble all the components?

Are you doing it for the whole cooling unit from scratch or just one part (like a fan or some holder)?

What is the most common reason you would order v2 from the manufacturer after you receive v1 and test it in the real world?

cjbgkagh · 2024-06-28T08:37:37 1719563857

The way I see it, I make the CAD and send it to the part supplier (Fabric8), who automatically checks that it’s printable, then makes the parts and posts them to me. I would do the assembly myself if needed, but would probably try to design things for minimal or no assembly. The hope is that this process would be cost-effective at scale, so if I need more I could just press a button.

I run an on-prem mini cluster which is water cooled, but the customers need edge compute, which should stay air cooled. I would probably try to make a blower-style vapor chamber for NVIDIA GPUs so I can ram air through without dealing with NVIDIA driver fuckery. NVIDIA segments the market based on heat sinks, binning and drivers, so the enterprise segment has to pay far more. The 4090 blowers made by OEMs are gimped, rare, and expensive, and I think intentionally so.

Not only is consumer grade cheaper, it’s also way less hassle to get. The enterprise sales pipeline is a total pain: the costs keep changing and they keep trying to push older, overpriced stuff onto me like I wouldn’t notice. And that’s if they even think you’re big enough to talk to. Much easier to pull consumer stuff from the market on an as-needed basis.

My software supports graceful degradation, so I don’t need ‘enterprise’ reliability. I don’t need high-speed interconnects either. I need TFLOPS on dense matmuls, run in parallel batches. Consumer GPUs are fine for this: replace the heatsinks with blower-optimized ones, put in some powerful fans, and take off. I could pack more GPUs into a single computer and get a lower amortized cost and a higher density.

If the vapor chamber blower heatsink is too expensive then I might as well just buy more GPU computers and let them run slower.
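That tradeoff can be put as a rough break-even calculation. The sketch below is only an illustration: the function names and every number in the usage lines are made-up placeholders (not real prices or benchmarks), and it ignores power, rack space and labor.

```python
def cost_per_tflop(gpu_cost, host_cost, gpus_per_host, tflops_per_gpu, heatsink_cost=0.0):
    """Amortized hardware cost per sustained TFLOP for one host."""
    total_cost = host_cost + gpus_per_host * (gpu_cost + heatsink_cost)
    total_tflops = gpus_per_host * tflops_per_gpu
    return total_cost / total_tflops


def max_heatsink_budget(gpu_cost, host_cost, stock_gpus, stock_tflops,
                        dense_gpus, dense_tflops):
    """Highest per-GPU heatsink price at which the dense build still
    matches the stock build's cost per TFLOP."""
    stock = cost_per_tflop(gpu_cost, host_cost, stock_gpus, stock_tflops)
    # Solve host_cost + n * (gpu_cost + h) == stock * n * dense_tflops for h.
    return stock * dense_tflops - gpu_cost - host_cost / dense_gpus


# Placeholder inputs only: 2 stock GPUs per host vs. 4 with custom blowers,
# both assumed to sustain the same per-GPU throughput.
budget = max_heatsink_budget(gpu_cost=1000, host_cost=2000,
                             stock_gpus=2, stock_tflops=50,
                             dense_gpus=4, dense_tflops=50)
print(budget)
```

If a quoted heatsink price comes in above `budget`, buying more stock machines wins on hardware cost alone under these assumptions.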

john_minsk · 2024-06-28T08:01:19 1719561679

On a second read, I would like to rewrite your workflow as follows:

I would prefer to be able to:

[x] buy a GPU, -> [v] Download SimReady USD of your GPU from nVidia website.

[v] whip up some CAD

[v] import USD of nVidia card into CAD

[v] measure the geometry using CAD

[v] Design cooling element

-> Export CAD to USD

-> Import CAD as SimReady Asset into nVidia Omniverse with tests you need.

-> Once the simulation is OK, send the final CAD to production (or, better, go from the GPU simulation straight into a simulation of production at one of the factories or on a 3D printer)

In this scenario:

- A SimReady USD of the GPU must already exist inside nVidia. They could make it available for simulation inside Omniverse (or you could build an open high-level model yourself).

- A thermal simulation app is something that nVidia needs for their business now and in the future. They could share their current software with the public and let people like you download it, change it and run simulations with your changes (or you could build an open high-level model yourself).

I'm wondering how difficult it would be to simulate important effects on your models. Do you think the above process has potential to improve your process or did I miss some important detail of your work?

cjbgkagh · 2024-06-28T08:54:07 1719564847

Oh, with regards to the actual design, I DIY my own computational engineering software, based on implicit modeling, which works well for these sorts of complex geometries. I don’t know about the simulation; I guess in theory I could export the model into a sim. I would base the broad strokes of the design on known working reference implementations. I think there is a ton of potential in custom vapor chamber stuff, so maybe I’ll play with that, but I think it’ll easily become overkill. Perhaps I’d have the vapor chamber printed and thermal-glue skived heatsinks on top. Unlike normal GPU designs, I can tolerate much more noise in exchange for air pressure.

For me, good enough is good enough. I’m not going to be super optimized, as the cost tradeoffs for such optimizations don’t work out as favorably at my scale.
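For anyone unfamiliar with the implicit modeling mentioned above: a shape is described by a function that is negative inside the part and positive outside, and shapes are combined with min/max. The sketch below is the generic textbook signed-distance approach, not the poster's actual software; all names and dimensions are hypothetical.

```python
import math


def sphere(cx, cy, cz, r):
    """Signed distance to a sphere: negative inside, positive outside."""
    return lambda x, y, z: math.sqrt((x - cx) ** 2 + (y - cy) ** 2 + (z - cz) ** 2) - r


def box(hx, hy, hz):
    """Signed distance to an origin-centered box with the given half-extents."""
    def f(x, y, z):
        qx, qy, qz = abs(x) - hx, abs(y) - hy, abs(z) - hz
        outside = math.sqrt(max(qx, 0.0) ** 2 + max(qy, 0.0) ** 2 + max(qz, 0.0) ** 2)
        inside = min(max(qx, qy, qz), 0.0)
        return outside + inside
    return f


def union(a, b):
    """Material where either shape has material."""
    return lambda x, y, z: min(a(x, y, z), b(x, y, z))


def difference(a, b):
    """Material of a with b carved out."""
    return lambda x, y, z: max(a(x, y, z), -b(x, y, z))


# Hypothetical part: a 40 x 40 x 4 plate with a spherical pocket at its center.
part = difference(box(20.0, 20.0, 2.0), sphere(0.0, 0.0, 0.0, 5.0))

print(part(15.0, 0.0, 0.0) < 0)  # point in the solid plate
print(part(0.0, 0.0, 0.0) < 0)   # point in the carved-out pocket
```

Because the geometry is just a function, lattices, fins and blends are a matter of composing more functions, which is why the approach suits complex heatsink shapes; a printable mesh would come from a separate meshing step such as marching cubes.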