ML serving is about optimization and portability

brucethemoose2 · on Feb 16, 2023

A problem that's getting bigger is accelerator diversity.

The choices for regular devs used to be:

- CUDA

- (unoptimized) GPU shaders

- Some weird proprietary block that only does weird proprietary things, like mobile NPUs or the old Intel blocks.

But now we have proper matrix instructions in AMD/Intel GPUs, proper NPUs in AMD/Intel laptop chips, Apple GPUs and NPUs, and a growing number of reasonably affordable and increasingly ergonomic non-GPU cloud accelerators...