Near-term, we're pretty excited about getting to sub-second / sub-100ms interactive times on real GB-scale workloads; that's pretty normal in GPU land. Beyond that, where this is clearly headed is multi-GPU boxes like the DGX-2, which already push on the order of 2 TB/s of aggregate bandwidth. Unlike multi-node CPU systems, I'd expect better scaling because nothing has to leave the node.
With GPUs, the software progression is single-GPU -> multi-GPU -> multi-node multi-GPU. By far the hardest step is single-GPU, and that's what they're showing here.
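To make that concrete, here's a minimal sketch of what that progression tends to look like in practice, assuming a RAPIDS-style stack (cuDF + dask-cuda); the filenames and column names are placeholders, and the original comment doesn't name a specific library, so treat this as illustrative only:

```python
# Illustrative sketch, not the authors' actual code.
import cudf                               # single-GPU dataframe engine
import dask_cudf                          # partitions cuDF frames across GPUs
from dask_cuda import LocalCUDACluster
from dask.distributed import Client

# Step 1 (the hard part): a full columnar engine on one GPU.
gdf = cudf.read_parquet("data.parquet")   # hypothetical input file
single_gpu_result = gdf.groupby("key")["value"].mean()

# Step 2: multi-GPU on one box (e.g. a DGX-2) largely reuses the
# single-GPU kernels; dask-cuda just spreads partitions over the
# local devices, one worker per GPU.
cluster = LocalCUDACluster()
client = Client(cluster)
ddf = dask_cudf.read_parquet("data.parquet")
multi_gpu_result = ddf.groupby("key")["value"].mean().compute()
```

The point of the sketch is that once the single-GPU layer exists, going multi-GPU is mostly a scheduling/partitioning layer on top rather than a rewrite.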