Thanks! The last time I cared about this I was doing SGD on GPU clusters, and memory bandwidth was the limiting factor. Almost all of the work went into chunking the problem so that as many subproblems as possible were on the GPU at once.
If storage really is becoming integrated at similar speeds (e.g. not an order-of-magnitude difference like L2 vs. RAM), then I think I'm wrong.