Hacker News

Yes, we're saying the same thing. With larger batch sizes, the optimizer takes longer to converge in terms of the number of samples pushed through the model, but less wall-clock time, because each step processes more samples in parallel and uses the hardware more efficiently.
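A toy numerical sketch of that tradeoff. All numbers here are made-up assumptions for illustration (the sqrt growth in samples-to-converge and the throughput-saturation curve are not from the comment or any measurement); the point is only that samples-needed can rise with batch size while wall-clock time falls.

```python
def wall_clock_seconds(batch_size,
                       base_samples=1_000_000,  # assumed samples to converge at batch 32
                       base_batch=32,
                       peak_throughput=4096.0,  # assumed peak samples/sec
                       knee=4096):              # assumed batch size where throughput saturates
    """Return (samples needed to converge, wall-clock seconds) under toy assumptions."""
    # Assumption: samples-to-converge grows roughly with sqrt(batch size)
    samples_needed = base_samples * (batch_size / base_batch) ** 0.5
    # Assumption: throughput rises with batch size, saturating near the peak
    throughput = peak_throughput * batch_size / (batch_size + knee)
    return samples_needed, samples_needed / throughput

for bs in (32, 256, 2048):
    samples, secs = wall_clock_seconds(bs)
    print(f"batch {bs:>5}: {samples:>12,.0f} samples, {secs:>8,.0f} s")
```

Under these assumptions, going from batch 32 to 2048 roughly octuples the samples required but still cuts wall-clock time several-fold, matching the distinction drawn above.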


