
I think you're confused about the benefits/drawbacks of increasing batch size. The training progress per optimizer step increases with batch size (otherwise the smallest batch would always be optimal). Increasing the batch size improves the quality of a single optimizer step, but the improvement does not scale linearly: larger-batch training requires more samples to converge than small-batch training, even though it needs fewer steps. Depending on the problem scale, the computational efficiency of larger batches can make this tradeoff worth it: even though you need to process more samples to converge, it takes less wall-clock time thanks to better hardware utilization. A rough sketch of this tradeoff is below.
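
As a rough illustration only (a hypothetical toy example, not from the thread): the sketch below trains a tiny linear-regression model with SGD at two batch sizes and reports optimizer steps, samples seen, and wall-clock time until a target loss. The model, target loss, and batch sizes are arbitrary, and on a CPU toy problem the wall-clock advantage of the larger batch may not show up; it mainly appears on hardware that parallelizes across the batch (e.g., a GPU).

    # Hypothetical sketch: compare samples-to-converge vs. wall-clock time
    # for a small and a large batch size on a toy regression task.
    import time
    import torch

    def train_until(batch_size, target_loss=0.01, lr=0.1, max_steps=10_000):
        torch.manual_seed(0)
        # Toy data: y = 3x + 1 plus noise
        X = torch.randn(50_000, 1)
        y = 3 * X + 1 + 0.1 * torch.randn_like(X)

        model = torch.nn.Linear(1, 1)
        opt = torch.optim.SGD(model.parameters(), lr=lr)
        loss_fn = torch.nn.MSELoss()

        start, steps, samples = time.time(), 0, 0
        while steps < max_steps:
            idx = torch.randint(0, X.shape[0], (batch_size,))
            loss = loss_fn(model(X[idx]), y[idx])
            opt.zero_grad()
            loss.backward()
            opt.step()
            steps += 1
            samples += batch_size
            if loss.item() < target_loss:
                break
        return steps, samples, time.time() - start

    for bs in (32, 1024):
        steps, samples, secs = train_until(bs)
        print(f"batch={bs:5d}  steps={steps:6d}  samples={samples:8d}  wall={secs:.2f}s")

Typically the larger batch finishes in far fewer steps but has pushed more total samples through the model; whether it also wins on wall-clock time depends on how well the hardware exploits the extra per-step parallelism.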



Yes, we're saying the same thing. With larger batch sizes, the optimizer takes longer to converge in terms of the number of samples pushed through the model, but less wall-clock time thanks to the increased efficiency.





