Hacker News

Yes, we're saying the same thing. With larger batch sizes, the optimizer takes longer to converge in terms of the number of samples pushed through the model, but less wall-clock time, because each step processes more samples in parallel and uses the hardware more efficiently.
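A toy numerical sketch of that tradeoff. All numbers here are made-up assumptions for illustration (the sqrt growth in samples-to-converge and the throughput-saturation curve are not from the comment or any measurement); the point is only that samples-needed can rise with batch size while wall-clock time falls.

```python
def wall_clock_seconds(batch_size,
                       base_samples=1_000_000,  # assumed samples to converge at batch 32
                       base_batch=32,
                       peak_throughput=4096.0,  # assumed peak samples/sec
                       knee=4096):              # assumed batch size where throughput saturates
    """Return (samples needed to converge, wall-clock seconds) under toy assumptions."""
    # Assumption: samples-to-converge grows roughly with sqrt(batch size)
    samples_needed = base_samples * (batch_size / base_batch) ** 0.5
    # Assumption: throughput rises with batch size, saturating near the peak
    throughput = peak_throughput * batch_size / (batch_size + knee)
    return samples_needed, samples_needed / throughput

for bs in (32, 256, 2048):
    samples, secs = wall_clock_seconds(bs)
    print(f"batch {bs:>5}: {samples:>12,.0f} samples, {secs:>8,.0f} s")
```

Under these assumptions, going from batch 32 to 2048 roughly octuples the samples required but still cuts wall-clock time several-fold, matching the distinction drawn above.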


