
I think the concept makes sense. The basic insight, that the right batch size depends on the difficulty and noisiness of a task, is already used by teams. For example, the PaLM paper from last week increased its batch size throughout training.

But as far as I know, the more precise predictions of optimal batch size aren't used much, probably because they are expensive to measure accurately, or because the predictive equation isn't accurate enough to begin with. I wonder if we can "transfer" the optimal batch size from a smaller setting (a smaller model or less data) to the full setting, as in our paper. That would make it much more practical.
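For context, the comment doesn't name the predictive equation, but the standard one is the gradient noise scale from McCandlish et al.'s "An Empirical Model of Large-Batch Training": the critical batch size is roughly tr(Σ)/|G|², the per-example gradient variance divided by the squared true-gradient norm. A minimal sketch on synthetic gradients, using the paper's two-batch-size estimator (all names and the toy noise model here are illustrative assumptions, not anyone's production code):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy model: each per-example gradient is the true gradient plus
# i.i.d. noise. In practice g_small / g_big would be averaged
# gradients measured on two minibatches of different sizes.
dim, noise_std = 1000, 5.0
true_grad = rng.normal(size=dim)

def batch_grad(batch_size):
    """Average gradient over a batch of noisy per-example gradients."""
    noise = rng.normal(scale=noise_std, size=(batch_size, dim))
    return true_grad + noise.mean(axis=0)

b_small, b_big = 32, 1024
sq_small = np.sum(batch_grad(b_small) ** 2)
sq_big = np.sum(batch_grad(b_big) ** 2)

# Unbiased estimates of |G|^2 and tr(Sigma) from the two batch sizes:
# measuring the gradient norm at two batch sizes separates signal
# (batch-size-independent) from noise (shrinks as 1/B).
grad_sq = (b_big * sq_big - b_small * sq_small) / (b_big - b_small)
trace_sigma = (sq_small - sq_big) / (1.0 / b_small - 1.0 / b_big)

# "Simple" noise scale: the batch size at which gradient noise is
# comparable to the gradient signal.
b_noise = trace_sigma / grad_sq
print(f"estimated critical batch size ~ {b_noise:.0f}")
```

With this toy noise model the true value is dim * noise_std^2 / |G|^2, so the estimate should land around 25; the expensive part in real training is that Σ and G drift, so the measurement has to be repeated throughout the run.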



