
Author here. Yes, this is a great question! I think there's a lot more interesting work to be done here. It would be great to understand, for example, why layer-wise learning rates help with large-batch training on ImageNet (https://arxiv.org/abs/1708.03888); the per-component noise scale may have something to do with it.


