
For the 65B fine-tune, did you add another A100 node, or just drop the batch size?

Any chance you'd be up for sharing the training parameters?




Dropping the batch size
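
For anyone curious what "dropping the batch size" typically looks like in practice, here is a minimal sketch using Hugging Face's TrainingArguments: a smaller per-device batch size fits the 65B model in memory, and gradient accumulation can restore the effective global batch size. All of the specific values, the output path, and the choice of this API are assumptions for illustration, not the poster's actual configuration.

    # Hypothetical settings illustrating the trade-off: per-device batch size
    # is dropped to fit in GPU memory, gradient accumulation keeps the
    # effective batch size up. Numbers are placeholders, not the real run.
    from transformers import TrainingArguments

    training_args = TrainingArguments(
        output_dir="./65b-finetune",       # hypothetical output path
        per_device_train_batch_size=1,     # dropped to fit the 65B model
        gradient_accumulation_steps=16,    # 1 * 16 * n_gpus = effective batch
        learning_rate=2e-5,
        num_train_epochs=3,
        bf16=True,                         # mixed precision to cut memory use
        logging_steps=10,
    )

The usual caveat is that changing the effective batch size also shifts the optimal learning rate, so the two are often tuned together.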



