Hacker News new | past | comments | ask | show | jobs | submit login

They're different concepts with similar symptoms. Overfitting is when a model doesn't generalize well during training. Reward hacking happens after training, and it's when the model does something that's technically correct but probably not what a human would've done or wanted; like hardcoding fixes for test cases.



Consider applying for YC's Fall 2025 batch! Applications are open till Aug 4

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: