Hacker News

Open benchmarks are vulnerable to saturation. I think benchmarks should have an embargo period, during which only 3% of the question-answer pairs are released, with an explicit warning not to use the benchmark once 3 months have passed since its release.

