
Table 1 in the Appendix. GSM-No-op is the one benchmark that sees significant drops for those 4 models as well (with preview dropping the least at -17%). No-op adds "seemingly relevant but ultimately inconsequential statements". So "change names, performance drops" is decidedly false for today's state of the art.



Thanks. I wrongly focused on the headline result of the paper rather than the specific claim in the comment chain about "changing name, different results".


Ah, that’s a good point. Thanks for the correction.



