
They should do 95% and 99% versions of the graphs; otherwise it's hard to tell whether the failure cases will stay in that elusive category of "stuff humans can do easily but LLMs trip up on despite scaling."

