
Minor edits to well-known problems do easily fool current models, though. Here's one that 4o and o1-mini fail on but o1-preview passes. (It's the mother/surgeon riddle, so kinda gore-y.)

https://chatgpt.com/share/6723477e-6e38-8000-8b7e-73a3abb652...

https://chatgpt.com/share/6723478c-1e08-8000-adda-3a378029b4...

https://chatgpt.com/share/67234772-0ebc-8000-a54a-b597be3a1f...




I think you didn't use the "share" function; I cannot open any of these links. Can you do it in a private browser session (so you're not logged in)?


Oops, fixed the links.

mini's answer is correct, but in the very next sentence it forgets that fathers are male.



