
I'm also confused about some of the figures' captions, which don't seem to match the results:

- "Only Sonnet-3.5 can count the squares in a majority of the images", but Sonnet-3, Gemini-1.5 and Sonnet-3.5 all have accuracy of >50%

- "Sonnet-3.5 tends to conservatively answer "No" regardless of the actual distance between the two circles.", but it somehow gets 91% accuracy? That doesn't sound like it tends to answer "No" regardless of distance.
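To make the second point concrete, here is a quick sketch with made-up labels (not the paper's data; I'm assuming a simple Yes/No task). A model that always answers "No" scores exactly the share of examples whose true answer is "No", so it could only reach 91% if the benchmark were heavily skewed toward "No":

```python
def accuracy(predictions, labels):
    """Fraction of predictions that match the ground-truth labels."""
    correct = sum(p == t for p, t in zip(predictions, labels))
    return correct / len(labels)

# Hypothetical balanced benchmark: half the true answers are "Yes", half "No".
balanced_labels = ["Yes"] * 50 + ["No"] * 50

# A model that answers "No" regardless of the input.
balanced_score = accuracy(["No"] * len(balanced_labels), balanced_labels)

# Hypothetical skewed benchmark: 91% of the true answers are "No".
skewed_labels = ["Yes"] * 9 + ["No"] * 91
skewed_score = accuracy(["No"] * len(skewed_labels), skewed_labels)

print(balanced_score, skewed_score)
```

If the real test set is anywhere near balanced, 91% accuracy and "answers No regardless of distance" can't both be true.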

