
> How could we give LLMs the ability to "pay attention" to different parts of images, as needed, so they can make back-and-forth comparisons between parts of different images to solve these kinds of visual reasoning tasks?

I’ve got good news



It's even all we need


What is it?



