An LLM being able to build interfaces that look recognizably like a UI from a real OS? That sure suggests a degree of multimodal understanding.