This is a really good discussion of density in different forms. I’ve always thought mobile UIs could have a density renaissance, and I’d love to see folks question some assumptions of these devices. Especially when the trend with LLMs is “wait a long time for a potentially incredibly wrong output”, it feels like we’re going the wrong way.
When we first released our Chat+RAG feature, users had to wait up to 20 seconds for the response to show, with only a loading animation in the meantime.
And then we fake-streamed the response (so you're still, technically, waiting 20 seconds for the first token, but now you're also waiting maybe 10 additional seconds for the stream of text to be "typed")...
And, to my enormous surprise, it felt faster to users.
(Of course, after several iterations it's actually much faster now, but the effect still applies: streaming feels faster than getting the full result all at once.)
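For the curious, the mechanics are trivial once the full response is in hand: just reveal it in small chunks on a timer. A rough Python sketch of the idea (the word-level chunking and the delay value here are illustrative, not our actual implementation):

    import sys
    import time

    def fake_stream(text: str, delay: float = 0.03) -> None:
        """Reveal an already-complete response word by word,
        simulating a token stream."""
        for word in text.split():
            sys.stdout.write(word + " ")
            sys.stdout.flush()  # force each chunk to render immediately
            time.sleep(delay)   # pacing delay; tune to taste
        print()

    fake_stream("The full response arrived at once, but we reveal it gradually.")

In a web UI you'd do the same thing with a timer pushing chunks into the DOM, but the principle is identical: visible progress beats a spinner.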