When we first released our Chat+RAG feature, users had to wait up to 20 seconds for the response to appear, with only a loading animation.
And then we fake-streamed the response (so you're still, technically, waiting 20 seconds for the first token, but now you're also waiting maybe 10 additional seconds while the text is "typed" out)...
And, to my enormous surprise, it felt faster to users.
(Of course after several iterations, it's actually much faster now, but the effect still applies: streaming feels faster than getting results right away)
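For what it's worth, the fake-streaming trick is simple to implement. A minimal sketch (the `chunk_size` and `delay` values here are illustrative, not our actual settings): the full answer already exists before the first chunk goes out, and the "typing" is purely pacing on the server or client side.

```python
import time

def fake_stream(text: str, chunk_size: int = 8, delay: float = 0.03):
    """Yield an already-complete response in small chunks, simulating
    token-by-token streaming. The answer is fully computed up front;
    the typing effect comes only from the pacing delay."""
    for i in range(0, len(text), chunk_size):
        yield text[i:i + chunk_size]
        time.sleep(delay)  # this pause is what creates the typing illusion

# Usage: the client appends each chunk to the UI as it arrives.
response = "The answer, fully computed up front, streamed for effect."
streamed = "".join(fake_stream(response, delay=0))
```

The joined chunks are byte-for-byte the original response; nothing about the content changes, only the perceived latency.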