This can already be done today using a streaming-capable LLM with a streaming input/output TTS model.
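
For illustration, here's a minimal sketch of that pipeline in Python. The token generator, the `tts` object with a `synthesize_stream` method, and the `play_chunk` audio sink are all hypothetical stand-ins, not any real library's API:

    def stream_speech(llm_tokens, tts, play_chunk):
        """Pipe LLM output into a streaming TTS as soon as a clause completes."""
        buf = ""
        for token in llm_tokens:
            buf += token
            # Flush at clause boundaries so the TTS has enough context
            # for natural prosody without waiting for the full reply.
            if buf.rstrip().endswith((".", "!", "?", ",")):
                for chunk in tts.synthesize_stream(buf):
                    play_chunk(chunk)
                buf = ""
        if buf:  # flush whatever remains when generation ends
            for chunk in tts.synthesize_stream(buf):
                play_chunk(chunk)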



Any LLM is "streaming capable": autoregressive generation already produces output one token at a time.
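
For example, with Hugging Face transformers any causal model can stream token by token via TextIteratorStreamer; generation runs in a background thread while the main thread consumes text as it is produced (the model choice here is arbitrary):

    from threading import Thread
    from transformers import AutoModelForCausalLM, AutoTokenizer, TextIteratorStreamer

    tok = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2")

    inputs = tok("Streaming means", return_tensors="pt")
    streamer = TextIteratorStreamer(tok, skip_prompt=True)

    # generate() feeds decoded text into the streamer as tokens are sampled.
    Thread(target=model.generate,
           kwargs=dict(**inputs, streamer=streamer, max_new_tokens=40)).start()

    for piece in streamer:  # yields text incrementally
        print(piece, end="", flush=True)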


https://github.com/mit-han-lab/streaming-llm

On a side note, and that's what led me to the link above: I wonder if it would be possible to chain N streaming LLMs in an agent workflow and get the final output stream almost instantaneously, without waiting for the first N-1 LLMs to complete their replies.
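
A rough sketch of what such a chain could look like, assuming each stage is a generator that consumes the upstream token stream lazily (a hypothetical interface, not any specific framework). One caveat: with standard causal LLMs, a downstream model generally needs the full upstream reply as its prompt before it can start generating, so the realistic win is overlapping each stage's prompt prefill with the upstream stage's decoding rather than true token-for-token pipelining:

    from typing import Callable, Iterator

    # A stage consumes a token stream and yields its own token stream.
    Stage = Callable[[Iterator[str]], Iterator[str]]

    def chain(stages: list[Stage], source: Iterator[str]) -> Iterator[str]:
        """Lazily compose streaming stages: nothing runs until the final
        stream is consumed, and each stage pulls from the previous one
        token by token."""
        stream = source
        for stage in stages:
            stream = stage(stream)
        return stream

    # Hypothetical usage, where llm_stage(prompt) wraps one streaming LLM call:
    # final = chain([llm_stage(p) for p in prompts], iter(user_tokens))
    # for token in final:
    #     print(token, end="", flush=True)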



