Idea for a killer app for recurrent models: low-latency, low-memory LLM / TTS coupling. Start synthesizing speech as soon as new tokens are generated: while the LLM is cranking out token t, the TTS is already working on token t-1, so it never has to wait. By the time the LLM finishes, the TTS is nearly finished too. And with the two models colocated, you save a network call as well.
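A minimal sketch of that pipelining idea, assuming a hypothetical per-token LLM decoder and per-chunk TTS synthesizer (the sleeps stand in for real model latencies; this is not any particular library's API):

```python
import asyncio

async def llm_producer(queue: asyncio.Queue) -> None:
    # Pretend LLM: pushes tokens into the queue as they are decoded.
    for token in ["Hello", " there", ",", " how", " are", " you", "?"]:
        await asyncio.sleep(0.05)   # decoding latency per token (stand-in)
        await queue.put(token)
    await queue.put(None)           # end-of-stream sentinel

async def tts_consumer(queue: asyncio.Queue) -> None:
    # TTS works on token t-1 while the LLM is still decoding token t.
    while True:
        token = await queue.get()
        if token is None:
            break
        await asyncio.sleep(0.03)   # synthesis latency per chunk (stand-in)
        print(f"audio chunk for {token!r} ready")

async def main() -> None:
    queue: asyncio.Queue = asyncio.Queue()
    # Both run concurrently in the same process: no network hop between them.
    await asyncio.gather(llm_producer(queue), tts_consumer(queue))

asyncio.run(main())
```

Total wall-clock time is roughly max(LLM time, TTS time) plus one token of lag, instead of their sum.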
Recurrent models with constant hidden state are naturally suited to streaming data, potentially opening the door to unexplored new use cases.
This is actually the hypothesis behind Cartesia (the state space models team), and hence their deep focus on voice models specifically: taking full advantage of recurrent models' constant-time compute to get low latencies.
The RWKV team's focus, however, is still first on the multilingual text space, with the multimodal space to come later.
On a side note, and that's what led me to the link above: I wonder if it would be possible to chain N streaming LLMs in an agent workflow and get a final output stream almost instantaneously, without waiting for the first N-1 LLMs to complete their replies.
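A toy sketch of that chaining idea, assuming each stage can start emitting output before its input stream is complete (hypothetical stage functions; real agent frameworks differ):

```python
from typing import Iterator

def stage(name: str, upstream: Iterator[str]) -> Iterator[str]:
    # Each stage transforms tokens as they arrive, instead of waiting
    # for the full upstream reply.
    for token in upstream:
        yield f"{name}({token})"

def source() -> Iterator[str]:
    for token in ["t1", "t2", "t3"]:
        yield token

pipeline: Iterator[str] = source()
for i in range(3):          # chain N = 3 streaming stages
    pipeline = stage(f"llm{i}", pipeline)

# The first output token appears after roughly N per-token latencies,
# not after N full generations.
for out in pipeline:
    print(out)
```

Whether this works in practice depends on whether each step's task can actually be done token-by-token rather than needing the whole upstream reply first.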
True, but there the memory requirements grow with sequence length, whereas for recurrent models the memory requirement is constant. This is why I qualified with "low memory".