This is actually the hypothesis behind Cartesia (the state space team), and hence their deep focus on voice models specifically: taking full advantage of recurrent models' constant-time compute for low latency.

The RWKV team's focus, however, is still first on the multilingual text space, then on the multimodal space in the future.




Karan from Cartesia explains SSMs+voice really well: https://www.youtube.com/watch?v=U9DPRZ0lSIQ

It's one of those retrospectively obvious/genius insights that I wish I had understood when I first met him.



