This is actually the hypothesis for cartesia (state space team), and hence their deep focus on voice model specifically. Taking full advantage of recurrent models constant time compute, for low latencies.
RWKV team's focus is still however is first in the multi-lingual text space, then multi-modal space in the future.
RWKV team's focus is still however is first in the multi-lingual text space, then multi-modal space in the future.