yshui
33 days ago
on: RWKV Language Model
Any autoregressive model can do what you are describing. Transformers generate one token at a time too, not all at once.
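For reference, a minimal sketch of greedy autoregressive decoding; the `model` callable here is hypothetical, not any particular library's API. The loop is the same whether the backbone is a transformer or a recurrent model:

    def generate(model, prompt_tokens, n_new_tokens):
        # Greedy autoregressive decoding: one token per step, each step
        # conditioned on everything generated so far.
        tokens = list(prompt_tokens)
        for _ in range(n_new_tokens):
            logits = model(tokens)  # hypothetical: scores over the vocabulary
            next_token = max(range(len(logits)), key=logits.__getitem__)
            tokens.append(next_token)
        return tokens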
intalentive
32 days ago
True, but memory requirements grow with sequence length. For recurrent models the memory requirement is constant. This is why I qualified with "low memory".
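A rough back-of-envelope sketch of that difference, using made-up layer and head sizes rather than RWKV's actual configuration: a decoding transformer keeps a KV cache that grows linearly with the number of tokens, while a recurrent model carries a fixed-size state between steps.

    # Illustrative transformer dimensions; not any specific model's config.
    n_layers, n_heads, head_dim, d_model, dtype_bytes = 24, 16, 64, 1024, 2

    def kv_cache_bytes(seq_len):
        # Transformer decoding: each layer stores a key and a value vector
        # per past token, so memory grows linearly with sequence length.
        return seq_len * n_layers * 2 * n_heads * head_dim * dtype_bytes

    def recurrent_state_bytes(seq_len):
        # Recurrent model: fixed-size state regardless of sequence length.
        # (The real state of a model like RWKV is a few vectors per layer;
        # the point is only that it does not depend on seq_len.)
        return n_layers * d_model * dtype_bytes

    for L in (1_024, 16_384, 262_144):
        print(f"{L:>7} tokens: KV cache {kv_cache_bytes(L) / 2**20:10.1f} MiB"
              f" | recurrent state {recurrent_state_bytes(L) / 2**20:8.3f} MiB")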
whimsicalism
33 days ago
Yes, but transformers are much slower than state-space models at inference, since per-token cost grows with context length.
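Another hedged back-of-envelope sketch, counting only the attention versus state-update work with illustrative dimensions: even with a KV cache, a transformer's per-token attention cost grows with context length, while a recurrent or state-space step does a fixed amount of work per token.

    d_model = 1024  # illustrative width

    def attention_flops_per_new_token(ctx_len, d=d_model):
        # With a KV cache, each new token still computes attention against
        # every cached position: work grows with context length.
        return 2 * ctx_len * d

    def recurrent_flops_per_new_token(ctx_len, d=d_model):
        # A recurrent / state-space step updates a fixed-size state:
        # work per token is independent of context length.
        return 2 * d * d

    for L in (1_024, 32_768, 1_048_576):
        print(f"ctx {L:>9}: attention ~{attention_flops_per_new_token(L):.2e} FLOPs,"
              f" recurrent ~{recurrent_flops_per_new_token(L):.2e} FLOPs")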