
Any autoregressive model can do what you're describing. Transformers also generate one token at a time, not all at once.
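A rough Python sketch of that point: the loop below works for any autoregressive model, because step_fn is just a placeholder (my assumption, not from the thread) for whatever maps the tokens so far to next-token logits, be it a transformer, an RNN, or a state space model.

    import numpy as np

    def generate(step_fn, prompt_ids, n_new):
        # step_fn(ids) -> logits for the next token; stands in for any
        # autoregressive model (transformer, RNN, or state space model).
        ids = list(prompt_ids)
        for _ in range(n_new):
            logits = step_fn(np.array(ids))
            ids.append(int(np.argmax(logits)))  # greedy: take the most likely token
        return ids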



True, but for a transformer the memory requirement grows with sequence length (the attention cache holds keys and values for every past token), while for a recurrent model it is constant. This is why I qualified with "low memory".
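A back-of-the-envelope sketch in Python (layer count and widths are made-up example values, not from the thread): the transformer's KV cache scales linearly with the number of tokens seen so far, while a recurrent or state space model carries a fixed-size state.

    def transformer_kv_cache_floats(seq_len, n_layers=24, d_model=1024):
        # Keys and values for every past position in every layer: grows with seq_len.
        return n_layers * seq_len * 2 * d_model

    def recurrent_state_floats(n_layers=24, d_state=1024):
        # Fixed-size hidden state per layer: independent of seq_len.
        return n_layers * d_state

    for n in (1_000, 10_000, 100_000):
        print(f"{n:>7} tokens: KV cache {transformer_kv_cache_floats(n):,} floats, "
              f"recurrent state {recurrent_state_floats():,} floats")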


Yes, and transformers are also much slower than state space models at generation: each new token attends over the entire cache, so the per-token cost grows with context length, whereas a state space model's per-token cost is constant.
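To make the per-token cost concrete, here is a minimal Python sketch (shapes and sizes are illustrative assumptions): the recurrent update does a fixed amount of work per token, while a cached-attention decoding step does work proportional to how many tokens are already in the cache.

    import numpy as np

    def recurrent_step(state, x_t, A, B):
        # One state update: fixed O(d^2) work, no matter how long the context is.
        return A @ state + B * x_t

    def attention_step(q, K_cache, V_cache):
        # One decoding step over the KV cache: work grows with the cache length.
        scores = K_cache @ q
        weights = np.exp(scores - scores.max())
        weights /= weights.sum()
        return weights @ V_cache

    d, n_cached = 64, 1000
    rng = np.random.default_rng(0)
    state = recurrent_step(np.zeros(d), 1.0, 0.9 * np.eye(d), rng.standard_normal(d))
    out = attention_step(rng.standard_normal(d),
                         rng.standard_normal((n_cached, d)),
                         rng.standard_normal((n_cached, d)))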



