yshui
33 days ago
on: RWKV Language Model
Any autoregressive model can do what you are describing. Transformers generate one token at a time too, not all at once.
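For reference, a minimal sketch of greedy autoregressive decoding; the `model` callable here is hypothetical, not any particular library's API. The loop is the same whether the backbone is a transformer or a recurrent model:

    def generate(model, prompt_tokens, n_new_tokens):
        # Greedy autoregressive decoding: one token per step, each step
        # conditioned on everything generated so far.
        tokens = list(prompt_tokens)
        for _ in range(n_new_tokens):
            logits = model(tokens)  # hypothetical: scores over the vocabulary
            next_token = max(range(len(logits)), key=logits.__getitem__)
            tokens.append(next_token)
        return tokens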
intalentive
32 days ago
True, but memory requirements grow with sequence length. For recurrent models the memory requirement is constant. This is why I qualified with "low memory".
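A rough back-of-envelope sketch of that difference, using made-up layer and head sizes rather than RWKV's actual configuration: a decoding transformer keeps a KV cache that grows linearly with the number of tokens, while a recurrent model carries a fixed-size state between steps.

    # Illustrative transformer dimensions; not any specific model's config.
    n_layers, n_heads, head_dim, d_model, dtype_bytes = 24, 16, 64, 1024, 2

    def kv_cache_bytes(seq_len):
        # Transformer decoding: each layer stores a key and a value vector
        # per past token, so memory grows linearly with sequence length.
        return seq_len * n_layers * 2 * n_heads * head_dim * dtype_bytes

    def recurrent_state_bytes(seq_len):
        # Recurrent model: fixed-size state regardless of sequence length.
        # (The real state of a model like RWKV is a few vectors per layer;
        # the point is only that it does not depend on seq_len.)
        return n_layers * d_model * dtype_bytes

    for L in (1_024, 16_384, 262_144):
        print(f"{L:>7} tokens: KV cache {kv_cache_bytes(L) / 2**20:10.1f} MiB"
              f" | recurrent state {recurrent_state_bytes(L) / 2**20:8.3f} MiB")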
whimsicalism
33 days ago
Yes, but transformers are much slower than state-space models at inference, since per-token cost grows with context length.
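Another hedged back-of-envelope sketch, counting only the attention versus state-update work with illustrative dimensions: even with a KV cache, a transformer's per-token attention cost grows with context length, while a recurrent or state-space step does a fixed amount of work per token.

    d_model = 1024  # illustrative width

    def attention_flops_per_new_token(ctx_len, d=d_model):
        # With a KV cache, each new token still computes attention against
        # every cached position: work grows with context length.
        return 2 * ctx_len * d

    def recurrent_flops_per_new_token(ctx_len, d=d_model):
        # A recurrent / state-space step updates a fixed-size state:
        # work per token is independent of context length.
        return 2 * d * d

    for L in (1_024, 32_768, 1_048_576):
        print(f"ctx {L:>9}: attention ~{attention_flops_per_new_token(L):.2e} FLOPs,"
              f" recurrent ~{recurrent_flops_per_new_token(L):.2e} FLOPs")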