Being an RNN, there's another trick: caching a long prompt. A transformer attends over the whole sequence, but an RNN only looks back one step, compressing everything it has seen into a fixed-size hidden state. So you can run your long context through once, save the final state, and reuse it for many different continuations.
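
A minimal sketch of what that looks like, using a toy PyTorch GRU as a stand-in for a real RNN language model. The model, tokenization, and sizes here are illustrative assumptions, not any particular library's API:

    import torch
    import torch.nn as nn

    torch.manual_seed(0)
    vocab_size, hidden = 256, 64
    embed = nn.Embedding(vocab_size, hidden)        # stand-in embedding layer
    rnn = nn.GRU(hidden, hidden, batch_first=True)  # stand-in for the RNN LM

    def run(tokens, state=None):
        # Feed tokens through the RNN; return outputs and final hidden state.
        x = embed(torch.tensor([tokens]))
        return rnn(x, state)

    # Pay for the long prompt once: the entire context is compressed
    # into a single fixed-size hidden state.
    long_prompt = list(range(100))  # stand-in for tokenized context
    _, cached_state = run(long_prompt)

    # Reuse the cached state for many continuations; each one only
    # processes its own new tokens, never the prompt again.
    for continuation in ([7, 8, 9], [42, 43], [1]):
        out, _ = run(continuation, cached_state.clone())

Contrast with a transformer, where reusing a prompt means caching per-token KV tensors that grow with context length; here the cache is one fixed-size state no matter how long the prompt was.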


Yup. This is commonly done in the community for chat models as well, since the shared prompt gets reused for every reply.



