
I don't see any results; it'd be more impactful and convincing if there were numbers supplementing the theory. It's not that hard to fine-tune an existing LM on a small dataset and verify that it works.

I am, however, of a similar opinion that there could be better attention formulations. A paper from 2020, https://arxiv.org/abs/2005.09561, helped a lot in one of the transformer models I trained (not a vanilla LM but a specialised multi-modal graph problem).

It proposes normalised attention, which, if I'm not wrong, should help with the quantisation problem too.
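To give a rough idea of what I mean by normalised attention, here's a minimal PyTorch sketch of one common variant: L2-normalise the queries and keys before the dot product and rescale the logits with a learnable temperature. The function name and the exact normalisation scheme are mine and may not match the linked paper's formulation exactly, but the intuition is the same: bounding the pre-softmax logits keeps activations in a tighter range, which is also why it tends to play nicer with quantisation.

    import torch
    import torch.nn.functional as F

    def normalised_attention(q, k, v, scale):
        # q, k, v: (batch, heads, seq, dim); scale: learnable scalar temperature.
        # L2-normalising q and k bounds each logit to [-scale, scale],
        # which keeps the softmax inputs in a controlled range.
        q = F.normalize(q, dim=-1)
        k = F.normalize(k, dim=-1)
        logits = scale * (q @ k.transpose(-2, -1))
        return F.softmax(logits, dim=-1) @ v

    # Example usage (shapes are illustrative):
    q = torch.randn(1, 8, 16, 64)
    k = torch.randn(1, 8, 16, 64)
    v = torch.randn(1, 8, 16, 64)
    out = normalised_attention(q, k, v, scale=torch.tensor(10.0))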


