I don't see any results; the argument would be more impactful and convincing with numbers supplementing the theory. It isn't that hard to finetune an existing LM on a small dataset and verify that the method works.
I am, however, of a similar opinion that there could be better attention formulations. A paper from 2020 (https://arxiv.org/abs/2005.09561) helped a lot in one of the transformer models I trained (not a vanilla LM, but a specialised multi-modal graph model).
It proposes normalised attention, which, if I'm not mistaken, should help with the quantisation problem too.
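For intuition, here's a rough sketch of what normalising attention can look like: replacing softmax with row-wise L2 normalisation of the scores, so the mixing weights stay bounded regardless of score magnitude (which is roughly why it might play nicely with quantisation). This is my own illustrative simplification, not necessarily the paper's exact formulation:

```python
import numpy as np

def normalized_attention(q, k, v, eps=1e-6):
    """Toy single-head attention where softmax is swapped for
    L2 normalisation of the score rows (illustrative only)."""
    d = q.shape[-1]
    # Standard scaled dot-product scores.
    scores = q @ k.T / np.sqrt(d)
    # L2-normalise each row instead of applying softmax, keeping
    # every weight vector on the unit sphere (bounded activations).
    weights = scores / (np.linalg.norm(scores, axis=-1, keepdims=True) + eps)
    return weights @ v

rng = np.random.default_rng(0)
q = rng.normal(size=(4, 8))
k = rng.normal(size=(4, 8))
v = rng.normal(size=(4, 8))
out = normalized_attention(q, k, v)  # shape (4, 8), bounded weights
```

Because the weight rows have unit norm, the output magnitude is capped by the value magnitudes, which is the kind of bounded dynamic range quantisation schemes tend to like.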