blueblimp on July 24, 2023 | on: Attention Is Off By One

The proposed replacement definitely makes more sense (and I've always found the absence of a "failed query" to be puzzling in standard attention), but in deep learning, things that make more sense don't always get better results. So I'm curious whether this has been tried and carefully evaluated.

jadbox on July 24, 2023

It would be an amusing find if "Black Swan mega-activations" actually, albeit unintentionally, made the model smarter...
