The proposed replacement definitely makes more sense (and I've always found the absence of a "failed query" in standard attention puzzling), but in deep learning, things that make more sense don't always get better results. So I'm curious whether this has been tried and carefully evaluated.