That's quadratic scaling, no?
>since the time and memory complexity of self-attention are quadratic in sequence length
That's quadratic scaling, no?